matlab - Vectorize this strfind loop

Question

I'm looking to vectorize this loop:

needle = [1 2 3];

haystack  = [0 0 1 2 3 0 1 2 3;
             0 1 2 3 0 1 2 3 0;
             0 0 0 1 2 3 0 0 0];

indices = cell(1, size(haystack, 1));   % preallocate one cell per row
for ii = 1:size(haystack, 1)
    indices{ii} = strfind(haystack(ii,:), needle);
end

indices{:}

indices then contains the starting positions of needle in each row of haystack (needle may occur a different number of times in each row):

3 7
2 6
4

Any command will do, it doesn't have to be strfind, as long as it's vectorized.

Answer

If you don't want to use a for loop, you can do the following:

 result = cellfun(@(row) strfind(row, needle), num2cell(haystack, 2), 'UniformOutput', 0);
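A minimal check of this version, assuming the question's needle and haystack: num2cell(haystack, 2) splits the matrix into a 3x1 cell array of rows, so cellfun calls strfind once per row and should return the same cell contents as the loop. The char(...) conversion is an assumption added for this sketch: strfind is documented for text, and some environments (GNU Octave, for example) may reject numeric input; converting keeps the matches unchanged only for small nonnegative integers such as these digits.

```matlab
needle   = [1 2 3];
haystack = [0 0 1 2 3 0 1 2 3;
            0 1 2 3 0 1 2 3 0;
            0 0 0 1 2 3 0 0 0];

rows = num2cell(haystack, 2);   % 3x1 cell array, rows{i} equals haystack(i,:)

% One strfind call per row; char() assumes small nonnegative integer values
result = cellfun(@(row) strfind(char(row), char(needle)), rows, 'UniformOutput', false);

% The loop from the question, for comparison
indices = cell(size(haystack, 1), 1);
for ii = 1:size(haystack, 1)
    indices{ii} = strfind(char(haystack(ii, :)), char(needle));
end

disp(isequal(result, indices))   % displays 1
```

cellfun still calls strfind once per row, so this hides the loop rather than vectorizing the search itself.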

Answer

If you can accept the result in a different format (better suited to vectorization):

[m n] = size(haystack);
haystackLin = haystack.';
haystackLin = haystackLin(:).'; %// linearize haystack row-wise
ind = strfind(haystackLin,needle); %// find matches
[jj ii] = ind2sub([n m],ind); %// convert to row and column
valid = jj<=n-numel(needle)+1; %// remove false matches (spanning several rows)
result = [ii(valid).' jj(valid).'];

The result has one row per match, in the form [row, column]:

result =
     1     3
     1     7
     2     2
     2     6
     3     4
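To see why the valid step is needed, here is a sketch of the same sequence with the longer needle [1 2 3 0 1]. Linearizing puts the end of row 1 right next to the start of row 2, so one hit straddles the boundary and has to be dropped. As in the other sketches here, char(...) is an assumption so that strfind receives text, which only preserves the matches for small nonnegative integers.

```matlab
needle   = [1 2 3 0 1];
haystack = [0 0 1 2 3 0 1 2 3;
            0 1 2 3 0 1 2 3 0;
            0 0 0 1 2 3 0 0 0];

[m, n] = size(haystack);
haystackLin = reshape(haystack.', 1, []);   % rows laid end to end: row 1, row 2, row 3

ind = strfind(char(haystackLin), char(needle));   % [3 7 11]; 7 runs from row 1 into row 2
[jj, ii] = ind2sub([n m], ind);                    % jj = column, ii = row
valid = jj <= n - numel(needle) + 1;               % a match must fit inside one row
result = [ii(valid).' jj(valid).']                 % [1 3; 2 2]
```

Without the valid filter, the straddling hit at position 7 would be reported as row 1, column 7.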

Answer

If you're fine with getting the column numbers and the corresponding row numbers in separate vectors, as Luis suggested, you can also use this:

%// Main portion
%// num2str turns each value into one character, so this assumes single-digit, nonnegative values
haystack_t = haystack';
num1 = strfind(num2str(haystack_t(:))',num2str(needle(:))');
col = rem(num1-1,size(haystack,2))+1;      %// 1-based column within the row
ind = floor((num1-1)/size(haystack,2))+1;  %// row number

%// Remove false matches that span two rows; they appear because all the numbers are concatenated into one big string
rm_ind = col> (size(haystack,2) - numel(needle))+1;
col(rm_ind)=[];
ind(rm_ind)=[];

Runs with various needle inputs:

RUN1 (Original values):
needle =
     1     2     3
haystack =
     0     0     1     2     3     0     1     2     3
     0     1     2     3     0     1     2     3     0
     0     0     0     1     2     3     0     0     0
col =
     3     7     2     6     4
ind =
     1     1     2     2     3

RUN2:
needle =
     1     2     3     0     1
haystack =
     0     0     1     2     3     0     1     2     3
     0     1     2     3     0     1     2     3     0
     0     0     0     1     2     3     0     0     0
col =
     3     2
ind =
     1     2
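The conversion from a position in the concatenated string back to (row, column) is easiest to check at a row boundary. A small sketch with n = 9 columns and hand-picked positions (assumed values, for illustration only): position 9 is the last column of row 1, but the zero-based forms rem(p, n) and floor(p/n)+1 report it as column 0 of row 2, while the 1-based forms, and ind2sub, give row 1, column 9.

```matlab
n = 9;                          % columns per row of haystack
p = [3 9 11];                   % example positions in the concatenated string

col = rem(p - 1, n) + 1;        % 1-based column: [3 9 2]
row = floor((p - 1) / n) + 1;   % 1-based row:    [1 1 2]

% ind2sub performs the same mapping when given [n, number of rows]
[col2, row2] = ind2sub([n 3], p);

% the zero-based forms go wrong exactly at the multiple of n
colNaive = rem(p, n);           % [3 0 2]
rowNaive = floor(p / n) + 1;    % [1 2 2]
```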

Answer

You can concatenate the entire haystack variable and then find needle in it, as follows:

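totalWhiteSpaces=isspace(haystack); %finds whitespace locations
totalWhiteSpaces=sum(totalWhiteSpaces(1,:),2); %Assumes that every row of "haystack" contains the same
                                               %number of whitespace characters (rows of a char matrix
                                               %always have equal length).

realColumns=size(haystack,2)-totalWhiteSpaces; %gets how many characters are
                                               %in a row, excluding whitespace
needle(needle==' ')='';
haystack1=haystack';
haystack2=(haystack1(:))';     %concatenates the rows into one row vector
haystack2(haystack2==' ')='';  %removes whitespace
result=strfind(haystack2,needle);  %finds the pattern
%Convert positions in the concatenated vector back to rows and columns (a kind of reshaping).
%Use floor, not uint32: conversion to uint32 rounds to the nearest integer.
rowsOfResult=floor((result-1)/realColumns)+1;
resultValue=mod(result-1,realColumns)+1;
%Drop false matches that span two rows.
valid=resultValue<=realColumns-numel(needle)+1;
rowsOfResult=rowsOfResult(valid);
resultValue=resultValue(valid);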

I think you can build your final matrix from here.
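This answer is written for a char haystack whose digits are separated by spaces. A self-contained sketch of the same steps on such an input, assuming the char rows below (they mirror the question's data and are illustrative only) and that every row has the same number of spaces. It uses the 1-based floor/mod conversion and drops matches that straddle two rows.

```matlab
haystack = ['0 0 1 2 3 0 1 2 3';
            '0 1 2 3 0 1 2 3 0';
            '0 0 0 1 2 3 0 0 0'];
needle = '1 2 3';

needle(needle == ' ') = [];                               % '123'
cols = size(haystack, 2) - sum(isspace(haystack(1, :)));  % 9 characters per row
flat = reshape(haystack.', 1, []);                        % rows laid end to end
flat(flat == ' ') = [];                                   % 27 characters, no spaces

pos = strfind(flat, needle);                 % [3 7 11 15 22]
row = floor((pos - 1) / cols) + 1;           % [1 1 2 2 3]
col = mod(pos - 1, cols) + 1;                % [3 7 2 6 4]

keep = col <= cols - numel(needle) + 1;      % a match must fit inside one row
row = row(keep);
col = col(keep);
```

With this data no match straddles a row boundary, so keep is all true.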

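Timing results: the advantage of this code shows as haystack gets larger. In my experiments with a 300000x9 haystack, your code takes about 0.38 s, my code about 0.23 s, and the code using cellfun about 2.23 s. I think that's because of the num2cell operation. Also, cellfun uses a for loop internally, so it isn't truly vectorized.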
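A rough harness for reproducing this kind of comparison, as a sketch only: the random 0..3 data, the 100000-row size (smaller than the 300000 rows above, to keep the run short), and the char(...) conversion for strfind are all assumptions. The third timing uses the linearized single-strfind approach from the second answer, which follows the same idea as this answer.

```matlab
% Rough timing sketch: data, size and char() conversion are assumptions
m = 100000;                     % rows (the answer above used 300000)
n = 9;                          % columns
haystack = randi([0 3], m, n);  % random digits 0..3
needle = [1 2 3];
h = char(haystack);             % char() so that strfind receives text
nd = char(needle);

tic;                            % 1) explicit loop, as in the question
loopOut = cell(m, 1);
for r = 1:m
    loopOut{r} = strfind(h(r, :), nd);
end
tLoop = toc;

tic;                            % 2) cellfun over num2cell rows
cellOut = cellfun(@(row) strfind(row, nd), num2cell(h, 2), 'UniformOutput', false);
tCellfun = toc;

tic;                            % 3) one strfind on the linearized matrix
ind = strfind(reshape(h.', 1, []), nd);
[jj, ii] = ind2sub([n m], ind);
valid = jj <= n - numel(nd) + 1;  % drop matches that span two rows
tLinear = toc;

fprintf('loop %.2f s, cellfun %.2f s, linearized %.2f s\n', tLoop, tCellfun, tLinear);
```

Absolute times depend on the machine and the MATLAB version, so only the relative ordering is meaningful, and even that may shift between releases.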
