matlab - Strcmp for cell arrays of unequal length in MATLAB

Question

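Is there an easy way to find a smaller cell array of strings within a larger one? I have two lists, one with unique elements and one with repeating elements. I want to find every complete occurrence of the smaller array's pattern within the larger array. I know that strcmp will compare two cell arrays, but only if they are equal in length. My first thought was to step through subsets of the larger array with a loop, but there must be a better solution.

For example, in the following: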

smallcellarray={'string1',...
                'string2',...
                'string3'};
largecellarray={'string1',...
                'string2',...
                'string3',...
                'string1',...
                'string2',...
                'string1',...
                'string2',...
                'string3'};

index=myfunction(largecellarray,smallcellarray)

would return

index=[1 1 1 0 0 1 1 1]

score 9 · Accepted Answer

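You can use the function ISMEMBER to get an index vector showing where each cell of largecellarray occurs in the smaller array smallcellarray, then use the function STRFIND (which works for both strings and numeric arrays) to find the starting indices of the smaller array within the larger one: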

>> nSmall = numel(smallcellarray);
>> [~, matchIndex] = ismember(largecellarray,...  %# Find the index of the 
                                smallcellarray);    %#   smallcellarray entry
                                                    %#   that each entry of
                                                    %#   largecellarray matches
>> startIndices = strfind(matchIndex,1:nSmall)  %# Starting indices where the
                                                %#   vector [1 2 3] occurs in
startIndices =                                  %#   matchIndex

     1     6
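
To make the intermediate result concrete, here is a small sketch that runs the two steps on the example arrays from the question. It relies on smallcellarray having unique entries, as the question states; an entry of largecellarray that matches nothing would be encoded as 0.

```matlab
smallcellarray = {'string1','string2','string3'};
largecellarray = {'string1','string2','string3','string1', ...
                  'string2','string1','string2','string3'};

% Position in smallcellarray of each largecellarray entry (0 = no match)
[~, matchIndex] = ismember(largecellarray, smallcellarray);
disp(matchIndex)    % 1 2 3 1 2 1 2 3

% strfind applied to numeric row vectors: where the run [1 2 3] starts
startIndices = strfind(matchIndex, 1:numel(smallcellarray));
disp(startIndices)  % 1 6
```

The incomplete run at positions 4 and 5 is encoded as [1 2] and is therefore not reported as a match.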

Then it is a matter of building the vector index from these starting indices. Here is one way to create this vector:

>> nLarge = numel(largecellarray);
>> endIndices = startIndices+nSmall;  %# Get the indices immediately after
                                      %#   where the vector [1 2 3] ends
>> index = zeros(1,nLarge+1);         %# Initialize index to zero, with one spare
                                      %#   slot for an end marker past the array
>> index(startIndices) = 1;           %# Mark the start index with a 1
>> index(endIndices) = index(endIndices)-1;  %# Subtract 1 just after each end, so a
                                      %#   match starting right there is kept
>> index = cumsum(index(1:nLarge))    %# Take the cumulative sum, dropping the
                                      %#   spare entry
index =

     1     1     1     0     0     1     1     1

Amro gives another way to create it using the function BSXFUN. Yet another way to create it is:

index = cumsum([startIndices; ones(nSmall-1,numel(startIndices))]);
index = ismember(1:numel(largecellarray),index);
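
The steps above can be collected into a single function, roughly the myfunction the question asks for. This is only a sketch: the name findcellpattern is hypothetical, and it assumes smallcellarray is non-empty with unique entries. The explicit dimension argument to cumsum is an addition here, so that a one-element pattern is not summed along the wrong dimension.

```matlab
function index = findcellpattern(largecellarray, smallcellarray)
% FINDCELLPATTERN  Mark every complete occurrence of smallcellarray
% inside largecellarray (hypothetical helper name).
% Assumes smallcellarray is non-empty and has unique entries.
nSmall = numel(smallcellarray);
nLarge = numel(largecellarray);

% Encode each large-array entry by its position in the small array
[~, matchIndex] = ismember(largecellarray(:)', smallcellarray(:)');

% Starting positions of the complete pattern 1:nSmall
startIndices = reshape(strfind(matchIndex, 1:nSmall), 1, []);

% Each column counts up from a start index, covering nSmall positions
covered = cumsum([startIndices; ones(nSmall-1, numel(startIndices))], 1);

% 1 where a position belongs to some complete occurrence, else 0
index = double(ismember(1:nLarge, covered));
end
```

Called as findcellpattern(largecellarray, smallcellarray) with the arrays from the question, this yields the vector shown there.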

score 5 · Accepted Answer

Here is my version (based on the answers from both @yuk and @gnovice):

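%# Assuming S = smallcellarray and L = largecellarray (row cell arrays);
%# grp2idx is part of the Statistics Toolbox
g = grp2idx([S L])';                             %# Encode both arrays with shared group numbers
idx = strfind(g(numel(S)+1:end),g(1:numel(S)));  %# Start indices of the pattern in L
idx = bsxfun(@plus,idx',0:numel(S)-1);           %# Expand each start to every position it covers

index = zeros(size(L));
index(idx(:)) = 1;                               %# Mark all covered positions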
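
The bsxfun step is probably the least obvious part: it turns each starting index into the full run of positions that the match covers. Here is a tiny sketch using the start indices from the question's example (1 and 6) and a pattern length of 3. The variable names startIdx and n are illustrative only.

```matlab
startIdx = [1 6];   % starting positions, as returned by strfind
n = 3;              % pattern length, numel(S) in the answer

% Column of starts plus row of offsets expands to one row per match
idx = bsxfun(@plus, startIdx', 0:n-1);
disp(idx)
%      1     2     3
%      6     7     8

index = zeros(1, 8);
index(idx(:)) = 1;  % linear indexing marks every covered position
disp(index)         % 1 1 1 0 0 1 1 1
```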

score 1 · Accepted Answer

In @gnovice's answer, the first part can be

l = grp2idx(largecellarray)';
s = grp2idx(smallcellarray)';
startIndices = strfind(l,s);
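
One caveat, stated here as an assumption about grp2idx (a Statistics Toolbox function): for cell arrays of strings it numbers groups in order of first appearance within each call, so two separate calls only give comparable codes when both arrays meet the strings in the same order. The sketch below sidesteps this by encoding both arrays in a single call to the base-MATLAB function unique. The example data is made up for illustration.

```matlab
smallcellarray = {'string1','string2','string3'};
% A large array that does NOT begin with 'string1'
largecellarray = {'string3','string1','string2','string3','string2'};

% Encode both arrays against one shared, sorted dictionary
[~, ~, codes] = unique([smallcellarray largecellarray]);
codes = codes(:)';            % force a row vector for strfind
nSmall = numel(smallcellarray);
s = codes(1:nSmall);          % codes of the pattern
l = codes(nSmall+1:end);      % codes of the large array

startIndices = strfind(l, s);
disp(startIndices)            % 2
```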

score 0 · Accepted Answer

I arrived at the following solution, but I would still like to know whether there is a better way to do this:

function [output]=cellstrcmpi(largecell,smallcell)
output=zeros(size(largecell));
idx=1;
while idx<=length(largecell)-length(smallcell)+1
    if sum(strcmpi(largecell(idx:idx+length(smallcell)-1),smallcell))==length(smallcell)
       output(idx:idx+length(smallcell)-1)=1;
       idx=idx+length(smallcell);       
    else
        idx=idx+1;
    end
end

(I know, I know, no error checking - I'm a horrible person.)
