search - 如何在 Octave 的单元格数组（字符串）中对元素进行排序和高效查找？

Question

是否有内置功能？

score 43 · Accepted Answer

GNU Octave 以线性时间搜索字符串元胞数组`O(n)`：

另一个答案cellidx是八度折旧的，它仍然可以运行，但他们说要ismember改用，如下所示：

%linear time string index search.
a = ["hello"; "unsorted"; "world"; "moobar"]
b = cellstr(a)
%b =
%{
%  [1,1] = hello
%  [2,1] = unsorted
%  [3,1] = world
%  [4,1] = moobar
%}
find(ismember(b, 'world'))   %returns 3

'world' 在索引槽 3 中。这是一个昂贵的线性时间 O(n) 操作，因为无论是否找到它，它都必须遍历所有元素。

要实现对数时间 O(log n) 解决方案，那么您的列表需要预先排序，然后您可以使用二进制搜索：

如果您的单元格数组已经排序，您可以做`O(log-n)`最坏的情况：

function i = binsearch(array, val, low, high)
  %binary search algorithm for numerics, Usage:
  %myarray = [ 30, 40, 50.15 ];        %already sorted list
  %binsearch(myarray, 30,    1, 3)     %item 30 is in slot 1
  if ( high < low )
    i = 0;
  else
    mid = floor((low + high) / 2);
    if ( array(mid) > val )
      i = binsearch(array, val, low, mid-1);
    elseif ( array(mid) < val )
      i = binsearch(array, val, mid+1, high);
    else
      i = mid;
    endif
  endif
endfunction

function i = binsearch_str(array, val, low, high)
  % binary search for strings, usage:
  %myarray2 = [ "abc"; "def"; "ghi"];       #already sorted list
  %binsearch_str(myarray2, "abc", 1, 3)     #item abc is in slot 1
  if ( high < low )
    i = 0;
  else
    mid = floor((low + high) / 2);
    if ( mystrcmp(array(mid, [1:end]), val) == 1 )
      i = binsearch(array, val, low, mid-1);
    elseif ( mystrcmp(array(mid, [1:end]), val) == -1 )
      i = binsearch_str(array, val, mid+1, high);
    else
      i = mid;
    endif
  endif
endfunction
function ret = mystrcmp(a, b)
    %this function is just an octave string compare, it's exactly like 
    %strcmp(str1,str2) in C, or java.lang.String.compareTo(...) in Java.
    %returns 1 if string a > b
    %returns 0 if string a == b
    %return -1 if string a < b
    letters_gt = gt(a, b);      %list of boolean a > b
    letters_lt = lt(a, b);      %list of boolean a < b
    ret = 0;
    %octave makes us roll our own string compare because 
    %strings are arrays of numerics
    len = length(letters_gt);
    for i = 1:len
        if letters_gt(i) > letters_lt(i)
            ret = 1;
            return
        elseif letters_gt(i) < letters_lt(i)
            ret = -1;
            return
        endif
    end;
endfunction

%Assuming that myarray is already sorted, (it must be for binary 
%search to finish in logarithmic time `O(log-n))` worst case, then do

myarray = [ 30, 40, 50.15 ];        %already sorted list
binsearch(myarray, 30,    1, 3)     %item 30 is in slot 1
binsearch(myarray, 40,    1, 3)     %item 40 is in slot 2
binsearch(myarray, 50,    1, 3)     %50 does not exist so return 0
binsearch(myarray, 50.15, 1, 3)     %50.15 is in slot 3

%same but for strings:
myarray2 = [ "abc"; "def"; "ghi"];       %already sorted list
binsearch_str(myarray2, "abc", 1, 3)     %item abc is in slot 1
binsearch_str(myarray2, "def", 1, 3)     %item def is in slot 2
binsearch_str(myarray2, "zzz", 1, 3)     %zzz does not exist so return 0
binsearch_str(myarray2, "ghi", 1, 3)     %item ghi is in slot 3

如果尚未对数组进行排序，请执行以下操作：

排序的复杂性取决于您拥有的数据类型以及 GNU 八度语言编写者选择的任何排序算法，它介于O(n*log(n))和之间O(n*n)。

myarray = [ 9, 40, -3, 3.14, 20 ];        %not sorted list 
myarray = sort(myarray) 

myarray2 = [ "the"; "cat"; "sat"; "on"; "the"; "mat"];       %not sorted list 
myarray2 = sortrows(myarray2)

不要用这个 dob 对胶带感到害羞，它是唯一将单元固定在一起的东西。

score 11 · Accepted Answer

是的，请检查：http ://www.obihiro.ac.jp/~suzukim/masuda/octave/html3/octave_36.html#SEC75

a = ["hello"; "world"];
c = cellstr (a)
     ⇒ c =
         {
           [1,1] = hello
           [2,1] = world
         }
>>> cellidx(c, 'hello')
ans =  1

>>> cellidx(c, 'world')
ans =  2

score 3 · Accepted Answer

cellidx 解决方案不符合 OP 的效率要求，已被弃用（如中所述help cellidx）。

Håvard Geithus 在评论中建议对已排序的字符串元胞数组使用 lookup() 函数，这比 cellidx 效率高得多。虽然它仍然是二进制搜索，但大多数现代语言（甚至许多 20 年前的语言）都让我们可以轻松访问关联数组，这将是一种更好的方法。

虽然 Octave 显然没有关联的数组，但这实际上是解释器用于 ocatve 变量（包括结构）的内容，因此您可以使用它，如下所述：http: //math-blog.com/2011/05/ 09/associative-arrays-and-cellular-automata-in-octave/

Built-in Function: struct ("field", value, "field", value,...)
Built-in Function: isstruct (expr)
Built-in Function: rmfield (s, f)
Function File: [k1,..., v1] = setfield (s, k1, v1,...)
Function File: [t, p] = orderfields (s1, s2)
Built-in Function: fieldnames (struct)
Built-in Function: isfield (expr, name)
Function File: [v1,...] = getfield (s, key,...)
Function File: substruct (type, subs,...)

将 Matlab 转换为 Octave 是否有 container.Map 等价物？建议使用 javaObject("java.util.Hashtable")。这会带来一些设置开销，但如果您经常使用它，那将是性能上的胜利。链接一些用 C 或 C++ 编写的库甚至可能是可行的？请考虑这是否是一个可维护的选项。

警告：我对 Octave 比较陌生，并且在我自己研究它时写了这篇文章（这就是我在这里结束的方式）。我还没有对这些技术的效率进行测试，虽然我对底层算法有相当了解，但我可能对 Octave 中的实际效率做出了不合理的假设。

search - 如何在 Octave 的单元格数组（字符串）中对元素进行排序和高效查找？

3 回答 3

GNU Octave 以线性时间搜索字符串元胞数组O(n)：

如果您的单元格数组已经排序，您可以做O(log-n)最坏的情况：

如果尚未对数组进行排序，请执行以下操作：

Related

Reference

GNU Octave 以线性时间搜索字符串元胞数组`O(n)`：

如果您的单元格数组已经排序，您可以做`O(log-n)`最坏的情况：