matlab - In-place quicksort in MATLAB

Question

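I have written a small quicksort implementation in MATLAB to sort some custom data. Because I am sorting a cell array, and I need the indices of the sort order and do not want to restructure the cell array itself, I need my own implementation (maybe there is one available, but I did not find it).

My current implementation works by partitioning into a left and a right array and then passing these arrays to the recursive call. Because I do not know the sizes of left and right in advance, I just grow them inside a loop, which I know is horribly slow in MATLAB.

I know you can do an in-place quicksort, but I was warned never to modify the contents of variables passed into a function, because call by reference is not implemented the way one would expect in MATLAB (or so I was told). Is this correct? Would an in-place quicksort work as expected in MATLAB, or is there something I need to take care of? What other hints do you have for implementing this kind of thing?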

score 4 · Accepted Answer

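In this post, I only explain the MATLAB function-calling convention; I do not discuss the implementation of the quicksort algorithm.

When calling functions, MATLAB passes built-in data types by value, and any changes made to such arguments are not visible outside the function.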

function y = myFunc(x)
    x = x .* 2;         %# pass-by-value, changes only visible inside function
    y = x;
end

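This could be inefficient for large data, especially if it is not modified inside the function. Therefore MATLAB internally implements a copy-on-write mechanism: for example, when a vector is copied, only some metadata is copied, while the data itself is shared between the two copies of the vector. Only when one of them is modified is the data actually copied.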

function y = myFunc(x)
    %# x was never changed, thus passed-by-reference avoiding making a copy
    y = x .* 2;
end

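Note that for cell arrays and structures, only the cells/fields that are modified are passed by value (this is because cells/fields are stored separately internally), which makes copying such data structures more efficient. For more information, read this blog post.

Moreover, versions R2007 and later (I think) detect in-place operations on data and optimize such cases.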

function x = myFunc(x)
    x = x.*2;
end

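Obviously, when calling such a function, the LHS must be the same as the RHS (x = myFunc(x);). In addition, to take advantage of this optimization, the in-place function must be called from inside another function.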
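Applied to the question, the x = myFunc(x) pattern could look like the sketch below: the cell array c is only read, and an index vector ix is permuted and handed back under the same name at every recursive call. The names quicksortIndex, qsortRange and lessThan are hypothetical and not part of MATLAB, and whether the in-place optimization actually avoids the copies depends on the MATLAB release; the result is correct either way.

```matlab
function ix = quicksortIndex(c, lessThan)
% Hypothetical helper: returns ix such that c(ix) is sorted.
% c itself is never modified; only the index vector is permuted.
    ix = qsortRange(c, lessThan, 1:numel(c), 1, numel(c));
end

function ix = qsortRange(c, lessThan, ix, lo, hi)
% ix is both input and output, so callers write ix = qsortRange(..., ix, ...).
    if lo >= hi
        return;                        % zero or one element: nothing to do
    end
    pivot = c{ix(hi)};                 % Lomuto partition, last element as pivot
    p = lo - 1;
    for k = lo:hi-1
        if lessThan(c{ix(k)}, pivot)
            p = p + 1;
            ix([p k]) = ix([k p]);     % swap two indices, c stays untouched
        end
    end
    p = p + 1;
    ix([p hi]) = ix([hi p]);           % move the pivot to its final position
    ix = qsortRange(c, lessThan, ix, lo, p - 1);
    ix = qsortRange(c, lessThan, ix, p + 1, hi);
end
```

For example, quicksortIndex({3, 1, 2}, @(a,b) a < b) returns [2 3 1]. This simple pivot choice recurses once per element on already-sorted input, so long inputs can hit MATLAB's recursion limit, and unlike the built-in sort the ordering of equal keys is not preserved.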

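In MEX functions, although it is possible to change input variables without making copies, this is not officially supported and might yield unexpected results...

For user-defined types (OOP), MATLAB introduced the concept of value objects versus handle objects, the latter supporting reference semantics.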

score 4 · Accepted Answer

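Implementing a sort on complex data in user M-code is probably going to be a loss in terms of performance, because of the overhead of M-level operations compared to Matlab's built-ins. Try to reframe the operation in terms of Matlab's existing vectorized functions.

Based on your comment, it sounds like you are sorting on a single-valued key that sits inside the structs in a cell array. You can probably get a good speedup by extracting the sort key into a primitive numeric array and calling the built-in sort on that.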

%// An example cell array of structs that I think looks like your input
c = num2cell(struct('foo',{'a','b','c','d'}, 'bar',{6 1 3 2}))
%// Let's say the "bar" field is what you want to sort on.
key = cellfun(@(s)s.bar, c) %// Extract the sort key using cellfun
[sortedKey,ix] = sort(key) %// Sort on just the key using fast numeric sort() builtin
sortedC = c(ix); %// ix is a reordering index in to c; apply the sort using a single indexing operation
reordering = cellfun(@(s)s.foo, sortedC)  %// for human readability of results

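If you are sorting on multiple field values, extract all m key values from the n cells into an n-by-m array, with columns in descending order of precedence, and use sortrows on it.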

%// Multi-key sort (assumes every struct in c also has a numeric scalar 'baz' field)
keyCols = {'bar','baz'};
key = NaN(numel(c), numel(keyCols));
for i = 1:numel(keyCols)
    keyCol = keyCols{i};
    key(:,i) = cellfun(@(s)s.(keyCol), c);
end
[sortedKey,ix] = sortrows(key);
sortedC = c(ix);
reordering = cellfun(@(s)s.foo, sortedC)

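One of the keys to performance in Matlab is to get your data into primitive arrays and use vectorized operations on those arrays. Matlab code that looks like C++ STL code, with algorithms and references to comparison functions and the like, will often be slow; even if your code is good in terms of big-O complexity, the fixed cost of user-level M-code operations, especially on non-primitives, can be a killer.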

Also, if your structs are homogeneous (that is, they all have the same set of fields), you can store them directly in a struct array instead of a cell array of structs, and it will be more compact. If you can do more extensive redesign, rearranging your data structures to be "planar-organized" - where you have a struct of arrays, reading across the ith element of all the fields as a record, instead of an array of structs of scalar fields - could be a good efficiency win. Either of these reorganizations would make constructing the sort key array cheaper.
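As a rough sketch of those two layouts, reusing the foo/bar example data from above (the variable names s, p and sortedS are made up for illustration):

```matlab
% Struct array instead of a cell array of structs:
% the sort key is a single comma-separated-list expansion.
s = struct('foo', {'a','b','c','d'}, 'bar', {6 1 3 2});
[~, ix] = sort([s.bar]);       % [s.bar] collects all keys into a numeric row
sortedS = s(ix);               % one indexing operation reorders the records

% Planar organization: one struct whose fields are whole arrays.
% Record i is p.foo{i} together with p.bar(i).
p.foo = {'a','b','c','d'};
p.bar = [6 1 3 2];
[~, ix] = sort(p.bar);         % the key already is a primitive array
p.foo = p.foo(ix);             % apply the same permutation to every field
p.bar = p.bar(ix);
```

In both layouts the key array comes out without a cellfun call and an anonymous function per element, which is where the savings would come from.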
