我的官方回答:
A4 = A(:,4);
R = unique(A4);
means = zeros(size(R));
inds = false(size(R));
for jj = 1:numel(R)
I = A4==R(jj);
sumI = sum(I);
inds(jj) = sumI>1;
means(jj) = sum(A(I,2))/sumI;
end
result = [means(inds) R(inds)];
这是因为以下原因。以下是我们以分析形式提出的所有替代方案:
%# sample data
A = [
1 2 3 4 5
3 7 5 3 2
1 3 2 5 3
4 5 7 5 8
2 4 7 4 4];
%# accumarray
%# works only on positive integers in A(:,4)
tic
for ii = 1:1e4
means = accumarray( A(:,4) ,A(:,2),[],@mean);
count = accumarray( A(:,4) ,ones(size(A(:,4))));
filtered = means(count>1);
end
toc
%# arrayfun
%# works only on integers in A(:,4)
tic
for ii = 1:1e4
B = arrayfun(@(x) A(A(:,4)==x, 2), min(A(:,4)):max(A(:,4)), 'uniformoutput', false);
filtered = cellfun(@mean, B(cellfun(@(x) numel(x)>1, B)) );
end
toc
%# ordinary loop
%# works only on integers in A(:,4)
tic
for ii = 1:1e4
A4 = A(:,4);
R = min(A4):max(A4);
means = zeros(size(R));
inds = false(size(R));
for jj = 1:numel(R)
I = A4==R(jj);
sumI = sum(I);
inds(jj) = sumI>1;
means(jj) = sum(A(I,2))/sumI;
end
filtered = means(inds);
end
toc
结果:
Elapsed time is 1.238352 seconds. %# (accumarray)
Elapsed time is 7.208585 seconds. %# (arrayfun + cellfun)
Elapsed time is 0.225792 seconds. %# (for loop)
普通的循环显然是要走的路。
注意mean
内部循环中没有。这是因为mean
不是 Matlab 内置函数(至少在 R2010 上),因此在循环中使用它会使循环不适合 JIT 编译,这会使其速度减慢 10 倍以上。使用上面的形式会加速循环几乎是accumarray
求解速度的 5.5 倍。
根据您的评论判断,将循环更改为处理所有条目A(:,4)
(不仅仅是整数)几乎是微不足道的:
A4 = A(:,4);
R = unique(A4);
means = zeros(size(R));
inds = false(size(R));
for jj = 1:numel(A4)
I = A4==R(jj);
sumI = sum(I);
inds(jj) = sumI>1;
means(jj) = sum(A(I,2))/sumI;
end
filtered = means(inds);
我将复制粘贴到顶部作为我的官方答案:)