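I think I would actually solve this by de-vectorizing. That is, remove all the high-level calls and expensive operations and strip the problem down to its essentials, using only preallocated arrays and simple operations.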
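The core of the algorithm is:
1. Determine the sum of the weights.
2. Choose n random numbers between 0 and the sum of the weights, and sort them.
3. Manually implement a cumsum loop. Instead of storing every cumulative sum, though, store only the indices at which the cumulative sum jumps from less than the current random number to greater than or equal to it.
In code (with a little timing harness), it looks like this: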
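tic
for ixTiming = 1:1000
    M = abs(randn(50));
    M_size = size(M, 2);   % not used below
    n = 8;

    total = sum(M(:));
    randIndexes = sort(rand(n,1) * total);   % sorted thresholds in [0, total)

    list = zeros(n,1);     % preallocated output: sampled linear indices into M
    ixM = 1;               % next linear index of M to add to curSum
    ixNextList = 1;
    curSum = 0;
    while ixNextList <= n
        % Advance until the running sum reaches the current threshold.
        while curSum < randIndexes(ixNextList) && ixM <= numel(M)
            curSum = curSum + M(ixM);
            ixM = ixM + 1;
        end
        % ixM is one past the last element added, so the crossing is at ixM-1.
        % max(...,1) guards the edge case of a threshold of exactly 0.
        list(ixNextList) = max(ixM - 1, 1);
        ixNextList = ixNextList + 1;
    end

    [i_list, j_list] = ind2sub(size(M), list);
end
toc; % 0.216 sec. on my computer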
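To see the cumulative-sum walk on something small enough to trace by hand, here is a minimal sketch. The fixed `thresholds` vector is a made-up stand-in for the sorted random numbers, and the `ref` loop built on `cumsum` and `find` is only an assumed reference to check against, not part of the method. In this sketch `ixM` holds the last index added to the running sum, so it can be stored directly.

```matlab
M = [1 3; 2 4];                      % linear order 1, 2, 3, 4 -> cumsum 1, 3, 6, 10
thresholds = [0.5; 2.5; 2.9; 9.9];   % already sorted; stands in for sort(rand(n,1)*total)
n = numel(thresholds);

list = zeros(n, 1);                  % preallocated output
ixM = 0;                             % linear index of the last element added to curSum
curSum = 0;
for k = 1:n
    % Keep adding weights until the running sum reaches the k-th threshold.
    while curSum < thresholds(k) && ixM < numel(M)
        ixM = ixM + 1;
        curSum = curSum + M(ixM);
    end
    list(k) = max(ixM, 1);           % max guards a threshold of exactly 0
end

% Reference answer using the full cumulative sum (the vectorized idea).
c = cumsum(M(:));
ref = zeros(n, 1);
for k = 1:n
    ref(k) = find(c >= thresholds(k), 1, 'first');
end

[i_list, j_list] = ind2sub(size(M), list);
disp([list, ref, i_list, j_list]);   % list and ref are both [1; 2; 2; 4]
```

Because the thresholds are sorted, the walk never has to restart: each threshold picks up where the previous one stopped, so the whole pass touches each element of M at most once.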
For comparison, here is the code from the original question, timed the same way:
tic
for ixTiming = 1:1000
    M = abs(randn(50));
    M_size = size(M, 2);
    n = 8;

    for m = 1:M_size
        xMean(m) = mean(M(:, m));
    end
    [~, j_list] = histc(rand(n, 1), cumsum([0; xMean'./sum(xMean)']));
    for c = 1:n
        [~, i_list(c)] = ...
            histc(rand(1, 1), cumsum([0; M(:, j_list(c))./sum(M(:, j_list(c)))]));
    end
end
toc; % 1.10 sec on my computer
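The speedup is only useful if the de-vectorized pass samples from the same distribution. The question's code picks a column in proportion to its mean and then a row in proportion to the weight within that column, so each element should come up with probability `M(k) / sum(M(:))`. A rough empirical check of the single-pass version, assuming a small made-up weight matrix and a large `n` so that frequencies settle, might look like this (the 20000 draws and the names `observed`, `expected`, and `maxDeviation` are illustrative choices):

```matlab
M = [1 3; 2 4];                  % small made-up weight matrix
n = 20000;                       % many draws in a single pass
total = sum(M(:));
thresholds = sort(rand(n, 1) * total);

list = zeros(n, 1);              % preallocated output
ixM = 0;                         % linear index of the last element added to curSum
curSum = 0;
for k = 1:n
    while curSum < thresholds(k) && ixM < numel(M)
        ixM = ixM + 1;
        curSum = curSum + M(ixM);
    end
    list(k) = max(ixM, 1);
end

observed = accumarray(list, 1, [numel(M), 1]) / n;   % empirical frequency per linear index
expected = M(:) / total;                             % target probability per linear index
maxDeviation = max(abs(observed - expected));
disp([expected, observed]);      % the two columns should be close
```

Note that `list` comes out in nondecreasing order because the thresholds were sorted.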
Caveats and optimizations.