matlab - Repeating elements of a vector

Question

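I have a value vector A containing elements i, for example:

A = [0.1 0.2 0.3 0.4 0.5];

and say

r = [5 2 3 2 1];

Now I would like to create a new vector Anew containing r(i) repetitions of each value A(i), such that the first r(1)=5 items in Anew have the value A(1) and the length of the new vector is sum(r). Thus: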

Anew = [0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.3 0.3 0.3 0.4 0.4 0.5]

I am sure this can be done with an elaborate for loop combined with e.g. repmat, but does anyone know how to do this in a smoother way?

score 4 · Accepted Answer

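As far as I know, there is no equivalent function in MATLAB that can do this, although R has rep, which does it for you... so jealous.

Either way, the only approach I can suggest is to run a for loop with repmat, as you proposed. However, you could use arrayfun instead if you want to do this as a one-liner... well, technically a two-liner, given the post-processing needed to get everything into a single vector. So you can try this: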

Anew = arrayfun(@(x) repmat(A(x), r(x), 1), 1:numel(A), 'uni', 0);
Anew = vertcat(Anew{:});

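This essentially performs the replication and concatenation that a for loop would do, with less code. We go through each pair of values in A and r and spit out the replicated vectors. Each of them ends up in a cell of a cell array, which is why vertcat is needed to put them all into a single vector.

This gives a 13-by-1 column vector containing the same values as the Anew in the question.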
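The two-liner above returns a column vector, whereas the Anew in the question is a row. As a minimal sketch of the same arrayfun idea producing a row instead (the variable name pieces is only illustrative), each piece can be replicated horizontally and the cells concatenated side by side:

```matlab
A = [0.1 0.2 0.3 0.4 0.5];
r = [5 2 3 2 1];

% One cell per element of A; cell k holds A(k) repeated r(k) times as a row.
pieces = arrayfun(@(k) repmat(A(k), 1, r(k)), 1:numel(A), 'UniformOutput', false);

% Concatenating the cells horizontally gives a 1-by-sum(r) row vector.
Anew = [pieces{:}];
```

Here 'UniformOutput', false is the full spelling of the 'uni', 0 shorthand used above.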

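Note that other people have attempted something similar to what you are doing in this post: A similar function to R's rep in Matlab. It essentially mimics the way R's rep works, which is what you want to do!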
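For readers on newer releases: the tests in this answer were run on R2013a, and MATLAB R2015a later added the built-in repelem, which covers this case directly. A minimal sketch, assuming R2015a or later:

```matlab
A = [0.1 0.2 0.3 0.4 0.5];
r = [5 2 3 2 1];

% repelem repeats A(k) exactly r(k) times; requires MATLAB R2015a or later.
Anew = repelem(A, r);
```

The result keeps the orientation of A, so here it is a row vector of length sum(r).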

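Alternative - Using a `for` loop

Because of @Divakar's benchmarking, I was curious to see how pre-allocating the array and then using an actual for loop to iterate over A and r and fill it in by indexing would perform. The equivalent of the code above using a for loop and indexing would be: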

Anew = zeros(sum(r), 1);
counter = 1;
for idx = 1 : numel(r)
    Anew(counter : counter + r(idx) - 1) = A(idx);
    counter = counter + r(idx);
end

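We need a variable that keeps track of where to insert elements into the array; this is stored in counter. After each block we advance it by the number of copies just written, which is the corresponding value of r.

So this approach avoids repmat entirely and uses only indexing to generate the replicated vector.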
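The positions that counter steps through can also be computed up front. As a sketch (the names starts and stops are illustrative, not from the original answer), the last slot of each block is cumsum(r), and the first slot follows by subtracting the block length:

```matlab
A = [0.1 0.2 0.3 0.4 0.5];
r = [5 2 3 2 1];

stops  = cumsum(r);        % last slot of each block:  5 7 10 12 13
starts = stops - r + 1;    % first slot of each block: 1 6 8 11 13

Anew = zeros(1, sum(r));   % pre-allocate the result as a row vector
for idx = 1:numel(r)
    % Fill the block belonging to A(idx); an empty range (r(idx) == 0) is a no-op.
    Anew(starts(idx):stops(idx)) = A(idx);
end
```

The starts vector holds exactly the values counter takes at the top of each iteration in the loop above.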

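Benchmarking (à la Divakar)

Building on Divakar's benchmarking code, I ran all of the tests on my machine, adding the for loop approach. I simply used his benchmarking code with the same test cases.

These are the timing results I get for each algorithm:

Case #1 - `N = 4000`, `max_repeat = 4000`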

-------------------  With arrayfun
Elapsed time is 1.202805 seconds.
-------------------  With cumsum
Elapsed time is 1.691591 seconds.
-------------------  With bsxfun
Elapsed time is 0.835201 seconds.
-------------------  With for loop
Elapsed time is 0.136628 seconds.

Case #2 - `N = 10000`, `max_repeat = 1000`

-------------------  With arrayfun
Elapsed time is 2.117631 seconds.
-------------------  With cumsum
Elapsed time is 1.080247 seconds.
-------------------  With bsxfun
Elapsed time is 0.540892 seconds.
-------------------  With for loop
Elapsed time is 0.127728 seconds.

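In Case #2, cumsum actually beats arrayfun, which is what I originally expected, although arrayfun is still ahead in Case #1. bsxfun beats everything else except the for loop. My guess for why the arrayfun timings differ between Divakar and me is that we are running our code on different architectures. I am currently running my tests with MATLAB R2013a on a MacBook Pro running Mac OS X 10.9.5.

As we can see, the for loop is much faster. I know for a fact that when it comes to indexing operations in a for loop, the JIT kicks in and gives you better performance.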
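The timing block used for the for loop is not shown in this answer. A sketch of what such a block could look like, assuming the same parameter names (N, max_repeat, num_runs) and tic/toc pattern as Divakar's benchmarking script, with the Case #1 settings:

```matlab
% Parameters and input data (Case #1 settings)
N = 4000;
max_repeat = 4000;
A = rand(1, N);
r = randi(max_repeat, 1, N);
num_runs = 10;   % number of times the solution is repeated

disp('-------------------  With for loop')
tic
for k1 = 1:num_runs
    Anew = zeros(sum(r), 1);   % pre-allocate the full output
    counter = 1;               % next free position in Anew
    for idx = 1:numel(r)
        Anew(counter : counter + r(idx) - 1) = A(idx);
        counter = counter + r(idx);
    end
end
toc
```

Timings from this sketch will differ across machines and MATLAB releases, so they should not be expected to match the numbers quoted above.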

score 3 · Accepted Answer

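My first thought was to form an index vector [1 1 1 1 1 2 2 3 3 3 4 4 5]. The regular increments there made me think of cumsum: we can get the index vector by putting ones at the right positions in a vector of zeros, [1 0 0 0 0 1 0 1 0 0 1 0 1], and taking its cumulative sum. Those positions, in turn, come from running another cumsum on the input list r. After adjusting for end conditions and 1-based indexing, we get: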

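B = zeros(1, sum(r) + 1);   % start from zeros so a stale B in the workspace cannot interfere
B(cumsum(r) + 1) = 1;       % mark the slot just after the end of each run
idx = cumsum(B) + 1;        % the running sum turns the marks into 1-based indices into A
idx(end) = [];              % drop the trailing index, which points one past the end of A
A(idx)                      % index A to get the repeated elements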
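One caveat with this approach: if r contains a zero, two marks land on the same slot and the plain assignment records only one of them, so the skipped element is not actually skipped. A possible zero-safe variant (a sketch, not part of the original answer; the name steps is illustrative) counts coinciding marks with accumarray instead:

```matlab
A = [0.1 0.2 0.3 0.4 0.5];
r = [5 0 3 2 1];            % A(2) should not appear in the result

% accumarray adds up coinciding marks, so a zero in r produces a step of 2.
steps = accumarray(cumsum(r(:)) + 1, 1).';
idx = cumsum(steps) + 1;    % 1-based indices into A
idx(end) = [];              % drop the trailing index past the end of A
Anew = A(idx);
```

For inputs where every entry of r is at least 1, this gives the same result as the code above.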

score 3 · Accepted Answer

A bsxfun-based approach -

A = [0.1 0.2 0.3 0.4 0.5]
r = [5 2 3 2 1]

repeats = bsxfun(@le,[1:max(r)]',r) %//' logical 2D array with ones in each column 
                                    %// same as the repeats for each entry
A1 = A(ones(1,max(r)),:) %// 2D matrix of all entries repeated maximum r times
                         %// and this resembles your repmat 
out = A1(repeats) %// desired output with repeated entries

It can basically be turned into a two-liner -

A1 = A(ones(1,max(r)),:);
out = A1(bsxfun(@le,[1:max(r)]',r));

Output: `out` is a 13-by-1 column vector containing the same values as the Anew in the question.
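On MATLAB R2016b and later, implicit expansion makes the bsxfun call optional, because comparing a column vector with a row vector expands both automatically. A sketch of the same masking idea under that assumption (the name mask is illustrative):

```matlab
A = [0.1 0.2 0.3 0.4 0.5];
r = [5 2 3 2 1];

% Column (1:max(r)).' compared with row r expands to a max(r)-by-numel(r) mask.
% Requires implicit expansion (MATLAB R2016b or later).
mask = (1:max(r)).' <= r;

A1 = repmat(A, max(r), 1);   % every entry of A repeated max(r) times, one per column
out = A1(mask);              % column-major extraction keeps the order of A
```

Like the bsxfun version, this builds a max(r)-by-numel(r) temporary, so memory use grows with the largest repeat count.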

Benchmarking

At this point, some benchmark results can be produced for the solutions presented here so far.

Benchmarking code - Case I

%// Parameters and input data
N = 4000;
max_repeat = 4000;
A = rand(1,N);
r = randi(max_repeat,1,N);
num_runs = 10; %// no. of times each solution is repeated for better benchmarking

disp('-------------------  With arrayfun')
tic
for k1 = 1:num_runs
    Anew = arrayfun(@(x) repmat(A(x), r(x), 1), 1:numel(A), 'uni', 0);
    Anew = vertcat(Anew{:});
end
toc, clear Anew

disp('-------------------  With cumsum')
tic
for k1 = 1:num_runs
    B(cumsum(r) + 1) = 1;
    idx = cumsum(B) + 1;
    idx(end) = [];
    out1 = A(idx);
end
toc,clear B idx out1

disp('-------------------  With bsxfun')
tic
for k1 = 1:num_runs
    A1 = A(ones(1,max(r)),:);
    out2 = A1(bsxfun(@le,[1:max(r)]',r));
end
toc

Results

-------------------  With arrayfun
Elapsed time is 2.198521 seconds.
-------------------  With cumsum
Elapsed time is 5.360725 seconds.
-------------------  With bsxfun
Elapsed time is 2.896414 seconds.

Benchmarking code - Case II [larger data size, but smaller maximum value of r]

%// Parameters and input data
N = 10000;
max_repeat = 1000;

Results

-------------------  With arrayfun
Elapsed time is 2.641980 seconds.
-------------------  With cumsum
Elapsed time is 3.426921 seconds.
-------------------  With bsxfun
Elapsed time is 1.858007 seconds.

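Conclusions from the benchmarks

For Case I, arrayfun seems to be the way to go, while for Case II, bsxfun might be the weapon of choice. So the kind of data you are dealing with really seems to decide which approach to take.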
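Before reading too much into the timings, it may be worth confirming that the three benchmarked approaches return identical results. A small sanity-check sketch, using a reduced problem size chosen only for illustration:

```matlab
A = rand(1, 50);
r = randi(20, 1, 50);   % all repeat counts are at least 1

% arrayfun + repmat
c = arrayfun(@(x) repmat(A(x), r(x), 1), 1:numel(A), 'UniformOutput', false);
out_arrayfun = vertcat(c{:});

% cumsum (transposed to a column for comparison)
B = zeros(1, sum(r) + 1);
B(cumsum(r) + 1) = 1;
idx = cumsum(B) + 1;
idx(end) = [];
out_cumsum = A(idx).';

% bsxfun
A1 = A(ones(1, max(r)), :);
out_bsxfun = A1(bsxfun(@le, (1:max(r)).', r));

% All three only copy values out of A, so exact equality is expected.
assert(isequal(out_arrayfun, out_cumsum, out_bsxfun));
```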
