4

根据这个问题,我应该尝试使用 Preallocation is Matlab。

现在我有一种情况,我无法计算要预分配的矩阵的确切大小。我能猜出大小。

假设矩阵的实际大小是 100,但我不知道。嘘

哪种方案更有效:

  1. 我应该奢侈吗?我猜是一个大矩阵,最后我删除了多余的行。
  2. 我应该小气吗?我猜尺寸很小,如果错了,我会添加新行。

谢谢。

4

3 回答 3

3

在我看来,答案比@natan 描述的要复杂一些。我认为他的回答没有考虑两个因素:

  1. 可能需要的内存副本:当您低估矩阵大小并重新分配它时,应将其所有旧值复制到新分配的位置。

  2. 内存块的连续性:有时 Matlab 能够在旧矩阵的末尾连续分配新内存。原则上,在这种情况下,不需要将旧值复制到新位置 - 因为它与旧值相同,只是更大。但是,如果将添加到 2D 矩阵,即使在这种情况下也需要复制内容,因为 Matlab 在内存中以行优先方式存储矩阵。

所以,我的回答是这样的:

首先,您究竟不知道矩阵的大小:如果您知道一维 - 将其设为矩阵的行数,因此您只需要更改列数。这样,如果您已经存储的数据需要被复制,它将被复制到更大的块中。

其次,这取决于您拥有多少可用 RAM。如果您不缺 RAM,那么高估也没有错。

但是,如果您缺少 RAM,请考虑低估。但是当你重新分配时,在每次迭代时增加新的块大小:

BASIC_SIZE = X;  % first estimate
NEW_SIZE = Y;    % if need more, add this amount
factor = 2;      
arr = zeros( m, BASIC_SIZE ); % first allocation, assuming we know number of rows
while someCondition
    % process arr ...
    if needMoreCols
        arr(:, size(arr,2) + (1:NEW_SIZE) ) = 0; % allocate another block
        NEW_SIZE = round(NEW_SIZE * factor);  % it seems like we are off in estimation, try larger chunk next time factor should be > 1
    end
end
arr = arr(:, 1:actualNumOfCols ); % resize to actual size, discard unnecessary columns
于 2012-12-27T07:31:02.733 回答
2

+1 for the interesting question.

EDITED Answer: From a little experimental study at first it seems better to add rows later, but it now seems more efficient overesrimated and preallocate again when you have the info of the correct size . I started with matrix size 3000 and guessed an error of 10% in the size estimation, see below:

    clear all
    clc
    guess_size=3000;
    m=zeros(guess_size);
    %1. oops overesrimated, take out rows
    tic
    m(end-300:end,:)=[];
    toc

    %1b. oops overesrimated,  preallocate again
    tic
    m=zeros(guess_size-300,guess_size);
    toc

    %2. oops overesrimated, take out cols
    m=zeros(guess_size);
    tic
    m(:,end-300:end)=[];
    toc

    %2b. oops overesrimated,  preallocate again
    m=zeros(guess_size);
    tic
    m=zeros(guess_size,guess_size-300);
    toc

    %3. oops underesrimated, add rows
    m=zeros(guess_size);
    tic
    m=zeros(guess_size+300,guess_size);
    toc

    %4. oops underesrimated, add cols
    m=zeros(guess_size);
    tic
    m=zeros(guess_size,guess_size+300);
    toc

Elapsed time is 0.041893 seconds.
Elapsed time is 0.026925 seconds.
Elapsed time is 0.041818 seconds.
Elapsed time is 0.023425 seconds.
Elapsed time is 0.027523 seconds.
Elapsed time is 0.029509 seconds.

Option 2b and 1b are slightly faster than underestimating, so if you can, better overestimate and then preallocate again. It is never efficient to delete rows from an array. Also adding columns seems slightly more efficient, but this is just a quick and dirty job. See @Shai detailed answer for the inner workings...

于 2012-12-27T06:22:59.777 回答
2

除了其他教育性答案外,简短的简短版本:有三种情况:

  1. 数组的大小相对较小(最多千字节)-> 这并不重要。
  2. 数组很大,但您不受系统内存量的限制 -> 高估。
  3. 数组很大,并且您受系统内存量的限制 -> 执行 Shai 的建议。
于 2012-12-27T08:48:36.943 回答