matlab - Quickly counting the number of unique values in each column of a submatrix

Question

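I have a matrix X with tens of rows and thousands of columns. All elements are categorical and have been reorganized into an index matrix. For example, the i-th column X(:,i) = [-1,-1,0,2,1,2]' is converted to X2(:,i) = ic, where ic comes from [x,ia,ic] = unique(X(:,i)), so that the function accumarray can be used conveniently. I randomly select a submatrix from the matrix and count the number of each unique value in every column of the submatrix. I perform this procedure 10,000 times. I know several ways to count the unique values in a column; the fastest one I have found so far is shown below: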

mx = max(X);
for iter = 1:numperm
    for j = 1:ny
        ky = yrand(:,iter)==uy(j);
        % select the rows of X whose (permuted) label in y equals uy(j)
        Xk = X(ky,:);
        % row offsets in pxs where the counts for this value of y are stored
        mxj = mx*(j-1);
        mxi = mxj+1;
        mxk = max(Xk)+mxj;
        % loop over columns, counting the occurrences of each value in that column
        for i = 1:c
            pxs(mxi(i):mxk(i),i) = accumarray(Xk(:,i),1);
        end
    end
end

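This is a way to run a random permutation test that computes the information gain between a data matrix X of size n by c and a categorical variable y, where y is randomly permuted. In the code above, all random permutations of y are stored in the matrix yrand, and the number of permutations is numperm. The unique values of y are stored in uy, and their number is ny. In each iteration of 1:numperm, the submatrix Xk is selected according to one unique value of y, and the number of each unique element in every column of that submatrix is counted and stored in the matrix pxs.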
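To make the index conversion and the counting step concrete, here is a minimal sketch for the single example column given in the question. The variable names xcol, u and counts are only illustrative and do not appear in the original code.

```matlab
% One column of the original data (the example column from the question).
xcol = [-1,-1,0,2,1,2]';

% unique returns the sorted distinct values u and, in ic, the position of
% each element of xcol within u, so ic is a column of positive integers.
[u,~,ic] = unique(xcol);

% accumarray adds 1 at every subscript in ic, i.e. it counts how often
% each distinct value occurs.
counts = accumarray(ic,1);

disp([u counts])   % each row: a distinct value and its count
```

Here u is [-1;0;1;2], ic is [1;1;2;4;3;4] and counts is [2;1;1;2]. Because ic contains only positive integers, it can be passed to accumarray directly, which is why the whole matrix is converted column by column first.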

The most time-consuming part of the code above is the loop over i = 1:c when c is large.

Is it possible to apply accumarray in a matrix manner so that the for loop can be avoided? How else could I improve the code above?

--------

As requested, here is a simplified test function that includes the code above:

%% test
function test(x,y)

[r,c] = size(x);
x2 = x;
numperm = 1000;

% convert the original matrix to index matrix for suitable and fast use of accumarray function
for i = 1:c
    [~,~,ic] = unique(x(:,i));
    x2(:,i) = ic;
end

% get 'numperm' rand permutations of y
yrand(r, numperm) = 0;
for i = 1:numperm
    yrand(:,i) = y(randperm(r));
end

% get statistic of y
uy = unique(y);
nuy = numel(uy);

% main iterations
mx = max(x2);
pxs(max(mx)*nuy,c) = 0;   % preallocate room for nuy stacked blocks of counts
for iter = 1:numperm
    for j = 1:nuy
        ky = yrand(:,iter)==uy(j);
        xk = x2(ky,:);
        mxj = mx*(j-1);
        mxk = max(xk)+mxj;
        mxi = mxj+1;
        for i = 1:c
            pxs(mxi(i):mxk(i),i) = accumarray(xk(:,i),1);
        end
    end
end

and some test data:

x = round(randn(60,3000));
y = [ones(30,1);ones(30,1)*-1];

Testing the function with

tic; test(x,y); toc

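returns Elapsed time is 15.391628 seconds. on my computer. The test function uses 1000 permutations. So if I run 10,000 permutations and do some additional computation (negligible compared with the code above), the run time is expected to exceed 150 s. I wonder whether the code can be improved. Intuitively, applying accumarray in a matrix manner could save a lot of time. Can I do that?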

score 0 · Accepted Answer

The approach suggested by @rahnema1 improved the computation significantly, so I am posting my answer here, as @Dev-iL also requested.

%% test
function test(x,y)

[r,c] = size(x);
x2 = x;
numperm = 1000;

% convert the original matrix to index matrix for suitable and fast use of accumarray function
for i = 1:c
    [~,~,ic] = unique(x(:,i));
    x2(:,i) = ic;
end

% get 'numperm' rand permutations of y
yrand(r, numperm) = 0;
for i = 1:numperm
    yrand(:,i) = y(randperm(r));
end

% get statistic of y
uy = unique(y);
nuy = numel(uy);

% main iterations
mx = max(max(x2));
% preallocation
pxs(mx*nuy,c) = 0;
% set the edges of the bins for function histc (one bin per index value)
binrg = (1:mx)';
% row boundaries of the blocks of pxs into which the results will be stored
mxr = mx*(0:nuy);
for iter = 1:numperm
    yt = yrand(:,iter);
    for j = 1:nuy
        % count every column of the selected rows at once; block j occupies
        % rows mxr(j)+1 to mxr(j+1) of pxs
        pxs(mxr(j)+1:mxr(j+1),:) = histc(x2(yt==uy(j),:),binrg,1);
    end
end
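The key step in the loop above is that histc counts all columns in a single call. The following small sketch shows that behaviour on made-up data. The 4-by-3 matrix and the row mask ky (standing in for yt==uy(j)) are assumptions for illustration only.

```matlab
% Small index matrix: 4 rows, 3 columns, values in 1..3 (illustrative data).
x2 = [1 2 3;
      1 2 1;
      2 3 1;
      3 3 1];
mx = max(x2(:));
binrg = (1:mx)';              % one bin edge per index value

% Row mask standing in for yt==uy(j); here rows 1, 2 and 4 are selected.
ky = logical([1 1 0 1]');

% histc works down each column (dimension 1), so the result is mx-by-3:
% row k holds how many selected entries of each column equal k.
cnt = histc(x2(ky,:), binrg, 1);
disp(cnt)
```

For this input cnt is [2 0 2; 0 2 0; 1 1 1]. Passing the dimension explicitly keeps the counting column-wise even if the mask happens to select a single row. MathWorks lists histc as not recommended in newer releases in favour of histcounts, which however operates on the whole array rather than on each column separately.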

Test results:

>> x = round(randn(60,3000));
>> y = [ones(30,1);ones(30,1)*-1];
>> tic; test(x,y); toc
Elapsed time is 15.632962 seconds.
>> tic; test(x,y); toc % using the way suggested by rahnema1, i.e., revised function posted above
Elapsed time is 2.900463 seconds.
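The question asked whether accumarray itself can be applied in a matrix manner. One possible way, sketched below, is to pass two-column subscripts (value, column number) so that all columns are counted in one call. This is not the approach used in the answer above and it was not timed here. The data, the mask ky and the helper variable colidx are hypothetical.

```matlab
% Illustrative index matrix (values 1..mx) and row mask, both hypothetical.
x2 = [1 2 3;
      1 2 1;
      2 3 1;
      3 3 1];
ky = logical([1 1 0 1]');
mx = max(x2(:));

xk = x2(ky,:);                       % submatrix for one group
[nr,nc] = size(xk);
colidx = repmat(1:nc, nr, 1);        % column number of every element of xk

% Each pair (value, column) is a subscript into an mx-by-nc count table.
cnt = accumarray([xk(:) colidx(:)], 1, [mx nc]);
disp(cnt)
```

This produces the same mx-by-nc table as the column-wise histc call, here [2 0 2; 0 2 0; 1 1 1]. Whether it is faster would have to be measured on the real data.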
