
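I have a 3D GPU array A with dimensions K x M x N and an int vector v of length M, and I want to construct a 2D GPU array of the form

X = [A(:,1,v(1)), A(:,2,v(2)),..., A(:,M,v(M))] (depending on v)

in the most time-efficient way. Since all of these are GPU arrays, I'm wondering whether there is a faster way to do this than preallocating X and using the obvious for loop. My code needs to call millions of such instances, so this has become quite a bottleneck. If it matters, typical orders of magnitude are K = 350 000, 2 <= M <= 15, N <= 2000.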
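For reference, the preallocate-and-loop baseline mentioned above could look like the following sketch. It uses small, hypothetical sizes on CPU arrays; with a gpuArray A, the same statements should apply, and the 'like' option keeps X on the GPU with A's class.

```matlab
% Baseline: preallocate X and copy one column per m (the "obvious" loop).
% Sizes are hypothetical; on the GPU, A would be a gpuArray.
K = 5; M = 3; N = 4;
A = rand(K, M, N);
v = [2, 4, 1];                 % one page index per column, 1 <= v(m) <= N

X = zeros(K, M, 'like', A);    % preallocate with the same class as A
for m = 1:M
    X(:, m) = A(:, m, v(m));   % column m is taken from page v(m)
end
```

Each iteration launches a separate small copy, which is why this loop gets expensive when it runs millions of times.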

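Edit: here is a minimal working version of the original bottleneck code I'm trying to improve. The conversion to the 3D array A is commented out. Adjust the array size parameters as needed.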

% generate test data:
K = 4000; M = 2; % N = 100

A_cell = cell(1,M);
s = zeros(1,M,'uint16');
for m=1:M
    s(m) = m*50; % defines some widths for the matrices in the cells
    A_cell{m} = cast(randi([0 1],K,s(m),'gpuArray'),'logical');
end
N = max(s,[],2);

% % A_cell can be replaced by a 3D array A of dimensions K x M x N:
% A = true(K,M,N,'gpuArray');
% for m=1:M
%     A(:,m,1:s(m)) = permute(A_cell{m},[1 3 2]);
% end

% bottleneck code starts here and has M = 2 nested loops:
I_true = true(K,1,'gpuArray');
I_01 = false(K,1,'gpuArray');
I_02 = false(K,1,'gpuArray');

for j_1=1:s(1)
    for j_2=1:s(2)

        v = [j_1,j_2];

        I_tmp = I_true;

        for m=1:M
            I_tmp = I_tmp & A_cell{m}(:,v(m));
        end

        I_02 = I_02 | I_tmp;
    end

    I_01 = I_01 | I_02;
end

Out = gather(I_01);

% A_cell can be replaced by 3D array A above
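For the minimal example exactly as posted, where every pair (j_1, j_2) is visited and the per-pair results are only OR-ed together, the nested loops can be collapsed. By distributivity, the OR over all combinations of the AND over m equals the AND over m of "some column is true" in each cell. The sketch below checks this on small CPU arrays with hypothetical sizes. It assumes the real code uses the combinations in the same way; if it skips combinations or needs the individual I_tmp values, this shortcut does not apply.

```matlab
% Assumption: as in the posted minimal example, all s(1)*s(2) pairs are
% visited and only the OR of the per-pair ANDs is kept.
K = 200; M = 2;
s = [5, 7];                               % hypothetical small widths
A_cell = cell(1, M);
for m = 1:M
    A_cell{m} = rand(K, s(m)) > 0.8;      % random logical matrices
end

% Reference result: the nested loops from the question, on the CPU.
I_loop = false(K, 1);
for j_1 = 1:s(1)
    for j_2 = 1:s(2)
        I_loop = I_loop | (A_cell{1}(:, j_1) & A_cell{2}(:, j_2));
    end
end

% Loop-free over j: a row passes iff every cell has at least one true column.
I_fast = true(K, 1);
for m = 1:M
    I_fast = I_fast & any(A_cell{m}, 2);
end
```

Only a loop over m remains, so the cost grows with M rather than with the product of the widths s(m).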

2 Answers


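MATLAB allows you to index several dimensions at once. This lets you supply a vector h of linear indices that addresses the second and third dimensions simultaneously: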

% Some example data
k=2;
m=3;
n=4;
v=[2,3,1];
A=rand(k,m,n);
X=[A(:,1,v(1)),A(:,2,v(2)),A(:,3,v(3))]
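% solution: h(i) is the linear index of (column i, page v(i)) over dims 2 and 3
h=sub2ind([m,n],1:m,v);
Y=A(:,h) % same result as X, without a loop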

Further reading: linear indexing, logical indexing, and more.
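To see what sub2ind computes here: indexing A(:,h) with two subscripts treats dimensions 2 and 3 as a single dimension of length M*N, so column m of page p sits at position m + (p-1)*M. The sketch below, with hypothetical sizes and CPU arrays, checks that the hand-written formula matches sub2ind and that the result equals the loop version. Indexing a gpuArray with the same h should behave identically.

```matlab
% Dims 2 and 3 of a K x M x N array are addressed as one dim of length M*N.
K = 3; M = 4; N = 5;
A = reshape(1:K*M*N, K, M, N);
v = [5, 1, 3, 2];                        % page index for each column m

h_sub2ind = sub2ind([M, N], 1:M, v);     % [17 2 11 8]
h_manual  = (1:M) + (v - 1) * M;         % same index, written out by hand

Y = A(:, h_sub2ind);                     % K x M result, no loop

% Loop reference for comparison
X = zeros(K, M);
for m = 1:M
    X(:, m) = A(:, m, v(m));
end
```

Because h depends only on v and M, it can be computed on the CPU and used directly to index the GPU array.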

answered 2019-11-17T17:23:01.743

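Regarding the code I posted above, it turns out that using a 2D gpuArray instead of a 3D gpuArray to replace the cells is faster. This allows very straightforward column selection and vectorization of the innermost loop. More precisely: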

% generate test data:
K = 4000; M = 2;

A_cell = cell(1,M); % this is given externally
s = zeros(1,M,'uint16');
for m=1:M
    s(m) = m*50; % defines some widths for the matrices in the cells
    A_cell{m} = cast(randi([0 1],K,s(m)),'logical'); % cell2mat doesn't work with cells of gpuArrays
end

% conversion of A_cell into an appropriate 2D array is straightforward:
A_hor = gpuArray(cell2mat(A_cell)); % horizontal concatenation of the cells

% bottleneck code starts here and has M = 2 nested loops:
I_01 = false(K,1,'gpuArray');
I_02 = false(K,1,'gpuArray');

t = cumsum([1, double(s(1:M-1))]); % starting column of each old matrix inside A_hor: 1 + total width of the preceding blocks

for j_1=1:s(1)
    for j_2=1:s(2)

        j = [j_1,j_2];

        k = t-1+j; % vector of the positions of the needed columns

        I_02 = I_02 | all(A_hor(:,k),2);
    end

    I_01 = I_01 | I_02;
end

Out = gather(I_01);
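The column positions k = t-1+j only pick the intended columns if t(m) is the first column of block m inside A_hor, i.e. 1 plus the total width of the preceding blocks, which cumsum produces directly. The sketch below checks this against A_cell with hypothetical block widths on CPU arrays.

```matlab
% Block m of A_hor starts at column t(m) = 1 + s(1) + ... + s(m-1),
% so column j(m) of A_cell{m} is column t(m) - 1 + j(m) of A_hor.
K = 10; M = 3;
s = [2, 4, 3];                              % hypothetical block widths
A_cell = cell(1, M);
for m = 1:M
    A_cell{m} = rand(K, s(m)) > 0.5;
end
A_hor = cell2mat(A_cell);                   % K x sum(s), blocks side by side

t = cumsum([1, s(1:M-1)]);                  % starting columns: [1 3 7]

j = [2, 3, 1];                              % one column choice per block
k = t - 1 + j;                              % positions in A_hor: [2 5 7]

% Compare against selecting the columns from the cells directly
ref = false(K, M);
for m = 1:M
    ref(:, m) = A_cell{m}(:, j(m));
end
```

With these offsets, all(A_hor(:,k),2) computes the same AND over m as the original innermost loop over the cells.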
answered 2019-11-18T02:17:40.223