matlab - Group values into rows

Question

I have an information vector, for example:

Info = [10, 20, 10, 30, 500, 400, 67, 350, 20, 105, 15];

and a second vector of IDs, for example:

Info_IDs = [1, 2, 1, 4, 2, 3, 4, 1, 3, 1, 2];

I want to obtain a matrix defined as follows:

Result =
    10    10   350   105
    20   500    15     0
   400    20     0     0
    30    67     0     0

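where each row holds the values of Info that belong to one ID (row k for ID k), and shorter rows are padded with zeros. As this short example shows, the number of values differs from ID to ID.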

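I'm working with a large amount of data (Info is 1x1000000 and Info_IDs is 1x25000), so I would like to build this Result matrix without loops if possible. One approach I'm considering is to compute a histogram for each ID and store that instead (so Result would not contain the original values, only the binned information).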

Thanks in advance for your input.
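The per-ID histogram idea from the question could look roughly like the sketch below. It is only a sketch: the bin edges are a hypothetical choice, and the second output of histc is used to find the bin of each value. Values outside the edges get bin 0 and would have to be handled before calling accumarray.

```matlab
% Sketch of the "histogram per ID" idea: store binned counts instead of raw values.
Info     = [10, 20, 10, 30, 500, 400, 67, 350, 20, 105, 15];
Info_IDs = [1, 2, 1, 4, 2, 3, 4, 1, 3, 1, 2];

edges = [0, 50, 100, 500, 1000];   % hypothetical bin edges
[~, bin] = histc(Info, edges);     % bin(j) = index of the bin that contains Info(j)

% counts(k, b) = number of values of ID k that fall into bin b
counts = accumarray([Info_IDs(:), bin(:)], 1, [max(Info_IDs), numel(edges) - 1])
```

Each row of counts sums to the number of values for that ID, so the matrix has a fixed width (the number of bins) regardless of how unevenly the IDs are distributed.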

score 1 · Accepted Answer

Here is a vectorized solution that should be both memory-efficient and fast, even on large matrices:

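%// Pad data with zero values and add matching IDs
len = histc(Info_IDs, 1:max(Info_IDs));              %// number of values per ID
padlen = max(len) - len;                             %// zeros needed per ID
padval = zeros(1, sum(padlen));
starts = cumsum([1, padlen(1:end - 1)]);             %// first padding slot of each ID
has_pad = padlen > 0;
padval(starts(has_pad)) = diff([0, find(has_pad)]);  %// ID increment at each block start
Info = [Info, zeros(1, sum(padlen))];
Info_IDs = [Info_IDs, cumsum(padval)];

%// Group data into rows (sort first: accumarray only keeps the input
%// order within a group when the subscripts are sorted)
[Info_IDs, order] = sort(Info_IDs);
Result = accumarray(Info_IDs(:), Info(order).', [], @(x){x}).';
Result = [Result{:}].';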

The second step can also be done as follows:

%// Group data into rows (sort is stable, so each ID keeps its original order)
[~, sorted_idx] = sort(Info_IDs);
Result = reshape(Info(sorted_idx), max(len), []).';
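A possible variant of this sort-based step skips the padding altogether: after a stable sort, a logical mask marks the first len(k) slots of column k, and the sorted values are written straight into those slots. This is a sketch that assumes the IDs are the integers 1..n; mask and order are illustrative names.

```matlab
% Sort-based grouping without explicit padding (assumes IDs are 1..n).
Info     = [10, 20, 10, 30, 500, 400, 67, 350, 20, 105, 15];
Info_IDs = [1, 2, 1, 4, 2, 3, 4, 1, 3, 1, 2];

n   = max(Info_IDs);
len = accumarray(Info_IDs(:), 1, [n, 1]);   % number of values per ID (column vector)

% Stable sort: values of the same ID stay in their original order
[~, order] = sort(Info_IDs);

% mask(i, k) is true for the first len(k) slots of column k
mask = bsxfun(@le, (1:max(len)).', len.');

% Column-major filling puts ID 1's values in column 1, ID 2's in column 2, ...
Result = zeros(max(len), n);
Result(mask) = Info(order);
Result = Result.'   % one row per ID
```

Because the zeros are never appended to Info, this avoids building the padded vectors; the only large temporary is the max(len)-by-n mask.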

Example

%// Sample input data
Info = [10 20 10 30 500 400 67 350 20 105 15];
Info_IDs = [1 2 1 4 2 3 4 1 3 1 2];

%// Pad data with zero values and add matching IDs
len = histc(Info_IDs, 1:max(Info_IDs));
padlen = max(len) - len;
padval = zeros(1, sum(padlen));
starts = cumsum([1, padlen(1:end - 1)]);
has_pad = padlen > 0;
padval(starts(has_pad)) = diff([0, find(has_pad)]);
Info = [Info, zeros(1, sum(padlen))]
Info_IDs = [Info_IDs, cumsum(padval)]

%// Group data into rows
[Info_IDs, order] = sort(Info_IDs);
Result = accumarray(Info_IDs(:), Info(order).', [], @(x){x}).';
Result = [Result{:}].';

The result is:

Result =
    10    10   350   105
    20   500    15     0
   400    20     0     0
    30    67     0     0

score 0 · Accepted Answer

I don't know how to do it without a loop, but this is very fast:

Result = [];
n = max(Info_IDs); % number of classes (assumes the IDs are 1..n)
for c = 1:n
    row = Info(Info_IDs == c);
    Result(c, 1:numel(row)) = row;
end

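If speed really is a concern, you can preallocate with Result = zeros(n, sum(Info_IDs == mode(Info_IDs))); the second argument counts the occurrences of the most frequent ID, i.e. the length of the longest row.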
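With the preallocation applied, the loop could be written as in this sketch (assuming the IDs are 1..n; here the row length is taken from accumarray counts rather than mode, which gives the same maximum). Each iteration still scans all of Info, so the total work grows with the number of IDs times numel(Info); with tens of thousands of IDs, sorting once as in the vectorized answer avoids that.

```matlab
% Loop version with the output preallocated to its final size.
Info     = [10, 20, 10, 30, 500, 400, 67, 350, 20, 105, 15];
Info_IDs = [1, 2, 1, 4, 2, 3, 4, 1, 3, 1, 2];

n   = max(Info_IDs);                      % assumes IDs are 1..n
len = accumarray(Info_IDs(:), 1, [n, 1]); % number of values per ID
Result = zeros(n, max(len));              % final size, padded with zeros

for c = 1:n
    row = Info(Info_IDs == c);            % values of ID c, in original order
    Result(c, 1:numel(row)) = row;
end
Result
```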

score 0 · Accepted Answer

If you don't mind having zeros in between:

number_Ids = 4; % set as required, e.g. max(Info_IDs)
aux = bsxfun(@eq, Info_IDs, (1:number_Ids).'); % aux(k,j) is true if Info_IDs(j) == k
sol = bsxfun(@times, Info, aux)

For your example, this gives:

10     0    10     0     0     0     0   350     0   105     0
 0    20     0     0   500     0     0     0     0     0    15
 0     0     0     0     0   400     0     0    20     0     0
 0     0     0    30     0     0    67     0     0     0     0

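Or, if you do mind the zeros but not the order, you can sort this result row by row (this assumes the values are positive; negative values would end up after the zeros):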

sol2 = sort(sol,2,'descend')

This gives

350   105    10    10     0     0     0     0     0     0     0
500    20    15     0     0     0     0     0     0     0     0
400    20     0     0     0     0     0     0     0     0     0
 67    30     0     0     0     0     0     0     0     0     0

Edit: the order of the nonzero entries can be preserved with the same trick as here.
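The linked trick is not reproduced here. One possible way to keep the original order (not necessarily the same trick) is to use the running count along each row of aux as the column index. This is a sketch assuming the IDs are 1..n; pos and col are illustrative names. Note that aux has one row per ID and one column per element, so its memory grows with both.

```matlab
% Order-preserving grouping built on the aux matrix (assumes IDs are 1..n).
Info     = [10, 20, 10, 30, 500, 400, 67, 350, 20, 105, 15];
Info_IDs = [1, 2, 1, 4, 2, 3, 4, 1, 3, 1, 2];

number_Ids = max(Info_IDs);
aux = bsxfun(@eq, Info_IDs, (1:number_Ids).');   % aux(k,j) true if Info_IDs(j) == k

% Running count per row: pos(k,j) = occurrences of ID k among elements 1..j
pos = cumsum(double(aux), 2);

% For element j, its column in the result is its own row's running count
col = pos(sub2ind(size(aux), Info_IDs, 1:numel(Info_IDs)));

Result = zeros(number_Ids, max(col));
Result(sub2ind(size(Result), Info_IDs, col)) = Info
```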
