matlab - Remove duplicates in column 1 of an array by keeping only the entry that has the maximum value in column 2

Question

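I have an X-by-2 matrix that was formed by appending many matrices together. Column 1 of the matrix consists of numbers indicating item_ids, and column 2 consists of similarity values. Because the matrix was formed by concatenating many matrices, column 1 may contain duplicate values, which I do not want. I would like to remove all duplicates in column 1, so that for any value X that occurs more than once in column 1, every row with column 1 = X is removed except the row where column 1 = X and the column 2 value is the maximum among all rows for X.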

Example:

  1    0.85
  1    0.5
  1    0.95
  2    0.5

result required:
    1 0.95
    2 0.5

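This is obtained by deleting every row of the n-by-2 matrix whose column 1 value is duplicated and whose column 2 value is not the maximum for that column 1 value.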

score 2 · Accepted Answer

If there may be gaps in the indices, use sparse output:

>> result = accumarray( M(:,1), M(:,2), [], @max, 0, true)
>> uMat = [find(result) nonzeros(result)]
uMat =
    1.0000    0.9500
    2.0000    0.5000

This also simplifies creating the first column of the output.
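As a minimal sketch of the gap case, the same call can be run on ids that skip values. The matrix `M` below is made-up example data, not taken from the question. One caveat follows from how `find` and `nonzeros` work: an id whose maximum similarity is exactly 0 would be dropped from the output.

```matlab
% Hypothetical data: ids 1 and 5, nothing in between
M = [1 0.85; 1 0.5; 1 0.95; 5 0.5];

% Sparse accumulation: rows 2..4 of result stay empty instead of being stored
result = accumarray(M(:,1), M(:,2), [], @max, 0, true);

% find gives the ids that actually occur, nonzeros gives their maxima
uMat = [find(result) nonzeros(result)];
```

Here `uMat` should contain one row for id 1 and one row for id 5, with no rows for the unused ids 2 to 4.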

Use unique.

First way, use sort with the 'descend' ordering:

>> [~,IS] = sort(M(:,2),'descend');
>> [C,ia] = unique(M(IS,1),'first');  % explicit 'first': first occurrence = largest similarity
>> M(IS(ia),:)
ans =
    1.0000    0.9500
    2.0000    0.5000

Second, use sortrows (sorting ascending by the second column) and unique with the 'last' occurrence option:

>> [Ms,IS] = sortrows(M,2);
>> [~,ia] = unique(Ms(:,1),'last');
>> M(IS(ia),:)
ans =
    1.0000    0.9500
    2.0000    0.5000
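The two variants above can also be folded into a single sort. This is a sketch of a related variant, not part of the original answer: `sortrows` accepts a negative column index to request descending order for that column, so the row with the largest similarity ends up first within each id.

```matlab
M = [1 0.85; 1 0.5; 1 0.95; 2 0.5];

% Sort by id ascending, then by similarity descending within each id
Ms = sortrows(M, [1 -2]);

% First occurrence of each id is now the row with the largest similarity
[~, ia] = unique(Ms(:,1), 'first');
uMat = Ms(ia, :);
```

Because the rows are taken from `Ms` directly, no second index vector such as `IS` is needed.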

score 1 · Accepted Answer

You could try

result = accumarray( M(:,1), M(:,2), [max(M(:,1)) 1], @max);

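According to the documentation, this should work.

Sorry, I can't try it right now...

Update: I did try the above, and it correctly gave me the maximum values. However, it does not give you the indices that correspond to those maxima. For that you need to do a bit more work (since the identifiers might not be sorted).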

result = accumarray( M(:,1), M(:,2), [], @max, 0, true);  % fill value 0, issparse = true: creates a sparse matrix
c1 = find(result);     % to get the indices of nonzero values
c2 = full(result(c1)); % to get the values corresponding to the indices
answer = [c1 c2];      % to put them side by side
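A possible alternative, sketched here under the assumption that the ids may be unsorted, non-consecutive, or paired with zero or negative similarity values: take the group index from the third output of `unique` and accumulate over that instead of over the raw ids. The data in `M` is invented for illustration.

```matlab
% Hypothetical data: id 3 has a maximum of exactly 0
M = [10 0.85; 10 0.5; 3 0; 3 -0.2];

% ids: sorted unique ids; g: for each row of M, its position in ids
[ids, ~, g] = unique(M(:,1));

% Per-group maximum, one entry per unique id (no gaps, no fill values)
mx = accumarray(g, M(:,2), [], @max);

answer = [ids mx];
```

Since every group index in `g` occurs at least once, there are no fill values to filter out, so a maximum of 0 is kept rather than discarded by `find` or `nonzeros`.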

score 0 · Accepted Answer

Another approach: use sortrows and then diff to select the last row for each value of the first column:

M2 = sortrows(M);
result = M2(diff([M2(:,1); inf])>0,:);

This also works if there are gaps in the indices of the first column.
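To make the mask explicit, here is a small walk-through of the same two lines on made-up data with unsorted, gapped ids. The variable name `isLast` is introduced only for this illustration.

```matlab
% Hypothetical data: ids 2 and 7, in mixed order
M = [7 0.2; 2 0.5; 7 0.9; 2 0.1];

% Sorts by column 1, ties broken by column 2, so each id's maximum comes last in its group
M2 = sortrows(M);

% Appending inf guarantees the final row is always marked as the end of a group
isLast = diff([M2(:,1); inf]) > 0;

result = M2(isLast, :);
```

After sorting, the id column reads 2, 2, 7, 7, so the differences are positive only at the second and fourth rows, which are the rows holding each id's largest similarity.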

score 0 · Accepted Answer

result = accumarray( M(:,1), M(:,2), [max(M(:,1)) 1], @max);

finalResult = [sort(unique(M(:,1))),nonzeros(result)]

This reattaches the item_ids, in sorted order, to the corresponding maximum similarity values in the second column. As a result, each value in column 1 of the finalResult matrix is unique, and the corresponding value in column 2 is the maximum similarity value for that item_id. Note that nonzeros drops zero entries, so this assumes every maximum is nonzero. @Floris, thanks for your help; I couldn't have solved this without it.
