python - deleting rows of a numpy array based on uniqueness of a value

Question

Let's say I have a two-dimensional array like this:

numpy.array(
    [[0,1,1.2,3],
    [1,5,3.2,4],
    [3,4,2.8,4], 
    [2,6,2.3,5]])

I want to build an array by eliminating whole rows so that the values in the last column are unique, choosing which row to keep based on the value in the third column. For example, in this case I would like to keep only one of the rows with 4 in the last column, choosing the one with the smaller value in the third column, to get something like this as a result:

array([[0,1,1.2,3],
       [3,4,2.8,4],
       [2,6,2.3,5]])

thus eliminating the row [1,5,3.2,4].

What would be the best way to do it?

score 1 · Accepted Answer

My numpy is rusty, but this should work:

import numpy

# n is the 2-D input array from the question
# keepers is a dictionary: the key is the row's final value,
# and the value is the tuple (row index, row[2])
keepers = {}
deletions = []
for i, row in enumerate(n):
    key = row[3]
    if key not in keepers:
        keepers[key] = (i, row[2])
    else:
        if row[2] > keepers[key][1]:
            deletions.append(i)
        else:
            deletions.append(keepers[key][0])
            keepers[key] = (i, row[2])
o = numpy.delete(n, deletions, axis=0)

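I've simplified this a lot from my declarative solution, which was getting quite unwieldy. Hopefully this is easier to follow; all we're doing is maintaining a dictionary of the values we want to keep and a list of the indices we want to delete.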
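As a rough, self-contained sketch of the same dictionary idea, the loop can be wrapped in a function and run on the array from the question. The function name drop_duplicate_keys and the parameters key_col and value_col are made up for illustration. This variant tracks only the rows to keep instead of building a deletion list, and on a tie it keeps the first row seen, which is an assumption and differs slightly from the loop above.

```python
import numpy as np

def drop_duplicate_keys(n, key_col=3, value_col=2):
    """Keep, for each distinct value in key_col, the row with the smallest value_col.

    Hypothetical helper for illustration; ties keep the first row encountered.
    """
    keepers = {}  # key -> (row index, value) of the best row seen so far
    for i, row in enumerate(n):
        key, value = row[key_col], row[value_col]
        if key not in keepers or value < keepers[key][1]:
            keepers[key] = (i, value)
    # Sort the kept indices so surviving rows stay in their original order
    keep = sorted(i for i, _ in keepers.values())
    return n[keep]

n = np.array([[0, 1, 1.2, 3],
              [1, 5, 3.2, 4],
              [3, 4, 2.8, 4],
              [2, 6, 2.3, 5]])

result = drop_duplicate_keys(n)
print(result)
```

Indexing with the list of kept rows avoids numpy.delete altogether, at the cost of one extra sort over the kept indices.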

score 1 · Accepted Answer

This can be done efficiently in NumPy by combining lexsort and unique, as follows:

import numpy as np

a = np.array([[0, 1, 1.2, 3], 
              [1, 5, 3.2, 4],
              [3, 4, 2.8, 4], 
              [2, 6, 2.3, 5]])

# Sort by last column and 3rd column when values are equal
j = np.lexsort(a.T)

# Find first occurrence (=smallest 3rd column) of unique values in last column
k = np.unique(a[j, -1], return_index=True)[1]

print(a[j[k]])

This returns the desired result:

[[ 0.   1.   1.2  3. ]
 [ 3.   4.   2.8  4. ]
 [ 2.   6.   2.3  5. ]]
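Note that np.lexsort(a.T) relies on the key column being last and the tie-breaking column being second to last. If the columns are laid out differently, the sort keys can be passed explicitly. The following is a minimal sketch under that assumption; the function name keep_min_per_key and its parameters are hypothetical, not part of NumPy.

```python
import numpy as np

def keep_min_per_key(a, key_col=-1, value_col=2):
    """For each unique value in key_col, keep the row with the smallest value_col.

    Hypothetical helper for illustration; rows come back sorted by key_col.
    """
    # lexsort treats the LAST key in the tuple as the primary sort key,
    # so this sorts by key_col first and by value_col within equal keys
    order = np.lexsort((a[:, value_col], a[:, key_col]))
    # return_index gives the first occurrence of each unique key in the
    # sorted view, which is the row with the smallest value_col
    _, first = np.unique(a[order, key_col], return_index=True)
    return a[order[first]]

a = np.array([[0, 1, 1.2, 3],
              [1, 5, 3.2, 4],
              [3, 4, 2.8, 4],
              [2, 6, 2.3, 5]])

result = keep_min_per_key(a)
print(result)
```

Unlike the dictionary loop, the result is ordered by the key column rather than by original row position; for the example array the two orders happen to coincide.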
