python - Python: Removing duplicates from a multi-dimensional array

Question

In Python, numpy.unique can remove all duplicates from a 1D array very efficiently.

1) How can I remove duplicate rows or columns from a 2D array?

2) What about nD arrays?

score 4 · Accepted Answer

If possible, I would use pandas.

In [1]: from pandas import DataFrame

In [2]: import numpy as np

In [3]: a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])

In [4]: DataFrame(a).drop_duplicates().values
Out[4]: 
array([[1, 1],
       [2, 3],
       [5, 4]], dtype=int64)
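
The call above only removes duplicate rows. For duplicate columns, one option is to transpose, drop duplicates, and transpose back. A minimal sketch, assuming a made-up array `b` whose first and third columns are identical:

```python
import numpy as np
import pandas as pd

# Hypothetical example array: columns 0 and 2 are identical.
b = np.array([[1, 2, 1],
              [3, 4, 3]])

# drop_duplicates compares rows, so transpose first, then transpose back.
unique_cols = pd.DataFrame(b).T.drop_duplicates().T.to_numpy()
print(unique_cols)
```

As with the row case, pandas keeps the first occurrence of each duplicate, so the original column order is preserved.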

score 1 · Accepted Answer

Here is another approach that performs better than a for loop. It takes 2 seconds for 10k+100 duplicates.

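def tuples(A):
    # Recursively convert nested arrays into hashable nested tuples.
    try: return tuple(tuples(a) for a in A)
    except TypeError: return A  # scalars are not iterable; return them as-is

# A set keeps one copy of each row; the original order is not preserved.
b = set(tuples(a))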

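The idea is inspired by the first part of Waleed Khan's answer. It needs no additional packages, so it may be useful in other settings too. I guess it is also quite Pythonic.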
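
If the order of first occurrence matters, the set can be swapped for `dict.fromkeys`. This is only a sketch of a variation, not part of the original approach; the sample data `rows` is made up, and plain nested lists stand in for a 2D array:

```python
def tuples(A):
    # Recursively turn nested sequences into hashable nested tuples.
    try:
        return tuple(tuples(a) for a in A)
    except TypeError:
        return A  # scalars are not iterable, so return them unchanged

# Hypothetical sample data; nested lists stand in for a 2D array.
rows = [[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]]

# dict keys are unique and keep insertion order (Python 3.7+).
unique_rows = [list(t) for t in dict.fromkeys(tuples(rows))]
print(unique_rows)  # [[1, 1], [2, 3], [5, 4]]
```

Note that the recursive helper assumes numeric leaves; strings are iterable and would need special handling.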

score 0 · Accepted Answer

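The numpy_indexed package solves this problem for the n-dimensional case. (Disclaimer: I am its author.) In fact, solving this problem was the motivation for starting the package, but it has since grown to include a lot of related functionality.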

import numpy as np
import numpy_indexed as npi

a = np.random.randint(0, 2, (3, 3, 3))
print(npi.unique(a))
print(npi.unique(a, axis=1))
print(npi.unique(a, axis=2))
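
For comparison, NumPy 1.13 and later accept an `axis` argument in `np.unique` itself. The sketch below uses NumPy's built-in function, not numpy_indexed, and a made-up array in which two slices along axis 0 are identical:

```python
import numpy as np

# Hypothetical 3D array: slices 0 and 2 along axis 0 are identical.
a = np.array([[[0, 1], [1, 0]],
              [[1, 1], [0, 0]],
              [[0, 1], [1, 0]]])

# With axis=0, each 2x2 slice a[i] is treated as one element.
unique_slices = np.unique(a, axis=0)
print(unique_slices.shape)  # (2, 2, 2)
```

The result comes back sorted rather than in order of first occurrence, which is one difference from the pandas approach above.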
