python - Get NumPy array indices in array B for unique values in array A, for values present in both arrays, aligned with array A

Question

I have two NumPy arrays:

A = asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = asarray(['2', '4', '8', '16', '32'])

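I would like a function that takes A and B as parameters and returns, as efficiently as possible, the index in B of each value in A, aligned with A.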

This is the expected output for the test case above:

indices = [1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4]

I have tried exploring in1d(), where(), and nonzero() with no luck. Any help is much appreciated.

Edit: The arrays contain strings.

score 3 · Accepted Answer

You can also do it this way:

>>> np.digitize(A,B)-1
array([1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4])

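According to the docs, you should be able to specify right=False and skip the minus-one part. This does not work for me, probably due to a version issue, as I do not have numpy 1.7.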
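For reference, here is a minimal sketch of the digitize approach. It assumes integer data and a B that is sorted in increasing order, since np.digitize needs monotonic bins and I would not expect it to work on the string arrays from the question. The names A_int and B_int are illustrative only. As far as I can tell, right=False is the default, and it is right=True that makes exact matches land on their own index, so the subtraction can be dropped in that case.

```python
import numpy as np

# Integer versions of the question's arrays (illustrative names).
A_int = np.asarray([4, 4, 2, 8, 8, 8, 8, 8, 16, 32, 16, 16, 32])
B_int = np.asarray([2, 4, 8, 16, 32])  # must be sorted for digitize

# Default (right=False): bins[i-1] <= x < bins[i], so subtract one.
idx_default = np.digitize(A_int, B_int) - 1

# right=True: bins[i-1] < x <= bins[i], so exact matches need no offset.
idx_right = np.digitize(A_int, B_int, right=True)

print(idx_default)
print(idx_right)
```

Note that a value of A that is missing from B is not reported as an error here; it simply falls into a neighbouring bin.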

I am not sure what you are doing, but a simple and very fast way is:

>>> A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
>>> B,indices=np.unique(A,return_inverse=True)
>>> B
array(['16', '2', '32', '4', '8'],
      dtype='|S2')
>>> indices
array([3, 3, 1, 4, 4, 4, 4, 4, 0, 2, 0, 0, 2])

>>> B[indices]
array(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'],
      dtype='|S2')

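The order of B will be different (np.unique returns the values sorted, and these are strings, so the sort is lexicographic), but it can be changed if needed.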
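One possible way to change the order, sketched under the assumption that every value of A also occurs in B and that B has no duplicates. The names uniq, inverse, order and pos_in_b are illustrative. The idea is to look up each unique value once in B and then expand the result through the inverse index.

```python
import numpy as np

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])

# Sorted unique values of A, plus the index of each element of A in uniq.
uniq, inverse = np.unique(A, return_inverse=True)

# Position of each unique value in the caller's B (B itself need not be sorted).
order = np.argsort(B)
pos_in_b = order[np.searchsorted(B, uniq, sorter=order)]

# Expand back to one index per element of A.
indices = pos_in_b[inverse]
print(indices)
```

This only searches B once per distinct value, which may help when A is long and has few distinct values.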

score 1 · Accepted Answer

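For this kind of thing, it is important that lookups in B are as fast as possible. A dictionary provides O(1) lookup time. So, first, let's construct this dictionary: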

>>> indices = dict((value,index) for index,value in enumerate(B))
>>> indices
{8: 2, 16: 3, 2: 0, 4: 1, 32: 4}

Then just go through A and find the corresponding indices:

>>> [indices[item] for item in A]
[1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4]
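The two steps above could be wrapped into one function roughly like this. It is only a sketch: indices_via_dict is a made-up name, and it assumes every value of A is present in B (a missing value raises KeyError).

```python
import numpy as np

def indices_via_dict(a, b):
    """Return, for each element of a, its index in b (hypothetical helper)."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Map each value of b to its position; tolist() yields plain Python objects.
    lookup = {value: index for index, value in enumerate(b.tolist())}
    # One dictionary lookup per element of a.
    return np.fromiter((lookup[item] for item in a.tolist()),
                       dtype=np.intp, count=a.size)

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
print(indices_via_dict(A, B))
```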

score 1 · Accepted Answer

I think you can do this with np.searchsorted:

>>> A = asarray([4, 4, 2, 8, 8, 8, 8, 8, 16, 32, 16, 16, 32])
>>> B = asarray([2, 8, 4, 32, 16])
>>> sort_b = np.argsort(B)
>>> idx_of_a_in_sorted_b = np.searchsorted(B, A, sorter=sort_b)
>>> idx_of_a_in_b = np.take(sort_b, idx_of_a_in_sorted_b)
>>> idx_of_a_in_b
array([2, 2, 0, 1, 1, 1, 1, 1, 4, 3, 4, 4, 3], dtype=int64)

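Note that B is scrambled compared to your version, hence the different output. If some of the items in A are not in B (which you can check with np.all(np.in1d(A, B))), then the returned indices for those values will be garbage, and you may even get an IndexError from the last line (if the largest value in A is missing from B).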
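If missing values are a concern, a guarded variant might look like the sketch below. The function name indices_with_guard and the missing=-1 sentinel are my own choices, not part of the answer above, and it assumes B is non-empty and has no duplicates.

```python
import numpy as np

def indices_with_guard(a, b, missing=-1):
    """Index in b of each element of a, or `missing` where a value is absent."""
    a = np.asarray(a)
    b = np.asarray(b)
    order = np.argsort(b)
    # Insertion positions in the sorted view of b.
    pos = np.searchsorted(b, a, sorter=order)
    # A value larger than everything in b gives pos == len(b); clip to stay in range.
    pos = np.clip(pos, 0, len(b) - 1)
    candidate = order[pos]
    # Keep only positions where b really holds the requested value.
    found = b[candidate] == a
    return np.where(found, candidate, missing)

A = np.asarray([4, 4, 2, 8, 8, 8, 8, 8, 16, 32, 16, 16, 32])
B = np.asarray([2, 8, 4, 32, 16])
print(indices_with_guard(A, B))
print(indices_with_guard([3, 64, 8], B))
```

The second call shows values that are absent from B: both 3 and 64 map to the sentinel instead of a misleading index or an IndexError.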

score 1 · Accepted Answer

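The numpy_indexed package (disclaimer: I am its author) implements a solution along the same lines as Jaime's, but with a nice interface, tests, and a lot of related useful functionality: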

import numpy_indexed as npi
print(npi.indices(B, A))

score 0 · Accepted Answer

I am not sure how efficient this is, but it works:

import numpy as np
A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
idx_of_a_in_b=np.argmax(A[np.newaxis,:]==B[:,np.newaxis],axis=0)
print(idx_of_a_in_b)

From this I get:

[1 1 0 2 2 2 2 2 3 4 3 3 4]
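Two caveats with the broadcasting comparison, as far as I can tell: it builds a len(B) by len(A) boolean matrix, so memory grows with the product of the two sizes, and argmax returns 0 for a column with no match, which is indistinguishable from a real match at index 0. A small sketch that flags missing values with -1 (the sentinel and the names matches and present are illustrative):

```python
import numpy as np

A = np.asarray(['4', '4', '2', '8', '7', '32'])  # '7' is deliberately not in B
B = np.asarray(['2', '4', '8', '16', '32'])

# Boolean matrix of shape (len(B), len(A)); entry [i, j] is B[i] == A[j].
matches = B[:, np.newaxis] == A[np.newaxis, :]

# First matching row per column, and whether any row matched at all.
idx = matches.argmax(axis=0)
present = matches.any(axis=0)

# Replace the misleading 0 for unmatched columns with -1.
idx_of_a_in_b = np.where(present, idx, -1)
print(idx_of_a_in_b)
```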
