python - In Python with NumPy, how do I update an array from another array based on a column present in both?

Question

I have a source array like this:

 [[  9  85  32 100]
 [  7  80  30 100]
 [  2  90  16 100]
 [  6 120  22 100]
 [  5 105  17 100]
 [  0 100  33 100]
 [  3 110  22 100]
 [  4  80  22 100]
 [  8 115  19 100]
 [  1  95  28 100]]

I want to update it with this array, matching rows on the first column:

[[  3 110  22 105]
 [  5 105  17 110]
 [  1  95  28 115]]

so that it becomes:

 [[  9  85  32 100]
 [  7  80  30 100]
 [  2  90  16 100]
 [  6 120  22 100]
 [  5 105  17 110]
 [  0 100  33 100]
 [  3 110  22 105]
 [  4  80  22 100]
 [  8 115  19 100]
 [  1  95  28 115]]

I could not find a NumPy function that does this directly, so for now I have nothing better than this function I wrote:

import numpy as np

def update_ary_with_ary(source, updates):
    # For each update row, overwrite the source row(s) whose first column matches.
    for x in updates:
        index_of_col = np.argwhere(source[:,0] == x[0])
        source[index_of_col] = x

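This function uses a Python loop, so it is neither elegant nor fast. I will use it until someone gives me a better, pure-NumPy way to do this. I am not looking for another loop-based solution, only NumPy.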

score 1 · Accepted Answer

Assuming your source array is s and your update array is u, and assuming s and u are not very large, you can do the following:

update_row_ids = np.nonzero(s[:,0] == u[:,0].reshape(-1,1))[1]
s[update_row_ids] = u
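To see why these two lines work, it can help to look at the intermediate values. The sketch below uses small made-up arrays (not the ones from the question). Comparing the row vector s[:,0] with the column vector u[:,0].reshape(-1,1) broadcasts to a boolean matrix with one row per update and one column per source row, and np.nonzero(...)[1] then reads off the matching source row for each update, in update order. It assumes every key in u appears exactly once in s.

```python
import numpy as np

# Small illustrative arrays (hypothetical data, not from the question)
s = np.array([[9, 10], [7, 20], [2, 30], [5, 40]])
u = np.array([[2, 31], [9, 11]])

# Broadcasting a (4,) row against a (2, 1) column gives a (2, 4) boolean matrix:
# match[i, j] is True when update row i has the same key as source row j.
match = s[:, 0] == u[:, 0].reshape(-1, 1)

# np.nonzero returns (update_indices, source_indices) in row-major order,
# so element [1] lists the matching source row for each update row in turn.
update_row_ids = np.nonzero(match)[1]

s[update_row_ids] = u
print(match.shape)      # (2, 4)
print(update_row_ids)   # [2 0]
```

The intermediate matrix has len(u) * len(s) entries, which is why this approach is stated only for arrays that are not very large.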

Test:

import numpy as np
s = np.array(
    [[  9,  85,  32, 100],
     [  7,  80,  30, 100],
     [  2,  90,  16, 100],
     [  6, 120,  22, 100],
     [  5, 105,  17, 100],
     [  0, 100,  33, 100],
     [  3, 110,  22, 100],
     [  4,  80,  22, 100],
     [  8, 115,  19, 100],
     [  1,  95,  28, 100]])
u = np.array(
    [[  3, 110,  22, 105],
     [  5, 105,  17, 110],
     [  1,  95,  28, 115]])

update_row_ids = np.nonzero(s[:,0] == u[:,0].reshape(-1,1))[1]
s[update_row_ids] = u

print(s)

This prints:

[[  9  85  32 100]
 [  7  80  30 100]
 [  2  90  16 100]
 [  6 120  22 100]
 [  5 105  17 110]
 [  0 100  33 100]
 [  3 110  22 105]
 [  4  80  22 100]
 [  8 115  19 100]
 [  1  95  28 115]]

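Edit: The OP has provided the following additional details:

The "source array" is "huge".
Each row in the "update array" matches exactly one row in the "source array".

Given these details, the following alternative may perform better, especially if the rows of the source array are not sorted on the first column: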

sorted_idx = np.argsort(s[:,0])
pos = np.searchsorted(s[:,0],u[:,0],sorter=sorted_idx)
update_row_ids = sorted_idx[pos]

s[update_row_ids] = u
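A self-contained sketch of this searchsorted approach, run on the arrays from the question, might look like the following. The helper name update_by_key is hypothetical, and the clipping and final check are additions rather than part of the answer above: np.searchsorted returns an insertion position even for a key that is absent from s, so verifying the match guards against silently overwriting the wrong row. The sketch assumes the keys in s[:,0] are unique and that s is not empty.

```python
import numpy as np

def update_by_key(s, u):
    """Overwrite rows of s in place with the rows of u that share the same first column.

    Hypothetical helper; assumes the keys in s[:, 0] are unique and s is non-empty.
    """
    keys = s[:, 0]
    sorted_idx = np.argsort(keys)  # permutation that sorts the keys
    pos = np.searchsorted(keys, u[:, 0], sorter=sorted_idx)
    # Clip so a key larger than every source key cannot index out of bounds.
    pos = np.clip(pos, 0, len(keys) - 1)
    update_row_ids = sorted_idx[pos]
    # searchsorted yields an insertion point even for missing keys, so verify.
    if not np.array_equal(keys[update_row_ids], u[:, 0]):
        raise KeyError("some update keys are not present in the source array")
    s[update_row_ids] = u
    return s

s = np.array(
    [[9, 85, 32, 100], [7, 80, 30, 100], [2, 90, 16, 100], [6, 120, 22, 100],
     [5, 105, 17, 100], [0, 100, 33, 100], [3, 110, 22, 100], [4, 80, 22, 100],
     [8, 115, 19, 100], [1, 95, 28, 100]])
u = np.array([[3, 110, 22, 105], [5, 105, 17, 110], [1, 95, 28, 115]])

update_by_key(s, u)
print(s[[6, 4, 9], 3])  # [105 110 115]
```

Sorting costs O(n log n) once and each lookup is O(log n), so no len(u) by len(s) comparison matrix is ever built.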

score 0 · Accepted Answer

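fountainhead, your answer works correctly, and yes, it uses only NumPy. But in performance testing, the time my simulation program needed to process 50K rows doubled, from 22 seconds to 44 seconds, and I do not know why. Your answer did, however, lead me to the right solution, which is just this line: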

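# Index the source directly by the IDs in the first column of the updates.
# This is only correct when each row's first-column value equals its row index in the source.
source[updates[:,0]] = updates
# or, with the shorter names:
s[u[:,0]] = u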

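With this, the processing time for 100K rows dropped to just 0.5 seconds, and I can now process something like 1M rows in only 5 seconds. I am still learning Python and data mining, and these numbers shocked me. In no other language have I been able to work with huge arrays as if they were regular variables. You can see the project on my GitHub.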

https://github.com/qahmad81/war_simulation

fountainhead, your answer should be the accepted one, but visitors should know which answer is best to use.
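One caveat about the one-line version is worth illustrating: s[u[:,0]] = u treats the first column as a row position, so it gives the intended result only when each row's first-column value equals its row index. That appears to hold in the simulation described above, but it does not hold for the sample array in the question, where row 0 has key 9. The sketch below uses small made-up arrays to show both cases.

```python
import numpy as np

# Hypothetical update rows with keys 2 and 0
u = np.array([[2, 99], [0, 77]])

# Case 1: key == row index, so direct indexing hits the right rows.
s_aligned = np.array([[0, 10], [1, 20], [2, 30]])
s_aligned[u[:, 0]] = u
print(s_aligned.tolist())   # [[0, 77], [1, 20], [2, 99]]

# Case 2: same keys stored in a different order. Direct indexing overwrites
# by position, so the row with key 1 is lost and a stale key-0 row remains.
s_shuffled = np.array([[2, 30], [0, 10], [1, 20]])
s_shuffled[u[:, 0]] = u
print(s_shuffled.tolist())  # [[0, 77], [0, 10], [2, 99]]
```

When the keys are not guaranteed to equal the row positions, a key lookup such as the nonzero or searchsorted approach from the first answer is needed instead.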
