python - 与 NumPy 中的 For 循环进行慢速逐行比较 - 如何改进？

Question

我正在使用python并numpy比较两个数组或相等的形状与坐标 (x,y,z) 以匹配它们，如下所示：

coordsCFS
array([[ 0.02      ,  0.02      ,  0.        ],
       [ 0.03      ,  0.02      ,  0.        ],
       [ 0.02      ,  0.025     ,  0.        ],
        ..., 
       [ 0.02958333,  0.029375  ,  0.        ],
       [ 0.02958333,  0.0290625 ,  0.        ],
       [ 0.02958333,  0.0296875 ,  0.        ]])

和

coordsRMED
array([[ 0.02      ,  0.02      ,  0.        ],
       [ 0.02083333,  0.02      ,  0.        ],
       [ 0.02083333,  0.020625  ,  0.        ],
       ..., 
       [ 0.03      ,  0.0296875 ,  0.        ],
       [ 0.02958333,  0.03      ,  0.        ],
       [ 0.02958333,  0.0296875 ,  0.        ]])

使用 h5py 从两个 hdf5 文件中读取数据。对于比较，我使用allclose，它测试“几乎相等”。坐标在 python 的常规浮点精度内不匹配。这就是我使用 for 循环的原因，否则它会与numpy.where. 我通常会尽量避免 for 循环，但在这种情况下，我不知道怎么做。所以我想出了这个令人惊讶的缓慢片段：

mapList = []
for cfsXYZ in coordsCFS:
    # print cfsXYZ
    indexMatch = 0
    match = []
    for asterXYZ in coordRMED:
        if numpy.allclose(asterXYZ,cfsXYZ):
            match.append(indexMatch)
            # print "Found match at index " + str(indexMatch)
            # print asterXYZ
        indexMatch += 1

    # check: must only find one match. 
    if len(match) != 1:
        print "ERROR matching"
        print match
        print cfsXYZ
        return 1

    # save to list
    mapList.append(match[0])

if len(mapList) != coordsRMED.shape[0]:
    print "ERROR: matching consistency check"
    print mapList
    return 1

这对于我的测试样本大小（800 行）来说非常慢。我计划比较更大的集合。我可以删除一致性检查并break在内部 for 循环中使用以获得一些速度优势。还有更好的方法吗？

score 1 · Accepted Answer

一种解决方案是对两个数组进行排序（添加一个索引列，以便排序后的数组仍然包含原始索引）。然后，为了匹配，步调一致地遍历数组。由于您期望精确的 1-1 对应关系，因此您应该始终能够匹配成对的行。

score 1 · Accepted Answer

首先要记住的是，默认情况下，在 NumPy 中，“迭代总是以 C 风格的连续方式进行（最后一个索引变化最快）”[1]。您可以通过颠倒迭代顺序来改进事情（迭代上coordMED.T，转置coordMED...）

尽管如此，我仍然对您需要循环感到惊讶：您声称“坐标在 python 的常规浮点精度内不匹配”：您是否尝试过调整的rtol和atol参数np.allclose，如其文档中所述？

[1]

score 1 · Accepted Answer

您可以通过以下方式摆脱内部循环：

for cfsXYZ in coordsCFS:
    match = numpy.nonzero(
        numpy.max(numpy.abs(coordRMED - cfsXYZ), axis=1) < TOLERANCE)

python - 与 NumPy 中的 For 循环进行慢速逐行比较 - 如何改进？

3 回答 3

Related

Reference