python - Efficient matching of two arrays (how to use KDTree)

Question

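I have two 2D arrays, obs1 and obs2. They represent two independent measurement series, and both have dim0 = 2 and a slightly different dim1, e.g. obs1.shape = (2, 250000) and obs2.shape = (2, 250050). obs1[0] and obs2[0] hold the time, and obs1[1] and obs2[1] hold some spatial coordinate. Both arrays are (more or less) sorted by time. The times and coordinates should be identical between the two measurement series, but in reality they are not. Also, not every measurement in obs1 has a corresponding value in obs2, and vice versa. Another problem is that the times may be slightly offset.

I am looking for an efficient algorithm to match the best-fitting value from obs2 to each measurement in obs1. Currently, I do it like this: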

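import numpy as np

dt = some_maximum_time_difference  # placeholder: largest time gap accepted as a match
dx = 3                             # half-width of the index window searched in obs2
i = 0
matchresults = np.empty(obs1.shape[1], dtype=int)
for j in range(obs1.shape[1]):
    # advance i until obs2's time is no more than dt behind obs1[0, j]
    while i < obs2.shape[1] - 1 and obs1[0, j] - obs2[0, i] > dt:
        i += 1
    lo = max(i - dx, 0)  # keep the window inside obs2
    # index in obs2 whose coordinate is closest to obs1[1, j]
    matchresults[j] = lo + np.argmin(np.abs(obs1[1, j] - obs2[1, lo:i + dx + 1]))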

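This yields good results. However, it is very slow because it runs in a loop.

I would be very thankful for ideas on how to make this algorithm faster, e.g. using a KDTree or something similar.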

score 1 · Accepted Answer

Using cKDTree in this case would look like this:

from scipy.spatial import cKDTree

# obs2: array with shape (2, m)
# obs1: array with shape (2, n)

kdt = cKDTree(obs2.T)              # build the tree from obs2's (time, coordinate) points
dist, indices = kdt.query(obs1.T)  # nearest obs2 point for every obs1 point

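where indices will contain the column indices in obs2 corresponding to each observation in obs1, and dist the distance to that nearest point. Note that I had to transpose obs1 and obs2, because cKDTree expects one point per row.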
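Since the question also says that some measurements have no counterpart, here is a minimal sketch of how the query above might be extended to reject matches that are too far away. The helper name match_observations, the max_dist cutoff and the -1 marker are my own assumptions for illustration, not part of the original answer. It relies on the distance_upper_bound argument of cKDTree.query, which reports a missing neighbour as an infinite distance together with an index equal to the number of points in the tree.

```python
import numpy as np
from scipy.spatial import cKDTree


def match_observations(obs1, obs2, max_dist=np.inf):
    """Return (dist, indices) of the nearest obs2 column for each obs1 column.

    obs1, obs2: arrays of shape (2, n) and (2, m); row 0 is time, row 1 the coordinate.
    max_dist: hypothetical cutoff (an assumption); obs1 columns with no obs2
              neighbour within this distance get index -1.
    """
    tree = cKDTree(obs2.T)  # one point per row, hence the transpose
    dist, indices = tree.query(obs1.T, distance_upper_bound=max_dist)
    # query() marks "no neighbour found" with dist == inf and index == tree.n
    indices = np.where(np.isfinite(dist), indices, -1)
    return dist, indices


# Tiny made-up data set: the third obs1 point has no partner in obs2.
obs2 = np.array([[0.0, 1.0, 2.0, 3.0],
                 [0.0, 5.0, 0.0, 5.0]])
obs1 = np.array([[0.1, 2.1, 50.0],
                 [0.1, 0.1, 50.0]])

dist, indices = match_observations(obs1, obs2, max_dist=1.0)
print(indices.tolist())  # [0, 2, -1]
```

Keep in mind that the tree uses the Euclidean distance over (time, coordinate), so the two rows are mixed with equal weight. If the units differ a lot, it may be worth rescaling one row of both arrays before building the tree; how to do that depends on the data and is not specified in the question.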
