python - Identifying points with the smallest Euclidean distance

Question

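I have a collection of n-dimensional points and I want to find the 2 that are closest to each other. The best I could come up with for 2 dimensions is: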

from numpy import *
myArr = array( [[1, 2],
                [3, 4],
                [5, 6],
                [7, 8]] )

n = myArr.shape[0]
cross = [[sum( ( myArr[i] - myArr[j] ) ** 2 ), i, j]
         for i in range( n )
         for j in range( n )
         if i != j
         ]

print( min( cross ) )

This gives

[8, 0, 1]

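But this is too slow for large arrays. What kind of optimization can I apply to it?

Related:

Euclidean distance between points in two different Numpy arrays, not within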

score 11 · Accepted Answer

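Try scipy.spatial.distance.pdist(myArr). This will give you a condensed distance matrix. You can use argmin on it to find the index of the smallest value, and that index can then be converted back into the pair of points.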
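A minimal sketch of that last conversion step, assuming NumPy and SciPy are installed. The helper name closest_pair_pdist is made up for illustration. It relies on pdist listing the pairs in the same row-major order as np.triu_indices(n, k=1):

```python
import numpy as np
from scipy.spatial.distance import pdist


def closest_pair_pdist(points):
    """Return (i, j, distance) for the closest pair, with i < j."""
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    condensed = pdist(points)            # length n*(n-1)/2
    k = int(np.argmin(condensed))        # position of the smallest distance
    # pair indices in the same order pdist uses: (0,1), (0,2), ..., (1,2), ...
    rows, cols = np.triu_indices(n, k=1)
    return int(rows[k]), int(cols[k]), float(condensed[k])


my_arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
i, j, d = closest_pair_pdist(my_arr)
print(i, j, d)
```

Note that both the condensed matrix and the index arrays need O(n**2) memory, so this is only practical for moderately sized inputs.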

score 9 · Accepted Answer

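There is a whole Wikipedia page on this problem, see: http://en.wikipedia.org/wiki/Closest_pair_of_points

Executive summary: you can achieve O(n log n) with a recursive divide-and-conquer algorithm (outlined on the Wiki page above).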
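For reference, here is a rough sketch of the planar (2-D) divide-and-conquer idea in plain Python. It is my own illustration, not code taken from the Wikipedia page, and the name closest_pair_2d is hypothetical. It splits the points by x, solves each half recursively, then checks only the points in a narrow strip around the dividing line:

```python
import heapq
import math


def closest_pair_2d(points):
    """Return (distance, i, j) with i < j for the closest pair of 2-D points."""
    if len(points) < 2:
        raise ValueError("need at least two points")

    def dist(i, j):
        return math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])

    def y_of(i):
        return points[i][1]

    def solve(idx):
        # idx: point indices sorted by x; returns (best, same indices sorted by y)
        if len(idx) <= 3:
            pairs = [(dist(idx[a], idx[b]), idx[a], idx[b])
                     for a in range(len(idx)) for b in range(a + 1, len(idx))]
            return min(pairs, default=(math.inf, -1, -1)), sorted(idx, key=y_of)
        mid = len(idx) // 2
        mid_x = points[idx[mid]][0]
        best_left, left_y = solve(idx[:mid])
        best_right, right_y = solve(idx[mid:])
        best = min(best_left, best_right)
        by_y = list(heapq.merge(left_y, right_y, key=y_of))  # linear-time merge
        # only points closer than best to the dividing line can form a better pair
        strip = [i for i in by_y if abs(points[i][0] - mid_x) < best[0]]
        for k in range(len(strip)):
            for m in range(k + 1, len(strip)):
                if y_of(strip[m]) - y_of(strip[k]) >= best[0]:
                    break  # later points in the strip are even further away in y
                best = min(best, (dist(strip[k], strip[m]), strip[k], strip[m]))
        return best, by_y

    order = sorted(range(len(points)), key=lambda i: (points[i][0], points[i][1]))
    (d, i, j), _ = solve(order)
    return d, min(i, j), max(i, j)


result = closest_pair_2d([(0.0, 0.0), (5.0, 5.0), (5.5, 5.0), (10.0, 0.0)])
print(result)
```

This version only handles 2-D input. In higher dimensions the strip step gets more involved, and a tree-based nearest-neighbour query is usually easier to get right.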

score 6 · Accepted Answer

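You can take advantage of the Delaunay triangulation tools in the latest version of SciPy (v0.9). You can be sure that the closest two points will be an edge of a simplex in the triangulation, which is a much smaller subset of pairs than checking every combination.

Here's the code (updated for general N-D):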

import itertools

import numpy
from scipy import spatial

def closest_pts(pts):
    # set up the triangulation
    # let Delaunay do the heavy lifting
    mesh = spatial.Delaunay(pts)

    # every pair of vertices within a simplex is an edge of the triangulation
    # (mesh.simplices was called mesh.vertices in old SciPy releases)
    # TODO: eliminate redundant edges (numpy.unique?)
    simplices = mesh.simplices
    corner_pairs = itertools.combinations(range(simplices.shape[1]), 2)
    edges = numpy.vstack([simplices[:, [a, b]] for a, b in corner_pairs])

    # the rest is easy
    x = mesh.points[edges[:,0]]
    y = mesh.points[edges[:,1]]

    dists = numpy.sum((x-y)**2, 1)
    idx = numpy.argmin(dists)

    return edges[idx]
    #print('distance: ', dists[idx])
    #print('coords:\n', pts[edges[idx]])

dim = 3
N = 1000*dim
pts = numpy.random.random(N).reshape(N//dim, dim)

The scaling looks very close to O(n).
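The TODO in the code above mentions removing redundant edges. One possible way to do that, sketched with plain NumPy (the helper name unique_edges is made up for illustration), is to sort each edge so that (a, b) and (b, a) look identical and then keep the unique rows:

```python
import numpy as np


def unique_edges(edges):
    """Drop duplicate undirected edges from an (m, 2) integer array."""
    edges = np.sort(np.asarray(edges), axis=1)  # (b, a) becomes (a, b)
    return np.unique(edges, axis=0)             # unique rows, sorted lexicographically


edges = np.array([[0, 1], [1, 0], [2, 1], [1, 2], [0, 2]])
deduped = unique_edges(edges)
print(deduped)
```

Whether the deduplication pays off depends on the data, since np.unique sorts the rows and the distance computation it saves is cheap.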

score 2 · Accepted Answer

There is a scipy function, pdist, that gets you the pairwise distances between the points in an array in a fairly efficient manner:

http://docs.scipy.org/doc/scipy/reference/spatial.distance.html

It outputs the N*(N-1)/2 unique pairs (since r_ij == r_ji). You can then search for the minimum value and avoid the whole loop mess in your code.
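To make the layout of that output concrete, here is a small sketch that checks the N*(N-1)/2 length and maps a position in the condensed array back to the pair (i, j). The helper condensed_to_pair is hypothetical, not part of SciPy. It assumes the pdist ordering (0,1), (0,2), ..., (1,2), ... and needs only O(N) extra memory:

```python
import numpy as np
from scipy.spatial.distance import pdist


def condensed_to_pair(k, n):
    """Map condensed index k to (i, j) with i < j, for n points."""
    i_all = np.arange(n - 1)
    # row_start[i] is the condensed index of the pair (i, i + 1)
    row_start = i_all * n - i_all * (i_all + 1) // 2
    i = int(np.searchsorted(row_start, k, side="right")) - 1
    j = int(k - row_start[i] + i + 1)
    return i, j


pts = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [4.0, 3.5], [9.0, 9.0]])
d = pdist(pts)
n = len(pts)
print(d.shape[0] == n * (n - 1) // 2)
i, j = condensed_to_pair(int(np.argmin(d)), n)
print(i, j, d.min())
```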

score 1 · Accepted Answer

Maybe you could proceed along these lines:

In []: from numpy import arange, where
In []: from numpy.random import randn
In []: from scipy.spatial.distance import pdist as pd, squareform as sf
In []: m= 1234
In []: n= 123
In []: p= randn(m, n)
In []: d= sf(pd(p))
In []: a= arange(m)
In []: d[a, a]= d.max()
In []: where(d< d.min()+ 1e-9)
Out[]: (array([701, 730]), array([730, 701]))

With substantially more points, you will need to exploit the hierarchical structure of your clustering somehow.

score 0 · Accepted Answer

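How fast is it compared to just doing a nested loop and keeping track of the shortest pair? I think creating a huge cross array is what might be hurting you. Even O(n^2) is still pretty quick if you're only doing 2-dimensional points.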
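For comparison, a sketch of what such a nested loop might look like in plain Python (the function name closest_pair_loop is made up for illustration). It visits each unordered pair once and stores only the best result so far instead of building the full cross list:

```python
import math


def closest_pair_loop(points):
    """Return (squared_distance, i, j) for the closest pair, with i < j."""
    best = (math.inf, -1, -1)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):  # j > i, so each pair is checked once
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            if d2 < best[0]:
                best = (d2, i, j)
    return best


print(closest_pair_loop([[1, 2], [3, 4], [5, 6], [7, 8]]))  # (8, 0, 1)
```

This is still O(n^2) in time, but it uses constant extra memory and does half the distance computations of the original list comprehension.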

score 0 · Accepted Answer

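The accepted answer is OK for small datasets, but its execution time scales as n**2. However, as pointed out by @payne, an optimal solution can achieve n*log(n) computation time scaling.

This optimal solution can be obtained using sklearn.neighbors.BallTree as follows.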

import matplotlib.pyplot as plt
import numpy as np
from sklearn.neighbors import BallTree as tree

n = 10
dim = 2
xy = np.random.uniform(size=[n, dim])

# This solution is optimal when xy is very large
res = tree(xy)
dist, ids = res.query(xy, 2)
mindist = dist[:, 1]  # column 0 is the point itself (distance 0); column 1 is its nearest other point
minid = np.argmin(mindist)

plt.plot(*xy.T, 'o')
plt.plot(*xy[ids[minid]].T, '-o')

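This procedure works well for very large sets of xy values and even for large dimensions dim (although the example illustrates the case dim=2). In the resulting plot, the closest pair is joined by a line.

The same solution can be obtained using scipy.spatial.cKDTree by replacing the sklearn import with the following SciPy one. Note, however, that cKDTree, unlike BallTree, does not scale well to high dimensions.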

from scipy.spatial import cKDTree as tree
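As a sketch of how the pair itself (rather than a plot) could be pulled out of the tree query, assuming SciPy is available. The wrapper closest_pair_kdtree is my own naming, and the check for duplicate points is an extra precaution that is not in the code above:

```python
import numpy as np
from scipy.spatial import cKDTree


def closest_pair_kdtree(points):
    """Return (i, j, distance) for the closest pair, with i < j."""
    points = np.asarray(points, dtype=float)
    tree = cKDTree(points)
    # k=2: each point's nearest neighbour is itself, the second is another point
    dist, idx = tree.query(points, k=2)
    i = int(np.argmin(dist[:, 1]))
    j = int(idx[i, 1])
    if j == i:
        # with exact duplicates the point itself may land in column 1
        j = int(idx[i, 0])
    return min(i, j), max(i, j), float(dist[i, 1])


pair = closest_pair_kdtree([[0.0, 0.0], [5.0, 5.0], [5.5, 5.0], [10.0, 0.0]])
print(pair)
```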
