python - Finding the average distance in a set of points using Numpy

Question

I have an array of points in a space of unknown dimension, for example:

data=numpy.array(
[[ 115, 241, 314],
[ 153, 413, 144],
[ 535, 2986, 41445]])

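I want to find the average Euclidean distance between all of the points.

Note that I have more than 20,000 points, so I'd like to do this as efficiently as possible.

Thanks.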

score 12 · Accepted Answer

If you have access to scipy, you could try the following:

scipy.spatial.distance.cdist(data,data)
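Note that cdist(data, data) returns the full n x n matrix, so the zero diagonal and both orderings of each pair are included. As a minimal sketch of turning this into the requested average, scipy.spatial.distance.pdist returns each unordered pair only once. The helper name mean_pairwise_distance is made up for illustration:

```python
import numpy as np
from scipy.spatial.distance import pdist

def mean_pairwise_distance(points):
    """Mean Euclidean distance over all unordered pairs of distinct points."""
    # pdist returns the condensed vector of n*(n-1)/2 distances (pairs with i < j)
    return pdist(np.asarray(points, dtype=float)).mean()

data = np.array([[115, 241, 314],
                 [153, 413, 144],
                 [535, 2986, 41445]])
avg = mean_pairwise_distance(data)
```

For 20,000 points the condensed vector holds about 2 x 10^8 doubles (roughly 1.6 GB), so this only works if that much memory is available.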

score 5 · Accepted Answer

Well, I don't think there is a super fast way to do this, but this should do it:

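tot = 0.

# For each point, add its distances to all later points, so each pair is counted once
for i in range(data.shape[0]-1):
    tot += ((((data[i+1:]-data[i])**2).sum(1))**.5).sum()

# Divide by the number of pairs, n*(n-1)/2
avg = tot/((data.shape[0]-1)*(data.shape[0])/2.)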

score 4 · Accepted Answer

There's no getting around the number of evaluations:

$\sum_{i=0}^{n} (n - i) = \frac{n(n+1)}{2}$

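But if you can get by with an approximate result, you can save the expense of all those square roots. It depends on your needs.

If you're going to calculate an average, I'd advise against putting all the values into an array before calculating. Just accumulate the sum (and the sum of squares, if you also need the standard deviation) and throw away each value as you go.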
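A minimal sketch of that running-sum idea, assuming the points fit in memory but the full set of pairwise distances does not. The function name pairwise_distance_stats is hypothetical, and the sum-of-squares variance formula used here can lose precision when the spread is small relative to the mean:

```python
import numpy as np

def pairwise_distance_stats(points):
    """Mean and population standard deviation of all pairwise distances,
    accumulated row by row so the distances are never stored together."""
    pts = np.asarray(points, dtype=float)
    n = pts.shape[0]
    total = 0.0     # running sum of distances
    total_sq = 0.0  # running sum of squared distances
    for i in range(n - 1):
        diff = pts[i + 1:] - pts[i]
        sq = (diff ** 2).sum(axis=1)  # squared distances from point i to later points
        total += np.sqrt(sq).sum()
        total_sq += sq.sum()
    count = n * (n - 1) / 2           # number of unordered pairs
    mean = total / count
    var = max(total_sq / count - mean ** 2, 0.0)  # clamp tiny negative rounding error
    return mean, var ** 0.5
```

Each iteration only holds the distances from one point to the points after it, so peak extra memory is O(n) rather than O(n^2).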

Since … and …, I don't know whether this means you have to multiply by 2 somewhere.

score 4 · Accepted Answer

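Now that you've stated your goal of finding the outliers, you are probably better off computing the sample mean and, with it, the sample variance, since both of those operations take O(nd) time. With those, you should be able to find the outliers (e.g. by excluding points farther from the mean than some multiple of the standard deviation), and that filtering step can also be done in O(nd) time, for a total of O(nd).

You might be interested in a refresher on Chebyshev's inequality.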
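One way to sketch this filtering step, under assumptions the answer does not spell out: the "standard deviation" is taken to be the root-mean-square distance of the points from their mean, and the threshold multiplier k (default 3) is an arbitrary illustrative choice. The name filter_outliers is hypothetical:

```python
import numpy as np

def filter_outliers(points, k=3.0):
    """Split points into (kept, outliers) by distance from the sample mean."""
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)                           # sample mean, O(nd)
    dist = np.sqrt(((pts - center) ** 2).sum(axis=1))   # distance of each point to the mean, O(nd)
    sigma = np.sqrt((dist ** 2).mean())                 # RMS distance from the mean (assumed spread measure)
    keep = dist <= k * sigma                            # O(n) comparison
    return pts[keep], pts[~keep]
```

Every step is a single pass over the n x d array, so no pairwise distances are computed at all.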

score 4 · Accepted Answer

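Is it worth optimizing before you have a working solution? Also, computing a distance matrix over the entire data set rarely needs to be fast, because you only do it once; when you need the distance between two points you just look it up, since it has already been calculated.

So if you don't have a place to start, here's one. If you want to do this in Numpy without writing any inline Fortran or C, that should be no problem, though you may want to include the small vector-based virtual machine called "numexpr" (available on PyPI and trivial to install), which in this case gave a 5x performance boost over Numpy alone.

Below I've calculated a distance matrix for 10,000 points in 2D space (a 10k x 10k matrix giving the distances between all 10k points). This took 59 seconds on my MBP.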

import numpy as NP
import numexpr as NE

# data are points in 2D space (x, y)--obviously, this code can accept data of any dimension
x = NP.random.randint(0, 10, 10000)
y = NP.random.randint(0, 10, 10000)
fnx = lambda q : q - NP.reshape(q, (len(q), 1))
delX = fnx(x)
delY = fnx(y)
dist_mat = NE.evaluate("(delX**2 + delY**2)**0.5")
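To get from a full distance matrix back to the average asked for in the question, the zero diagonal and the double counting of each pair have to be accounted for. A minimal sketch in plain NumPy (no numexpr), for points of any dimension; the name mean_distance_from_matrix is hypothetical, and the intermediate n x n x d array makes it suitable only for modest n:

```python
import numpy as np

def mean_distance_from_matrix(points):
    """Average pairwise distance computed from the full n x n distance matrix."""
    pts = np.asarray(points, dtype=float)
    n = pts.shape[0]
    # diff[i, j] = pts[i] - pts[j], shape (n, n, d), built by broadcasting
    diff = pts[:, None, :] - pts[None, :, :]
    dist_mat = np.sqrt((diff ** 2).sum(axis=-1))
    # The diagonal is zero and every pair appears twice (i, j) and (j, i),
    # so dividing the total by n*(n-1) gives the mean over distinct pairs.
    return dist_mat.sum() / (n * (n - 1))
```

The same division by n*(n-1) would apply to a dist_mat produced any other way, as long as it is symmetric with a zero diagonal.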

score 1 · Accepted Answer

If you want a fast and inexact solution, you could probably adapt the Fast Multipole Method algorithm.

Points that are separated by a small distance have a smaller contribution to the final average distance, so it would make sense to group points into clusters and compare the distances between clusters.
