python - Interpolate unstructured X, Y, Z data on the best grid based on the nearest-neighbour distance of each point

Question

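This question was edited after the answers to show the final solution I used.

I have unstructured 2D datasets coming from different sources. Each dataset consists of three numpy.ndarray (X and Y coordinates, and Z values).

My final goal is to interpolate these data on a grid to convert them to an image/matrix. So I need to find the "best grid" for interpolating the data, and for that I need to find the best X and Y step between the pixels of that grid.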

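Determine the step from the Euclidean distance between points:

Use the mean of the Euclidean distances between each point and its nearest neighbour.

Use KDTree/cKDTree from scipy.spatial to build a tree of the X, Y data.
Use the query method with k=2 to get the distances (with k=1, the distances are all zero because the query for each point finds the point itself).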



    import numpy as np
    import scipy.spatial

    # Generate KD Tree
    xy = np.c_[x, y]  # X,Y data converted for use with KDTree
    tree = scipy.spatial.cKDTree(xy)  # Create KDtree for X,Y coordinates.

    # Calculate step
    distances, points = tree.query(xy, k=2)  # Query distances for X,Y points
    distances = distances[:, 1:]  # Remove k=1 zero distances
    step = np.mean(distances)  # Result
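
To see why k=2 is needed, here is a minimal sketch on a hypothetical regular lattice (the lattice and its 0.5 spacing are assumptions made for illustration, not part of the original data). With k=1 every point finds itself; with k=2 the second column holds the distance to the actual nearest neighbour, so on a lattice the mean should come out as the lattice spacing.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical sample data: a regular 5 x 5 lattice with spacing 0.5
gx, gy = np.meshgrid(np.arange(5) * 0.5, np.arange(5) * 0.5)
xy = np.c_[gx.ravel(), gy.ravel()]
tree = cKDTree(xy)

# k=1: each point finds itself, so every distance is zero
d1, _ = tree.query(xy, k=1)

# k=2: column 0 is the point itself, column 1 is its nearest neighbour
d2, _ = tree.query(xy, k=2)
step = d2[:, 1].mean()

print("largest k=1 distance:", d1.max())
print("mean nearest-neighbour distance:", step)
```

Note that duplicated points in real data would also produce zero distances in the second column and pull the mean down.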

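Performance tweaks:

Use scipy.spatial.cKDTree instead of scipy.spatial.KDTree because it is much faster.
Use balanced_tree=False with scipy.spatial.cKDTree: a big speed-up in my case, but it may not hold for all data.
Use n_jobs=-1 with cKDTree.query to use multithreading (the argument is named workers in SciPy 1.6 and later).
Use p=1 with cKDTree.query to use the Manhattan distance instead of the Euclidean distance (p=2): faster but may be less accurate.
Query the distances for only a random subsample of the points: a big speed-up with large datasets, but may be less accurate and less repeatable.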
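
A rough sketch of two of these tweaks, balanced_tree=False and a random subsample, on hypothetical uniform random data. The sample sizes and the seed are assumptions chosen for illustration. Seeding the generator addresses the repeatability concern. Multithreading is left out here because the keyword name depends on the SciPy version.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)   # fixed seed so the subsample is repeatable
xy = rng.random((20000, 2))      # hypothetical X, Y coordinates

# Skip the balancing step when building the tree
tree = cKDTree(xy, balanced_tree=False)

# Reference value: nearest-neighbour distance over all points
d_full, _ = tree.query(xy, k=2)
step_full = d_full[:, 1].mean()

# Cheaper estimate: query only 1000 randomly chosen points
subset = xy[rng.choice(len(xy), size=1000, replace=False)]
d_sub, _ = tree.query(subset, k=2)
step_sub = d_sub[:, 1].mean()

print("full:", step_full, "subsample:", step_sub)
```

The subsample is queried against the tree built from all points, so each sampled point still sees its true nearest neighbour; only the averaging is restricted.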

Interpolate the points on the grid:

Interpolate the dataset points on the grid using the calculated step.



    import scipy.interpolate

    # Generate grid
    def interval(axe):
        '''Return numpy.linspace Interval for specified axe'''
        cent = axe.min() + np.ptp(axe) / 2  # Interval center
        nbs = int(np.ceil(np.ptp(axe) / step))  # Number of steps in interval (int for linspace)
        hwid = nbs * step / 2  # Half interval width
        return np.linspace(cent - hwid, cent + hwid, nbs + 1)  # nbs steps need nbs + 1 nodes

    xg, yg = np.meshgrid(interval(x), interval(y))  # Generate grid

    # Interpolate X,Y,Z data on grid
    zg = scipy.interpolate.griddata((x, y), z, (xg, yg))
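
The gridding step can be tried in isolation with the following sketch. The synthetic data, the step value of 0.25 and the helper name axis_nodes are all assumptions made for illustration. The helper casts the node count to int, which np.linspace requires, and uses one more node than the number of steps so that the spacing is exactly the step.

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 500)   # hypothetical scattered coordinates
y = rng.uniform(0.0, 5.0, 500)
z = np.sin(x) * np.cos(y)         # hypothetical values
step = 0.25                       # assumed step, e.g. the nearest-neighbour mean

def axis_nodes(values, step):
    """Evenly spaced nodes covering `values`, centred on its range."""
    span = np.ptp(values)
    n_steps = int(np.ceil(span / step))   # linspace needs an integer count
    centre = values.min() + span / 2
    half = n_steps * step / 2
    return np.linspace(centre - half, centre + half, n_steps + 1)

xg, yg = np.meshgrid(axis_nodes(x, step), axis_nodes(y, step))

# Linear interpolation by default; pixels outside the convex hull become NaN
zg = griddata((x, y), z, (xg, yg))

print(zg.shape)
print(int(np.isnan(zg).sum()), "pixels outside the convex hull")
```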

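Set NaN for pixels that are too far from the initial points:

Set NaN for the grid pixels that are too far (distance > step) from the points of the initial X, Y, Z data. The previously generated KDTree is reused.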



    # Calculate pixel to X,Y,Z data distances
    dist, _ = tree.query(np.c_[xg.ravel(), yg.ravel()])
    dist = dist.reshape(xg.shape)

    # Set NaN value for too far pixels
    zg[dist > step] = np.nan
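
The mask matters most when the data has holes: linear interpolation fills the whole convex hull, including empty regions between clusters. A small sketch with two hypothetical clusters separated by a gap (the data, the step of 0.1 and the constant stand-in for the interpolated values are assumptions for illustration):

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical data: two clusters in x, with nothing between x=1 and x=3
rng = np.random.default_rng(0)
x = np.r_[rng.uniform(0, 1, 200), rng.uniform(3, 4, 200)]
y = rng.uniform(0, 1, 400)
tree = cKDTree(np.c_[x, y])
step = 0.1  # assumed step

xg, yg = np.meshgrid(np.linspace(0, 4, 41), np.linspace(0, 1, 11))
zg = np.ones(xg.shape)  # stand-in for interpolated values

# Distance from every pixel to its nearest data point (k=1 by default)
dist, _ = tree.query(np.c_[xg.ravel(), yg.ravel()])
too_far = dist.reshape(xg.shape) > step
zg[too_far] = np.nan

print(f"{too_far.mean():.0%} of pixels masked")
```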

score 2 · Accepted Answer

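The problem you are trying to solve is called the "all-nearest-neighbours problem". See this article, for example: http://link.springer.com/article/10.1007/BF02187718

I believe the solution is O(N log N), so the same order as KDTree.query, but in practice much faster than a bunch of separate queries. Sorry, I don't know of a Python implementation of this.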
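
For reference, the all-nearest-neighbours problem asks for the nearest neighbour of every point in the set. The sketch below is not the algorithm from the linked article; it only states the problem in code by comparing an O(N^2) brute-force solution with a single batched KD-tree query on hypothetical random points.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
xy = rng.random((500, 2))  # hypothetical points

# Brute force, O(N^2) time and memory: full pairwise distance matrix
diff = xy[:, None, :] - xy[None, :, :]
pairwise = np.sqrt((diff ** 2).sum(axis=-1))
np.fill_diagonal(pairwise, np.inf)   # a point is not its own neighbour
nn_brute = pairwise.min(axis=1)

# KD-tree: one batched query for all points at once
tree = cKDTree(xy)
dists, _ = tree.query(xy, k=2)
nn_tree = dists[:, 1]

print("results agree:", np.allclose(nn_brute, nn_tree))
```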

score 1 · Accepted Answer

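I suggest you go with KDTree.query.

You are looking for a characteristic distance to scale your binning: I suggest you take only a random subset of the points and use the Manhattan distance, because KDTree.query is very slow (even though its complexity is n*log(n)).

Here is my code: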

    import numpy
    import scipy.spatial

    # points: (N, 2) sequence of X, Y coordinates
    # Create tree
    tree = scipy.spatial.KDTree(numpy.array(points))  # better give it a copy?
    # Create random subsample of points
    n_repr = 1000
    shuffled_points = numpy.array(points)
    numpy.random.shuffle(shuffled_points)
    shuffled_points = shuffled_points[:n_repr]
    # Query the tree (k=2: the first hit is the query point itself)
    dists, neighbours = tree.query(shuffled_points, k=2, p=1)
    # Get _estimate_ of average distance, skipping the zero self-distances:
    avg_dists = numpy.average(dists[:, 1])
    print('average Manhattan distance to nearest neighbour is:', avg_dists)

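I suggest the Manhattan distance (https://en.wikipedia.org/wiki/Taxicab_geometry) because it is faster to compute than the Euclidean distance. And since you only need an estimator of the average distance, it should be sufficient.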
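
How far off the Manhattan estimate can be is bounded: in 2D the Manhattan distance between two points is at least the Euclidean distance and at most sqrt(2) times it, and the same bounds carry over to nearest-neighbour distances. A quick check on hypothetical uniform random points (the data and seed are assumptions for illustration):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
xy = rng.random((5000, 2))  # hypothetical points
tree = cKDTree(xy)

# Nearest-neighbour distance of every point under each metric
d_euclid = tree.query(xy, k=2, p=2)[0][:, 1]
d_manhattan = tree.query(xy, k=2, p=1)[0][:, 1]

# Ratio of the two step estimates; lies between 1 and sqrt(2) in 2D
ratio = d_manhattan.mean() / d_euclid.mean()
print("Manhattan / Euclidean mean step:", ratio)
```

So a step derived with p=1 is never smaller than the Euclidean one, and overestimates it by at most about 41%.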
