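I want to generate a 500×500 distance matrix from the latitude and longitude of 500 locations using the Haversine formula.

Here is sample data, "coordinate.csv", for 10 locations: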

Name,Latitude,Longitude
depot1,35.492807,139.6681689
depot2,33.6625572,130.4096027
depot3,35.6159881,139.7805445
customer1,35.622632,139.732631
customer2,35.857287,139.821461
customer3,35.955313,139.615387
customer4,35.16073,136.926239
customer5,36.118163,139.509548
customer6,35.937351,139.909783
customer7,35.949508,139.676462

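Once I have the distance matrix, I want to use it to find the closest depot for each customer, and then save the output (the distance from each customer to its closest depot, and the name of that depot) to a Pandas DataFrame.

Expected output:

// Distance matrix
[ [..],[..],[..],[..],[..],[..],[..],[..],[..],[..] ]

// Closest depot to each customer (just an example)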
Name,Latitude,Longitude,Distance_to_closest_depot,Closest_depot
depot1,35.492807,139.6681689,,
depot2,33.6625572,130.4096027,,
depot3,35.6159881,139.7805445,,
customer1,35.622632,139.732631,10,depot1
customer2,35.857287,139.821461,20,depot3
customer3,35.955313,139.615387,15,depot2
customer4,35.16073,136.926239,12,depot3
customer5,36.118163,139.509548,25,depot1
customer6,35.937351,139.909783,22,depot2
customer7,35.949508,139.676462,15,depot1

2 Answers

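A couple of library functions can help with this:

  • cdist from scipy can generate a distance matrix using whichever distance metric you like.
  • There is also a haversine function (from the haversine package) that you can pass to cdist.

After that, it is just a matter of finding the row-wise minimums of the distance matrix and adding them to your DataFrame. Full code below: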

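import pandas as pd
from scipy.spatial.distance import cdist
from haversine import haversine


# Copy the sample data first, or use pd.read_csv('coordinate.csv') instead
df = pd.read_clipboard(sep=',')
df.set_index('Name', inplace=True)
# .copy() so the new columns below are added to an independent frame
customers = df[df.index.str.startswith('customer')].copy()
depots = df[df.index.str.startswith('depot')]

# Rows are (Latitude, Longitude), the order haversine expects; distances are in km
dm = cdist(customers, depots, metric=haversine)
closest = dm.argmin(axis=1)    # column position of the nearest depot per customer
distances = dm.min(axis=1)     # distance to that depot

customers['Closest Depot'] = depots.index[closest]
customers['Distance'] = distances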

Result:

            Latitude   Longitude Closest Depot    Distance
Name                                                      
customer1  35.622632  139.732631        depot3    4.393506
customer2  35.857287  139.821461        depot3   27.084212
customer3  35.955313  139.615387        depot3   40.565820
customer4  35.160730  136.926239        depot1  251.466152
customer5  36.118163  139.509548        depot3   60.945377
customer6  35.937351  139.909783        depot3   37.587862
customer7  35.949508  139.676462        depot3   38.255776
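
To make explicit what the haversine metric and the row-wise minimum are doing, here is a minimal standard-library sketch. The helper names (`haversine_km`, `closest_depots`) are hypothetical, and the Earth radius constant is an assumption; it is not the code used to produce the table above, only an illustration of the same idea on two of the sample customers.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius; other values shift results slightly


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Sample points taken from coordinate.csv, as (Latitude, Longitude)
depots = {
    "depot1": (35.492807, 139.6681689),
    "depot2": (33.6625572, 130.4096027),
    "depot3": (35.6159881, 139.7805445),
}
customers = {
    "customer1": (35.622632, 139.732631),
    "customer4": (35.16073, 136.926239),
}


def closest_depots(customers, depots):
    """Map each customer name to (closest depot name, distance in km)."""
    result = {}
    for cname, cpos in customers.items():
        # One row of the customers x depots distance matrix
        row = {dname: haversine_km(cpos, dpos) for dname, dpos in depots.items()}
        nearest = min(row, key=row.get)
        result[cname] = (nearest, row[nearest])
    return result


for name, (depot, dist) in closest_depots(customers, depots).items():
    print(f"{name}: {depot} at {dist:.2f} km")
```

This should pick depot3 for customer1 and depot1 for customer4, in line with the table above. For 500 locations the pure-Python loops are still workable, but cdist keeps the bookkeeping shorter.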

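Based on the comments, I have created an alternative solution that uses a square distance matrix. I think the original solution is better: the question only asks for the closest depot to each customer, so there is no need to compute customer-to-customer or depot-to-depot distances. However, if you need the square distance matrix for some other purpose, here is how to create it: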

import pandas as pd
import numpy as np
from scipy.spatial.distance import squareform, pdist
from haversine import haversine


df = pd.read_clipboard(sep=',')
df.set_index('Name', inplace=True)

dm = pd.DataFrame(squareform(pdist(df, metric=haversine)), index=df.index, columns=df.index)
np.fill_diagonal(dm.values, np.inf)  # Makes it easier to find minimums

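# .copy() so the new columns are added to an independent frame
customers = df[df.index.str.startswith('customer')].copy()
depots = df[df.index.str.startswith('depot')]
# Depot rows x customer columns: idxmin()/min() work column-wise, i.e. per customer
nearest = dm.loc[depots.index, customers.index]
customers['Closest Depot'] = nearest.idxmin()
customers['Distance'] = nearest.min()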

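The final result is the same as before, except that you now have a square distance matrix. If you like, you can put the 0s back on the diagonal after extracting the minimums: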

np.fill_diagonal(dm.values, 0)
Answered 2019-10-09T15:39:12.020

If you need a very large matrix and have access to an NVIDIA GPU with CUDA, you can use this numba function:

from numba import cuda
import math

@cuda.jit
def haversine_gpu_distance_matrix(p, G):
  # p: (n, 2) array of [lon, lat] in degrees; G: (n, n) output matrix
  i,j = cuda.grid(2)
  # Guard against threads that fall outside the matrix
  if i < p.shape[0] == G.shape[0] and j < p.shape[0] == G.shape[1]:
    if i == j:
      G[i, j] = 0
    else:
      longit_a = math.radians(p[i, 0])
      latit_a = math.radians(p[i, 1])
      longit_b = math.radians(p[j, 0])
      latit_b = math.radians(p[j, 1])
      dist_longit_sub = longit_b - longit_a
      dist_latit_sub = latit_b - latit_a
      dist_latit_add = latit_b + latit_a
      pre_comp = math.sin(dist_latit_sub/2)**2
      # cos(lat_a)*cos(lat_b) is rewritten as 1 - sin²(Δlat/2) - sin²((lat_a+lat_b)/2)
      area = pre_comp + ((1 - pre_comp - math.sin(dist_latit_add/2)**2) * math.sin(dist_longit_sub/2)**2)
      central_angle = 2 * math.asin(math.sqrt(area))
      radius = 3958  # Earth radius in miles
      G[i, j] = math.fabs(central_angle * radius)

You can call this function as follows:

import numpy as np

# 10k [lon, lat] elements, replace this with your [lon, lat] array
# if your data is in a Pandas DataFrame, please convert it to a numpy array
geo_array = np.ones((10000, 2))
# allocate an empty distance matrix to fill when the function is called
dm_global_mem = cuda.device_array((geo_array.shape[0], geo_array.shape[0]))
# move the data in geo_array onto the GPU
geo_array_global_mem = cuda.to_device(geo_array)

# specify kernel dimensions, this can/should be further optimized for your hardware
# both grid dimensions must cover the number of points, since the output is n x n
threadsperblock = (16, 16)
blockspergrid_x = math.ceil(geo_array.shape[0] / threadsperblock[0])
blockspergrid_y = math.ceil(geo_array.shape[0] / threadsperblock[1])
blockspergrid = (blockspergrid_x, blockspergrid_y)

# run the function, which will transform dm_global_mem inplace
haversine_gpu_distance_matrix[blockspergrid, threadsperblock](geo_array_global_mem, dm_global_mem)
# copy the finished matrix back to host memory as a numpy array
dm = dm_global_mem.copy_to_host()
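
Because thread (i, j) writes cell (i, j) of the output, the launch has to span the number of points along both axes. The following standard-library sketch checks that arithmetic on the CPU; `grid_for_square_matrix` is a hypothetical helper for illustration, not part of numba.

```python
import math


def grid_for_square_matrix(n, threads_per_block=(16, 16)):
    """Blocks per grid so that a 2D launch covers every cell of an n x n matrix."""
    blocks_x = math.ceil(n / threads_per_block[0])
    blocks_y = math.ceil(n / threads_per_block[1])
    return blocks_x, blocks_y


n = 10000  # number of points, i.e. rows of the [lon, lat] array
blocks_x, blocks_y = grid_for_square_matrix(n)

# Threads launched along each axis; each must be >= n,
# otherwise part of the matrix is never written
covered = (blocks_x * 16, blocks_y * 16)
print(blocks_x, blocks_y, covered)
```

When n is not a multiple of the block size, the last block along each axis overshoots, which is why the kernel needs its bounds check.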

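Note that this can be further optimized for your hardware. On a g4dn.xlarge instance, the runtime for 10k geographic coordinate pairs (i.e. 100M distance measurements) was under 0.01 seconds after compilation. The radius value is set so that the distance matrix is in miles; you can change it to 6371 for kilometers.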
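
The kernel's `area` expression avoids calling cos by using the identity cos(a)·cos(b) = 1 − sin²((a−b)/2) − sin²((a+b)/2). As a sanity check, this CPU sketch compares that form against the textbook haversine term and shows how the radius choice sets the unit. The function names are hypothetical, and the sample points are depot3 and customer1 from the question, given in [lon, lat] order as the kernel expects.

```python
import math


def haversine_term_standard(lat_a, lat_b, dlon):
    """Textbook haversine term; all arguments in radians."""
    return (math.sin((lat_b - lat_a) / 2) ** 2
            + math.cos(lat_a) * math.cos(lat_b) * math.sin(dlon / 2) ** 2)


def haversine_term_kernel(lat_a, lat_b, dlon):
    """Same term written the way the GPU kernel writes it (no cos calls)."""
    pre_comp = math.sin((lat_b - lat_a) / 2) ** 2
    return pre_comp + (1 - pre_comp - math.sin((lat_b + lat_a) / 2) ** 2) * math.sin(dlon / 2) ** 2


def distance(point_a, point_b, radius):
    """Great-circle distance between two [lon, lat] points in degrees.

    The result is in whatever unit `radius` is expressed in.
    """
    lon_a, lat_a = math.radians(point_a[0]), math.radians(point_a[1])
    lon_b, lat_b = math.radians(point_b[0]), math.radians(point_b[1])
    term = haversine_term_kernel(lat_a, lat_b, lon_b - lon_a)
    return 2 * radius * math.asin(math.sqrt(term))


depot3 = [139.7805445, 35.6159881]    # [lon, lat]
customer1 = [139.732631, 35.622632]   # [lon, lat]

miles = distance(depot3, customer1, radius=3958)  # radius in miles, as in the kernel
km = distance(depot3, customer1, radius=6371)     # radius in kilometers
print(f"{miles:.2f} mi, {km:.2f} km")
```

The two term functions should agree to floating-point precision, so swapping one for the other only affects speed, not the distances.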

Answered 2021-12-17T07:25:59.540