pandas - Comparing geohashes between two pandas dataframes

Question

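I have 2 dataframes, df1 and df2, with different latitudes and longitudes and their corresponding geohashes. For each geohash in df1, I want to find the closest geohash in dataframe df2. I am not sure whether there is a way to compare geohashes. For example, for id 121 in df1 the closest geohash in df2 would be 9muc3rr, and for id 122 in df1 the closest geohash would be 9wv97m1.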

Dataframe df1

Id   Latitude   Longitude    Geohash
121  32.815130  -117.151695  9mudwju
122  37.920948  -108.005043  9wepwr3

Dataframe df2

Id   Latitude   Longitude    Geohash
124  32.604187  -117.005745  9muc3rr
127  37.920948  -108.005043  9wv97m1
135  39.70122   -104.876976  9xj3v7q
128  38.844032  -104.718307  9wvscp6

score 0 · Accepted Answer

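If you are OK with reinventing the wheel a little, you can convert the (lat, lon) pairs to Cartesian unit vectors and compare them using the dot product. Since the dot product is basically a measure of the projection of one vector onto another, the product closest to 1 (the maximum) marks the best match between two vectors.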
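To spell out why the largest dot product picks the nearest point, here is a short worked identity. The symbols $\hat{u}$, $\hat{v}$ (the two unit vectors) and $\theta$ (the angle between them) are names chosen here for illustration and do not come from the question:

```latex
\hat{u}\cdot\hat{v} = \cos\theta,
\qquad
\lVert \hat{u}-\hat{v} \rVert^{2}
  = \lVert \hat{u} \rVert^{2} + \lVert \hat{v} \rVert^{2} - 2\,\hat{u}\cdot\hat{v}
  = 2 - 2\,\hat{u}\cdot\hat{v}
```

So among unit vectors, maximizing the dot product is the same as minimizing the angle and the straight-line (chord) distance between them. This ranks candidates by direction from the Earth's center, which should be a reasonable stand-in for surface distance, though it is not an exact geodesic distance on the ellipsoid.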

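The example below is based on the calculation in this answer. I am going to assume that you are providing geodetic coordinates on the WGS84 ellipsoid (since that is what GPS uses), and that the height above the ellipsoid is zero for all points: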

from math import radians, sin, cos
import numpy as np

# WGS 84 parameters. Any other ellipsoid can be plugged in by changing
# the following lines. All parameters are taken from Wikipedia at
# https://en.wikipedia.org/wiki/Geodetic_datum#Parameters_for_some_geodetic_systems
invFlat = 298.257223563  # Inverse flattening (1/f), unitless
# Derived parameters
f = 1 / invFlat  # Flattening, unitless
e2 = 2 * f - f ** 2  # First eccentricity squared, unitless (about 6.69437999014e-3)

# Note that the radius is irrelevant since we are going to
# normalize the result anyway.

def cartesianUnitVector(lat, lon, isdeg=True):
    if isdeg:
        lat, lon = radians(lat), radians(lon)
    vec = np.array([
        cos(lat) * cos(lon),
        cos(lat) * sin(lon),
        (1 - e2) * sin(lat)
    ])
    norm = np.linalg.norm(vec)
    return vec / norm

target = (32.815130, -117.151695)
candidates = [
    (32.604187,  -117.005745),
    (37.920948,  -108.005043),
    (39.70122,   -104.876976),
    (38.844032,  -104.718307)
]

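# Convert the target once, then pick the candidate with the largest dot product.
targetVec = cartesianUnitVector(*target)
nearest = max(candidates, key=lambda x: np.dot(cartesianUnitVector(*x), targetVec))
print(nearest)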

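The geodetic-to-ECEF formula can be found on Wikipedia. The example shows how to operate on an iterable of lat-lon pairs. I am not entirely sure how to apply this to pandas, but your question was about how to do the comparison, and I think I have provided an answer for that. I am sure that once you have defined the conversion function and the comparison key that uses it, you will have no trouble applying it to pandas.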
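One possible way to carry the same comparison over to the two dataframes from the question is sketched below. It is an illustration under stated assumptions, not the only way to do it: the helper name `unit_vectors` and the output columns `NearestId` and `NearestGeohash` are made up here, the dataframes are rebuilt by hand from the tables in the question, and every point is assumed to sit at height zero on the WGS 84 ellipsoid.

```python
import numpy as np
import pandas as pd

# WGS 84 first eccentricity squared, derived from the inverse flattening.
INV_FLAT = 298.257223563
F = 1.0 / INV_FLAT
E2 = 2 * F - F ** 2


def unit_vectors(lat_deg, lon_deg):
    """Hypothetical helper: (n, 3) array of normalized ECEF directions at height 0."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    vecs = np.column_stack([
        np.cos(lat) * np.cos(lon),
        np.cos(lat) * np.sin(lon),
        (1 - E2) * np.sin(lat),
    ])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)


# Dataframes rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "Id": [121, 122],
    "Latitude": [32.815130, 37.920948],
    "Longitude": [-117.151695, -108.005043],
    "Geohash": ["9mudwju", "9wepwr3"],
})
df2 = pd.DataFrame({
    "Id": [124, 127, 135, 128],
    "Latitude": [32.604187, 37.920948, 39.70122, 38.844032],
    "Longitude": [-117.005745, -108.005043, -104.876976, -104.718307],
    "Geohash": ["9muc3rr", "9wv97m1", "9xj3v7q", "9wvscp6"],
})

u1 = unit_vectors(df1["Latitude"], df1["Longitude"])
u2 = unit_vectors(df2["Latitude"], df2["Longitude"])

# Dot product of every df1 row with every df2 row: shape (len(df1), len(df2)).
dots = u1 @ u2.T
# Position in df2 of the largest dot product for each df1 row.
best = dots.argmax(axis=1)

result = df1.assign(
    NearestId=df2["Id"].to_numpy()[best],
    NearestGeohash=df2["Geohash"].to_numpy()[best],
)
print(result)
```

With the sample data this pairs id 121 with 9muc3rr and id 122 with 9wv97m1, matching what the question expects. The full matrix of dot products needs len(df1) × len(df2) entries, so for large frames it may be worth processing df1 in chunks.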
