python - Haversine distance in sklearn.gaussian_process.kernels

Question

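Is there a built-in way to pass a custom distance function to the kernels used by Gaussian process models? In particular, I have geographic data in latitude/longitude coordinates, so Euclidean distance does not give accurate distances between points. This does not seem like an uncommon use case for GPR, so I was wondering whether there is a standard(-ish) way to implement it in scikit-learn.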

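Right now I have written a new Kernel subclass in which I replaced the metric argument of the pdist(metric='sqeuclidean') and cdist(metric='sqeuclidean') calls from the RBF kernel source with a parameter that can be set when the kernel is instantiated, but this seems hack-y and I wonder whether there is a better way to do it. Ultimately, it seems you should be able to pass an arbitrary distance function to all of these kernels, but I can't figure out how. The class I wrote (almost identical to the standard kernels.RBF class) is below. Does anyone see a better way to do this? Or a reason why what I'm doing is a bad idea?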

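import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform
from sklearn.gaussian_process.kernels import RBF, _check_length_scale


class customDistRBF(RBF):
    """Same as sklearn.gaussian_process.kernels.RBF except that
    it allows for a custom distance function in the RBF kernel.
    """

    def __init__(self, length_scale=1.0, length_scale_bounds=(1e-5, 1e5), dist_func='sqeuclidean'):
        # Forward both hyperparameters to the parent RBF constructor
        RBF.__init__(self, length_scale=length_scale, length_scale_bounds=length_scale_bounds)
        self.dist_func = dist_func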

    def __call__(self, X, Y=None, eval_gradient=False):

        X = np.atleast_2d(X)
        length_scale = _check_length_scale(X, self.length_scale)
        if Y is None:
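            # the below line is changed; note that the metric output is divided
            # by length_scale afterwards, whereas RBF scales X before measuring
            dists = pdist(X, metric=self.dist_func) / length_scale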
            K = np.exp(-.5 * dists)
            # convert from upper-triangular matrix to square matrix
            K = squareform(K)
            np.fill_diagonal(K, 1)
        else:
            if eval_gradient:
                raise ValueError(
                    "Gradient can only be evaluated when Y is None.")
            # the below line is changed
            dists = cdist(X, Y,
                          metric=self.dist_func) / length_scale
            K = np.exp(-.5 * dists)

        if eval_gradient:
            if self.hyperparameter_length_scale.fixed:
                # Hyperparameter l kept fixed
                return K, np.empty((X.shape[0], X.shape[0], 0))
            elif not self.anisotropic or length_scale.shape[0] == 1:
                K_gradient = \
                    (K * squareform(dists))[:, :, np.newaxis]
                return K, K_gradient
            elif self.anisotropic:
                # We need to recompute the pairwise dimension-wise distances
                K_gradient = (X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2 \
                    / (length_scale ** 2)
                K_gradient *= K[..., np.newaxis]
                return K, K_gradient
        else:
            return K
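
For reference, a distance callable of the kind the dist_func parameter above is meant to accept could look like the sketch below. The name haversine_km, the (lat, lon) argument order, and the 6371 km radius are assumptions made for illustration; they are not part of scikit-learn or SciPy.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, in kilometres


def haversine_km(p, q, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = math.radians(p[0]), math.radians(p[1])
    lat2, lon2 = math.radians(q[0]), math.radians(q[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula: a is the squared half-chord length on the unit sphere
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))
```

SciPy's pdist and cdist accept a callable taking two 1-D arrays as the metric argument, so a function of this shape could be passed as dist_func. Keep in mind that a kernel built on a non-Euclidean distance is not guaranteed to be positive semidefinite.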

score 0 · Accepted Answer

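I suggest converting your latitude/longitude coordinates to Cartesian space; you should then be able to use any of the sklearn kernels that rely on Euclidean distance calculations.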

import numpy as np


def lon_lat_to_cartesian(lon, lat, R=6371):
    """
    Returns the Cartesian coordinates (x, y, z) of a point on a sphere of
    radius R, given longitude and latitude in degrees.
    R = 6371 km for Earth.
    """
    lon_r = np.radians(lon)
    lat_r = np.radians(lat)
    x = R * np.cos(lat_r) * np.cos(lon_r)
    y = R * np.cos(lat_r) * np.sin(lon_r)
    z = R * np.sin(lat_r)
    return x, y, z
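
One thing worth knowing about this approach: the Euclidean distance between two converted points is the straight-line chord through the sphere, not the great-circle arc along its surface. The sketch below restates the conversion with the standard library only and shows how the two relate. The helper names to_cartesian, chord_km, and arc_from_chord are made up for this illustration.

```python
import math

R_KM = 6371.0  # same Earth radius as in the function above


def to_cartesian(lon, lat, R=R_KM):
    """Stdlib restatement of the lon/lat (degrees) to (x, y, z) conversion."""
    lon_r, lat_r = math.radians(lon), math.radians(lat)
    return (R * math.cos(lat_r) * math.cos(lon_r),
            R * math.cos(lat_r) * math.sin(lon_r),
            R * math.sin(lat_r))


def chord_km(a, b):
    """Euclidean (chord) distance between two (lon, lat) points in degrees."""
    return math.dist(to_cartesian(*a), to_cartesian(*b))


def arc_from_chord(chord, R=R_KM):
    """Great-circle arc length corresponding to a chord of the given length."""
    # chord = 2 R sin(arc / (2 R)), solved for arc; clamp guards rounding error
    return 2 * R * math.asin(min(1.0, chord / (2 * R)))
```

For points a few hundred kilometres apart the chord and the arc are nearly identical, so a Euclidean kernel on the converted coordinates should behave much like one based on great-circle distance; the gap grows as points get farther apart. To use this with a model, the three coordinate arrays could be stacked into an (n, 3) input matrix, for example with np.column_stack, and passed to GaussianProcessRegressor with a standard RBF kernel.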
