python - Bug in nearest neighbors when using Mahalanobis

Question

When algorithm='brute', the NearestNeighbors class from sklearn.neighbors does not handle the V matrix correctly. The V matrix appears to be ignored.

import numpy as np
from sklearn.neighbors import NearestNeighbors, DistanceMetric

Now we create some data and demonstrate the problem.

X = np.random.randn(100, 5)  # "Real dataset"
another_X = np.random.randn(100, 5)  # Another to compute a false covariance matrix
 
# Create many different Nearest Neighbor objects
neighbors_dict = dict()

#----- Algo=auto works
neighbors_dict['algo=auto, correct V'] = NearestNeighbors(n_neighbors=2, algorithm='auto', metric='mahalanobis',
                                                    metric_params = {'V': np.cov(X, rowvar=False)})

# Using the wrong covariance to show a difference
neighbors_dict['algo=auto, wrong V'] = NearestNeighbors(n_neighbors=2, algorithm='auto', metric='mahalanobis',
                                                    metric_params = {'V': np.cov(another_X, rowvar=False)})

# When algo=auto, we must specify V
try:
    foo = NearestNeighbors(n_neighbors=2, algorithm='auto', metric='mahalanobis')
    foo.fit(X)
except Exception:
    print("(Need to pass the V parameter when using 'auto' in the constructor)\n")


# ----- Algo=brute is broken
# Not asked to specify V
neighbors_dict['algo=brute, no V'] = NearestNeighbors(n_neighbors=2, algorithm='brute', metric='mahalanobis')

# The results are the same regardless of whether you pass the correct or incorrect covariance matrix
neighbors_dict['algo=brute, correct V'] = NearestNeighbors(n_neighbors=2, algorithm='brute', metric='mahalanobis',
                                                       metric_params = {'V': np.cov(X, rowvar=False)})
neighbors_dict['algo=brute, wrong V'] = NearestNeighbors(n_neighbors=2, algorithm='brute', metric='mahalanobis',
                                                       metric_params = {'V': np.cov(another_X, rowvar=False)})
    

    
print("Results for various choices of algo and V")
for kk, model in neighbors_dict.items():
    model.fit(X)
    result = model.kneighbors(X[0:1,:])
    print(kk,  "\t", result)

    
print("\n\nNote, the covariance matrices *ARE* different, even when algo=brute, it's just ignored")
print(neighbors_dict['algo=brute, wrong V'].effective_metric_params_)
print(neighbors_dict['algo=brute, correct V'].effective_metric_params_)


print("\n\nUsing  DistanceMetric, we can confirm that algo=auto is  getting the right answer\n")
dist, idx = neighbors_dict['algo=auto, correct V'].kneighbors(X[0:1,:])
metric = DistanceMetric.get_metric('mahalanobis', V=np.cov(X, rowvar=False))
metric_result = metric.pairwise(X[idx].squeeze())

print(f"Distance from NearestNeighbors with auto=algo:\n{dist}")
print(f"Distance from DistanceMetric:\n{metric_result[0]}")

If V differs, the results should change. They do with algorithm='auto', but not with algorithm='brute'.
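For reference, the Mahalanobis distance between x and y under a covariance matrix V is sqrt((x - y)^T V^-1 (x - y)), so two different V matrices should in general give two different distances. Here is a minimal NumPy-only sketch of that expectation. The helper name `mahalanobis` and the fixed seed are illustrative assumptions, not part of the code above.

```python
import numpy as np

def mahalanobis(x, y, V):
    """Mahalanobis distance between vectors x and y under covariance V."""
    diff = x - y
    # Solve V z = diff rather than forming the inverse of V explicitly
    return float(np.sqrt(diff @ np.linalg.solve(V, diff)))

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
X = rng.standard_normal((100, 5))
another_X = rng.standard_normal((100, 5))

d_correct = mahalanobis(X[0], X[1], np.cov(X, rowvar=False))
d_wrong = mahalanobis(X[0], X[1], np.cov(another_X, rowvar=False))
d_identity = mahalanobis(X[0], X[1], np.eye(5))

# The first two values are expected to differ, because V differs
print(d_correct, d_wrong)
# With V equal to the identity, the distance reduces to the Euclidean distance
print(d_identity, np.linalg.norm(X[0] - X[1]))
```

If a NearestNeighbors model returns the same distance for both covariance matrices, as in the brute-force case here, that suggests V is not reaching the distance computation.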

Actual results

(Need to pass the V parameter when using 'auto' in the constructor)

Results for various choices of algo and V
algo=auto, correct V     (array([[0.       , 1.1426675]]), array([[0, 9]]))
algo=auto, wrong V   (array([[0.        , 1.27951348]]), array([[0, 9]]))
algo=brute, no V     (array([[0.        , 1.13477115]]), array([[0, 9]]))
algo=brute, correct V    (array([[0.        , 1.13477115]]), array([[0, 9]]))
algo=brute, wrong V      (array([[0.        , 1.13477115]]), array([[0, 9]]))


Note, the covariance matrices *ARE* different, even when algo=brute, it's just ignored
{'V': array([[ 0.99350868, -0.00725689, -0.05251638, -0.07933377, -0.19698916],
       [-0.00725689,  0.99468328,  0.12071139,  0.17797095,  0.00706579],
       [-0.05251638,  0.12071139,  0.81470842,  0.06171428, -0.01742768],
       [-0.07933377,  0.17797095,  0.06171428,  0.92773868, -0.09856927],
       [-0.19698916,  0.00706579, -0.01742768, -0.09856927,  0.79182037]])}
{'V': array([[ 0.85808327,  0.05140924, -0.0976945 , -0.0479244 ,  0.06784053],
       [ 0.05140924,  1.21992361,  0.19561193,  0.05436643,  0.02422382],
       [-0.0976945 ,  0.19561193,  0.92783274, -0.11489006, -0.01373795],
       [-0.0479244 ,  0.05436643, -0.11489006,  0.82881417, -0.00136617],
       [ 0.06784053,  0.02422382, -0.01373795, -0.00136617,  0.97597997]])}


Using  DistanceMetric, we can confirm that algo=auto is  getting the right answer 

Distance from NearestNeighbors with auto=algo:
[[0.        1.1426675]]
Distance from DistanceMetric:
[0.        1.1426675]

How can I fix this?
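One workaround that may be worth trying, offered as an assumption rather than a confirmed fix: SciPy's Mahalanobis distance takes the inverse covariance matrix under the name VI, and scikit-learn's brute-force search appears to delegate to SciPy for this metric, which would explain why V has no effect. The sketch below passes VI instead of V and checks the result against a hand-computed distance. The helper `brute_neighbors`, the seed, and the variable names are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
X = rng.standard_normal((100, 5))
another_X = rng.standard_normal((100, 5))

def brute_neighbors(data_for_cov):
    """Fit a brute-force model on X, passing the inverse covariance as VI."""
    VI = np.linalg.inv(np.cov(data_for_cov, rowvar=False))
    model = NearestNeighbors(n_neighbors=2, algorithm='brute',
                             metric='mahalanobis', metric_params={'VI': VI})
    model.fit(X)
    return model.kneighbors(X[0:1, :])

dist_correct, idx_correct = brute_neighbors(X)
dist_wrong, idx_wrong = brute_neighbors(another_X)

# Hand-computed distance from X[0] to its reported nearest non-self neighbor
VI_correct = np.linalg.inv(np.cov(X, rowvar=False))
diff = X[0] - X[idx_correct[0, 1]]
manual = np.sqrt(diff @ VI_correct @ diff)

print(dist_correct, dist_wrong)   # expected to differ if VI is honored
print(dist_correct[0, 1], manual)  # expected to agree if VI is honored
```

Whether this behaves the same way on every scikit-learn and SciPy version is not something the post establishes, so it is worth running the comparison on the installed versions.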
