When algorithm='brute', the NearestNeighbors class from sklearn.neighbors does not handle the V matrix correctly: V appears to be ignored.
import numpy as np
from sklearn.neighbors import NearestNeighbors, DistanceMetric
Now create some data and demonstrate the problem.
X = np.random.randn(100, 5) # "Real dataset"
another_X = np.random.randn(100, 5) # Another to compute a false covariance matrix
# Create many different Nearest Neighbor objects
neighbors_dict = dict()
# ----- Algo=auto works
neighbors_dict['algo=auto, correct V'] = NearestNeighbors(
    n_neighbors=2, algorithm='auto', metric='mahalanobis',
    metric_params={'V': np.cov(X, rowvar=False)})
# Using the wrong covariance to show a difference
neighbors_dict['algo=auto, wrong V'] = NearestNeighbors(
    n_neighbors=2, algorithm='auto', metric='mahalanobis',
    metric_params={'V': np.cov(another_X, rowvar=False)})
# When algo=auto, V must be specified
try:
    foo = NearestNeighbors(n_neighbors=2, algorithm='auto', metric='mahalanobis')
    foo.fit(X)
except Exception:
    print("(Need to pass the V parameter when using 'auto' in the constructor)\n")
# ----- Algo=brute is broken
# No error is raised even though V is not specified
neighbors_dict['algo=brute, no V'] = NearestNeighbors(n_neighbors=2, algorithm='brute', metric='mahalanobis')
# The results are the same regardless of whether the correct or the wrong covariance matrix is passed
neighbors_dict['algo=brute, correct V'] = NearestNeighbors(
    n_neighbors=2, algorithm='brute', metric='mahalanobis',
    metric_params={'V': np.cov(X, rowvar=False)})
neighbors_dict['algo=brute, wrong V'] = NearestNeighbors(
    n_neighbors=2, algorithm='brute', metric='mahalanobis',
    metric_params={'V': np.cov(another_X, rowvar=False)})
print("Results for various choices of algo and V")
for kk, model in neighbors_dict.items():
    model.fit(X)
    result = model.kneighbors(X[0:1, :])
    print(kk, "\t", result)
print("\n\nNote, the covariance matrices *ARE* different, even when algo=brute, it's just ignored")
print(neighbors_dict['algo=brute, wrong V'].effective_metric_params_)
print(neighbors_dict['algo=brute, correct V'].effective_metric_params_)
print("\n\nUsing DistanceMetric, we can confirm that algo=auto is getting the right answer\n")
dist, idx = neighbors_dict['algo=auto, correct V'].kneighbors(X[0:1, :])
metric = DistanceMetric.get_metric('mahalanobis', V=np.cov(X, rowvar=False))
metric_result = metric.pairwise(X[idx].squeeze())
print(f"Distance from NearestNeighbors with auto=algo:\n{dist}")
print(f"Distance from DistanceMetric:\n{metric_result[0]}")
Expected behavior: if V differs, the results should change. That is the case with algorithm='auto', but not with algorithm='brute'.
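For reference, the Mahalanobis distance for a covariance matrix V is d(x, y) = sqrt((x - y)^T inv(V) (x - y)), so two different V matrices should in general give different distances. The minimal NumPy sketch here reproduces that expectation outside NearestNeighbors. It generates its own data with a fixed seed (an assumption for reproducibility, so the numbers will not match the output in this post) and cross-checks against scipy.spatial.distance.mahalanobis, which takes the inverse covariance VI rather than V.

```python
import numpy as np
from scipy.spatial import distance


def mahalanobis(x, y, V):
    """Mahalanobis distance between vectors x and y for covariance matrix V."""
    diff = x - y
    # Solve V z = diff instead of forming inv(V) explicitly
    z = np.linalg.solve(V, diff)
    return float(np.sqrt(diff @ z))


# Fixed seed so the sketch is reproducible (not part of the original report)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))          # "real" dataset
another_X = rng.standard_normal((100, 5))  # used only for a wrong covariance

V_correct = np.cov(X, rowvar=False)
V_wrong = np.cov(another_X, rowvar=False)

d_correct = mahalanobis(X[0], X[1], V_correct)
d_wrong = mahalanobis(X[0], X[1], V_wrong)
print(d_correct, d_wrong)  # two different values: the distance depends on V

# SciPy's function expects the inverse covariance VI, not V
d_scipy = distance.mahalanobis(X[0], X[1], np.linalg.inv(V_correct))
print(np.isclose(d_correct, d_scipy))
```

With V equal to the identity matrix, this reduces to the ordinary Euclidean distance.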
Actual results:
(Need to pass the V parameter when using 'auto' in the constructor)
Results for various choices of algo and V
algo=auto, correct V (array([[0. , 1.1426675]]), array([[0, 9]]))
algo=auto, wrong V (array([[0. , 1.27951348]]), array([[0, 9]]))
algo=brute, no V (array([[0. , 1.13477115]]), array([[0, 9]]))
algo=brute, correct V (array([[0. , 1.13477115]]), array([[0, 9]]))
algo=brute, wrong V (array([[0. , 1.13477115]]), array([[0, 9]]))
Note, the covariance matrices *ARE* different, even when algo=brute, it's just ignored
{'V': array([[ 0.99350868, -0.00725689, -0.05251638, -0.07933377, -0.19698916],
[-0.00725689, 0.99468328, 0.12071139, 0.17797095, 0.00706579],
[-0.05251638, 0.12071139, 0.81470842, 0.06171428, -0.01742768],
[-0.07933377, 0.17797095, 0.06171428, 0.92773868, -0.09856927],
[-0.19698916, 0.00706579, -0.01742768, -0.09856927, 0.79182037]])}
{'V': array([[ 0.85808327, 0.05140924, -0.0976945 , -0.0479244 , 0.06784053],
[ 0.05140924, 1.21992361, 0.19561193, 0.05436643, 0.02422382],
[-0.0976945 , 0.19561193, 0.92783274, -0.11489006, -0.01373795],
[-0.0479244 , 0.05436643, -0.11489006, 0.82881417, -0.00136617],
[ 0.06784053, 0.02422382, -0.01373795, -0.00136617, 0.97597997]])}
Using DistanceMetric, we can confirm that algo=auto is getting the right answer
Distance from NearestNeighbors with auto=algo:
[[0. 1.1426675]]
Distance from DistanceMetric:
[0. 1.1426675]
How can I fix this?
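One possible workaround, sketched here as an assumption rather than an official fix, avoids metric_params entirely: factor inv(V) = L L^T with a Cholesky decomposition, map each row x to x @ L, and run NearestNeighbors with its default Euclidean metric and algorithm='brute'. Because ||L^T (x - y)||^2 = (x - y)^T L L^T (x - y) = (x - y)^T inv(V) (x - y), Euclidean distances in the transformed space equal Mahalanobis distances in the original space, so this does not depend on how NearestNeighbors forwards V. The seeded data is illustrative only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Illustrative data with a fixed seed (not the arrays from the report)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
V = np.cov(X, rowvar=False)

# Lower-triangular Cholesky factor of the inverse covariance: inv(V) = L @ L.T
L = np.linalg.cholesky(np.linalg.inv(V))

# Row-wise x @ L equals (L.T @ x)^T, so Euclidean distance between transformed
# rows equals Mahalanobis distance between the original rows
X_white = X @ L

nn = NearestNeighbors(n_neighbors=2, algorithm='brute')  # default Euclidean metric
nn.fit(X_white)

# Query points must be transformed with the same L
dist, idx = nn.kneighbors(X_white[0:1])
print(dist, idx)
```

Changing V changes L, and therefore changes the distances returned, which is the behavior the report expects from algorithm='brute'.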