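We want to use cosine similarity with hierarchical clustering, and we have already computed the cosine similarities. The sklearn.cluster.AgglomerativeClustering documentation says:
A distance matrix (instead of a similarity matrix) is needed as input for the fit method.
So we converted the cosine similarities to distances as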
distance = 1 - similarity
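To make the conversion concrete, here is a minimal sketch of that step. The helper name similarity_to_distance and the small matrix S are made up for illustration (S is just 1 minus the toy X used further down); the clipping and diagonal reset are an assumption about guarding against floating-point noise, not something the documentation requires.

```python
import numpy as np

def similarity_to_distance(similarity):
    """Convert a cosine similarity matrix to distances via 1 - similarity.

    Hypothetical helper for illustration only.
    """
    similarity = np.asarray(similarity, dtype=float)
    distance = 1.0 - similarity
    # Rounding error can leave tiny negative values or a not-quite-zero diagonal.
    distance = np.clip(distance, 0.0, 2.0)
    np.fill_diagonal(distance, 0.0)
    return distance

# Toy similarity matrix (illustrative values, not the real data)
S = np.array([[1.0, 0.7, 0.6],
              [0.7, 1.0, 0.3],
              [0.6, 0.3, 1.0]])
D = similarity_to_distance(S)
print(D)
```

The result is symmetric with a zero diagonal, which is the shape of input a precomputed distance matrix is expected to have.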
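Our Python code produces an error at the fit() method at the end. (I am not writing the real value of X in the code, since it is very large. X is just a cosine similarity matrix whose values have been converted to distances as written above. Notice the diagonal: it is all 0.) Here is the code: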
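import pandas as pd
import numpy as np
from sklearn.cluster import AgglomerativeClustering
# Small stand-in for the real (much larger) precomputed distance matrix
X = np.array([[0, 0.3, 0.4], [0.3, 0, 0.7], [0.4, 0.7, 0]])
# linkage is not set here, so it stays at its default ('ward')
cluster = AgglomerativeClustering(affinity='precomputed')
cluster.fit(X)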
The error is:
runfile('/Users/stackoverflowuser/Desktop/4.2/Pr/untitled0.py', wdir='/Users/stackoverflowuser/Desktop/4.2/Pr')
Traceback (most recent call last):
File "<ipython-input-1-b8b98765b168>", line 1, in <module>
runfile('/Users/stackoverflowuser/Desktop/4.2/Pr/untitled0.py', wdir='/Users/stackoverflowuser/Desktop/4.2/Pr')
File "/anaconda2/lib/python2.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 704, in runfile
execfile(filename, namespace)
File "/anaconda2/lib/python2.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 100, in execfile
builtins.execfile(filename, *where)
File "/Users/stackoverflowuser/Desktop/4.2/Pr/untitled0.py", line 84, in <module>
cluster.fit(X)
File "/anaconda2/lib/python2.7/site-packages/sklearn/cluster/hierarchical.py", line 795, in fit
(self.affinity, ))
ValueError: precomputed was provided as affinity. Ward can only work with euclidean distances.
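Reading the message, the complaint seems to be about the linkage rather than the matrix itself: the default linkage is 'ward', which only accepts Euclidean distances. Below is a minimal sketch, assuming a non-Ward linkage such as 'average' is acceptable for the use case. The choice of n_clusters=2 is arbitrary. The keyword lookup is only there because older scikit-learn releases name the parameter affinity while newer ones name it metric.

```python
import inspect

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy precomputed distance matrix (1 - cosine similarity), zero diagonal
X = np.array([[0.0, 0.3, 0.4],
              [0.3, 0.0, 0.7],
              [0.4, 0.7, 0.0]])

# Pick whichever keyword this scikit-learn version accepts:
# `metric` in newer releases, `affinity` in older ones.
params = inspect.signature(AgglomerativeClustering.__init__).parameters
metric_kwarg = "metric" if "metric" in params else "affinity"

# Assumption: average linkage is acceptable; 'complete' or 'single' also
# accept a precomputed matrix, while 'ward' does not.
cluster = AgglomerativeClustering(
    n_clusters=2,
    linkage="average",
    **{metric_kwarg: "precomputed"}
)
labels = cluster.fit_predict(X)
print(labels)
```

With this toy matrix the two closest points (rows 0 and 1, at distance 0.3) end up in the same cluster and row 2 is on its own. Whether average linkage is the right choice for the real data is a separate question.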
Is there anything else I can provide? Thanks in advance.