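I have implemented the k-means algorithm in Python and am trying to compute the silhouette score of the clusters for various values of k. Below is my code, followed by a few variables from a small subset of the dataset.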
import numpy as np

def avgdist(pt, clust):
    # Mean Euclidean distance from pt to every point in clust
    dists = []
    for elem in clust:
        dists.append(np.linalg.norm(pt-elem))
    return np.mean(dists)

def silhouette(data, clusts):
    s = []
    print("data-")
    print(data)
    # Convert every point in every cluster from an array to a plain list
    for i in range(len(clusts)):
        for j in range(len(clusts[i])):
            clusts[i][j] = clusts[i][j].tolist()
    print("Clusters")
    print(clusts)
    for elem in data:
        a = []
        b = []
        print(elem)
        for clust in clusts:
            print(clust)
            if elem in clust: #Error in this line
                b.append(avgdist(elem, clust))
            else:
                a.append(avgdist(elem, clust))
        s.append((min(b)-min(a)/(max(min(b), min(a)))))
    return np.mean(s)
The terminal output is as follows:
data-
[[ 0. 0. 5.]
[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 7.]
[ 0. 0. 0.]
[ 0. 0. 12.]
[ 0. 0. 0.]
[ 0. 0. 7.]
[ 0. 0. 9.]
[ 0. 0. 11.]]
Clusters
[[array([ 0., 0., 5.]), array([ 0., 0., 0.]), array([ 0., 0., 0.]), array([ 0., 0., 0.]), array([ 0., 0., 0.])], [array([ 0., 0., 7.]), array([ 0., 0., 12.]), array([ 0., 0., 7.]), array([ 0., 0., 9.]), array([ 0., 0., 11.])]]
[ 0. 0. 5.]
[[0.0, 0.0, 5.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
This is printed together with the following error, raised on the commented line:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
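Please help, as I am not sure what this error means in my context. Similar questions gave me some idea of the nature of the error, but I believe they do not apply here.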
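For reference, the error can be reproduced in isolation. The following is a minimal sketch, assuming `elem` is a NumPy array and `clust` is a plain list of lists, as in the printed output above. Python's `in` evaluates `==` between `elem` and each item of the list; with an array on one side, that comparison yields an element-wise boolean array rather than a single `True` or `False`, and turning that array into one truth value is what raises the `ValueError`. The names `comparison` and `raised` are illustrative only.

```python
import numpy as np

elem = np.array([0.0, 0.0, 5.0])
clust = [[0.0, 0.0, 0.0], [0.0, 0.0, 5.0]]

# `==` between an array and a list compares element by element.
comparison = elem == clust[0]   # array([ True,  True, False])

# `in` needs a single truth value per item, so the boolean array
# above is converted with bool(), which is ambiguous and fails.
try:
    elem in clust
    raised = False
except ValueError:
    raised = True

print(comparison, raised)
```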
Edit: I solved this by changing the line that raised the error to:
.....
if elem.tolist() in clust: #Error in this line
.....
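This change likely works because `elem.tolist()` is a plain Python list, and comparing two lists with `==` returns a single boolean. As an alternative sketch that keeps the cluster points as arrays (so the conversion loop would not be needed), membership can be tested with `np.array_equal`. The helper name `in_cluster` is hypothetical and not part of NumPy.

```python
import numpy as np

def in_cluster(pt, clust):
    """Return True if any point in clust equals pt element-wise."""
    return any(np.array_equal(pt, other) for other in clust)

clust = [np.array([0.0, 0.0, 5.0]), np.array([0.0, 0.0, 0.0])]
found = in_cluster(np.array([0.0, 0.0, 5.0]), clust)
missing = in_cluster(np.array([0.0, 0.0, 7.0]), clust)
print(found, missing)
```

Separately from this error, the usual silhouette coefficient is (b - a) / max(a, b), where a is the mean distance to the point's own cluster and b is the smallest mean distance to any other cluster. In the posted `s.append(...)` line the division binds more tightly than the subtraction, and `a` and `b` appear to have the opposite roles, so that expression may be worth re-checking.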