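"I understand that the output T[i] only represents the number of elements in the cluster..."

T[j] is the "cluster number" of the j-th data point. That is, fcluster assigns the data points to clusters. So, for example, if there are five data points, and fcluster puts the first, second and last in cluster 1 and the others in cluster 2, the return value of fcluster would be array([1, 1, 2, 2, 1]).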
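To make the five-point example concrete, here is a minimal sketch. The array T is typed in by hand to stand in for what fcluster would return in that scenario; it is not the output of a real clustering run, and the variable names are only illustrative.

```python
import numpy as np

# Hand-written stand-in for an fcluster result on five data points:
# points 0, 1 and 4 are in cluster 1; points 2 and 3 are in cluster 2.
T = np.array([1, 1, 2, 2, 1])

# Indices of the data points assigned to each cluster.
in_cluster_1 = np.where(T == 1)[0]
in_cluster_2 = np.where(T == 2)[0]

# Number of points per cluster (slot 0 is dropped because labels start at 1).
sizes = np.bincount(T)[1:]

print(in_cluster_1)  # [0 1 4]
print(in_cluster_2)  # [2 3]
print(sizes)         # [3 2]
```

So the cluster sizes are something you derive from T by counting; they are not what T itself stores.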
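Here's a demo that shows how you can pull apart this data. For convenience, I used fclusterdata instead of the combination of linkage and fcluster. fclusterdata returns the same thing as fcluster.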
    import numpy as np


    def cluster_indices(cluster_assignments):
        """Return a list of index arrays, one per cluster (labels start at 1)."""
        n = cluster_assignments.max()
        indices = []
        for cluster_number in range(1, n + 1):
            indices.append(np.where(cluster_assignments == cluster_number)[0])
        return indices


    if __name__ == "__main__":
        from scipy.cluster.hierarchy import fclusterdata

        # Make some test data.
        data = np.random.rand(15, 2)

        # Compute the clusters.
        cutoff = 1.0
        cluster_assignments = fclusterdata(data, cutoff)

        # Print the indices of the data points in each cluster.
        num_clusters = cluster_assignments.max()
        print("%d clusters" % num_clusters)
        indices = cluster_indices(cluster_assignments)
        for k, ind in enumerate(indices):
            print("cluster", k + 1, "is", ind)
Typical output:

    4 clusters
    cluster 1 is [ 0  1  6  8 10 13 14]
    cluster 2 is [ 3  4  5  7 11 12]
    cluster 3 is [9]
    cluster 4 is [2]
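If you would rather use linkage and fcluster directly, the sketch below compares the two routes on the same data. It assumes that fclusterdata's defaults correspond to method="single", metric="euclidean", criterion="inconsistent" and depth=2, which is my reading of the SciPy documentation; check it against your installed version. The seed and variable names are only illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, fclusterdata, linkage

# Seeded test data so the comparison is repeatable.
rng = np.random.default_rng(0)
data = rng.random((15, 2))
cutoff = 1.0

# One-step version, as in the demo.
labels_one_step = fclusterdata(data, cutoff)

# Two-step version, spelling out the assumed fclusterdata defaults.
Z = linkage(data, method="single", metric="euclidean")
labels_two_step = fcluster(Z, t=cutoff, criterion="inconsistent", depth=2)

# Both routes should give one cluster label per data point, with the same values.
same = np.array_equal(labels_one_step, labels_two_step)
print(same)
```

Either array can be passed to cluster_indices in the same way.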