I am trying to cluster and visualize some data using xmeans from the pyclustering library. I copied the code directly from the example in the documentation:
from pyclustering.cluster import cluster_visualizer
from pyclustering.cluster.xmeans import xmeans
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.utils import read_sample
from pyclustering.samples.definitions import SIMPLE_SAMPLES
sample = X # read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)
# Prepare initial centers - amount of initial centers defines amount of clusters from which X-Means will
# start analysis.
amount_initial_centers = 2
initial_centers = kmeans_plusplus_initializer(sample, amount_initial_centers).initialize()
# Create instance of X-Means algorithm. The algorithm will start analysis from 2 clusters, the maximum
# number of clusters that can be allocated is 20.
xmeans_instance = xmeans(sample, initial_centers, 20)
xmeans_instance.process()
# Extract clustering results: clusters and their centers
clusters = xmeans_instance.get_clusters()
centers = xmeans_instance.get_centers()
# Print total sum of metric errors
print("Total WCE:", xmeans_instance.get_total_wce())
# Visualize clustering results
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, sample)
visualizer.append_cluster(centers, None, marker='*', markersize=10)
visualizer.show()
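The only difference is that I assign my matrix X to sample instead of loading the sample dataset.
When I try to visualize the clustering results, I get this error: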
Only objects with size dimension 1 (1D plot), 2 (2D plot) or 3 (3D plot) can be displayed. For multi-dimensional data use 'cluster_visualizer_multidim'.
My X matrix is generated like this:
features = ["I", "Iu", other 7 column names]
data = df[features]
...
X = scaler.fit_transform(data)
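Is there a way to visualize the clusters while plotting only two or three features at a time?
I could not find anything in the documentation.
I tried this: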
visualizer.append_clusters(clusters, sample[:,[0,1]])
to visualize only the first two features, and got this error:
Only clusters with the same dimension of objects can be displayed on canvas.
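A probable cause of this second error: the data passed to append_clusters now has 2 columns, while the centers passed to append_cluster still have all 9. The sketch below projects both onto the same columns using NumPy only. The helper name `project` and the stand-in arrays `X_demo` and `centers_demo` are hypothetical and are not part of pyclustering.

```python
import numpy as np

def project(points, columns):
    """Return the selected feature columns of `points` as a plain list of lists."""
    return np.asarray(points)[:, columns].tolist()

# Stand-in data (hypothetical): 5 points with 9 features each.
X_demo = np.arange(45, dtype=float).reshape(5, 9)
# Stand-in centers (hypothetical): means of two arbitrary groups of rows.
centers_demo = [X_demo[:2].mean(axis=0).tolist(), X_demo[2:].mean(axis=0).tolist()]

cols = [0, 1]  # the two features to plot
sample_2d = project(X_demo, cols)
centers_2d = project(centers_demo, cols)

# Both now have the same dimension, so they could share one canvas.
print(len(sample_2d[0]), len(centers_2d[0]))
```

With the real data, the assumption is that `sample_2d` would go to visualizer.append_clusters(clusters, sample_2d) and `centers_2d` to visualizer.append_cluster(centers_2d, None, marker='*', markersize=10); this has not been verified against the library here.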
Edit:
I updated the code as suggested in annoviko's answer, but now I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-69-6fd7d2ce5fcd> in <module>
20 visualizer.append_clusters(clusters, X)
21 visualizer.append_cluster(centers, None, marker='*', markersize=10)
---> 22 visualizer.show(pair_filter=[[0, 1], [0, 2]])
/usr/local/lib/python3.8/site-packages/pyclustering/cluster/__init__.py in show(self, pair_filter, **kwargs)
224 raise ValueError("There is no non-empty clusters for visualization.")
225
--> 226 cluster_data = self.__clusters[0].data or self.__clusters[0].cluster
227 dimension = len(cluster_data[0])
228
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
It is raised by visualizer.show(), and it happens even if I remove pair_filter from the function call.
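The line flagged in the traceback applies `or` to the stored data. If that data is a NumPy array (which the error message suggests X is), Python has to evaluate the array's truth value, and NumPy refuses to do so for arrays with more than one element. The minimal sketch below reproduces this with NumPy alone; `data` and `fallback` are hypothetical stand-ins for the visualizer's internal fields.

```python
import numpy as np

# Hypothetical stand-ins for the fields used in `data or cluster`.
data = np.array([[0.1, 0.2], [0.3, 0.4]])
fallback = [0, 1]

# `or` needs bool(data), which is ambiguous for a multi-element array.
try:
    chosen = data or fallback
    raised = False
except ValueError:
    raised = True

# A plain list of lists has an ordinary truth value (non-empty means True).
chosen = data.tolist() or fallback
print(raised, chosen)
```

A possible workaround, under the assumption that the visualizer accepts plain lists, would be to pass X.tolist() instead of X to append_clusters.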