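Possible duplicate:
How do I determine k when using k-means clustering?
If I know nothing about the data, how do I choose an initial K?
Can someone help me choose K?
Thanks, Naveen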
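The basic idea is to evaluate a clustering score on sample data, usually one based on the distances within clusters (which should be small) and the distances between clusters (which should be large). The better the score, the better the clustering, so you can run the clustering with several parameter values and pick the one that scores best. One such metric can be found here: http://alias-i.com/lingpipe/docs/api/com/aliasi/cluster/ClusterScore.html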
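As a rough illustration of that idea, here is a minimal pure-Python sketch that runs k-means for several values of k and keeps the k with the best score. Everything in it is an assumption made for the example: the silhouette coefficient is used as the score (it is one common within/between measure, not the LingPipe ClusterScore linked above), the toy data is three hand-made blobs, the centers are initialized farthest-first so the result is deterministic, and the helper names (`kmeans`, `silhouette`, `farthest_first_centers`) are hypothetical.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def farthest_first_centers(points, k):
    # Deterministic init (an assumption of this sketch): start from the first
    # point, then repeatedly add the point farthest from the chosen centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    # Plain Lloyd iterations: assign to nearest center, then recompute means.
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster went empty
                centers[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels

def silhouette(points, labels):
    # Mean silhouette: compares a point's average distance to its own cluster
    # (a) with its average distance to the nearest other cluster (b).
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Toy data: three well-separated blobs of five points each.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

# Score each candidate k and keep the one with the highest silhouette.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On real data the scores are rarely this clear-cut, so it is worth looking at the whole curve of scores rather than only the maximum.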
Seriously, what do you want to know? Do you want us to give you a number, or a strategy for finding the optimal k? You will have to read a book or some other resource on k-means; I'm pretty sure this is covered there.
There is something on Wikipedia about it:
http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
Before you use an algorithm, read about it.