algorithm - K-means with k=2 that outputs equal-sized clusters

Question

I am using a modified Lloyd's algorithm to get equal-sized clusters from k-means with k=2. Here is the pseudocode:

- Randomly choose 2 points as initialization for the 2 clusters (denoted as c1, c2)
- Repeat below steps until convergence
    - Sort all points xi in ascending order of ||xi-c1|| - ||xi-c2||, i.e. the difference between the distances to the first and the second centroid
    - Put the top 50% of the points in cluster 1 and the others in cluster 2
    - Recalculate centroids as average of the allocated points (as usual in Lloyd's)
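
The steps above can be sketched in Python with NumPy. This is only a minimal illustration under stated assumptions, not a reference implementation: the function name `balanced_two_means`, the `max_iter` and `seed` parameters, the stopping rule (stop when the assignment no longer changes), and the handling of an odd number of points (cluster 1 gets `n // 2` points) are all choices made here and are not part of the pseudocode.

```python
import numpy as np

def balanced_two_means(points, max_iter=100, seed=0):
    """Hypothetical sketch of the balanced k=2 variant of Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    n = len(points)
    # Randomly choose 2 distinct points as the initial centroids c1, c2.
    i1, i2 = rng.choice(n, size=2, replace=False)
    c1, c2 = points[i1], points[i2]
    half = n // 2  # assumption: with odd n, cluster 1 gets the smaller half
    labels = None
    for _ in range(max_iter):
        # ||xi - c1|| - ||xi - c2|| for every point xi.
        diff = (np.linalg.norm(points - c1, axis=1)
                - np.linalg.norm(points - c2, axis=1))
        # Sort ascending; the first half goes to cluster 1 (label 0),
        # the rest to cluster 2 (label 1).
        order = np.argsort(diff, kind="stable")
        new_labels = np.ones(n, dtype=int)
        new_labels[order[:half]] = 0
        # Assumed convergence test: the assignment did not change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Recalculate centroids as the mean of the allocated points.
        c1 = points[labels == 0].mean(axis=0)
        c2 = points[labels == 1].mean(axis=0)
    return labels, np.stack([c1, c2])
```

Calling `balanced_two_means(X)` on an `(n, d)` array `X` with at least two rows returns a label per point and the two centroids.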

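Empirically, the above algorithm works well for me:

- It gives balanced clusters
- It always decreases the objective

Has such an algorithm been proposed or analyzed in the literature before? Could I get some references?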

score 2 · Accepted Answer

A more general version for more than 2 clusters is explained here:

https://elki-project.github.io/tutorial/same-size_k_means

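I have seen k-means with various size constraints several times in the literature, but I don't have any references at hand. I am not convinced by the idea: in my opinion, forcing clusters to have the same size contradicts the k-means goal of finding the least-squares best approximation, because it means deliberately choosing a worse approximation.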
