I am using a modified Lloyd's algorithm to get equal-sized clusters out of k-means with k=2. Here is the pseudocode:
- Randomly choose 2 points as initialization for the 2 clusters (denoted as c1, c2)
- Repeat the steps below until convergence:
  - Sort all points xi in ascending order of ||xi-c1|| - ||xi-c2||, i.e. the difference between the distances to the first and the second centroid
  - Put the top 50% of the points in cluster 1 and the rest in cluster 2
  - Recalculate the centroids as the average of the allocated points (as usual in Lloyd's)
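For concreteness, the steps above could be sketched in NumPy roughly as follows. This is only a minimal sketch, not a reference implementation: the names `balanced_two_means` and `kmeans_objective`, the `n_iter` cap, the seeded generator, and the choice to give cluster 1 the smaller half when the number of points is odd are all assumptions made for illustration. "Convergence" is taken here to mean that the assignment stops changing.

```python
import numpy as np


def kmeans_objective(X, labels, centers):
    """Sum of squared distances from each point to its assigned centroid."""
    return float(np.sum((X - centers[labels]) ** 2))


def balanced_two_means(X, n_iter=100, seed=0):
    """Balanced 2-means following the steps above (hypothetical helper).

    Returns (labels, centers), where labels are 0 for cluster 1 and
    1 for cluster 2, and centers has shape (2, n_features).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Initialize both centroids from two distinct random data points.
    c1, c2 = X[rng.choice(n, size=2, replace=False)]
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Difference in (unsquared) Euclidean distance to the two centroids.
        diff = np.linalg.norm(X - c1, axis=1) - np.linalg.norm(X - c2, axis=1)
        order = np.argsort(diff, kind="stable")
        # The first half of the sorted points goes to cluster 1, the rest
        # to cluster 2 (assumption: floor(n / 2) points when n is odd).
        new_labels = np.ones(n, dtype=int)
        new_labels[order[: n // 2]] = 0
        if np.array_equal(new_labels, labels):
            break  # assignment unchanged, so the centroids are stable too
        labels = new_labels
        # Standard Lloyd update: each centroid is the mean of its points.
        c1 = X[labels == 0].mean(axis=0)
        c2 = X[labels == 1].mean(axis=0)
    return labels, np.stack([c1, c2])
```

Calling `kmeans_objective(X, labels, centers)` after each run (or inside the loop) is one way to check the behaviour of the objective empirically. Note that the sketch sorts by the difference of unsquared distances, exactly as in the pseudocode; sorting by the difference of squared distances would be a different variant.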
Now, the above algorithm works well for me empirically:
- It gives balanced clusters
- It always decreases the objective
Has an algorithm like this been proposed or analyzed in the literature before? Could I get some references?