If we apply K-means and sequential K-means to the same dataset with the same initial settings, will we get the same result? Explain your reasoning.
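My own view is that, in general, the answer is no. Sequential K-means updates a mean immediately after each example and never reassigns examples it has already seen, so its result depends on the order in which the data points are presented. Batch K-means reassigns every point on every iteration, so the order of the data does not affect it. The stopping conditions also differ: K-means stops when no mean changes, whereas sequential K-means runs until it is interrupted.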
The pseudocode for both clustering algorithms is given below.
K-means
Make initial guesses for the means m1, m2, ..., mk
Until there is no change in any mean
    Assign each data point to the cluster whose mean is the nearest.
    Calculate the mean of each cluster.
    For i from 1 to k
        Replace mi with the mean of all examples for cluster i.
    end_for
end_until
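For comparison, here is a minimal Python sketch of the batch pseudocode above. It rests on assumptions that the original does not state: points are tuples of floats, distance is Euclidean (`math.dist`, Python 3.8+), ties go to the lowest-index mean, an empty cluster keeps its previous mean, and `max_iter` is a hypothetical safety cap on the loop. Because every point is reassigned on every iteration, the order of `data` should not change the result.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def kmeans(data: Sequence[Point], means: Sequence[Point], max_iter: int = 100) -> List[Point]:
    """Batch K-means: reassign all points, then recompute all means, until stable."""
    means = [tuple(m) for m in means]
    for _ in range(max_iter):  # hypothetical cap; the pseudocode loops until no change
        # Assignment step: each point joins the cluster with the nearest current mean
        # (min() returns the first minimum, so ties go to the lowest index).
        clusters: List[List[Point]] = [[] for _ in means]
        for x in data:
            i = min(range(len(means)), key=lambda j: math.dist(x, means[j]))
            clusters[i].append(x)
        # Update step: recompute every mean; an empty cluster keeps its old mean (assumption).
        new_means = [mean(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:  # "until there is no change in any mean"
            break
        means = new_means
    return means


data = [(0.0,), (4.0,), (6.0,), (10.0,)]
print(kmeans(data, [(1.0,), (9.0,)]))                                # [(2.0,), (8.0,)]
print(kmeans([data[2], data[1], data[0], data[3]], [(1.0,), (9.0,)]))  # same result
```

On these four 1-D points with initial means 1 and 9, the first pass produces the clusters {0, 4} and {6, 10}, so the means become 2 and 8; the second pass changes nothing and the loop stops.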
Sequential K-means
Make initial guesses for the means m1, m2, ..., mk
Set the counts n1, n2, ..., nk to zero
Until interrupted
    Acquire the next example, x
    If mi is closest to x
        Increment ni
        Replace mi by mi + (1/ni)*(x - mi)
    end_if
end_until
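To make the order dependence concrete, here is a minimal Python sketch of the sequential pseudocode. It makes several assumptions that the original does not: points are tuples of floats, distance is Euclidean (`math.dist`), ties go to the lowest-index mean, and "until interrupted" is modeled as reaching the end of a finite stream. The name `sequential_kmeans` is hypothetical. Take the 1-D points 0, 4, 6, 10 with initial means 1 and 9. Batch K-means converges to means 2 and 8, with clusters {0, 4} and {6, 10}. Presenting the points in the order 0, 4, 6, 10 happens to reproduce that result. The order 6, 4, 0, 10 instead gives means 0 and about 6.67, because the second cluster captures 4 early and never gives it back.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def sequential_kmeans(stream: Iterable[Point], means: Sequence[Point]) -> List[Point]:
    """Sequential K-means: update the closest mean after every single example."""
    means = [tuple(m) for m in means]
    counts = [0] * len(means)  # n1, ..., nk start at zero
    for x in stream:  # "until interrupted" is modeled as the end of the stream
        # Closest mean to x (min() returns the first minimum, so ties go to the lowest index).
        i = min(range(len(means)), key=lambda j: math.dist(x, means[j]))
        counts[i] += 1
        # mi <- mi + (1/ni) * (x - mi): a running average of the examples assigned to cluster i.
        means[i] = tuple(m + (xd - m) / counts[i] for m, xd in zip(means[i], x))
    return means


data = [(0.0,), (4.0,), (6.0,), (10.0,)]
init = [(1.0,), (9.0,)]
print(sequential_kmeans(data, init))                                  # [(2.0,), (8.0,)]
print(sequential_kmeans([data[2], data[1], data[0], data[3]], init))  # m1 = 0.0, m2 ≈ 6.667
```

Because each count starts at zero, the first example assigned to a cluster replaces that cluster's initial guess outright (mi + (x - mi) = x). After that, each mean is the average of exactly the examples that were closest to it at the moment they arrived, which is why a different presentation order can give a different final result.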