The code below (minus my questions) generates this graph:
I have marked my 4 areas of confusion with "->".
> m <- matrix(c(1,1,1) , ncol=3)
>
> x <- rbind(matrix(c(1,0,1) , ncol=3),
+ matrix(c(1,1,1) , ncol=3),
+ matrix(c(1,1,0) , ncol=3),
+ matrix(c(0,1,1) , ncol=3),
+ matrix(c(0,0,1) , ncol=3),
+ matrix(c(0,0,0) , ncol=3),
+ matrix(c(1,1,1) , ncol=3),
+ matrix(c(1,1,1) , ncol=3),
+ matrix(c(1,1,0) , ncol=3),
+ matrix(c(1,0,0) , ncol=3),
+ matrix(c(0,0,1) , ncol=3),
+ matrix(c(0,0,0) , ncol=3),
+ matrix(c(0,0,1) , ncol=3),
+ matrix(c(0,1,1) , ncol=3),
+ matrix(c(1,0,1) , ncol=3),
+ matrix(c(0,1,0) , ncol=3))
> colnames(x) <- c("google", "stackoverflow", "tester")
> (cl <- kmeans(x, 3))
K-means clustering with 3 clusters of sizes 3, 10, 3
-> Where do the sizes 3, 10, 3 come from?
Cluster means:
google stackoverflow tester
1 0.6666667 1.0 0
2 0.5000000 0.5 1
3 0.3333333 0.0 0
-> There are three clusters, but what does each number signify?
Clustering vector:
[1] 2 2 1 2 2 3 2 2 1 3 2 3 2 2 2 1
-> This looks as if it were created by summing the values of each row, but it seems to be out of order: the second element of this vector is '2', whereas the second row of 'x' is matrix(c(1,1,1) , ncol=3), which sums to '3'.
Within cluster sum of squares by cluster:
[1] 0.6666667 5.0000000 0.6666667
(between_SS / total_SS = 46.1 %)
-> What are between_SS and total_SS?
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size"
> plot(x, col = cl$cluster)
> points(cl$centers, col = 1:5, pch = 8, cex = 2)
>
The answers to these questions can probably be found by reading up on the algorithm (http://en.wikipedia.org/wiki/K-means_clustering), but I cannot see how R calculates these values.
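As a rough sketch of what I would expect these components to mean, assuming they follow the textbook definitions (sizes are row counts per cluster, centers are per-cluster column means, and the sums of squares are squared distances to the relevant mean), the following recomputes them from `cl$cluster` so they can be compared with what `kmeans` reports. The `set.seed` call and the names `sizes`, `centers`, `withinss`, `totss` and `betweenss` are my own additions, and I have not checked any of this against R's source.

```r
# Same 16 x 3 matrix of 0/1 values as above, built row by row.
x <- matrix(c(1,0,1, 1,1,1, 1,1,0, 0,1,1, 0,0,1, 0,0,0, 1,1,1, 1,1,1,
              1,1,0, 1,0,0, 0,0,1, 0,0,0, 0,0,1, 0,1,1, 1,0,1, 0,1,0),
            ncol = 3, byrow = TRUE)
colnames(x) <- c("google", "stackoverflow", "tester")

set.seed(1)  # kmeans starts from random centers; fixing the seed is my addition
cl <- kmeans(x, 3)

# Assumed meaning of "sizes": how many rows carry each cluster label.
sizes <- as.vector(table(factor(cl$cluster, levels = 1:3)))

# Assumed meaning of "Cluster means": column means of the rows in each cluster.
centers <- t(sapply(1:3, function(k)
  colMeans(x[cl$cluster == k, , drop = FALSE])))

# Assumed meaning of "Within cluster sum of squares": squared distances
# from each row to the mean of its own cluster, summed per cluster.
withinss <- sapply(1:3, function(k) {
  rows <- x[cl$cluster == k, , drop = FALSE]
  sum(sweep(rows, 2, colMeans(rows))^2)
})

# Assumed meaning of total_SS: squared distances from every row to the
# overall column means. between_SS is then what is left over.
totss <- sum(sweep(x, 2, colMeans(x))^2)
betweenss <- totss - sum(withinss)

# Compare the hand-computed values with what kmeans returned.
print(all(sizes == cl$size))
print(isTRUE(all.equal(unname(centers), unname(cl$centers))))
print(isTRUE(all.equal(withinss, cl$withinss)))
print(isTRUE(all.equal(totss, cl$totss)))
print(isTRUE(all.equal(betweenss, cl$betweenss)))
```

If these assumptions hold, every comparison prints TRUE. That would also mean the clustering vector is a list of cluster labels, one per row of `x`, rather than row sums.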