I have a set of data (5000 points with 4 dimensions) that I have clustered using kmeans in R.
I want to order the points in each cluster by their distance to the center of that cluster.
Very simply, the data looks like this (I am using a subset to test out various approaches):
id Ans Acc Que Kudos
1 100 100 100 100
2 85 83 80 75
3 69 65 30 29
4 41 45 30 22
5 10 12 18 16
6 10 13 10 9
7 10 16 16 19
8 65 68 100 100
9 36 30 35 29
10 36 30 26 22
Firstly, I used the following method to cluster the dataset into 2 clusters:
(result <- kmeans(data, 2))
This returns a kmeans object with components such as cluster, centers, etc.
But I cannot figure out how to compare each point and produce an ordered list.
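For reference, a minimal sketch of the kind of comparison I am after, assuming Euclidean distance and that the id column is kept out of the clustering. The names dat, feats, own_center and ordered are placeholders of mine, not part of any package:

```r
# Subset from the question; `dat` and the other names here are assumptions.
dat <- data.frame(
  id    = 1:10,
  Ans   = c(100, 85, 69, 41, 10, 10, 10,  65, 36, 36),
  Acc   = c(100, 83, 65, 45, 12, 13, 16,  68, 30, 30),
  Que   = c(100, 80, 30, 30, 18, 10, 16, 100, 35, 26),
  Kudos = c(100, 75, 29, 22, 16,  9, 19, 100, 29, 22)
)
feats <- as.matrix(dat[, c("Ans", "Acc", "Que", "Kudos")])  # id is not a feature

set.seed(1)  # kmeans starts from random centers, so fix the seed
result <- kmeans(feats, centers = 2, nstart = 10)

# Indexing the centers by the cluster vector repeats each point's own
# center row, giving a matrix with the same shape as `feats`.
own_center <- result$centers[result$cluster, , drop = FALSE]

dat$cluster <- result$cluster
dat$dist    <- sqrt(rowSums((feats - own_center)^2))  # Euclidean distance

# Sort by cluster first, then by distance within each cluster.
ordered <- dat[order(dat$cluster, dat$dist), ]
print(ordered)
```

The cluster numbering (which group is 1 and which is 2) can change between runs, so only the ordering within each cluster is meaningful here.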
Secondly, I tried the seriation approach suggested by another SO user here.
I used these commands:
clus <- kmeans(scale(x, scale = FALSE), centers = 3, iter.max = 50, nstart = 10)
mns <- sapply(split(x, clus$cluster), function(x) mean(unlist(x)))
result <- x[order(order(mns)[clus$cluster]), ]
This seems to produce an ordered list, but if I bind it to the cluster labels (using the following cbind command):
result <- cbind(x[order(order(mns)[clus$cluster]), ],clus$cluster)
I get the following result, which does not appear to be ordered correctly:
id Ans Acc Que Kudos clus
1 3 69 65 30 29 1
2 4 41 45 30 22 1
3 5 10 12 18 16 2
4 6 10 13 10 9 2
5 7 10 16 16 19 2
6 9 36 30 35 29 2
7 10 36 30 26 22 2
8 1 100 100 100 100 1
9 2 85 83 80 75 2
10 8 65 68 100 100 2
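A possible explanation for the odd-looking clus column, offered as a sketch rather than a confirmed diagnosis: the rows are permuted by the order(...) index while clus$cluster is still in the original row order, so the labels no longer line up with the rows. The column reads 1, 1, 2, 2, 2, 2, 2, 1, 2, 2, which is what the labels would be in original row order if points 1, 2 and 8 form one cluster. The snippet below permutes both with the same index; idx, lab, mismatched and aligned are names I made up, and I use centers = 2 to match the output above. Note that this index only groups rows by cluster, ordered by cluster mean; it does not rank points by distance to their center.

```r
# Same subset as in the question; `x` includes the id column.
x <- data.frame(
  id    = 1:10,
  Ans   = c(100, 85, 69, 41, 10, 10, 10,  65, 36, 36),
  Acc   = c(100, 83, 65, 45, 12, 13, 16,  68, 30, 30),
  Que   = c(100, 80, 30, 30, 18, 10, 16, 100, 35, 26),
  Kudos = c(100, 75, 29, 22, 16,  9, 19, 100, 29, 22)
)
feats <- x[, c("Ans", "Acc", "Que", "Kudos")]

set.seed(1)
clus <- kmeans(scale(feats, scale = FALSE), centers = 2, iter.max = 50, nstart = 10)
mns  <- sapply(split(feats, clus$cluster), function(g) mean(unlist(g)))

idx <- order(order(mns)[clus$cluster])  # row permutation used in the question
lab <- unname(clus$cluster)             # labels in original row order

mismatched <- cbind(x[idx, ], clus = lab)       # rows permuted, labels not
aligned    <- cbind(x[idx, ], clus = lab[idx])  # rows and labels permuted together
print(aligned)
```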
I don't want to write commands willy-nilly; I want to understand how the approach works. If anyone could help out or shed some light on this, it would be really great.
EDIT:
As the clusters can be easily plotted, I'd imagine there is a more straightforward way to get and rank the distances between points and the center.
I do not know how to compute the distance from each individual point to its center and compare the results. The centers for the above clusters (when using k = 2) are as follows:
Ans Accep Que Kudos
1 83.33333 83.66667 93.33333 91.66667
2 30.28571 30.14286 23.57143 20.85714
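To make the comparison concrete, here is a small sketch that takes the two printed centers as given and computes the Euclidean distance from every point to every center. The names feats, centers, d and nearest are mine, and the centers are typed in from the rounded output above, so the distances are approximate:

```r
feats <- matrix(
  c(100, 100, 100, 100,
     85,  83,  80,  75,
     69,  65,  30,  29,
     41,  45,  30,  22,
     10,  12,  18,  16,
     10,  13,  10,   9,
     10,  16,  16,  19,
     65,  68, 100, 100,
     36,  30,  35,  29,
     36,  30,  26,  22),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("Ans", "Acc", "Que", "Kudos"))
)

# Centers copied from the printed kmeans output (rounded values).
centers <- rbind(
  c(83.33333, 83.66667, 93.33333, 91.66667),
  c(30.28571, 30.14286, 23.57143, 20.85714)
)

# d[i, j] is the distance from point i to center j.
d <- sapply(seq_len(nrow(centers)), function(j) {
  sqrt(rowSums(sweep(feats, 2, centers[j, ])^2))
})

nearest <- apply(d, 1, which.min)  # index of the closest center per point

# Points assigned to center 1, ranked from closest to farthest.
in_first    <- which(nearest == 1)
rank_first  <- in_first[order(d[in_first, 1])]
print(round(d, 2))
print(rank_first)
```

With these centers, points 1, 2 and 8 fall nearest to the first center and the remaining seven to the second, and within the first group point 2 is closest, followed by points 1 and 8.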
NB:
I don't need to use kmeans, but I want to specify the number of clusters and retrieve an ordered list of points from those clusters.
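As an illustration of what a non-kmeans route might look like, here is a sketch using hierarchical clustering from base R, cut into a fixed number of groups. Ward linkage and the use of group means as "centers" are my assumptions, as are the names hc, grp, centroids and out:

```r
feats <- matrix(
  c(100, 100, 100, 100,   85, 83,  80,  75,   69, 65, 30, 29,
     41,  45,  30,  22,   10, 12,  18,  16,   10, 13, 10,  9,
     10,  16,  16,  19,   65, 68, 100, 100,   36, 30, 35, 29,
     36,  30,  26,  22),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("Ans", "Acc", "Que", "Kudos"))
)

k   <- 2                                        # number of clusters, chosen up front
hc  <- hclust(dist(feats), method = "ward.D2")  # hierarchical clustering
grp <- cutree(hc, k = k)                        # group label (1..k) per point

# Group means play the role of cluster centers: column sums per group
# divided by the group sizes (both are ordered by group label).
centroids <- rowsum(feats, grp) / as.vector(table(grp))

dist_to_center <- sqrt(rowSums((feats - centroids[grp, , drop = FALSE])^2))

out <- data.frame(id = seq_len(nrow(feats)), cluster = grp, dist = dist_to_center)
out <- out[order(out$cluster, out$dist), ]      # by cluster, then by distance
print(out)
```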