I just want to know whether there is a way to perform clustering with the Mahalanobis distance using the cmeans function [in package e1071]?
Many thanks.
The e1071 package does not have a Mahalanobis option. However, you can look into the cluster package and its fanny function. As per the help page, it also computes a fuzzy clustering of the data into k clusters. With this function, you can provide your own distance matrix.
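So for Mahalanobis distance, you can calculate your own distance matrix and then run your clustering. Note that base R's dist has no "mahalanobis" method, so the data are whitened first with the Cholesky factor of the covariance matrix; Euclidean distances on the whitened data are Mahalanobis distances on the original data.
require(cluster)
set.seed(123)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
# Whiten: chol(cov(x)) returns R with t(R) %*% R == cov(x)
z <- x %*% solve(chol(cov(x)))
# Euclidean distance on z equals Mahalanobis distance on x
y <- dist(z)
# fanny() treats a "dist" object as a dissimilarity matrix
fanny(y, k = 2)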
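As a sanity check on a Mahalanobis distance matrix, the pairwise values can be compared with stats::mahalanobis. The following is a minimal sketch, assuming the overall sample covariance cov(x) is the scale matrix you want; the names S, z, d_whiten and d_ref are illustrative only.

```r
set.seed(123)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

S <- cov(x)                      # assumption: overall sample covariance
z <- x %*% solve(chol(S))        # whitened copy of the data
d_whiten <- as.matrix(dist(z))   # Euclidean distances on the whitened data

# Reference: mahalanobis() returns squared distances from each row to `center`
d_ref <- sqrt(mahalanobis(x, center = x[1, ], cov = S))

# Compare the first column of the matrix with the reference distances
all.equal(unname(d_whiten[, 1]), d_ref)
```

If a different covariance estimate is more appropriate for your data (for example a pooled within-cluster estimate), only the definition of S would need to change.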
Given your understandable concerns over equivalence between the two functions, here is an example comparing them on the same data:
require(e1071)
cl<-cmeans(x,centers=2,iter.max=20,dist="euclidean",method="cmeans",m=2)
fl <- fanny(x, k=2, maxit=20, metric="SqEuclidean", memb.exp=2)
> head(cl$membership)
1 2
[1,] 0.9948729 0.005127121
[2,] 0.3647778 0.635222221
[3,] 0.9290126 0.070987385
[4,] 0.7588260 0.241174043
[5,] 0.9282550 0.071745007
[6,] 0.9599231 0.040076886
> head(fl$membership)
[,1] [,2]
[1,] 0.9948722 0.005127775
[2,] 0.3647890 0.635211040
[3,] 0.9290171 0.070982905
[4,] 0.7588304 0.241169649
[5,] 0.9282575 0.071742489
[6,] 0.9599221 0.040077878
Although not absolutely identical, you can see they are very close. You will also notice that fanny is given the squared Euclidean distance, which is what cmeans uses. This equivalence is noted on the fanny help page (?fanny) under metric.
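Rather than comparing the printed rows by eye, the agreement can be summarised numerically. This is a rough sketch, assuming both e1071 and cluster are installed; gap_same and gap_swap are illustrative names, and both column orders are checked because cluster labels are arbitrary and may be swapped between the two functions.

```r
library(e1071)
library(cluster)

set.seed(123)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

cl <- cmeans(x, centers = 2, iter.max = 20, dist = "euclidean",
             method = "cmeans", m = 2)
fl <- fanny(x, k = 2, maxit = 20, metric = "SqEuclidean", memb.exp = 2)

# Largest absolute difference in membership, for both label orderings
gap_same <- max(abs(cl$membership - fl$membership))
gap_swap <- max(abs(cl$membership - fl$membership[, 2:1]))
min(gap_same, gap_swap)

# Cross-tabulate the hard cluster assignments from the two fits
table(cmeans = cl$cluster, fanny = fl$clustering)
```

A small value for the minimum gap, together with a cross-tabulation that has all counts on one diagonal, indicates that the two fits agree up to relabelling.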