I have a dataset called "College". It contains many NA values.
library(mlbench)
library(stats)
College <- read.csv("colleges.XL.csv", header=TRUE)
na.college<- na.omit(College)
row.names(na.college) <- NULL
na.college[, c(4:23)] <- scale(as.matrix(na.college[,c(-1,-2,-3)]))
plot(hc<-hclust(dist(na.college[,c(-1,-2,-3)]),method="complete"),hang=-1)
a <- 11
groups <- cutree(hc, a) # cut tree into "a" clusters
# draw dendrogram with red borders around the "a" clusters
rect.hclust(hc, a, border="red")
# your matrix dimensions have to match with the clustering results
# remove any columns from na.college, as you did for clustering
mat <- na.college
# select the rows that belong to each cluster
cluster_1 <- mat[which(groups==1),]
cluster_2 <- mat[which(groups==2),]
cluster_3 <- mat[which(groups==3),]
cluster_4 <- mat[which(groups==4),]
cluster_5 <- mat[which(groups==5),]
cluster_6 <- mat[which(groups==6),]
cluster_7 <- mat[which(groups==7),]
cluster_8 <- mat[which(groups==8),]
cluster_9 <- mat[which(groups==9),]
cluster_10 <- mat[which(groups==10),]
cluster_11 <- mat[which(groups==11),]
cluster_1<-rbind(cluster_1[, -(1:3)], colMeans(cluster_1[, -(1:3)]))
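Using the standardized data, I built 11 clusters and a data set for each of the 11 clusters. Now I have one observation from the original College data. It has many NAs, but not all of its values are NA. However, its column values are not standardized.
I want to standardize its non-NA values so that I can determine which of the 11 clusters it should belong to.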
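To make the goal concrete, here is a minimal sketch of what I have in mind. It is not a definitive method. It assumes that the means and standard deviations used by scale() are kept (they are stored in the "scaled:center" and "scaled:scale" attributes of its result), and that "belonging" to a cluster can be approximated by the nearest cluster centroid computed over the non-NA columns only. That nearest-centroid rule is my own substitution, since complete-linkage clustering does not define centroids. The data frame raw and the vector new_obs are hypothetical stand-ins for the numeric College columns and for the new observation.

```r
# Hypothetical stand-in for the unscaled numeric columns of na.college;
# the real values would come from colleges.XL.csv.
set.seed(1)
raw <- as.data.frame(matrix(rnorm(200 * 5, mean = 50, sd = 10), ncol = 5))
names(raw) <- paste0("v", 1:5)

# Scale once and keep the centering and scaling values that scale() used.
scaled <- scale(as.matrix(raw))
ctr <- attr(scaled, "scaled:center")
scl <- attr(scaled, "scaled:scale")

# Same clustering steps as in my code above (k plays the role of "a").
k <- 11
hc <- hclust(dist(scaled), method = "complete")
groups <- cutree(hc, k)

# Cluster centroids in standardized units: one row per cluster.
centroids <- apply(scaled, 2, function(col) tapply(col, groups, mean))

# Hypothetical new observation on the original scale, with some NAs.
new_obs <- c(v1 = 62, v2 = NA, v3 = 41, v4 = NA, v5 = 55)

# Standardize with the training means and SDs; NA entries stay NA.
new_scaled <- (new_obs - ctr[names(new_obs)]) / scl[names(new_obs)]

# Euclidean distance to each centroid, using only the non-missing columns.
ok <- !is.na(new_scaled)
d <- sqrt(rowSums(sweep(centroids[, ok, drop = FALSE], 2, new_scaled[ok])^2))

# Assumption: assign the observation to the cluster with the nearest centroid.
assigned <- as.integer(names(which.min(d)))
assigned
```

Because the distance uses only the columns that are present, an observation with very few non-NA values may be assigned on weak evidence, so I am not sure this is the right rule.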
If you have an answer, please let me know.