
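I have a question about zooming in on the clusters found in my dataset. I would like to create as many new matrices as the number of clusters requested from cutree. Specifically, I am not sure how to go back to the data and pull out the subpopulations of interest. I know I can do: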

mycl <- cutree(hr, 2);

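But what then?

Here is what I have so far [full code]:

Suppose you have a matrix "m" that you cluster by rows ("hr") and by columns ("hc"), using distances derived from the correlation matrix.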

m = matrix(0, 10, 5, dimnames = list(c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"), c(1, 2, 3, 4, 5)))
m[1,] = c(0,0,0,0,1)
m[2,] = c(0,0,0,1,1)
m[3,] = c(0,0,1,1,1)
m[4,] = c(0,0,1,1,0)
m[5,] = c(1,0,0,0,0)
m[6,] = c(1,1,1,0,0)
m[7,] = c(0,1,1,0,0)
m[8,] = c(0,1,1,0,0)
m[9,] = c(0,1,1,1,0)
m[10,] = c(1,1,1,0,1)
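# Generates row and column dendrograms from correlation-based distances.
# Note: method="ward" was renamed "ward.D" in R 3.1.0; "ward.D" keeps the original behavior.
hr <- hclust(as.dist(1-cor(t(m), method="pearson")), method="ward.D")
hc <- hclust(as.dist(1-cor(m, method="spearman")), method="ward.D")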

Now I can draw a heatmap of my data:

library(gplots)
# Cut the row dendrogram into 2 clusters and pick one side color per cluster
mycl <- cutree(hr, 2)
mycolhc <- rainbow(length(unique(mycl)), start=0.1, end=0.9)
mycolhc <- mycolhc[as.vector(mycl)]
myheatcol <- redgreen(75)

# Creates heatmap for entire data set
heatmap.2(
           m, 
           Rowv=as.dendrogram(hr), 
           Colv=as.dendrogram(hc), 
           col=myheatcol, 
           scale="row", 
           density.info="none", 
           trace="none", 
           RowSideColors=mycolhc, 
           cexCol=0.6, 
           labRow=NA
           )

Heatmap of the custom toy matrix with clustering

1 Answer

Two things come to mind:

Solution 1:

# Convert to a dendrogram object
hor.dendro <- as.dendrogram(hr)
# Get values for the first branch
m.1 <- m[unlist(hor.dendro[[1]]),]
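
Solution 1 extracts only the first branch. As a minimal sketch, assuming the tree is split at its top-level fork (which corresponds to two groups), the same idea can be applied to every branch, with labels() returning the row names under each one. The names branch.rows and branch.mats are illustrative, and "ward.D" is used as the current name of the question's "ward" method.

```r
# Toy matrix from the question
m <- matrix(c(0,0,0,0,1,
              0,0,0,1,1,
              0,0,1,1,1,
              0,0,1,1,0,
              1,0,0,0,0,
              1,1,1,0,0,
              0,1,1,0,0,
              0,1,1,0,0,
              0,1,1,1,0,
              1,1,1,0,1),
            nrow = 10, byrow = TRUE,
            dimnames = list(LETTERS[1:10], as.character(1:5)))
hr <- hclust(as.dist(1 - cor(t(m), method = "pearson")), method = "ward.D")
hor.dendro <- as.dendrogram(hr)

# Row names under each top-level branch of the dendrogram
branch.rows <- lapply(seq_along(hor.dendro), function(i) labels(hor.dendro[[i]]))

# One sub-matrix per branch; drop = FALSE keeps a matrix even for a single row
branch.mats <- lapply(branch.rows, function(rows) m[rows, , drop = FALSE])
```

Rows in each sub-matrix come back in dendrogram order rather than in the original row order of m.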

Solution 2:

# Cut the tree in 2
tree.cut <- cutree(hr, 2)
# Get the ids for cluster #1
clust.1 <- which(tree.cut==1)
# Get the values from m
m.1 <- m[clust.1,]
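
One caveat with row subsetting of this kind: if a cluster contains a single row, R drops the result to a plain vector. The small sketch below uses a made-up matrix and membership vector (toy and grp are hypothetical, not taken from the question) to show the effect and the drop = FALSE remedy.

```r
# Hypothetical 3-row matrix and cluster assignment, for illustration only
toy <- matrix(1:6, nrow = 3, dimnames = list(c("A", "B", "C"), c("x", "y")))
grp <- c(A = 1, B = 1, C = 2)

# Cluster 2 has a single member, so the default subsetting loses the matrix shape
dropped <- toy[which(grp == 2), ]
# drop = FALSE keeps a 1 x 2 matrix with its row name
kept <- toy[which(grp == 2), , drop = FALSE]

is.matrix(dropped)
is.matrix(kept)
dim(kept)
```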

More generally, you will probably want to use one of the *apply functions.

For example:

clusters <- lapply(unique(tree.cut), function(grp) {
    # drop = FALSE keeps a one-row cluster as a matrix
    m[which(tree.cut == grp), , drop = FALSE]
})

This returns (having called cutree with 2 groups):

[[1]]
  1 2 3 4 5
A 0 0 0 0 1
B 0 0 0 1 1
C 0 0 1 1 1
D 0 0 1 1 0
I 0 1 1 1 0

[[2]]
  1 2 3 4 5
E 1 0 0 0 0
F 1 1 1 0 0
G 0 1 1 0 0
H 0 1 1 0 0
J 1 1 1 0 1

You can access the results with the [[ ]] operator; for example, clusters[[2]] gets the second cluster.
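
As an alternative sketch (not part of the original answer), split() can build the same list with names taken from the cluster ids, so each sub-matrix can be looked up by id. A sub-matrix can then be clustered again to zoom in on it. The names rows.by.cluster, sub and hr.sub are illustrative, and "ward.D" stands in for the question's "ward" method, which was renamed in R 3.1.0.

```r
# Toy matrix from the question
m <- matrix(c(0,0,0,0,1,
              0,0,0,1,1,
              0,0,1,1,1,
              0,0,1,1,0,
              1,0,0,0,0,
              1,1,1,0,0,
              0,1,1,0,0,
              0,1,1,0,0,
              0,1,1,1,0,
              1,1,1,0,1),
            nrow = 10, byrow = TRUE,
            dimnames = list(LETTERS[1:10], as.character(1:5)))
hr <- hclust(as.dist(1 - cor(t(m), method = "pearson")), method = "ward.D")
tree.cut <- cutree(hr, 2)

# Row names grouped by cluster id; the list names are "1", "2", ...
rows.by.cluster <- split(rownames(m), tree.cut)
clusters <- lapply(rows.by.cluster, function(rows) m[rows, , drop = FALSE])

# Zoom in on one cluster and re-cluster its rows (hclust needs at least 2 rows)
sub <- clusters[["2"]]
if (nrow(sub) >= 2) {
  hr.sub <- hclust(as.dist(1 - cor(t(sub), method = "pearson")), method = "ward.D")
}
```

Here clusters[["2"]] is the sub-matrix for cluster 2, and hr.sub could be passed to heatmap.2 as Rowv=as.dendrogram(hr.sub), in the same way hr is used in the question.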

Answered 2013-11-02T07:29:43.230