Is there a simple way to merge multiple hclust objects (or dendrograms) at the root?

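I have made the example as complete as possible to illustrate my problem.

Suppose I want to cluster USArrests by region, then join all the hclust objects and plot them together in one heatmap.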

USArrests
Northeast <- c("Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island", 
"Vermont", "New Jersey", "New York", "Pennsylvania")
Midwest <-  c("Illinois", "Indiana", "Michigan", "Ohio",  "Wisconsin", 
    "Iowa", "Kansas", "Minnesota", "Missouri", "Nebraska", "North Dakota", 
    "South Dakota")
South <- c("Delaware", "Florida", "Georgia", "Maryland", "North Carolina", 
           "South Carolina", "Virginia", "West Virginia", 
           "Alabama", "Kentucky", "Mississippi", "Tennessee", "Arkansas", 
           "Louisiana", "Oklahoma", "Texas")
West <- c("Arizona", "Colorado", "Idaho", "Montana", "Nevada", "New Mexico", 
          "Utah", "Wyoming", "Alaska", "California", "Hawaii", "Oregon", "Washington")

h1 <- hclust(dist(USArrests[Northeast,]))
h2 <- hclust(dist(USArrests[Midwest,]))
h3 <- hclust(dist(USArrests[South,]))
h4 <- hclust(dist(USArrests[West,]))

Now I have four hclust objects (h1 to h4). I usually merge them like this:

hc <- as.hclust(merge(merge(merge(
    as.dendrogram(h1), as.dendrogram(h2)), as.dendrogram(h3)), 
    as.dendrogram(h4)))

Then, to plot them, I have to reorder the matrix to match the merged hclust object before plotting (I added a row annotation to make the plot clearer):

usarr <- USArrests[c(Northeast, Midwest, South, West),]

region_annotation <- data.frame(Region = c(rep("Northeast", length(Northeast)), 
                                rep("Midwest", length(Midwest)),
                                rep("South", length(South)),
                                rep("West", length(West))),
                                row.names = c(Northeast, Midwest, South, West))

library(pheatmap)
pheatmap(usarr, cluster_rows = hc, 
         annotation_row = region_annotation)

Resulting heatmap, with a few extra graphical parameters for aesthetics
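
The reordering step matters because the merged tree refers to rows by position, not by name. A minimal sketch of a sanity check, using two small hand-picked groups of states as a stand-in for the region vectors (the groups and the variable names rows_match and leaf_order are assumptions made for illustration):

```r
# Assumption: two tiny groups stand in for the full region vectors
groups <- list(c("Maine", "Vermont", "New York"),
               c("Iowa", "Ohio", "Kansas"))

# One dendrogram per group, merged at the root
dends <- lapply(groups, function(g) as.dendrogram(hclust(dist(USArrests[g, ]))))
hc <- as.hclust(merge(dends[[1]], dends[[2]]))

# Data rows in the concatenated group order, as in the question
usarr <- USArrests[unlist(groups), ]

# hc$order holds row positions, so the row names should line up with hc$labels
rows_match <- identical(rownames(usarr), hc$labels)
if (!rows_match) warning("row order does not match the merged tree")

# Leaf order of the merged tree, expressed as row names
leaf_order <- rownames(usarr)[hc$order]
```

If the check warns, the heatmap rows and the dendrogram leaves would not correspond to each other.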

In summary: is there a simpler way than merging all the individual hclust objects one by one?

2 Answers

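To build the merged hclust object, you can use <<- together with an environment created by new.env.

There may be other ways to build the merged object in one go without using <<-. Hopefully someone can shed some light on that.

I tried do.call('merge', list(dendrograms of h1, h2, h3, h4)), but it did not work, because an hclust object needs two branches at the top, not four.

Code: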

library('pheatmap')
myenv <- new.env()
myenv$hc <- as.dendrogram( hclust( dist(USArrests[Northeast,])))
invisible( lapply( list( Midwest, South, West), function(x){
  myenv$hc <<- merge( myenv$hc, as.dendrogram( hclust( dist( USArrests[ x, ]) )) )
  NULL
} ) )
myenv$hc <- as.hclust(myenv$hc)

Plot:

pheatmap(usarr, cluster_rows = myenv$hc, 
         annotation_row = region_annotation)
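
As one possible way to avoid both <<- and the explicit environment, the pairwise merge can be sketched with Reduce, which folds merge over a list two dendrograms at a time, so every new root keeps exactly two branches. This is only a sketch: the built-in state.region factor is used here as a stand-in for the four region vectors (it labels the Midwest as "North Central"), and the names groups and dends are assumptions.

```r
# Assumption: state.region stands in for the Northeast/Midwest/South/West vectors
groups <- split(state.name, state.region)

# One dendrogram per region
dends <- lapply(groups, function(g) {
    as.dendrogram(hclust(dist(USArrests[g, ])))
})

# Reduce() calls merge(x, y) pairwise, left to right,
# so the result stays a binary tree that as.hclust() can convert
hc <- as.hclust(Reduce(merge, dends))

# Rows must follow the concatenated group order before plotting
usarr <- USArrests[unlist(groups, use.names = FALSE), ]
```

The resulting hc and usarr can then be passed to pheatmap in the same way as above.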

Answered 2018-03-28T19:56:24.510

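I ended up writing a couple of functions to do this more automatically. (My own version also supports correlation-based "distances", so it is somewhat larger.)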

hclust_semisupervised <- function(data, groups, dist_method = "euclidean",
                                  dist_p = 2, hclust_method = "complete") {
    hclist <- lapply(groups, function (group) {
        hclust(dist(data[group,], method = dist_method, p = dist_p), method = hclust_method)
    })
    hc <- .merge_hclust(hclist)
    data_reordered <- data[unlist(groups),]

    return(list(data = data_reordered, hclust = hc))
}

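.merge_hclust <- function(hclist) {
    # Merge the trees pairwise, left to right, so every node stays binary
    d <- as.dendrogram(hclist[[1]])
    # seq_along(...)[-1] is empty for a single tree, unlike 2:length(hclist)
    for (i in seq_along(hclist)[-1]) {
        d <- merge(d, as.dendrogram(hclist[[i]]))
    }
    as.hclust(d)
}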

With USArrests and the region vectors, I call hclust_semisupervised like this:

semi_hc <- hclust_semisupervised(USArrests, list(Northeast, Midwest, South, West))

Now plot the heatmap:

pheatmap(semi_hc$data, cluster_rows = semi_hc$hclust, 
         annotation_row = region_annotation)
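
The row annotation could be derived from the same list of groups, so that it cannot drift out of sync with the data. A rough sketch, assuming the groups are passed as a named list; make_region_annotation is a hypothetical helper, not part of pheatmap, and state.region again stands in for the region vectors:

```r
# Hypothetical helper: build a pheatmap-style row annotation from a named list of groups
make_region_annotation <- function(groups) {
    stopifnot(!is.null(names(groups)))
    data.frame(
        # Repeat each group name once per member of that group
        Region = rep(names(groups), lengths(groups)),
        row.names = unlist(groups, use.names = FALSE)
    )
}

# Assumption: state.region stands in for the region vectors
groups <- split(state.name, state.region)
region_annotation <- make_region_annotation(groups)
```

The same groups list could then be given to hclust_semisupervised, and region_annotation to annotation_row.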
Answered 2018-03-29T15:35:56.350