I suggest using the cutree function from the dendextend package. It includes a method for dendrograms (namely dendextend:::cutree.dendrogram).
You can learn more about the package from its introductory vignette.
I should add that while your function (classify) is fine, using cutree from dendextend has several advantages:
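1. It also lets you cut at a specific k (number of clusters), not only at h (a specific height).
2. Its result is consistent with what you get from cutree on an hclust object (the result from classify is not).
3. It will often be faster.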
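To make the k versus h distinction concrete, here is a minimal sketch that uses only base R (stats::cutree on an hclust object, no dendextend needed). The variable names by_k, by_h and n_h are just illustrative, and the toy data is the same USArrests set used in the rest of this answer.

```r
# Base R only: stats::cutree() on an hclust object accepts either k or h.
hc <- hclust(dist(USArrests), "ave")

by_k <- cutree(hc, k = 4)    # ask for exactly 4 clusters
by_h <- cutree(hc, h = 70)   # cut the tree at height 70

table(by_k)                  # cluster sizes when cutting by k
n_h <- length(unique(by_h))  # number of clusters implied by h = 70

# Cutting by that implied k should reproduce the height-based cut
identical(cutree(hc, k = n_h), by_h)
```

In other words, h and k are two ways of picking the same kind of cut: a height implies a number of clusters, and asking for that number directly gives the same membership vector.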
Here is an example of using the code:
# Toy data:
hc <- hclust(dist(USArrests), "ave")
dend1 <- as.dendrogram(hc)
# Get the package:
install.packages("dendextend")
library(dendextend)
# Cut the tree:
cutree(dend1,h=70) # it now works on a dendrogram
# It is like using:
dendextend:::cutree.dendrogram(dend1,h=70)
By the way, building on this function, dendextend lets you do more cool things, such as coloring branches/labels based on cutting the dendrogram:
dend1 <- color_branches(dend1, k = 4)
dend1 <- color_labels(dend1, k = 5)
plot(dend1)

Finally, here is some more code to demonstrate my other points:
# This would also work with k:
cutree(dend1,k=4)
# and would give identical result as cutree on hclust:
identical(cutree(hc,h=70) , cutree(dend1,h=70) )
# TRUE
# But this is not the case for classify:
identical(classify(dend1,70) , cutree(dend1,h=70) )
# FALSE
install.packages("microbenchmark")
require(microbenchmark)
microbenchmark(classify = classify(dend1,70),
               cutree = cutree(dend1,h=70) )
# Unit: milliseconds
#      expr      min       lq   median       uq       max neval
#  classify  9.70135  9.94604 10.25400 10.87552  80.82032   100
#    cutree 37.24264 37.97642 39.23095 43.21233 141.13880   100
# 4 times faster for this tree (it will be more for larger trees)
# Although (if to be exact about it) if I force cutree.dendrogram to not go through hclust (which can happen for "weird" trees), the speed will remain similar:
microbenchmark(classify = classify(dend1,70),
               cutree = cutree(dend1,h=70, try_cutree_hclust = FALSE) )
# Unit: milliseconds
#      expr       min        lq    median       uq      max neval
#  classify  9.683433  9.819776  9.972077 10.48497 29.73285   100
#    cutree 10.275839 10.419181 10.540126 10.66863 16.54034   100
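A note on the identical() comparison above: identical() is strict, so two results can describe the same grouping and still compare as FALSE when only the cluster numbers differ. The sketch below is one way to check whether two membership vectors define the same partition regardless of labels. The helper same_partition is hypothetical (it is not part of dendextend or base R), and it assumes both vectors list the items in the same order; whether this explains the difference for classify depends on how that function numbers and orders its output.

```r
# Hypothetical helper (not from dendextend): TRUE when two membership
# vectors describe the same grouping, ignoring the cluster numbers used.
# Assumes a and b list the items in the same order.
same_partition <- function(a, b) {
  stopifnot(length(a) == length(b))
  # Relabel each clustering by order of first appearance, then compare
  identical(match(a, unique(a)), match(b, unique(b)))
}

# Usage with base R only:
hc <- hclust(dist(USArrests), "ave")
a <- cutree(hc, k = 4)
b <- c(4, 3, 2, 1)[a]     # same grouping, cluster numbers permuted
names(b) <- names(a)

identical(a, b)           # FALSE: the labels differ
same_partition(a, b)      # TRUE: the grouping is the same
```

If the two vectors are named but ordered differently, reordering one by the names of the other (for example b[names(a)]) before the comparison would be needed.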
If you are thinking of ways to improve this function, please patch it here:
https://github.com/talgalili/dendextend/blob/master/R/cutree.dendrogram.R
I hope you or others will find this answer helpful.