
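I ran a hierarchical clustering for a project. I have 300 observations for each of 20 variables. I indexed all variables so that each one lies between 0 and 1, with larger values being better.

I created a cluster plot with the following code.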

d_data <- dist(all_data[,-1])
d_data_ind <- dist(data_ind[,-1])
hc_data_ind <- hclust(d_data_ind, method = "complete")
dend<- as.dendrogram(hc_data_ind)
plot(dend)

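Right now the leaf labels are the row names, the numbers 1 to 300 (see the figure above). During the analysis I dropped the first column of the data frame, labelled "Geography" (see the figure below), because it holds city names as text, which would break the analysis. But I really need the city names in the right positions on the cluster plot, because I need to pick a list of cities based on the results.

What code should I write to put the city names from the "Geography" column on this plot, matched to their row names?

As the data frame (figure below) shows, the city names are sorted alphabetically in ascending order, just like the row names. I'm sure putting the city names on the plot isn't hard; I just couldn't find out how by googling and asking around.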

How do I change the node labels? Right now they are numbers, but I need them to be cities.



2 Answers


I think that what you are asking is "how can I decide on the labels in a dendrogram". So this has two parts. For example, let's use the simple data of the numbers c(1,2,5,6)

1) When you create the hclust using dist, it uses the names of the items. And if they don't exist then it uses a running index. For example:

x <- c(1,2,5,6)
d1 <- as.dendrogram(hclust(dist(x)))
plot(d1)


This is obviously a problem, since the items we have are 1, 2, 5, 6 and not 1:4! So how can we fix this? One way is to update the names. For example:

x <- c(1,2,5,6)
names(x) <- x
x
d2 <- as.dendrogram(hclust(dist(x)))
plot(d2)

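Applied to a data frame like the one in the question, the same idea might look like the sketch below. The `cities` data frame, its values, and the column names `var1` and `var2` are made up for illustration; only the `Geography` column name comes from the question. It assumes the city names are unique, since row names must be.

```r
# Hypothetical stand-in for the asker's data: one text column plus numeric indicators.
set.seed(1)
cities <- data.frame(
  Geography = c("Amsterdam", "Berlin", "Copenhagen", "Dublin", "Edinburgh"),
  var1 = runif(5),
  var2 = runif(5),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns, but carry the city names along as row names.
num <- cities[, -1]
rownames(num) <- cities$Geography

# dist() keeps the row names, and hclust() stores them in hc$labels.
hc <- hclust(dist(num), method = "complete")
plot(as.dendrogram(hc))
```

Because the names travel with the rows, no manual reordering is needed here.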

I believe this basically solves your problem (and frankly, doesn't require dendextend). But if you want to update the text AFTER creating the dendrogram - read on:

2) The dendextend package allows you to update the labels of a dendrogram. But you need to make sure you are using the correct order (since the order of the original vector, and that of the labels in the tree are not the same!). Here is how it can be done:

if (!require(dendextend)) install.packages("dendextend")
library(dendextend)
x <- c(1,2,5,6)
d3 <- as.dendrogram(hclust(dist(x)))
labels(d3) <- x[order.dendrogram(d3)]
plot(d3)

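To see why the reordering matters, here is a small base R sketch (no dendextend needed). The input values and the labels "a" to "d" are hypothetical, picked so that the leaf order differs from the input order. It compares the labels stored in the tree with the input names indexed by `order.dendrogram()`.

```r
x <- c(5, 1, 6, 2)
names(x) <- c("a", "b", "c", "d")  # hypothetical labels
d <- as.dendrogram(hclust(dist(x)))

ord <- order.dendrogram(d)  # original positions of the leaves, left to right
ord
labels(d)       # the labels drawn under the leaves, left to right
names(x)[ord]   # the same labels, recovered by reordering the input
```

The last two lines should print the same vector, which is why new labels have to be indexed by `order.dendrogram()` before they are assigned.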

Here is how we would do it for a more complex data object (where we may not want to play with the row names of the object, but to update the dendrogram):

if (!require(dendextend)) install.packages("dendextend")
library(dendextend)
x <- CO2[,4:5]
d4 <- as.dendrogram(hclust(dist(x)))
labels(d4) <- apply(CO2[,1:3], 1, paste, collapse = "_")[order.dendrogram(d4)]

d4 <- set(d4, "labels_cex", 0.6)
d4 <- color_branches(d4, k = 3)
par(mar = c(3,0,0,6))
plot(d4, horiz = TRUE)


answered 2016-04-07T18:56:19.603

You want the original labels instead of IDs? Maybe this helps with your analysis:

data <- USArrests[1:5, ]
data <- cbind(label=row.names(data), data)
row.names(data) <- NULL
d <- dist(data[, -1])
hc <- hclust(d)
plot(hc)
rect.hclust(hc, h=40)


data$label[order.dendrogram(as.dendrogram(hc))]
# [1] "Arkansas"   "Arizona"    "California" "Alabama"    "Alaska"  

clusters <- cutree(hc, h=40)
split(data$label, clusters)
# $`1`
# [1] "Alabama" "Alaska" 
# 
# $`2`
# [1] "Arizona"    "California"
# 
# $`3`
# [1] "Arkansas"

hc$labels <- data$label
plot(hc)

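If the goal is to pick cities by cluster, one possible follow-up is to collect the labels and cluster ids in a single table. This is only a sketch on the same toy data; the `membership` name is made up for illustration.

```r
# Same toy data as above: labels kept in a column, not in the row names.
data <- USArrests[1:5, ]
data <- cbind(label = row.names(data), data)
row.names(data) <- NULL

hc <- hclust(dist(data[, -1]))
hc$labels <- as.character(data$label)  # same row order as the input to dist()

# cutree() returns one cluster id per row, in the original row order.
membership <- data.frame(
  label     = hc$labels,
  cluster   = cutree(hc, h = 40),
  row.names = NULL
)
membership[order(membership$cluster), ]
```

Filtering `membership` on `cluster` then gives the list of labels for any one group.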

PS: I find it helpful to save the dendrogram as a PDF, where you can easily zoom in and out: pdf("my.pdf"); plot(hc); dev.off().

answered 2016-04-06T22:20:47.537