Objective: Create a dendrogram whose branches are colored by a factor variable, with a legend on the plot that maps each branch color to its factor level.
My data frame has several factor (metadata) columns followed by the numeric expression data from which I build the dendrogram:
> cleaned_mayo[1:5,1:20]
patient Source Tissue RIN Diagnosis Gender AgeAtDeath ApoE FLOWCELL PMI N_unmapped N_multimapping N_noFeature N_ambiguous ENSG00000223972
1924_TCX 1924_TCX MayoBrainBank_Dickson TemporalCortex 5.6 Control F 90_or_above 33 AC5R6PACXX 2 2773880 9656114 8225967 2876479 1
1926_TCX 1926_TCX MayoBrainBank_Dickson TemporalCortex 7.8 Control F 88 33 AC44HKACXX 2 2279283 12410116 9503353 3600252 2
1935_TCX 1935_TCX MayoBrainBank_Dickson TemporalCortex 8.6 Control F 88 33 AC5T2GACXX 3 3120169 8650081 9640468 4603751 0
1925_TCX 1925_TCX MayoBrainBank_Dickson TemporalCortex 6.6 Control F 89 33 BC6178ACXX 4 2046886 10627577 7533671 3361385 1
1963_TCX 1963_TCX MayoBrainBank_Dickson TemporalCortex 9.7 Control M 90_or_above 33 AC5T1WACXX 4 1810116 9611375 5343437 2983079 2
ENSG00000227232 ENSG00000278267 ENSG00000243485 ENSG00000274890 ENSG00000237613
1924_TCX 80 7 1 0 0
1926_TCX 113 22 9 0 0
1935_TCX 181 21 2 0 0
1925_TCX 75 9 5 0 0
1963_TCX 73 14 1 0 0
The data dimensions are 161 x 60,739 (patients x columns). With this data I can produce a dendrogram with colored branches but no legend, and a dendrogram with colored labels (not branches) and a legend. I would like to combine the two: colored branches plus a legend.
Creating a dendrogram with colored branches but no legend:
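library(magrittr)    # %>%
library(dendextend)  # assign_values_to_leaves_edgePar(), as.ggdend(), ggplot() method for ggdend
library(ggplot2)

# Create the dendrogram for visualization
dend_expr <- cleaned_mayo[, 15:60739] %>%  # Isolate expression data (columns 15 onward)
  scale %>%                                 # Standardize each column (mean 0, sd 1)
  dist %>%                                  # Euclidean distances between patients
  hclust %>%                                # Complete-linkage hierarchical clustering
  as.dendrogram()
# Reorder the metadata rows to match the left-to-right order of the leaves
tree_labels <- cleaned_mayo[order.dendrogram(dend_expr), ]
# Color the edge leading into each leaf by that patient's diagnosis
dend_expr <- assign_values_to_leaves_edgePar(dend_expr, value = tree_labels$Diagnosis, edgePar = "col") %>%
  as.ggdend()
# Plot dendrogram
ggplot(dend_expr, horiz = TRUE, theme = NULL, labels = FALSE) +
  ggtitle("Mayo Cohort: Hierarchical Clustering of Patients Colored by Diagnosis")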
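If a base-graphics figure is acceptable, one way to get colored branches and a legend together is to set the `edgePar` color of each leaf with `stats::dendrapply()` and then call `legend()`. This is only a sketch under stated assumptions: it uses the built-in `mtcars` data as a stand-in, with `cyl` playing the role of `Diagnosis`, and the palette and the helper `color_leaf_edge()` are hypothetical. As with `assign_values_to_leaves_edgePar()`, only the edge leading into each leaf is colored.

```r
# Stand-in data (assumption for illustration): mtcars rows play the role of
# patients and the factor `cyl` plays the role of Diagnosis.
df  <- mtcars
grp <- factor(df$cyl)

# Cluster the same way as in the question: standardize, distance, hclust.
dend <- as.dendrogram(hclust(dist(scale(df))))

# One color per factor level (hypothetical palette), named by level.
pal <- setNames(c("#1B9E77", "#D95F02", "#7570B3"), levels(grp))

# Color for each leaf, looked up by the leaf label (= row name).
leaf_col <- setNames(pal[as.character(grp)], rownames(df))

# dendrapply() visits every node; a leaf's edgePar sets the color of the
# edge drawn from its parent down to that leaf.
color_leaf_edge <- function(node) {
  if (is.leaf(node)) {
    attr(node, "edgePar") <- list(col = leaf_col[[attr(node, "label")]])
  }
  node
}
dend_col <- dendrapply(dend, color_leaf_edge)

# Draw the tree sideways without leaf labels, then add a legend that maps
# each color back to its factor level.
plot(dend_col, horiz = TRUE, leaflab = "none",
     main = "Hierarchical clustering colored by cyl")
legend("topleft", legend = names(pal), fill = pal, title = "cyl", bty = "n")
```

For the Mayo data, `df` would become the expression columns of `cleaned_mayo` and `grp` would become `cleaned_mayo$Diagnosis`; the leaf labels then come from the row names (the patient IDs shown in the printout).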
Creating a dendrogram with colored labels (not branches) and a legend:
library(magrittr)  # %>%
library(ggdendro)  # dendro_data(), segment(), label(), theme_dendro()
library(ggplot2)

# Create the dendrogram for visualization
dend_expr <- cleaned_mayo[, 15:60739] %>%  # Isolate expression data
  scale %>%                                 # Standardize each column (mean 0, sd 1)
  dist %>%                                  # Euclidean distances between patients
  hclust %>%                                # Hierarchical clustering
  as.dendrogram()
# Extract segment and label coordinates, then attach patient metadata to the labels
tree_labels <- dendro_data(dend_expr, type = "rectangle")
tree_labels$labels <- merge(x = tree_labels$labels, y = cleaned_mayo, by.x = "label", by.y = "patient")
ggplot() +
  geom_segment(data = segment(tree_labels), aes(x = x, y = y, xend = xend, yend = yend)) +
  # hjust is a constant, so it belongs outside aes()
  geom_text(data = label(tree_labels), aes(x = x, y = y, label = label, colour = Diagnosis),
            hjust = 0, size = 3) +
  #geom_point(data = label(tree_labels), aes(x = x, y = y), size = 2, shape = 21) +
  coord_flip() +
  scale_y_reverse(expand = c(0.2, 0)) +
  scale_colour_brewer(palette = "Dark2") +
  theme_dendro() +
  ggtitle("Mayo Cohort: Hierarchical Clustering of Patients Colored by Diagnosis")
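To stay in ggplot2, one possible approach (a sketch, not tested on the Mayo data) is to build on the ggdendro code above and split `segment()` into the edges that touch height 0 (the ones leading into leaves) and the internal edges. The leaf edges are matched to `label()` rows by their x position and mapped to `colour`, so ggplot2 draws the legend automatically. Here `mtcars`/`cyl` stand in for `cleaned_mayo`/`Diagnosis`, and `term`/`inner` are hypothetical names. Internal edges are drawn grey, so coloring whole subtrees that share a level would need extra work.

```r
library(ggplot2)
library(ggdendro)

# Stand-in data (assumption): mtcars rows = patients, cyl = Diagnosis.
df <- mtcars
dd <- dendro_data(as.dendrogram(hclust(dist(scale(df)))), type = "rectangle")

# Leaf positions and labels, with the grouping factor attached by row name.
lab <- label(dd)
lab$cyl <- factor(df[as.character(lab$label), "cyl"])

# In a rectangular dendrogram, the vertical edge leading into a leaf touches
# height 0 directly above that leaf's x position.
seg <- segment(dd)
is_term <- pmin(seg$y, seg$yend) == 0
term <- seg[is_term, ]
term$cyl <- lab$cyl[match(term$x, lab$x)]
inner <- seg[!is_term, ]

p <- ggplot() +
  geom_segment(data = inner, aes(x = x, y = y, xend = xend, yend = yend),
               colour = "grey50") +
  geom_segment(data = term,
               aes(x = x, y = y, xend = xend, yend = yend, colour = cyl)) +
  coord_flip() +
  scale_y_reverse(expand = c(0.2, 0)) +
  scale_colour_brewer(palette = "Dark2") +
  theme_dendro() +
  ggtitle("Leaf branches colored by cyl")
print(p)
```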
Examples of the respective outputs (images not reproduced here): colored branches without a legend; colored labels with a legend.
Any help is appreciated. Thanks!