
I've read a lot of Q&As on sorting the y-axis of a heatmap made in ggplot2 and feel bad writing yet another one, but I cannot get the result I want. (This is probably because I am new to R and only just beginning to get a handle on the terminology and how things work.) Thanks in advance for any help!

I am trying to generate a heatmap for a gene enrichment analysis. My data is imported from a .csv file with the columns Gene, Category, Description, Variable1, Variable2, Variable3. Each row lists a gene, the category the gene falls into (each category contains multiple genes), a description of that category, and a numerical value for each of the three samples (one column per sample).

What I would like to do is order the y-axis by Category while still plotting one row per Gene (and a way to label the categories would be fantastic!). Below is the code I have so far; it appears to order the y-axis alphabetically by gene name.

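library(ggplot2)
library(reshape2)

# Wide table: one row per gene, one column per sample
GO_sum <- read.csv("~/R/FuncEnr/GO_sum.csv", header = TRUE)

# Long format: one row per gene/sample pair (sample name in `variable`, measurement in `value`)
GO_sum.m <- melt(GO_sum, id.vars = c("Gene", "Category", "Description"), na.rm = FALSE)

# Wrapping the assignment in parentheses also prints the plot
(GOplot <- ggplot(GO_sum.m, aes(variable, Gene)) +
    geom_tile(aes(fill = value), colour = "white") +
    scale_fill_gradient2(low = "darkred", high = "darkblue", guide = "colorbar"))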
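One common way to get the Category order, sketched here under assumptions rather than taken from the thread: ggplot2 draws a character column on a discrete axis in alphabetical order, but follows the level order when the column is a factor. Sorting the genes by `Category` and passing that order as the factor levels of `Gene` therefore reorders the y-axis. The small data frame `long` is a hypothetical stand-in for `GO_sum.m`, built from three genes of the example data.

```r
# Hypothetical long-format data mimicking GO_sum.m: three genes from the
# example data, two samples each (values copied from the s1/s2 columns)
long <- data.frame(
  Gene     = rep(c("G0001", "G0014", "G0005"), times = 2),
  Category = rep(c("GO:0000036", "GO:0000160", "GO:0000287"), times = 2),
  variable = rep(c("s1", "s2"), each = 3),
  value    = c(-1.357472549, 0, -1.772268589,
               -1.357472549, -1.772268589, -1.772268589),
  stringsAsFactors = FALSE
)

# Alphabetical order, which is what ggplot2 uses for a character column
sort(unique(long$Gene))   # "G0001" "G0005" "G0014"

# Order the rows by Category first, then by Gene within a category,
# and keep the first occurrence of each gene
ord <- order(long$Category, long$Gene)
gene_levels <- unique(long$Gene[ord])   # "G0001" "G0014" "G0005"

# A discrete y-axis starts at the bottom with the first level, so reverse
# the levels if the first category should appear at the top of the heatmap
long$Gene <- factor(long$Gene, levels = rev(gene_levels))
levels(long$Gene)   # "G0005" "G0014" "G0001"
```

With the full data, the same idea could be applied to `GO_sum.m` just before calling `ggplot()`, e.g. `GO_sum.m$Gene <- factor(GO_sum.m$Gene, levels = rev(unique(GO_sum$Gene[order(GO_sum$Category, GO_sum$Gene)])))`.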

Thank you!

Here is some example data (copy and paste, save as .csv):

Gene,Category,Description,s1,s2,s3
G0001,GO:0000036,acyl carrier activity,-1.357472549,-1.357472549,-0.703587499
G0002,GO:0000103,sulfate assimilation,0,-0.761925294,-1.772268589
G0003,GO:0000104,succinate dehydrogenase activity,-1.192800096,-1.192800096,-1.192800096
G0014,GO:0000160,two-component signal transduction system (phosphorelay),0,-1.772268589,-1.192800096
G0005,GO:0000287,magnesium ion binding,-1.772268589,-1.772268589,-1.192800096
G0006,GO:0000287,magnesium ion binding,-1.192800096,-1.192800096,-1.164082367
G0007,GO:0000287,magnesium ion binding,-1.132072566,-1.772268589,-1.772268589
G0008,GO:0000287,magnesium ion binding,-1.452170577,0,-1.192800096
G0009,GO:0000287,magnesium ion binding,0,-1.772268589,-1.192800096
G0083,GO:0003676,nucleic acid binding,-1.192800096,-1.192800096,-1.772268589
G0044,GO:0003676,nucleic acid binding,-0.587905946,-0.363837338,-0.843984355
G0045,GO:0003676,nucleic acid binding,0.212339083,0.212339083,0.276358685
G0046,GO:0003676,nucleic acid binding,-0.374137972,-0.761925294,-0.761925294
G0147,GO:0003677,DNA binding,0,0,0
G0048,GO:0003677,DNA binding,-1.192800096,0,-1.192800096
G0049,GO:0003677,DNA binding,0.530699113,-0.340270054,-0.485584696
G0050,GO:0003677,DNA binding,-1.192800096,-0.374137972,-0.374137972
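To make the example reproducible without saving a file, a few of the rows above can be typed into R directly. This is a minimal sketch under that assumption: it builds `GO_sum` with `data.frame()` for an arbitrary five-row subset, then produces the same columns as the question's `melt()` call (Gene, Category, Description, variable, value). It swaps in base R's `stack()` for `reshape2::melt()` so that it runs without extra packages.

```r
# A few rows of the example data typed in directly (subset chosen only
# for illustration)
GO_sum <- data.frame(
  Gene        = c("G0001", "G0014", "G0005", "G0006", "G0048"),
  Category    = c("GO:0000036", "GO:0000160", "GO:0000287",
                  "GO:0000287", "GO:0003677"),
  Description = c("acyl carrier activity",
                  "two-component signal transduction system (phosphorelay)",
                  "magnesium ion binding", "magnesium ion binding",
                  "DNA binding"),
  s1 = c(-1.357472549, 0, -1.772268589, -1.192800096, -1.192800096),
  s2 = c(-1.357472549, -1.772268589, -1.772268589, -1.192800096, 0),
  s3 = c(-0.703587499, -1.192800096, -1.192800096, -1.164082367, -1.192800096),
  stringsAsFactors = FALSE
)

# Base-R stand-in for melt(): stack() puts all s1 values first, then s2,
# then s3, so the id columns are repeated three times in row order
samples <- c("s1", "s2", "s3")
stacked <- stack(GO_sum[samples])
GO_sum.m <- data.frame(
  GO_sum[rep(seq_len(nrow(GO_sum)), times = length(samples)),
         c("Gene", "Category", "Description")],
  variable = stacked$ind,     # factor with levels s1, s2, s3
  value    = stacked$values,
  stringsAsFactors = FALSE
)
rownames(GO_sum.m) <- NULL

head(GO_sum.m)
```

If reshape2 is installed, the question's `melt()` call on the same `GO_sum` gives an equivalent long table; `stack()` is used here only to keep the sketch dependency-free.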

1 Answer


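I would suggest splitting your plot with facet_grid() by the column Description, which shows the gene category. With the arguments scales="free_y" and space="free_y", each facet shows only its own genes and the tiles have the same size in every facet. You should, however, use shorter names in Description, because long names don't fit.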

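ggplot(GO_sum.m, aes(variable, Gene)) +
   geom_tile(aes(fill = value), colour = "white") +
   scale_fill_gradient2(low = "darkred", high = "darkblue", guide = "colorbar") +
   # one panel per Description; scales = "free_y" drops unused genes from each panel,
   # and space = "free_y" keeps the tiles the same height in every panel
   facet_grid(Description ~ ., scales = "free_y", space = "free_y")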
answered 2013-06-10T07:49:08.897
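The advice to shorten the Description labels can also be handled by wrapping them instead. A sketch, assuming a current ggplot2 version: `label_wrap_gen()` wraps facet strip text at a given width, and `theme(strip.text.y = element_text(angle = 0))` draws the right-hand strip labels horizontally. The three-gene data frame is a hypothetical subset of the example data, and the width of 20 characters is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical long-format subset of the example data (three genes, three samples)
GO_sum.m <- data.frame(
  Gene        = rep(c("G0001", "G0014", "G0005"), times = 3),
  Description = rep(c("acyl carrier activity",
                      "two-component signal transduction system (phosphorelay)",
                      "magnesium ion binding"), times = 3),
  variable    = rep(c("s1", "s2", "s3"), each = 3),
  value       = c(-1.357472549, 0, -1.772268589,
                  -1.357472549, -1.772268589, -1.772268589,
                  -0.703587499, -1.192800096, -1.192800096),
  stringsAsFactors = FALSE
)

p <- ggplot(GO_sum.m, aes(variable, Gene)) +
  geom_tile(aes(fill = value), colour = "white") +
  scale_fill_gradient2(low = "darkred", high = "darkblue", guide = "colorbar") +
  # One panel per Description, each showing only its own genes, with panel
  # height proportional to the number of genes; long labels are wrapped
  facet_grid(Description ~ ., scales = "free_y", space = "free_y",
             labeller = label_wrap_gen(width = 20)) +
  # Horizontal strip text instead of the default rotated text
  theme(strip.text.y = element_text(angle = 0))

print(p)
```

Faceting on `Category` instead of `Description` would order the panels by GO ID; within each panel, the gene order still follows the factor levels of `Gene`.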