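I have RNA-seq data from a time-course experiment (6 time points) covering tens of thousands of genes. I have used the filter function from the tidyverse to find genes that meet certain criteria (reference genes for qPCR), but I can't work out an easy way to turn these data into plots. As it stands, I would have to completely change the format of the dataset by hand, which would take so long that it is impractical.
The goal is simply to draw one plot per gene showing how expression changes over time in each condition (different leaf pairs, and drought vs. well-watered). I have already done this in Excel for a few genes, but I would like a faster way.
The columns of the dataset are set up like this: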
[1] "gene.id" "LP1.2.02:00.WW" "LP1.2.02:00.WW_1" "LP1.2.02:00.WW_2"
[5] "LP1.2.06:00.WW" "LP1.2.06:00.WW_1" "LP1.2.06:00.WW_2" "LP1.2.10:00.WW"
[9] "LP1.2.10:00.WW_1" "LP1.2.10:00.WW_2" "LP1.2.14:00.WW" "LP1.2.14:00.WW_1"
[13] "LP1.2.14:00.WW_2" "LP1.2.18:00.WW" "LP1.2.18:00.WW_1" "LP1.2.18:00.WW_2"
[17] "LP1.2.22:00.WW" "LP1.2.22:00.WW_1" "LP1.2.22:00.WW_2" "LP3.4.5.02:00.WW"
[21] "LP3.4.5.02:00.WW_1" "LP3.4.5.02:00.WW_2" "LP3.4.5.06:00.WW" "LP3.4.5.06:00.WW_1"
[25] "LP3.4.5.06:00.WW_2" "LP3.4.5.10:00.WW" "LP3.4.5.10:00.WW_1" "LP3.4.5.10:00.WW_2"
[29] "LP3.4.5.14:00.WW" "LP3.4.5.14:00.WW_1" "LP3.4.5.14:00.WW_2" "LP3.4.5.18:00.WW"
[33] "LP3.4.5.18:00.WW_1" "LP3.4.5.18:00.WW_2" "LP3.4.5.22:00.WW" "LP3.4.5.22:00.WW_1"
[37] "LP3.4.5.22:00.WW_2" "LP1.2.02:00.Drought" "LP1.2.02:00.Drought_1" "LP1.2.02:00.Drought_2"
[41] "LP1.2.06:00.Drought" "LP1.2.06:00.Drought_1" "LP1.2.06:00.Drought_2" "LP1.2.10:00.Drought"
[45] "LP1.2.10:00.Drought_1" "LP1.2.10:00.Drought_2" "LP1.2.14:00.Drought" "LP1.2.14:00.Drought_1"
[49] "LP1.2.14:00.Drought_2" "LP1.2.18:00.Drought" "LP1.2.18:00.Drought_1" "LP1.2.18:00.Drought_2"
[53] "LP1.2.22:00.Drought" "LP1.2.22:00.Drought_1" "LP1.2.22:00.Drought_2" "LP3.4.5.02:00.Drought"
[57] "LP3.4.5.02:00.Drought_1" "LP3.4.5.02:00.Drought_2" "LP3.4.5.06:00.Drought" "LP3.4.5.06:00.Drought_1"
[61] "LP3.4.5.06:00.Drought_2" "LP3.4.5.10:00.Drought" "LP3.4.5.10:00.Drought_1" "LP3.4.5.10:00.Drought_2"
[65] "LP3.4.5.14:00.Drought" "LP3.4.5.14:00.Drought_1" "LP3.4.5.14:00.Drought_2" "LP3.4.5.18:00.Drought"
[69] "LP3.4.5.18:00.Drought_1" "LP3.4.5.18:00.Drought_2" "LP3.4.5.22:00.Drought." "LP3.4.5.22:00.Drought"
[73] "LP3.4.5.22:00.Drought_1" "X74" "LP1.2.02:00.WW.mean" "LP1.2.06:00.WW.mean"
[77] "LP1.2.10:00.WW.mean" "LP1.2.14:00.WW.mean" "LP1.2.18:00.WW.mean" "LP1.2.22:00.WW.mean"
[81] "LP1.2.02:00.drought.mean" "LP1.2.06:00.drought.mean" "LP1.2.10:00.drought.mean" "LP1.2.14:00.drought.mean"
[85] "LP1.2.18:00.drought.mean" "LP1.2.22:00.drought.mean" "LP3.4.5.02:00.WW.mean" "LP3.4.5.06:00.WW.mean"
[89] "LP3.4.5.10:00.WW.mean" "LP3.4.5.14:00.WW.mean" "LP3.4.5.18:00.WW.mean" "LP3.4.5.22:00.WW.mean"
[93] "LP3.4.5.02:00.drought.mean" "LP3.4.5.06:00.drought.mean" "LP3.4.5.10:00.drought.mean" "LP3.4.5.14:00.drought.mean"
[97] "LP3.4.5.18:00.drought.mean" "LP3.4.5.22:00.drought.mean"
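There are a lot of column headers and, as you can see, each one encodes the time, the leaf pair, and the condition. So I'm not sure how to turn this into an x~y plot.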
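To illustrate how the three pieces of information could be pulled out of a header, here is a minimal base R sketch. It assumes the `.mean` columns all follow the layout `<leaf pair>.<HH:MM>.<condition>.mean` seen in the listing above; the regular expression and the `parsed` data frame are illustrative choices, not part of the original data.

```r
# A few of the ".mean" column names from the listing above
cols <- c("LP1.2.02:00.WW.mean", "LP1.2.22:00.drought.mean",
          "LP3.4.5.06:00.WW.mean", "LP3.4.5.18:00.drought.mean")

# Assumed layout: leaf pair (LP + digits/dots), time (HH:MM), condition, ".mean"
pattern <- "^(LP[0-9.]+)\\.([0-9]{2}:[0-9]{2})\\.([A-Za-z]+)\\.mean$"

# Extract each capture group into its own column
parsed <- data.frame(
  column    = cols,
  leaf_pair = sub(pattern, "\\1", cols),
  time      = sub(pattern, "\\2", cols),
  condition = sub(pattern, "\\3", cols),
  stringsAsFactors = FALSE
)
print(parsed)
```

Once leaf pair, time, and condition are separate variables, time can go on the x axis and the other two can be mapped to colour or panels.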
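I have a few ideas, including splitting the conditions into separate subsets (LP1.2.WW / LP.1.2.D / LP3.4.5.WW / LP.3.4.5.D), making a subset for the time points (02:00, 06:00, etc.), and trying to build the plots from those.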
# Make a vector of the time points
Time <- c("02:00", "06:00", "10:00", "14:00", "18:00", "22:00")
# Make subsets for each condition (LP1.2.WW / LP.1.2.D / LP3.4.5.WW / LP.3.4.5.D)
# Note: because "gene.id" is included, as.matrix() coerces every column to character
LP1.2.WW.mean <- as.matrix(KG_graph_data[c("LP1.2.02:00.WW.mean",
                                           "LP1.2.06:00.WW.mean",
                                           "LP1.2.10:00.WW.mean",
                                           "LP1.2.14:00.WW.mean",
                                           "LP1.2.18:00.WW.mean",
                                           "LP1.2.22:00.WW.mean",
                                           "gene.id")])
LP.1.2.D.mean <- as.matrix(KG_graph_data[c("LP1.2.02:00.drought.mean",
                                           "LP1.2.06:00.drought.mean",
                                           "LP1.2.10:00.drought.mean",
                                           "LP1.2.14:00.drought.mean",
                                           "LP1.2.18:00.drought.mean",
                                           "LP1.2.22:00.drought.mean",
                                           "gene.id")])
LP345.WW.mean <- as.matrix(KG_graph_data[c("LP3.4.5.02:00.WW.mean",
                                           "LP3.4.5.06:00.WW.mean",
                                           "LP3.4.5.10:00.WW.mean",
                                           "LP3.4.5.14:00.WW.mean",
                                           "LP3.4.5.18:00.WW.mean",
                                           "LP3.4.5.22:00.WW.mean",
                                           "gene.id")])
LP345.D.mean <- as.matrix(KG_graph_data[c("LP3.4.5.02:00.drought.mean",
                                          "LP3.4.5.06:00.drought.mean",
                                          "LP3.4.5.10:00.drought.mean",
                                          "LP3.4.5.14:00.drought.mean",
                                          "LP3.4.5.18:00.drought.mean",
                                          "LP3.4.5.22:00.drought.mean",
                                          "gene.id")])
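As an aside on the subsetting above, the columns for one condition could probably be selected by pattern instead of typing every name. The following is a minimal sketch, assuming the `.mean` headers follow the layout shown in the listing; `toy` is a hypothetical stand-in for `KG_graph_data` with made-up values, and the result is kept as a data frame so the expression columns stay numeric.

```r
# Hypothetical toy stand-in for KG_graph_data (values are made up)
toy <- data.frame(
  gene.id = c("geneA", "geneB"),
  `LP1.2.02:00.WW.mean` = c(1, 2),
  `LP1.2.06:00.WW.mean` = c(3, 4),
  `LP1.2.02:00.drought.mean` = c(5, 6),
  `LP3.4.5.02:00.WW.mean` = c(7, 8),
  check.names = FALSE  # keep the ":" in the column names
)

# Find every LP1.2 well-watered mean column by regular expression
ww_cols <- grep("^LP1\\.2\\.[0-9]{2}:[0-9]{2}\\.WW\\.mean$", names(toy),
                value = TRUE)

# Keep gene.id plus the matched columns; no as.matrix(), so numbers stay numeric
LP1.2.WW <- toy[c("gene.id", ww_cols)]
print(LP1.2.WW)
```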
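I tried to pull one specific gene out of each matrix so that I could maybe draw a plot from it, but it only works when the gene comes from a single matrix, and even then the resulting table contains no data.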
# Attempt to pull one gene out of all four matrices (this is the call that fails)
Total_KgGene007565 <- subset(LP1.2.WW.mean, "gene.id" == "KgGene007565",
                             LP.1.2.D.mean, "gene.id" == "KgGene007565",
                             LP345.WW.mean, "gene.id" == "KgGene007565",
                             LP345.D.mean, "gene.id" == "KgGene007565")
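One likely reason for the empty table: inside `subset()`, `"gene.id" == "KgGene007565"` compares two literal strings, which is always `FALSE`, so no rows are kept. `subset()` also works on a single object per call. The sketch below shows the difference on a hypothetical two-row data frame (`toy`, with made-up values); it is only an illustration, not the real data.

```r
# Hypothetical toy data frame; the values are made up
toy <- data.frame(gene.id = c("KgGene007565", "geneB"),
                  value   = c(1.5, 2.5))

# Quoted name: compares the string "gene.id" with the string "KgGene007565"
quoted_test <- "gene.id" == "KgGene007565"
empty <- subset(toy, "gene.id" == "KgGene007565")

# Unquoted name: compares the gene.id column with the ID, row by row
one_gene <- subset(toy, gene.id == "KgGene007565")

print(quoted_test)
print(nrow(empty))
print(one_gene)
```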
I'm not sure where to go from here, or whether this is the wrong way to approach the problem.
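For reference, one direction that might avoid the manual reformatting is to reshape the `.mean` columns into long format and let ggplot2 draw one panel per gene. This is a minimal sketch under assumptions, not a tested solution for the real dataset: `toy` is a hypothetical stand-in for `KG_graph_data` with made-up values, the headers are assumed to follow `<leaf pair>.<HH:MM>.<condition>.mean`, and the names `long`, `mean_expr`, and `p` are illustrative.

```r
library(tidyr)
library(ggplot2)

# Hypothetical toy stand-in for KG_graph_data; expression values are made up
toy <- data.frame(
  gene.id = c("KgGene007565", "geneB"),
  `LP1.2.02:00.WW.mean` = c(10, 3),
  `LP1.2.06:00.WW.mean` = c(12, 4),
  `LP1.2.02:00.drought.mean` = c(8, 5),
  `LP1.2.06:00.drought.mean` = c(15, 2),
  check.names = FALSE  # keep the ":" in the column names
)

# Wide to long: one row per gene x leaf pair x time x condition
long <- pivot_longer(
  toy,
  cols = ends_with(".mean"),
  names_to = c("leaf_pair", "time", "condition"),
  names_pattern = "^(LP[0-9.]+)\\.([0-9]{2}:[0-9]{2})\\.([A-Za-z]+)\\.mean$",
  values_to = "mean_expr"
)

# One panel per gene, one line per condition and leaf pair
p <- ggplot(long, aes(x = time, y = mean_expr,
                      colour = condition, linetype = leaf_pair,
                      group = interaction(condition, leaf_pair))) +
  geom_line() +
  geom_point() +
  facet_wrap(~ gene.id, scales = "free_y")
```

Calling `print(p)` draws the figure. To plot a single gene, the long table could be filtered first, for example with `subset(long, gene.id == "KgGene007565")`. Note that the replicate columns in the listing spell the condition `Drought` while the mean columns use `drought`, so a pattern written for one set may need adjusting for the other.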