我正在比较来自三种不同组织“肝脏”、“肾脏”和“大脑”的动物的三个年龄“新生儿”、“四岁”和“二十岁”的 RNA-seq 数据。我的colata如下所示。我成功地运行了 DESeq2 工具来分析差异表达的基因。但是,当我使用“plotCounts”和“ggplot2”绘制具有最小 padj 值的差异表达基因时,三个组织之一的基因被单独绘制,两个一起绘制。我无法弄清楚我哪里出错了。如果有人可以查看我的脚本,请建议我将所有样本绘制在一起。提前感谢您的宝贵时间。
pasCts <- "C:/Users/krishna/Desktop/project/featurecounts_9samples/countmatrix.Rmatrix.txt"
pasAnno <- "C:/Users/krishna/Desktop/project/featurecounts_9samples/featurecounts_test.csv"
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="Geneid"))
coldata <- read.csv(pasAnno, row.names=1)
coldata <- coldata[,c("condition","tissue")]
coldata$tissue <- factor(coldata$tissue)
coldata$condition <- factor(coldata$condition)
OUTPUT:
> coldata
condition tissue
SRR306394 "NB" "Liver"
SRR306395 "four" "Liver"
SRR306396 "twenty" "Liver"
SRR306397 "NB" "Kidney"
SRR306398 "four" "Kidney"
SRR306399 "twenty" "Kidney"
SRR306400 "NB" "Brain"
SRR306401 "four" "Brain"
SRR306402 "twenty" "Brain"
##使coldata的行和矩阵(cts)的列的顺序相同:
all(rownames(coldata) %in% colnames(cts))
all(rownames(coldata) == colnames(cts)) ### to check if the order is even
cts <- cts[, rownames(coldata)]
all(rownames(coldata) == colnames(cts))
矩阵的输出(cts):
OUTPUT:
> head(cts)
SRR306394 SRR306395 SRR306396 SRR306397 SRR306398 SRR306399 SRR306400 SRR306401 SRR306402
ENSMUSG00000102693 0 0 0 0 0 0 0 0 0
ENSMUSG00000064842 0 0 0 0 0 0 0 0 0
ENSMUSG00000051951 1 0 0 1 0 0 62 32 22
为数据创建 Deseq2 矩阵对象:
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = coldata,
design = ~ condition)
dds
预过滤 - 在这里我们删除读取计数非常低的行。
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
设置因子
dds$tissue <- factor(dds$tissue, levels = c("liver", "kidney", "brain"))
运行差异表达分析
dds <- DESeq(dds)
res <- results(dds)
res
要获得构建结果表的系数: resultsNames(dds) OUTPUT: 1 "Intercept" "condition_..NB. vs ..four." “条件_..二十。对..四。” [4] “condition_.four. vs ..four.” “条件_.NB。与..四个。” “条件_.二十。对..四。”
是否有可能只获得一个系数“condition_..NB. vs ..four._vs..twenty”?如果是,我应该使用什么代码?
基于 resultsName(dds) 获得的系数的对数倍数变化收缩:
## FOR COEF= AND FOUR
resLFC_20vs4 <- lfcShrink(dds, coef=3, type="apeglm")
resLFC_20vs4
## FOR COEF= NB AND FOUR
resLFC_NBvs4 <- lfcShrink(dds, coef=2, type="apeglm")
resLFC_NBvs4
要按最小 p 值排序我们的结果表:
resOrdered <- res[order(res$pvalue),]
> resOrdered
> summary(resOrdered) ### to summarize some basic tallies using the "summary" function
了解小于 0.1 的调整后 p 值的数量
sum(res$padj < 0.1, na.rm=TRUE)OUTPUT:2472
res05 <- results(dds, alpha=0.05) ###by default alpha = 0.1 but if adjpvalue is other than 0.1 then specify
res05
summary(res05)
sum(res05$padj < 0.05, na.rm=TRUE) ### to know the number of adj p value less than 0.05
运行上述代码后,我尝试使用 ggplot2 绘制具有 min padj 值的基因:
d <- plotCounts(dds, gene=which.min(res$padj), intgroup = "condition", returnData = TRUE) ##plotting reads with minimun padj value
> dput(d)
structure(list(count = c(23099.7389197999, 19548.8369195126,
17799.941667842, 20473.6092655006, 18165.0693569093, 13919.6719941735,
1008.89639856882, 581.070434144846, 576.594165656907), condition = structure(c(2L,
1L, 3L, 2L, 1L, 3L, 5L, 4L, 6L), .Label = c(" \"four\"", " \"NB\"",
" \"twenty\"", "\"four\"", "\"NB\"", "\"twenty\""), class = "factor")), class = "data.frame", row.names = c("SRR306394",
"SRR306395", "SRR306396", "SRR306397", "SRR306398", "SRR306399",
"SRR306400", "SRR306401", "SRR306402"))
library("ggplot2")
ggplot(d, aes(x=condition, y=count))+
geom_point(position=position_jitter(w=0.1,h=0))+
scale_y_log10(breaks=c(25,100,400))
但是该图显示了两个组织样本的基因一起绘制,而第三个组织分别绘制。该图可以在这里看到: custom plotting using ggplot
谁能建议我正确的代码来获得显示所有组织的所有基因的图表?
感谢您的时间。