0

经过一天的谷歌搜索,我决定最好在这里问这个问题。

所以实验是我有来自 3 名患者的大量 RNA seq 数据:A、B、C。他们的 RNA seq 数据是针对预处理、治疗周期 1、治疗周期 2、治疗周期 3 获得的。

所以我总共有 12 个批量 RNA seq 样本:

  • A.PreTreat -> A.Cycle1 -> A.Cycle2 -> A.Cycle3

  • B.PreTreat -> B.Cycle1 -> B.Cycle2 -> B.Cycle3

  • C.PreTreat -> C.Cycle1 -> C.Cycle2 -> C.Cycle3

我想使用 得到不同周期(即周期 3 到预处理,周期 3 到周期 2)之间的差异基因列表model.matrix(), lmFit(), makeContrasts(), contrasts.fit(), eBayes(),所有这些都在 limma 包中。

这是我的最小工作示例。

library(limma)
# Already normalized expression set: rows are genes, columns are the 12 samples
normalized_expression <- matrix(data=sample(1:100), nrow=10, ncol=12)
colnames(normalized_expression) <- c("A.PreTreat", "A.Cycle1", "A.Cycle2", "A.Cycle3", "B.PreTreat", "B.Cycle1", "B.Cycle2", "B.Cycle3", "C.PreTreat", "C.Cycle1", "C.Cycle2", "C.Cycle3")

patient_and_treatment <- factor(colnames(normalized_expression), levels = colnames(normalized_expression))

design.matrix <- model.matrix(~0 + patient_and_treatment)
colnames(design.matrix) <- patient_and_treatment
fit <- lmFit(normalized_expression, design.matrix)

# I want to get a contrast matrix to get differential genes between cycle 3 treatment and pre-treatment in all patients
contrast.matrix <- makeContrasts("A.Cycle3+B.Cycle3+C.Cycle3-A.PreTreat-B.PreTreat-C.PreTreat",
                                 levels = levels(patient_and_treatment))

# Outputs Error of no residual degree of freedom
fit2 <- eBayes( contrasts.fit( fit, contrast.matrix ) )

# Want to run but cannot
summary(decideTests(fit2))

到目前为止,我被困在没有残留的自由度错误上。

我什至不确定这是否是 limma 统计上正确的方法来解决我在所有患者的第 3 周期治疗与预处理之间获取差异基因列表的问题。

任何帮助将不胜感激。

谢谢!

4

1 回答 1

1

每组不能有 1 个观察值,这使得回归毫无意义,因为您将每个数据点拟合到自身。

简而言之,您正在寻找的是在所有患者中观察到的共同效果,例如 Cycle3 与 PreTreat 相比,等等,这样设置模型:

library(limma)

metadata = data.frame(
Patient=gsub("[.][^ ]*","",colnames(normalized_expression)),
Treatment=gsub("^[A-Z][.]*","",colnames(normalized_expression))
)

   Patient    Treatment
1        A PreTreat
2        A   Cycle1
3        A   Cycle2
4        A   Cycle3
5        B PreTreat
6        B   Cycle1
7        B   Cycle2
8        B   Cycle3
9        C PreTreat
10       C   Cycle1
11       C   Cycle2
12       C   Cycle3

现在指定模型矩阵,Patient 项是为了说明患者之间起始水平的差异:

design.matrix <- model.matrix(~0 + Treatment+Patient,data=metadata)
fit <- lmFit(normalized_expression, design.matrix)
contrast.matrix <- makeContrasts(TreatmentCycle3-TreatmentPreTreat,
TreatmentCycle1-TreatmentPreTreat,levels=design.matrix)
fit2 = contrasts.fit(fit, contrast.matrix)
fit2 = eBayes(fit2)

您可以检查系数是否为您提供了您想要的:

fit2$coefficients
       Contrasts
        TreatmentCycle3 - TreatmentPreTreat
   [1,]                           -3.666667
   [2,]                          -13.666667
   [3,]                            1.666667
   [4,]                          -40.666667
   [5,]                           12.000000
   [6,]                          -46.000000
   [7,]                          -32.000000
   [8,]                            4.666667
   [9,]                           11.333333
  [10,]                            5.666667
       Contrasts
        TreatmentCycle1 - TreatmentPreTreat
   [1,]                           -11.33333
   [2,]                           -19.33333
   [3,]                           -27.33333
   [4,]                           -42.33333
   [5,]                            27.33333
   [6,]                           -32.66667
   [7,]                           -33.00000
   [8,]                           -30.66667
   [9,]                            46.00000
  [10,]                            17.33333
于 2020-02-28T09:48:22.617 回答