我想计算一个主方差分量分析,它结合了 PCA 和混合效应线性模型来识别批次效应。(https://www.niehs.nih.gov/research/resources/software/biostatistics/pvca/index.cfm)我已经检查了该网站的 R 脚本(PCVA 也可作为具有几乎相同代码的 R 包提供) 并且可以按照这些步骤进行操作,直到拟合出线性混合效应模型。计算 PCA 后,选择 n 个分量,并为每个模型单独拟合。我附上了第一个comp的数据。
第一个组件的数据如下所示(列:pc_data_matrix = eigenvectors):
import pandas as pd
data = pd.DataFrame(
[x.split("\t") for x in """columnNames treat time batch sample compIdx pc_data_matrix
0 _23_1CON 0 0 1 1 0 0.296379
1 _23_2CON 0 7 1 2 0 -0.310216
2 _23_3NO0 1 0 1 3 0 0.386527
3 _23_4NO7 1 7 1 4 0 -0.147179
4 _26_1CON 0 0 2 5 0 0.253927
5 _26_2CON 0 7 2 6 0 -0.299453
6 _26_3NO0 1 0 2 7 0 0.380901
7 _26_4NO7 1 7 2 8 0 0.065552
8 CONTROL7 0 7 3 9 0 -0.445159
9 CONTROLZ 0 0 3 10 0 0.188493
10 NO_7_5 1 7 3 11 0 -0.290607
11 NO_ZERO 1 0 3 12 0 0.152242
""".split("\n")])
print(data)
然后在 R 中,模型拟合是这样完成的:
randomEffects <- lmer(pc_data_matrix ~ (1|Time) + (1|Treatment) + (1|Batch) + (1|Time:Treatment) + (1|Time:Batch) + (1|Treatment:Batch), Data, REML = TRUE, verbose = FALSE, na.action = na.omit)))
print(summary(randomEffects))
打印出来
Linear mixed model fit by REML ['lmerMod']
Formula: pc_data_matrix ~ (1 | Time) + (1 | Treatment) + (1 | Batch) + (1 | Time:Treatment) + (1 | Time:Batch) + (1 | Treatment:Batch)
Data: Data
REML criterion at convergence: -12.1
Scaled residuals:
Min 1Q Median 3Q Max
-0.6753 -0.3790 -0.1755 0.2801 1.1945
Random effects:
Groups Name Variance Std.Dev.
Treatment:Batch (Intercept) 0.003572 0.05977
Time:Batch (Intercept) 0.001616 0.04020
Time:Treatment (Intercept) 0.006390 0.07994
Batch (Intercept) 0.007885 0.08880
Treatment (Intercept) 0.005668 0.07528
Time (Intercept) 0.128193 0.35804
Residual 0.001809 0.04253
Number of obs: 12, groups: Treatment:Batch, 6; Time:Batch, 6; Time:Treatment, 4; Batch, 3; Treatment, 2; Time, 2
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.01928 0.26865 0.072
然后在python中,我目前正在statsmodels中执行此操作,但我知道这是不正确的,因为我缺少交互(治疗:时间)(时间:批处理)(治疗:批处理)。
import statsmodels.api as sm
import statsmodels.formula.api as smf
#define variance of each random effect as variance component
vcf = {"treat": "treat", "time": "time", "batch": "batch"}
md = smf.mixedlm("pc ~ 1",
data = data,
vc_formula=vcf,
groups=data["compIdx"]) #complete data is a single group
mdf = md.fit()
print(mdf.summary())
谢谢