0

我想计算一个主方差分量分析,它结合了 PCA 和混合效应线性模型来识别批次效应。(https://www.niehs.nih.gov/research/resources/software/biostatistics/pvca/index.cfm)我已经检查了该网站的 R 脚本(PCVA 也可作为具有几乎相同代码的 R 包提供) 并且可以按照这些步骤进行操作,直到拟合出线性混合效应模型。计算 PCA 后,选择 n 个分量,并为每个模型单独拟合。我附上了第一个comp的数据。

第一个组件的数据如下所示(列:pc_data_matrix = eigenvectors):

import pandas as pd 
data = pd.DataFrame(
    [x.split("\t") for x in """columnNames  treat  time  batch  sample  compIdx        pc_data_matrix
0     _23_1CON      0     0      1       1        0  0.296379
1     _23_2CON      0     7      1       2        0 -0.310216
2     _23_3NO0      1     0      1       3        0  0.386527
3     _23_4NO7      1     7      1       4        0 -0.147179
4     _26_1CON      0     0      2       5        0  0.253927
5     _26_2CON      0     7      2       6        0 -0.299453
6     _26_3NO0      1     0      2       7        0  0.380901
7     _26_4NO7      1     7      2       8        0  0.065552
8     CONTROL7      0     7      3       9        0 -0.445159
9     CONTROLZ      0     0      3      10        0  0.188493
10      NO_7_5      1     7      3      11        0 -0.290607
11     NO_ZERO      1     0      3      12        0  0.152242
""".split("\n")])
print(data)

然后在 R 中,模型拟合是这样完成的:

    randomEffects <- lmer(pc_data_matrix ~ (1|Time) + (1|Treatment) + (1|Batch) + (1|Time:Treatment) + (1|Time:Batch) + (1|Treatment:Batch), Data, REML = TRUE, verbose = FALSE, na.action = na.omit)))
  print(summary(randomEffects))

打印出来

    Linear mixed model fit by REML ['lmerMod']
Formula: pc_data_matrix ~ (1 | Time) + (1 | Treatment) + (1 | Batch) +      (1 | Time:Treatment) + (1 | Time:Batch) + (1 | Treatment:Batch)
   Data: Data

REML criterion at convergence: -12.1

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.6753 -0.3790 -0.1755  0.2801  1.1945 

Random effects:
 Groups          Name        Variance Std.Dev.
 Treatment:Batch (Intercept) 0.003572 0.05977 
 Time:Batch      (Intercept) 0.001616 0.04020 
 Time:Treatment  (Intercept) 0.006390 0.07994 
 Batch           (Intercept) 0.007885 0.08880 
 Treatment       (Intercept) 0.005668 0.07528 
 Time            (Intercept) 0.128193 0.35804 
 Residual                    0.001809 0.04253 
Number of obs: 12, groups:  Treatment:Batch, 6; Time:Batch, 6; Time:Treatment, 4; Batch, 3; Treatment, 2; Time, 2

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.01928    0.26865   0.072

然后在python中,我目前正在statsmodels中执行此操作,但我知道这是不正确的,因为我缺少交互(治疗:时间)(时间:批处理)(治疗:批处理)。

import statsmodels.api as sm
import statsmodels.formula.api as smf
#define variance of each random effect as variance component
vcf = {"treat": "treat", "time": "time", "batch": "batch"}
md = smf.mixedlm("pc ~ 1", 
          data = data, 
          vc_formula=vcf,
          groups=data["compIdx"]) #complete data is a single group
            
mdf = md.fit()
print(mdf.summary())

谢谢

4

0 回答 0