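I'm trying to build a model.matrix that sets the dummy variable for a categorical level whenever that level appears in either of a pair of factors. Here is an example: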

group1 <- factor(c("A","A","A","A","B",
                   "B","B","C","C","D"),
                 levels=c("A","B","C","D","E"))

group2 <- factor(c("B","C","D","E","C",
                   "D","E","D","E","E"),
                 levels=levels(group1))

set.seed(8)
val <- rnorm(10,1,.25)
control1 <- rnorm(10,2,.5)

df <- data.frame(group1,
                 group2,
                 val,
                 control1)

This gives 10 rows, one for each of the (5*(5-1)/2) pairs drawn from (A,B,C,D,E):

df
   group1 group2       val control1
1       A      B 0.9788535 1.620103
2       A      C 1.2101000 2.146025
3       A      D 0.8841293 2.210699
4       A      E 0.8622912 1.352755
5       B      C 1.1840101 2.034643
6       B      D 0.9730296 1.593481
7       B      E 0.9574277 2.755427
8       C      D 0.7279171 1.864196
9       C      E 0.2472371 2.779127
10      D      E 0.8517064 1.881325

I want to control for a fixed effect in a linear model whenever a particular level is in group1 or group2. I can build a model matrix for this:

tmp1 <- model.matrix(~ 0+group1,df)
tmp2 <- model.matrix(~ 0+group2,df)

tmp3 <- (tmp1|tmp2)*1

tmp3
   group1A group1B group1C group1D group1E
1        1       1       0       0       0
2        1       0       1       0       0
3        1       0       0       1       0
4        1       0       0       0       1
5        0       1       1       0       0
6        0       1       0       1       0
7        0       1       0       0       1
8        0       0       1       1       0
9        0       0       1       0       1
10       0       0       0       1       1

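A few questions:

Doing it this way doesn't leave me many options as far as other covariates are concerned. How can I construct the dummy variables represented by the model matrix tmp3 and then use them in a call to lm together with other covariates, such as control1?

The idea is that there is a fixed effect for each individual (A, B, C, D, E), regardless of whether it appears in group1 or group2. This seems like a reasonable assumption, but I haven't found any references. Am I missing something obvious, or does this have a common name in statistics?

Thanks for your help.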

2 Answers

Here is a solution using akrun's idea:

group1 <- factor(c("A","A","A","A","B",
                   "B","B","C","C","D"),
                 levels=c("A","B","C","D","E"))

group2 <- factor(c("B","C","D","E","C",
                   "D","E","D","E","E"),
                 levels=levels(group1))

set.seed(8)
val <- rnorm(10,1,.25)
control1 <- rnorm(10,2,.5)

df <- data.frame(group1,
                 group2,
                 val,
                 control1)

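# Build one indicator matrix per factor, OR them together element-wise,
# and multiply by 1 to turn TRUE/FALSE into 1/0
tmpval <- as.data.frame(Reduce('|',lapply(df[1:2], function(group) model.matrix(~0+group)))*1)

# Attach the combined dummies (groupA ... groupE) to the original data
indf <- cbind(df,tmpval)

# No intercept, so all five dummies can stay in the model
mod1 <- lm(val ~ 0+groupA+groupB+groupC+groupD+groupE,
           indf)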

summary(mod1)
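
The question also asks how to bring in other covariates such as control1. A minimal sketch, assuming the same data as above: once the combined dummies are ordinary columns of the data frame, control1 can simply be added to the formula. The names dummies and mod2 are made up for this illustration and do not appear in the original answer.

```r
# Rebuild the example data so the sketch runs on its own
group1 <- factor(c("A","A","A","A","B","B","B","C","C","D"),
                 levels = c("A","B","C","D","E"))
group2 <- factor(c("B","C","D","E","C","D","E","D","E","E"),
                 levels = levels(group1))
set.seed(8)
val <- rnorm(10, 1, .25)
control1 <- rnorm(10, 2, .5)
df <- data.frame(group1, group2, val, control1)

# One column per level: TRUE if the level appears in group1 OR group2
# ("dummies" is a hypothetical name for this sketch)
dummies <- Reduce(`|`, lapply(df[c("group1", "group2")],
                              function(group) model.matrix(~ 0 + group)))
indf <- cbind(df, as.data.frame(dummies * 1))

# Same model as mod1, with control1 added as an ordinary covariate
mod2 <- lm(val ~ 0 + groupA + groupB + groupC + groupD + groupE + control1,
           data = indf)
summary(mod2)
```

With only 10 rows and 6 coefficients this leaves 4 residual degrees of freedom, so the estimates from this toy data should not be over-interpreted.
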
Answered 2015-09-17T06:25:40.793

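I'm not sure whether model.matrix actually offers an option for this, but at least in your example you can rebuild the matrix you're after without much effort.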

model_mat <- data.frame(tmp3[,-1], val = df$val, control1 = df$control1)
lm(val ~ ., data = model_mat)

You need to drop one of the dummies. I've dropped A, but you can of course pick any other level as the reference category.
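
Why a dummy has to go when the model has an intercept can be checked directly. Every row of the combined matrix contains exactly two 1s, so the five columns always sum to 2, which is a multiple of the intercept column. The sketch below rebuilds the data and, as an alternative to making a new data frame, passes the matrix straight into the formula. The names either and fit are made up for this illustration.

```r
# Rebuild the example data so the sketch runs on its own
group1 <- factor(c("A","A","A","A","B","B","B","C","C","D"),
                 levels = c("A","B","C","D","E"))
group2 <- factor(c("B","C","D","E","C","D","E","D","E","E"),
                 levels = levels(group1))
set.seed(8)
val <- rnorm(10, 1, .25)
control1 <- rnorm(10, 2, .5)
df <- data.frame(group1, group2, val, control1)

# Combined "level is in either factor" indicators (tmp3 in the question);
# "either" is a hypothetical name for this sketch
either <- (model.matrix(~ 0 + group1, df) | model.matrix(~ 0 + group2, df)) * 1
colnames(either) <- levels(df$group1)

# Each row has exactly two 1s, so the columns are collinear with an intercept
rowSums(either)

# Drop the reference level (A) and use the matrix directly as a formula term
fit <- lm(val ~ either[, -1] + control1, data = df)
coef(fit)
```

A matrix term in a formula contributes one coefficient per column, so this fits the same model as the data-frame version, only with differently named coefficients.
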

Answered 2015-09-17T06:05:06.560