r - 如何在 lm 中指定参数估计之间的关系？

Question

使用 lm，我想拟合模型： y = b0 + b1*x1 + b2*x2 + b1*b2*x1*x2

我的问题是：如何指定交互作用的系数应该等于主效应系数的乘积？

我已经看到要将系数设置为特定值，您可以使用 offset() 和 I() 但我不知道如何指定系数之间的关系。

这是一个简单的模拟数据集：

n <- 50 # Sample size
x1 <- rnorm(n, 1:n, 0.5) # Independent variable 1
x2 <- rnorm(n, 1:n, 0.5) # Independent variable 2
b0 <- 1 
b1 <- 0.5
b2 <- 0.2
y <- b0 + b1*x1 + b2*x2 + b1*b2*x1*x2 + rnorm(n,0,0.1)

为了拟合模型 1：y = b0 + b1*x1 + b2*x2 + b3*x1*x2，我会使用：

summary(lm(y~ x1 + x2 + x1:x2))

但是如何拟合模型 2：y = b0 + b1*x1 + b2*x2 + b1*b2*x1*x2？

两个模型之间的主要区别之一是要估计的参数数量。在模型 1 中，我们估计 4 个参数：b0（截距）、b1（变量 1 的斜率）、b2（变量 2 的斜率）和 b3（变量 1 和 2 之间相互作用的斜率）。在模型 2 中，我们估计了 3 个参数：b0（截距）、b1（var.1 的斜率和 var.1 和 2 之间相互作用的部分斜率）和 b2（var.2 的斜率和部分斜率） vars. 1 和 2 之间的交互）

我之所以要这样做是因为在调查x1和x2之间是否存在显着交互时，模型2，y = b0 + b1*x1 + b2*x2 + b1*b2*x1*x2，可以更好比 y = b0 + b1*x1 + b2*x2 的空模型。

非常感谢！

玛丽

score 5 · Accepted Answer

Brian provides a way to fit the constrained model you specify but if you're interested in if the unconstrained model fits better than your constrained model you use the delta method to test that hypothesis.

# Let's make some fake data where the constrained model is true
n <- 100
b0 <- 2
b1 <- .2
b2 <- -1.3
b3 <- b1 * b2
sigma <- 1

x1 <- rnorm(n)
# make x1 and x2 correlated for giggles
x2 <- x1 + rnorm(n) 
# Generate data according to the model
y <- b0 + b1*x1 + b2*x2 + b3*x1*x2 + rnorm(n, 0, sigma)

# Fit full model y = b0 + b1*x1 + b2*x3 + b3*x1*x2 + error
o <- lm(y ~ x1 + x2 + x1:x2)

# If we want to do a hypothesis test of Ho: b3 = b1*b2
# this is the same as Ho: b3 - b1*b2 = 0
library(msm)
# Get estimate of the difference specified in the null
est <- unname(coef(o)["x1:x2"] - coef(o)["x1"] * coef(o)["x2"])
# Use the delta method to get a standard error for
# this difference
standerr <- deltamethod(~ x4 - x3*x2, coef(o), vcov(o))

# Calculate a test statistic.  We're relying on asymptotic
# arguments here so hopefully we have a decent sample size
z <- est/standerr
# Calculate p-value
pval <- 2 * pnorm(-abs(z))
pval

I explain what the delta method is used for and more on how to use it in R in this blog post.

Expanding on Brian's answer you could alternatively do this by comparing the full model to the constrained model - however you have to use nls to fit the full model to be able to easily compare the models.

o2 <- nls(y ~ b0 + b1*x1 + b2*x2 + b1*b2*x1*x2, start=list(b0=0, b1=1, b2=1))
o3 <- nls(y ~ b0 + b1*x1 + b2*x2 + b3*x1*x2, start = list(b0 = 0, b1 = 1, b2 = 1, b3 = 1))
anova(o2, o3)

score 5 · Accepted Answer

由于您对系数施加的约束，您指定的模型不是线性模型，因此lm不能用于拟合它。您将需要使用非线性回归，例如nls.

> summary(nls(y ~ b0 + b1*x1 + b2*x2 + b1*b2*x1*x2, start=list(b0=0, b1=1, b2=1)))

Formula: y ~ b0 + b1 * x1 + b2 * x2 + b1 * b2 * x1 * x2

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
b0 0.987203   0.049713   19.86   <2e-16 ***
b1 0.494438   0.007803   63.37   <2e-16 ***
b2 0.202396   0.003359   60.25   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1121 on 47 degrees of freedom

Number of iterations to convergence: 5 
Achieved convergence tolerance: 2.545e-06

当您将模型重写为时，您确实可以看到模型是非线性的

> summary(nls(y ~ b0+(1+b1*x1)*(1+b2*x2)-1, start=list(b0=0, b1=1, b2=1)))

Formula: y ~ b0 + (1 + b1 * x1) * (1 + b2 * x2) - 1

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
b0 0.987203   0.049713   19.86   <2e-16 ***
b1 0.494438   0.007803   63.37   <2e-16 ***
b2 0.202396   0.003359   60.25   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1121 on 47 degrees of freedom

Number of iterations to convergence: 5 
Achieved convergence tolerance: 2.25e-06

score 1 · Accepted Answer

没有办法做你要求的事情lm，也没有理由让它能够做到。你跑来lm估计你的系数。如果您不想估计系数，则不要在模型中包含预测变量。您可以使用coef提取所需的系数，然后将它们相乘。

请注意，忽略交互是一个不同的模型，将产生不同的 b1 和 b2。您也可以保留I(x1 * x2)而不使用系数。

至于您为什么要这样做，没有很好的先验理由证明您的约束模型实际上比简单的加法模型更适合。拥有更多自由参数必然意味着模型拟合得更好，但您没有添加这一点，您添加了一个约束，在现实世界中，它可能会使其拟合得更差。在这种情况下，您是否认为它是与包括交互在内的模型进行比较的更好的“基线”？

r - 如何在 lm 中指定参数估计之间的关系？

3 回答 3

Related

Reference