我想在 R 中运行两阶段概率最小二乘回归。有人知道该怎么做吗?外面有包吗?我知道可以使用 Stata 来做到这一点,所以我想用 R 来做到这一点是可能的。
问问题
6806 次
3 回答
8
当您说“两阶段概率最小二乘法”时,您可能想要更具体。由于您指的是实现此功能的 Stata 程序,我猜您是在谈论 CDSIMEQ 程序包,该程序包实现了 Heckit 模型(又名广义 Tobit,又名 Tobit II 型模型等)的 Amemiya (1978) 程序。正如格兰特所说,systemfit 会为你做一个 Tobit,但不是用两个方程。MicEcon 包确实有一个 Heckit(但包已经分裂了很多次,我不知道它现在在哪里)。
如果您想要 CDSIMEQ 的功能,可以在 R 中轻松实现。我编写了一个复制 CDSIMEQ 的函数:
tspls <- function(formula1, formula2, data) {
# The Continous model
mf1 <- model.frame(formula1, data)
y1 <- model.response(mf1)
x1 <- model.matrix(attr(mf1, "terms"), mf1)
# The dicontionous model
mf2 <- model.frame(formula2, data)
y2 <- model.response(mf2)
x2 <- model.matrix(attr(mf2, "terms"), mf2)
# The matrix of all the exogenous variables
X <- cbind(x1, x2)
X <- X[, unique(colnames(X))]
J1 <- matrix(0, nrow = ncol(X), ncol = ncol(x1))
J2 <- matrix(0, nrow = ncol(X), ncol = ncol(x2))
for (i in 1:ncol(x1)) J1[match(colnames(x1)[i], colnames(X)), i] <- 1
for (i in 1:ncol(x2)) J2[match(colnames(x2)[i], colnames(X)), i] <- 1
# Step 1:
cat("\n\tNOW THE FIRST STAGE REGRESSION")
m1 <- lm(y1 ~ X - 1)
m2 <- glm(y2 ~ X - 1, family = binomial(link = "probit"))
print(summary(m1))
print(summary(m2))
yhat1 <- m1$fitted.values
yhat2 <- X %*% coef(m2)
PI1 <- m1$coefficients
PI2 <- m2$coefficients
V0 <- vcov(m2)
sigma1sq <- sum(m1$residuals ^ 2) / m1$df.residual
sigma12 <- 1 / length(y2) * sum(y2 * m1$residuals / dnorm(yhat2))
# Step 2:
cat("\n\tNOW THE SECOND STAGE REGRESSION WITH INSTRUMENTS")
m1 <- lm(y1 ~ yhat2 + x1 - 1)
m2 <- glm(y2 ~ yhat1 + x2 - 1, family = binomial(link = "probit"))
sm1 <- summary(m1)
sm2 <- summary(m2)
print(sm1)
print(sm2)
# Step 3:
cat("\tNOW THE SECOND STAGE REGRESSION WITH CORRECTED STANDARD ERRORS\n\n")
gamma1 <- m1$coefficients[1]
gamma2 <- m2$coefficients[1]
cc <- sigma1sq - 2 * gamma1 * sigma12
dd <- gamma2 ^ 2 * sigma1sq - 2 * gamma2 * sigma12
H <- cbind(PI2, J1)
G <- cbind(PI1, J2)
XX <- crossprod(X) # X'X
HXXH <- solve(t(H) %*% XX %*% H) # (H'X'XH)^(-1)
HXXVXXH <- t(H) %*% XX %*% V0 %*% XX %*% H # H'X'V0X'XH
Valpha1 <- cc * HXXH + gamma1 ^ 2 * HXXH %*% HXXVXXH %*% HXXH
GV <- t(G) %*% solve(V0) # G'V0^(-1)
GVG <- solve(GV %*% G) # (G'V0^(-1)G)^(-1)
Valpha2 <- GVG + dd * GVG %*% GV %*% solve(XX) %*% solve(V0) %*% G %*% GVG
ans1 <- coef(sm1)
ans2 <- coef(sm2)
ans1[,2] <- sqrt(diag(Valpha1))
ans2[,2] <- sqrt(diag(Valpha2))
ans1[,3] <- ans1[,1] / ans1[,2]
ans2[,3] <- ans2[,1] / ans2[,2]
ans1[,4] <- 2 * pt(abs(ans1[,3]), m1$df.residual, lower.tail = FALSE)
ans2[,4] <- 2 * pnorm(abs(ans2[,3]), lower.tail = FALSE)
cat("Continuous:\n")
print(ans1)
cat("Dichotomous:\n")
print(ans2)
}
为了比较,我们可以复制 CDSIMEQ 作者关于包的文章中的示例。
> library(foreign)
> cdsimeq <- read.dta("http://www.stata-journal.com/software/sj3-2/st0038/cdsimeq.dta")
> tspls(continuous ~ exog3 + exog2 + exog1 + exog4,
+ dichotomous ~ exog1 + exog2 + exog5 + exog6 + exog7,
+ data = cdsimeq)
NOW THE FIRST STAGE REGRESSION
Call:
lm(formula = y1 ~ X - 1)
Residuals:
Min 1Q Median 3Q Max
-1.885921 -0.438579 -0.006262 0.432156 2.133738
Coefficients:
Estimate Std. Error t value Pr(>|t|)
X(Intercept) 0.010752 0.020620 0.521 0.602187
Xexog3 0.158469 0.021862 7.249 8.46e-13 ***
Xexog2 -0.009669 0.021666 -0.446 0.655488
Xexog1 0.159955 0.021260 7.524 1.19e-13 ***
Xexog4 0.316575 0.022456 14.097 < 2e-16 ***
Xexog5 0.497207 0.021356 23.282 < 2e-16 ***
Xexog6 -0.078017 0.021755 -3.586 0.000352 ***
Xexog7 0.161177 0.022103 7.292 6.23e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6488 on 992 degrees of freedom
Multiple R-squared: 0.5972, Adjusted R-squared: 0.594
F-statistic: 183.9 on 8 and 992 DF, p-value: < 2.2e-16
Call:
glm(formula = y2 ~ X - 1, family = binomial(link = "probit"))
Deviance Residuals:
Min 1Q Median 3Q Max
-2.49531 -0.59244 0.01983 0.59708 2.41810
Coefficients:
Estimate Std. Error z value Pr(>|z|)
X(Intercept) 0.08352 0.05280 1.582 0.113692
Xexog3 0.21345 0.05678 3.759 0.000170 ***
Xexog2 0.21131 0.05471 3.862 0.000112 ***
Xexog1 0.45591 0.06023 7.570 3.75e-14 ***
Xexog4 0.39031 0.06173 6.322 2.57e-10 ***
Xexog5 0.75955 0.06427 11.818 < 2e-16 ***
Xexog6 0.85461 0.06831 12.510 < 2e-16 ***
Xexog7 -0.16691 0.05653 -2.953 0.003152 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1386.29 on 1000 degrees of freedom
Residual deviance: 754.14 on 992 degrees of freedom
AIC: 770.14
Number of Fisher Scoring iterations: 6
NOW THE SECOND STAGE REGRESSION WITH INSTRUMENTS
Call:
lm(formula = y1 ~ yhat2 + x1 - 1)
Residuals:
Min 1Q Median 3Q Max
-2.32152 -0.53160 0.04886 0.53502 2.44818
Coefficients:
Estimate Std. Error t value Pr(>|t|)
yhat2 0.257592 0.021451 12.009 <2e-16 ***
x1(Intercept) 0.012185 0.024809 0.491 0.623
x1exog3 0.042520 0.026735 1.590 0.112
x1exog2 0.011854 0.026723 0.444 0.657
x1exog1 0.007773 0.028217 0.275 0.783
x1exog4 0.318636 0.028311 11.255 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7803 on 994 degrees of freedom
Multiple R-squared: 0.4163, Adjusted R-squared: 0.4128
F-statistic: 118.2 on 6 and 994 DF, p-value: < 2.2e-16
Call:
glm(formula = y2 ~ yhat1 + x2 - 1, family = binomial(link = "probit"))
Deviance Residuals:
Min 1Q Median 3Q Max
-2.49610 -0.58595 0.01969 0.59857 2.41281
Coefficients:
Estimate Std. Error z value Pr(>|z|)
yhat1 1.26287 0.16061 7.863 3.75e-15 ***
x2(Intercept) 0.07080 0.05276 1.342 0.179654
x2exog1 0.25093 0.06466 3.880 0.000104 ***
x2exog2 0.22604 0.05389 4.194 2.74e-05 ***
x2exog5 0.12912 0.09510 1.358 0.174544
x2exog6 0.95609 0.07172 13.331 < 2e-16 ***
x2exog7 -0.37128 0.06759 -5.493 3.94e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1386.29 on 1000 degrees of freedom
Residual deviance: 754.21 on 993 degrees of freedom
AIC: 768.21
Number of Fisher Scoring iterations: 6
NOW THE SECOND STAGE REGRESSION WITH CORRECTED STANDARD ERRORS
Continuous:
Estimate Std. Error t value Pr(>|t|)
yhat2 0.25759209 0.1043073 2.46955009 0.01369540
x1(Intercept) 0.01218500 0.1198713 0.10165068 0.91905445
x1exog3 0.04252006 0.1291588 0.32920764 0.74206810
x1exog2 0.01185438 0.1290754 0.09184073 0.92684309
x1exog1 0.00777347 0.1363643 0.05700519 0.95455252
x1exog4 0.31863627 0.1367881 2.32941597 0.02003661
Dichotomous:
Estimate Std. Error z value Pr(>|z|)
yhat1 1.26286574 0.7395166 1.7076909 0.0876937093
x2(Intercept) 0.07079775 0.2666447 0.2655134 0.7906139867
x2exog1 0.25092561 0.3126763 0.8025092 0.4222584495
x2exog2 0.22603717 0.2739307 0.8251618 0.4092797527
x2exog5 0.12911922 0.4822986 0.2677163 0.7889176766
x2exog6 0.95609385 0.2823662 3.3860070 0.0007091758
x2exog7 -0.37128221 0.3265478 -1.1369920 0.2555416141
于 2010-10-04T15:48:01.013 回答
2
于 2010-10-02T01:25:54.207 回答
2
systemfit也可以解决问题。
于 2010-10-02T01:46:16.317 回答