对于正在考虑跳上这辆马车的人,这里有一些代码演示了这个问题(数据在这里):
clear
set more off
capture ssc install bcuse
capture ssc install rsource
bcuse affairs
saveold affairs, version(12) replace
rsource, terminator(XXX)
library("foreign")
library("lmtest")
library("sandwich")
mydata<-read.dta("affairs.dta")
probit1<-glm(affair ~ male + age + yrsmarr + kids + relig + educ + ratemarr, family = binomial (link = "probit"), data = mydata)
sandwich1 <- function(object,...) sandwich(object) * nobs(object)/(nobs(object) - 1)
coeftest(probit1,vcov = sandwich1)
XXX
probit affair male age yrsmarr kids relig educ ratemarr, robust cformat(%9.6f) nolog
R给出:
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.764157 0.546692 1.3978 0.1621780
male 0.188816 0.133260 1.4169 0.1565119
age -0.024400 0.011423 -2.1361 0.0326725 *
yrsmarr 0.054608 0.019025 2.8703 0.0041014 **
kids 0.208072 0.168222 1.2369 0.2161261
relig -0.186085 0.053968 -3.4480 0.0005647 ***
educ 0.015506 0.026389 0.5876 0.5568012
ratemarr -0.272711 0.053668 -5.0814 3.746e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
统计产量:
Probit regression Number of obs = 601
Wald chi2(7) = 54.93
Prob > chi2 = 0.0000
Log pseudolikelihood = -305.2525 Pseudo R2 = 0.0961
------------------------------------------------------------------------------
| Robust
affair | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | 0.188817 0.131927 1.43 0.152 -0.069755 0.447390
age | -0.024400 0.011124 -2.19 0.028 -0.046202 -0.002597
yrsmarr | 0.054608 0.018963 2.88 0.004 0.017441 0.091775
kids | 0.208075 0.166243 1.25 0.211 -0.117754 0.533905
relig | -0.186085 0.053240 -3.50 0.000 -0.290435 -0.081736
educ | 0.015505 0.026355 0.59 0.556 -0.036150 0.067161
ratemarr | -0.272710 0.053392 -5.11 0.000 -0.377356 -0.168064
_cons | 0.764160 0.534335 1.43 0.153 -0.283117 1.811437
------------------------------------------------------------------------------
附录:
系数的协方差估计的差异是由于不同的拟合算法造成的。在 R 中,该glm
命令使用迭代最小二乘法,而 Stataprobit
使用基于 Newton-Raphson 算法的 ML 方法。glm
您可以使用以下选项匹配 R在 Stata中所做的事情irls
:
glm affair male age yrsmarr kids relig educ ratemarr, irls family(binomial) link(probit) robust
这产生:
Generalized linear models No. of obs = 601
Optimization : MQL Fisher scoring Residual df = 593
(IRLS EIM) Scale parameter = 1
Deviance = 610.5049916 (1/df) Deviance = 1.029519
Pearson = 619.0405832 (1/df) Pearson = 1.043913
Variance function: V(u) = u*(1-u) [Bernoulli]
Link function : g(u) = invnorm(u) [Probit]
BIC = -3183.862
------------------------------------------------------------------------------
| Semirobust
affair | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | 0.188817 0.133260 1.42 0.157 -0.072367 0.450002
age | -0.024400 0.011422 -2.14 0.033 -0.046787 -0.002012
yrsmarr | 0.054608 0.019025 2.87 0.004 0.017319 0.091897
kids | 0.208075 0.168222 1.24 0.216 -0.121634 0.537785
relig | -0.186085 0.053968 -3.45 0.001 -0.291862 -0.080309
educ | 0.015505 0.026389 0.59 0.557 -0.036216 0.067226
ratemarr | -0.272710 0.053668 -5.08 0.000 -0.377898 -0.167522
_cons | 0.764160 0.546693 1.40 0.162 -0.307338 1.835657
------------------------------------------------------------------------------
这些将是接近的,尽管不相同。我不知道如何让 R 在没有大量工作的情况下使用像 NR 这样的东西。