python - Different t-test p-values between R and Python

Question

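I'm new to Python and am trying to learn more about propensity score matching. I found a great tutorial from Stanford.edu that covers it (this is my first post, so Stack Overflow won't let me post two links, but you can Google "Stanford propensity score matching"). My goal is to recreate all of it in Python and understand what is happening.

My problem starts at Section 1.2, "Difference-in-means: pre-treatment covariates", when I begin running t-tests. I don't understand why the p-values differ so much between R and Python for the same test on the same data.

R code: with(ecls, t.test(race_white ~ catholic, var.equal=FALSE))

Output: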

Welch Two Sample t-test

data:  race_white by catholic
t = -13.453, df = 2143.3, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.1936817 -0.1444003
sample estimates:
mean in group 0 mean in group 1 
      0.5561246       0.7251656

When I do the same thing in Python, my t-statistic and degrees of freedom match, but my p-value is far off.

Python code:

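import pandas as pd
import statsmodels.stats.api as sms

dat = pd.read_csv("ecls.csv")  # assumed; the post does not show how dat was loaded
cath = dat[dat['catholic'] == 1]['race_white']
noncath = dat[dat['catholic'] == 0]['race_white']
# Welch t-test (unequal variances); returns (t-statistic, p-value, df)
fina = sms.ttest_ind(noncath, cath, alternative='two-sided', usevar='unequal')
print(fina)
print("The t-statistic is %.3f the p-value is %.3f and the df is %.3f" % fina)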

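Python output:

(-13.45342570302274, 1.1413329198468439e-39, 2143.2902027156415)
The t-statistic is -13.453 the p-value is 0.000 and the df is 2143.290

I'm using exactly the same dataset and can't figure out why the two results differ. I saw another similar SO thread, but the conclusion there was that the sample sizes were different. Here the dataset is the same, so the sizes don't differ.

The data file (ecls.csv) used for both Python and R can be found here. Any help with why the p-values for this t-test differ would be much appreciated.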

score 0 · Accepted Answer

R does not print p-values below 2.2e-16, but it does compute and store them. Try this with your R code:

with(ecls, t.test(race_white ~ catholic, var.equal=FALSE))$p.value
[1] 1.141333e-39
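
The 2.2e-16 cutoff is double-precision machine epsilon, which R's format.pval uses as its default eps when printing a p-value. As a rough illustration only, here is a Python sketch of that display rule. The helper name format_pval is hypothetical (it is not part of any Python library), and the p-value is the one copied from the Python output in the question:

```python
import sys

# Machine epsilon for IEEE 754 doubles; this is the 2.2e-16 in R's printout.
EPS = sys.float_info.epsilon


def format_pval(p, eps=EPS):
    """Hypothetical helper: display anything below eps as '< eps'."""
    if p < eps:
        return "< %.1e" % eps
    return "%.4g" % p


# p-value copied from the statsmodels output in the question
p_value = 1.1413329198468439e-39

print(format_pval(p_value))  # the thresholded display form
print(p_value)               # the value actually stored
```

The stored number never changes; only its printed form is thresholded, which is why extracting $p.value in R recovers the full value.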

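The value 1.141333e-39 is effectively zero, which is why you see 0.000 when Python prints it to 3 decimal places. Try printing the unmodified p-value in Python instead of using %.3f (in fact you already did this with print(fina)), and I expect you'll see the same value as in R (in fact, you did!).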
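
To see the rounding effect directly, the sketch below formats the result tuple from the question once with fixed-point notation and once with scientific notation for the p-value. The tuple values are copied from the question's output, so no data file or statistics library is needed:

```python
# (t-statistic, p-value, degrees of freedom) as printed in the question
fina = (-13.45342570302274, 1.1413329198468439e-39, 2143.2902027156415)

# Fixed-point with 3 decimals rounds any p-value below 0.0005 to 0.000
fixed = "t = %.3f, p = %.3f, df = %.3f" % fina

# Scientific notation for the p-value keeps its magnitude visible
sci = "t = %.3f, p = %.3e, df = %.3f" % fina

print(fixed)
print(sci)
```

Using %.3e (or %g) for the p-value is one simple option when very small values are expected; the t-statistic and degrees of freedom can stay in fixed-point.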
