python - Why do t-tests in Python (scipy, statsmodels) give different results from R, Stata, or Excel?

Question

(Problem solved: x, y and s1, s2 were of different sizes.)

In R:

x <- c(373,398,245,272,238,241,134,410,158,125,198,252,577,272,208,260)
y <- c(411,471,320,364,311,390,163,424,228,144,246,371,680,384,279,303)
t.test(x,y)
t = -1.6229, df = 29.727, p-value = 0.1152

Stata and Excel give the same numbers.

t.test(x,y,alternative="less")
t = -1.6229, df = 29.727, p-value = 0.05758

No matter which options I try, I cannot reproduce these results with statsmodels.stats.weightstats.ttest_ind or scipy.stats.ttest_ind.

statsmodels.stats.weightstats.ttest_ind(s1,s2,alternative="two-sided",usevar="unequal")
(-1.8912081781378358, 0.066740317997990656, 35.666557473974343)

scipy.stats.ttest_ind(s1,s2,equal_var=False)
(array(-1.8912081781378338), 0.066740317997990892)

scipy.stats.ttest_ind(s1,s2,equal_var=True)
(array(-1.8912081781378338), 0.066664507499812745)

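There must be thousands of people using Python to compute t-tests. Are we all getting incorrect results? (I usually rely on Python, but this time I checked my results with Stata.)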

score 5 · Accepted Answer

This is what I get with the default, which assumes equal variances:

>>> x_ = (373,398,245,272,238,241,134,410,158,125,198,252,577,272,208,260)
>>> y_ = (411,471,320,364,311,390,163,424,228,144,246,371,680,384,279,303)

>>> from scipy import stats
>>> stats.ttest_ind(x_, y_)
(array(-1.62292672368488), 0.11506840827144681)

>>> import statsmodels.api as sm
>>> sm.stats.ttest_ind(x_, y_)
(-1.6229267236848799, 0.11506840827144681, 30.0)

And with unequal variances:

>>> sm.stats.ttest_ind(x_, y_, alternative="two-sided", usevar="unequal")
(-1.6229267236848799, 0.11516398707890187, 29.727196553288369)
>>> stats.ttest_ind(x_, y_, equal_var=False)
(array(-1.62292672368488), 0.11516398707890187)
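
The one-sided R call from the question, t.test(x,y,alternative="less"), can be reproduced too. A minimal sketch, assuming SciPy 1.6 or newer (where ttest_ind accepts an `alternative` keyword); the variable names are only illustrative:

```python
from scipy import stats

x = [373, 398, 245, 272, 238, 241, 134, 410, 158, 125, 198, 252, 577, 272, 208, 260]
y = [411, 471, 320, 364, 311, 390, 163, 424, 228, 144, 246, 371, 680, 384, 279, 303]

# Welch's t-test (unequal variances), two-sided, as in R's default t.test(x, y).
two_sided = stats.ttest_ind(x, y, equal_var=False)

# One-sided version: the alternative hypothesis is mean(x) < mean(y).
# Assumption: SciPy >= 1.6, which introduced the `alternative` keyword.
one_sided = stats.ttest_ind(x, y, equal_var=False, alternative="less")

t_less, p_less = one_sided.statistic, one_sided.pvalue
print(t_less, p_less)  # R reports t = -1.6229, p-value = 0.05758
```

Because the t statistic is negative, the one-sided p-value here is half of the two-sided one.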

score 3 · Accepted Answer

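The short answer is that the t-tests provided in Python give the same results as R and Stata; your Python arrays simply contained extra elements.

I would not count on the robustness of Excel, though.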
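
A quick way to spot this kind of mismatch is to look at the degrees of freedom. The Welch-Satterthwaite df can never exceed n1 + n2 - 2, so the df of 35.67 reported in the question cannot come from two 16-element samples. A rough sketch using only the standard library (the helper name `welch_df` is made up for illustration):

```python
from statistics import variance

def welch_df(a, b):
    """Welch-Satterthwaite degrees of freedom for two samples."""
    va = variance(a) / len(a)  # sample variance (n - 1 denominator) over n
    vb = variance(b) / len(b)
    return (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))

x = [373, 398, 245, 272, 238, 241, 134, 410, 158, 125, 198, 252, 577, 272, 208, 260]
y = [411, 471, 320, 364, 311, 390, 163, 424, 228, 144, 246, 371, 680, 384, 279, 303]

df = welch_df(x, y)
max_df = len(x) + len(y) - 2  # upper bound on the Welch df for these sizes

print(len(x), len(y), df)  # R reports df = 29.727 for x and y

# df reported by statsmodels for s1, s2 in the question
reported_df = 35.666557473974343
# True: s1 and s2 must have held at least 38 values in total, not 32
print(reported_df > max_df)
```

Comparing len() of the arrays passed to each program is the simplest check of all; the df bound is useful when only the printed output is at hand.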
