python - Reshaping a DataFrame for a statsmodels t-test

Question

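I'm trying to run a t-test in pandas/statsmodels to compare the difference in performance between two groups, but I'm having trouble getting the data into a shape that statsmodels can use (in a reasonable way).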

My pandas DataFrame currently looks like this:

Treatment      Performance
a              2
b              3
a              2
a              1
b              0

My understanding is that to run the t-test I need the data organized by treatment, like this:

TreatmentA    TreatmentB
2             3
2             0
1

This code almost does the trick:

cat1 = df.groupby('Treatment', as_index=False).groups['a']
cat2 = df.groupby('Treatment', as_index=False).groups['b']
print(ttest_ind(cat1, cat2))

But when I print it, it looks like it is pulling the row indices where that treatment occurs rather than the Performance values:

print(cat1)
[0, 2, 4, 5, 9, 10, 11, 16, 18,...131, 133, 142, 147, 152, 153, 156, 157, 158]

It [maybe?] needs to look more like this:

print(cat1)
[2, 2, 1, ...0, 3, 1, 1, 0, 2, 0, 0, 0]

What is the best way to convert this DataFrame into a format I can run a t-test on?

score 1 · Accepted Answer

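I think the simplest way is to filter the Performance column by treatment and pass the two resulting Series straight to ttest_ind: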

ttest_ind(df[df['Treatment'] == 'a']['Performance'], df[df['Treatment'] == 'b']['Performance'])

Hope this helps.
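For reference, here is a minimal, self-contained sketch of the same idea built on the groupby the question started from. The attempt in the question returned indices because .groups maps each label to row index labels; selecting the Performance column on the grouped object and calling get_group returns the values instead. The data is the five-row example from the question. The import path is an assumption, since the question never shows where ttest_ind comes from; statsmodels.stats.weightstats.ttest_ind is used here, and the names grouped, perf_a and perf_b are only illustrative.

```python
import pandas as pd
from statsmodels.stats.weightstats import ttest_ind  # assumed import; not shown in the question

# Toy data copied from the example table in the question
df = pd.DataFrame({
    "Treatment": ["a", "b", "a", "a", "b"],
    "Performance": [2, 3, 2, 1, 0],
})

# Select the column on the GroupBy object, then pull each group's values.
# (.groups would give the row index labels, which is what the question saw.)
grouped = df.groupby("Treatment")["Performance"]
perf_a = grouped.get_group("a")
perf_b = grouped.get_group("b")

# The statsmodels version returns (t statistic, p-value, degrees of freedom)
# and assumes pooled (equal) variance by default.
tstat, pvalue, dof = ttest_ind(perf_a, perf_b)
print(tstat, pvalue, dof)
```

The groups do not need to be the same length, so there is no need to build the padded TreatmentA/TreatmentB table from the question. If equal variances are not a safe assumption, passing usevar="unequal" to this function gives Welch's t-test instead.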
