I'm trying to compute a chi-square value in Python from a contingency table. Here is an example.

+--------+------+------+
|        | Cat1 | Cat2 |
+--------+------+------+
| Group1 |   80 |  120 |
| Group2 |  420 |  380 |
+--------+------+------+

The expected values are:

+--------+------+------+
|        | Cat1 | Cat2 |
+--------+------+------+
| Group1 |  100 |  100 |
| Group2 |  400 |  400 |
+--------+------+------+
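
For reference, the expected table above follows from the row and column totals of the observed table. A minimal sketch in plain Python, assuming the usual independence formula (row total × column total / grand total); the variable names are only illustrative:

```python
# Observed counts from the first table (rows: Group1, Group2; columns: Cat1, Cat2).
observed = [[80, 120], [420, 380]]

row_totals = [sum(row) for row in observed]        # [200, 800]
col_totals = [sum(col) for col in zip(*observed)]  # [500, 500]
grand_total = sum(row_totals)                      # 1000

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
print(expected)  # [[100.0, 100.0], [400.0, 400.0]]
```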

If I compute the chi-square value by hand I get 10, but Python gives me 9.506. I'm using the following code:

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
import scipy

# Some fake data.
n = 5  # Number of samples.
d = 3  # Dimensionality.
c = 2  # Number of categories.
data = np.random.randint(c, size=(n, d))
data = pd.DataFrame(data, columns=['CAT1', 'CAT2', 'CAT3'])

# Contingency table.
contingency = pd.crosstab(data['CAT1'], data['CAT2'])

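# Overwrite the cells with the example counts.
# Use .iloc[row, col]; chained [row][col] assignment may write to a copy.
contingency.iloc[0, 0] = 80
contingency.iloc[0, 1] = 120
contingency.iloc[1, 0] = 420
contingency.iloc[1, 1] = 380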

# Chi-square test of independence.
chi, p, dof, expected = chi2_contingency(contingency)

Oddly, the function gives me the correct expected values, but the chi-square statistic and the p-value are off. What am I doing wrong here?

Thanks

P.S.

I know the way I create the initial table in pandas is pretty lame, but I'm not an expert at building these nested tables in pandas.

1 Answer

From the documentation:

correction : bool, optional
If True, and the degrees of freedom is 1, apply Yates’ correction for continuity.
The effect of the correction is to adjust each observed value by 0.5 towards
the corresponding expected value.
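
To see what that half-unit adjustment does to the numbers in the question, here is a minimal hand calculation in plain Python. The helper `chi_square` is hypothetical (it is not part of SciPy), and it assumes the correction shrinks each |observed - expected| by 0.5 without letting it go below zero:

```python
observed = [[80, 120], [420, 380]]
expected = [[100, 100], [400, 400]]

def chi_square(observed, expected, yates=False):
    """Pearson chi-square statistic, optionally with Yates' continuity correction."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                # Move the observed value 0.5 towards the expected value.
                diff = max(diff - 0.5, 0.0)
            total += diff ** 2 / e
    return total

plain = chi_square(observed, expected)                 # 10.0
corrected = chi_square(observed, expected, yates=True)
print(plain, round(corrected, 3))                      # 10.0 9.506
```

Every cell here differs from its expected value by 20, so the corrected statistic uses 19.5 instead, which accounts for the gap between 10 and 9.506.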

The degrees of freedom here is 1, so the correction is applied by default. If you set correction to False, you get 10.

>>> chi2_contingency(contingency, correction=False)
(10.0, 0.001565402258002549, 1, array([[ 100.,  100.],
       [ 400.,  400.]]))
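
As for building the table, one possible sketch is to construct the 2x2 DataFrame directly instead of going through random data and `crosstab`. The row and column labels below are taken from the question; the variable names are only illustrative. It runs the test both ways so the two statistics can be compared:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Build the contingency table directly from the counts in the question.
contingency = pd.DataFrame(
    [[80, 120], [420, 380]],
    index=["Group1", "Group2"],
    columns=["Cat1", "Cat2"],
)

# Default behaviour: Yates' correction is applied because dof == 1.
chi2_corr, p_corr, dof, expected = chi2_contingency(contingency)

# Same test without the continuity correction.
chi2_raw, p_raw, _, _ = chi2_contingency(contingency, correction=False)

print(round(chi2_corr, 3), round(chi2_raw, 3), dof)
```

With these counts, the first statistic should match the 9.506 from the question and the second the hand-computed 10.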
answered 2017-08-03T14:24:50.460