I am running a t-test with the following code:
    import logging
    import numpy as np
    from scipy.special import stdtr

    logger = logging.getLogger(__name__)

    def t_stat(na, abar, avar, nb, bbar, bvar):
        logger.info("T-test to be performed")
        logger.info("Set A count = %f mean = %f variance = %f" % (na, abar, avar))
        logger.info("Set B count = %f mean = %f variance = %f" % (nb, bbar, bvar))
        adof = na - 1
        bdof = nb - 1
        logger.info("Degrees of Freedom of a=%f" % adof)
        logger.info("Degrees of Freedom of b=%f" % bdof)
        # Welch's t statistic (unequal variances)
        tf = (abar - bbar) / np.sqrt(avar/na + bvar/nb)
        # Welch-Satterthwaite approximation of the degrees of freedom
        dof = (avar/na + bvar/nb)**2 / (avar**2/(na**2*adof) + bvar**2/(nb**2*bdof))
        logger.info("tf = %f, dof=%f" % (tf, dof))
        # Two-sided p-value from the Student t CDF
        pf = 2*stdtr(dof, -np.abs(tf))
My output looks like this:
Set A count = 3547465.000000 mean = 0.001123 variance = 0.000369
Set B count = 83759692.000000 mean = 0.001242 variance = 0.000424
Degrees of Freedom of a=3547464.000000
Degrees of Freedom of b=83759691.000000
tf = -11.374250, dof=-2176568.362223
formula: t = -11.3743 p = nan
When I pass the same data as arrays to the ttest_ind function, I get t = -11.374250 and p = 0.000000.
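For a cross-check that does not need the raw arrays, SciPy also provides ttest_ind_from_stats, which takes summary statistics directly. The sketch below feeds it the rounded values from the log above, so its t will differ slightly from -11.374250. Passing equal_var=False (Welch's test) is an assumption made here to match the formula in t_stat.

```python
import math
from scipy.stats import ttest_ind_from_stats

# Rounded summary statistics copied from the log output above.
# The function expects standard deviations, not variances.
t, p = ttest_ind_from_stats(
    mean1=0.001123, std1=math.sqrt(0.000369), nobs1=3547465,
    mean2=0.001242, std2=math.sqrt(0.000424), nobs2=83759692,
    equal_var=False,  # assumption: Welch's test, as in t_stat
)
print("t = %f, p = %g" % (t, p))
```

Printing p with %g rather than %f shows a tiny but nonzero value instead of 0.000000.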
Why does my function return nan for p? As far as I know, I cannot treat nan as 0. What exactly is the difference between my t_stat and ttest_ind? Any help would be much appreciated.
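One thing worth checking is the negative dof in the log: degrees of freedom should never be negative, and stdtr is not defined for them, which would explain the nan. A plausible cause, assuming na and nb arrive as numpy.int64 rather than Python ints or floats, is that nb**2*bdof (roughly 5.9e23) exceeds the int64 range and wraps around. The sketch below isolates the dof formula in a hypothetical helper, welch_dof, and compares integer and float inputs using the rounded variances from the log.

```python
import numpy as np

def welch_dof(na, avar, nb, bvar):
    """Degrees-of-freedom expression, written exactly as in t_stat."""
    adof = na - 1
    bdof = nb - 1
    return (avar/na + bvar/nb)**2 / (avar**2/(na**2*adof) + bvar**2/(nb**2*bdof))

avar, bvar = 0.000369, 0.000424  # rounded values from the log

# Assumption: the counts are numpy.int64. The product n**2 * (n - 1)
# overflows 64-bit integers and wraps; the warning is silenced here.
with np.errstate(over="ignore"):
    dof_int = welch_dof(np.int64(3547465), avar, np.int64(83759692), bvar)

# Same formula with float counts: no integer overflow is possible.
dof_float = welch_dof(float(3547465), avar, float(83759692), bvar)

print("int64 counts: dof = %f" % dof_int)
print("float counts: dof = %f" % dof_float)
```

If this matches what you see, converting the counts with float(na) and float(nb) at the top of t_stat should give a positive dof and a finite p.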