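I have two datasets, where X holds the observed values and Y the expected values. I am trying to quantify the goodness of fit in Python. People often compute a statistic for each dataset and use those values to decide which fit is better and which is wrong. I want a measure that tells me which dataset's observed values are closer to the expected values. I tried the chi-square test in Python (scipy.stats.chisquare), but are there other tests that would help determine the best fit?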
Code
from scipy.stats import chisquare
import numpy as np
x1 = np.array([97.83, 95.06, 92.54, 97.69, 93.76, 93.36, 93.37, 99.29, 101.57,
97.88, 98.71, 75.31, 72.52, 67.75, 77.97, 78.42, 72.62, 82.29, 90.26, 76.32, 78.78, 79.96])
y1 = np.array([90.90, 90.50, 89.50, 92.90, 91.20, 91.70, 91.40, 94.20, 96.80,
93.30, 94.40, 70.20, 71.20, 68.40, 74.20, 74.60, 72.00, 77.80, 83.00, 73.50, 76.70, 82.60])
x2 = np.array([92.14, 91.44, 91.31, 93.26, 93.26, 91.65, 92.41, 93.47, 97.12, 101.46,
94.99, 98.08, 69.33, 69.63, 68.45, 72.62, 71.17, 80.54, 90.42, 74.25, 79.60, 80.77])
y2 = np.array([90.90, 90.50, 89.50, 92.90, 93.00, 91.20, 91.70, 91.40, 94.20, 96.80, 93.30,
94.40, 70.20, 71.20, 68.40, 74.20, 72.00, 77.80, 83.00, 73.50, 76.70, 82.60])
# chisquare(f_obs, f_exp): observed values first, expected values second
print(chisquare(x1, y1))
print(chisquare(x2, y2))
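For reference, the statistic that chisquare reports can be reproduced by hand. The following is only a minimal sketch: the helper name chi_square_statistic and the toy numbers are hypothetical and not part of the data above. Pearson's chi-square test is designed for categorical counts, so for continuous measurements like these the p-value may not mean much, even though the statistic itself can still be computed.

```python
import numpy as np


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum((observed - expected)^2 / expected).

    Hypothetical helper for illustration only.
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sum((observed - expected) ** 2 / expected))


# Toy values (assumed, not taken from the question's data)
toy_observed = [10, 20, 30]
toy_expected = [12, 18, 30]

stat = chi_square_statistic(toy_observed, toy_expected)
print(stat)
```

A smaller statistic means the observed values sit closer to the expected ones, weighted by the size of each expected value.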
Update
from scipy.stats import chisquare
from sklearn.metrics import r2_score
from scipy import stats
import numpy as np
x1 = np.array([97.83, 95.06, 92.54, 97.69, 93.76, 93.36, 93.37, 99.29, 101.57,
97.88, 98.71, 75.31, 72.52, 67.75, 77.97, 78.42, 72.62, 82.29, 90.26, 76.32, 78.78, 79.96])
y1 = np.array([90.90, 90.50, 89.50, 92.90, 91.20, 91.70, 91.40, 94.20, 96.80,
93.30, 94.40, 70.20, 71.20, 68.40, 74.20, 74.60, 72.00, 77.80, 83.00, 73.50, 76.70, 82.60])
x2 = np.array([92.14, 91.44, 91.31, 93.26, 93.26, 91.65, 92.41, 93.47, 97.12, 101.46,
94.99, 98.08, 69.33, 69.63, 68.45, 72.62, 71.17, 80.54, 90.42, 74.25, 79.60, 80.77])
y2 = np.array([90.90, 90.50, 89.50, 92.90, 93.00, 91.20, 91.70, 91.40, 94.20, 96.80, 93.30,
94.40, 70.20, 71.20, 68.40, 74.20, 72.00, 77.80, 83.00, 73.50, 76.70, 82.60])
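# r2_score(y_true, y_pred): expected values first, observed values second
print("Scikit R2, 1:", r2_score(y1, x1))
print("Scikit R2, 2:", r2_score(y2, x2))
# linregress fits a straight line and returns its correlation coefficient r
slope1, intercept1, r_value1, p_value1, std_err1 = stats.linregress(y1, x1)
slope2, intercept2, r_value2, p_value2, std_err2 = stats.linregress(y2, x2)
print("Stats R2, 1:", r_value1**2)
print("Stats R2, 2", r_value2**2)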
With the updated code, I get the following output:
Scikit R2, 1: 0.820091025592
Scikit R2, 2: 0.928643087517
Stats R2, 1: 0.958813342741
Stats R2, 2 0.965013525387
Why do scikit-learn and SciPy give different R2 values?
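A possible explanation, sketched with NumPy only so that it runs without scikit-learn or SciPy: r2_score(y1, x1) scores x1 directly as the prediction of y1, that is, against the line y = x, whereas r_value**2 from linregress is the squared Pearson correlation, which describes the best-fitting straight line with a free slope and intercept. The helper names r2_against_identity and r2_of_fitted_line are hypothetical, introduced only for this illustration.

```python
import numpy as np

# Dataset 1 from the question: x1 observed, y1 expected
x1 = np.array([97.83, 95.06, 92.54, 97.69, 93.76, 93.36, 93.37, 99.29, 101.57,
               97.88, 98.71, 75.31, 72.52, 67.75, 77.97, 78.42, 72.62, 82.29,
               90.26, 76.32, 78.78, 79.96])
y1 = np.array([90.90, 90.50, 89.50, 92.90, 91.20, 91.70, 91.40, 94.20, 96.80,
               93.30, 94.40, 70.20, 71.20, 68.40, 74.20, 74.60, 72.00, 77.80,
               83.00, 73.50, 76.70, 82.60])


def r2_against_identity(expected, observed):
    """Coefficient of determination with the observed values used as-is
    as predictions (no line is fitted): 1 - SS_res / SS_tot."""
    ss_res = np.sum((expected - observed) ** 2)
    ss_tot = np.sum((expected - np.mean(expected)) ** 2)
    return 1.0 - ss_res / ss_tot


def r2_of_fitted_line(expected, observed):
    """Squared Pearson correlation, equal to the R2 of the least-squares
    straight line through the points."""
    r = np.corrcoef(expected, observed)[0, 1]
    return r ** 2


r2_identity = r2_against_identity(y1, x1)
r2_line = r2_of_fitted_line(y1, x1)
print("R2 against y = x:  ", r2_identity)
print("R2 of fitted line: ", r2_line)
```

On dataset 1 these reproduce the two values reported for it (about 0.8201 and 0.9588). In this setup the first cannot exceed the second, since y = x is one particular straight line while the least-squares fit is the best one. If the goal is to judge how close the observed values are to the expected values themselves, the first number is probably the more relevant of the two.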