python - How to quantify whether model predictions are close to expected values in Python?

Question

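I have two sets of data, where X holds the observed values and Y the expected values. I am trying to quantify the goodness of fit in Python. People often compute $R^2$ for each data set and use those values to decide which fit is better and which is worse. I want a value that helps me determine which data set has observed values closest to the expected values. I tried a $\chi^2$ test in Python, but are there other tests that could help identify the best fit?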

Code

from scipy.stats import chisquare
import numpy as np

x1 = np.array([97.83, 95.06, 92.54, 97.69, 93.76, 93.36, 93.37, 99.29, 101.57, 
        97.88, 98.71, 75.31, 72.52, 67.75, 77.97, 78.42, 72.62, 82.29, 90.26, 76.32, 78.78, 79.96])
y1 = np.array([90.90, 90.50, 89.50, 92.90, 91.20, 91.70, 91.40, 94.20, 96.80,
        93.30, 94.40, 70.20, 71.20, 68.40, 74.20, 74.60, 72.00, 77.80, 83.00, 73.50, 76.70, 82.60])


x2 = ([92.14, 91.44, 91.31, 93.26, 93.26, 91.65, 92.41, 93.47, 97.12, 101.46, 
        94.99, 98.08, 69.33, 69.63, 68.45, 72.62, 71.17, 80.54, 90.42, 74.25, 79.60, 80.77])
y2 = ([90.90, 90.50, 89.50, 92.90, 93.00, 91.20, 91.70, 91.40, 94.20, 96.80, 93.30, 
        94.40, 70.20, 71.20, 68.40, 74.20, 72.00, 77.80, 83.00, 73.50, 76.70, 82.60])

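# Note: recent SciPy versions raise a ValueError when the observed and
# expected sums differ, which is the case for this data.
print(chisquare(x1, y1))
print(chisquare(x2, y2))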

Update

from scipy.stats import chisquare
from sklearn.metrics import r2_score
from scipy import stats
import numpy as np

x1 = np.array([97.83, 95.06, 92.54, 97.69, 93.76, 93.36, 93.37, 99.29, 101.57, 
        97.88, 98.71, 75.31, 72.52, 67.75, 77.97, 78.42, 72.62, 82.29, 90.26, 76.32, 78.78, 79.96])
y1 = np.array([90.90, 90.50, 89.50, 92.90, 91.20, 91.70, 91.40, 94.20, 96.80,
        93.30, 94.40, 70.20, 71.20, 68.40, 74.20, 74.60, 72.00, 77.80, 83.00, 73.50, 76.70, 82.60])


x2 = ([92.14, 91.44, 91.31, 93.26, 93.26, 91.65, 92.41, 93.47, 97.12, 101.46, 
        94.99, 98.08, 69.33, 69.63, 68.45, 72.62, 71.17, 80.54, 90.42, 74.25, 79.60, 80.77])
y2 = ([90.90, 90.50, 89.50, 92.90, 93.00, 91.20, 91.70, 91.40, 94.20, 96.80, 93.30, 
        94.40, 70.20, 71.20, 68.40, 74.20, 72.00, 77.80, 83.00, 73.50, 76.70, 82.60])


print("Scikit R2, 1:", r2_score(y1, x1))
print("Scikit R2, 2:", r2_score(y2, x2))


slope1, intercept1, r_value1, p_value1, std_err1 = stats.linregress(y1,x1)
slope2, intercept2, r_value2, p_value2, std_err2 = stats.linregress(y2,x2)


print("Stats R2, 1:", r_value1**2)
print("Stats R2, 2", r_value2**2)

With the updated code, I get the following output:

Scikit R2, 1: 0.820091025592
Scikit R2, 2: 0.928643087517
Stats R2, 1: 0.958813342741
Stats R2, 2 0.965013525387

Why do the R2 values from scikit-learn and SciPy differ?

score 2 · Accepted Answer

The two functions you listed (scipy.stats.linregress and sklearn.metrics.r2_score) do different things.

sklearn.metrics.r2_score

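sklearn.metrics.r2_score does what you are looking for: it takes two sets of data and computes R^2 (the coefficient of determination) between them. From the documentation:

sklearn.metrics.r2_score(y_true, y_pred, sample_weight=None, multioutput=None)

Parameters:

y_true : array-like of shape = (n_samples) or (n_samples, n_outputs)

Ground truth (correct) target values.

y_pred : array-like of shape = (n_samples) or (n_samples, n_outputs)

Estimated target values.

So your observed data (x1, x2) are your y_true, and your expected values (y1, y2) are your y_pred. This is therefore the correct way to call it: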

r2_score(x1, y1)
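To see why the argument order matters, here is a minimal sketch of the coefficient of determination, R^2 = 1 - SS_res / SS_tot, written in plain Python so that each term is visible. The function name r_squared and the toy values are assumptions made for illustration; they are not part of scikit-learn or of the question's data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared gaps between the paired values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of y_true around its own mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values chosen for illustration (not the data from the question).
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

forward = r_squared(y_true, y_pred)
swapped = r_squared(y_pred, y_true)
print(forward, swapped)
```

The residual term is symmetric in the two arguments, but SS_tot is computed from y_true only, so swapping the arguments changes the result. That is why r2_score(x1, y1) and r2_score(y1, x1) generally give different numbers.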

scipy.stats.linregress

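scipy.stats.linregress does not do what you are looking for. Its purpose is to perform a linear regression and find the fit between two sets of data (not between one set of data and its predicted values). The r_value it returns (which you can square to get R^2) is the correlation coefficient between the y values you gave it and the values predicted by the regression (fit) it performed. Since you already know your predicted values, this is not the function you are looking for.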
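The gap between the two numbers can be illustrated with a small sketch. It assumes toy data in which the observed values are an exact linear function of the expected ones but sit far away from them. The helpers pearson_r and r_squared_direct are hypothetical names written in plain Python; pearson_r stands in for the Pearson correlation that linregress reports as its r value.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def r_squared_direct(y_true, y_pred):
    """R^2 of y_pred taken as-is, with no line fitted in between."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy data (an assumption for illustration): observed = 2 * expected + 1.
expected = [1.0, 2.0, 3.0, 4.0]
observed = [3.0, 5.0, 7.0, 9.0]

r2_from_fit = pearson_r(expected, observed) ** 2        # fit-based R^2
r2_direct = r_squared_direct(observed, expected)        # direct comparison
print(r2_from_fit, r2_direct)
```

Here the fit-based value is 1 because a straight line maps the expected values onto the observed ones perfectly, while the direct R^2 is negative because the expected values themselves are poor predictions of the observed ones. The same effect, in milder form, would explain why the Stats R2 values in the question are higher than the Scikit R2 values.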
