r - predict() gives different results than manually applying the coefficients from summary()

Question

Let me illustrate my confusion with an example:

#making datasets
x1<-iris[,1]
x2<-iris[,2]
x3<-iris[,3]
x4<-iris[,4]
dat<-data.frame(x1,x2,x3)
dat2<-dat[1:120,]
dat3<-dat[121:150,]

#Using a linear model to fit x4 using x1, x2 and x3 where training set is first 120 obs.
model<-lm(x4[1:120]~x1[1:120]+x2[1:120]+x3[1:120])

#Using the coefficient values from summary(model), prediction is done for the next 30 obs.
-.17947-.18538*x1[121:150]+.18243*x2[121:150]+.49998*x3[121:150]

#Same prediction is done using the function "predict"
predict(model,dat3)

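My confusion is this: the two sets of predictions for the last 30 values differ. The difference may be small, but they are not identical. Why is that? Shouldn't they be exactly the same?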

score 4 · Accepted Answer

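The difference is really small, and I think it is only due to the precision of the coefficients you are using (e.g. the actual value of the intercept is -0.17947075338464965610..., not simply -.17947).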

In fact, if you take the stored coefficient values and apply the formula, the result equals the output of predict():

intercept <- model$coefficients[1]
x1Coeff <- model$coefficients[2]
x2Coeff <- model$coefficients[3]
x3Coeff <- model$coefficients[4]

intercept + x1Coeff*x1[121:150] + x2Coeff*x2[121:150] + x3Coeff*x3[121:150]
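
The same check can be written in matrix form and compared numerically. The following is only a sketch: it assumes the model is refit on a data frame with the original iris column names, and the names `train`, `test`, `fit`, `X`, `manual` and `rounded` are introduced here for illustration and do not appear in the question.

```r
# Fit on the first 120 rows of iris; hold out the last 30 rows.
train <- iris[1:120, 1:4]
test  <- iris[121:150, 1:4]
fit   <- lm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, data = train)

# Design matrix for the held-out rows: an intercept column plus the predictors.
X <- model.matrix(~ Sepal.Length + Sepal.Width + Petal.Length, data = test)

# Manual prediction with the coefficients exactly as stored in the model.
manual <- drop(X %*% coef(fit))

# Manual prediction with coefficients rounded to five decimal places,
# which mimics copying the values from the printed summary.
rounded <- drop(X %*% round(coef(fit), 5))

pred <- predict(fit, test)

# Should report TRUE: full-precision coefficients reproduce predict().
print(all.equal(unname(manual), unname(pred)))

# Largest discrepancy introduced by the rounding alone.
print(max(abs(rounded - pred)))
```

The second printed number is the gap the question observed: small, but not zero.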

score 2 · Accepted Answer

You can clean up the code a bit. To create the training and test datasets, you can use the following code:

# create training and test datasets
train.df <- iris[1:120, 1:4] 
test.df <- iris[-(1:120), 1:4]

# fit a linear model to predict Petal.Width using all predictors
fit <- lm(Petal.Width ~ ., data = train.df)
summary(fit)

# predict Petal.Width in the test set using the linear model
predictions <- predict(fit, test.df)

# create a function mse() to calculate the Mean Squared Error
mse <- function(predictions, obs) {
  sum((obs - predictions) ^ 2) / length(predictions)
}

# measure the quality of fit
mse(predictions, test.df$Petal.Width)

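Your predictions differ because predict() uses the coefficients at full precision, whereas your "manual" calculation uses only five decimal places. summary() does not display the full coefficient values; it rounds them to five decimal places to make the output more readable.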
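
To see the unrounded values, you can ask R to print more digits. This is a minimal sketch that refits the model so it runs on its own; the names `full`, `est` and `gap` are introduced here for illustration.

```r
# Refit the model from the answer so this snippet is self-contained.
train.df <- iris[1:120, 1:4]
fit <- lm(Petal.Width ~ ., data = train.df)

# Coefficients are stored as doubles; print them with more significant digits.
full <- coef(fit)
print(full, digits = 15)

# The estimates inside the summary object are stored unrounded as well;
# only the printed table is shortened.
est <- coef(summary(fit))[, "Estimate"]
print(all.equal(unname(est), unname(full)))

# What is lost per coefficient when rounding to five decimal places.
gap <- full - round(full, 5)
print(gap)
```

Using coef(fit) directly in a manual calculation, instead of retyping numbers from the printed table, avoids the rounding altogether.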
