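I'm using lm() on a training data set, with a polynomial term in the model. When I subset the data in advance with [ ], I get different coefficients than when I pass the subset argument in the lm() call. Why?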
library(ISLR2)
set.seed(1)
train <- sample(392, 196)
auto_train <- Auto[train,]
lm.fit.data <- lm(mpg ~ poly(horsepower, 2), data = auto_train)
summary(lm.fit.data)
#>
#> Call:
#> lm(formula = mpg ~ poly(horsepower, 2), data = auto_train)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -12.8711  -2.6655  -0.0096   2.0806  16.1063
#>
#> Coefficients:
#>                      Estimate Std. Error t value Pr(>|t|)
#> (Intercept)           23.8745     0.3171  75.298  < 2e-16 ***
#> poly(horsepower, 2)1 -89.3337     4.4389 -20.125  < 2e-16 ***
#> poly(horsepower, 2)2  33.2985     4.4389   7.501 2.25e-12 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 4.439 on 193 degrees of freedom
#> Multiple R-squared: 0.705, Adjusted R-squared: 0.702
#> F-statistic: 230.6 on 2 and 193 DF, p-value: < 2.2e-16
lm.fit.subset <- lm(mpg ~ poly(horsepower, 2), data = Auto, subset = train)
summary(lm.fit.subset)
#>
#> Call:
#> lm(formula = mpg ~ poly(horsepower, 2), data = Auto, subset = train)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -12.8711  -2.6655  -0.0096   2.0806  16.1063
#>
#> Coefficients:
#>                       Estimate Std. Error t value Pr(>|t|)
#> (Intercept)            23.5496     0.3175  74.182  < 2e-16 ***
#> poly(horsepower, 2)1 -123.5881     6.4587 -19.135  < 2e-16 ***
#> poly(horsepower, 2)2   47.7189     6.3613   7.501 2.25e-12 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 4.439 on 193 degrees of freedom
#> Multiple R-squared: 0.705, Adjusted R-squared: 0.702
#> F-statistic: 230.6 on 2 and 193 DF, p-value: < 2.2e-16
Created on 2021-12-26 by the reprex package (v2.0.1)
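
One way to probe the difference: the residuals, R-squared and F-statistic are identical in both summaries, which suggests the two fits describe the same curve in different bases. The sketch below assumes that lm() evaluates poly(horsepower, 2) on all rows of Auto before the subset argument is applied, so the orthogonal basis is built from 392 rows rather than 196. The names basis_train, basis_full, fit_data, fit_subset, fit_data_raw and fit_subset_raw are introduced here for illustration only.

```r
library(ISLR2)

set.seed(1)
train <- sample(392, 196)
auto_train <- Auto[train, ]

# Orthogonal basis built from the 196 training rows only
# (what poly() sees when data = auto_train).
basis_train <- poly(auto_train$horsepower, 2)

# Orthogonal basis built from all 392 rows, then restricted to the
# training rows (assumed to mirror data = Auto, subset = train).
basis_full <- poly(Auto$horsepower, 2)[train, ]

# Compare the two bases column by column, ignoring attributes.
bases_equal <- isTRUE(all.equal(unclass(basis_train), unclass(basis_full),
                                check.attributes = FALSE))

fit_data   <- lm(mpg ~ poly(horsepower, 2), data = auto_train)
fit_subset <- lm(mpg ~ poly(horsepower, 2), data = Auto, subset = train)

# Compare the fitted values of the two models.
fitted_equal <- isTRUE(all.equal(unname(fitted(fit_data)),
                                 unname(fitted(fit_subset))))

# Raw polynomials (horsepower, horsepower^2) do not depend on which
# rows are present when the basis is computed.
fit_data_raw   <- lm(mpg ~ poly(horsepower, 2, raw = TRUE), data = auto_train)
fit_subset_raw <- lm(mpg ~ poly(horsepower, 2, raw = TRUE),
                     data = Auto, subset = train)
raw_coef_equal <- isTRUE(all.equal(unname(coef(fit_data_raw)),
                                   unname(coef(fit_subset_raw))))

print(c(bases_equal = bases_equal,
        fitted_equal = fitted_equal,
        raw_coef_equal = raw_coef_equal))
```

If the assumption holds, the bases differ while the fitted values and the raw-polynomial coefficients agree. In that case either fit can be used for prediction, and raw = TRUE is an option when the coefficients themselves need to be comparable across the two calls.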