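I'm using recipes to fit a linear regression that predicts salary from rank (AssocProf, AsstProf, Prof), sex, discipline (applied or theoretical), years of service, and years since PhD. The dataset is in the car package.
I created dummy variables and log-transformed the outcome variable so it has a more normal shape. I also rescaled years of service and years since PhD to values between 0 and 1.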
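For reference, step_range() with its default min = 0 and max = 1 is a min-max rescaling (not a z-score standardization), and the minimum and maximum are estimated from the training data when the recipe is prepped. Writing x for yrs.service or yrs.since.phd, the transformation is roughly:

$$
x' = \frac{x - \min(x_{\text{train}})}{\max(x_{\text{train}}) - \min(x_{\text{train}})}
$$

so the smallest training value maps to 0 and the largest maps to 1.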
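library(rsample)  # initial_split(), training(), testing()
library(recipes)  # recipe(), step_*(), prep(), bake(); also re-exports %>%

salary.split <- initial_split(salary.df)
sal.train <- training(salary.split)
sal.test <- testing(salary.split)

sal.recipe <- recipe(salary ~ ., data = salary.df) %>%
  step_log(salary) %>%              # log-transform the outcome
  step_dummy(all_nominal()) %>%     # dummy-code rank, discipline and sex
  step_range(yrs.since.phd) %>%     # rescale to [0, 1]
  step_range(yrs.service)           # rescale to [0, 1]

sal.rec <- prep(sal.recipe, training = sal.train) %>% bake(new_data = sal.train)
sal.lm <- lm(sal.rec)
summary(sal.lm)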
Summary output:
Call:
lm(formula = sal.rec)

Residuals:
     Min       1Q   Median       3Q      Max
-0.17727 -0.05780 -0.01406  0.04221  0.34499

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)   -0.3052564  0.3240025  -0.942  0.34690
yrs.service    0.8054404  0.0292577  27.529  < 2e-16 ***
salary         0.0375859  0.0285323   1.317  0.18877
rank_AsstProf -0.0528260  0.0184926  -2.857  0.00459 **
rank_Prof      0.0740925  0.0174977   4.234 3.08e-05 ***
discipline_B  -0.0438070  0.0107863  -4.061 6.28e-05 ***
sex_Male       0.0006626  0.0165779   0.040  0.96815
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.08639 on 291 degrees of freedom
Multiple R-squared:  0.8656,  Adjusted R-squared:  0.8628
F-statistic: 312.2 on 6 and 291 DF,  p-value: < 2.2e-16
When I look at the variable info (sal.recipe$var_info):
# A tibble: 6 x 4
  variable      type    role      source
  <chr>         <chr>   <chr>     <chr>
1 rank          nominal predictor original
2 discipline    nominal predictor original
3 yrs.since.phd numeric predictor original
4 yrs.service   numeric predictor original
5 sex           nominal predictor original
6 salary        numeric outcome   original
It lists salary as the outcome, not as a predictor. So why does salary show up as a coefficient in the summary of the linear model?
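For reference, here is a minimal base-R sketch of what lm() does when its first argument is a data frame rather than a formula. The data frame toy and its columns x1, x2 and y are made up for illustration; the intended outcome y is deliberately not the first column. As far as I can tell, lm() hands the data frame to model.frame(), which converts it with formula(<data.frame>) into "first column ~ all other columns":

```r
# Hypothetical toy data: the intended outcome y is NOT the first column.
toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2.1, 3.9, 6.2, 7.8, 10.1),
  y  = c(1.0, 1.8, 3.2, 3.9, 5.1)
)

# formula() on a data frame uses the first column as the response
# and every remaining column as a predictor.
formula(toy)                    # x1 ~ x2 + y

# lm() given only a data frame builds the same formula internally,
# so x1 becomes the response and y becomes a predictor.
fit_implicit <- lm(toy)
names(coef(fit_implicit))       # c("(Intercept)", "x2", "y")

# Naming the outcome explicitly does not depend on column order.
fit_explicit <- lm(y ~ ., data = toy)
names(coef(fit_explicit))       # c("(Intercept)", "x1", "x2")
```

If the same thing is happening with the baked data, it would be consistent with the summary above: yrs.since.phd is the only predictor missing from the coefficient list, while salary appears among the predictors. Writing the formula explicitly, e.g. lm(salary ~ ., data = sal.rec), avoids relying on the column order that bake() returns.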