Okay, honestly, I've read the function reference for step_num2factor and still couldn't figure out how to use it correctly.
library(recipes)
# Candidate levels: the sorted unique MSSubClass codes, as character strings
temp_names <- as.character(sort(unique(all_raw$MSSubClass)))
# Recipe that tries to turn the numeric MSSubClass column into a factor
price_recipe <-
  recipe(SalePrice ~ ., data = train_raw) %>%
  step_num2factor(MSSubClass, levels = temp_names)
temp_rec <- prep(price_recipe, training = train_raw, strings_as_factors = FALSE) # temporary recipe
temp_data <- bake(temp_rec, new_data = all_raw) # temporary data
class(all_raw$MSSubClass)
# > col_double()
MSSubClass: Identifies the type of dwelling involved in the sale.
20 1-STORY 1946 & NEWER ALL STYLES
30 1-STORY 1945 & OLDER
40 1-STORY W/FINISHED ATTIC ALL AGES
45 1-1/2 STORY - UNFINISHED ALL AGES
50 1-1/2 STORY FINISHED ALL AGES
60 2-STORY 1946 & NEWER
70 2-STORY 1945 & OLDER
75 2-1/2 STORY ALL AGES
80 SPLIT OR MULTI-LEVEL
85 SPLIT FOYER
90 DUPLEX - ALL STYLES AND AGES
120 1-STORY PUD (Planned Unit Development) - 1946 & NEWER
150 1-1/2 STORY PUD - ALL AGES
160 2-STORY PUD - 1946 & NEWER
180 PUD - MULTILEVEL - INCL SPLIT LEV/FOYER
190 2 FAMILY CONVERSION - ALL STYLES AND AGES
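After applying the step, temp_data$MSSubClass in the output is all NA. The observations are stored as 20, 30, 40, ... 190, and I want to convert them to the names (or even keep the same numbers, but as an unordered factor).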
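One possible explanation, offered as a sketch rather than a confirmed diagnosis: the step_num2factor reference describes the numeric values as integers that select an element of `levels`, so a code such as 20 would be looked up as the 20th level, which does not exist in a vector of about 16 entries. The base R snippet below reproduces that lookup on a toy vector. The values in `x`, the names `by_position`, `by_match`, `as_factor`, `dict`, and `named`, and the reading of the step's behaviour are all assumptions made for illustration; it does not touch the Kaggle data or call recipes.

```r
# Toy stand-in for all_raw$MSSubClass (hypothetical values, not the Kaggle data)
x <- c(20, 30, 60, 190, 20)

codes <- sort(unique(x))      # 20 30 60 190
lvls  <- as.character(codes)  # same construction as temp_names

# Using the raw codes as positions: lvls has only 4 elements,
# so positions 20, 30, 60 and 190 are out of range
by_position <- lvls[x]        # every element is NA

# Mapping each code to its position first avoids the out-of-range lookup
by_match <- factor(lvls[match(x, codes)], levels = lvls)

# Plain base R: the same numbers as an unordered factor
as_factor <- factor(x)

# Descriptive names, copied from the data dictionary for these four codes
dict <- c(
  "20"  = "1-STORY 1946 & NEWER ALL STYLES",
  "30"  = "1-STORY 1945 & OLDER",
  "60"  = "2-STORY 1946 & NEWER",
  "190" = "2 FAMILY CONVERSION - ALL STYLES AND AGES"
)
named <- factor(unname(dict[as.character(x)]), levels = unname(dict))
```

If that reading is right, the same `match()` mapping could presumably be supplied through the `transform` argument of step_num2factor, or the column could be converted inside the recipe with something like `step_mutate(MSSubClass = factor(MSSubClass))`. Neither variant has been checked against this dataset.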
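If you know of any blog posts that cover step_num2factor in more depth, or any code that uses it, I'd be glad to see those too.
The full dataset is provided by Kaggle: kaggle data
Thanks in advance,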