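I have a factor containing many industry names, and I need to collapse them into broader industry categories. Because I let respondents answer in free text, the number of levels is inflated (e.g. Financial Services, financial services, Banking, Finance). Since these variants don't match exactly, each one shows up as its own level, so I tried to collapse them with forcats: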

test <- fct_collapse(PrescreenF$Industry, Finance = c("Banking",
  "Corporate Finance", "Finance", "Financial", "financial services",
  "financial services", "Financial Services", "Financial services"),
  NULL = "H")



-> test$industry
Corporate Finance 
Finance Financial 
financial services
financial services 
Financial Services 
Financial services


Edit: output of dput(x$industry)

structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 
4L, 3L, 3L, 3L, 5L, 7L, 8L, 9L, 10L, 11L, 12L, 12L, 13L, 14L, 
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 18L, 18L, 18L, 
18L, 19L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 25L, 26L, 27L, 28L
), .Label = c("", "{\"ImportId\":\"QID8_TEXT\"}", "Finance", 
"Financial ", "Financial services ", "Please indicate the industry you work in (e.g. technology, healthcare etc):", 
"Cleantech", "Delivery", "e-commerce/fashion", "Food", "Food & Bev", 
"Retail", "Service", "tech", "technology", "Technology", "IT, technology", 
"Software", "Technology ", "Tehcnology", "Consulting", "Digital advertising", 
"Education", "Higher education", "Technology, management consulting", 
"University professor; teaching, research and service", "Information Technology and Services", 
"mobile technology"), class = "factor")

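Edit: figured it out. Some of the entries had extra whitespace at the end. For example, when I print Prescreen$Industry it returns names such as "Banking" and "Corporate Finance", but nothing in the output tells me there is a space after the level. "Banking" is actually "Banking ", with a trailing space that is not visible when R prints it. How can I make this visible and make sure it doesn't happen again?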

Can I run a len function on the column? If so, how does it work? PrescreenF$Industry("Banking")?


1 Answer




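library(stringr)  # provides str_trim()
# Convert to character, strip leading/trailing whitespace, then rebuild the factor
x$industry <- as.character(x$industry)
x$industry <- str_trim(x$industry)
x$industry <- as.factor(x$industry)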
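
On the "len function" part of the question: R's string-length function is nchar(). The sketch below is one possible way to spot padded levels and trim them using base R only (trimws() here stands in for str_trim()). The `industry` vector is made-up sample data modelled on a few levels from the dput() output, not the real column.

```r
# Hypothetical sample data modelled on levels from the question (an assumption)
industry <- factor(c("Finance", "Financial ", "Financial services ",
                     "Technology", "Technology "))

# nchar() gives string length; compare raw and trimmed lengths per level
lv <- levels(industry)
lengths_tbl <- data.frame(level = lv,
                          n_raw = nchar(lv),
                          n_trimmed = nchar(trimws(lv)))
print(lengths_tbl)

# Levels that carry leading or trailing whitespace
padded <- lv[lv != trimws(lv)]
dput(padded)  # dput() quotes each string, so the trailing space becomes visible

# Trim the levels in place; levels that become identical are merged
levels(industry) <- trimws(levels(industry))
print(levels(industry))
```

Running a check like this before fct_collapse() should make any mismatched level names show up early. Trimming does not fix case differences or typos (e.g. "tech" vs "Technology", "Tehcnology"), so those still need to be listed explicitly when collapsing.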


Answered 2017-10-05T21:14:19.377