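Data from multiple-response survey items often arrives in a structure that does not carry enough information to make tidying easy. Specifically, I have a survey question where respondents select one or more of 8 categorical items. Each cell of the resulting data frame column holds up to 8 comma-separated strings. A given cell might contain two, four, or none of the 8 options. The eighth item is "Other" and can be filled with custom text.
Incidentally, this is the typical format of Google Forms multiple-response data.
Here is some sample data. The third and last rows include a unique response for the eighth "Other" option: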
log <- structure(list(actvTypes = c(NA, NA, "Data collection, Results / findings / learnings, ate ants and milkweed",
    NA, "Discussion of our research question, Planning for data collection",
    "Data analysis, Collected data, apples are yummy")), row.names = c(NA,
    -6L), class = c("tbl_df", "tbl", "data.frame"))
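I want to create a set of 8 new columns in which each response is recorded as 0 or 1. How can I do this efficiently?
I have a solution, but it is clunky. I first create a new column for each response option: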
atypes<- c("atype1","atype2","atype3","atype4","atype5","atype6","atype7","atype8")
log[atypes]<-NA
Next, I wrote eight ifelse statements; the first seven all look like this:
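# str_detect() comes from stringr; fixed() makes the parentheses match literally instead of as a regex group
log$atype7 <- ifelse(str_detect(log$actvTypes, fixed("Met with non-DASA team member (not data collection)")), 1, 0)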
For the "Other" response option, I used a vector of strings and an sapply solution:
alloptions <- c('Discussion of our research question', 'Planning for data collection',
                'Data analysis', 'Discussion of results | findings | learnings',
                'Mid-course corrections to our project', 'Collected data',
                'Met with non-DASA team member (not data collection)')
log$atype8 <- sapply(log$actvTypes, function(x)
  ifelse(any(sapply(alloptions, str_detect, string = x) == TRUE), 1, 0))
How can this coding scheme be made more elegant? Perhaps with sapply and indexing?
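One possible direction, offered as a rough sketch rather than a definitive answer: split each cell on commas once, then loop over the option labels instead of writing eight separate ifelse statements. The sketch below uses base R only and rests on a few assumptions that are not in the question: the option labels are exactly the seven strings in `alloptions`, no option label contains a comma, unanswered (NA) rows are recorded as 0 rather than NA, and `picked`, `known`, and `other` are illustrative names. Note that the labels in rows of the sample data do not all match `alloptions` literally (for example "Data collection"), so under these assumptions such pieces are counted as "Other".

```r
# Sample data from the question, stored in `log` (the name used in the question's code).
log <- data.frame(
  actvTypes = c(
    NA, NA,
    "Data collection, Results / findings / learnings, ate ants and milkweed",
    NA,
    "Discussion of our research question, Planning for data collection",
    "Data analysis, Collected data, apples are yummy"
  ),
  stringsAsFactors = FALSE
)

# The seven fixed options, copied from the question's `alloptions`.
alloptions <- c(
  "Discussion of our research question",
  "Planning for data collection",
  "Data analysis",
  "Discussion of results | findings | learnings",
  "Mid-course corrections to our project",
  "Collected data",
  "Met with non-DASA team member (not data collection)"
)

# Split every cell on commas and trim whitespace.
# Assumption: NA cells are treated as "nothing selected" (empty vector).
picked <- lapply(strsplit(log$actvTypes, ",", fixed = TRUE), function(x) {
  if (length(x) == 1 && is.na(x)) character(0) else trimws(x)
})

# One 0/1 column per fixed option, using exact matching on the split pieces.
# Exact matching avoids regex surprises from "(", ")" and "|" in the labels.
known <- vapply(alloptions, function(opt) {
  vapply(picked, function(p) as.integer(opt %in% p), integer(1))
}, integer(length(picked)))

# "Other" is 1 when a cell contains any piece that is not one of the fixed options.
other <- vapply(picked, function(p) as.integer(any(!p %in% alloptions)), integer(1))

# Attach the eight indicator columns under the names used in the question.
atypes <- paste0("atype", 1:8)
log[atypes] <- as.data.frame(cbind(known, other))
```

Because the matching is done on whole split pieces rather than with str_detect, a label such as "Data analysis" would not accidentally match a longer custom answer that happens to contain it. If NA rows should stay NA instead of 0, the result columns could be overwritten for rows where `is.na(log$actvTypes)`.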