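I've tried almost everything from this similar question, but I can't get the results that others seem to get. Here is my problem:

I have a data frame like this, listing the grades each teacher teaches: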

> profs <- data.frame(teaches = c("1st", "1st, 2nd",
                                  "2nd, 3rd",
                                  "1st, 2nd, 3rd"))
> profs
        teaches
1           1st
2      1st, 2nd
3      2nd, 3rd
4 1st, 2nd, 3rd

I've been looking for a way to split the teaches variable into columns, like this:

  teaches1st teaches2nd teaches3rd
1          1          0          0
2          1          1          0
3          0          1          1
4          1          1          1

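I understand this solution involving the splitstackshape library, and going by the answerer's explanation, the apparently deprecated concat.split.expanded function should do exactly what I want. However, I can't seem to get the same result: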

> concat.split.expanded(profs, "teaches", fill = 0, drop = TRUE)
Fehler in seq.default(min(vec), max(vec)) : 
  'from' cannot be NA, NaN or infinite

With cSplit, which I understand replaces "most of the earlier concat.split* functions", I get this:

> cSplit(profs, "teaches")
   teaches_1 teaches_2 teaches_3
1:       1st        NA        NA
2:       1st       2nd        NA
3:       2nd       3rd        NA
4:       1st       2nd       3rd

I've gone through cSplit's help page and tried adjusting every argument, but I just can't get the split I want. I'd appreciate any help.

4 Answers

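Since your concatenated data are concatenated strings (not concatenated numeric values), you need to add type = "character" to get the function to work as you expect.

The function's default setting is for numeric values, hence the error about NaN and so on.

The name has been made more consistent with the short forms of the other functions in the same family. It is now cSplit_e (though the old function name still works).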
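
To make the cause of that error concrete, here is a minimal base R sketch. It does not reproduce the package's internals; it only assumes, based on the seq.default(min(vec), max(vec)) call visible in the question's error message, that the numeric mode coerces the split pieces to numbers and then builds a sequence from their range.

```r
# Split one of the question's values on the comma and trim the whitespace.
vals <- trimws(unlist(strsplit("1st, 2nd", ",", fixed = TRUE)))

# Coercing the pieces to numeric gives NA for each of them
# (the coercion warning is silenced here).
nums <- suppressWarnings(as.numeric(vals))
print(nums)

# With only NA values, min() and max() are NA as well, so a
# seq(min, max) call like the one in the error message cannot succeed.
res <- tryCatch(seq(min(nums), max(nums)), error = function(e) e)
print(inherits(res, "error"))
```

With type = "character" the values are treated as labels instead, so no numeric range is needed.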

library(splitstackshape)
cSplit_e(profs, "teaches", ",", type = "character", fill = 0)
#         teaches teaches_1st teaches_2nd teaches_3rd
# 1           1st           1           0           0
# 2      1st, 2nd           1           1           0
# 3      2nd, 3rd           0           1           1
# 4 1st, 2nd, 3rd           1           1           1

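The help page at ?concat.split.expanded is the same as the help page for cSplit_e. If you have any tips for making it clearer to understand, please file an issue on the package's GitHub page.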

answered 2015-03-17T17:39:21.923

Here is another option:

Vectorize(grepl, 'pattern')(c('1st', '2nd', '3rd'), profs$teaches)
#        1st   2nd   3rd
# [1,]  TRUE FALSE FALSE
# [2,]  TRUE  TRUE FALSE
# [3,] FALSE  TRUE  TRUE
# [4,]  TRUE  TRUE  TRUE
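
If you want the 0/1 columns and the teaches prefix from the question rather than a logical matrix, a small follow-up step is enough. This is only a sketch: the object names m and res are made up for illustration, and stringsAsFactors = FALSE is set explicitly so that teaches is a character column.

```r
profs <- data.frame(teaches = c("1st", "1st, 2nd", "2nd, 3rd", "1st, 2nd, 3rd"),
                    stringsAsFactors = FALSE)

# One logical column per pattern, as in the call above.
m <- Vectorize(grepl, "pattern")(c("1st", "2nd", "3rd"), profs$teaches)

# Adding 0L turns TRUE/FALSE into 1/0 while keeping the column names.
res <- as.data.frame(m + 0L)
names(res) <- paste0("teaches", colnames(m))
print(res)
```

Keep in mind that grepl matches substrings, so a pattern such as "1st" would also match a value like "11st"; that is not a problem with the data shown here.
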
answered 2015-03-17T14:37:24.993

You could try mtabulate from qdapTools

library(qdapTools)
res <- mtabulate(strsplit(as.character(profs$teaches), ', '))
colnames(res) <- paste0('teaches', colnames(res))
res
#    teaches1st teaches2nd teaches3rd
#1          1          0          0
#2          1          1          0
#3          0          1          1
#4          1          1          1

Or use stringi

library(stringi)
(vapply(c('1st', '2nd', '3rd'), stri_detect_fixed, logical(4L), 
                          str=profs$teaches))+0L
#     1st 2nd 3rd
#[1,]   1   0   0
#[2,]   1   1   0
#[3,]   0   1   1
#[4,]   1   1   1
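
If you would rather not load a package, the same split-then-tabulate idea can be sketched in base R. This is not what mtabulate does internally, just one way to get the same shape; the names parts, levels_found and dummies are made up for illustration.

```r
profs <- data.frame(teaches = c("1st", "1st, 2nd", "2nd, 3rd", "1st, 2nd, 3rd"),
                    stringsAsFactors = FALSE)

# Split each row on the comma (plus any following whitespace).
parts <- strsplit(as.character(profs$teaches), ",\\s*")

# Every distinct value that occurs becomes one indicator column.
levels_found <- sort(unique(unlist(parts)))

# For each row, mark which of the distinct values it contains.
dummies <- do.call(rbind, lapply(parts, function(p) as.integer(levels_found %in% p)))
colnames(dummies) <- paste0("teaches", levels_found)
dummies <- as.data.frame(dummies)
print(dummies)
```

Because the columns are derived from the data, the values do not have to be listed by hand as in the stringi call.
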
answered 2015-03-17T14:43:15.987

I found a workaround. If you have a string variable that contains only the delimiter and numbers, concat.split.expanded seems to work, i.e.:

> profs <- data.frame(teaches = c("1", "1, 2", "2, 3", "1, 2, 3"))
> profs
  teaches
1       1
2    1, 2
3    2, 3
4 1, 2, 3

Now concat.split.expanded works as intended and creates dummy variables from the string variable:

> concat.split.expanded(profs, "teaches", fill = 0, drop = TRUE)
  teaches_1 teaches_2 teaches_3
1         1         0         0
2         1         1         0
3         0         1         1
4         1         1         1

However, I'm still looking for a solution that doesn't involve removing all the letters from the teaches variable.

answered 2015-03-17T14:27:56.360