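Text parsing like this can be difficult without a specific, reproducible example. However, it sounds as though your data frame looks something like this: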
df
#>    ID medication gender
#> 1   1          9      f
#> 2   2      2;1;3      m
#> 3   3        6;2      d
#> 4   4          3      f
#> 5   5    7;8;7;1      f
#> 6   6    6;9;4;6      m
#> 7   7          9      d
#> 8   8      8;6;3      f
#> 9   9        9;7      d
#> 10 10        8;6      m
In that case, a pedestrian way to get your result in base R would be something like this:
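# Split the rows by gender, then break each medication string into single codes
meds <- lapply(split(df, df$gender),
               function(x) unlist(strsplit(x$medication, ";\\s?")))
# One gender label per extracted code; split() orders the groups d, f, m
genders <- rep(names(meds), times = lengths(meds))
# Fixing the levels at 1:10 keeps unused codes as zero columns, in numeric order
table(gender = genders, medication = factor(unlist(meds), levels = 1:10))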
#>       medication
#> gender 1 2 3 4 5 6 7 8 9 10
#>      d 0 1 0 0 0 1 1 0 2  0
#>      f 1 0 2 0 0 1 2 2 1  0
#>      m 1 1 1 1 0 3 0 1 1  0
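The same count can also be sketched without split(), by splitting every medication string once and repeating each row's gender by the number of codes it produced. This is only an illustration under the same guesses as above: the column names medication and gender come from the assumed data frame, the helper name count_meds and the toy input are made up for this example, and the codes are assumed to run from 1 to 10.

```r
# Hypothetical helper; assumes a `medication` column holding codes separated
# by ";" and a `gender` column, as in the guessed data frame.
count_meds <- function(df, codes = 1:10) {
  # One character vector of codes per row
  parts <- strsplit(as.character(df$medication), ";\\s?")
  # Repeat each row's gender once per code found in that row
  table(gender     = rep(df$gender, times = lengths(parts)),
        medication = factor(unlist(parts), levels = codes))
}

# Small made-up input, only to show the call
toy <- data.frame(ID = 1:3,
                  medication = c("9", "2;1;3", "6;2"),
                  gender = c("f", "m", "d"))
res <- count_meds(toy)
res
```

Because the gender labels are taken from the rows themselves, nothing has to be typed in by hand, and the codes argument can be changed if your medication codes cover a different range.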
Reproducible data
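set.seed(2)
# Each row gets a Poisson(2) number of medication codes, with a minimum of one
medication <- sapply(rpois(10, 2), function(x) {
  if (x == 0) x <- 1
  x <- sample(1:10, x, TRUE)
  paste(x, collapse = ";")
})
# Genders are drawn with unequal probabilities (m and f twice as likely as d)
gender <- sample(c("m", "f", "d"), 10, TRUE, prob = c(2, 2, 1))
df <- data.frame(ID = 1:10, medication = medication, gender = gender)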
Created on 2022-02-06 by the reprex package (v2.0.1)