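I'd like to ask a follow-up to this question, because another problem has come up: I found subjects (e.g. Cultural Studies) that belong to more than one category (Arts & Humanities and Social Sciences), i.e. there are overlaps that have to be taken into account.
I have a long list of categories, such as this machine-readable example: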
AB <- c("Science","Arts & Humanities","Arts & Humanities; Social Sciences","Science","Arts & Humanities; Arts & Humanities; Social Sciences","Science","Science; Social Sciences","Social Sciences; Science")
So it looks like this:
> AB
[1] "Science" "Arts & Humanities"
[3] "Arts & Humanities; Social Sciences" "Science"
[5] "Arts & Humanities; Arts & Humanities; Social Sciences" "Science"
[7] "Science; Social Sciences" "Social Sciences; Science"
I would like to edit these terms and eliminate the duplicates to get this result:
[1] "Science" "Arts & Humanities"
[3] "Arts & Humanities; Social Sciences" "Science"
[5] "Arts & Humanities; Social Sciences" "Science"
[7] "Science; Social Sciences" "Science; Social Sciences"
So I am looking for another loop that eliminates the duplicates in #5. I tried using strsplit() and unique(), but this doesn't work:
> unique(strsplit(AB, "; *"))
[[1]]
[1] "Science"
[[2]]
[1] "Arts & Humanities"
[[3]]
[1] "Arts & Humanities" "Social Sciences"
[[4]]
[1] "Arts & Humanities" "Arts & Humanities" "Social Sciences"
[[5]]
[1] "Social Sciences" "Science"
So I'd like to ask you again: how can I get the correct output shown above? Thank you very much for your consideration!
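For reference, here is one possible direction, offered as a sketch rather than a definitive solution. Calling unique() on the list returned by strsplit() drops duplicated list elements; it does not remove repeated terms inside each element. The desired output also seems to put the terms in a consistent order (the last element becomes "Science; Social Sciences"), so the sketch below assumes alphabetical sorting within each element is acceptable. The function name dedupe_categories and its sep argument are made up for illustration.

```r
AB <- c("Science", "Arts & Humanities", "Arts & Humanities; Social Sciences",
        "Science", "Arts & Humanities; Arts & Humanities; Social Sciences",
        "Science", "Science; Social Sciences", "Social Sciences; Science")

# Hypothetical helper: de-duplicate and sort the terms within each element.
dedupe_categories <- function(x, sep = "; ") {
  # Split every string on ";" followed by optional whitespace.
  parts <- strsplit(x, ";\\s*")
  vapply(parts, function(p) {
    # unique() now works per element; sorting makes the term order consistent.
    # method = "radix" keeps the ordering independent of the locale.
    terms <- sort(unique(trimws(p)), method = "radix")
    paste(terms, collapse = sep)
  }, character(1))
}

result <- dedupe_categories(AB)
print(result)
```

Because the result is a character vector of the same length as AB, it can be assigned straight back to the original column. If the original term order should be kept instead, the sort() call can be dropped, but then the last two elements would remain different strings.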