58

Given the strings

test_1 <- "abc def,ghi klm"
test_2 <- "abc, def ghi klm"

I would like to get

"abc"
"def"
"ghi"

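However, with strsplit you have to know the order in which the split values occur in the string, because strsplit uses the first value for the first split, the second value for the second split, and so on, and then recycles them.

But these do not give that result: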

strsplit(test_1, c(",", " "))
strsplit(test_2, c(" ", ","))

strsplit(test_2, split=c("[:punct:]","[:space:]"))[[1]]

I would like to split the string wherever any of the split values is found, in a single step.
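
For reference, ?strsplit documents that a vector split is recycled along the elements of x, not applied one after another inside a single string. A minimal sketch of what the first two calls above return for a length-one input (the names res_1 and res_2 are only for illustration):

```r
test_1 <- "abc def,ghi klm"
test_2 <- "abc, def ghi klm"

# With a single input string, only the first element of `split` is used.
res_1 <- strsplit(test_1, c(",", " "))[[1]]  # splits on "," only
res_2 <- strsplit(test_2, c(" ", ","))[[1]]  # splits on " " only

print(res_1)
print(res_2)
```

So the second separator never gets applied to the same string, which is why a single pattern covering all separators is needed.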

4 Answers

70

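Actually, strsplit also takes regular-expression patterns, the same kind grep uses. (Strictly speaking, neither the comma nor the space is a regex metacharacter, so the double-escaped commas in the patterns below are harmless but not required; likewise, using "\\s" for the space is more about readability than necessity):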

> strsplit(test_1, "\\, |\\,| ")  # three possibilities OR'ed
[[1]]
[1] "abc" "def" "ghi" "klm"

> strsplit(test_2, "\\, |\\,| ")
[[1]]
[1] "abc" "def" "ghi" "klm"

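Without both "\\, " (note the trailing space, which SO does not show) and "\\," in the pattern, you would get some empty strings ("") in the result. It might have been clearer if I had written: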

> strsplit(test_2, "\\,\\s|\\,|\\s")
[[1]]
[1] "abc" "def" "ghi" "klm"

@Fojtasek is quite right: using a character class often simplifies the task, because it creates an implicit logical OR:

> strsplit(test_2, "[, ]+")
[[1]]
[1] "abc" "def" "ghi" "klm"

> strsplit(test_1, "[, ]+")
[[1]]
[1] "abc" "def" "ghi" "klm"
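
A small sketch of why the separator run matters, using only base R. The names single and run are illustrative. Splitting on one comma or one space at a time leaves an empty string wherever the two are adjacent, while the quantified character class treats the whole run as one separator:

```r
test_2 <- "abc, def ghi klm"

# One separator character per split: the ", " pair yields an empty string.
single <- strsplit(test_2, ",| ")[[1]]

# One or more separator characters per split: ", " counts as one separator.
run <- strsplit(test_2, "[, ]+")[[1]]

print(single)
print(run)
```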
answered 2012-05-24T13:52:30.193
9

If you are not keen on regular expressions, you can call strsplit() multiple times:

strsplits <- function(x, splits, ...)
{
    # Apply each separator in turn to every piece produced so far.
    for (split in splits)
    {
        x <- unlist(strsplit(x, split, ...))
    }
    return(x[x != ""]) # Remove empty values
}

strsplits(test_1, c(" ", ","))
# "abc" "def" "ghi" "klm"
strsplits(test_2, c(" ", ","))
# "abc" "def" "ghi" "klm"
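
Because the helper forwards ... to strsplit(), its arguments can be passed through. The sketch below assumes a hypothetical input, dotted, whose separator is a literal dot. By default strsplit() would read "." as a regular expression matching any character, so fixed = TRUE is passed to match it literally. The helper is repeated here only so the snippet runs on its own:

```r
# Same helper as above, repeated so the snippet is self-contained.
strsplits <- function(x, splits, ...) {
    for (split in splits) {
        x <- unlist(strsplit(x, split, ...))
    }
    x[x != ""]  # Remove empty values
}

# Hypothetical input that uses a literal dot as one of its separators.
dotted <- "abc.def ghi"

# fixed = TRUE is forwarded to strsplit(), so "." is matched literally.
literal <- strsplits(dotted, c(".", " "), fixed = TRUE)
print(literal)
```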

Update for the added example:

strsplits(test_1, c("[[:punct:]]","[[:space:]]"))
# "abc" "def" "ghi" "klm"
strsplits(test_2, c("[[:punct:]]","[[:space:]]"))
# "abc" "def" "ghi" "klm"

But if you are going to use regular expressions anyway, you might as well use @DWin's approach:

strsplit(test_1, "[[:punct:][:space:]]+")[[1]]
# "abc" "def" "ghi" "klm"
strsplit(test_2, "[[:punct:][:space:]]+")[[1]]
# "abc" "def" "ghi" "klm"
answered 2012-05-24T13:57:00.847
5

You can go with strsplit(test_1, "\\W").
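
A brief sketch of how this behaves on both strings from the question. "\\W" splits on each single non-word character, so test_2 (where a comma is followed by a space) produces an empty string. Adding a + quantifier, which is a variation and not part of the answer as written, avoids that:

```r
test_1 <- "abc def,ghi klm"
test_2 <- "abc, def ghi klm"

# One non-word character per split.
one_char_1 <- strsplit(test_1, "\\W")[[1]]
one_char_2 <- strsplit(test_2, "\\W")[[1]]

# A run of non-word characters per split.
runs_2 <- strsplit(test_2, "\\W+")[[1]]

print(one_char_1)
print(one_char_2)
print(runs_2)
```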

answered 2012-05-24T13:55:27.320
1
library(stringr)

test_1 <- "abc def,ghi klm"
test_2 <- "abc, def ghi klm"
key_words <- c("abc", "def", "ghi")
# Build the alternation "abc|def|ghi" and extract every match.
matches <- str_c(key_words, collapse = "|")
str_extract_all(test_1, matches)
str_extract_all(test_2, matches)
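
The same extract-instead-of-split idea can be sketched without stringr, using base R's gregexpr() and regmatches(). This is a swapped-in equivalent for illustration, not the code from the answer, and the names pattern, extract_1 and extract_2 are assumptions:

```r
test_1 <- "abc def,ghi klm"
test_2 <- "abc, def ghi klm"
key_words <- c("abc", "def", "ghi")

# Build the alternation "abc|def|ghi".
pattern <- paste(key_words, collapse = "|")

# gregexpr() locates every match; regmatches() pulls out the matched text.
extract_1 <- regmatches(test_1, gregexpr(pattern, test_1))[[1]]
extract_2 <- regmatches(test_2, gregexpr(pattern, test_2))[[1]]

print(extract_1)
print(extract_2)
```

Note that this approach needs the wanted tokens to be known in advance, which is why "klm" is absent from the result.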
answered 2016-04-24T21:57:44.300