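I am trying to use the stringi package to split on a delimiter (the delimiter may be repeated) while keeping the delimiter. This is similar to a question I asked before: R split on delimiter (split) keep the delimiter (split) but the delimiter can be repeating. I don't think base strsplit can handle this type of regex. The stringi package can, but I can't figure out how to write the regex so that it splits on the delimiter when there are repeats and also does not leave an empty string at the end of the string.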
Base R, stringr, stringi, and other solutions are all welcome.
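The latter problem (the trailing empty string) occurs because I use the greedy * on \\s, but the space isn't guaranteed, so the only thing I could think of was to leave it in: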
MWE
text.var <- c("I want to split here.But also||Why?",
"See! Split at end but no empty.",
"a third string. It has two sentences"
)
library(stringi)
stri_split_regex(text.var, "(?<=([?.!|]{1,10}))\\s*")
# Outcome
## [[1]]
## [1] "I want to split here." "But also|" "|" "Why?"
## [5] ""
##
## [[2]]
## [1] "See!" "Split at end but no empty." ""
##
## [[3]]
## [1] "a third string." "It has two sentences"
# Desired outcome
## [[1]]
## [1] "I want to split here." "But also||" "Why?"
##
## [[2]]
## [1] "See!" "Split at end but no empty."
##
## [[3]]
## [1] "a third string." "It has two sentences"
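
One direction that might close the gap between the two outputs, offered as a sketch rather than a confirmed fix: add a negative lookahead so the split only fires after the last character of a run of delimiters, and pass omit_empty = TRUE (an existing stri_split_regex argument) to drop the empty piece produced by the match at the end of the string. The name res is only illustrative, and the lookahead is an assumption about what "repeating delimiter" should mean here.

```r
library(stringi)

text.var <- c("I want to split here.But also||Why?",
    "See! Split at end but no empty.",
    "a third string. It has two sentences"
)

# (?<=[?.!|])  : the previous character is a delimiter
# (?![?.!|])   : the next character is NOT a delimiter, so a run such as
#                "||" is only split after its final character
# \\s*         : swallow optional whitespace following the delimiter run
# omit_empty   : drop the "" left by the zero-length match at the string end
res <- stri_split_regex(text.var, "(?<=[?.!|])(?![?.!|])\\s*",
    omit_empty = TRUE)
res
```

Under those assumptions this should keep "But also||" in one piece and leave no trailing empty string, though it has only been reasoned through against the three example strings above.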