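I need to split text into words and end marks (certain kinds of punctuation). Oddly enough, the pipe ("|") can count as an end mark. My code for the other end marks worked before I tried adding the pipe. Adding the pipe makes strsplit split on every character, and escaping it causes an error. How do I include a pipe in the regex?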
x <- "I like the dog|."
strsplit(x, "[[:space:]]|(?=[.!?*-])", perl=TRUE)
#[[1]]
#[1] "I" "like" "the" "dog|" "."
strsplit(x, "[[:space:]]|(?=[.!?*-\|])", perl=TRUE)
#Error: '\|' is an unrecognized escape in character string starting "[[:space:]]|(?=[.!?*-\|"
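The error is raised by R's string parser before the regex engine ever sees the pattern: `\|` is not a valid escape inside an R string literal. To hand PCRE the two characters `\|`, the backslash itself has to be doubled. A minimal check:

```r
# In R source, "\\|" is a two-character string: a backslash and a pipe.
s <- "\\|"
nchar(s)          # 2
cat(s, "\n")      # prints \| (the text the regex engine actually receives)

# As a regular expression, \| matches a literal pipe.
grepl(s, "dog|")  # TRUE
grepl(s, "dog")   # FALSE
```

In R 4.0.0 and later, a raw string such as `r"(\|)"` gives the same two characters without doubling the backslash.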
The result I want:
#[[1]]
#[1] "I" "like" "the" "dog" "|" "." #pipe is an element
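One way to get that result, sketched under the assumption that `strsplit()` keeps its current handling of zero-width `perl = TRUE` matches. Inside a character class the pipe is an ordinary character, so it needs no escape at all, but it must go before the hyphen: in `[.!?*-\|]` the sequence `*-\|` would be read as a range from `*` to `|`, which also covers digits and letters. Keeping `-` last makes it literal.

```r
x <- "I like the dog|."

# Split on a whitespace character, or at the zero-width position just before an
# end mark. Inside [...] the pipe is an ordinary character, so it needs no
# escape; the hyphen stays last so it is literal rather than a range operator.
res <- strsplit(x, "[[:space:]]|(?=[.!?*|-])", perl = TRUE)
print(res)

# Escaping the pipe also works, provided the backslash is doubled for R.
res2 <- strsplit(x, "[[:space:]]|(?=[.!?*\\|-])", perl = TRUE)
identical(res, res2)  # TRUE
```

This leans on an implementation detail of `strsplit()`: when the lookahead matches at the very start of the remaining text, the match is empty, and R emits that single character as its own token. That is why "|" and "." come out as separate elements instead of staying attached to "dog".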