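I have the following regular expression, which splits on any whitespace or punctuation. How can I exclude one or more punctuation marks from [:punct:]? Say I want to exclude the apostrophe and the comma. I know I could explicitly use [all punctuation marks in here] instead of [[:punct:]], but I'm hoping there is a way to exclude characters instead.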

X <- "I'm not that good at regex yet, but am getting better!"
strsplit(X, "[[:space:]]|(?=[[:punct:]])", perl=TRUE)

 [1] "I"       "'"       "m"       "not"     "that"    "good"    "at"      "regex"   "yet"    
[10] ","       ""        "but"     "am"      "getting" "better"  "!"

2 Answers

I'm not sure what result you're after, but you may be able to use a negated character class, as in this answer.

R> strsplit(X, "[[:space:]]|(?=[^,'[:^punct:]])", perl=TRUE)[[1]]
 [1] "I'm"     "not"     "that"    "good"    "at"      "regex"   "yet,"   
 [8] "but"     "am"      "getting" "better"  "!"    
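
As a rough illustration of how the negated class behaves, the sketch below tests it against a few single characters. The `chars` vector and the `is_split_punct` name are made up for this example; the pattern itself is the one used above.

```r
# Hypothetical sample characters: apostrophe, comma, exclamation mark, letter, space
chars <- c("'", ",", "!", "a", " ")

# [^,'[:^punct:]] rejects the comma, the apostrophe and every non-punctuation
# character, so only the remaining punctuation marks can match.
is_split_punct <- grepl("[^,'[:^punct:]]", chars, perl = TRUE)
print(is_split_punct)
```

Only the "!" entry comes back TRUE, which is why the apostrophe and the comma no longer trigger a split.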
answered 2012-11-14T03:56:41.443

You can restrict a PCRE subpattern directly with a (?![',]) negative lookahead, which fails the match if the next character to the right is ' or ,:

[[:space:]]|(?=(?![',])[[:punct:]])
               ^^^^^^^^ 

See the regex demo.

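Details

  • [[:space:]] - any whitespace character
  • | - or
  • (?=(?![',])[[:punct:]]) - a positive lookahead that requires the character immediately to the right of the current position to be a punctuation character other than ' and , (the nested negative lookahead rejects those two).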

See the R demo online:

X <- "I'm not that good at regex yet, but am getting better!"
strsplit(X, "[[:space:]]|(?=(?![',])[[:punct:]])", perl=TRUE)
[[1]]
 [1] "I'm"     "not"     "that"    "good"    "at"      "regex"   "yet,"   
 [8] "but"     "am"      "getting" "better"  "!"
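
If the set of excluded marks changes often, the same pattern could be wrapped in a small helper. This is only a sketch: `split_except` and its `keep` argument are hypothetical names, and it assumes `keep` contains only characters that are safe to paste inside a character class (no `]`, `^`, `\` or `%`).

```r
# Hypothetical helper: split on whitespace and before punctuation,
# except for the punctuation characters listed in `keep`.
split_except <- function(x, keep = "',") {
  # Build (?![<keep>]) so the lookahead refuses the excluded characters.
  pattern <- sprintf("[[:space:]]|(?=(?![%s])[[:punct:]])", keep)
  strsplit(x, pattern, perl = TRUE)[[1]]
}

X <- "I'm not that good at regex yet, but am getting better!"
tokens <- split_except(X)
print(tokens)
```

With the default `keep`, this should reproduce the output shown above; passing a different `keep` string changes which punctuation marks stay attached to their words.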
answered 2018-02-13T22:33:06.150