Try using just the expression s instead of perl(s):
library(stringr)
s<-"(.+?)( PIAZZALE | SS)(.+?)([0-9]{5})"
a<-str_match("MONT SS DPR PIAZZALE CADORNA, 1A RICCIONE 47838",s)
b<-str_replace("MONT SS DPR PIAZZALE CADORNA, 1A RICCIONE 47838",s, "\\2")
a[3]
#[1] " PIAZZALE "
b
#[1] " PIAZZALE "
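I checked the library's documentation:
http://cran.r-project.org/web/packages/stringr/stringr.pdf
It shows that str_replace accepts POSIX patterns by default and also accepts Perl patterns when one is supplied, whereas str_match accepts only POSIX-style patterns, so a pattern supplied as Perl is still processed as POSIX. The two functions returned different values because they used different regex engines. str_detect can use Perl expressions and returns TRUE or FALSE. Could you use str_detect instead of str_match?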
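The difference between POSIX and Perl that causes this:
POSIX engines do not recognize lazy (non-greedy) quantifiers.
Your expression
(.+?)( PIAZZALE | SS)(.+?)([0-9]{5})
is treated as the Perl equivalent of
(.+)( PIAZZALE | SS)(.+)([0-9]{5})
where the first quantified class .+ matches as much as it can (the complete string) before backtracking and evaluating the rest of the expression. The match succeeds once the first .+ has backed off from the end of the string far enough to consume only "MONT SS DPR", leaving " PIAZZALE " for the second capture group, which is a[3].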
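The lazy-versus-greedy difference can be reproduced in a single engine. The following is a minimal sketch that assumes base R 3.3.0 or later (where regexec() accepts perl = TRUE) and runs both forms of the pattern through PCRE. The variable names are illustrative and not part of the original question.

```r
x <- "MONT SS DPR PIAZZALE CADORNA, 1A RICCIONE 47838"

lazy   <- "(.+?)( PIAZZALE | SS)(.+?)([0-9]{5})"  # pattern as written
greedy <- "(.+)( PIAZZALE | SS)(.+)([0-9]{5})"    # same pattern with the lazy markers dropped

# regmatches() + regexec() return the full match followed by each capture group,
# so element 3 is capture group 2.
m_lazy   <- regmatches(x, regexec(lazy,   x, perl = TRUE))[[1]]
m_greedy <- regmatches(x, regexec(greedy, x, perl = TRUE))[[1]]

m_lazy[3]
#[1] " SS"
m_greedy[3]
#[1] " PIAZZALE "
```

The greedy form gives the same second group that the non-Perl call reported above, which is consistent with the lazy markers being ignored there.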
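Simplified explanation of the engines' inner workings
The following is a simplified walkthrough of how the different engines process your string. All of your quantifiers and alternations are wrapped directly in capture groups, so the numbered quantifiers in the examples below are also your capture groups:
Perl: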
Quantifier 1: "M"
Quantifier 2: FAILED - MUST BACKTRACK
Quantifier 1: "MO"
Quantifier 2: FAILED - MUST BACKTRACK
Quantifier 1: "MON"
Quantifier 2: FAILED - MUST BACKTRACK
Quantifier 1: "MONT"
Quantifier 2: " SS"
Quantifier 3: " "
Quantifier 4: FAILED - MUST BACKTRACK
Quantifier 1: "MONT"
Quantifier 2: " SS"
Quantifier 3: " D"
Quantifier 4: FAILED - MUST BACKTRACK
...
Quantifier 1: "MONT"
Quantifier 2: " SS"
Quantifier 3: " DPR PIAZZALE CADORNA, 1A RICCIONE "
Quantifier 4: "47838"
SUCCESS
POSIX:
Quantifier 1: "MONT SS DPR PIAZZALE CADORNA, 1A RICCIONE 47838"
Quantifier 2: FAILED - MUST BACKTRACK
Quantifier 1: "MONT SS DPR PIAZZALE CADORNA, 1A RICCIONE 4783"
Quantifier 2: FAILED - MUST BACKTRACK
Quantifier 1: "MONT SS DPR PIAZZALE CADORNA, 1A RICCIONE 478"
Quantifier 2: FAILED - MUST BACKTRACK
Quantifier 1: "MONT SS DPR PIAZZALE CADORNA, 1A RICCIONE 47"
Quantifier 2: FAILED - MUST BACKTRACK
...
Quantifier 1: "MONT SS DPR P"
Quantifier 2: FAILED - MUST BACKTRACK
Quantifier 1: "MONT SS DPR "
Quantifier 2: FAILED - MUST BACKTRACK
Quantifier 1: "MONT SS DPR"
Quantifier 2: " PIAZZALE "
Quantifier 3: "CADORNA, 1A RICCIONE 47838"
Quantifier 4: FAILED - MUST BACKTRACK
...
Quantifier 1: "MONT SS DPR"
Quantifier 2: " PIAZZALE "
Quantifier 3: "CADORNA, 1A RICCIONE "
Quantifier 4: "47838"
SUCCESS
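The two traces differ only in the order in which candidates for the first group are tried. Here is a rough sketch of that idea in base R. The function first_split is a hypothetical helper written for illustration, not how either engine is actually implemented. It tries the candidates for group 1 shortest-first (lazy) or longest-first (greedy) and stops at the first one where an alternative matches and the remainder can still end in five digits.

```r
x    <- "MONT SS DPR PIAZZALE CADORNA, 1A RICCIONE 47838"
alts <- c(" PIAZZALE ", " SS")  # alternatives of group 2, tried left to right

# Hypothetical helper: simulate only the search order for group 1.
first_split <- function(x, alts, lazy) {
  lens <- seq_len(nchar(x) - 1)   # possible lengths of group 1
  if (!lazy) lens <- rev(lens)    # greedy starts from the longest candidate
  for (n in lens) {
    rest <- substring(x, n + 1)
    for (a in alts) {
      remainder <- substring(rest, nchar(a) + 1)
      # group 2 must match here, and groups 3 and 4 must still be satisfiable
      if (startsWith(rest, a) && grepl("^.+[0-9]{5}", remainder)) {
        return(c(group1 = substr(x, 1, n), group2 = a))
      }
    }
  }
  NULL
}

first_split(x, alts, lazy = TRUE)   # group1 "MONT",        group2 " SS"
first_split(x, alts, lazy = FALSE)  # group1 "MONT SS DPR", group2 " PIAZZALE "
```

The sketch skips the backtracking inside groups 3 and 4, since for this string those groups succeed as soon as groups 1 and 2 are fixed.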