regex - Whole-word matching with grepl in R

Question

Consider the following example:

> testLines <- c("I don't want to match this","This is what I want to match")
> grepl('is',testLines)
[1] TRUE TRUE

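However, I want to match "is" only when it stands alone as a word. From reading some Perl documentation, it seems the way to do this is with \b, an anchor that marks the position before and after a pattern, i.e. \bword\b matches 'word' but not 'sword'. So I tried the following example, with perl set to TRUE: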

> grepl('\bis\b',testLines,perl=TRUE)
[1] FALSE FALSE

The output I am looking for is FALSE TRUE.

score 30 · Accepted Answer

"\<" is another escape sequence for the beginning of a word, and "\>" is the one for the end. In R strings you need to double the backslashes, so:

> grepl("\\<is\\>", c("this", "who is it?", "is it?", "it is!", "iso"))
[1] FALSE  TRUE  TRUE  TRUE FALSE

Note that this matches the "is" in "it is!" but does not match "iso".
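To reuse this pattern for other words, the doubled-backslash anchors can be pasted around the word. The following is only a minimal sketch: the helper name `hasWord` is hypothetical (it does not come from the answer), and it assumes the word contains no regex metacharacters.

```r
# Hypothetical helper: wraps a plain word in the \< and \> word anchors.
# Assumes `word` contains no regex metacharacters (e.g. ".", "(", "*").
hasWord <- function(word, x) {
  grepl(paste0("\\<", word, "\\>"), x)
}

samples <- c("this", "who is it?", "is it?", "it is!", "iso")
wordHits <- hasWord("is", samples)
print(wordHits)
```

Applied to the same five strings as above, this should reproduce the result shown in the answer.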

score 20 · Accepted Answer

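You need to double the backslash so that the escape actually reaches the regex engine; in an R string literal, a single "\b" is read as a backspace character rather than as a word boundary: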

> grepl("\\bis\\b",testLines)
[1] FALSE  TRUE
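The difference between the single and doubled backslash can be checked directly. This is a small sketch rather than part of the original answer; the variable names `doubled` and `rawForm` are illustrative, and the raw-string form assumes R 4.0.0 or later.

```r
# "\b" in an R string literal is one backspace character (code point 8);
# "\\b" is two characters: a backslash followed by the letter b.
print(nchar("\b"))      # length of the backspace string
print(nchar("\\b"))     # length of the backslash + b string
print(utf8ToInt("\b"))  # code point of the single character

testLines <- c("I don't want to match this", "This is what I want to match")

# Doubled backslashes pass \b through to the regex engine.
doubled <- grepl("\\bis\\b", testLines)
print(doubled)

# Raw strings (assumes R >= 4.0.0) avoid the doubling altogether.
rawForm <- grepl(r"(\bis\b)", testLines)
print(rawForm)
```

Both forms hand the same pattern to the regex engine, so they should agree with the result shown above.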

score 6 · Accepted Answer

Very simply, match a leading space:

testLines <- c("I don't want to match this","This is what I want to match")
grepl(' is',testLines)
[1] FALSE  TRUE

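There is much more to regular expressions than this, but essentially the pattern needs to be more specific. What you need in the more general case is a huge topic. See ?regex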

Other possibilities that work for this example:

grepl(' is ',testLines)
[1] FALSE  TRUE
grepl('\\sis',testLines)
[1] FALSE  TRUE
grepl('\\sis\\s',testLines)
[1] FALSE  TRUE
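These whitespace-based patterns work for the two example lines, but they behave differently from a word-boundary pattern on other inputs. The following sketch compares them on a few made-up strings (the vector `edgeCases` is an illustration, not data from the question):

```r
# Illustrative inputs: word at the start, word before punctuation,
# a longer word starting with "is", and a word ending in "is".
edgeCases <- c("is it?", "it is!", "an iso file", "this")

leadingSpace <- grepl(" is", edgeCases)       # misses "is it?", accepts " iso"
bothSpaces   <- grepl(" is ", edgeCases)      # needs a space on both sides
wordBoundary <- grepl("\\bis\\b", edgeCases)  # boundary-based match

print(leadingSpace)
print(bothSpaces)
print(wordBoundary)
```

The space-based patterns depend on where the word sits in the string and what follows it, which is why a boundary anchor is usually the safer choice for whole-word matching.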
