python - How do I match a whole word with a regular expression?

Question

I can't work out the right regular expression for the following scenario:

Let's say:

a = "this is a sample"

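I want to match whole words only. For example, matching "hi" should return False, because "hi" is not a word on its own here, and "is" should return True, because there are no alphabetic characters to its left or right.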

score 75 · Accepted Answer

Try

re.search(r'\bis\b', your_string)

From the documentation:

\b matches the empty string, but only at the beginning or end of a word.

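Note that the re module uses a naive definition of "word" as "a sequence of alphanumeric or underscore characters", where "alphanumeric" depends on the locale or Unicode options.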
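To make that definition of "word" concrete, here is a small sketch, assuming Python 3 and str (not bytes) patterns. The sample strings are made up for illustration and are not from the question.

```python
import re

pattern = r"\bis\b"

# Underscores and digits are word characters, so there is no boundary
# between "is" and a neighbouring "_" or "2".
underscore_hit = re.search(pattern, "this_is_it")
digit_hit = re.search(pattern, "is2")

# Whitespace and punctuation are not word characters, so "is" stands alone here.
punct_hit = re.search(pattern, "it is, really")

# For str patterns, Unicode letters count as word characters by default;
# re.ASCII narrows the definition to [a-zA-Z0-9_].
unicode_default = re.findall(r"\bcaf\b", "caf\u00e9")
unicode_ascii = re.findall(r"\bcaf\b", "caf\u00e9", flags=re.ASCII)

print(underscore_hit, digit_hit)       # None None
print(punct_hit.group())               # is
print(unicode_default, unicode_ascii)  # [] ['caf']
```

So "is" inside "this_is_it" is not treated as a whole word, which may or may not be what you want.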

Also note that without the raw string prefix, \b is interpreted as "backspace" rather than as a regex word boundary.
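A quick way to see the difference between the two spellings is sketched below; the variable names are only for illustration.

```python
import re

text = "this is a sample"

# Raw string: the regex engine receives a backslash followed by "b",
# which it reads as a word boundary.
raw_pattern = r"\bis\b"

# Plain string: Python converts "\b" into the backspace character (0x08)
# before the regex engine ever sees the pattern.
plain_pattern = "\bis\b"

raw_hit = re.search(raw_pattern, text)
plain_hit = re.search(plain_pattern, text)

print(len(raw_pattern), len(plain_pattern))        # 6 4
print(raw_hit is not None, plain_hit is not None)  # True False
```

The plain-string pattern only matches text that literally contains backspace characters around "is".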

score 7 · Accepted Answer

Try using the "word boundary" character class from the regular expression module, re:

import re

x="this is a sample"
y="this isis a sample."
regex=re.compile(r"\bis\b")  # For ignore case: re.compile(r"\bis\b", re.IGNORECASE)

regex.findall(y)
[]

regex.findall(x)
['is']

From the re documentation:

\b matches the empty string, but only at the beginning or end of a word

...

For example, r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo baz' but not 'foobar' or 'foo3'.
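The documentation's examples can be checked directly. This is only a small verification sketch, with the sample strings taken from the quote above:

```python
import re

pattern = re.compile(r"\bfoo\b")
samples = ["foo", "foo.", "(foo)", "bar foo baz", "foobar", "foo3"]

# Map each sample to whether the pattern is found anywhere in it.
results = {s: pattern.search(s) is not None for s in samples}

for sample, found in results.items():
    print(repr(sample), found)
```

The first four samples report True and the last two report False, because "b" and "3" are word characters and leave no boundary after "foo".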

score 3 · Accepted Answer

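I don't think the answers given fully achieve the behavior the OP wanted. Specifically, they don't produce the boolean output that was asked for. The answers do help illustrate the concept, and I think they are excellent. Perhaps I can show what I mean by explaining why I think the OP chose the examples in the question.

The string given was,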

a = "this is a sample"

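The OP then stated,

I want to match whole words - for example, matching "hi" should return False, because "hi" is not a word ...

As I understand it, this refers to the search token "hi" as it appears inside the word "this". If someone searches the string a for the word "hi", they should get False as the response.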

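The OP continues,

... and "is" should return True, because there are no alphabetic characters to its left or right.

In this case, the reference is to the search token "is" as it appears in the word "is". I hope this helps clarify why we use word boundaries. The other answers have the behavior of "don't return a word unless that word is found by itself, not inside other words." The "word boundary" shorthand character class does this job nicely.

Only the word "is" has been used in the examples up to this point. I think those answers are correct, but there is more to the question's underlying meaning that needs to be addressed. The behavior with other search strings should be looked at in order to understand the concept. In other words, we need to generalize the (excellent) answer by @georg, which uses re.search(r'\bis\b', your_string). The same r"\bis\b" concept is also used in the answer by @OmPrakash, who started the generalizing discussion by showing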

>>> y="this isis a sample."
>>> regex=re.compile(r"\bis\b")  # For ignore case: re.compile(r"\bis\b", re.IGNORECASE)
>>> regex.findall(y)
[]

Let's say the function that should exhibit the behavior I've discussed is named

find_only_whole_word(search_string, input_string)

Then the following behavior should be expected.

>>> a = "this is a sample"
>>> find_only_whole_word("hi", a)
False
>>> find_only_whole_word("is", a)
True

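Once again, this is how I understand the OP's question. The answer from @georg takes us a step towards that behavior, but its result is a little hard to interpret and use directly. To wit: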

>>> import re
>>> a = "this is a sample"
>>> re.search(r"\bis\b", a)
<_sre.SRE_Match object; span=(5, 7), match='is'>
>>> re.search(r"\bhi\b", a)
>>>

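There is no output from the second command. The useful answer from @OmPrakesh shows output, but not True or False.

Here is a more complete sampling of the expected behavior.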

>>> find_only_whole_word("this", a)
True
>>> find_only_whole_word("is", a)
True
>>> find_only_whole_word("a", a)
True
>>> find_only_whole_word("sample", a)
True
# Use "ample", part of the word, "sample": (s)ample
>>> find_only_whole_word("ample", a)
False
# (t)his
>>> find_only_whole_word("his", a)
False
# (sa)mpl(e)
>>> find_only_whole_word("mpl", a)
False
# Any random word
>>> find_only_whole_word("applesauce", a)
False
>>>

This can be accomplished with the following code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
#@file find_only_whole_word.py

import re

def find_only_whole_word(search_string, input_string):
  # Build a raw-string pattern that wraps search_string in word boundaries.
  # search_string is used as-is, so any regex metacharacters in it are
  # interpreted as regex syntax.
  raw_search_string = r"\b" + search_string + r"\b"

  match_output = re.search(raw_search_string, input_string)
  ##As noted by @OmPrakesh, if you want to ignore case, uncomment
  ##the next two lines
  #match_output = re.search(raw_search_string, input_string,
  #                         flags=re.IGNORECASE)

  # re.search returns None when nothing matches, so this yields a bool.
  return match_output is not None

##endof:  find_only_whole_word(search_string, input_string)

A simple demonstration follows. Run the Python interpreter from the same directory where you saved the file, find_only_whole_word.py.

>>> from find_only_whole_word import find_only_whole_word
>>> a = "this is a sample"
>>> find_only_whole_word("hi", a)
False
>>> find_only_whole_word("is", a)
True
>>> find_only_whole_word("cucumber", a)
False
# The excellent example from @OmPrakash
>>> find_only_whole_word("is", "this isis a sample")
False
>>>
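If uncommenting lines to switch case sensitivity feels awkward, one option is to expose it as a parameter. The following is a sketch only: the name find_only_whole_word_ci and the ignore_case parameter are hypothetical and not part of the answer above.

```python
import re

def find_only_whole_word_ci(search_string, input_string, ignore_case=False):
    """Return True if search_string occurs as a whole word in input_string.

    Hypothetical variant of find_only_whole_word; ignore_case is an
    assumed parameter name chosen for this illustration.
    """
    # 0 means "no flags", which keeps the default case-sensitive search.
    flags = re.IGNORECASE if ignore_case else 0
    pattern = r"\b" + search_string + r"\b"
    return re.search(pattern, input_string, flags=flags) is not None

a = "this is a sample"
print(find_only_whole_word_ci("IS", a))                    # False
print(find_only_whole_word_ci("IS", a, ignore_case=True))  # True
print(find_only_whole_word_ci("HI", a, ignore_case=True))  # False
```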

score -6 · Accepted Answer

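The trouble with regular expressions is that things get complicated if the string you are searching for contains regex characters. Any search string with brackets will fail.

This code will find a word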

word="is"
srchedStr="this is a sample"
if srchedStr.find(" "+word+" ") >=0  or \
   srchedStr.endswith(" "+word):
    pass  # <do stuff>

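The first part of the conditional searches for the text with a space on each side, and the second part catches the end-of-string case. Note that endswith returns a boolean, whereas find returns an integer.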
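For search strings that contain regex characters, a regex-based alternative is still possible. The sketch below is an assumption-laden illustration rather than a drop-in replacement: contains_whole_word is a hypothetical name, re.escape makes the search string literal, and lookarounds are used in place of \b because \b needs a word character on one side and so never matches next to "(".

```python
import re

def contains_whole_word(word, text):
    """Return True if word appears in text with no word character
    immediately before or after it. Hypothetical helper for illustration.
    """
    # re.escape turns metacharacters such as "(" into literal characters.
    # (?<!\w) and (?!\w) assert that no word character touches the match,
    # which also covers the start and end of the string.
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    return re.search(pattern, text) is not None

print(contains_whole_word("is", "this is a sample"))   # True
print(contains_whole_word("hi", "this is a sample"))   # False
print(contains_whole_word("(foo)", "call (foo) now"))  # True
print(contains_whole_word("is", "is this it"))         # True
```

Unlike the space-based check, this also handles a word at the very start of the string or one followed by punctuation.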
