python - Find an exact match of a word in a string

Question

I use the following function to find exact matches of a word in a string.

import re

def exact_Match(str1, word):
    # \b marks a word boundary on each side of the word
    result = re.findall('\\b'+word+'\\b', str1, flags=re.IGNORECASE)
    if len(result)>0:
        return True
    else:
        return False

exact_Match(str1, word)

However, for the following string I get an exact match for both "award" and "award-winning", when only "award-winning" should match:

str1 = "award-winning blueberries"
word1 = "award"
word2 = "award-winning"

How can I make re.findall match whole words, including hyphens and other punctuation?
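A minimal reproduction of the behaviour described in the question, using the question's own function (with `import re` added and the if/else condensed to the equivalent `return len(result) > 0`):

```python
import re

def exact_Match(str1, word):
    # The question's approach: \b on both sides of the word
    result = re.findall('\\b' + word + '\\b', str1, flags=re.IGNORECASE)
    return len(result) > 0

str1 = "award-winning blueberries"
print(exact_Match(str1, "award"))          # True, because "-" is treated as a word boundary
print(exact_Match(str1, "award-winning"))  # True
```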

score 6 · Accepted Answer

Make your own word boundary:

import re

def exact_Match(phrase, word):
    b = r'(\s|^|$)'  # boundary: whitespace, start of string, or end of string
    res = re.match(b + word + b, phrase, flags=re.IGNORECASE)
    return bool(res)

Copied and pasted from here into my interpreter:

>>> str1 = "award-winning blueberries"
>>> word1 = "award"
>>> word2 = "award-winning"
>>> exact_Match(str1, word1)
False
>>> exact_Match(str1, word2)
True

Actually, the cast to bool is unnecessary and doesn't help at all. The function is better without it:

def exact_Match(phrase, word):
    b = r'(\s|^|$)'
    # Returns a match object (truthy) or None (falsy)
    return re.match(b + word + b, phrase, flags=re.IGNORECASE)
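Because `re.match` returns either a match object (always truthy) or `None`, the function without the `bool` cast can be used directly in a condition. A short usage sketch; the lowercase name `exact_match` is an assumption here:

```python
import re

def exact_match(phrase, word):
    # Custom boundary: whitespace, start of string, or end of string
    b = r'(\s|^|$)'
    # Returns a re.Match object (truthy) or None (falsy)
    return re.match(b + word + b, phrase, flags=re.IGNORECASE)

phrase = "award-winning blueberries"
for word in ("award", "award-winning"):
    if exact_match(phrase, word):
        print(word, "-> match")
    else:
        print(word, "-> no match")
```

This prints `award -> no match` followed by `award-winning -> match`.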

Note: exact_Match uses very unconventional casing. Just call it exact_match.
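A possible variant of the same idea, not taken from the answer itself: the lookarounds `(?<!\S)` and `(?!\S)` express "preceded/followed by whitespace or a string edge" without consuming characters, `re.search` scans the whole string instead of only its start, and `re.escape` keeps words containing regex metacharacters (such as `c++`) from being misread as pattern syntax. Treat it as a sketch rather than a drop-in replacement.

```python
import re

def exact_match(phrase, word):
    # (?<!\S): not preceded by a non-whitespace char (so start of string or whitespace)
    # (?!\S):  not followed by a non-whitespace char (so end of string or whitespace)
    # re.escape makes characters like "+" or "." in word match literally
    pattern = r'(?<!\S)' + re.escape(word) + r'(?!\S)'
    return re.search(pattern, phrase, flags=re.IGNORECASE) is not None

print(exact_match("award-winning blueberries", "award"))               # False
print(exact_match("award-winning blueberries", "award-winning"))       # True
print(exact_match("juicy award-winning blueberries", "award-winning")) # True
print(exact_match("I like C++ a lot", "c++"))                          # True
```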

score 1 · Accepted Answer

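The problem with your initial approach is that '\\b' does not represent the zero-width assertion you are looking for. (If it did, I would use r'\b' instead, because backslashes can become a real hassle in regular expressions; see this link.)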
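To illustrate the backslash point: in an ordinary string literal `'\\b'` is the two characters backslash and `b`, identical to the raw string `r'\b'`, whereas an unescaped `'\b'` is the backspace character. A pattern built with the unescaped form therefore looks for literal backspaces instead of word boundaries.

```python
import re

# '\\b' in a normal literal is backslash + "b", the same as the raw string r'\b'
print('\\b' == r'\b')  # True
# Without escaping or the r prefix, '\b' is the backspace character (U+0008)
print('\b' == '\x08')  # True

# The backspace version searches for literal backspace characters and finds nothing
print(re.findall('\baward\b', "award-winning"))   # []
# The raw-string version is a real word-boundary pattern
print(re.findall(r'\baward\b', "award-winning"))  # ['award']
```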

From the Regular Expression HOWTO:

\b

Word boundary. This is a zero-width assertion that matches only at the beginning or end of a word. A word is defined as a sequence of alphanumeric characters, so the end of a word is indicated by whitespace or a non-alphanumeric character.
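The positions where `\b` matches can be listed with `re.finditer`, since every match is empty and its `start()` is the boundary index. A small sketch with a hypothetical helper `boundary_positions`; note that in Python the underscore also counts as a word character, alongside letters and digits:

```python
import re

def boundary_positions(text):
    # \b matches the empty string, so each match's start() is a boundary index
    return [m.start() for m in re.finditer(r'\b', text)]

print(boundary_positions("award-winning"))  # [0, 5, 6, 13]: "-" creates two boundaries
print(boundary_positions("awards"))         # [0, 6]
print(boundary_positions("award_winning"))  # [0, 13]: "_" is a word character
```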

Because - is a non-alphanumeric character, your findall regex will find award in award-winning but not in awards.
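Running the question's pattern against both cases confirms this:

```python
import re

pattern = r'\baward\b'
print(re.findall(pattern, "award-winning", flags=re.IGNORECASE))  # ['award']
print(re.findall(pattern, "Award-Winning", flags=re.IGNORECASE))  # ['Award']
print(re.findall(pattern, "awards", flags=re.IGNORECASE))         # []
```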

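Depending on the phrases you are searching, I would also consider using re.findall instead of re.match, as Elazar suggested. re.match works in your example, but if the word you are looking for appears anywhere other than at the start of the string, re.match will not succeed.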
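`re.match` only tries to match at position 0, while `re.search` tries every position and `re.findall` collects all non-overlapping matches. A sketch using the first answer's boundary pattern on a made-up phrase where the word is not at the start:

```python
import re

b = r'(\s|^|$)'
pattern = b + 'award-winning' + b
phrase = "juicy award-winning blueberries"

# re.match only tries position 0, where "juicy" sits, so it fails
print(re.match(pattern, phrase, flags=re.IGNORECASE))               # None
# re.search tries every position and finds " award-winning "
print(re.search(pattern, phrase, flags=re.IGNORECASE) is not None)  # True
# With capture groups in the pattern, re.findall returns the groups, not the whole match
print(re.findall(pattern, phrase, flags=re.IGNORECASE))             # [(' ', ' ')]
```

Because of the capture groups, a non-empty `re.findall` result still signals a match, but the returned values are the surrounding spaces rather than the word itself.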
