python - Matching a word with a regular expression in Python

Question

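I'm building a reddit bot with PRAW that finds comments in which the author says "alot" and stores their usernames in a list. I'm having trouble with the regular expression and with getting the string to work. Here is my code.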

#importing praw for reddit api and time to make intervals

import praw
import time
import re


username = "LewisTheRobot"
password = 



r = praw.Reddit(user_agent = "Counts people who say alot")

word_to_match = ['\balot\b']

storage = []

r.login(username, password)

def run_bot():
    subreddit = r.get_subreddit("test")
    print("Grabbing subreddit")
    comments = subreddit.get_comments(limit=200)
    print("Grabbing comments")
    for comment in comments:
        comment_text = comment.body.lower()
        isMatch = any(string in comment_text for string in word_to_match)
        if comment.id not in storage and isMatch:
            print("Match found! Storing username: " + str(comment.author) + " into list.")
            storage.append(comment.author)


    print("There are currently: " + str(len(storage)) + " people who use 'alot' instead of ' a lot'.")


while True:
    run_bot()
    time.sleep(5)

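So the regular expression I'm using should look for the word alot on its own, not alot as part of a longer string, for example inside another word. Whenever I run this, it doesn't find the comment I made. Any suggestions?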

score 3 · Accepted Answer

You're checking with string operations, not RE operations:

isMatch = any(string in comment_text for string in word_to_match)

The first `in` here checks for a substring; it has nothing to do with REs.

Change this to

isMatch = any(re.search(string, comment_text) for string in word_to_match)
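To make the difference concrete, here is a minimal sketch comparing the two checks. The sample comments and the helper names `substring_match` and `regex_match` are hypothetical and not part of the question's code; the pattern is written as a raw string (the `r` prefix) so that `\b` reaches the `re` module unchanged.

```python
import re

# Hypothetical sample comments, not taken from the question.
comments = ["i like this alot", "he is a zealot", "nothing to see here"]
patterns = [r"\balot\b"]

def substring_match(text, patterns):
    # `in` compares literal characters; regex syntax is not interpreted,
    # so the backslash and "b" would have to appear in the text verbatim.
    return any(p in text for p in patterns)

def regex_match(text, patterns):
    # re.search scans the text and returns a match object or None.
    return any(re.search(p, text) for p in patterns)

substring_results = [substring_match(c, patterns) for c in comments]
regex_results = [regex_match(c, patterns) for c in comments]

print(substring_results)  # [False, False, False]
print(regex_results)      # [True, False, False]
```

The substring check never fires because none of the comments contain a literal backslash followed by `b`, while `re.search` matches the standalone word and skips "zealot".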

In addition, your initialization has a bug:

word_to_match = ['\balot\b']

'\b' is the character with code 0x08 (backspace). Always use raw string syntax for RE patterns to avoid this kind of trap:

word_to_match = [r'\balot\b']

Now you will have two characters, a backslash and then b, which the RE engine interprets as "word boundary".
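The effect of the `r` prefix can be checked directly. This is a small sketch; the variable names and the sample text are illustrative assumptions, not from the question.

```python
import re

plain = '\balot\b'   # here \b is the backspace escape, a single character
raw = r'\balot\b'    # here \b stays as backslash + "b"

plain_len = len(plain)       # backspace, a, l, o, t, backspace
raw_len = len(raw)           # \, b, a, l, o, t, \, b
first_code = ord(plain[0])   # code point of the first character

text = "i use alot of words"  # hypothetical comment text
plain_found = re.search(plain, text) is not None
raw_found = re.search(raw, text) is not None

print(plain_len, raw_len, first_code)  # 6 8 8
print(plain_found, raw_found)          # False True
```

The non-raw pattern asks the RE engine to find literal backspace characters around "alot", which ordinary comment text does not contain, so it never matches.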

There may be other bugs, but I try not to look for more than two bugs per question... :-)
