python - Optimizing a Wordle Bot with Python - searching for words that contain a, b and c?

Question

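I've been writing a Wordle bot and want to see how it performs across all 13,000 words. The problem is that I run it through a for loop, which is very inefficient. After running for 30 minutes it had only reached about 5%. I could keep waiting, but it would end up taking over 10 hours. There has to be a more efficient way. I'm new to Python, so any advice would be greatly appreciated.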

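The code below is what narrows down the candidates after each guess. Is there a way to search for words that contain "a", "b" and "c" in one go, instead of running it 3 separate times? Right now, contains, nocontains and isletter all run every time I need to search for a new letter. Searching for all the letters together would cut the time dramatically.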

#Find the words that only match the criteria
def contains(letter, place):
    list.clear()
    for x in words:
        if x not in removed:
            if letter in x:
                if letter == x[place]:
                    removed.append(x)
                else:
                    list.append(x)
            else:
                removed.append(x)
def nocontains(letter):
    list.clear()
    for x in words:
        if x not in removed:
            if letter not in x:
                list.append(x)
            else:
                removed.append(x)
def isletter(letter, place):
    list.clear()
    for x in words:
        if x not in removed:
            if letter == x[place]:
                list.append(x)
            else:
                removed.append(x)

score 1 · Accepted Answer

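Using sets will go a long way toward fixing the performance problem. Any time you test membership repeatedly (even just a few times), as in if x not in removed, you should try using a set. A list has to check every element to find x, which is bad when the list has thousands of elements. With a Python set, if x not in removed takes about the same, very short, time whether removed has 100 or 100,000 elements.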
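A quick way to see the difference is to time a membership test against a list and a set that hold the same data. The 100,000 generated strings below are hypothetical stand-ins for a word list; absolute timings depend on the machine, but the set lookups should come out orders of magnitude faster.

```python
import timeit

# Hypothetical stand-in for a large word list: 100,000 distinct strings.
words_list = [f"w{i:05d}" for i in range(100_000)]
words_set = set(words_list)

# An absent word forces the list to compare against every element,
# while the set hashes the string and typically checks a single slot.
missing = "zzzzz"

list_time = timeit.timeit(lambda: missing in words_list, number=100)
set_time = timeit.timeit(lambda: missing in words_set, number=100)

print(f"list: {list_time:.4f} s, set: {set_time:.6f} s for 100 lookups")
```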

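Beyond that, you will run into problems by using mutable global variables everywhere, such as list (which should be renamed, since it shadows the built-in) and removed. There is no benefit to doing this, and there are some downsides, such as making your code harder to reason about or optimize. One nice thing about Python is that you can pass large containers or objects to functions without any extra time or space cost: calling f(huge_list) is as fast and uses as much memory as f(tiny_list), just as if you were passing by reference in other languages, so don't hesitate to use containers as function parameters or return values.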
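A small demonstration of the point about argument passing, using two hypothetical helpers, identity_of and add_word, that exist only for illustration: the function receives the caller's own object, not a copy. The flip side, shown at the end, is that in-place mutation inside a function is visible to the caller, which is one reason functions that return a new set are easier to reason about than ones that modify shared globals.

```python
def identity_of(container):
    """Hypothetical helper: reports which object it was handed."""
    return id(container)

def add_word(container, word):
    """Hypothetical helper: mutates the set it was handed, in place."""
    container.add(word)

huge_set = {f"w{i}" for i in range(100_000)}
tiny_set = {"crane"}

# No copy is made: the function receives the very same object as the caller,
# so the call costs the same whether the set is huge or tiny.
print(identity_of(huge_set) == id(huge_set))  # True
print(identity_of(tiny_set) == id(tiny_set))  # True

# Because it is the same object, in-place changes are visible to the caller.
add_word(tiny_set, "slate")
print(sorted(tiny_set))  # ['crane', 'slate']
```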

Putting it all together, if you get rid of list and removed and instead keep a set of the possible words, this is how you could refactor the code:

all_words = []  # Huge word list to read in from text file
current_possible_words = set(all_words)

def contains_only_elsewhere(possible_words, letter, place):
    """Given letter and place, remove from possible_words all words that
     either do not contain letter or contain it at place"""
    to_remove = {word for word in possible_words
                 if letter not in word or word[place] == letter}
    return possible_words - to_remove

def must_not_contain(possible_words, letter):
    """Given a letter, remove from possible_words all words containing letter"""
    to_remove = {word for word in possible_words
                 if letter in word}
    return possible_words - to_remove

def exact_letter_match(possible_words, letter, place):
    """Given a letter and place, remove from possible_words
     all words not containing letter at place"""
    to_remove = {word for word in possible_words
                 if word[place] != letter}
    return possible_words - to_remove

The calling code changes accordingly, for example:

current_possible_words = exact_letter_match(current_possible_words, 'a', 2)

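Further optimizations are possible (and now easier to make): store word indices instead of the strings themselves, precompute for each letter the set of all words that contain that letter, and so on.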
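A minimal sketch of those two optimizations together, assuming the word list is loaded once up front (the six-word sample and the containing_all helper are hypothetical). Each letter maps to the set of indices of words containing it, so "words with a, b and c" becomes an intersection of three precomputed sets instead of three scans over every word.

```python
from collections import defaultdict

# Hypothetical sample; in practice this would be the full ~13,000-word list.
words = ["cabin", "about", "crane", "black", "scuba", "table"]

# Precompute once: for each letter, the set of indices of words containing it.
by_letter = defaultdict(set)
for i, word in enumerate(words):
    for letter in set(word):
        by_letter[letter].add(i)

def containing_all(letters):
    """Indices of words that contain every letter in `letters`."""
    result = set(range(len(words)))
    for letter in letters:
        # .get avoids creating empty entries for letters no word contains
        result &= by_letter.get(letter, set())
    return result

matches = sorted(containing_all("abc"))
print([words[i] for i in matches])  # ['cabin', 'black', 'scuba']
```

Storing indices means the word itself is recovered with words[i] only for the final survivors.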

score 0 · Accepted Answer

I just wrote a Wordle bot that runs in about a second, including scraping the web to get the list of five-letter words.

import urllib.request
from bs4 import BeautifulSoup

def getwords():
    source = "https://www.thefreedictionary.com/5-letter-words.htm"
    filehandle = urllib.request.urlopen(source)
    soup = BeautifulSoup(filehandle.read(), "html.parser")
    wordslis = soup.find_all("li", {"data-f": "15"})
    words = []
    for k in wordslis:
        words.append(k.getText())
    return words

words = getwords()

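def hasLetterAtPosition(letter, position, word):
    # Green: the letter must be exactly at this position
    return letter == word[position]

def hasLetterNotAtPosition(letter, position, word):
    # Yellow: the letter is in the word, but not at this position
    return letter != word[position] and letter in word

def doesNotHaveLetter(letter, word):
    # Grey: the letter does not appear anywhere in the word
    return letter not in word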

# Clues from the guesses so far, as (position, letter) pairs
lettersPositioned = [(0,"y")]      # green
lettersMispositioned = [(0,"h")]   # yellow
lettersNotHad = ["p"]              # grey

idx = 0
while idx<len(words):
    eliminated = False
    for criteria in lettersPositioned:
        if not hasLetterAtPosition(criteria[1],criteria[0],words[idx]):
            del words[idx]
            eliminated = True
            break
    if eliminated:
        continue
    for criteria in lettersMispositioned:
        if not hasLetterNotAtPosition(criteria[1],criteria[0],words[idx]):
            del words[idx]
            eliminated = True
            break
    if eliminated:
        continue
    for letter in lettersNotHad:
        if not doesNotHaveLetter(letter,words[idx]):
            del words[idx]
            eliminated = True
            break
    if eliminated:
        continue
    idx+=1

print(words) # ["youth"]

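Yours is slow because of all the checks for whether a word has already been removed (each one a linear scan of the removed list), on top of looping over every word for every single check; there are also some redundant logical conditions.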
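A sketch of the single-pass idea: rather than a separate pass (with its own removed check) per clue, each word is tested once against all clues inside one list comprehension. The clue lists mirror lettersPositioned, lettersMispositioned and lettersNotHad from the code above; the matches helper and the five-word sample are hypothetical stand-ins for the scraped list.

```python
def matches(word, positioned, mispositioned, not_had):
    """Return True if word is consistent with every clue.

    positioned:    (position, letter) pairs that must match exactly (green)
    mispositioned: (position, letter) pairs whose letter is in the word
                   but not at that position (yellow)
    not_had:       letters that must not appear anywhere (grey)
    """
    return (
        all(word[pos] == letter for pos, letter in positioned)
        and all(letter in word and word[pos] != letter
                for pos, letter in mispositioned)
        and all(letter not in word for letter in not_had)
    )

# Hypothetical sample standing in for the scraped word list.
words = ["youth", "yield", "happy", "hyena", "young"]

letters_positioned = [(0, "y")]
letters_mispositioned = [(0, "h")]
letters_not_had = ["p"]

# Each word is checked exactly once against all clues, and the result is a
# new list, so there is no `del` shifting the remaining items on every removal.
remaining = [w for w in words
             if matches(w, letters_positioned, letters_mispositioned, letters_not_had)]
print(remaining)  # ['youth']
```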

Edit: here is a getwords function that fetches more words.

def getwords():
    source = "https://wordfind-com.translate.goog/length/5-letter-words/?_x_tr_sl=es&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp"
    filehandle = urllib.request.urlopen(source)
    soup = BeautifulSoup(filehandle.read(), "html.parser")
    wordslis = soup.find_all("a", {"rel": "nofollow"})
    words = []
    for k in wordslis:
        words.append(k.getText())
    return words
