python - Finding words in a text file that contain specific characters and have a specific length

Question

I'm trying to find words in a text file that are 7 letters long and contain the letters a, b, c, e and r. So far I have this:

import re

file = open("dictionary.txt","r")
text = file.readlines()
file.close()


keyword = re.compile(r'\w{7}')

for line in text:
    result = keyword.search (line)
    if result:
       print (result.group())

Can anyone help me?

score 2 · Accepted Answer

You need to match not only word characters but also word boundaries:

keyword = re.compile(r'\b\w{7}\b')

The \b anchor matches the start or end of a word, so the pattern only matches words that are exactly 7 characters long.
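
To see what the anchors change, here is a small sketch that runs both patterns on one line. The sample line is made up for illustration and is not taken from dictionary.txt.

```python
import re

# Sample text invented for this illustration.
line = "bracelet acerbic cab"

unanchored = re.compile(r'\w{7}')
anchored = re.compile(r'\b\w{7}\b')

# Without \b, the first 7 characters of a longer word still match.
print(unanchored.findall(line))  # ['bracele', 'acerbic']

# With \b on both sides, only words of exactly 7 characters match.
print(anchored.findall(line))    # ['acerbic']
```

The 8-letter word "bracelet" slips through the unanchored pattern as "bracele", which is why the boundaries are needed.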

It is also more efficient to loop over the file line by line rather than reading it all into memory at once:

import re

keyword = re.compile(r'\b\w{7}\b')

with open("dictionary.txt","r") as dictionary:    
    for line in dictionary:
        for result in keyword.findall(line):
            print(result)

Using keyword.findall() gives us a list of all matches on the line.

To check that a match contains at least one of the required characters, I'd personally just use a set intersection test:

import re

keyword = re.compile(r'\b\w{7}\b')
required = set('abcer')

with open("dictionary.txt","r") as dictionary:    
    for line in dictionary:
        # keep only the words that share at least one letter with `required`
        results = [word for word in keyword.findall(line) if required.intersection(word)]
        for result in results:
            print(result)
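
The question can also be read as requiring all five letters, not just one of them. Under that reading, a subset test is one option. The sketch below is an assumption about the asker's intent; the helper name words_with_all_letters and the in-memory sample list are hypothetical stand-ins for reading dictionary.txt.

```python
import re

keyword = re.compile(r'\b\w{7}\b')
required = set('abcer')

def words_with_all_letters(lines):
    """Yield 7-character words that contain every letter in `required`."""
    for line in lines:
        for word in keyword.findall(line):
            # issubset is true only when a, b, c, e and r all occur in the word
            if required.issubset(word.lower()):
                yield word

# Hypothetical sample lines standing in for the lines of dictionary.txt.
sample = ["acerbic\n", "bracelet\n", "abandon\n", "cabaret\n"]
print(list(words_with_all_letters(sample)))  # ['acerbic', 'cabaret']
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the sample list.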

score 1 · Accepted Answer

\b(?=\w{0,6}?[abcer])\w{7}\b

That's the regular expression you want. It works by using the basic form for a word of exactly seven letters (\b\w{7}\b) and adding a lookahead - a zero width assertion that looks forward and tries to find one of your required letters. A breakdown:

\b            A word boundary
(?=           Look ahead and find...
    \w        A word character (A-Za-z0-9_)
    {0,6}     Repeated 0 to 6 times
    ?         Lazily (not necessary, but marginally more efficient).
    [abcer]   Followed by one of a, b, c, e, or r
)             Go back to where we were before (just after the word boundary)
\w            And match a word character
{7}           Exactly seven times.
\b            Then one more word boundary.
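
As a quick check of the breakdown above, the pattern can be dropped into Python's re module. This is a minimal sketch; the sample line is invented for illustration.

```python
import re

# Lookahead version: exactly 7 word characters, at least one of a, b, c, e, r.
pattern = re.compile(r'\b(?=\w{0,6}?[abcer])\w{7}\b')

# Sample text invented for this illustration.
line = "thought acerbic kingdom bracelet"

# "thought" and "kingdom" have 7 letters but none of the required ones;
# "bracelet" has the letters but is 8 characters long.
print(pattern.findall(line))  # ['acerbic']
```

Note that the lookahead only asserts that one of the listed letters is present, which matches the "at least one" reading of the question.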
