python - How do I find a 7-letter word in a word file that has no letter s and contains only one vowel?

Question

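I was asked to write code that prints the words in "dictionary.txt" (a 250,000-word file) that contain only one vowel, have no letter "s", and are 7 letters long. I know I have to define a function that opens the file and searches for these requirements.

I'm not allowed to use regular expressions, and the file has one word per line.

Here is my current Python script: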

a="a"
e="e"
i="i"
o="o"
u="u"
y="y"



def search():    
    Input=open("dictionary.txt","r") 
    for word in Input:
        word=Input.lower()
        vowel=len(word-a)==6 or len(word-e)==6 or len(word-i)==6 or len(word-o)==6 or len(word-u)==6 or len(word-y)==6
        if len(word)==7 and "s" not in word and vowel==True:
            return word

 print(search())

score 2 · Accepted Answer

No regular expressions are needed. Sets are fairly fast.

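# Read the whole file; split() below breaks on any whitespace, including newlines
with open('dictionary.txt') as infile:
    text = infile.read()

vowels = 'aeiou'
vowelsSet = set(vowels)

for word in text.split():
    word = word.lower()
    # 7 letters, no 's', and 6 distinct non-vowel letters.
    # Note: this also rejects words with a repeated consonant (e.g. "twelfth").
    if len(word) == 7 and 's' not in word and len(set(word) - vowelsSet) == 6:
        print(word)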

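The open/read combination at the top slurps in the whole collection of words, assuming the file contains no punctuation other than apostrophes inside words and that no word spans more than one line.

By comparing the size of the set of characters in a given word with the size of the vowel set, you can tell whether any vowel is repeated. The principle is that, for example, the character set of moan has size 4, while that of moon has size 3.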
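The moan/moon comparison can be checked directly. The sketch below is only an illustration and not part of the answer's code; `count_vowels` and `distinct_consonants` are hypothetical helper names, and it assumes the vowels are a, e, i, o, u. It also shows a limit of the set-size idea: a set collapses repeated consonants as well as repeated vowels, so counting vowel occurrences is the safer test for "exactly one vowel".

```python
VOWELS = set("aeiou")  # assumption: y is not counted as a vowel

def count_vowels(word):
    """Count vowel occurrences, so a doubled vowel counts twice."""
    return sum(ch in VOWELS for ch in word.lower())

def distinct_consonants(word):
    """Number of distinct non-vowel characters, as in the set-difference test."""
    return len(set(word.lower()) - VOWELS)

# A set keeps each letter once, which is why moon is smaller than moan
print(len(set("moan")), len(set("moon")))  # 4 3

# "twelfth" has one vowel, but its repeated t leaves only 5 distinct consonants
print(count_vowels("twelfth"), distinct_consonants("twelfth"))  # 1 5

# "nymphal" has one vowel and six distinct consonants, so both checks agree
print(count_vowels("nymphal"), distinct_consonants("nymphal"))  # 1 6
```

With these helpers, `count_vowels(word) == 1` accepts "twelfth", whereas a test on six distinct consonants would skip it.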

score 2 · Accepted Answer

A one-liner regex, for the challenge:

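^(?=\w{7}$)(?:[b-df-hj-np-rtv-z])*[aeiou](?:[b-df-hj-np-rtv-z])*$

(?=\w{7}$) looks ahead from the start and requires exactly 7 word characters up to the end (a trailing lookbehind such as (?<=\w{7}) would only guarantee at least 7)
(?:[b-df-hj-np-rtv-z])* is a non-capturing group: zero or more consonants, excluding s
[aeiou] is exactly one vowel
(?:[b-df-hj-np-rtv-z])* is a non-capturing group: zero or more consonants, excluding s

Together, the two consonant groups around the single vowel give you the "exactly one vowel" rule, and the lookahead gives you "exactly 7 letters".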
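As a rough illustration of how a pattern of this shape could be applied from Python, here is a minimal sketch using the standard `re` module. It assumes lowercase ASCII words, treats y as a consonant, and pins the length with a leading lookahead `(?=[a-z]{7}$)`; `CONS`, `PATTERN`, and `regex_match` are hypothetical names introduced for the example.

```python
import re

# Assumption: y counts as a consonant and s is deliberately left out of the class
CONS = "[b-df-hj-np-rtv-z]"

# Lookahead fixes the total length at 7, then: consonants, one vowel, consonants
PATTERN = re.compile(rf"^(?=[a-z]{{7}}$){CONS}*[aeiou]{CONS}*$")

def regex_match(word):
    """True if word has 7 letters, no s, and exactly one of a, e, i, o, u."""
    return PATTERN.match(word.lower()) is not None

for candidate in ["twelfth", "nymphal", "plaster", "twelfthly"]:
    print(candidate, regex_match(candidate))
```

Here "twelfthly" is a made-up nine-letter string, included only to show that the length check rejects longer input.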

Of course, I agree that three simple tests would be easier to maintain.
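The three simple tests could look something like the sketch below, which uses no regular expressions. The function names are hypothetical, and the vowel set a, e, i, o, u is an assumption.

```python
def has_seven_letters(word):
    return len(word) == 7

def has_no_s(word):
    return "s" not in word

def has_exactly_one_vowel(word):
    # assumption: vowels are a, e, i, o, u; each occurrence is counted
    return sum(word.count(v) for v in "aeiou") == 1

def is_candidate(word):
    """Apply all three tests to a word after trimming whitespace and lowercasing."""
    word = word.strip().lower()
    return has_seven_letters(word) and has_no_s(word) and has_exactly_one_vowel(word)
```

Keeping each rule in its own function means a change to one requirement (say, a different length) touches only one line.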

score 1 · Accepted Answer

Assuming your dictionary.txt contains only words separated by spaces and newlines, this can be done as follows:

# Open the file and build a list of individual words
with open("dictionary.txt", "r") as infile:
    # split() with no argument splits on any whitespace, including newlines
    x = infile.read().split()

# Count the vowels in a word
def vowels(word):
    count = 0
    for i in word.lower():
        if i in 'aeiou':
            count += 1
    return count

# Keep words of length 7 that contain no "s" and exactly one vowel
for i in x:
    if len(i) == 7 and "s" not in i.lower() and vowels(i) == 1:
        print(i)

score 1 · Accepted Answer

Using a regular expression is probably the simplest and easiest way to do this.

import re

with open("dictionary.txt", "r") as file:  # "r" opens the file read-only
    words = []
    for line in file:  # loop through every line to get all words
        words.append(line.strip())  # drop the trailing newline so len() is accurate

for word in words:
    # exactly one vowel, length 7, and no "s"
    if len(re.findall('[aeiou]', word)) == 1 and len(word) == 7 and "s" not in word:
        print(word)

Edit: since you have edited the question to say you are not allowed to use regular expressions, you can do this instead.

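with open("dictionary.txt", "r") as file:
    words = []
    for line in file:  # loop through every line to get all words
        words.append(line.strip())  # drop the trailing newline so len() is accurate

for word in words:
    # sum() counts the letters that are vowels; exactly one is required
    if sum(letter in "aeiou" for letter in word) == 1 and "s" not in word and len(word) == 7:
        print(word)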

score 0 · Accepted Answer

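I'm not at my desk, so I can't give you a coded answer, but my first instinct is to use regular expressions to pick out the words you want. The "re" library is where you'll want to start.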

https://pymotw.com/2/re/

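They take a little getting used to, but they are invaluable for sifting through strings.

If you're completely new to them, there are plenty of interactive tutorials like this one ( https://regexone.com/ ) to help you get started.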

score 0 · Accepted Answer

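Assuming you read the whole dictionary file into an array and then iterate over that array (using "word" as the loop variable), put this before the loop: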

import re

# makes sure there is no 's' in the word and its length is exactly 7 characters
no_s_re = re.compile(r'^[a-rt-z]{7}$', re.IGNORECASE)

# used to count the vowels (later)
vowels_re = re.compile(r'[aioue]', re.IGNORECASE)

This is the loop body:

if no_s_re.match(word) and len(vowels_re.findall(word)) == 1:
    print(word)
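Putting the two pieces together, a complete version might look like the sketch below. The helpers `load_words` and `select` are hypothetical names, and reading one word per line is an assumption taken from the question, not something this answer specifies.

```python
import re

# No 's' anywhere and exactly 7 letters
NO_S_RE = re.compile(r'^[a-rt-z]{7}$', re.IGNORECASE)

# Used to count vowel occurrences
VOWELS_RE = re.compile(r'[aeiou]', re.IGNORECASE)

def load_words(path):
    """Read one word per line, skipping blank lines (assumed file layout)."""
    with open(path) as infile:
        return [line.strip() for line in infile if line.strip()]

def select(words):
    """Return the words that pass both regex checks."""
    return [
        word for word in words
        if NO_S_RE.match(word) and len(VOWELS_RE.findall(word)) == 1
    ]
```

Under these assumptions, `select(load_words("dictionary.txt"))` would return the matching words, which can then be printed one per line.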
