python - Defining a word as 2 or more letters in Python 2.6

Question

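I'm writing a Python script for a class assignment. It counts the 10 most frequent words in a text document and displays each word with its frequency. That part of the script works fine, but the assignment says a word is defined as 2 or more letters. For some reason I can't get the script to apply that definition, and when I run it, nothing happens.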
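To make the stated definition concrete, here is a minimal sketch. It assumes a word is a whitespace-separated token that still has at least two characters after surrounding punctuation is stripped and the token is lowercased, which is the same normalization the script below uses. The helper name `words_of_two_or_more` is hypothetical.

```python
from string import punctuation


def words_of_two_or_more(text):
    """Return normalized tokens that are at least 2 characters long.

    Assumption: a "word" is a whitespace-separated token with surrounding
    punctuation stripped and lowercased, as in the question's script.
    """
    result = []
    for token in text.split():
        word = token.strip(punctuation).lower()
        # Single letters ("a", "I") and punctuation-only tokens ("--" -> "")
        # fail this check and are skipped.
        if len(word) >= 2:
            result.append(word)
    return result


print(words_of_two_or_more("I saw a cat -- and a Dog."))
# ['saw', 'cat', 'and', 'dog']
```

The length check also drops the empty strings left over when a token consists only of punctuation.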

# Most Frequent Words:
from string import punctuation
from collections import defaultdict

def sort_words(x, y):
    return cmp(x[1], y[1]) or cmp(y[0], x[0])

number = 10
words = {}

words_gen = (word.strip(punctuation).lower() for line in open("charactermask.txt")
                                             for word in line.split())
words = defaultdict(int)
for word in words_gen:
    words[word] +=1

letters = len(word)

while letters >= 2:
    top_words = sorted(words.iteritems(),
                        key=lambda(word, count): (-count, word))[:number] 

for word, frequency in top_words:
    print "%s: %d" % (word, frequency)

score 2 · Accepted Answer

One problem with your script is this loop:

while letters >= 2:
    top_words = sorted(words.iteritems(),
                        key=lambda(word, count): (-count, word))[:number]

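You aren't looping over the words here. `letters` is set only once, to the length of the last word read, and it never changes inside the loop, so this loop runs forever. You need to change the script so that this part actually iterates over all the words. (Also, you probably want to change `while` to `if`, since that check only needs to run once per word.)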
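A minimal sketch of the restructuring described above, assuming the length filter belongs inside the counting loop. There, `if` replaces `while` and runs once per word, and the sort runs once after all words are counted. The function name `top_words` and its `lines` parameter (any iterable of strings, such as an open file) are hypothetical. The sort key avoids the tuple-parameter `lambda(word, count)` form so the code also runs on Python 3.

```python
from collections import defaultdict
from string import punctuation


def top_words(lines, number=10):
    """Count words of 2+ characters and return the `number` most frequent.

    `lines` is any iterable of strings, e.g. an open file object.
    """
    counts = defaultdict(int)
    for line in lines:
        for token in line.split():
            word = token.strip(punctuation).lower()
            # `if`, not `while`: the check runs exactly once per word.
            if len(word) >= 2:
                counts[word] += 1
    # Sort once, after counting: highest count first, ties alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:number]


sample = ["The cat and the hat.", "A cat, a bat, the end."]
for word, frequency in top_words(sample, 3):
    print("%s: %d" % (word, frequency))
# the: 3
# cat: 2
# and: 1
```

For the assignment's file, the same function could be called as `top_words(f)` inside `with open("charactermask.txt") as f:`.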

score 1 · Accepted Answer

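I would refactor your code ~~and use a collections.Counter object~~ (`collections.Counter` was only added in Python 2.7, so it isn't available in 2.6):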

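import collections
import string

with open("charactermask.txt") as f:
    words = [x.strip(string.punctuation).lower() for x in f.read().split()]

# Count only words that are 2 or more characters long
counter = collections.defaultdict(int)
for word in words:
    if len(word) >= 2:
        counter[word] += 1

# Print the 10 most frequent words, ties broken alphabetically
top_words = sorted(counter.items(), key=lambda item: (-item[1], item[0]))[:10]
for word, frequency in top_words:
    print("%s: %d" % (word, frequency))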
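Note that `len(word) >= 2` counts characters, not letters, so tokens such as `42` or `x1` pass the filter. If the assignment means letters strictly, one alternative is to extract runs of two or more ASCII letters with a regular expression. This is a swapped-in technique, not what this answer uses. It is a minimal sketch, assuming ASCII-only input and assuming that digits, apostrophes and hyphens separate words (so `it's` yields only `it`). The name `letter_words` is hypothetical.

```python
import re
from collections import defaultdict

# Two or more consecutive ASCII letters; digits, apostrophes and hyphens
# act as separators under this (assumed) definition of a word.
WORD_RE = re.compile(r"[a-z]{2,}")


def letter_words(text):
    """Return lowercase runs of 2+ ASCII letters found in `text`."""
    return WORD_RE.findall(text.lower())


counter = defaultdict(int)
for word in letter_words("Route 66 is x1 of a kind; it's OK."):
    counter[word] += 1
print(sorted(counter.items()))
# [('is', 1), ('it', 1), ('kind', 1), ('of', 1), ('ok', 1), ('route', 1)]
```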
