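I've seen similar questions, but none of them really helped me. I need to read in a text file, split it into words, and count the length of each word. I'm also trying to print the results in a table, with the word length on the left and the actual words on the right. My code is a mess right now, because I've reached the point where I decided to ask for help.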

a = open('owlcreek.txt').read().split()
lengths = dict()
for word in a:
    length = len(word)

if length not in lengths:
    for length, counter in lengths.items():
        print "Words of length %d: %d" % (length, counter)

#words=[line for line in a]
#print ("\n" .join(counts))

Also, I think I need to write a small parser to deal with all the "!-- characters. I tried using Counter, but I guess I don't know how to use it properly.

2 Answers

It should look something like this:

a = open('owlcreek.txt').read().split()
lengths = dict()
for word in a:
    length = len(word)
    # if the key is not present, add it
    # ("in" works on Python 2 and 3; dict.has_key() was removed in Python 3)
    if length not in lengths:
        # the value should be the list of words
        lengths[length] = []
    # append the word to the list for length key
    lengths[length].append(word)

# print them out as length, count(words of that length)
for length, wrds in lengths.items():
    print("Words of length %d: %d" % (length, len(wrds)))
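
The question also asks for a table with the length on the left and the actual words on the right. A minimal sketch of that, assuming the same length-to-words grouping; the helper names `group_by_length` and `format_table` and the sample sentence are made up for illustration and stand in for the contents of the file:

```python
from collections import defaultdict


def group_by_length(words):
    """Map each word length to the list of words having that length."""
    groups = defaultdict(list)  # missing keys start out as an empty list
    for word in words:
        groups[len(word)].append(word)
    return dict(groups)


def format_table(groups):
    """Return one line per length: length on the left, words on the right."""
    lines = []
    for length in sorted(groups):
        # pad the length to 6 columns so the words line up
        lines.append("%-6d %s" % (length, ", ".join(groups[length])))
    return "\n".join(lines)


# hypothetical sample text; in practice use open('owlcreek.txt').read().split()
sample = "a man stood upon a railroad bridge".split()
print(format_table(group_by_length(sample)))
```

Using `defaultdict(list)` removes the need for the explicit "is the key present" check.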

Hope this helps!

answered 2013-07-08T01:19:19.583

A simple regular expression is enough to strip out the punctuation and whitespace.
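
As a quick illustration, here is a minimal sketch of what such a regex does to a short string. The helper name `extract_words` and the sample sentence are assumptions made up for the example; note that `\w` also matches digits and underscores, so this is only an approximation of "words":

```python
import re


def extract_words(text):
    """Return the words in text, dropping punctuation and whitespace."""
    # \w+'\w+ keeps contractions such as "don't" in one piece;
    # the plain \w+ alternative matches every other run of word characters
    return re.findall(r"\w+'\w+|\w+", text)


print(extract_words('"Stop!" -- he cried; don\'t move.'))
# ['Stop', 'he', 'cried', "don't", 'move']
```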

Edit: If I understand your question correctly, you want all the unique words in the text file, sorted by length. In that case:

import re
import itertools

with open('README.txt', 'r') as f:
    # match words, keeping contractions like "don't" whole; the set discards duplicates
    unique_words = set(re.findall(r"\w+'\w+|\w+", f.read()))

# groupby only merges adjacent items, so sort by length first
sorted_words = sorted(unique_words, key=len)

for length, group in itertools.groupby(sorted_words, len):
    group = list(group)
    print("Words of length {0}: {1}".format(length, len(group)))
    for word in group:
        print(word)
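
Since the question mentions Counter: if only the number of words per length is needed, `collections.Counter` can do the tallying directly. A minimal sketch, where the helper name `count_lengths` and the sample word list are assumptions for illustration rather than part of the original code:

```python
from collections import Counter


def count_lengths(words):
    """Count how many words there are of each length."""
    # Counter tallies each value produced by the generator expression
    return Counter(len(word) for word in words)


# hypothetical sample; in practice pass the list of words read from the file
counts = count_lengths(["a", "man", "stood", "upon", "a", "bridge"])
for length, counter in sorted(counts.items()):
    print("Words of length %d: %d" % (length, counter))
```

Unlike the code above, this counts duplicates too; pass a set of words instead if each unique word should be counted once.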
answered 2013-07-08T01:35:45.607