7

I have a file named test.txt. I want to read it and return a list of all the words in the file (with the newlines removed).

This is my current code:

def read_words(words_file):
    open_file = open(words_file, 'r')
    words_list =[]
    contents = open_file.readlines()
    for i in range(len(contents)):
         words_list.append(contents[i].strip('\n'))
    return words_list    
    open_file.close()  

Running this code produces this list:

['hello there how is everything ', 'thank you all', 'again', 'thanks a lot']

I would like the list to look like this:

['hello','there','how','is','everything','thank','you','all','again','thanks','a','lot']
4

5 Answers

20

Depending on the size of the file, this could be as simple as:

with open(file) as f:
    words = f.read().split()
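
As a self-contained sketch of this approach, the snippet below writes the four sample lines from the question to a temporary file and reads them back. The function name `read_all_words` and the temporary file are illustrative assumptions, not part of the answer itself.

```python
import os
import tempfile

def read_all_words(path):
    """Return every whitespace-separated word in the file at `path`."""
    with open(path) as f:
        # read() loads the whole file into memory; split() with no arguments
        # splits on any run of whitespace, including newlines.
        return f.read().split()

# Sample data taken from the question; the temporary file is only for illustration.
sample = "hello there how is everything \nthank you all\nagain\nthanks a lot\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

words = read_all_words(tmp.name)
os.remove(tmp.name)
print(words)
```

Because `read()` pulls the entire file into memory at once, this is best suited to files that comfortably fit in RAM, which is presumably what "depending on the size of the file" refers to.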
answered 2012-11-06T21:21:12.543
14

Replace the words_list.append(...) line in your for loop with the following:

words_list.extend(contents[i].split())

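This splits each line on whitespace characters and then adds each element of the resulting list to words_list.

Alternatively, you can rewrite the whole function as a list comprehension: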
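
To illustrate why `extend` is used here rather than `append`, here is a small sketch using the first line of the question's file. The variable names `appended` and `extended` are made up for the example.

```python
line = "hello there how is everything \n"

appended = []
appended.append(line.split())   # adds the whole list as a single element

extended = []
extended.extend(line.split())   # adds each word as its own element

print(appended)
print(extended)
```

With `append`, the result is a list containing one nested list. With `extend`, it is a flat list of words, which is the shape the question asks for.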

def read_words(words_file):
    return [word for line in open(words_file, 'r') for word in line.split()]
answered 2012-11-06T21:00:26.557
5

Here is how I would write it:

def read_words(words_file):
  with open(words_file, 'r') as f:
    ret = []
    for line in f:
      ret += line.split()
    return ret

print(read_words('test.txt'))

The function can be shortened somewhat using itertools, but I personally find the result less readable:

import itertools

def read_words(words_file):
  with open(words_file, 'r') as f:
    return list(itertools.chain.from_iterable(line.split() for line in f))

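print(read_words('test.txt'))

The nice thing about the second version is that it can be made entirely generator-based, which avoids keeping all of the file's words in memory at once.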
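
A minimal sketch of what a fully generator-based variant might look like is shown below. The helper name `iter_words` is hypothetical, and a list of strings stands in for an open file, since both yield one line at a time.

```python
import itertools

def iter_words(lines):
    """Lazily yield words from any iterable of lines (e.g. an open file)."""
    return itertools.chain.from_iterable(line.split() for line in lines)

# A list of strings stands in for an open file here; both iterate line by line.
sample_lines = ["hello there how is everything \n", "thank you all\n"]

# islice stops after three words, so the second line is never split.
first_three = list(itertools.islice(iter_words(sample_lines), 3))
print(first_three)
```

When a real file object is passed in, the caller has to keep the file open for as long as words are being consumed, because nothing is read until the iterator is advanced.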

answered 2012-11-06T21:06:47.757
3

There are several ways to do this. Here are a few:

If you don't care about repeated words

import itertools

def getWords(filepath):
    with open(filepath) as f:
        return list(itertools.chain.from_iterable(line.split() for line in f))

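If you want to return the words with each word appearing only once

Note: this does not preserve the order of the words

def getWords(filepath):
    with open(filepath) as f:
        # Set comprehension (Python 2.7+); the loop over lines must come first
        return {word for line in f for word in line.split()}
        # On Python 2.6, use: set(word for line in f for word in line.split())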

If you want a set, and also want to preserve the order of the words

import itertools

def getWords(filepath):
    with open(filepath) as f:
        words = []
        pos = {}
        position = itertools.count()
        for line in f:
            for word in line.split():
                if word not in pos:
                    pos[word] = next(position)  # record the first-seen position
                    words.append(word)
    return sorted(words, key=pos.__getitem__)
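
As an aside, on Python 3.7 and later the same order-preserving de-duplication can be sketched with `dict.fromkeys`, since dictionaries keep insertion order there. This is a different technique from the one in the answer, and the function name `unique_words_in_order` is made up for the example.

```python
def unique_words_in_order(lines):
    """Return each word once, in order of first appearance."""
    # dict keys are unique and, from Python 3.7 on, keep insertion order.
    return list(dict.fromkeys(word for line in lines for word in line.split()))

# A list of strings stands in for an open file in this illustration.
sample_lines = ["thank you all\n", "again\n", "thanks a lot you all\n"]
unique = unique_words_in_order(sample_lines)
print(unique)
```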

If you want a word-frequency dictionary

import collections
import itertools

def getWords(filepath):
    with open(filepath) as f:
        return collections.Counter(itertools.chain.from_iterable(line.split() for line in f))

Hope these help

answered 2012-11-06T21:34:08.350
0

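The actual question has already been answered, but I would like to point out that the open_file.close() line in the question is never executed, because the function returns before reaching it. Try putting open_file.close() before the return statement.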
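
Another way to make sure the file gets closed, sketched here as a variation on the question's function, is to use a `with` block so that no explicit close() call is needed. The temporary file at the bottom is only there to make the example runnable.

```python
import os
import tempfile

def read_words(words_file):
    words_list = []
    # The with block closes the file as soon as it exits, including on an
    # early return or an exception, so no explicit close() call is needed.
    with open(words_file, 'r') as open_file:
        for line in open_file:
            words_list.extend(line.split())
    return words_list

# Illustration only: a temporary file holding two of the question's lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("thank you all\nagain\n")

result = read_words(tmp.name)
os.remove(tmp.name)
print(result)
```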

answered 2019-10-28T23:25:07.433