python - Return a list of words after reading a file in Python

Question

I have a text file named test.txt. I want to read it and return a list of all the words in the file, with newlines removed.

This is my current code:

def read_words(words_file):
    open_file = open(words_file, 'r')
    words_list =[]
    contents = open_file.readlines()
    for i in range(len(contents)):
         words_list.append(contents[i].strip('\n'))
    return words_list    
    open_file.close()

Running this code produces this list:

['hello there how is everything ', 'thank you all', 'again', 'thanks a lot']

I would like the list to look like this:

['hello','there','how','is','everything','thank','you','all','again','thanks','a','lot']

score 20 · Accepted Answer

Depending on the size of the file, this could be as simple as:

with open(file) as f:
    words = f.read().split()
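
As a quick check of why a single split() call is enough here: with no arguments, str.split() splits on any run of whitespace, newlines included, and ignores leading and trailing whitespace. A minimal sketch, using io.StringIO as a stand-in for the real file (the sample text is an assumption about the contents of test.txt, reconstructed from the output shown in the question):

```python
import io

# Stand-in for open('test.txt'); contents are assumed from the question's output.
fake_file = io.StringIO(u"hello there how is everything \nthank you all\nagain\nthanks a lot\n")

# read() returns the whole text; split() with no arguments splits on any whitespace.
words = fake_file.read().split()
print(words)
```

Because read() loads the whole file into memory at once, this approach suits files that comfortably fit in RAM.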

answered 2012-11-06T21:21:12.543

score 14 · Accepted Answer

Replace the words_list.append(...) line in the for loop with the following:

words_list.extend(contents[i].split())

This splits each line on whitespace and then adds each element of the resulting list to words_list.
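
The difference between append and extend is what matters here. A small sketch on a single made-up line (the variable names appended and extended are only for illustration):

```python
line = "thank you all\n"

appended = []
appended.append(line.split())   # adds the whole list as ONE element

extended = []
extended.extend(line.split())   # adds each word as its own element

print(appended)  # [['thank', 'you', 'all']]
print(extended)  # ['thank', 'you', 'all']
```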

Alternatively, rewrite the whole function as a list comprehension:

def read_words(words_file):
    return [word for line in open(words_file, 'r') for word in line.split()]

score 5 · Accepted Answer

Here is how I would write it:

def read_words(words_file):
  with open(words_file, 'r') as f:
    ret = []
    for line in f:
      ret += line.split()
    return ret

print(read_words('test.txt'))

The function can be shortened somewhat by using itertools, but I personally find the result less readable:

import itertools

def read_words(words_file):
  with open(words_file, 'r') as f:
    return list(itertools.chain.from_iterable(line.split() for line in f))

print(read_words('test.txt'))

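The nice thing about the second version is that it can be made entirely generator-based, which avoids holding all of the file's words in memory at once.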
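
One way such a generator-based variant might look is sketched below. The name iter_words is hypothetical, and a plain nested loop with yield is used instead of itertools for readability:

```python
def iter_words(words_file):
    """Yield the file's words one at a time (hypothetical helper)."""
    with open(words_file, 'r') as f:
        for line in f:                 # the file is read lazily, line by line
            for word in line.split():
                yield word             # only the current line's words are held in memory
```

A caller can loop over iter_words('test.txt') directly, or wrap it in list(...) when the full list is needed after all. The file stays open until the generator is exhausted or closed.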

score 3 · Accepted Answer

There are several ways to do this. Here are a few:

If you are not concerned about repeated words:

import itertools

def getWords(filepath):
    with open(filepath) as f:
        return list(itertools.chain.from_iterable(line.split() for line in f))

If you want to return the words with each word appearing only once (as a set):

Note: this does not preserve the order of the words.

def getWords(filepath):
    with open(filepath) as f:
        return {word for line in f for word in line.split()}  # Python 2.7+
        # Python 2.6 equivalent:
        # return set(word for line in f for word in line.split())

If you want unique words -and- want to preserve their order:

import itertools

def getWords(filepath):
    with open(filepath) as f:
        words = []
        pos = {}  # word -> index of its first appearance
        position = itertools.count()
        for line in f:
            for word in line.split():
                if word not in pos:
                    pos[word] = next(position)
                    words.append(word)
    return sorted(words, key=pos.__getitem__)
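
On Python 3.7 and later, where plain dicts are guaranteed to keep insertion order, the same result can probably be had more compactly. This is an alternative sketch, not the approach written above, and unique_words is a hypothetical name:

```python
def unique_words(filepath):
    """Return each word once, in order of first appearance (assumes Python 3.7+)."""
    with open(filepath) as f:
        # dict.fromkeys keeps one key per distinct word, in insertion order.
        return list(dict.fromkeys(word for line in f for word in line.split()))
```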

If you want a word-frequency dictionary:

import collections
import itertools

def getWords(filepath):
    with open(filepath) as f:
        return collections.Counter(itertools.chain.from_iterable(line.split() for line in f))
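
For illustration, this is roughly how the resulting Counter can be queried. The word list below is made up for the example rather than read from test.txt:

```python
import collections

# Made-up sample words, standing in for the words read from a file.
words = ['thank', 'you', 'all', 'thanks', 'a', 'lot', 'thank', 'you', 'thank']
counts = collections.Counter(words)

print(counts['thank'])        # 3
print(counts['missing'])      # 0: absent words count as zero, no KeyError
print(counts.most_common(2))  # [('thank', 3), ('you', 2)]
```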

Hope this helps

score 0 · Accepted Answer

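The actual question has already been answered, but I would like to point out that the open_file.close() line will never be executed, because the function returns before reaching it. Try writing open_file.close() before the return statement.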
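
A sketch of the question's function with that problem addressed. It uses a with block rather than moving close() by hand, since the block closes the file on any exit path; that swap, and the use of extend(line.split()) from the other answers, are choices made for this example:

```python
def read_words(words_file):
    words_list = []
    # The with block closes the file when it exits, even if an exception is raised.
    with open(words_file, 'r') as open_file:
        for line in open_file:
            words_list.extend(line.split())
    return words_list  # the file is already closed at this point
```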
