python - How to iterate over a space-delimited ASCII file in Python

Question

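Odd question here.

I have a .txt file that I want to iterate over. I can put all the words from the file into an array, which works fine, but what I want to know is how to iterate over the whole file word by word rather than letter by letter.

I want to be able to go through the array containing all the text from the file and count every occurrence of a given word.

The only problem is that I don't know how to write the code for it.

I tried a for loop, but it only iterates over each letter when I want whole words.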

score 12 · Accepted Answer

This code reads the whitespace-delimited file.txt:

# Read the whole file and split it on any run of whitespace.
with open("file.txt", "r") as f:
    words = f.read().split()
for w in words:
    print(w)

score 3 · Accepted Answer


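with open("test") as file:
    for line in file:
        # split(" ") splits on single spaces only, so the last word of each line keeps its newline.
        for word in line.split(" "):
            print(word)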

answered 2012-05-04T05:25:13.303

score 1 · Accepted Answer

Untested:

def produce_words(file_):
    # Lazily yield one whitespace-separated word at a time.
    for line in file_:
        for word in line.split():
            yield word

def main():
    with open('in.txt', 'r') as file_:
        for word in produce_words(file_):
            print(word)

if __name__ == '__main__':
    main()

score 1 · Accepted Answer

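If you want to loop over the whole file, the sensible thing to do is iterate over it, taking the lines and splitting them into words. Working line by line is best because it means we don't read the whole file into memory first (for large files, this could take a lot of time or cause us to run out of memory):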

with open('in.txt') as input:
    for line in input:
        for word in line.split():
            ...

Note that you can use line.split(" ") if you want to preserve more of the whitespace, since line.split() removes all extra whitespace.
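To make the difference concrete, here is a small sketch using a made-up sample line (the string is an assumption chosen for illustration, not something from the question):

```python
# Hypothetical sample line: two spaces between "a" and "b", a tab before "c".
line = "a  b\tc\n"

# No argument: split on any run of whitespace and drop empty strings.
default_split = line.split()

# Explicit " ": split on each single space only; tabs and newlines are kept.
space_split = line.split(" ")

print(default_split)
print(space_split)
```

With split(" ") the doubled space produces an empty string and the last item keeps its tab and newline, so strip or filter those if you don't want them.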

Also note that I use the with statement to open the file, as it is more readable and handles closing the file, even if an exception occurs.
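If you want to convince yourself of the closing behaviour, a rough check along these lines should do it. The temporary file and the simulated ValueError are assumptions made purely for the demonstration:

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on in.txt existing.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello world\n")
    path = tmp.name

try:
    with open(path) as handle:
        # Simulate a failure while the file is still open.
        raise ValueError("simulated failure")
except ValueError:
    pass

# The with block has already closed the file, despite the exception.
closed_after_error = handle.closed
print(closed_after_error)

os.remove(path)
```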

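While this is a good solution, it is also a little inefficient if you aren't doing anything in the first loop. To reduce it to a single loop, we can use itertools.chain.from_iterable and a generator expression: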

import itertools
with open('in.txt') as input:
    for word in itertools.chain.from_iterable(line.split() for line in input):
        ...
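Since the question ultimately asks about counting how often words occur, one way to finish the job is to feed the same chained iterator into collections.Counter. This is only a sketch: count_words is a hypothetical helper name, and the sample text stands in for the real file.

```python
import io
from collections import Counter
from itertools import chain


def count_words(lines):
    """Count whitespace-separated words across an iterable of lines."""
    # Flatten the per-line word lists into one stream and tally it.
    return Counter(chain.from_iterable(line.split() for line in lines))


if __name__ == "__main__":
    # Stand-in for open('in.txt'); any iterable of lines works.
    sample = io.StringIO("the cat sat\non the mat\n")
    counts = count_words(sample)
    for word, n in counts.most_common():
        print(word, n)
```

For a real file you would pass the open file object instead, e.g. count_words(f) inside a with open('in.txt') as f: block. Note that this treats "The" and "the" as different words; lowercase each line first if that matters.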
