python - Iterating over a file object in Python doesn't work, but readlines() does, though inefficiently

Question

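In the code below, if I use:

for line in fin:

the inner loop only runs for 'a'.

But if I use:

wordlist = fin.readlines()
for line in wordlist:

then it runs for every letter from a to z.

But readlines() reads the whole file into memory at once, which I don't want.

How can I avoid this?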

def avoids():
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    num_words = {}

    fin = open('words.txt')

    for char in alphabet:
      num_words[char] = 0
      for line in fin:
        not_found = True
        word = line.strip()
        if word.lower().find(char.lower()) != -1:
          num_words[char] += 1
    fin.close()
    return num_words

score 8 · Accepted Answer

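The for line in fin syntax can only be used once. After that loop finishes, you have exhausted the file and cannot read from it again unless you "reset the file pointer" with fin.seek(0). By contrast, fin.readlines() gives you a list, which you can iterate over as many times as you like.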
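A minimal sketch of the exhaustion behaviour described above. It uses an in-memory io.StringIO with made-up contents as a stand-in for a real file (an assumption for illustration only); a file opened with open() in text mode should behave the same way here.

```python
import io

# io.StringIO stands in for a file opened with open(); the contents are made up.
fin = io.StringIO("apple\nbanana\ncherry\n")

first_pass = [line.strip() for line in fin]   # consumes the whole stream
second_pass = [line.strip() for line in fin]  # nothing left to read

fin.seek(0)  # move the file pointer back to the start
third_pass = [line.strip() for line in fin]

print(first_pass)   # ['apple', 'banana', 'cherry']
print(second_pass)  # []
print(third_pass)   # ['apple', 'banana', 'cherry']
```

This is why the loop in the question only counts anything for 'a': every later letter iterates over an already exhausted file.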

I think a simple refactor using Counter (python2.7+) would save you this headache:

from collections import Counter
with open('file') as fin:
    result = Counter()
    for line in fin:
        result += Counter(set(line.strip().lower()))

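This counts, for each character, how many words in the file contain it, assuming one word per line. (I believe that is what your original code does; correct me if I'm wrong.)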
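To make that counting rule concrete, here is a small worked example on made-up data. The three sample words and the io.StringIO stand-in for the file are assumptions for illustration. It also uses Counter.update, which adds counts in place, as an alternative to += with a temporary Counter.

```python
import io
from collections import Counter

# Hypothetical sample data: one word per line, standing in for the real file.
fin = io.StringIO("Apple\nbanana\ncab\n")

result = Counter()
for line in fin:
    # set() drops repeated letters, so each word counts at most once per letter.
    result.update(set(line.strip().lower()))

print(result['a'])  # 3: apple, banana, cab
print(result['b'])  # 2: banana, cab
print(result['p'])  # 1: apple
print(result['z'])  # 0: a Counter returns 0 for keys it has not seen
```

Note that letters that never occur are simply absent from the Counter, unlike the question's dict, which starts every letter at 0.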

You can also do this easily with defaultdict (python2.5+):

from collections import defaultdict
with open('file') as fin:
    result = defaultdict(int)
    for line in fin:
        chars = set(line.strip().lower())
        for c in chars:
            result[c] += 1

Finally, to kick it old-school (I don't even know when setdefault was introduced...):

fin = open('file')
result = dict()
for line in fin:
    chars = set(line.strip().lower())
    for c in chars:
        result[c] = result.setdefault(c,0) + 1

fin.close()

score 5 · Accepted Answer

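You have three options:

Read the whole file into memory anyway.
Seek back to the beginning of the file before trying to iterate over it again.
Restructure your code so that it doesn't need to iterate over the file more than once.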
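As a rough sketch of the second option applied to the function from the question: the path parameter and the throwaway temporary file with made-up words are assumptions added so the example runs on its own.

```python
import os
import tempfile

def avoids(path):
    """Count, for each letter, how many lines of the file at `path` contain it."""
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    num_words = {}
    with open(path) as fin:
        for char in alphabet:
            fin.seek(0)  # rewind before each pass over the file
            num_words[char] = 0
            for line in fin:
                if char in line.strip().lower():
                    num_words[char] += 1
    return num_words

# Throwaway file with made-up contents, for demonstration only.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("apple\nbanana\ncab\n")

counts = avoids(tmp.name)
os.remove(tmp.name)
print(counts['a'], counts['b'], counts['z'])  # 3 2 0
```

This keeps memory use low but reads the file 26 times, once per letter, so the third option (a single pass) is usually the better fix for large files.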

score 0 · Accepted Answer

Try:

from collections import defaultdict
from itertools import product

def avoids():
    alphabet = 'abcdefghijklmnopqrstuvwxyz'

    num_words = defaultdict(int)

    with open('words.txt') as fin:
        words = [x.strip() for x in fin.readlines() if x.strip()]

    for ch, word in product(alphabet, words):
        if ch not in word:
            continue
        num_words[ch] += 1

    return num_words
