python - Is there any way to speed up reading a text file in Python?

Question

I am reading a 587 KB file that contains words from a to z, for example: aa bb cc ... and so on. No matter what code I write, reading the file takes 38 seconds!

with open('dictionary.txt', encoding='utf-8') as dictionary:
    dictionary.read().splitlines()

My question is: how can I read the file in 4 seconds at most? It also has to return all the words in a list.

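Problem solved
"I figured it out! Instead of picking just one random word, as my problem asks, I was printing all the words. Silly me. Now when I do that, it gives me a word in a fraction of a second: pastie.org/8149529"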

score 2 · Accepted Answer

This should need less memory, because it iterates over the lines:

words = []
with open('dictionary.txt', encoding='utf-8') as dictionary:
    for line in dictionary:
        words.extend(line.split())

score 0 · Accepted Answer

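read() loads the entire file into a single string, and splitting that string into lines then makes a second copy of the data.

Streaming the data line by line will help: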

with open( 'dictionary.txt', .... ) as dictionary:
    for line in dictionary:
         <do something with the line>

Is the file structured as one word per line? If not, each line may need further splitting.
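A minimal sketch of the streaming approach, assuming the words are separated by whitespace (spaces or newlines). The names iter_words and load_words are hypothetical helpers, not part of the question. Calling split() with no arguments covers both layouts, one word per line or several words per line:

```python
def iter_words(path, encoding="utf-8"):
    """Yield words one at a time without loading the whole file."""
    with open(path, encoding=encoding) as f:
        for line in f:
            # split() with no arguments splits on any run of whitespace
            # and drops the trailing newline, so blank lines yield nothing.
            yield from line.split()


def load_words(path, encoding="utf-8"):
    """Collect every word into a list, as the question requires."""
    return list(iter_words(path, encoding))
```

For the file in the question this would be called as load_words('dictionary.txt'). The generator only pays off if the words can be processed one by one; building the full list still holds every word in memory.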

score 0 · Accepted Answer

The best way to get all the words in the file:

with open('dictionary.txt', encoding='utf-8') as dictionary:
    words = dictionary.read().split()

score 0 · Accepted Answer

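You said, "It gets the words immediately, but printing the list of words takes a long time."

So the problem your question asks about does not exist. Keep using the code you posted, and be aware that printing to the console takes time, especially if you print line by line instead of keeping (or creating) the newlines and printing everything at once.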
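One way to check this is to time the read and the print separately. The sketch below is only an illustration: time_read_and_print is a hypothetical helper, and the actual numbers depend on the console in use. It writes the whole list as one joined string rather than one print per word:

```python
import sys
import time


def time_read_and_print(path, out=sys.stdout, encoding="utf-8"):
    """Return (words, read_seconds, print_seconds) for the file at `path`."""
    start = time.perf_counter()
    with open(path, encoding=encoding) as f:
        words = f.read().split()
    read_seconds = time.perf_counter() - start

    start = time.perf_counter()
    # One write of a single joined string instead of one print per word.
    out.write("\n".join(words) + "\n")
    out.flush()
    print_seconds = time.perf_counter() - start

    return words, read_seconds, print_seconds
```

If read_seconds is small and print_seconds is large, the file read is not the bottleneck.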

score 0 · Accepted Answer

with open('dictionary.txt', encoding='utf-8') as dictionary:
    list(dictionary)

Maybe? If it really takes that long, I'm curious what your machine's specs are.

Can you post the result of this?

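import time

s = time.time()
with open('dictionary.txt', encoding='utf-8') as dictionary:
    x = list(dictionary)  # one entry per line, trailing newlines included
# Elapsed seconds for the read alone; print() is a function in Python 3
print(time.time() - s)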

score 0 · Accepted Answer

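I ran your snippet on a 4 MB text file, and it took about half a second on my laptop running OS X. It did print the entire file (surprisingly fast), whereas on Windows I would expect that to be very slow. Try saving the result in a variable so that it is not printed: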

with open('dictionary.txt', encoding = 'utf-8') as dictionary:
    lines = dictionary.read().splitlines()
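Once the list is stored in a variable, it can be inspected without echoing all of it. A small sketch, where summarize is a hypothetical helper and the sample size of 5 is an arbitrary choice:

```python
import random


def summarize(lines, sample_size=5, seed=None):
    """Return the word count and a small random sample instead of the full list."""
    rng = random.Random(seed)
    # Never ask for more items than the list holds.
    k = min(sample_size, len(lines))
    return len(lines), rng.sample(lines, k)
```

Calling summarize(lines) returns the total count and a handful of entries, which is enough to confirm the read worked without printing the whole dictionary.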
