python - Efficiently reading a specific line from a file

Question

I have come across several different ways to read a file in Python, and I was wondering which one is the fastest.

For example, to read the last line of a file, one could do

with open('mytext.txt', 'r') as input_file:
    lastLine = ""
    for line in input_file:
        lastLine = line  # keep overwriting until the loop ends on the final line

print(lastLine)  # This is the last line

or

with open('mytext.txt', 'r') as fileHandle:
    lineList = fileHandle.readlines()  # loads every line into memory at once
print(lineList[-1])  # This is the last line

I assume that for this particular case the difference in efficiency may not matter much...
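A common middle ground between the two versions above, offered as a sketch rather than a benchmark result: `collections.deque` with `maxlen=1` consumes the file line by line like the first loop, but keeps only the most recent line, so memory stays constant instead of growing with the file as `readlines()` does. The function name `last_line_deque` is hypothetical. Like both versions in the question, it still reads the whole file, so its running time grows with the file size.

```python
from collections import deque

def last_line_deque(path):
    """Return the last line of a text file, or '' if the file is empty.

    Iterates over the file like the for-loop version, but the deque with
    maxlen=1 discards every line except the most recent one.
    """
    with open(path, 'r') as f:
        tail = deque(f, maxlen=1)
    return tail[0] if tail else ''
```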

Questions:

1. Which method is faster for picking a random line?

2. Can we use a concept like "SEEK" in Python (and if so, would it be faster)?
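Python file objects do expose seeking directly: `f.seek(offset, whence)` moves the position and `f.tell()` reports it, where `whence` is `os.SEEK_SET` (0), `os.SEEK_CUR` (1) or `os.SEEK_END` (2). A minimal sketch, assuming the file is opened in binary mode (in Python 3, text-mode files reject nonzero offsets relative to the end); the helper `read_tail` is hypothetical and jumps straight to the last `nbytes` bytes without reading anything before them.

```python
import os

def read_tail(path, nbytes):
    """Return the last `nbytes` bytes of the file (fewer if the file is smaller)."""
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)          # jump to the end of the file
        size = f.tell()                 # position at the end == file size in bytes
        f.seek(max(0, size - nbytes))   # absolute seek from the start (SEEK_SET)
        return f.read()
```

Because the seek skips over everything before the tail instead of reading it, its cost does not depend on how much data precedes the part you want, which is what the answers below rely on.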

score 1 · Accepted Answer

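If you don't need a uniform distribution (i.e., it's acceptable that not every line has the same chance of being picked) and/or if your lines are all roughly the same length, then the problem of picking a random line can be reduced to:

1. Determine the size of the file in bytes.
2. Seek to a random position.
3. Search backwards for the last newline character, if there is one (there may be none if there is no preceding line).
4. Take all the text up to the next newline character or the end of the file, whichever comes first.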
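A minimal sketch of these four steps, assuming the file is opened in binary mode so byte offsets are well defined; `random_line` is a hypothetical name. Step 3 here scans backwards one byte at a time, which is simple but slow for long lines. Note the bias the answer warns about: a line's chance of being picked is proportional to its length in bytes.

```python
import os
import random

def random_line(path):
    """Pick a line by seeking to a random byte offset.

    NOT uniformly distributed: longer lines are more likely to be chosen.
    Returns bytes (b'' for an empty file).
    """
    with open(path, 'rb') as f:
        size = os.fstat(f.fileno()).st_size     # step 1: file size in bytes
        if size == 0:
            return b''
        pos = random.randrange(size)            # step 2: random byte position
        # Step 3: walk backwards until the byte before `pos` is a newline
        # (or we reach the start of the file).
        while pos > 0:
            f.seek(pos - 1)
            if f.read(1) == b'\n':
                break
            pos -= 1
        f.seek(pos)
        # Step 4: read up to the next newline or the end of the file.
        return f.readline()
```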

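For (2), you can make an educated guess as to how far back you have to search to find the previous newline. If you can tell that a line is n bytes long on average, then you can read the preceding n bytes in a single step.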
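A sketch of that idea, assuming a binary-mode file and a caller-supplied guess `avg_len` (the helper name `find_line_start` and the doubling fallback are illustrative additions, not part of the answer): read the `avg_len` bytes before the position in one call and look for the last newline; if the guess was too short, double the window and try again. Calling `f.seek(start)` followed by `f.readline()` on the returned offset then yields the whole line, in place of a byte-by-byte backward scan.

```python
def find_line_start(f, pos, avg_len):
    """Return the offset of the first byte of the line containing `pos`.

    `f` is a file object opened in binary mode; `avg_len` is a guessed
    average line length in bytes.
    """
    window = max(1, avg_len)
    while True:
        start = max(0, pos - window)
        f.seek(start)
        chunk = f.read(pos - start)    # the bytes just before `pos`, in one read
        nl = chunk.rfind(b'\n')
        if nl != -1:
            return start + nl + 1      # the line begins right after that newline
        if start == 0:
            return 0                   # no earlier newline: first line of the file
        window *= 2                    # guess was too small; look further back
```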

score 0 · Accepted Answer

I ran into this problem a few days ago and used this solution. It is similar to @Frerich Raabe's solution, but without the randomness, just logic :)

import os

def get_last_line(f):
    """ f is a file object opened in binary read mode ('rb'); I just extracted the algorithm from a bigger function """
    f.seek(0, os.SEEK_END)
    fsize = f.tell()  # file size in bytes
    tries = 0
    offs = -512

    while tries < 5:
        # Put the cursor |offs| bytes (512 * 2**tries) before the end.
        # If that exceeds the file size, clamp to -fsize, i.e. the beginning of the file.
        # (Python 3 only allows nonzero end-relative seeks in binary mode.)
        f.seek(max(-fsize, offs), os.SEEK_END)
        lines = f.readlines()
        if len(lines) > 1:    # With more than one line, the last one is guaranteed to be complete
            return lines[-1]  # Returns the last complete line
        offs *= 2
        tries += 1

    raise ValueError("No end line found after 5 tries (your file may have only 1 line, or the last line is longer than %d bytes)" % (-offs // 2))

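The tries counter keeps the function from looping forever if the file has only one line (or the last line is very long). The algorithm tries to get the last line from the last 512 bytes, then 1024, 2048, ... and stops if there is still no complete line by the 5th iteration.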
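One possible variation, offered as a sketch rather than a drop-in replacement: if giving up after 5 tries is not wanted, the loop can keep doubling until the window reaches the start of the file. Once it does, even a single line is known to be complete, which covers the one-line case the tries counter guards against. The name `get_last_line_unbounded` is hypothetical, and it assumes a binary-mode file object. The trade-off is that a file whose last line is huge will eventually be read almost entirely.

```python
import os

def get_last_line_unbounded(f):
    """Return the last line of binary-mode file object `f` as bytes (b'' if empty)."""
    f.seek(0, os.SEEK_END)
    fsize = f.tell()                   # file size in bytes
    offs = 512
    while True:
        start = max(0, fsize - offs)   # clamp the window to the start of the file
        f.seek(start)
        lines = f.readlines()
        # More than one line means lines[-1] began after a newline and is complete;
        # reading from offset 0 means even a single line is complete.
        if len(lines) > 1 or start == 0:
            return lines[-1] if lines else b''
        offs *= 2                      # window too small: double it and retry
```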
