python - Downloading files into memory

Question

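I am writing a Python script, and I only need the second line of a series of very small text files. I would like to extract it without saving each file to my hard drive, which is what I currently do.

I found some threads that reference the TempFile and StringIO modules, but I could not make sense of them.

Currently I download all of the files and name them sequentially, like 1.txt, 2.txt, and so on, then loop through all of them and extract the second line. I would like to open a file, grab the line, and then move on to find, open, and read the next file.

Here is what I currently do, with the files written to my hard drive: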

while (count4 <= num_files):
    file_p = [directory,str(count4),'.txt']
    file_path = ''.join(file_p)        
    cand_summary = string.strip(linecache.getline(file_path, 2))
    linkFile = open('Summary.txt', 'a')
    linkFile.write(cand_summary)
    linkFile.write("\n")
    count4 = count4 + 1
    linkFile.close()

score 0 · Accepted Answer

You are opening and closing the output file on every iteration.

Why not simply do

with open("Summary.txt", "w") as linkFile:
    while (count4 <= num_files):
        file_p = [directory,str(count4),'.txt']
        file_path = ''.join(file_p)        
        cand_summary = linecache.getline(file_path, 2).strip() # string module is deprecated
        linkFile.write(cand_summary)
        linkFile.write("\n")
        count4 = count4 + 1

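Also, linecache is probably not the right tool here, since it is optimized for reading multiple lines from the same file, not the same line from multiple files.

Instead, it is better to do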

with open(file_path, "r") as infile:
    dummy = infile.readline()
    cand_summary = infile.readline().strip()
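
One caveat with calling readline() twice: it returns an empty string at end of file, so a file with fewer than two lines silently yields ''. Here is a minimal sketch of a helper that makes that case explicit. The name second_line and the default parameter are illustrative assumptions, not part of the original answer:

```python
from itertools import islice


def second_line(path, default=None):
    """Return the second line of the file at `path`, stripped of
    surrounding whitespace, or `default` if the file has fewer than
    two lines. (Hypothetical helper, for illustration only.)"""
    with open(path, "r") as infile:
        # islice(infile, 1, 2) skips line 1 and yields at most line 2.
        line = next(islice(infile, 1, 2), None)
    return default if line is None else line.strip()
```

Returning a default instead of '' lets the caller tell "the second line is blank" apart from "there is no second line".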

Also, if you drop the strip() call, you do not have to re-add the \n, but who knows why you have it in there. Perhaps .lstrip() would be better?
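
To make the difference concrete, here is a small sketch using a made-up line of the kind readline() returns. Only lstrip() keeps the trailing newline, so its result could be written out without adding "\n" again:

```python
# Made-up sample line, shaped like what readline() would return.
line = "  candidate summary \n"

print(repr(line.strip()))   # 'candidate summary'
print(repr(line.rstrip()))  # '  candidate summary'
print(repr(line.lstrip()))  # 'candidate summary \n'
```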

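Finally, what is the manual while loop for? Why not use a for loop?

Lastly, after your comment, I understand that you want to put the results in a list instead of a file. OK.

All in all: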

summary = []
for count in range(num_files): # range works on Python 2 and 3; xrange is Python 2 only
    file_p = [directory,str(count),'.txt'] # or count+1, if you start at 1
    file_path = ''.join(file_p)        
    with open(file_path, "r") as infile:
        dummy = infile.readline()
        cand_summary = infile.readline().strip()
        summary.append(cand_summary)
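
The question also asks about skipping the hard drive altogether. If the downloaded content is already available as a string, io.StringIO wraps it in a file-like object, so the same readline() logic works without a temporary file. Below is a sketch under that assumption, written for Python 3. The function name and the sample texts are made up, and the download step itself (for example with urllib) is left out:

```python
import io


def second_line_of_text(text):
    """Return the stripped second line of an in-memory string."""
    buf = io.StringIO(text)  # file-like object backed by memory, not disk
    buf.readline()           # skip the first line
    return buf.readline().strip()


# Made-up stand-ins for the contents of the downloaded files.
downloads = ["header A\nsummary A\nrest\n", "header B\nsummary B\n"]
summary = [second_line_of_text(text) for text in downloads]
print(summary)  # ['summary A', 'summary B']
```

For files this small, text.splitlines()[1] would work just as well; StringIO is mainly useful when existing code expects a file object.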

score 0 · Accepted Answer

Just replace the file writes with a call to append() on a list. For example:

summary = []
while (count4 <= num_files):
    file_p = [directory,str(count4),'.txt']
    file_path = ''.join(file_p)        
    cand_summary = linecache.getline(file_path, 2).strip()
    summary.append(cand_summary)
    count4 = count4 + 1

By the way, you would normally write count += 1. It also looks like count4 uses 1-based indexing, which is rather unusual for Python.
