python - Limit on bz2 file decompression using python?

Question

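I have numerous files compressed in bz2 format, and I am trying to uncompress them into a temporary directory using Python so that I can analyze them. There are hundreds of thousands of files, so decompressing them by hand is not feasible, and I wrote the following script.

My issue is that whenever I try this, the output file size maxes out at 900 kB, even though decompressing each file manually yields about 6 MB. I am not sure whether this is a flaw in my code, in how I save the data as a string and then copy it to the file, or some other problem. I have tried this with different files, and I know it works for files smaller than 900 kB. Has anyone else run into a similar problem and found a solution?

My code is below: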

import numpy as np
import bz2
import os
import glob

def unzip_f(filepath):
    '''
    Input a filepath specifying a group of Himawari .bz2 files with common names
    Outputs the path of all the temporary files that have been uncompressed

    '''


    cpath = os.getcwd() #get current path
    filenames_ = []  #list to add filenames to for future use

    for zipped_file in glob.glob(filepath):  #loop over the files that meet the name criteria
        with bz2.BZ2File(zipped_file,'rb') as zipfile:   #Read in the bz2 files
            newfilepath = cpath +'/temp/'+zipped_file[-47:-4]     #create a temporary file
            with open(newfilepath, "wb") as tmpfile: #open the temporary file
                for i,line in enumerate(zipfile.readlines()):
                    tmpfile.write(line) #write the data from the compressed file to the temporary file



            filenames_.append(newfilepath)
    return filenames_


path_='test/HS_H08_20180930_0710_B13_FLDK_R20_S*bz2'
unzip_f(path_)

It returns the correct file paths, but with the wrong sizes, capped at 900 kB.

score 1 · Accepted Answer

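As it turns out, the issue is that the files are multi-stream, which the bz2 module in Python 2.7 does not handle. jasonharper pointed this out, and other posts on the topic give more detail. Below is a solution that simply uses the Unix bzip2 command to decompress the files and then moves them to the temporary directory I want. It is not as pretty, but it works.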

import glob
import os
import shutil
import subprocess

def unzip_f(filepath):
    '''
    Input a filepath specifying a group of Himawari .bz2 files with common names
    Outputs the path of all the temporary files that have been uncompressed

    '''

    tmpdir = os.path.join(os.getcwd(), 'temp')  #destination directory, must already exist
    filenames_ = []  #list to add filenames to for future use

    for zipped_file in glob.glob(filepath):  #loop over the files that meet the name criteria
        unzipped_file = zipped_file[:-4]  #bzip2 -d writes next to the input, minus ".bz2"
        newfilename = os.path.join(tmpdir, os.path.basename(unzipped_file))

        #check_call waits for bzip2 to finish; os.popen returns before the output exists
        subprocess.check_call(['bzip2', '-kd', zipped_file])
        shutil.move(unzipped_file, newfilename)

        filenames_.append(newfilename)
    return filenames_



path_='test/HS_H08_20180930_0710_B13_FLDK_R20_S0*bz2'

unzip_f(path_)
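
The multi-stream behaviour described in this answer can be reproduced without any satellite data. The following is a minimal sketch, not part of the original answer: it builds a two-stream payload in memory and shows that a single `bz2.BZ2Decompressor` stops at the end of the first stream, leaving the rest in `unused_data`. The helper name `decompress_multistream` and the sample byte strings are made up for illustration, and the whole compressed file is assumed to fit in memory.

```python
import bz2


def decompress_multistream(data):
    """Decompress bytes that may hold several concatenated bz2 streams."""
    chunks = []
    while data:
        decompressor = bz2.BZ2Decompressor()  # one decompressor per stream
        chunks.append(decompressor.decompress(data))
        data = decompressor.unused_data  # bytes left over after this stream ended
    return b"".join(chunks)


# Build a two-stream payload by concatenating two independently compressed parts.
first, second = b"a" * 1000, b"b" * 500
payload = bz2.compress(first) + bz2.compress(second)

# A single decompressor stops at the end of the first stream.
single = bz2.BZ2Decompressor()
only_first = single.decompress(payload)

# Restarting a fresh decompressor on the leftover bytes recovers everything.
everything = decompress_multistream(payload)
```

Here `only_first` holds just the first part, which mirrors the truncated output in the question, while `everything` holds both parts.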

score 0 · Accepted Answer

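This is a known limitation in Python 2, where the BZ2File class does not support multiple streams. It can be easily resolved by using bz2file (https://pypi.org/project/bz2file/), a backport of the Python 3 implementation that can be used as a drop-in replacement.

After running pip install bz2file you can replace bz2 with it: import bz2file as bz2 and everything should just work :)

Original Python bug report: https://bugs.python.org/issue1625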
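
For readers on Python 3.3 or later, the standard-library bz2 module reads multi-stream files itself, so no backport is needed. The sketch below is one way to write the decompression step under that assumption. The function name `decompress_to` and the sample file names are hypothetical, and the two-stream test file is generated on the spot only so the example is self-contained.

```python
import bz2
import os
import shutil
import tempfile


def decompress_to(src_path, dst_path):
    """Stream-decompress src_path (a .bz2 file) into dst_path."""
    with bz2.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # copies in chunks, no line splitting
    return dst_path


# Self-contained check using a throwaway two-stream file.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "sample.dat.bz2")  # hypothetical file name
with open(src_path, "wb") as f:
    f.write(bz2.compress(b"x" * 2000) + bz2.compress(b"y" * 3000))

out_path = decompress_to(src_path, os.path.join(workdir, "sample.dat"))
size = os.path.getsize(out_path)
```

Copying with `shutil.copyfileobj` rather than iterating over `readlines()` also avoids treating binary data as lines of text and keeps memory use flat for large files.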
