python - How do I load all cPickle dumps from a log file?

Question

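I will be running code that writes a large number (~1000) of relatively small dictionaries (50 key:value pairs of strings) to a log file. A program will do this automatically. I am thinking of running something like the following: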

import random
import string
import cPickle as pickle
import zlib

fieldNames = ['AICc','Npix','Nparameters','DoF','chi-square','chi-square_nu']

tempDict = {}
overview = {}
iterList = []

# Create example dictionary to add to the log.
for item in fieldNames:
  tempDict[item] = random.choice([random.uniform(2,5), '', ''.join([random.choice(string.lowercase) for x in range(5)])])

# Compress and pickle and add the example dictionary to the log.
# tried  with 'ab' and 'wb' 
# is .p.gz the right extension for this kind of file??
# with open('google.p.gz', 'wb') as fp: 
with open('google.p.gz', 'ab') as fp:
  fp.write(zlib.compress(pickle.dumps(tempDict, pickle.HIGHEST_PROTOCOL),9))

# Attempt to read in entire log
i = 0
with open('google.p.gz', 'rb') as fp:
  # Call pickle.loads until all dictionaries loaded. 
  while 1:
    try:     
      i += 1
      iterList.append(i)
      overview[i] = {}
      overview[i] = pickle.loads(zlib.decompress(fp.read()))
    except:
      break

print tempDict
print overview

I would like to be able to load the last dictionary written to the log file (google.p.gz), but at the moment it only loads the first pickle.dump.

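Also, is there a better way to do everything I am doing? I searched around and it feels like I am the only one doing this kind of thing, which in my experience is a bad sign.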

score 1 · Accepted Answer

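Your input and output don't match. When you write your records, you take each record individually, pickle it, compress it, and write the result to the file on its own: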

fp.write(zlib.compress(pickle.dumps(tempDict, pickle.HIGHEST_PROTOCOL),9))

But when you read your records back, you read the whole file, decompress it, and unpickle a single object from it:

pickle.loads(zlib.decompress(fp.read()))

So the next time you call fp.read() there is nothing left: the first call already read the whole file.
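The mismatch can be reproduced in memory. The following is a minimal sketch, assuming Python 3 (where `pickle` replaces `cPickle`); the sample records and the helper `iter_records` are made up for illustration. It shows that a second `read()` returns nothing, and that a `zlib.decompressobj()` stops at the end of the first compressed stream and leaves the remainder in `unused_data`, which is one way to walk records that were compressed one at a time.

```python
import io
import pickle
import zlib

records = [{"n": 1}, {"n": 2}, {"n": 3}]  # illustrative data

# Writer side, as in the question: each record is pickled and
# compressed on its own, then appended to the log.
log = io.BytesIO()
for r in records:
    log.write(zlib.compress(pickle.dumps(r, pickle.HIGHEST_PROTOCOL), 9))
log.seek(0)

first_read = log.read()   # the whole log: all three compressed streams
second_read = log.read()  # empty: nothing is left to read

def iter_records(blob):
    """Yield one record per compressed stream in `blob` (hypothetical helper)."""
    while blob:
        d = zlib.decompressobj()
        # Decompression stops at the end of the current stream.
        payload = d.decompress(blob) + d.flush()
        yield pickle.loads(payload)
        # Whatever follows that stream is kept in unused_data.
        blob = d.unused_data

loaded = list(iter_records(first_read))
print(second_read)
print(loaded)
```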

So you have to make your input match your output. How to do that depends on your exact requirements. Suppose your requirements are:

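There will be many records, so the file needs to be compressed on disk.
All records are written to the file in one go (you don't need to append individual records).
You don't need random access to the records in the file (you are always happy to read the whole file to get to the last record).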

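Given these requirements, compressing each record individually with zlib is a bad idea. The DEFLATE algorithm used by zlib works by finding repeated sequences, so it works best on large amounts of data. It won't do much for a single record. So let's use the gzip module to compress and decompress the whole file.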
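The size difference is easy to check. Here is a rough sketch, assuming Python 3; the record contents, the count of 1000, and the fixed seed are invented for illustration. It compresses the same pickles once per record and once as a single stream, then compares the totals.

```python
import pickle
import random
import zlib

random.seed(0)  # fixed seed so repeated runs see the same data
field_names = 'AICc Npix Nparameters DoF chi-square chi-square_nu'.split()

# 1000 illustrative records sharing the same keys, with random float values.
records = [{name: random.uniform(2, 5) for name in field_names}
           for _ in range(1000)]
pickled = [pickle.dumps(r, pickle.HIGHEST_PROTOCOL) for r in records]

# Compress every record on its own, as in the question.
per_record_size = sum(len(zlib.compress(p, 9)) for p in pickled)

# Compress all the pickles together as one stream.
whole_size = len(zlib.compress(b''.join(pickled), 9))

print('per record:', per_record_size, 'bytes')
print('whole file:', whole_size, 'bytes')
```

The exact numbers depend on the data, but the single stream should come out smaller because the repeated key names are only stored once and referenced afterwards.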

I made a few other improvements to your code while I was going through it.

import cPickle as pickle
import gzip
import random
import string

field_names = 'AICc Npix Nparameters DoF chi-square chi-square_nu'.split()

random_value_constructors = [
    lambda: random.uniform(2,5),
    lambda: ''.join(random.choice(string.lowercase)
                    for x in xrange(random.randint(0, 5)))]

def random_value():
    """
    Return a random value, either a small floating-point number or a
    short string.
    """
    return random.choice(random_value_constructors)()

def random_record():
    """
    Create and return a random example record.
    """
    return {name: random_value() for name in field_names}

def write_records(filename, records):
    """
    Pickle each record in `records` and compress them to `filename`.
    """
    with gzip.open(filename, 'wb') as f:
        for r in records:
            pickle.dump(r, f, pickle.HIGHEST_PROTOCOL)

def read_records(filename):
    """
    Decompress `filename`, unpickle records from it, and yield them.
    """
    with gzip.open(filename, 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return
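A usage sketch for the two functions above, ported to Python 3 (`pickle` instead of `cPickle`). The helper `last_record`, the sample records, and the temporary file name are assumptions for illustration, not part of the original answer. Since the file is read sequentially anyway, the last record can be taken by letting a `collections.deque` with `maxlen=1` consume the generator.

```python
import collections
import gzip
import os
import pickle
import tempfile

def write_records(filename, records):
    """Pickle each record in `records` and compress them to `filename`."""
    with gzip.open(filename, 'wb') as f:
        for r in records:
            pickle.dump(r, f, pickle.HIGHEST_PROTOCOL)

def read_records(filename):
    """Decompress `filename`, unpickle records from it, and yield them."""
    with gzip.open(filename, 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

def last_record(filename):
    """Return the last record, or None if the file holds no records."""
    # A deque with maxlen=1 keeps only the most recent item it is fed.
    tail = collections.deque(read_records(filename), maxlen=1)
    return tail[0] if tail else None

# Illustrative data and a throwaway path.
records = [{'Npix': i, 'AICc': i * 0.5} for i in range(1000)]
path = os.path.join(tempfile.mkdtemp(), 'records.p.gz')

write_records(path, records)
loaded = list(read_records(path))
last = last_record(path)
print(len(loaded), last)
```

Because `read_records` is a generator, only one record is held in memory at a time while scanning for the last one.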
