I want to calculate the CRC of a file and get output like: E45A12AC. Here's my code:

#!/usr/bin/env python 
import os, sys
import zlib

def crc(fileName):
    fd = open(fileName,"rb")
    content = fd.readlines()
    fd.close()
    for eachLine in content:
        zlib.crc32(eachLine)

for eachFile in sys.argv[1:]:
    crc(eachFile)

This calculates the CRC of each line, but its output (e.g. -1767935985) is not what I want.

Hashlib works the way I want, but it computes MD5:

import hashlib
m = hashlib.md5()
for line in open('data.txt', 'rb'):
    m.update(line)
print(m.hexdigest())

Is it possible to get something similar using zlib.crc32?

10 Answers

More compact and optimized code:

import zlib

def crc(fileName):
    prev = 0
    with open(fileName, "rb") as f:
        for eachLine in f:
            # continue the running CRC, seeded with the previous line's result
            prev = zlib.crc32(eachLine, prev)
    return "%X" % (prev & 0xFFFFFFFF)

PS2: The old PS is obsolete, so I deleted it, thanks to the suggestion in the comments. Thank you. I don't know how I missed this, but it was really good.
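The answer works because the second argument of `zlib.crc32` continues a running checksum. A quick check (a sketch, not part of the original answer) that feeding the data in arbitrary pieces gives the same value as a single call over the concatenated bytes:

```python
import zlib

data = b"hello world\nsecond line\n"

# One call over the whole buffer
whole = zlib.crc32(data)

# The same bytes split at arbitrary points, each call seeded with the previous result
running = 0
for piece in (data[:5], data[5:12], data[12:]):
    running = zlib.crc32(piece, running)

# Mask both so the comparison also holds on Python 2, where crc32 may be negative
print((whole & 0xFFFFFFFF) == (running & 0xFFFFFFFF))  # True
```

This is why the split points (lines, fixed-size chunks) never change the final CRC.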

answered 2010-03-05T15:35:38.070

A modified version of kobor42's answer that is 2-3 times faster because it reads fixed-size chunks instead of "lines":

import zlib

def crc32(fileName):
    with open(fileName, 'rb') as fh:
        hash = 0
        while True:
            s = fh.read(65536)
            if not s:
                break
            hash = zlib.crc32(s, hash)
        return "%08X" % (hash & 0xFFFFFFFF)

It also keeps leading zeros in the returned string.
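To see why the `%08X` format matters, here is a small illustration using a made-up CRC value whose top byte is zero (the value is hypothetical, chosen only to show the difference):

```python
# Hypothetical CRC value with a leading zero byte
value = 0x00ABCDEF

print("%X" % value)    # ABCDEF    (leading zeros dropped)
print("%08X" % value)  # 00ABCDEF  (always 8 hex digits)
```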

answered 2019-09-27T20:47:09.117

A hashlib-compatible interface for CRC-32 support:

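import struct
import zlib

class crc32(object):
    name = 'crc32'
    digest_size = 4
    block_size = 1

    def __init__(self, arg=b''):
        self.__digest = 0
        self.update(arg)

    def copy(self):
        # new instance without calling __init__, carrying over the running CRC
        copy = self.__class__.__new__(self.__class__)
        copy.__digest = self.__digest
        return copy

    def digest(self):
        # hashlib objects return raw bytes: pack the CRC as 4 big-endian bytes
        return struct.pack('>I', self.__digest)

    def hexdigest(self):
        return '{:08x}'.format(self.__digest)

    def update(self, arg):
        self.__digest = zlib.crc32(arg, self.__digest) & 0xffffffff

# Now you can define hashlib.crc32 = crc32
import hashlib
hashlib.crc32 = crc32

# Python 2.7:  hashlib.algorithms += ('crc32',)
# Python 3.2+: hashlib.algorithms_available.add('crc32')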
answered 2011-02-21T02:59:22.900

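To display the low 32 bits of any integer as 8 unsigned hexadecimal digits, you can "mask" the value by bitwise-ANDing it with a mask made of 32 bits all set to 1, then apply formatting. I.e.: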

>>> x = -1767935985
>>> format(x & 0xFFFFFFFF, '08x')
'969f700f'

Whether the integer you format this way came from zlib.crc32 or from any other computation is completely irrelevant.
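A short sketch of what the mask does to the negative value from the question (Python 2's `zlib.crc32` returns a signed result; Python 3's is always unsigned), plus a few equivalent ways to print it. Use `X` instead of `x` to get the uppercase style the question asks for:

```python
x = -1767935985  # the signed value from the question

unsigned = x & 0xFFFFFFFF
# For a negative 32-bit value, masking is the same as adding 2**32 (two's complement)
assert unsigned == x + 2**32

print(format(unsigned, '08x'))  # 969f700f
print('%08X' % unsigned)        # 969F700F
print(f'{unsigned:08X}')        # 969F700F  (f-strings need Python 3.6+)
```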

answered 2009-11-16T15:43:08.120

Python 3.8+ (using the walrus operator):

import zlib

def crc32(filename, chunksize=65536):
    """Compute the CRC-32 checksum of the contents of the given filename"""
    with open(filename, "rb") as f:
        checksum = 0
        while (chunk := f.read(chunksize)):
            checksum = zlib.crc32(chunk, checksum)
        return checksum

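chunksize is how many bytes of the file you read at a time. No matter what you set it to, you will get the same hash for the same file (setting it too low may make your code slow; setting it too high may use too much memory).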

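The result is an unsigned 32-bit integer (on Python 3, zlib.crc32 always returns an unsigned value). The CRC-32 checksum of an empty file is 0.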
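The chunk-size claim and the empty-file case are easy to check without touching the disk. This sketch reuses the same loop over an in-memory `io.BytesIO` stream standing in for a real file; the name `crc32_stream` and the sample data are hypothetical, chosen for illustration:

```python
import io
import zlib

def crc32_stream(f, chunksize=65536):
    """Same loop as the answer above, but over an already-open binary stream."""
    checksum = 0
    while (chunk := f.read(chunksize)):
        checksum = zlib.crc32(chunk, checksum)
    return checksum

data = bytes(range(256)) * 1000  # 256,000 bytes of sample data

# Every chunk size yields the same checksum, so the set has one element
results = {crc32_stream(io.BytesIO(data), size) for size in (1, 7, 4096, 65536, 10**6)}
print(len(results))  # 1

# Empty input gives 0
print(crc32_stream(io.BytesIO(b"")))  # 0

# Format as 8 uppercase hex digits, like the E45A12AC in the question
print(format(crc32_stream(io.BytesIO(data)), "08X"))
```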

answered 2020-01-29T19:51:38.677

Edited to include Altren's solution below.

A modified, more compact version of CrouZ's answer with slightly better performance, using a for loop and file buffering:

import os
import zlib

def forLoopCrc(fpath):
    """With for loop and buffer."""
    crc = 0
    with open(fpath, 'rb', 65536) as ins:
        # one read per 64 KiB block, plus one more for the final partial block
        for x in range(os.stat(fpath).st_size // 65536 + 1):
            crc = zlib.crc32(ins.read(65536), crc)
    return '%08X' % (crc & 0xFFFFFFFF)

Results on a 6700K with an HDD:

(Note: after many retests, it was consistently faster.)

Warming up the machine...
Finished.

Beginning tests...
File size: 90288KB
Test cycles: 500

With for loop and buffer.
Result 45.24728019630359 

CrouZ solution
Result 45.433838356097894 

kobor42 solution
Result 104.16215688703986 

Altren solution
Result 101.7247863946586  

Tested in Python 3.6.4 x64 with the following script:

import os, timeit, zlib, random, binascii

def forLoopCrc(fpath):
    """With for loop and buffer."""
    crc = 0
    with open(fpath, 'rb', 65536) as ins:
        for x in range(int((os.stat(fpath).st_size / 65536)) + 1):
            crc = zlib.crc32(ins.read(65536), crc)
    return '%08X' % (crc & 0xFFFFFFFF)

def crc32(fileName):
    """CrouZ solution"""
    with open(fileName, 'rb') as fh:
        hash = 0
        while True:
            s = fh.read(65536)
            if not s:
                break
            hash = zlib.crc32(s, hash)
        return "%08X" % (hash & 0xFFFFFFFF)

def crc(fileName):
    """kobor42 solution"""
    prev = 0
    for eachLine in open(fileName,"rb"):
        prev = zlib.crc32(eachLine, prev)
    return "%X"%(prev & 0xFFFFFFFF)

def crc32altren(filename):
    """Altren solution"""
    buf = open(filename,'rb').read()
    hash = binascii.crc32(buf) & 0xFFFFFFFF
    return "%08X" % hash

fpath = r'D:\test\test.dat'
tests = {forLoopCrc: 'With for loop and buffer.', 
     crc32: 'CrouZ solution', crc: 'kobor42 solution',
         crc32altren: 'Altren solution'}
count = 500

# CPU, HDD warmup
randomItm = [x for x in tests.keys()]
random.shuffle(randomItm)
print('\nWarming up the machine...')
for c in range(count):
    randomItm[0](fpath)
print('Finished.\n')

# Begin test
print('Beginning tests...\nFile size: %dKB\nTest cycles: %d\n' % (
    os.stat(fpath).st_size/1024, count))
for x in tests:
    print(tests[x])
    start_time = timeit.default_timer()
    for c in range(count):
        x(fpath)
    print('Result', timeit.default_timer() - start_time, '\n')

It is faster because a for loop is faster than a while loop (sources: here and here).
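A for loop over chunks can also be written without precomputing the file size from `os.stat`, using the two-argument form of the built-in `iter(callable, sentinel)`. This is a sketch of that alternative idiom; the name `crc32_iter` is hypothetical and it was not part of the benchmark above:

```python
import functools
import os
import tempfile
import zlib

def crc32_iter(fpath, chunksize=65536):
    """Chunked CRC-32 driven by a for loop, without precomputing the file size."""
    crc = 0
    with open(fpath, 'rb') as f:
        # iter(callable, sentinel) calls f.read(chunksize) until it returns b'' (EOF)
        for chunk in iter(functools.partial(f.read, chunksize), b''):
            crc = zlib.crc32(chunk, crc)
    return '%08X' % (crc & 0xFFFFFFFF)

# Usage: checksum a temporary file and compare with a one-shot CRC of the same bytes
payload = b"example payload\n" * 10000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
try:
    matches = crc32_iter(tmp.name, chunksize=4096) == '%08X' % zlib.crc32(payload)
    print(matches)  # True
finally:
    os.remove(tmp.name)
```

Because the loop stops at the first empty read, it also behaves correctly if the file's size changes between `os.stat` and the reads.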

answered 2020-04-25T01:03:12.693

Combining the two pieces of code above:

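import logging
import zlib

# wrapped in a function so the original `return 4` error code is valid
def crc(decompressedFile):
    try:
        fd = open(decompressedFile, "rb")
    except IOError:
        logging.error("Unable to open the file in readmode:" + decompressedFile)
        return 4
    eachLine = fd.readline()
    prev = 0
    while eachLine:
        # continue the running CRC with each line
        prev = zlib.crc32(eachLine, prev)
        eachLine = fd.readline()
    fd.close()
    return "%08X" % (prev & 0xFFFFFFFF)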
answered 2012-03-16T07:54:27.940

You can use base64 to get output like [ERD45FTR]. zlib.crc32 also provides an update option (pass the previous result as the second argument).

import sys
import zlib
import base64

def crc(fileName):
    with open(fileName, "rb") as fd:
        content = fd.readlines()
    prev = 0
    for eachLine in content:
        # continue the running CRC from the previous line
        prev = zlib.crc32(eachLine, prev)
    return prev

for eachFile in sys.argv[1:]:
    # base64-encode the decimal string of the CRC
    print(base64.b64encode(str(crc(eachFile)).encode()).decode())

answered 2009-11-16T15:38:39.843

Solution:

import sys
import zlib

def crc(fileName, excludeLine="", includeLine=""):
    try:
        fd = open(fileName, "rb")
    except IOError:
        print("Unable to open the file in readmode:", fileName)
        return
    prev = 0
    with fd:
        for eachLine in fd:
            # skip lines starting with excludeLine (pass it as bytes, e.g. b"#")
            if excludeLine and eachLine.startswith(excludeLine):
                continue
            prev = zlib.crc32(eachLine, prev)
    return format(prev & 0xFFFFFFFF, '08x')  # returns 8 digits crc

for eachFile in sys.argv[1:]:
    print(crc(eachFile))

I don't really know what (excludeLine="", includeLine="") is for...

answered 2009-11-16T19:04:34.017

There is a faster and more compact way to compute the CRC using binascii (note that it reads the whole file into memory at once):

import binascii

def crc32(filename):
    with open(filename, 'rb') as f:
        buf = f.read()  # reads the entire file into memory
    hash = binascii.crc32(buf) & 0xFFFFFFFF
    return "%08X" % hash
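`binascii.crc32` and `zlib.crc32` compute the same CRC-32, so the answers above are interchangeable in their results. A quick sketch using the standard CRC-32 check input `b"123456789"`, whose check value is CBF43926:

```python
import binascii
import zlib

data = b"123456789"  # the standard CRC-32 check input

# Both functions implement the same CRC-32; mask for Python 2 compatibility
a = binascii.crc32(data) & 0xFFFFFFFF
b = zlib.crc32(data) & 0xFFFFFFFF
print(a == b, "%08X" % a)  # True CBF43926
```

`binascii.crc32` also accepts a running value as its second argument, so it can be combined with chunked reading instead of loading the whole file.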
answered 2020-12-20T01:19:29.207