python - How do I calculate the MD5 checksum of a file in Python?

Question

I wrote some Python code that checks the MD5 hash of a file and makes sure it matches the original hash.

This is what I have developed:

# Defines filename
filename = "file.exe"

# Gets MD5 from file 
def getmd5(filename):
    return m.hexdigest()

md5 = dict()

for fname in filename:
    md5[fname] = getmd5(fname)

# If statement for alerting the user whether the checksum passed or failed

if md5 == '>md5 will go here<': 
    print("MD5 Checksum passed. You may now close this window")
    input ("press enter")
else:
    print("MD5 Checksum failed. Incorrect MD5 in file 'filename'. Please download a new copy")
    input("press enter") 
exit

But whenever I run the code, I get the following error:

Traceback (most recent call last):
File "C:\Users\Username\md5check.py", line 13, in <module>
 md5[fname] = getmd5(fname)
File "C:\Users\Username\md5check.py", line 9, in getmd5
  return m.hexdigest()
NameError: global name 'm' is not defined

Is there anything I am missing in my code?

score 266 · Accepted Answer

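Regarding your error and what is missing from your code: m is a name that is never defined inside the getmd5() function.

No offence, I know you are a beginner, but your code is all over the place. Let's look at your issues one by one :)

First, you are not using the hashlib.md5().hexdigest() method correctly. Please refer to the explanation of the hashlib functions in the Python Doc Library. The correct way to return the MD5 of a given string is to do something like this: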

>>> import hashlib
>>> hashlib.md5(b"filename.exe").hexdigest()
'2a53375ff139d9837e93a38a279d63e5'

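However, you have a bigger problem here. You are calculating the MD5 of a file name string, whereas in reality MD5 is calculated from the file contents. You will basically need to read the file contents and pipe them through MD5. My next example is not very efficient, but it goes like this: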

>>> import hashlib
>>> hashlib.md5(open('filename.exe','rb').read()).hexdigest()
'd41d8cd98f00b204e9800998ecf8427e'

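As you can see, the second MD5 hash is completely different from the first one. The reason is that we are pushing the contents of the file through, not just the file name.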
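To make the difference concrete, here is a small sketch that hashes a file name and then the contents of a file with that name. The temporary file and its contents are made up purely for illustration.

```python
import hashlib
import os
import tempfile

# Hypothetical file created only for this demonstration.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "filename.exe")
    with open(path, "wb") as f:
        f.write(b"some file contents")

    # Hash of the file *name* (the string encoded to bytes).
    name_hash = hashlib.md5(os.path.basename(path).encode("utf-8")).hexdigest()

    # Hash of the file *contents*.
    with open(path, "rb") as f:
        content_hash = hashlib.md5(f.read()).hexdigest()

print(name_hash)
print(content_hash)
print(name_hash == content_hash)  # the two inputs are different bytes
```

Renaming the file would change the first hash but not the second, which is why only the second is useful as a checksum.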

A simple solution could look something like this:

# Import hashlib library (md5 method is part of it)
import hashlib

# File to check
file_name = 'filename.exe'

# Correct original md5 goes here
original_md5 = '5d41402abc4b2a76b9719d911017c592'  

# Open,close, read file and calculate MD5 on its contents 
with open(file_name, 'rb') as file_to_check:
    # read contents of the file
    data = file_to_check.read()    
    # pipe contents of the file through
    md5_returned = hashlib.md5(data).hexdigest()

# Finally compare original MD5 with freshly calculated
if original_md5 == md5_returned:
    print("MD5 verified.")
else:
    print("MD5 verification failed!")

Please look at the post Python: Generate a MD5 checksum of a file. It explains in detail a couple of ways to do this efficiently.
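Applied to the script in the question, the same idea might look like the sketch below. It keeps the name getmd5 from the question; check_files and its expected argument are hypothetical names introduced here. Note two further problems it works around: `for fname in filename` loops over the characters of the string "file.exe", and `md5 == '...'` compares a dict with a string, which is never true.

```python
import hashlib

def getmd5(filename):
    """Return the hex MD5 digest of the file's contents."""
    m = hashlib.md5()  # the object the original function never created
    with open(filename, "rb") as f:
        m.update(f.read())  # whole file at once; fine for small files
    return m.hexdigest()

def check_files(expected):
    """Map each file name to True or False depending on whether its MD5 matches.

    `expected` is a dict of {file name: expected hex digest} (hypothetical helper).
    """
    return {name: getmd5(name) == digest.lower()
            for name, digest in expected.items()}
```

A call such as check_files({"file.exe": original_md5}) then returns a dict of booleans, one per file, which the script can use to print the passed or failed message.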

Best of luck.

score 42 · Accepted Answer

In Python 3.8+ you can do

import hashlib

with open("your_filename.png", "rb") as f:
    file_hash = hashlib.md5()
    while chunk := f.read(8192):
        file_hash.update(chunk)

print(file_hash.digest())
print(file_hash.hexdigest())  # to get a printable str instead of bytes

On Python 3.7 and below:

with open("your_filename.png", "rb") as f:
    file_hash = hashlib.md5()
    chunk = f.read(8192)
    while chunk:
        file_hash.update(chunk)
        chunk = f.read(8192)

print(file_hash.hexdigest())

This reads the file 8192 (or 2¹³) bytes at a time rather than all at once with f.read(), so it uses less memory.
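As a sketch of how the chunked loop could be packaged for reuse: the function names below are assumptions, not part of the answer. The first function uses two-argument iter(), which behaves the same on every Python 3 version. The second uses hashlib.file_digest, which only exists on Python 3.11 and later, so it falls back to the first when that function is missing.

```python
import hashlib

def file_md5(path, chunk_size=8192):
    """Hash a file chunk by chunk so memory use stays near chunk_size."""
    file_hash = hashlib.md5()
    with open(path, "rb") as f:
        # Two-argument iter(): call f.read(chunk_size) until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            file_hash.update(chunk)
    return file_hash.hexdigest()

def file_md5_builtin(path):
    """Same digest via hashlib.file_digest where available (Python 3.11+)."""
    if not hasattr(hashlib, "file_digest"):
        return file_md5(path)  # older interpreters: use the manual loop
    with open(path, "rb") as f:
        return hashlib.file_digest(f, "md5").hexdigest()
```

The chunk size only affects memory use and speed; the digest is the same for any chunk size because update() simply appends to the data hashed so far.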

Consider using hashlib.blake2b instead of md5 (just replace md5 with blake2b in the snippets above). It is cryptographically secure and faster than MD5.
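One way to make that swap easy is to pass the algorithm name to hashlib.new. This is a minimal sketch; the function name file_hash and its parameters are assumptions made for illustration.

```python
import hashlib

def file_hash(path, algorithm="blake2b", chunk_size=8192):
    """Hash a file in chunks with any algorithm name hashlib.new accepts."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        chunk = f.read(chunk_size)
        while chunk:
            h.update(chunk)
            chunk = f.read(chunk_size)
    return h.hexdigest()
```

Calling file_hash(path) gives a BLAKE2b digest, and file_hash(path, "md5") gives the MD5 digest needed when a download page only publishes MD5 sums. The two digests are not interchangeable, so the expected value must come from the same algorithm.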

score 3 · Accepted Answer

The hashlib methods also support the mmap module, so I often use

from hashlib import md5
from mmap import mmap, ACCESS_READ

path = ...
with open(path) as file, mmap(file.fileno(), 0, access=ACCESS_READ) as file:
    print(md5(file).hexdigest())

where path is the path to your file.

Ref: https://docs.python.org/library/mmap.html#mmap.mmap
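One caveat with this approach is that mmap raises ValueError when asked to map an empty file. A sketch that guards against that case, with mmap_md5 being a hypothetical helper name:

```python
import os
from hashlib import md5
from mmap import mmap, ACCESS_READ

def mmap_md5(path):
    """MD5 of a file via a read-only memory map, with an empty-file guard."""
    # An empty file cannot be mapped, so hash the empty byte string instead.
    if os.path.getsize(path) == 0:
        return md5(b"").hexdigest()
    with open(path, "rb") as f, mmap(f.fileno(), 0, access=ACCESS_READ) as mapped:
        # hashlib accepts any bytes-like object, including the mapping.
        return md5(mapped).hexdigest()
```

The file is opened in binary mode here and the mapping gets its own name, so the file object is not shadowed inside the with block.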

Edit: a comparison with the plain read method.

Plot of time and memory usage

from hashlib import md5
from mmap import ACCESS_READ, mmap

from matplotlib.pyplot import grid, legend, plot, show, tight_layout, xlabel, ylabel
from memory_profiler import memory_usage
from numpy import arange

def MemoryMap():
    with open(path) as file, mmap(file.fileno(), 0, access=ACCESS_READ) as file:
        print(md5(file).hexdigest())

def PlainRead():
    with open(path, 'rb') as file:
        print(md5(file.read()).hexdigest())

if __name__ == '__main__':
    path = ...
    y = memory_usage(MemoryMap, interval=0.01)
    plot(arange(len(y)) / 100, y, label='mmap')
    y = memory_usage(PlainRead, interval=0.01)
    plot(arange(len(y)) / 100, y, label='read')
    ylabel('Memory Usage (MiB)')
    xlabel('Time (s)')
    legend()
    grid()
    tight_layout()
    show()

path is the path to a 3.77GiB csv file.

score -2 · Accepted Answer

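You can calculate the checksum of a file by reading its binary data and using hashlib.md5().hexdigest(). A function to do this would look like the following: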

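import hashlib
import os

def File_Checksum_Dis(dirname):
    # Stop early if the directory is missing instead of failing in os.listdir
    if not os.path.exists(dirname):
        print(dirname + " directory does not exist")
        return

    for fname in os.listdir(dirname):
        # Skip editor backup files such as "notes.txt~"
        if not fname.endswith('~'):
            fnaav = os.path.join(dirname, fname)
            # Read the file in binary mode; the with block closes it
            with open(fnaav, 'rb') as fd:
                data = fd.read()

            print("-" * 70)
            print("File Name is: ", fname)
            print(hashlib.md5(data).hexdigest())
            print("-" * 70)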
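If the checksums are needed for comparison rather than for display, a variant that returns them may be more convenient. This is only a sketch: md5_of_directory is a hypothetical name, and it additionally skips subdirectories, since calling open() on a directory fails.

```python
import hashlib
import os

def md5_of_directory(dirname):
    """Return {file name: hex MD5} for the regular files directly in dirname."""
    checksums = {}
    for fname in sorted(os.listdir(dirname)):
        full_path = os.path.join(dirname, fname)
        # Skip subdirectories and backup files ending in "~".
        if not os.path.isfile(full_path) or fname.endswith("~"):
            continue
        with open(full_path, "rb") as fd:
            checksums[fname] = hashlib.md5(fd.read()).hexdigest()
    return checksums
```

The returned dict can be compared directly against a dict of known-good digests.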
