I'm writing a program that calculates the checksum of each file in a list, then compares it to the value stored in a reference file.
I'm trying to convert the bytes buffer from the hashfile method into a file size in the same units that os.stat(path).st_size uses, so I can update a tqdm progress bar accordingly (I'm trying to implement the last example here).
I have tried a number of things, but nothing has worked so far: len(buf) gives me a processed size far greater than the total; int.from_bytes() raises "OverflowError - int too large to convert to float"; struct.unpack_from(buf) requires reading a single byte at a time; and I also tried various other functions for converting bytes. It seems I don't understand bytes well enough to know what to search for or how to implement the solutions I find.
Here's an excerpt from the code:
import hashlib
import os
from tqdm import tqdm

# calculate total size to process
self.assets_size += os.stat(os.path.join(root, f)).st_size

def hashfile(self, progress, afile, hasher, blocksize=65536):
    """
    Checksum buffer
    :param progress: progress bar object
    :param afile: file to process
    :param hasher: checksum algorithm
    :param blocksize: size of the buffer
    :return: hash digest
    """
    buf = afile.read(blocksize)
    while len(buf) > 0:
        self.processed_size += buf  # need to convert from bytes to file size
        hasher.update(buf)
        progress.update(self.processed_size)  # tqdm update
        buf = afile.read(blocksize)
    afile.seek(0)
    return hasher.digest()

def process_file(self, progress, fichier):
    """
    Checks if the file is in the reference dictionary;
    If so, checks if the size of the file matches the one stored in the dictionary;
    If so, calculates the checksum of the file and compares it to the one in the dictionary
    :param progress: progress bar object
    :param fichier: asset file to process
    :return: string outcome of the process
    """
    checksum = self.hashfile(progress, open(fichier, 'rb'), hashlib.sha1())
    # check if checksum matches
    return outcome

def main_process(self):
    """
    Launches and monitors the process and writes a report of the results
    :return: application end
    """
    with tqdm(total=self.assets_size, unit='B', unit_scale=True) as pbar:
        all_results = []
        for f in self.assets.keys():
            results = self.process_file(pbar, f)
            all_results.append(results)
    for r in all_results:
        print(r)
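
For what it's worth, here is a minimal sketch of what I think the units issue comes down to. It is not a confirmed fix. For a file opened in 'rb' mode, len(buf) is the number of bytes in the chunk, which is the same unit os.stat(path).st_size reports. tqdm's update(n) adds n to the bar's counter rather than setting an absolute position, so passing a running total on every iteration would likely overshoot, which may explain the "far greater than the total" result. The names hash_with_progress and on_progress are hypothetical and used only for illustration; a plain callback stands in for the progress bar so the snippet runs without tqdm.

```python
import hashlib
import os
import tempfile


def hash_with_progress(path, hasher, on_progress, blocksize=65536):
    """Hash the file at `path` in chunks, reporting each chunk's byte count.

    `on_progress` is a hypothetical callback that receives the size of the
    chunk just read (an increment, not a running total).
    """
    with open(path, "rb") as afile:
        # iter() with a sentinel stops once read() returns b"" at end of file
        for buf in iter(lambda: afile.read(blocksize), b""):
            hasher.update(buf)
            on_progress(len(buf))  # bytes in this chunk only
    return hasher.hexdigest()


# Temporary file standing in for one asset (illustrative data only)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 200_000)
    tmp_path = tmp.name

increments = []
digest = hash_with_progress(tmp_path, hashlib.sha1(), increments.append)
total_size = os.stat(tmp_path).st_size
os.remove(tmp_path)

# The per-chunk lengths add up to exactly what os.stat reported
print(sum(increments) == total_size)
```

With a real progress bar, the assumption is that passing pbar.update as on_progress would advance the bar by each chunk's size, so it reaches the total computed from st_size once every file has been read.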