
I'm writing a program that calculates the checksum of each file in a list and then compares it against a reference file.

I'm trying to convert the bytes buffer read in the hashfile method into a size expressed in the same units that os.stat(path).st_size uses, so that I can update a tqdm progress bar accordingly (I'm trying to implement the last example here).

I tried a number of things, but nothing has worked so far: len(buf) gives me a processed size far greater than the total; int.from_bytes() raises OverflowError: int too large to convert to float; struct.unpack_from(buf) requires reading a single byte at a time; and I tried various other functions for converting bytes. It seems I don't understand bytes well enough to know what to search for, or to implement the solutions I find.
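As a quick check of units: the length of each bytes chunk returned by read() is already a byte count, the same unit os.stat(path).st_size reports, whereas int.from_bytes() decodes the chunk's contents as one integer. On a 65536-byte chunk that integer can have hundreds of thousands of bits, which is presumably what overflowed when it was converted to float. A minimal sketch, using a temporary sample file created only for illustration:

```python
import os
import tempfile

# Write a small sample file so the example is self-contained.
data = bytes(range(256)) * 1000  # 256,000 bytes of sample content
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

blocksize = 65536
processed = 0
with open(path, "rb") as afile:
    # iter() with a sentinel keeps calling afile.read(blocksize) until it returns b"".
    for buf in iter(lambda: afile.read(blocksize), b""):
        processed += len(buf)  # len() of a bytes chunk is its size in bytes

total = os.stat(path).st_size
print(processed, total)  # both count bytes, so they are equal

# int.from_bytes() interprets the *contents* as a number instead of measuring size:
print(int.from_bytes(b"\x01\x00", "big"), len(b"\x01\x00"))  # 256 vs 2

os.remove(path)
```

So summing len(buf) is the right measure; an oversized total points at how the running sum is passed to the progress bar rather than at the conversion itself.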

Here's an excerpt from the code:

import hashlib
import os
from tqdm import tqdm

# calculate total size to process
self.assets_size += os.stat(os.path.join(root, f)).st_size

def hashfile(self, progress, afile, hasher, blocksize=65536):
    """
    Checksum buffer
    :param progress: progress bar object
    :param afile: file to process
    :param hasher: checksum algorithm
    :param blocksize: size of the buffer
    :return: hash digest
    """
    buf = afile.read(blocksize)

    while len(buf) > 0:
        self.processed_size += buf  # need to convert from bytes to file size
        hasher.update(buf)
        progress.update(self.processed_size)  # tqdm update
        buf = afile.read(blocksize)

    afile.seek(0)
    return hasher.digest()

def process_file(self, progress, fichier):
    """
    Checks if the file is in the reference dictionary;
    If so, checks if the size of the file matches the one stored in the dictionary;
    If so, calculates the checksum of the file and compares it to the one in the dictionary
    :param progress: progress bar object
    :param fichier: asset file to process
    :return: string outcome of the process
    """
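    with open(fichier, 'rb') as afile:  # close the file once hashing is done
        checksum = self.hashfile(progress, afile, hashlib.sha1())
    # check if checksum matches (comparison elided in this excerpt; it sets outcome)
    return outcome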

def main_process(self):
    """
    Launches and monitors the process and writes a report of the results
    :return: application end
    """
    with tqdm(total=self.assets_size, unit='B', unit_scale=True) as pbar:
        all_results = []

        for f in self.assets.keys():
            results = self.process_file(pbar, f)
            all_results.append(results)

    for r in all_results:
        print(r)
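The symptom described above (a processed size far greater than the total) is consistent with tqdm's update(n) adding n to its internal counter rather than setting the counter to n. The sketch below reproduces the arithmetic without tqdm; CountingBar is a hypothetical stand-in that mimics only that additive update() behaviour, and the chunk sizes are made up for illustration.

```python
class CountingBar:
    """Hypothetical stand-in for a tqdm bar: update(n) *adds* n to the counter."""

    def __init__(self, total):
        self.total = total
        self.n = 0

    def update(self, n=1):
        self.n += n


chunk_sizes = [65536, 65536, 100]  # e.g. a 131,172-byte file read in 64 KiB blocks
total = sum(chunk_sizes)

# Pattern from the question: pass the running total to update() on every chunk.
cumulative_bar = CountingBar(total)
processed_size = 0
for size in chunk_sizes:
    processed_size += size
    cumulative_bar.update(processed_size)  # re-adds bytes that were already counted

# Pattern matching update()'s additive semantics: pass only the new bytes.
incremental_bar = CountingBar(total)
for size in chunk_sizes:
    incremental_bar.update(size)

print(cumulative_bar.n, incremental_bar.n, total)  # 327780 131172 131172
```

The cumulative version counts the first chunk three times and the second twice, which is why the bar overshoots.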

1 Answer


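Thanks to @RadosławCybulski, I found the solution, and I now understand how the tqdm.update() function works: it doesn't set the progress state to its argument, it adds the argument to the current progress. I updated the hashfile method like this: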

    while len(buf) > 0:
        hasher.update(buf)
        progress.update(len(buf))
        buf = afile.read(blocksize)
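Putting the fix together with the size-then-checksum flow described in the process_file docstring, a self-contained sketch might look like this. It assumes the reference data is a dict mapping each path to a (size, SHA-1 digest) tuple; check_file, that dict layout, and CountingBar are hypothetical names for illustration, not the asker's actual code. Skipped files still advance the bar by their size, a design choice so the bar can reach its total when some files are never hashed.

```python
import hashlib
import os
import tempfile


def hashfile(progress, afile, hasher, blocksize=65536):
    """Feed afile to hasher chunk by chunk, advancing progress by each chunk's size."""
    for buf in iter(lambda: afile.read(blocksize), b""):
        hasher.update(buf)
        progress.update(len(buf))  # the increment, not the running total
    return hasher.digest()


def check_file(progress, path, reference):
    """Hypothetical helper: compare size, then SHA-1, against reference[path]."""
    size = os.stat(path).st_size
    if path not in reference:
        progress.update(size)  # not hashed, but still counted toward the total
        return f"{path}: not in reference"
    expected_size, expected_digest = reference[path]
    if size != expected_size:
        progress.update(size)  # skip hashing on a size mismatch
        return f"{path}: size mismatch"
    with open(path, "rb") as afile:
        digest = hashfile(progress, afile, hashlib.sha1())
    return f"{path}: {'OK' if digest == expected_digest else 'checksum mismatch'}"


class CountingBar:
    """Minimal stand-in with tqdm's additive update(n), so the demo runs without tqdm."""

    def __init__(self):
        self.n = 0

    def update(self, n=1):
        self.n += n


data = b"hello world" * 10000  # 110,000 bytes of sample content
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

reference = {path: (len(data), hashlib.sha1(data).digest())}
bar = CountingBar()
result = check_file(bar, path, reference)
print(result, bar.n)
os.remove(path)
```

With tqdm installed, the same functions should work with a real bar created as in main_process, e.g. tqdm(total=total_bytes, unit='B', unit_scale=True), because hashfile only calls update() with byte counts.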
answered 2017-06-25 19:12:49