
I have multiple massive CSV files I am processing in parallel. I'd like to have a progress bar for each file.

However, while 5 bars are displayed, only the last one is being updated - seemingly by all processes at once. As I can't read the whole CSV file into memory, I am using the file size to display progress.

inputArg is the folder path ending with a number.

def worker(inputArg):
    with open(inputArg + '/data.csv') as csvfile:
        size = os.path.getsize(inputArg + '/data.csv')
        text = "progresser #{}".format(inputArg[-1])
        pb = tqdm(total=size, unit="B", unit_scale=True, desc=text, position=int(inputArg[-1]))
        reader = csv.reader(csvfile, delimiter=',')
        for row in reader:
            pb.update(len(row))
            session.execute(*INSERT QUERY*)

def scheduler(inputData):
    p = multiprocessing.Pool(multiprocessing.cpu_count() + 1)
    p.map(worker, inputData)
    p.close()
    p.join()

if __name__ == '__main__':
    folders = glob.glob('FILEPATH/*')
    print('--------------------Insert started---------------')
    scheduler(folders)
    print('---------------------All Done---------------------')

Any hint would be appreciated!

EDIT: I did check the other answer, but I explicitly said I want multiple progress bars, and that answer only gives you ONE. Hence, this is not a duplicate.

EDIT2: Here's what it looks like, @bouteillebleu. I do get my bars, but only the last one is updated for some reason. [Screenshot: current progress bars]


1 Answer


Try using the latest version of tqdm (v4.18.0 or later; see https://github.com/tqdm/tqdm/releases).
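For reference, recent tqdm versions document a pattern for multiple bars across processes: give each bar its own `position` and share one write lock between the pool workers via `tqdm.set_lock`/`tqdm.get_lock`. This is a minimal sketch of that pattern, not the questioner's actual pipeline - the CSV reading and database insert are replaced by a simulated loop:

```python
# Minimal sketch: one tqdm bar per pool worker, using tqdm's
# documented multiprocessing lock. The real CSV/session logic from
# the question is omitted; each worker just simulates work.
import time
from multiprocessing import Pool, RLock

from tqdm import tqdm

def worker(pos):
    # position=pos pins each bar to its own terminal line.
    with tqdm(total=100, desc="progresser #{}".format(pos), position=pos) as pb:
        for _ in range(100):
            time.sleep(0.001)  # stand-in for reading a row / running an insert
            pb.update(1)
    return pos

if __name__ == '__main__':
    # Share one lock across all processes so the bars don't
    # overwrite each other's lines.
    tqdm.set_lock(RLock())
    p = Pool(processes=5, initializer=tqdm.set_lock, initargs=(tqdm.get_lock(),))
    p.map(worker, range(5))
    p.close()
    p.join()
```

In the question's code each worker would keep its `position=int(inputArg[-1])`, and the `Pool` would be created with the same `initializer`/`initargs` so every process shares the lock.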

answered 2017-10-01T23:29:39.967