I have multiple massive CSV files that I am processing in parallel, and I'd like a progress bar for each file.
However, while five bars are displayed, only the last one is being updated, seemingly by all processes at once. Since I can't read a whole CSV file into memory, I am using the file size to measure progress.
inputArg is the folder path, which ends with a number; the trailing digit is used as the bar's label and position.
import csv
import glob
import multiprocessing
import os

from tqdm import tqdm

def worker(inputArg):
    with open(inputArg + '/data.csv') as csvfile:
        size = os.path.getsize(inputArg + '/data.csv')
        # the folder's trailing digit doubles as the bar label and position
        text = "progresser #{}".format(inputArg[-1])
        pb = tqdm(total=size, unit="B", unit_scale=True, desc=text,
                  position=int(inputArg[-1]))
        reader = csv.reader(csvfile, delimiter=',')
        for row in reader:
            # approximate bytes consumed; len(row) alone would only
            # count the number of fields, not bytes
            pb.update(len(','.join(row)) + 1)
            # session comes from the (omitted) database setup
            session.execute(*INSERT QUERY*)

def scheduler(inputData):
    p = multiprocessing.Pool(multiprocessing.cpu_count() + 1)
    p.map(worker, inputData)
    p.close()
    p.join()

if __name__ == '__main__':
    folders = glob.glob('FILEPATH/*')
    print('--------------------Insert started---------------')
    scheduler(folders)
    print('---------------------All Done---------------------')
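For anyone who wants to reproduce this without the CSV/database parts, here is a stripped-down sketch of the same pattern (the worker name, the fixed row count, and the sleep are made-up stand-ins for the real file processing and insert):

    import multiprocessing
    import time

    from tqdm import tqdm

    def fake_worker(position):
        # 100 fake "rows" stand in for one CSV file
        pb = tqdm(total=100, desc="progresser #{}".format(position),
                  position=position)
        for _ in range(100):
            pb.update(1)
            time.sleep(0.01)  # stand-in for the insert query
        pb.close()

    if __name__ == '__main__':
        pool = multiprocessing.Pool(5)
        pool.map(fake_worker, range(5))
        pool.close()
        pool.join()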
Any hint would be appreciated!
EDIT: I did check the other answer, but I explicitly said I want multiple progress bars, and that answer only gives you ONE. Hence, this is not a duplicate.
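To be clear about what I mean by multiple bars: in a single process, stacking bars with tqdm's position argument works fine. This sketch (the counts are arbitrary) renders five bars updating side by side, which is the layout I'm after:

    from time import sleep
    from tqdm import tqdm

    # five stacked bars, one per position
    bars = [tqdm(total=100, desc="progresser #{}".format(i), position=i)
            for i in range(5)]
    for _ in range(100):
        for bar in bars:
            bar.update(1)
        sleep(0.01)
    for bar in bars:
        bar.close()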
EDIT2: @bouteillebleu, here's what it looks like: I do get my bars, but only the last one is updated for some reason. [screenshot: current progress bars]