python - Synchronizing Python threads at runtime

Question

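I have a Python script that makes 800,000 HTTP requests to make sure they all return 200. If a request returns 404, its URL path is captured in a variable. The URL is parameterized to take 800,000 different ids. I am using 100 threads to save time, and at the end I join them all to get the number of URLs that returned 404, and so on.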

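However, the run takes about 2 hours, and I have to wait until the very end to see any results. I want to be able to see, at any point during the run, how many ids have been checked so far, how many returned 404, and so on. How can I do that?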

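from math import ceil

# HeadendChecker is the asker's own threading.Thread subclass (not shown);
# each instance checks the ids in the half-open range [start, end).
runners = []
nthreads = 100

chunk_size = int(ceil(len(ids) / float(nthreads)))
for i in range(nthreads):
    runners.append(HeadendChecker(i * chunk_size, min(len(ids), chunk_size * (i + 1))))

for thread in runners:
    thread.start()

list_of_bad_ids = []
for thread in runners:
    thread.join()  # blocks until this thread has checked its whole chunk
    bad_ids = thread.get_bad_ids()
    if bad_ids is not None:
        list_of_bad_ids.extend(bad_ids)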
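The chunking arithmetic in the loop above gives thread i the half-open range [i*chunk_size, min(len(ids), (i+1)*chunk_size)). The following is a small sketch to sanity-check that these ranges cover every id exactly once; chunk_bounds is a hypothetical helper name, not part of the asker's code.

```python
from math import ceil

def chunk_bounds(n, nthreads):
    # Same arithmetic as the loop above: ceiling-sized chunks, with the last
    # chunk clipped to n. Returns half-open (start, end) pairs.
    chunk_size = int(ceil(n / float(nthreads)))
    return [(i * chunk_size, min(n, (i + 1) * chunk_size)) for i in range(nthreads)]

print(chunk_bounds(10, 4))  # [(0, 3), (3, 6), (6, 9), (9, 10)]
```

With 800,000 ids and 100 threads the division is exact, so every thread gets 8,000 ids and the last range is (792000, 800000).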

score 1 · Accepted Answer

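Instead of having each thread store its own 200s and 404s, you can use a Queue/queue object (the module is named Queue in Python 2 and queue in Python 3).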

You can turn your existing threads into producers: they produce (status, url id) tuples, which are put on a shared queue.
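A minimal Python 3 sketch of the producer side follows. It assumes a hypothetical fake_check function in place of the real HTTP request so that it runs without a network; the thread count and id ranges are arbitrary illustration values.

```python
import queue
import threading

def fake_check(url_id):
    # Hypothetical stand-in for the real HTTP request; it never touches the
    # network. Pretend every 7th id returns 404.
    return 404 if url_id % 7 == 0 else 200

def producer(ids, results):
    # Instead of collecting bad ids in a per-thread list, push every
    # (status, url_id) result onto the shared, thread-safe queue.
    for url_id in ids:
        results.put((fake_check(url_id), url_id))

results = queue.Queue()
workers = [
    threading.Thread(target=producer, args=(range(start, start + 10), results))
    for start in range(0, 40, 10)
]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Draining after join() is only for this demo; a consumer thread would
# normally read the queue while the workers are still running.
items = []
while not results.empty():
    items.append(results.get())
print(len(items))  # 40
```

queue.Queue does its own locking, so the producers need no extra synchronization to share it.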

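You can then add an analyzer thread that consumes the items from this queue, prints status messages along the way, and stores the results in a convenient form for further processing ("further processing" meaning whatever processing is done after all the worker threads have finished).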
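Below is a sketch of the whole producer/analyzer arrangement in Python 3. It is one possible way to do it, not the answerer's code. The DONE sentinel, the report_every interval, the summary dict and the hypothetical fake_check stub are assumptions of this sketch. Each producer puts DONE after its last result; because the queue is FIFO, once the analyzer has seen one DONE per producer it has seen every result.

```python
import queue
import threading

DONE = object()  # sentinel each producer sends when it has finished

def fake_check(url_id):
    # Hypothetical stand-in for the real HTTP request (no network access).
    return 404 if url_id % 7 == 0 else 200

def producer(ids, results):
    for url_id in ids:
        results.put((fake_check(url_id), url_id))
    results.put(DONE)

def analyzer(results, n_producers, summary, report_every=1000):
    # Consume results as they arrive, so progress is visible during the run.
    finished = 0
    checked = 0
    bad_ids = []
    while finished < n_producers:
        item = results.get()  # blocks until a result is available
        if item is DONE:
            finished += 1
            continue
        status, url_id = item
        checked += 1
        if status == 404:
            bad_ids.append(url_id)
        if checked % report_every == 0:
            print("checked %d ids so far, %d returned 404" % (checked, len(bad_ids)))
    # Final results, available to the main thread after join().
    summary["checked"] = checked
    summary["bad_ids"] = bad_ids

def run(ids, nthreads=4, report_every=1000):
    results = queue.Queue()
    summary = {}
    chunk = -(-len(ids) // nthreads)  # ceiling division
    producers = [
        threading.Thread(target=producer, args=(ids[i * chunk:(i + 1) * chunk], results))
        for i in range(nthreads)
    ]
    consumer = threading.Thread(
        target=analyzer, args=(results, nthreads, summary, report_every))
    consumer.start()
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    consumer.join()
    return summary
```

For example, run(list(range(10000)), nthreads=8) prints a progress line after every 1,000 checked ids and returns a dict whose "bad_ids" entry holds the 404 ids. Keeping the counting in a single analyzer thread means the counters are never shared between threads, so they need no lock.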
