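I am developing a Python back-end web server that fetches real-time data from a paid third-party API. I need to query this API very quickly (about 150 queries every 10 seconds). So I created a small proof of concept (POC) that spawns 200 threads and writes the URLs to a queue. The threads then read the URLs from the queue and send the HTTP requests. The third-party API returns a value called delay, which is the time their server took to process the request. Here is the POC code, which simply downloads all the URLs once (no repetition).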
    import datetime
    import threading
    import time
    import Queue  # Python 2 module name (it is "queue" in Python 3)

    import urllib3

    # `urls` and getDelayFromResponse() are defined elsewhere (not shown in this snippet).

    _http_pool = urllib3.PoolManager()

    def getPooledResponse(url):
        return _http_pool.request("GET", url, timeout=30)

    class POC:
        _worker_threads = []
        WORKER_THREAD_COUNT = 200
        q = Queue.Queue()

        @staticmethod
        def worker():
            while True:
                url = POC.q.get()
                t0 = datetime.datetime.now()
                r = getPooledResponse(url)
                print "thread %s took %d seconds to process the url (service delay %s)" % (
                    threading.currentThread().ident,
                    (datetime.datetime.now() - t0).seconds,
                    getDelayFromResponse(r))
                POC.q.task_done()

        @staticmethod
        def run():
            # start the threads if we have less than the desired amount
            if len(POC._worker_threads) < POC.WORKER_THREAD_COUNT:
                for i in range(POC.WORKER_THREAD_COUNT - len(POC._worker_threads)):
                    t = threading.Thread(target=POC.worker)
                    t.daemon = True
                    t.start()
                    POC._worker_threads.append(t)
            # put the urls in the queue
            for url in urls:
                POC.q.put(url)
                # sleep for just a bit so that the requests don't get sent out together
                # (this is a limitation of the API I am using)
                time.sleep(0.3)

    POC.run()
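As a rough sanity check on the numbers in this setup, here is a back-of-the-envelope calculation. It assumes the 0.3 second sleep runs once per URL and that a request takes about 2 seconds end to end; both values are assumptions for illustration, not measurements.

```python
# Figures taken from the description above.
TARGET_QUERIES = 150       # queries wanted per window
WINDOW_SECONDS = 10.0      # length of the window
ENQUEUE_SLEEP = 0.3        # sleep between q.put() calls (assumed to be per URL)
WORKER_THREADS = 200       # size of the thread pool

# Assumption: a typical request takes about 2 seconds end to end.
ASSUMED_LATENCY = 2.0

# Rate needed to meet the target: 150 / 10 = 15 requests per second.
target_rate = TARGET_QUERIES / WINDOW_SECONDS

# Upper bound on how fast the producer loop can hand out URLs (about 3.3 per second).
max_enqueue_rate = 1.0 / ENQUEUE_SLEEP

# Requests in flight at the target rate (rate * latency): 15 * 2 = 30.
threads_needed = target_rate * ASSUMED_LATENCY

if __name__ == "__main__":
    print("target rate: %.1f req/s" % target_rate)
    print("max enqueue rate: %.1f req/s" % max_enqueue_rate)
    print("threads busy at target rate: %.0f of %d" % (threads_needed, WORKER_THREADS))
```

Under these assumptions the producer loop, not the thread pool, limits how quickly requests can start.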
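When I run it, the first few results come back with reasonable timings:

    thread 140544300453053 took 2 seconds to process the url (service delay 1.782)

However, after about 10-20 seconds, I start getting results like this:

    thread 140548049958656 took 23 seconds to process the url (service delay 1.754)

In other words, even though the delay reported by the server stays small, my threads take much longer to complete each request.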
How can I test where the other 21 seconds of run time are being spent?
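One way to narrow this down would be to time each stage separately. Below is a minimal sketch, not the POC itself: it stamps every URL when it is enqueued, so the time spent waiting in the queue can be told apart from the time spent inside the request call. The names `fake_request`, `timed_worker` and `measure` are hypothetical, and `fake_request` only sleeps in place of `getPooledResponse`, so the sketch runs without network access.

```python
import threading
import time

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2


def fake_request(url):
    """Hypothetical stand-in for getPooledResponse(); sleeps instead of doing I/O."""
    time.sleep(0.05)
    return url


def timed_worker(q, results, fetch):
    """Pull (url, enqueue_time) pairs and record where the time went."""
    while True:
        item = q.get()
        if item is None:          # sentinel: no more work for this thread
            q.task_done()
            break
        url, enqueued_at = item
        started_at = time.time()
        fetch(url)
        finished_at = time.time()
        # list.append is atomic in CPython, so no extra lock is used here.
        results.append({
            "url": url,
            "queue_wait": started_at - enqueued_at,      # waiting for a free thread
            "request_time": finished_at - started_at,    # inside the request call
        })
        q.task_done()


def measure(urls, worker_count, fetch=fake_request):
    """Run all urls through worker_count threads and return the timing records."""
    q = queue.Queue()
    results = []
    threads = [threading.Thread(target=timed_worker, args=(q, results, fetch))
               for _ in range(worker_count)]
    for t in threads:
        t.daemon = True
        t.start()
    for url in urls:
        q.put((url, time.time()))   # stamp the moment the URL enters the queue
    for _ in threads:
        q.put(None)                 # one sentinel per worker
    q.join()                        # returns once every item has been processed
    return results


if __name__ == "__main__":
    for rec in measure(["url-%d" % i for i in range(8)], worker_count=2):
        print("%s waited %.3fs in queue, request took %.3fs"
              % (rec["url"], rec["queue_wait"], rec["request_time"]))
```

If `queue_wait` stays small while `request_time` grows, the extra seconds are being spent inside the HTTP call (for example waiting for a connection or reading the response) rather than in the queue.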
Thanks!