python - Why can't multiprocessing speed up sending HTTP requests with eventlet

Question

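I have an application that sends a batch of HTTP requests. I first implemented it with eventlet and requests, but the performance was too low, so I wanted to use multiprocessing to speed it up. Note that the server needs about 200 ms to handle a single request (not counting network transfer).

However, the multiprocessing version is slower than the original one. I was very surprised by this result. Why?

The code is shown below; I use timeit to measure the time.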

import eventlet
eventlet.monkey_patch(all=False, socket=True)

import requests

URL = 'http://....'

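def send():
    # At most 20 requests are in flight at once within this process.
    pile = eventlet.GreenPile(20)
    for x in xrange(100):
        pile.spawn(requests.get, URL)
    # Iterating the pile waits for every response; the results are discarded.
    for x in pile:
        pass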

import multiprocessing

def main():
    procs = [multiprocessing.Process(target=send) for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

import timeit

if __name__ == '__main__':
    print timeit.timeit(main, number=1)

score 1 · Accepted Answer

TL;DR: there is not enough information to say for sure. My speculation is that the server is the limiting factor (because of intentional artificial limits or resource starvation), so adding more concurrent requests makes each one slower on average.

Here's one way to reason about this. Both the client and the server have a limited amount of resources: CPU cycles per unit time, memory accesses per unit time, memory size, network bandwidth. The OS and eventlet make reasonable use of these resources, so you can estimate how many resources a single request takes, and the software will scale that out in a reasonable (close to linear) pattern. To benefit from multiprocessing, your client process would have to keep a single CPU core 100% busy. The requests library in particular is known to waste hardware resources; it incurs the most CPU overhead of all the clients I tried (httplib, httplib2, urllib). But you would have to make a really large number (thousands) of concurrent requests, or have a really slow or busy CPU, for the client to become the bottleneck.
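To make the "server is the limiting factor" speculation concrete, here is a toy back-of-the-envelope model in Python 3. It is only a sketch: the function name `estimate` and the `server_capacity` parameter are hypothetical, and the assumption that the server works on at most 20 requests at a time is invented for illustration. Only the 200 ms service time and the pile size of 20 come from the question.

```python
def estimate(total_concurrency, server_capacity, service_time_s=0.2):
    """Return (requests_per_second, mean_latency_s) for a toy server model.

    Assumption (not from the question): the server processes at most
    `server_capacity` requests at once, each taking `service_time_s`;
    any additional in-flight requests simply wait in a queue.
    """
    in_service = min(total_concurrency, server_capacity)
    throughput = in_service / service_time_s
    # Little's law: mean latency = requests in flight / throughput.
    latency = total_concurrency / throughput
    return throughput, latency


# One process with GreenPile(20) versus three such processes (60 in flight).
one_process = estimate(20, server_capacity=20)
three_processes = estimate(60, server_capacity=20)
print(one_process, three_processes)
```

Under these assumed numbers the throughput does not change when going from one process to three, while the mean latency per request triples. Extra client processes only lengthen the queue at the server.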

An exact answer requires more information:

Do the HTTP client and server compete for any resources? I.e. do they run on the same physical hardware?
What is the maximum request frequency (count per second) you were able to generate in single-process mode? Adjust the GreenPile size to vary the number of concurrent requests.
What is the maximum request frequency you were able to generate using multiple processes? Adjust both the GreenPile size and the number of processes. Try running several independent Python interpreters without multiprocessing.
Was the server the bottleneck? Check by adding another client on separate hardware. If the total request frequency goes up, the server is not the bottleneck. If it drops with more clients, the server was already running at its limit and multiprocessing could only make things worse.
What is the request time distribution? What percentage of requests completed within 220/300/400/500/1000/more ms?
What are the network latency and bandwidth? What are the request and response sizes? Given those, do you saturate the network?

Answering these questions will give you an excellent intuition for what's going on inside and between your client and server.
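As a starting point for collecting the request frequency and time distribution asked about above, here is a minimal Python 3 sketch. The helper names `timed_call` and `summarize` are hypothetical, and the sample latencies are synthetic values made up for illustration, not measurements.

```python
import time

# Thresholds taken from the question list above, in milliseconds.
THRESHOLDS_MS = (220, 300, 400, 500, 1000)


def timed_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_ms)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0


def summarize(latencies_ms, wall_time_s):
    """Return (requests_per_second, {threshold_ms: percent finished within it})."""
    n = len(latencies_ms)
    rate = n / wall_time_s
    within = {
        t: 100.0 * sum(1 for x in latencies_ms if x <= t) / n
        for t in THRESHOLDS_MS
    }
    return rate, within


# Synthetic latencies for illustration only; replace with real measurements.
sample = [210, 215, 250, 320, 450, 480, 700, 1200]
rate, within = summarize(sample, wall_time_s=2.0)
print(rate, within)
```

In the question's code, one way to collect real latencies would be to spawn `timed_call` with `requests.get` and `URL` as its arguments instead of spawning `requests.get` directly, then gather the elapsed times while iterating over the pile. Running the same summary for different GreenPile sizes and process counts gives comparable numbers for each configuration.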

Relevant GitHub issue: https://github.com/eventlet/eventlet/issues/195
