I am working on a web service implemented on top of nginx + gunicorn + django. The clients are smartphone applications. The application needs to make some long-running calls to external APIs (Facebook, Amazon S3...), so the server simply queues the job on a job server (using Celery over Redis).
Whenever possible, once the server has queued the job, it returns right away, and the HTTP connection is closed. This works fine and allows the server to sustain very high load.
client                   server                  job server
   .                        |                        |
   .                        |                        |
   |------HTTP request----->|                        |
   |                        |--------queue job------>|
   |<--------close----------|                        |
   .                        |                        |
   .                        |                        |
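In code, the current pattern looks roughly like this (a minimal sketch; the task name, the photo_id payload, and the module layout are made up for illustration):

    # tasks.py -- a hypothetical long-running job
    from celery import shared_task

    @shared_task
    def process_photo(photo_id):
        # long blocking calls to external APIs (Facebook, Amazon S3, ...)
        ...

    # views.py -- queue the job and return immediately
    from django.http import HttpResponse
    from .tasks import process_photo

    def start_job(request):
        # .delay() returns as soon as the message is on the Redis broker,
        # so the HTTP connection can be closed right away
        result = process_photo.delay(request.POST["photo_id"])
        return HttpResponse(result.id, status=202)  # hand the task id back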
But in some cases, the client needs to get the result as soon as the job is finished. Unfortunately, there is no way for the server to contact the client once the HTTP connection has been closed. One solution would be to have the client application poll the server every few seconds until the job is completed. I would like to avoid this if possible, mostly because it would hurt the responsiveness of the service, and also because it would load the server with many unnecessary poll requests.
In short, I would like to keep the HTTP connection up and running, doing nothing (except perhaps sending some whitespace every once in a while to keep the TCP connection alive, just as Amazon S3 does), until the job is done and the server returns the result.
client                   server                  job server
   .                        |                        |
   .                        |                        |
   |------HTTP request----->|                        |
   |                        |--------queue job------>|
   |<------keep-alive-------|                        |
   |         [...]          |                        |
   |<------keep-alive-------|                        |
   |                        |<--------result---------|
   |<----result + close-----|                        |
   .                        |                        |
   .                        |                        |
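Concretely, what I have in mind would look something like the sketch below. It assumes Django's StreamingHttpResponse (1.5+), a gevent-based gunicorn worker (so that sleeping suspends a greenlet rather than a whole process), and the hypothetical process_photo task from above:

    # views.py -- hold the connection open until the job is done
    import time

    from django.http import StreamingHttpResponse
    from .tasks import process_photo

    def start_job_and_wait(request):
        result = process_photo.delay(request.POST["photo_id"])

        def stream():
            # send a whitespace byte every few seconds so the TCP
            # connection (and any proxy in between) does not time out
            while not result.ready():
                yield " "
                time.sleep(3)
            yield str(result.get())  # finally, the real payload

        response = StreamingHttpResponse(stream(), content_type="text/plain")
        response["X-Accel-Buffering"] = "no"  # ask nginx not to buffer
        return response

The X-Accel-Buffering header is there because nginx buffers proxied responses by default, which would hold back the keep-alive whitespace until the whole response is complete.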
How can I implement long-running HTTP connections in an efficient way, assuming the server is under very high load? (It is not the case yet, but the goal is to be able to sustain the highest possible load, with hundreds or thousands of requests per second.)
Offloading the actual jobs to other servers should keep CPU usage low on the web server, but how can I avoid processes piling up and eating all the server's RAM, or incoming requests being dropped because of too many open connections?
This is probably mostly a matter of configuring nginx and gunicorn properly. I have read a bit about gunicorn's async workers based on greenlets: the documentation says that async workers are used by "Applications making long blocking calls (Ie, external web services)", which sounds perfect. It also says that "In general, an application should be able to make use of these worker classes with no changes", which sounds great. Any feedback on this?
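For what it's worth, the gunicorn setup I am considering would look something like this (a sketch; gunicorn config files are plain Python, and the numbers are guesses that would need tuning):

    # gunicorn.conf.py
    worker_class = "gevent"    # greenlet-based async worker
    workers = 4                # a few OS processes, e.g. one per CPU core
    worker_connections = 1000  # concurrent connections (greenlets) per worker
    timeout = 120              # with async workers this is a liveness check,
                               # not a hard per-request limit

started with something like gunicorn -c gunicorn.conf.py myproject.wsgi:application (assuming a standard Django wsgi.py).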
Thanks for your advice.