
I am working on a web backend that frequently fetches real-time market data from the web and stores it in a MySQL database.

Currently I have my main thread push tasks into a Queue object. I then have about 20 threads that read from that queue, and if a task is available, they execute it.
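For reference, the setup looks roughly like this (simplified sketch; `fetch_and_store` and `generate_tasks` stand in for my real task code):

    import threading
    from Queue import Queue  # Python 2; the module is `queue` on Python 3

    task_queue = Queue()

    def worker():
        while True:
            task = task_queue.get()       # block until a task is available
            try:
                fetch_and_store(task)     # placeholder: HTTP fetch + MySQL insert
            finally:
                task_queue.task_done()

    # ~20 consumer threads reading from the queue
    for _ in range(20):
        t = threading.Thread(target=worker)
        t.daemon = True
        t.start()

    # main thread acts as the producer
    for task in generate_tasks():         # placeholder task source
        task_queue.put(task)
    task_queue.join()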

Unfortunately, I am running into performance issues, and after a lot of research I can't make up my mind about the best way forward.

As I see it, I have three options:

1. Take a distributed task approach with something like Celery.
2. Switch to Jython or IronPython to avoid the GIL.
3. Simply spawn separate processes instead of threads with the multiprocessing module.

If I go with the last option, how many processes is a good amount? What is a good multi-process producer/consumer design?

Thanks!


2 Answers


First, profile your code to determine what is bottlenecking your performance.

If each of your threads is frequently writing to your MySQL database, the problem may be disk I/O, in which case you should consider buffering the data in memory and periodically writing it to disk.
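For example, one way to cut per-row disk I/O is to buffer rows in memory and flush them in batches. A rough sketch, assuming MySQLdb; the table, columns, and batch size are made up for illustration:

    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="user", passwd="pass", db="market")
    pending_rows = []

    def record(row):
        pending_rows.append(row)
        if len(pending_rows) >= 500:      # flush in batches instead of per row
            flush()

    def flush():
        cur = conn.cursor()
        cur.executemany(
            "INSERT INTO quotes (symbol, price, ts) VALUES (%s, %s, %s)",
            pending_rows)
        conn.commit()
        del pending_rows[:]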

If you discover that CPU performance is the limiting factor, then consider using the multiprocessing module instead of the threading module. Use a multiprocessing.Queue object to push your tasks. Also make sure that your tasks are big enough to keep each core busy for a while, so that the granularity of communication doesn't kill performance. If you are currently using threading, then switching to multiprocessing would be the easiest way forward for now.
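A minimal producer/consumer sketch with multiprocessing (the worker body and task source are placeholders):

    import multiprocessing

    def worker(task_queue):
        while True:
            task = task_queue.get()
            if task is None:              # sentinel: no more work
                break
            handle(task)                  # placeholder: fetch data, write to MySQL

    if __name__ == "__main__":
        task_queue = multiprocessing.Queue()
        n_workers = multiprocessing.cpu_count()   # one process per core is a common starting point
        workers = [multiprocessing.Process(target=worker, args=(task_queue,))
                   for _ in range(n_workers)]
        for p in workers:
            p.start()

        for task in generate_tasks():     # placeholder producer
            task_queue.put(task)
        for _ in workers:
            task_queue.put(None)          # one sentinel per worker
        for p in workers:
            p.join()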

Answered 2012-05-24T18:33:21.863

Maybe you should take an event-driven approach and use a framework like Twisted (Python) or node.js (JavaScript). These frameworks can use UNIX domain sockets, for example: your consumer listens on a socket, and your event generator pushes data to it as it arrives, so the consumer doesn't have to keep polling a queue to see whether there's something to do.
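A rough sketch of what the consumer side could look like with Twisted, listening on a UNIX domain socket that producers connect to and write into (the socket path and the `store_in_mysql` helper are made up for illustration):

    from twisted.internet import reactor, protocol

    class MarketDataConsumer(protocol.Protocol):
        """Called whenever a producer pushes bytes; no polling needed."""
        def dataReceived(self, data):
            store_in_mysql(data)          # placeholder: parse and write to MySQL

    factory = protocol.Factory()
    factory.protocol = MarketDataConsumer

    # Consumer listens on a UNIX domain socket; producers connect and write to it.
    reactor.listenUNIX("/tmp/market-data.sock", factory)
    reactor.run()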

Answered 2012-05-24T18:34:23.340