
I am working on a Django project, to be deployed on Heroku, which has around 12 update functions, each taking around 15 minutes to run. Let's call them update1(), update2(), ..., update12().

I am deploying with one worker dyno on Heroku, and I would like to run up to n of these at once. (They are not really computationally intensive, since they are all HTML parsers, but the data is time-sensitive, so I would like them to be called as often as possible.)

I've read a lot of the Celery and APScheduler documentation, but I'm not really sure which is the best or easiest for me. Do scheduled tasks run concurrently if their times overlap with one another (i.e., if I run one every 2 minutes and another every 3 minutes), or does each wait until the previous one finishes?

Is there any way I can queue these functions so that at least a few of them are running at once? What is the suggested number of simultaneous calls for this use case?


1 Answer


Based on your description of the use case, you don't need a scheduler, so APScheduler is not a good fit for your requirements.

Do you have a web dyno in addition to your worker dyno? The usual design pattern for this kind of processing is to set up a controller thread or process (your web dyno) that accepts requests and places them on a request queue.

That queue is then read by one or more worker threads or worker processes (your worker dyno). I haven't worked with Celery, but it looks like it meets your requirements. From your description it's hard to say how many worker threads or worker dynos you need. You would also need to specify how many update requests per second you have to process, and whether the requests are CPU-bound or IO-bound.
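The queue-plus-workers pattern described above can be sketched with nothing but the standard library; Celery gives you the same shape backed by a real broker. This is a minimal illustration only: the update function names, the worker count, and the simulated work are assumptions standing in for your real HTML parsers, not code from the question. Since HTML parsing is mostly IO-bound, threads are a reasonable fit here.

```python
import queue
import threading
import time

# Hypothetical stand-ins for the update1()..update12() parsers;
# the sleep simulates the (much longer) real parsing work.
def make_update(i):
    def update():
        time.sleep(0.01)
        return f"update{i} done"
    return update

UPDATES = [make_update(i) for i in range(1, 13)]

def worker(task_queue, results):
    # Each worker pulls update functions off the shared queue until
    # it is drained, so n workers run up to n updates at once.
    while True:
        try:
            fn = task_queue.get_nowait()
        except queue.Empty:
            return
        results.append(fn())
        task_queue.task_done()

def run_updates(n_workers=4):
    # Controller: enqueue every pending update, then start n workers.
    task_queue = queue.Queue()
    for fn in UPDATES:
        task_queue.put(fn)
    results = []
    threads = [
        threading.Thread(target=worker, args=(task_queue, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    print(len(run_updates()))
```

With Celery, each update function would become a task, the broker plays the role of `task_queue`, and the worker dyno's concurrency setting plays the role of `n_workers`; tasks submitted while others are still running simply wait in the queue rather than being dropped.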

answered 2012-09-30T20:35:01.173