
Here is a case where one must bring parallelism into the backend server.

I want to query N ELBs, with 5 different queries each, and send the results back to the web client.

The backend is Tornado, and according to what I have read many times in the docs, I should be able to get several tasks processed in parallel by using @gen.Task or gen.coroutine.

However, I must be missing something here, as all of my requests (20 in total, 4 ELBs * 5 queries) are processed one after another.

def query_elb(fn, region, elb_name, period, callback):
    callback(fn(region, elb_name, period))

class DashboardELBHandler(RequestHandler):

    @tornado.gen.coroutine
    def get_elb_info(self, region, elb_name, period):
        elbReq = yield gen.Task(query_elb, ELBSumRequest, region, elb_name, period)
        elb2XX = yield gen.Task(query_elb, ELBBackend2XX, region, elb_name, period)
        elb3XX = yield gen.Task(query_elb, ELBBackend3XX, region, elb_name, period)
        elb4XX = yield gen.Task(query_elb, ELBBackend4XX, region, elb_name, period)
        elb5XX = yield gen.Task(query_elb, ELBBackend5XX, region, elb_name, period)

        raise tornado.gen.Return( 
            [
                elbReq,
                elb2XX,
                elb3XX,
                elb4XX,
                elb5XX,
            ]
        )

    @tornado.web.authenticated
    @tornado.web.asynchronous
    @tornado.gen.coroutine
    def post(self):
        ret = []

        period = self.get_argument("period", "5m")

        cloud_deployment = db.foo.bar.baz()
        for region, deployment in cloud_deployment.iteritems():

            elb_name = deployment["elb"][0]
            res = yield self.get_elb_info(region, elb_name, period)
            ret.append(res)

        self.push_json(ret)



import boto.ec2.cloudwatch

def ELBQuery(region, elb_name, range_name, metric, statistic, unit):
    dimensions = { u"LoadBalancerName": [elb_name] }

    ((start, stop), period) = calc_range(range_name)

    cw = boto.ec2.cloudwatch.connect_to_region(region)
    data_points = cw.get_metric_statistics(period, start, stop,
        metric, "AWS/ELB", statistic, dimensions, unit)

    return data_points

ELBSumRequest   = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name,  "RequestCount", "Sum", "Count")
ELBLatency      = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name,  "Latency", "Average", "Seconds")
ELBBackend2XX   = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name,  "HTTPCode_Backend_2XX", "Sum", "Count")
ELBBackend3XX   = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name,  "HTTPCode_Backend_3XX", "Sum", "Count")
ELBBackend4XX   = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name,  "HTTPCode_Backend_4XX", "Sum", "Count")
ELBBackend5XX   = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name,  "HTTPCode_Backend_5XX", "Sum", "Count")

1 Answer


The problem is that ELBQuery is a blocking function. If nothing yields to another coroutine somewhere along the way, the coroutine scheduler has no way to interleave the calls. (That is the whole point of coroutines: they are cooperative, not preemptive.)
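The cooperative-vs-preemptive point can be seen without Tornado at all. This is a minimal sketch (all names hypothetical, not Tornado APIs): a round-robin scheduler over plain generators, where tasks can only interleave at yield points, so a task that does all its work before yielding stalls everyone else:

```python
import time

def task(name, log):
    # a well-behaved "coroutine": yields after every step
    for step in range(2):
        log.append((name, step))
        yield  # the only point where another task can run

def blocking_task(name, log):
    # a badly-behaved "coroutine": does all its work before yielding
    log.append((name, 0))
    time.sleep(0.05)  # a blocking call; nothing else runs during it
    log.append((name, 1))
    yield

def run(tasks):
    # minimal round-robin scheduler: advance each task to its next yield
    tasks = list(tasks)
    while tasks:
        for t in tasks[:]:
            try:
                next(t)
            except StopIteration:
                tasks.remove(t)

cooperative = []
run([task("a", cooperative), task("b", cooperative)])
# the yields let the steps interleave: a0, b0, a1, b1

stalled = []
run([blocking_task("x", stalled), task("y", stalled)])
# "y" cannot run until "x" has finished both steps: x0, x1, y0, y1
```

The second run is exactly what happens to the handler above: each gen.Task does all of its (blocking) work before the scheduler ever gets control back.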

If the problem were something like the calc_range call, it might be easy to deal with: break it up into smaller pieces that each yield to the next, which gives the scheduler a chance to step in between them.

But most likely, it is the boto call that blocks: most of your function's time is spent waiting for get_metric_statistics to return, and nothing else can run in the meantime.

So, how do you fix this?

  1. Spawn a thread for each boto task. Tornado makes it very easy to transparently wrap a coroutine around a thread or a thread-pool task, which magically unblocks everything. Of course, using threads has a cost too.
  2. Schedule the boto tasks on a thread pool instead of one thread each. Similar tradeoffs to #1, especially when you only have a handful of tasks. (But if you can end up with 5 tasks for each of 500 different users, you probably want a shared pool.)
  3. Rewrite or monkeypatch boto to use coroutines. This would be the ideal solution... but it is the most work (and the most likely to break code you do not understand, which you would then have to maintain as boto updates, etc.). However, some people have at least started on this, such as the asyncboto project.
  4. Use greenlets and monkeypatch just enough of the library's dependencies to trick it into being asynchronous. This sounds hacky, but it may actually be the best solution; see Marrying Boto to Tornado for this.
  5. Use greenlets and monkeypatch the entire stdlib a la gevent, to trick boto and tornado into working together without either of them even realizing it. This sounds like a terrible idea; you would be better off porting your whole app to gevent.
  6. Use a framework that is built around greenlets, like gevent.
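As a concrete illustration of #1, here is a stdlib-only sketch; slow_query is a hypothetical stand-in for the blocking boto call, and Tornado's part of the job (not shown) would be wrapping each thread's result in a future the coroutine can yield on:

```python
import threading
import time

results = {}

def slow_query(metric):
    # hypothetical stand-in for the blocking ELBQuery / boto call
    time.sleep(0.2)
    results[metric] = "data"

metrics = ["RequestCount", "Latency", "HTTPCode_Backend_2XX"]

start = time.time()
threads = [threading.Thread(target=slow_query, args=(m,)) for m in metrics]
for t in threads:
    t.start()   # all three blocking calls now run at the same time
for t in threads:
    t.join()    # wait for every thread to finish
elapsed = time.time() - start
# elapsed is close to one 0.2s query, not 3 * 0.2s back to back
```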

Without knowing more details, I would suggest looking at #2 and #4 first, but I cannot promise that either will turn out to be your best answer.
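Mapped onto the code in the question, #2 might look like the sketch below (stdlib concurrent.futures only; elb_query is a hypothetical stand-in for ELBQuery, and in an actual Tornado coroutine you would yield the executor's futures rather than calling result() directly):

```python
import time
from concurrent.futures import ThreadPoolExecutor

# one shared pool for the whole process, reused across requests
pool = ThreadPoolExecutor(max_workers=20)

def elb_query(metric, region, elb_name, period):
    # hypothetical stand-in for the blocking ELBQuery call
    time.sleep(0.1)
    return {"metric": metric, "region": region, "elb": elb_name}

def get_elb_info(region, elb_name, period):
    metrics = ["RequestCount", "HTTPCode_Backend_2XX", "HTTPCode_Backend_3XX",
               "HTTPCode_Backend_4XX", "HTTPCode_Backend_5XX"]
    # submit all five queries before waiting on any of them,
    # so they run concurrently on the pool's threads
    futures = [pool.submit(elb_query, m, region, elb_name, period)
               for m in metrics]
    return [f.result() for f in futures]

start = time.time()
info = get_elb_info("us-east-1", "my-elb", "5m")
elapsed = time.time() - start
# five 0.1s queries overlap instead of taking 0.5s back to back
```

The key change from the original handler is submitting all the queries up front instead of waiting for each one before starting the next.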

Answered 2013-08-21T20:04:56.687