google-app-engine - Do redundant ndb.Model.put_async() calls end up being sent only once to the datastore?

Question

I have an NDB model that exposes a few instance methods to manipulate its state. In some request handlers, I need to call several of these instance methods. To avoid calling put() more than once on the same entity, the pattern I've used so far is similar to this:

class Foo(ndb.Model):
    prop_a = ndb.StringProperty()
    prop_b = ndb.StringProperty()
    prop_c = ndb.StringProperty()

    def some_method_1(self):
        self.prop_a = "The result of some computation"
        return True

    def some_method_2(self):
        if some_condition:
            self.prop_b = "Some new value"
            return True
        return False

    def some_method_3(self):
        if some_condition:
            self.prop_b = "Some new value"
            return True
        if some_other_condition:
            self.prop_b = "Some new value"
            self.prop_c = "Some new value"
            return True
        return False

def manipulate_foo(f):
    updated = False
    updated = f.some_method_1() or updated
    updated = f.some_method_2() or updated
    updated = f.some_method_3() or updated
    if updated:
        f.put()

Basically, each method that can potentially update the entity returns a bool to indicate if the entity has been updated and therefore needs to be saved. When calling these methods in sequence, I make sure to call put() if any of the methods returned True.

However, this pattern can be complex to implement in situations where other subroutines are involved. In that case, I need to make the updated boolean value returned from subroutines bubble up to the top-level methods.
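To make the bubbling problem concrete, here is a minimal sketch of a three-level call chain. FakeFoo, normalize_b, refresh and handle_request are hypothetical names, and FakeFoo is a plain Python stand-in rather than a real ndb.Model, so the snippet runs without the App Engine SDK:

```python
class FakeFoo(object):
    """Hypothetical stand-in for the Foo model; put() only counts calls."""

    def __init__(self):
        self.prop_b = None
        self.put_calls = 0

    def put(self):
        self.put_calls += 1


def normalize_b(f, new_value):
    # Innermost subroutine: reports whether it changed anything.
    if f.prop_b != new_value:
        f.prop_b = new_value
        return True
    return False


def refresh(f, new_value):
    # Intermediate layer: must forward the flag from every helper it calls.
    updated = False
    updated = normalize_b(f, new_value) or updated
    return updated


def handle_request(f, new_value):
    # Top level: the only place that calls put(), and only when needed.
    if refresh(f, new_value):
        f.put()
```

Every layer between the mutation and the put() has to return and combine the flag; forgetting to do so in a single helper silently drops a write.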

I am now in the process of optimizing a lot of my request handlers, trying to limit as much as possible the waterfalls reported by AppStat, using as many async APIs as I can and converting a lot of methods to tasklets.

This effort led me to read the NDB Async documentation, which mentions that NDB implements an autobatcher that combines multiple requests into a single RPC call to the datastore. I understand that this applies to requests involving different keys, but does it also apply to redundant calls on the same entity?

In other words, my question is: could the above code pattern be replaced by this one?

class FooAsync(ndb.Model):
    prop_a = ndb.StringProperty()
    prop_b = ndb.StringProperty()
    prop_c = ndb.StringProperty()

    @ndb.tasklet
    def some_method_1(self):
        self.prop_a = "The result of some computation"
        yield self.put_async()

    @ndb.tasklet
    def some_method_2(self):
        if some_condition:
            self.prop_b = "Some new value"
            yield self.put_async()

    @ndb.tasklet
    def some_method_3(self):
        if some_condition:
            self.prop_b = "Some new value"
            yield self.put_async()
        elif some_other_condition:
            self.prop_b = "Some new value"
            self.prop_c = "Some new value"
            yield self.put_async()

@ndb.tasklet
def manipulate_foo(f):
    yield f.some_method_1()
    yield f.some_method_2()
    yield f.some_method_3()

Would all calls to put_async() be combined into a single put call on the entity? If yes, are there any caveats to using this approach vs sticking to manually checking for an updated return value and calling put once at the end of the call sequence?

score 6 · Accepted Answer

OK, I bit the bullet and tested these 3 scenarios in a test GAE application with AppStat enabled, to see which RPC calls were being made:

from datetime import datetime

import webapp2
from google.appengine.ext import ndb

class Foo(ndb.Model):
    prop_a = ndb.DateTimeProperty()
    prop_b = ndb.StringProperty()
    prop_c = ndb.IntegerProperty()

class ThreePutsHandler(webapp2.RequestHandler):
    def post(self):
        foo = Foo.get_or_insert('singleton')
        foo.prop_a = datetime.utcnow()
        foo.put()
        foo.prop_b = str(foo.prop_a)
        foo.put()
        foo.prop_c = foo.prop_a.microsecond
        foo.put()

class ThreePutsAsyncHandler(webapp2.RequestHandler):
    @ndb.toplevel
    def post(self):
        foo = Foo.get_or_insert('singleton')
        foo.prop_a = datetime.utcnow()
        foo.put_async()
        foo.prop_b = str(foo.prop_a)
        foo.put_async()
        foo.prop_c = foo.prop_a.microsecond
        foo.put_async()

class ThreePutsTaskletHandler(webapp2.RequestHandler):
    @ndb.tasklet
    def update_a(self, foo):
        foo.prop_a = datetime.utcnow()
        yield foo.put_async()

    @ndb.tasklet
    def update_b(self, foo):
        foo.prop_b = str(foo.prop_a)
        yield foo.put_async()

    @ndb.tasklet
    def update_c(self, foo):
        foo.prop_c = foo.prop_a.microsecond
        yield foo.put_async()

    @ndb.toplevel
    def post(self):
        foo = Foo.get_or_insert('singleton')
        self.update_a(foo)
        self.update_b(foo)
        self.update_c(foo)

app = webapp2.WSGIApplication([
    ('/ndb-batching/3-puts', ThreePutsHandler),
    ('/ndb-batching/3-puts-async', ThreePutsAsyncHandler),
    ('/ndb-batching/3-puts-tasklet', ThreePutsTaskletHandler),
], debug=True)

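The first one, ThreePutsHandler, obviously ends up calling Put 3 times.

(Image: ThreePutsHandler AppStat trace)

However, the 2 other tests, which call put_async(), end up with a single call to Put:

(Images: ThreePutsAsyncHandler AppStat trace, ThreePutsTaskletHandler AppStat trace)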

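So the answer to my question is: yes, redundant ndb.Model.put_async() calls are batched by NDB's autobatcher and end up as a single datastore_v3.Put call. It does not matter whether those put_async() calls are made within a tasklet or not.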

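A note about the number of datastore write ops observed in the test results: as Shay pointed out in the comments, there are 4 writes per modified indexed property value plus 1 write for the entity. So in the first test (3 sequential puts), we observe (4+1) * 3 = 15 write ops. In the 2 other tests (async), we observe (4*3) + 1 = 13 write ops.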
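The arithmetic above can be reproduced with a short sketch. The cost model (4 writes per modified indexed property value, 1 write per entity Put) is the one quoted in this answer; the function and constant names are made up for illustration:

```python
# Cost model quoted in the answer: 4 writes per modified indexed property
# value, plus 1 write for the entity itself, for each Put.
WRITES_PER_INDEXED_PROPERTY = 4
WRITES_PER_ENTITY = 1


def write_ops(modified_properties_per_put):
    """Total write ops for a sequence of Puts.

    Each element is the number of modified indexed properties
    carried by one Put.
    """
    return sum(
        WRITES_PER_INDEXED_PROPERTY * n + WRITES_PER_ENTITY
        for n in modified_properties_per_put
    )


sequential = write_ops([1, 1, 1])  # three Puts, one modified property each
batched = write_ops([3])           # one Put carrying all three changes
```

Under this model the saving from batching is exactly the entity writes that no longer happen: one per Put avoided.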

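So the bottom line is that having NDB batch multiple put_async() calls on the same entity saves us a lot of latency by making a single call to the datastore, and saves us a few write ops by writing the entity only once.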

score 1 · Answer

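Try annotating the object itself and checking it before returning the response, something like the _p_changed attribute in Zope. Another option might be a request/thread-local registry of modified objects that need to be written before returning. For an example of thread-locals in GAE, look at google/appengine/runtime/request_environment.py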
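A minimal sketch of the registry idea, assuming a plain Python stand-in instead of a real ndb.Model so that it runs without the App Engine SDK. TrackedEntity, flush_dirty and the threading.local registry are hypothetical names, not part of NDB or Zope; a real model would call its own put() or put_async() in the flush step:

```python
import threading

# One registry of modified entities per thread (and so per request,
# assuming each request is served by a single thread).
_request_state = threading.local()


def _dirty_registry():
    if not hasattr(_request_state, "dirty"):
        _request_state.dirty = {}
    return _request_state.dirty


class TrackedEntity(object):
    """Stand-in for a model: any attribute assignment marks it dirty."""

    def __init__(self, key):
        # Bypass our own __setattr__ so construction does not mark dirty.
        object.__setattr__(self, "key", key)
        object.__setattr__(self, "put_count", 0)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        _dirty_registry()[self.key] = self

    def put(self):
        # Placeholder for the real datastore write; only counts calls.
        object.__setattr__(self, "put_count", self.put_count + 1)


def flush_dirty():
    """Write each modified entity once; call just before responding."""
    registry = _dirty_registry()
    entities = list(registry.values())
    registry.clear()
    for entity in entities:
        entity.put()
    return len(entities)
```

With this in place, methods can mutate the entity freely and the handler calls flush_dirty() once at the end, so no boolean has to be passed back up the call chain.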
