
We are trying to use MapReduce heavily in our project. We are now running into a problem: the logs contain a lot of "DeadlineExceededError" errors...

An example of one (the traceback is different every time):

Traceback (most recent call last):
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 207, in Handle
    result = handler(dict(self._environ), self._StartResponse)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~sba/1.362471299468574812/mapreduce/base_handler.py", line 65, in post
    self.handle()
  File "/base/data/home/apps/s~sba/1.362471299468574812/mapreduce/handlers.py", line 208, in handle
    ctx.flush()
  File "/base/data/home/apps/s~sba/1.362471299468574812/mapreduce/context.py", line 333, in flush
    pool.flush()
  File "/base/data/home/apps/s~sba/1.362471299468574812/mapreduce/context.py", line 221, in flush
    self.__flush_ndb_puts()
  File "/base/data/home/apps/s~sba/1.362471299468574812/mapreduce/context.py", line 239, in __flush_ndb_puts
    ndb.put_multi(self.ndb_puts.items, config=self.__create_config())
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3625, in put_multi
    for future in put_multi_async(entities, **ctx_options)]
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 323, in get_result
    self.check_success()
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 318, in check_success
    self.wait()
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 302, in wait
    if not ev.run1():
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/eventloop.py", line 219, in run1
    delay = self.run0()
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/eventloop.py", line 181, in run0
    callback(*args, **kwds)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 365, in _help_tasklet_along
    value = gen.send(val)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/context.py", line 274, in _put_tasklet
    keys = yield self._conn.async_put(options, datastore_entities)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1560, in async_put
    for pbs, indexes in pbsgen:
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1350, in __generate_pb_lists
    incr_size = pb.lengthString(pb.ByteSize()) + 1
DeadlineExceededError

My questions are:

  • How can we avoid this error?
  • What happens to the job: is it retried (and if so, how do we control that), or not?
  • Can this actually lead to data inconsistency?

2 Answers


Apparently you are performing too many puts to fit into a single datastore call. You have several options here:

  1. If this is a relatively rare event, just ignore it. Mapreduce will retry the slice and will lower the put pool size. Make sure your map is idempotent.
  2. Take a look at http://code.google.com/p/appengine-mapreduce/source/browse/trunk/python/src/mapreduce/context.py : in your main.py you can lower DATASTORE_DEADLINE, MAX_ENTITY_COUNT and MAX_POOL_SIZE to shrink the pool for the whole mapreduce (see the sketch after this list).
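A minimal sketch of option 2, assuming you override the module-level constants from the linked context.py inside your main.py before any mapreduce work runs; the constant names come from the answer above, and the values are illustrative only:

    # main.py -- hedged sketch: shrink the mapreduce mutation pool so each
    # flush issues a smaller datastore call. Values are illustrative.
    from mapreduce import context

    context.DATASTORE_DEADLINE = 10      # seconds allowed per datastore RPC
    context.MAX_ENTITY_COUNT = 100       # flush after at most this many buffered entities
    context.MAX_POOL_SIZE = 500 * 1000   # flush after roughly this many buffered bytes

Smaller pools mean more frequent but smaller flushes, so each put_multi has less work to do before the request deadline hits.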
Answered 2012-10-18T18:21:57.013

If you are using an InputReader, you can adjust the default batch_size to reduce the number of entities processed by each task.
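
For example, with the DatastoreInputReader the batch_size can be passed through the mapper parameters when starting the job. A hedged sketch; the handler and entity kind names ("main.process_entity", "models.MyModel") are hypothetical placeholders:

    from mapreduce import control

    # Sketch only: start the job with a smaller batch_size so each slice reads,
    # and therefore puts, fewer entities per datastore call.
    control.start_map(
        name="reprocess-entities",
        handler_spec="main.process_entity",
        reader_spec="mapreduce.input_readers.DatastoreInputReader",
        mapper_parameters={
            "entity_kind": "models.MyModel",
            "batch_size": 25,  # smaller than the reader's default batch size
        })

The same batch_size parameter can also be set under the mapper's params in mapreduce.yaml if you start jobs from there.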

I believe the task queue will retry the task, but you probably don't want it to, since it will likely hit the same DeadlineExceededError.

Data inconsistencies are possible.

See also this question: App Engine - Task Queue Retry Count with Mapper API

Answered 2012-10-16T15:13:31.907