
I'm trying to run a standard MapReduce job over the datastore. The Map stage completes fine, but the job then gets stuck in the ShufflePipeline. I'm seeing about eight of these error logs:

2013-05-13 08:26:18.154 /mapreduce/kickoffjob_callback 500 19978ms 2kb AppEngine-Google

0.1.0.2 - - [13/May/2013:08:26:18 -0700] 
"POST /mapreduce/kickoffjob_callback HTTP/1.1" 500 2511 
"http://x.appspot.com/mapreduce/pipeline/run" "AppEngine-Google;  
"x" ms=19979 cpu_ms=9814 cpm_usd=0.000281 queue_name=default  
task_name=15467899496029413827 app_engine_release=1.8.0  
instance=00c61b117c2368b09b3a28374853f2e040692c68


E 2013-05-13 08:26:18.055

Task size must be less than 102400; found 105564
Traceback (most recent call last):
  File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 1536, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 1530, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 1278, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 1102, in __call__
    return handler.dispatch()
  File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 572, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 570, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/x/1.367342714947958888/mapreduce/base_handler.py", line 65, in post
    self.handle()
  File "/base/data/home/apps/x/1.367342714947958888/mapreduce/handlers.py", line 692, in handle
    spec, input_readers, output_writers, queue_name, self.base_path())
  File "/base/data/home/apps/x/1.367342714947958888/mapreduce/handlers.py", line 767, in _schedule_shards
    queue_name=queue_name)
  File "/base/data/home/apps/x/1.367342714947958888/mapreduce/handlers.py", line 369, in _schedule_slice
    worker_task.add(queue_name, parent=shard_state)
  File "/base/data/home/apps/x/1.367342714947958888/mapreduce/util.py", line 265, in add
    countdown=self.countdown)
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/taskqueue/taskqueue.py", line 769, in __init__
    (max_task_size_bytes, self.size))
TaskTooLargeError: Task size must be less than 102400; found 105564

How can I fix this? It looks like the problem comes from the MapReduce library's internals, specifically how it breaks its work up into tasks. If so, how can I work around it?
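For context, the error comes from the task queue API's hard limit on the size of an enqueued task: the serialized payload must be under 102400 bytes (100 KB), and the shuffle stage here produced a 105564-byte task. The snippet below is only an illustrative sketch of that size check, not the real `taskqueue` internals; `MAX_TASK_SIZE_BYTES`, `TaskTooLargeError`, and `check_task_size` are names I've made up for the example.

```python
# Illustrative sketch of the size check the task queue API performs
# when a task is constructed. Names are hypothetical, not the real
# App Engine SDK internals.
MAX_TASK_SIZE_BYTES = 102400  # 100 KB limit on a push-queue task payload


class TaskTooLargeError(Exception):
    """Raised when a serialized task exceeds the queue's size limit."""


def check_task_size(payload: bytes) -> None:
    """Reject any payload that is not strictly under the limit."""
    if len(payload) >= MAX_TASK_SIZE_BYTES:
        raise TaskTooLargeError(
            "Task size must be less than %d; found %d"
            % (MAX_TASK_SIZE_BYTES, len(payload)))


# A payload the size of the one in the log above would be rejected:
# check_task_size(b"x" * 105564)  raises TaskTooLargeError
```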


1 Answer


This was a bug in the appengine-mapreduce library. It was fixed in this revision: https://code.google.com/p/appengine-mapreduce/source/detail?r=453 — updating your bundled copy of the library to a revision at or after that fix resolves the error.
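If you can't update the library immediately, the usual pattern for staying under the 100 KB task limit is to avoid putting large payloads in the task itself: either split the data into sub-limit chunks enqueued as separate tasks, or store the payload elsewhere and enqueue only a small key. Here is a minimal, library-independent sketch of the chunking idea; `chunk_payload` is a hypothetical helper, not part of the MapReduce library.

```python
# Hypothetical workaround sketch: split an oversized byte payload into
# pieces that each fit under the task queue's 100 KB limit, so each
# piece can be enqueued as its own task.
MAX_TASK_SIZE_BYTES = 102400  # payload must be strictly less than this


def chunk_payload(payload: bytes, limit: int = MAX_TASK_SIZE_BYTES):
    """Split payload into chunks, each strictly smaller than `limit`."""
    step = limit - 1  # strict "< limit" bound, so each chunk is at most limit - 1
    return [payload[i:i + step] for i in range(0, len(payload), step)]
```

Reassembling the chunks on the receiving side (e.g. keyed by a shared job id and a chunk index) recovers the original payload.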

answered 2014-04-15T20:13:27.397