1

在我们的应用引擎 Java 实例上运行 Map Reduce 作业时,它会在一段时间后停止,并且我收到请求太大的异常。从异常中不清楚如何解决这个问题,因为它发生在应该是无缝的洗牌阶段。

我了解实体大小、超时等方面的限制,但我不确定在哪里向 API 指出这一点?

我使用的代码与此处的代码非常相似,只是计算了不同的属性属性并使 servlet 几乎相同。

这就是我得到的:

2012-12-02 13:18:53.632 /mapreduce/controllerCallback 500 535ms 0kb AppEngine-Google; (+http://code.google.com/appengine)
0.1.0.2 - - [02/Dec/2012:03:18:53 -0800] "POST /mapreduce/controllerCallback HTTP/1.1" 500 0 "http://xxx/mapreduce/controllerCallback" "AppEngine-Google; (+http://code.google.com/appengine)" "xxx.com" ms=535 cpu_ms=288 queue_name=default task_name=16764257756372630651 instance=00c61b117cc2653091fefc2f9b795b790f1a
I 2012-12-02 13:18:53.203
com.google.appengine.tools.mapreduce.impl.shardedjob.ShardedJobRunner pollTaskStates: Polling task states for job 002b0ae1-643b-420d-b3cd-20f03cc054ff-reduce, sequence number 166
W 2012-12-02 13:18:53.630
/mapreduce/controllerCallback
com.google.apphosting.api.ApiProxy$RequestTooLargeException: The request to API call datastore_v3.Put() was too large.
    at com.google.apphosting.runtime.ApiProxyImpl$AsyncApiFuture.success(ApiProxyImpl.java:494)
    at com.google.apphosting.runtime.ApiProxyImpl$AsyncApiFuture.success(ApiProxyImpl.java:392)
    at com.google.net.rpc3.client.RpcStub$RpcCallbackDispatcher$1.runInContext(RpcStub.java:781)
    at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:461)
    at com.google.tracing.TraceContext.runInContext(TraceContext.java:703)
    at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:338)
    at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:330)
    at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:458)
    at com.google.net.rpc3.client.RpcStub$RpcCallbackDispatcher.rpcFinished(RpcStub.java:823)
    at com.google.net.rpc3.client.RpcStub$RpcCallbackDispatcher.success(RpcStub.java:808)
    at com.google.net.rpc3.impl.client.RpcClientInternalContext.runCallbacks(RpcClientInternalContext.java:898)
    at com.google.net.rpc3.impl.client.RpcClientInternalContext.finishRpcAndNotifyApp(RpcClientInternalContext.java:803)
    at com.google.net.rpc3.impl.client.RpcNetChannel.afterFinishingActiveRpc(RpcNetChannel.java:1063)
    at com.google.net.rpc3.impl.client.RpcNetChannel.finishRpc(RpcNetChannel.java:911)
    at com.google.net.rpc3.impl.client.RpcNetChannel.handleResponse(RpcNetChannel.java:2267)
    at com.google.net.rpc3.impl.client.RpcNetChannel.messageReceived(RpcNetChannel.java:2068)
    at com.google.net.rpc3.impl.client.RpcNetChannel.access$2000(RpcNetChannel.java:143)
    at com.google.net.rpc3.impl.client.RpcNetChannel$TransportCallback.receivedMessage(RpcNetChannel.java:3129)
    at com.google.net.rpc3.impl.client.RpcChannelTransportData$TransportCallback.receivedMessage(RpcChannelTransportData.java:599)
    at com.google.net.rpc3.impl.wire.RpcBaseTransport.receivedMessage(RpcBaseTransport.java:417)
    at com.google.apphosting.runtime.udrpc.UdrpcTransport$ClientAdapter.receivedMessage(UdrpcTransport.java:424)
    at com.google.apphosting.runtime.udrpc.UdrpcTransport.dispatchPacket(UdrpcTransport.java:265)
    at com.google.apphosting.runtime.udrpc.UdrpcTransport.readPackets(UdrpcTransport.java:217)
    at com.google.apphosting.runtime.udrpc.UdrpcTransport$1.run(UdrpcTransport.java:81)
    at com.google.net.eventmanager.AbstractFutureTask$Sync.innerRun(AbstractFutureTask.java:260)
    at com.google.net.eventmanager.AbstractFutureTask.run(AbstractFutureTask.java:121)
    at com.google.net.eventmanager.EventManagerImpl.runTask(EventManagerImpl.java:578)
    at com.google.net.eventmanager.EventManagerImpl.internalRunWorkerLoop(EventManagerImpl.java:1002)
    at com.google.net.eventmanager.EventManagerImpl.runWorkerLoop(EventManagerImpl.java:884)
    at com.google.net.eventmanager.WorkerThreadInfo.runWorkerLoop(WorkerThreadInfo.java:136)
    at com.google.net.eventmanager.EventManagerImpl$WorkerThread.run(EventManagerImpl.java:1855)
C 2012-12-02 13:18:53.631
Uncaught exception from servlet
com.google.apphosting.api.ApiProxy$RequestTooLargeException: The request to API call datastore_v3.Put() was too large.
    at com.google.apphosting.runtime.ApiProxyImpl$AsyncApiFuture.success(ApiProxyImpl.java:494)
    at com.google.apphosting.runtime.ApiProxyImpl$AsyncApiFuture.success(ApiProxyImpl.java:392)
    at com.google.net.rpc3.client.RpcStub$RpcCallbackDispatcher$1.runInContext(RpcStub.java:781)
    at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:461)
    at com.google.tracing.TraceContext.runInContext(TraceContext.java:703)
    at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:338)
    at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:330)
    at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:458)
    at com.google.net.rpc3.client.RpcStub$RpcCallbackDispatcher.rpcFinished(RpcStub.java:823)
    at com.google.net.rpc3.client.RpcStub$RpcCallbackDispatcher.success(RpcStub.java:808)
    at com.google.net.rpc3.impl.client.RpcClientInternalContext.runCallbacks(RpcClientInternalContext.java:898)
    at com.google.net.rpc3.impl.client.RpcClientInternalContext.finishRpcAndNotifyApp(RpcClientInternalContext.java:803)
    at com.google.net.rpc3.impl.client.RpcNetChannel.afterFinishingActiveRpc(RpcNetChannel.java:1063)
    at com.google.net.rpc3.impl.client.RpcNetChannel.finishRpc(RpcNetChannel.java:911)
    at com.google.net.rpc3.impl.client.RpcNetChannel.handleResponse(RpcNetChannel.java:2267)
    at com.google.net.rpc3.impl.client.RpcNetChannel.messageReceived(RpcNetChannel.java:2068)
    at com.google.net.rpc3.impl.client.RpcNetChannel.access$2000(RpcNetChannel.java:143)
    at com.google.net.rpc3.impl.client.RpcNetChannel$TransportCallback.receivedMessage(RpcNetChannel.java:3129)
    at com.google.net.rpc3.impl.client.RpcChannelTransportData$TransportCallback.receivedMessage(RpcChannelTransportData.java:599)
    at com.google.net.rpc3.impl.wire.RpcBaseTransport.receivedMessage(RpcBaseTransport.java:417)
    at com.google.apphosting.runtime.udrpc.UdrpcTransport$ClientAdapter.receivedMessage(UdrpcTransport.java:424)
    at com.google.apphosting.runtime.udrpc.UdrpcTransport.dispatchPacket(UdrpcTransport.java:265)
    at com.google.apphosting.runtime.udrpc.UdrpcTransport.readPackets(UdrpcTransport.java:217)
    at com.google.apphosting.runtime.udrpc.UdrpcTransport$1.run(UdrpcTransport.java:81)
    at com.google.net.eventmanager.AbstractFutureTask$Sync.innerRun(AbstractFutureTask.java:260)
    at com.google.net.eventmanager.AbstractFutureTask.run(AbstractFutureTask.java:121)
    at com.google.net.eventmanager.EventManagerImpl.runTask(EventManagerImpl.java:578)
    at com.google.net.eventmanager.EventManagerImpl.internalRunWorkerLoop(EventManagerImpl.java:1002)
    at com.google.net.eventmanager.EventManagerImpl.runWorkerLoop(EventManagerImpl.java:884)
    at com.google.net.eventmanager.WorkerThreadInfo.runWorkerLoop(WorkerThreadInfo.java:136)
    at com.google.net.eventmanager.EventManagerImpl$WorkerThread.run(EventManagerImpl.java:1855)

我发现了这个,但我没有使用任何实体,我猜测正在创建实体来存储数据,但由于我不确定在哪里定义我应该写多少?

4

3 回答 3

4

运行 Map Reduce 时出现同样的异常。问题是 Map Reduce 包含大量元数据,这些元数据是在作业期间生成的,之后数据作为单个实体保存在数据存储中。当实体变得比允许的大时,抛出异常并且 Map Reduce 停止。

我的问题是我写了很多日志,比如

context.getCounter("some log message", "value");

在 Map Reduce 的主体中。解决方案是减少日志大小和数量。

于 2012-12-05T13:01:23.750 回答
2

您可以在一次调用中写入的最大实体数为 500,目前,实体的大小最多为 1 MB。此处对此进行了描述:

Quotas_and_Limits

因此,您试图在实体中存储太多数据,因此会引发此异常。以下是您可以尝试的一些解决方法:

  • 将您的数据拆分为多个小于 1 MB 块的实体(不知道您的确切用例以具体评论)
  • 对特别大的文件使用S3
于 2012-12-15T15:56:26.850 回答
1

我最近遇到了同样的问题,并将其追溯到我的 Mapper 类。该类的成员数据被序列化为/形成切片之间的数据存储(我认为),并且我拥有大量超过 1MB 的数据。它与我的班级正在进行的对数据存储的任何显式调用无关。

于 2013-10-24T17:49:39.593 回答