The application I'm working on runs a lot of mapreduce cron jobs, and from time to time some of them produce errors (mostly ApplicationErrors, TransientErrors, DatabaseErrors, TimeOuts, etc.) that are sporadic and for the most part don't bother me.

However, while debugging and testing, I find it impossible to tell which job caused which error. The logs usually just give me the instance, with no hint of the job's ID. The URL is just the generic /mapreduce/worker_callback, so that is no help either.

I feel like I am missing something. Is there really no way of determining which log belongs to which MR pipeline or, the other way around, of finding the logs specific to a certain MR pipeline?

1 Answer

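In your logs you have task_name=appengine-mrshard-158112310423699B53FC1-22-0. The 158112310423699B53FC1 part corresponds to a specific job ID. The details of this job can usually be found at url-to-your-app/mapreduce. That way, you can find the name you gave the job.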

View the details of a job

To view the details of a specific job ID (e.g. 158112310423699B53FC1):

appid.appspot.com/mapreduce/detail?mapreduce_id=158112310423699B53FC1
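
If you need to do this for many log lines, the extraction can be scripted. The following is a minimal sketch and not part of the mapreduce library: the function names extract_job_id and detail_url are hypothetical, and the regular expression assumes every task name has the same shape as the single example above (the job ID followed by two numeric fields).

```python
import re

# Assumption: task names always look like the one example in this answer,
# i.e. "appengine-mrshard-<job id>-<number>-<number>".
TASK_NAME_RE = re.compile(r"appengine-mrshard-(?P<job_id>[0-9A-Za-z]+)-\d+-\d+")


def extract_job_id(log_line):
    """Return the job ID embedded in a task_name, or None if there is none."""
    match = TASK_NAME_RE.search(log_line)
    return match.group("job_id") if match else None


def detail_url(base_url, job_id):
    """Build the job detail URL; base_url is your app's own address."""
    return "%s/mapreduce/detail?mapreduce_id=%s" % (base_url, job_id)


if __name__ == "__main__":
    line = "task_name=appengine-mrshard-158112310423699B53FC1-22-0"
    job_id = extract_job_id(line)
    print(detail_url("https://appid.appspot.com", job_id))
```

Running this over an exported log file and grouping lines by the returned job ID would give you the logs for one particular job.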

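View the whole pipeline

You can find the root pipeline ID from the job ID with the following steps.

  1. Query the _AE_MR_MapreduceState table using the job ID. Using the Datastore Viewer: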

    SELECT * FROM _AE_MR_MapreduceState WHERE __key__ = Key('_AE_MR_MapreduceState','158112310423699B53FC1')
    

    The pipeline ID can be found in the mapreduce_spec column, under pipeline_id.

  2. The pipeline ID you found may not be the root pipeline ID. To find the root pipeline ID, query _AE_Pipeline_Record. Using the Datastore Viewer:

    SELECT * FROM _AE_Pipeline_Record WHERE __key__ = Key('_AE_Pipeline_Record', '653a3bd9a90f11e28ff6a3556e435fbc')
    

    The key in the root_pipeline column is the root pipeline ID of the MapReduce job.

  3. Finally, using the name of the root pipeline key, you can view the whole MapReduce pipeline here:

    appid.appspot.com/mapreduce/pipeline/status?root=0607a90aa90f11e2bbfea3556e435fbc
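
To avoid retyping the queries for each job, the strings used in the three steps can be generated. This is only a sketch with hypothetical helper names (key_lookup_gql, pipeline_status_url); it builds the text to paste into the Datastore Viewer and the browser, and does not query the datastore itself.

```python
def key_lookup_gql(kind, key_name):
    """Build the GQL key lookup used in steps 1 and 2."""
    return "SELECT * FROM %s WHERE __key__ = Key('%s', '%s')" % (
        kind, kind, key_name)


def pipeline_status_url(base_url, root_pipeline_id):
    """Build the status page URL from step 3; base_url is your app's address."""
    return "%s/mapreduce/pipeline/status?root=%s" % (base_url, root_pipeline_id)


if __name__ == "__main__":
    # Step 1: look up the job's state by job ID.
    print(key_lookup_gql("_AE_MR_MapreduceState", "158112310423699B53FC1"))
    # Step 2: look up the pipeline record by the pipeline_id found in step 1.
    print(key_lookup_gql("_AE_Pipeline_Record",
                         "653a3bd9a90f11e28ff6a3556e435fbc"))
    # Step 3: open the status page for the root pipeline found in step 2.
    print(pipeline_status_url("https://appid.appspot.com",
                              "0607a90aa90f11e2bbfea3556e435fbc"))
```

The IDs in the usage block are the example values from this answer; substitute the ones you read from your own datastore at each step.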

Answered 2013-05-16T13:18:53.213