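I have about 4000 input files (averaging about 7MB each).
My pipeline always fails at the CoGroupByKey step once the data size reaches about 4GB. When I limited the input to only 300 files, it ran fine.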
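For scale, here is a rough back-of-the-envelope estimate of the two input sizes. It assumes the ~7MB average holds across all files and uses 1 GB = 1000 MB; the helper name is made up for illustration.

```python
def total_size_gb(num_files: int, avg_mb: float) -> float:
    """Approximate total input size in GB, assuming 1 GB = 1000 MB."""
    return num_files * avg_mb / 1000


# All input files vs. the subset that runs successfully.
full_run_gb = total_size_gb(4000, 7)
small_run_gb = total_size_gb(300, 7)

print(f"full run:  ~{full_run_gb:.1f} GB")
print(f"small run: ~{small_run_gb:.1f} GB")
```

So the 300-file run is only around 2 GB, below the ~4GB point where the failures start, while the full input is several times larger than that.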
When it fails, the logs on GCP Dataflow show only:
Workflow failed. Causes: S24:CoGroup Geo data/GroupByKey/Read+CoGroup Geo data/GroupByKey/GroupByWindow+CoGroup Geo data/Map(_merge_tagged_vals_under_key) failed., The job failed because a work item has failed 4 times. Look in previous log entries for the cause of each one of the 4 failures. For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors. The work item was attempted on these workers:
store-migration-10212040-aoi4-harness-m7j7
Root cause: The worker lost contact with the service.,
store-migration-xxxxx
Root cause: The worker lost contact with the service.,
store-migration-xxxxx
Root cause: The worker lost contact with the service.,
store-migration-xxxxx
Root cause: The worker lost contact with the service.
I dug through all the logs in Logs Explorer. Apart from the above, nothing else indicates an error, not even the output of my logging.info and try...except code.
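I think this is related to the memory of the worker instances, but I haven't dug in that direction, because it is the kind of thing I don't want to have to worry about when using a GCP service.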
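If memory really is the cause, one way to test that guess would be to rerun the job on higher-memory workers. A minimal sketch of the flags involved, assuming the Beam Python SDK on Dataflow; the project ID, region, and machine type below are placeholders rather than my actual settings, and the helper name is hypothetical.

```python
def highmem_dataflow_flags(project: str, region: str,
                           machine_type: str = "n1-highmem-8") -> list[str]:
    """Build command-line style flags for a Dataflow run on larger workers.

    The values are illustrative assumptions; pick a machine type that fits
    the job and the project's quota.
    """
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        # Worker VM type; a highmem type gives more RAM per vCPU.
        f"--machine_type={machine_type}",
    ]


# Placeholder project and region, not real values.
flags = highmem_dataflow_flags("my-project", "us-central1")
print(flags)

# The list would then be handed to Beam, e.g.:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(flags)
```

If the job gets past CoGroupByKey with more memory per worker, that would at least confirm the direction.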
Thanks.