我正在尝试使用 flex 模板运行我的 python 数据流作业。当我使用直接运行器(没有弹性模板)运行时,作业在本地运行良好,但是当我尝试使用弹性模板运行它时,作业停留在“排队”状态一段时间,然后因超时而失败。
这是我在 GCE 控制台中找到的一些日志:
INFO:apache_beam.runners.portability.stager:Executing command: ['/usr/local/bin/python', '-m', 'pip', 'download', '--dest', '/tmp/dataflow-requirements-cache', '-r', '/dataflow/template/requirements.txt', '--exists-action', 'i', '--no-binary', ':all:'
Shutting down the GCE instance, launcher-202011121540156428385273524285797, used for launching.
Timeout in polling result file: gs://my_bucket/staging/template_launches/2020-11-12_15_40_15-6428385273524285797/operation_result.
Possible causes are:
1. Your launch takes too long time to finish. Please check the logs on stackdriver.
2. Service my_service_account@developer.gserviceaccount.com may not have enough permissions to pull container image gcr.io/indigo-computer-272415/samples/dataflow/streaming-beam-py:latest or create new objects in gs://my_bucket/staging/template_launches/2020-11-12_15_40_15-6428385273524285797/operation_result.
3. Transient errors occurred, please try again.
对于 1,我看不到有用的 lo。对于 2,服务帐户是默认服务帐户,因此它应该具有所有权限。
我该如何进一步调试呢?
这是我的 Docker 文件:
FROM gcr.io/dataflow-templates-base/python3-template-launcher-base
ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}
ADD localdeps localdeps
COPY requirements.txt .
COPY main.py .
COPY setup.py .
COPY bq_field_pb2.py .
COPY bq_table_pb2.py .
COPY core_pb2.py .
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"
RUN pip install -U --no-cache-dir -r ./requirements.txt
我正在关注本指南 - https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates