
We deployed GCP Composer V2 with the latest Airflow version. It works perfectly, but sometimes the predefined "airflow_monitoring" DAG crashes.

Here are the logs for the issue:

*** Log file is not found: gs://********/logs/airflow_monitoring/echo/2021-12-14T12:36:55+00:00/1.log. The task might not have been executed or worker executing it might have finished abnormally (e.g. was evicted)
*** 404 GET https://storage.googleapis.com/download/storage/v1/b/********/o/logs%2Fairflow_monitoring%2Fecho%2F2021-12-14T12%3A36%3A55%2B00%3A00%2F1.log?alt=media: No such object: ********/logs/airflow_monitoring/echo/2021-12-14T12:36:55+00:00/1.log: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)

We haven't changed anything; this issue occurs randomly.

Here is the code of the predefined "airflow_monitoring" DAG:

"""A liveness prober dag for monitoring composer.googleapis.com/environment/healthy."""
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import timedelta

default_args = {
'start_date': airflow.utils.dates.days_ago(0),
'retries': 1,
'retry_delay': timedelta(minutes=5)
}

dag = DAG(
'airflow_monitoring',
default_args=default_args,
description='liveness monitoring dag',
schedule_interval=None,
dagrun_timeout=timedelta(minutes=20))

# priority_weight has type int in Airflow DB, uses the maximum.
t1 = BashOperator(
task_id='echo',
bash_command='echo test',
dag=dag,
depends_on_past=False,
priority_weight=2**31-1)

1 Answer


I think the log says it all:

*** Log file is not found: gs://********/logs/airflow_monitoring/echo/2021-12-14T12:36:55+00:00/1.log. The task might not have been executed or worker executing it might have finished abnormally (e.g. was evicted)

The Kubernetes environment may evict a running task from time to time (for example, when the task fails over to another node because of a disk crash or because the machine needs to be restarted).

I think you should set retries to 2, and in that case it should retry automatically; see the sketch below.
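
As a minimal sketch, this is the same predefined DAG with only 'retries' raised from 1 to 2; everything else is copied unchanged from the code in the question:

"""A liveness prober dag for monitoring composer.googleapis.com/environment/healthy."""
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import timedelta

default_args = {
    'start_date': airflow.utils.dates.days_ago(0),
    'retries': 2,  # one extra retry, so a single eviction does not fail the whole run
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    'airflow_monitoring',
    default_args=default_args,
    description='liveness monitoring dag',
    schedule_interval=None,
    dagrun_timeout=timedelta(minutes=20))

# priority_weight has type int in Airflow DB, uses the maximum.
t1 = BashOperator(
    task_id='echo',
    bash_command='echo test',
    dag=dag,
    depends_on_past=False,
    priority_weight=2**31-1)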

answered 2021-12-14T17:55:14.747