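I am upgrading Airflow from version 1.10 to 2.1.0. My project uses KubernetesPodOperator with the KubernetesExecutor. Everything worked fine in Airflow 1.10. After upgrading to Airflow 2.1.0, the pod is able to run the task, but once the task completes successfully the pod restarts with CrashLoopBackOff status. I have checked the livenessProbe and it works as expected. I also checked the other logs, but could not find any problem in any of the specified containers or pods.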
deployment.yaml file:
# Airflows
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow
spec:
  selector:
    matchLabels:
      app: airflow
  replicas: 1
  template:
    metadata:
      labels:
        app: airflow
    spec:
      hostAliases:
        - ip: "xx.xx.xx.xx"
          hostnames:
            - "xxx.xxx.xxx"
      initContainers:
        - name: init-db
          image: "{{ .Values.dags_image.repository }}:{{ .Values.dags_image.tag }}"
          imagePullPolicy: Always
          command:
            - "/bin/sh"
          args:
            - "-c"
            - "/usr/local/bin/bootstrap.sh"
          env:
            - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  key: AIRFLOW__CORE__SQL_ALCHEMY_CONN
                  name: airflow-secrets
            - name: AFPW
              valueFrom:
                secretKeyRef:
                  key: AFPW
                  name: airflow-secrets
      containers:
        - name: web
          image: "{{ .Values.dags_image.repository }}:{{ .Values.dags_image.tag }}"
          imagePullPolicy: Always
          ports:
            - name: web
              containerPort: 8080
          command:
            - "airflow"
          args:
            - "webserver"
          livenessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 240
            periodSeconds: 60
          env:
            - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  key: AIRFLOW__CORE__SQL_ALCHEMY_CONN
                  name: airflow-secrets
        ## The following values have been created as part of production setup
        - name: scheduler
          image: "{{ .Values.dags_image.repository }}:{{ .Values.dags_image.tag }}"
          imagePullPolicy: Always
          command:
            - "airflow"
          args:
            - "scheduler"
          env:
            - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  key: AIRFLOW__CORE__SQL_ALCHEMY_CONN
                  name: airflow-secrets
Describe pod:
Name:           airflow-66776dc57c-z98vd
Namespace:      default
Priority:       0
Node:           gke-gke-xxxxx-de-nodes-xxxxx--ccb62dc3-24us/xxx.xx.xx.xx
Start Time:     Sat, 19 Jun 2021 17:49:16 +0000
Labels:         app=airflow
                pod-template-hash=66776dc57c
Annotations:    <none>
Status:         Running
IP:             xxx.xx.xx.xx
IPs:
  IP:           xxx.xx.xx.xx
Controlled By:  ReplicaSet/airflow-66776dc57c
Init Containers:
  init-db:
    Container ID:  xxxxxxxxx
    Image:         xxxxxxxxx
    Image ID:      xxxxxxxxx
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
    Args:
      -c
      /usr/local/bin/bootstrap.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 19 Jun 2021 17:50:04 +0000
      Finished:     Sat, 19 Jun 2021 17:50:23 +0000
    Ready:          True
    Restart Count:  0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kw529 (ro)
Containers:
  web:
    Container ID:  xxxxxxxxx
    Image:         xxxxxxxxx
    Image ID:      xxxxxxxxx
    Port:          8080/TCP
    Host Port:     0/TCP
    Command:
      airflow
    Args:
      webserver
    State:          Running
      Started:      Sat, 19 Jun 2021 17:50:24 +0000
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:8080/ delay=240s timeout=1s period=60s #success=1 #failure=3
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kw529 (ro)
  scheduler:
    Container ID:  xxxxxxxxx
    Image:         xxxxxxxxx
    Image ID:      xxxxxxxxx
    Port:          <none>
    Host Port:     <none>
    Command:
      airflow
    Args:
      scheduler
    State:          Running
      Started:      Sat, 19 Jun 2021 17:50:25 +0000
    Ready:          True
    Restart Count:  0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kw529 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-kw529:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-kw529
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
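
For reference, the per-container fields that `kubectl describe` prints above (State, Reason, Exit Code, Restart Count) can also be read from the pod's JSON status, which may make it easier to compare a restarting pod against this one. Below is a minimal sketch, assuming the input is the parsed output of `kubectl get pod <pod-name> -o json`. The helper name `summarize_restarts` is hypothetical and is not part of my setup.

```python
from typing import Any, Dict, List


def summarize_restarts(pod: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Summarize restart information for every container in a pod.

    `pod` is assumed to be the parsed JSON of `kubectl get pod <name> -o json`.
    Only standard pod status fields are read; nothing is sent to the cluster.
    """
    status = pod.get("status", {})
    # Init containers and regular containers report status in separate lists.
    statuses = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])

    rows = []
    for cs in statuses:
        # state.waiting.reason is where CrashLoopBackOff is reported.
        waiting = cs.get("state", {}).get("waiting") or {}
        # lastState.terminated describes the previous run of the container, if any.
        last = cs.get("lastState", {}).get("terminated") or {}
        rows.append({
            "name": cs["name"],
            "restarts": cs.get("restartCount", 0),
            "waiting_reason": waiting.get("reason"),
            "last_exit_code": last.get("exitCode"),
            "last_reason": last.get("reason"),
        })
    return rows
```

Typical use would be `summarize_restarts(json.load(open("pod.json")))` after saving the `kubectl` output to a file. If a container shows `waiting_reason` of `CrashLoopBackOff` together with `last_exit_code` of 0, that would suggest the container is exiting cleanly and being restarted, rather than failing with an error.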