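I'm running Airflow (v1.10.14) on a Kubernetes cluster, with every task run by the KubernetesExecutor (env var AIRFLOW__CORE__EXECUTOR: "KubernetesExecutor"). I want to send an email on failure, so I wrote a task that fails on purpose to test it. When I set 'email_on_failure': True in my DAG, the Airflow log reports the failure but the mail is never sent, the task stays in progress forever, and the Kubernetes pod created by the executor never dies. When I set it to False, everything fails as expected (the task is marked as failed and the pod dies).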
I can't find anything relevant in the Airflow logs, the Kubernetes logs, or the Kubernetes events.
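I'm passing the SMTP settings to the webserver and the scheduler through environment variables (AIRFLOW__SMTP__SMTP_HOST, AIRFLOW__SMTP__SMTP_MAIL_FROM, AIRFLOW__SMTP__SMTP_USER and AIRFLOW__SMTP__SMTP_PASSWORD). My guess is that the pod started by the Kubernetes executor is the one trying to send the mail and that it doesn't have the information it needs, although I couldn't find out how to give it that information.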
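One way to test that guess is to check, from inside the worker pod, whether the four variables listed above are actually present. The following is only a minimal sketch: the helper name missing_smtp_settings is made up for illustration, and it looks at environment variables only, so it says nothing about values that might come from airflow.cfg instead.

```python
import os

# The four SMTP settings named above, as Airflow environment variables.
SMTP_ENV_VARS = (
    "AIRFLOW__SMTP__SMTP_HOST",
    "AIRFLOW__SMTP__SMTP_MAIL_FROM",
    "AIRFLOW__SMTP__SMTP_USER",
    "AIRFLOW__SMTP__SMTP_PASSWORD",
)


def missing_smtp_settings(env=None):
    """Return the names of SMTP variables that are unset or empty in env.

    Hypothetical helper: env defaults to the current process environment.
    """
    if env is None:
        env = os.environ
    return [name for name in SMTP_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_smtp_settings()
    if missing:
        print("Missing in this process:", ", ".join(missing))
    else:
        print("All four SMTP variables are set in this process.")
```

Running this inside the task pod (for example from a task, or with kubectl exec while the pod is still alive) and comparing the result with the scheduler would show whether the variables set on the webserver and scheduler reach the pod at all.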
Here is the code of my simple DAG:
import airflow
from datetime import datetime, timedelta

from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'bla',
    'depends_on_past': False,
    # Written without a leading zero: `03` is a syntax error in Python 3.
    'start_date': datetime(2021, 3, 10),
    'email': ['some@email.com'],
    'email_on_failure': True,  # setting this to False makes the task fail cleanly
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1)
}

dag = DAG(
    'example_send_mail',
    default_args=default_args,
    dagrun_timeout=timedelta(minutes=8)
)

t1 = BashOperator(
    task_id='task_1',
    bash_command='ech "Hello World from Task 1"',  # intentional failure
    execution_timeout=timedelta(minutes=2),
    dag=dag
)

t1
In the Airflow task log I get:
*** Reading local file: /opt/airflow/logs/example_send_mail/task_1/2021-03-10T21:06:11.812006+00:00/2.log
Dependencies all met for <TaskInstance: example_send_mail.task_1 2021-03-10T21:06:11.812006+00:00 [queued]>
Dependencies all met for <TaskInstance: example_send_mail.task_1 2021-03-10T21:06:11.812006+00:00 [queued]>
--------------------------------------------------------------------------------
Starting attempt 2 of 2
--------------------------------------------------------------------------------
Executing <Task(BashOperator): task_1> on 2021-03-10T21:06:11.812006+00:00
Started process 12 to run task
Running: ['airflow', 'run', 'example_send_mail', 'task_1', '2021-03-10T21:06:11.812006+00:00', '--job_id', '2149', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/example_send_mail.py', '--cfg_path', '/tmp/tmpuq1r9xca']
Job 2149: Subtask task_1
Running <TaskInstance: example_send_mail.task_1 2021-03-10T21:06:11.812006+00:00 [running]> on host examplesendmailtask1-6b930c5a2217466584c9205e337bb9d6
Tmp dir root location:
/tmp
Temporary script location: /tmp/airflowtmpsqpgnz3b/task_1k_f821as
Running command: ech "Hello World from Task 1"
Output:
/tmp/airflowtmpsqpgnz3b/task_1k_f821as: line 1: ech: command not found
Command exited with return code 127
Bash command failed
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 979, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.6/site-packages/airflow/operators/bash_operator.py", line 165, in execute
    raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed
Marking task as FAILED. dag_id=example_send_mail, task_id=task_1, execution_date=20210310T210611, start_date=20210310T210730, end_date=20210310T210731
But the DAG stays in the "running" state, even though the log says the task is being marked as FAILED.
In the Kubernetes logs of the pod I get:
kubectl logs -n airflow examplesendmailtask1-6b930c5a2217466584c9205e337bb9d6
[2021-03-10 21:07:30,677] {__init__.py:50} INFO - Using executor LocalExecutor
[2021-03-10 21:07:30,677] {dagbag.py:417} INFO - Filling up the DagBag from /opt/airflow/dags/git/example_send_mail.py
Running <TaskInstance: example_send_mail.task_1 2021-03-10T21:06:11.812006+00:00 [queued]> on host examplesendmailtask1-6b930c5a2217466584c9205e337bb9d6
and the pod keeps running.
Any ideas?