r/apache_airflow

Just wrapped an Airflow 2.8 to 3.1 migration on EKS for a client. 18 DAGs, 6 weeks, zero downtime. Posting from our company account, I'm Amjad, founder of Tasrie. Happy to answer technical stuff in comments or DMs.

The DAG code changes were almost nothing. About 2 days of work:

# Out
from airflow.contrib.operators.ssh_operator import SSHOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.db import provide_session

# In
from airflow.providers.ssh.operators.ssh import SSHOperator
from airflow.operators.empty import EmptyOperator
from airflow.utils.session import provide_session

Plus renaming the schedule_interval argument to schedule. Ruff with --select AIR301,AIR302 --fix caught about 80% of it automatically.
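For illustration only, the renames above can be written as a small lookup table in plain Python. This is a toy sketch, not how Ruff works (Ruff rewrites the syntax tree, this just pattern-matches lines), and the names IMPORT_RENAMES, rewrite_line and rewrite_source are made up for the example. It only knows the three imports listed in this post plus the schedule_interval keyword.

```python
import re

# (old module, old name) -> (new module, new name), copied from the list above.
IMPORT_RENAMES = {
    ("airflow.contrib.operators.ssh_operator", "SSHOperator"):
        ("airflow.providers.ssh.operators.ssh", "SSHOperator"),
    ("airflow.operators.dummy_operator", "DummyOperator"):
        ("airflow.operators.empty", "EmptyOperator"),
    ("airflow.utils.db", "provide_session"):
        ("airflow.utils.session", "provide_session"),
}

# Matches only the simple single-name form: "from x.y import Z".
IMPORT_RE = re.compile(r"^from\s+([\w.]+)\s+import\s+(\w+)\s*$")


def rewrite_line(line: str) -> str:
    """Rewrite one line of DAG source; unknown lines pass through unchanged."""
    match = IMPORT_RE.match(line)
    if match:
        key = (match.group(1), match.group(2))
        if key in IMPORT_RENAMES:
            module, name = IMPORT_RENAMES[key]
            return f"from {module} import {name}"
    # Non-import lines: rename the class usage and the DAG keyword argument.
    line = re.sub(r"\bDummyOperator\b", "EmptyOperator", line)
    return re.sub(r"\bschedule_interval\s*=", "schedule=", line)


def rewrite_source(source: str) -> str:
    """Apply rewrite_line to every line of a source string."""
    return "\n".join(rewrite_line(line) for line in source.splitlines())


if __name__ == "__main__":
    old = (
        "from airflow.operators.dummy_operator import DummyOperator\n"
        'start = DummyOperator(task_id="start")\n'
        'dag = DAG("example", schedule_interval="@daily")'
    )
    print(rewrite_source(old))
```

In practice Ruff is the right tool for this. The table is only meant to show how small the rename surface was.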

The infra was the real work. Key decisions:

  • Green field over in-place. Old metadata DB had years of drift. Fresh cluster + DNS cutover beat nursing a schema migration.
  • KubernetesExecutor, no Celery, no Redis.
  • 2 schedulers with pod anti-affinity. Multi-scheduler HA has been native since 2.0 and carries over to 3.x.
  • Triggerer as StatefulSet, capacity 1000 for deferrable sensors.
  • Git-sync sidecar, SSH on port 443 to bypass corp firewalls.
  • EFS for DAGs. EBS RWO breaks the moment you have a second node.
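As a rough sketch of the scheduler anti-affinity decision above, here is the shape of the affinity block built as a plain Python dict and dumped as JSON. The label component=scheduler and the helper name scheduler_anti_affinity are assumptions for illustration, and so is the choice of a hard "required" rule over a soft "preferred" one. The actual manifests are in the linked writeup.

```python
import json


def scheduler_anti_affinity(labels: dict[str, str]) -> dict:
    """Build a pod-spec `affinity` block that keeps matching pods on separate nodes.

    `labels` must match the labels on the scheduler pods (hypothetical here).
    """
    return {
        "podAntiAffinity": {
            # Hard rule: a pod stays Pending rather than share a node with a match.
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": dict(labels)},
                    # One matching pod per node hostname.
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


if __name__ == "__main__":
    affinity = scheduler_anti_affinity({"component": "scheduler"})
    print(json.dumps(affinity, indent=2))
```

With a hard rule and 2 scheduler replicas you need at least 2 schedulable nodes, otherwise the second scheduler never starts.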

Stuff that surprised me:

  • Webserver command is now api-server. Wasted an hour before I caught it.
  • DAG processor as a separate process actually works. No more heavy top-level imports stalling the scheduler.
  • LDAP gotcha: FAB auth manager still gives you the old Flask login page, not the new Airflow 3 UI. Functional but ugly. There's an open discussion in apache/airflow about a native LDAP auth manager but nothing shipped.

Two things I'm curious about:

How are you sizing the dag-processor vs the scheduler? Same pod or split out?

Anyone running Airflow 3 with non-FAB auth that handles LDAP or SAML cleanly?

Full writeup with all the manifests, RBAC, EFS storageclass, and pod template is here: https://tasrieit.com/blog/upgrade-airflow-2-to-3-kubernetes-migration

Airflow 2 EOL is April 2026. If you're still on 2.x, it's less scary than it looks.

u/tasrieitservices — 7 days ago

I manage a bunch of Airflow instances for my organization. I've been teaching people to write better DAGs that don't overload the DB, and making improvements to stabilize all the instances.

I have one instance in particular where around 100 DAGs run at the same time, and some of these DAGs run tasks for hours. Is that a good use of Airflow, or should I be breaking these tasks into smaller batches so each one finishes and exits faster?

u/WhatASave83 — 10 days ago