improve grid/ti_summaries and grid/runs by henry3260 · Pull Request #64034 · apache/airflow

henry3260 · 2026-03-21T13:36:33Z

Why

The previous implementation scaled poorly for DAGs that combine:

deep TaskGroup nesting
large dynamic task mapping
many task instances per run

In particular, grid/ti_summaries kept per-task-instance details lists during aggregation even though those details are not part of the response model. This meant we were paying extra CPU and memory cost for intermediate data that was never returned to the client.

I don't think we should keep details in the summary path, because the response does not include them. Keeping them only increases temporary allocations and repeated copying when task-group summaries are rolled up.
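A minimal sketch makes the cost difference easier to see. It assumes each task instance contributes only a state and its start/end dates. The names `GroupSummary`, `summarize_tis` and `roll_up` are hypothetical and are not Airflow's actual code. Under those assumptions, a parent group's summary can be built from its children's summaries alone. The data copied at each nesting level then grows with the number of distinct states, not with the number of task instances.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class GroupSummary:
    # Hypothetical shape: only what a summary response needs, no per-TI details list.
    child_states: Counter = field(default_factory=Counter)
    min_start_date: Optional[datetime] = None
    max_end_date: Optional[datetime] = None


def _earliest(a: Optional[datetime], b: Optional[datetime]) -> Optional[datetime]:
    # None means "no value yet", so it never wins a comparison.
    if a is None:
        return b
    if b is None:
        return a
    return min(a, b)


def _latest(a: Optional[datetime], b: Optional[datetime]) -> Optional[datetime]:
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)


def summarize_tis(
    tis: Iterable[tuple[str, Optional[datetime], Optional[datetime]]],
) -> GroupSummary:
    """Fold (state, start_date, end_date) rows into a summary without keeping the rows."""
    summary = GroupSummary()
    for state, start, end in tis:
        summary.child_states[state] += 1
        summary.min_start_date = _earliest(summary.min_start_date, start)
        summary.max_end_date = _latest(summary.max_end_date, end)
    return summary


def roll_up(children: Iterable[GroupSummary]) -> GroupSummary:
    """Combine child summaries into a parent task-group summary.

    Only state counts and two dates move upward; no per-TI list is copied.
    """
    parent = GroupSummary()
    for child in children:
        parent.child_states.update(child.child_states)
        parent.min_start_date = _earliest(parent.min_start_date, child.min_start_date)
        parent.max_end_date = _latest(parent.max_end_date, child.max_end_date)
    return parent
```

With a details-carrying design, each `roll_up` would also concatenate the children's lists. Deep TaskGroup nesting would then copy every mapped task instance once per level.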

What

This improves Grid API performance for DAGs with a large number of mapped task instances.

There are two main changes:

grid/runs no longer loads all task_instances / task_instances_histories for every DagRun on the page just to compute dag_versions.
Instead, it fetches the paginated runs first and then performs a slimmer lookup for the distinct DAG versions needed for the response.
grid/ti_summaries no longer builds and propagates full per-task-instance details lists while aggregating task groups.
It now keeps only the summary data needed for the response, such as child state counts and min/max dates.
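The grid/runs change follows a common two-step pattern, sketched below with `sqlite3`. The schema and the `fetch_runs_with_versions` helper are hypothetical and heavily simplified; the real Airflow models and the queries in this PR differ. The page of runs is fetched first. A single query then returns only the distinct `(run_id, dag_version_id)` pairs for those runs, instead of materializing every task instance and history row in Python.

```python
import sqlite3
from collections import defaultdict

# Hypothetical, heavily simplified schema for illustration only.
DEMO_SCHEMA = """
CREATE TABLE dag_run (id INTEGER PRIMARY KEY, dag_id TEXT, run_id TEXT);
CREATE TABLE task_instance (
    dag_id TEXT, run_id TEXT, task_id TEXT, map_index INTEGER, dag_version_id INTEGER
);
CREATE TABLE task_instance_history (
    dag_id TEXT, run_id TEXT, task_id TEXT, map_index INTEGER, dag_version_id INTEGER
);
"""


def fetch_runs_with_versions(conn, dag_id, limit, offset):
    """Return [(run_id, sorted dag_version_ids)] for one page of runs."""
    # Step 1: paginate the runs only; no task-instance rows are loaded here.
    runs = conn.execute(
        "SELECT run_id FROM dag_run WHERE dag_id = ? ORDER BY id DESC LIMIT ? OFFSET ?",
        (dag_id, limit, offset),
    ).fetchall()
    run_ids = [row[0] for row in runs]
    if not run_ids:
        return []

    # Step 2: one slim query returning only (run_id, dag_version_id) pairs;
    # UNION de-duplicates across the live and history tables.
    marks = ", ".join("?" for _ in run_ids)
    rows = conn.execute(
        f"SELECT run_id, dag_version_id FROM task_instance "
        f"WHERE dag_id = ? AND run_id IN ({marks}) "
        "UNION "
        f"SELECT run_id, dag_version_id FROM task_instance_history "
        f"WHERE dag_id = ? AND run_id IN ({marks})",
        (dag_id, *run_ids, dag_id, *run_ids),
    ).fetchall()

    versions = defaultdict(set)
    for run_id, version_id in rows:
        if version_id is not None:
            versions[run_id].add(version_id)
    # Preserve the pagination order from step 1.
    return [(run_id, sorted(versions[run_id])) for run_id in run_ids]
```

The database still scans the matching task-instance rows. However, the data returned to and held in Python is bounded by the number of runs on the page times their distinct versions, rather than by the number of mapped task instances.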

Result

This keeps the API contract unchanged, but reduces unnecessary ORM loading and Python-side aggregation work.

On a local benchmark with 5 DAG runs and about 4002 task instances per run:

grid/runs median latency improved from ~2955 ms to ~43 ms
grid/runs tracemalloc peak dropped from ~104.6 MiB to ~1.7 MiB
grid/ti_summaries tracemalloc peak dropped from ~5.4 MiB to ~1.7 MiB
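The benchmark script itself is not part of the PR. The following is only a sketch of how a median latency and a `tracemalloc` peak could be collected for a handler wrapped as a zero-argument callable. `benchmark` is a hypothetical helper. Note that `tracemalloc` only traces allocations made through Python's memory allocators, so memory used by the database server is not included.

```python
from __future__ import annotations

import statistics
import time
import tracemalloc
from typing import Callable


def benchmark(fn: Callable[[], object], repeats: int = 5) -> tuple[float, float]:
    """Return (median latency in ms, tracemalloc peak in MiB) for fn()."""
    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000)

    # Measure memory in a separate call so tracing overhead does not skew timings.
    tracemalloc.start()
    try:
        fn()
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return statistics.median(timings_ms), peak_bytes / (1024 * 1024)
```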

related: #63975

Was generative AI tooling used to co-author this PR?

Yes (please specify the tool below)

boring-cyborg bot added the area:API Airflow's REST/HTTP API label Mar 21, 2026

henry3260 force-pushed the improve-grid--routes branch 2 times, most recently from 5d0add7 to 4691a38 on March 21, 2026 16:58

henry3260 marked this pull request as ready for review March 21, 2026 18:02

henry3260 requested review from bugraoz93, choo121600, ephraimbuddy, jason810496, pierrejeambrun, rawwar and shubhamraj-git as code owners March 21, 2026 18:02

improve grid/ti_summaries and grid/runs

de430af

henry3260 force-pushed the improve-grid--routes branch from 4691a38 to de430af on March 22, 2026 05:36

henry3260 wants to merge 1 commit into apache:main from henry3260:improve-grid--routes

henry3260 commented Mar 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

henry3260 commented Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What

Why

Result

Was generative AI tooling used to co-author this PR?

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

henry3260 commented Mar 21, 2026 •

edited

Loading