Improve grid/ti_summaries and grid/runs #64034

Open
henry3260 wants to merge 1 commit into apache:main from henry3260:improve-grid--routes

henry3260 commented Mar 21, 2026

Why

The previous implementation scaled poorly for DAGs that combine:

  • deep TaskGroup nesting
  • large dynamic task mapping
  • many task instances per run

In particular, grid/ti_summaries retained the per-task-instance details during aggregation even though those details are not part of the response model. We were paying extra CPU and memory cost for intermediate data that was never returned to the client.

I don't think we should keep details in the summary path, because the response does not include them. Keeping them only increases temporary allocations and repeated copying when task-group summaries are rolled up.
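
To illustrate the idea, here is a minimal sketch of a roll-up that carries only counts and date bounds instead of per-instance details. The `Summary` class and its field names are hypothetical and are not taken from the Airflow code base; it only shows why merging a child summary into its parent group can be cheap when no details list is copied.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Summary:
    """Hypothetical rolled-up summary for a task or a task group."""

    child_states: Counter = field(default_factory=Counter)
    min_start_date: datetime | None = None
    max_end_date: datetime | None = None

    def add_instance(self, state: str, start: datetime | None, end: datetime | None) -> None:
        # Record one task instance without storing the instance itself.
        self.child_states[state] += 1
        self._widen(start, end)

    def merge(self, other: Summary) -> None:
        # Roll a child summary into this one; Counter.update adds counts.
        self.child_states.update(other.child_states)
        self._widen(other.min_start_date, other.max_end_date)

    def _widen(self, start: datetime | None, end: datetime | None) -> None:
        if start is not None and (self.min_start_date is None or start < self.min_start_date):
            self.min_start_date = start
        if end is not None and (self.max_end_date is None or end > self.max_end_date):
            self.max_end_date = end


# A mapped task with three instances, rolled up into its parent group.
mapped = Summary()
for i, state in enumerate(["success", "success", "failed"]):
    mapped.add_instance(state, datetime(2026, 3, 21, 10, i), datetime(2026, 3, 21, 11, i))

group = Summary()
group.merge(mapped)
```

With this shape, the cost of merging depends on the number of distinct states rather than on the number of task instances below the group.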

What

This improves Grid API performance for DAGs with a large number of mapped task instances.

There are two main changes:

  1. grid/runs no longer loads all task_instances / task_instances_histories for every DagRun on the page just to compute dag_versions.
    Instead, it fetches the paginated runs first and then performs a slimmer lookup for the distinct DAG versions needed for the response.

  2. grid/ti_summaries no longer builds and propagates full per-task-instance details lists while aggregating task groups.
    It now keeps only the summary data needed for the response, such as child state counts and min/max dates.
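
The first change can be sketched roughly as a two-step query: fetch the page of runs, then look up only the distinct versions for those runs. The example below uses the standard-library `sqlite3` module with made-up table and column names (`dag_run`, `task_instance`, `dag_version_id`); it is not Airflow's schema or its SQLAlchemy code, and it leaves out the history table for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_run (id INTEGER PRIMARY KEY, dag_id TEXT, run_after TEXT);
    CREATE TABLE task_instance (
        id INTEGER PRIMARY KEY, dag_run_id INTEGER, dag_version_id TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO dag_run (id, dag_id, run_after) VALUES (?, ?, ?)",
    [(1, "demo", "2026-03-19"), (2, "demo", "2026-03-20"), (3, "demo", "2026-03-21")],
)
# Run 1 uses v1 only, run 2 mixes v1 and v2, run 3 uses v2 only.
rows = [(1, "v1")] * 100 + [(2, "v1")] * 50 + [(2, "v2")] * 50 + [(3, "v2")] * 100
conn.executemany(
    "INSERT INTO task_instance (dag_run_id, dag_version_id) VALUES (?, ?)", rows
)


def versions_for_page(conn: sqlite3.Connection, dag_id: str, limit: int) -> dict[int, list[str]]:
    # Step 1: fetch only the paginated runs.
    run_ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM dag_run WHERE dag_id = ? ORDER BY run_after DESC LIMIT ?",
            (dag_id, limit),
        )
    ]
    if not run_ids:
        return {}
    # Step 2: slim lookup of distinct versions, not full task instance rows.
    placeholders = ",".join("?" * len(run_ids))
    pairs = conn.execute(
        "SELECT DISTINCT dag_run_id, dag_version_id FROM task_instance "
        f"WHERE dag_run_id IN ({placeholders}) ORDER BY dag_run_id, dag_version_id",
        run_ids,
    )
    versions: dict[int, list[str]] = {run_id: [] for run_id in run_ids}
    for run_id, version in pairs:
        versions[run_id].append(version)
    return versions


page = versions_for_page(conn, "demo", limit=2)
```

The second query returns at most one row per (run, version) pair, so the amount of data loaded into Python no longer grows with the number of task instances per run.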

Result

This keeps the API contract unchanged, but reduces unnecessary ORM loading and Python-side aggregation work.

On a local benchmark with 5 DAG runs and about 4002 task instances per run:

  • grid/runs median latency improved from ~2955 ms to ~43 ms
  • grid/runs tracemalloc peak dropped from ~104.6 MiB to ~1.7 MiB
  • grid/ti_summaries tracemalloc peak dropped from ~5.4 MiB to ~1.7 MiB
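
For readers who want to take similar measurements, the sketch below shows one way to collect a median latency and a tracemalloc peak for a callable. It is not the benchmark script used for the numbers above; `measure`, `with_details`, and `counts_only` are hypothetical names, and the two workloads are toy stand-ins rather than the actual routes.

```python
import statistics
import time
import tracemalloc
from collections import Counter


def measure(fn, repeats: int = 5) -> tuple[float, float]:
    """Return (median latency in ms, peak traced memory in MiB) for fn."""
    # Time without tracemalloc, since tracing slows allocation-heavy code.
    latencies = []
    for _ in range(repeats):
        started = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - started) * 1000)

    # Measure peak memory in a separate traced pass.
    tracemalloc.start()
    fn()
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return statistics.median(latencies), peak_bytes / (1024 * 1024)


def with_details(n: int = 20000) -> int:
    # Toy workload: keep one dict per task instance, then count.
    details = [{"state": "success", "index": i} for i in range(n)]
    return len(details)


def counts_only(n: int = 20000) -> int:
    # Toy workload: keep only a per-state counter.
    counts = Counter()
    for _ in range(n):
        counts["success"] += 1
    return counts["success"]


details_ms, details_mib = measure(with_details)
counts_ms, counts_mib = measure(counts_only)
print(f"with_details: {details_ms:.2f} ms, peak {details_mib:.2f} MiB")
print(f"counts_only:  {counts_ms:.2f} ms, peak {counts_mib:.2f} MiB")
```

Absolute numbers from a sketch like this depend heavily on the machine and Python version, so only the relative difference between the two variants is meaningful.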

related: #63975

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

boring-cyborg bot added the area:API (Airflow's REST/HTTP API) label on Mar 21, 2026
henry3260 force-pushed the improve-grid--routes branch 2 times, most recently from 5d0add7 to 4691a38, on March 21, 2026 16:58
henry3260 marked this pull request as ready for review on March 21, 2026 18:02
henry3260 force-pushed the improve-grid--routes branch from 4691a38 to de430af on March 22, 2026 05:36