[SPARK-31561][SQL] Add QUALIFY Clause #55019

Open

sunchao wants to merge 1 commit into apache:master from sunchao:SPARK-31561

@sunchao (Member) commented Mar 25, 2026

What changes were proposed in this pull request?

This PR adds support for the QUALIFY clause in Spark SQL. The implementation updates the SQL parser and AST builder to recognize QUALIFY, carries it through analysis with a dedicated marker expression, and rewrites it after window extraction so the predicate is evaluated after window functions are materialized.
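As a rough mental model of where the predicate ends up (a plain-Python sketch, not the PR's actual Catalyst rules), the pipeline below applies WHERE to input rows, then materializes a ROW_NUMBER() column, and only then applies the QUALIFY predicate. The helper names `with_row_number` and `evaluate` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def with_row_number(rows: List[Row], partition_by: Sequence[str],
                    order_by: Sequence[str], alias: str) -> List[Row]:
    """Hypothetical helper: add ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) as `alias`."""
    partitions: Dict[tuple, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[tuple(row[c] for c in partition_by)].append(row)
    out: List[Row] = []
    for part in partitions.values():
        ordered = sorted(part, key=lambda r: tuple(r[c] for c in order_by))
        for n, row in enumerate(ordered, start=1):
            out.append({**row, alias: n})
    return out


def evaluate(rows: List[Row],
             where: Callable[[Row], bool],
             window: Callable[[List[Row]], List[Row]],
             qualify: Callable[[Row], bool],
             select: Sequence[str]) -> List[Row]:
    # 1. WHERE runs on input rows; window aliases such as `rn` do not exist yet.
    filtered = [r for r in rows if where(r)]
    # 2. Window functions are materialized as new columns.
    windowed = window(filtered)
    # 3. QUALIFY filters on the materialized window columns.
    qualified = [r for r in windowed if qualify(r)]
    # 4. Final projection of the selected columns.
    return [{c: r[c] for c in select} for r in qualified]


# Roughly: SELECT a, b, ROW_NUMBER() OVER (PARTITION BY a ORDER BY b) AS rn
#          FROM t WHERE b > 0 QUALIFY rn = 1
t = [{"a": 1, "b": 3}, {"a": 1, "b": 1}, {"a": 2, "b": 2}, {"a": 2, "b": -5}]
result = evaluate(
    t,
    where=lambda r: r["b"] > 0,
    window=lambda rs: with_row_number(rs, ["a"], ["b"], "rn"),
    qualify=lambda r: r["rn"] == 1,
    select=["a", "b", "rn"],
)
print(result)  # [{'a': 1, 'b': 1, 'rn': 1}, {'a': 2, 'b': 2, 'rn': 1}]
```

Because WHERE removes the row with `b = -5` before numbering, the row with `b = 2` receives `rn = 1` in partition `a = 2`; a predicate on `rn` can only be evaluated after step 2, which is why QUALIFY has to sit above the window computation.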

In addition to basic QUALIFY support, the analyzer enforces the following rules within the query block that contains the QUALIFY clause:

  • QUALIFY requires at least one window function in the current SELECT list or QUALIFY predicate.
  • Standalone aggregate functions in the QUALIFY predicate are rejected with a targeted error that points at the offending aggregate expression.
  • Aliases of aggregate expressions defined in the SELECT list are allowed in QUALIFY.
  • Queries that combine HAVING and QUALIFY are evaluated in the expected order: HAVING filters grouped rows before window functions are evaluated, and QUALIFY filters rows after the window results are computed.
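The HAVING-then-window-then-QUALIFY ordering can be checked against an equivalent staged query. The sketch below assumes the QUALIFY syntax described in this PR for the Spark form (shown only in a comment) and uses SQLite through Python's `sqlite3` module as a stand-in engine, since SQLite has no QUALIFY; it needs SQLite 3.25 or later for window functions. The table `t` and its data are made up for the example.

```python
import sqlite3

# Spark SQL form, assuming the syntax proposed in this PR (not runnable in SQLite):
#   SELECT k, SUM(v) AS total, RANK() OVER (ORDER BY SUM(v) DESC) AS rnk
#   FROM t GROUP BY k HAVING SUM(v) < 8
#   QUALIFY rnk = 1
# Equivalent staged form: HAVING, then the window function, then an outer filter.
EQUIVALENT = """
WITH grouped AS (
    SELECT k, SUM(v) AS total FROM t GROUP BY k HAVING SUM(v) < 8
),
ranked AS (
    SELECT k, total, RANK() OVER (ORDER BY total DESC) AS rnk FROM grouped
)
SELECT k, total, rnk FROM ranked WHERE rnk = 1
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("a", 6), ("a", 4), ("b", 5), ("c", 1)])
rows = conn.execute(EQUIVALENT).fetchall()
print(rows)  # [('b', 5, 1)]
```

Group `a` (total 10) is removed by HAVING before ranking, so `b` takes rank 1. If HAVING were instead applied after the window, `a` would hold rank 1 and then be filtered out, and the query would return no rows.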

The PR also adds documentation and parser and analyzer tests for positive and negative cases, including alias-based predicates, non-ANSI keyword handling, and grouped queries.

Why are the changes needed?

QUALIFY is supported by several popular SQL engines, such as Snowflake and Databricks SQL, and users expect it when porting SQL that filters on window-function results. Without it, equivalent Spark queries need an extra subquery or CTE just to filter on a window alias.
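For illustration, the workaround that QUALIFY replaces is a CTE (or subquery) that computes the window alias, followed by an outer filter, because WHERE cannot reference a window-function result in the same query block. This sketch uses SQLite via Python's `sqlite3` module as a stand-in engine (SQLite 3.25+ is assumed for window functions); the same CTE shape is standard SQL, and the table and data are made up.

```python
import sqlite3

# Spark SQL with QUALIFY, as proposed in this PR:
#   SELECT a, ROW_NUMBER() OVER (ORDER BY b) AS rn FROM t QUALIFY rn = 1
# Without QUALIFY, the window alias can only be filtered from an outer query:
WORKAROUND = """
WITH numbered AS (
    SELECT a, ROW_NUMBER() OVER (ORDER BY b) AS rn FROM t
)
SELECT a, rn FROM numbered WHERE rn = 1
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 30), (2, 10), (3, 20)])
rows = conn.execute(WORKAROUND).fetchall()
print(rows)  # [(2, 1)]
```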

This change closes that gap and makes Spark SQL more compatible with existing SQL workloads while preserving clear analyzer rules around window and aggregate semantics.

Does this PR introduce any user-facing change?

Yes. Spark SQL can now parse and analyze queries that use QUALIFY, for example:

SELECT a, ROW_NUMBER() OVER (ORDER BY b) AS rn
FROM t
QUALIFY rn = 1

This PR also introduces user-visible analysis errors for invalid QUALIFY usage, such as using aggregate functions directly in the QUALIFY predicate.

How was this patch tested?

Added new parser and analyzer unit tests, plus end-to-end tests.

Was this patch authored or co-authored using generative AI tooling?

Yes. Generated by ChatGPT 5.4 High.
