[SPARK-54509][DOCS] Add Documentation for Spark Data Source V2 #55046

Closed
szehon-ho wants to merge 5 commits into apache:master from szehon-ho:doc_dsv2

@szehon-ho
Member

Summary

  • Adds a new developer-facing documentation page (docs/sql-data-sources-v2.md) covering the Spark Data Source V2 API.
  • Documents the full DSv2 architecture: entry points (TableProvider, CatalogPlugin), catalog interfaces (TableCatalog, StagingTableCatalog, SupportsNamespaces, FunctionCatalog, ProcedureCatalog), Table abstraction and capabilities, read path (scan builder, pushdown, scan mix-ins), write path (write builder, batch write, distribution/ordering), row-level DML (SupportsDeleteV2, SupportsRowLevelOperations), expressions, and streaming.
  • Includes a feature comparison with DSv1, forward links between sections for navigation, and references to notable DSv2 users (Iceberg, Delta Lake, Lance, JDBC).

How was this patch tested?

Documentation-only change. Verified rendering locally via bundle exec jekyll serve.

Was this patch authored or co-authored using generative AI tooling?

Yes.

@szehon-ho
Member Author

szehon-ho commented Mar 27, 2026

[Image: Screenshot 2026-03-26 at 6 11 47 PM]

| `SupportsNamespaces` | Create, alter, drop, and list namespaces |
| `FunctionCatalog` | List and load functions |
| `ProcedureCatalog` | Load and list stored procedures |

Contributor

ViewCatalog seems to be missing.

Member Author

Thanks, yes. Initially I didn't include it because it's not hooked up yet, but I've added it now with a note.

for the full interface reference.
- [Data Sources](sql-data-sources.html) for the user-facing guide to built-in data sources (DSv1).
- [Storage Partition Join](sql-performance-tuning.html#storage-partition-join) for how DSv2
partitioning reporting enables join optimizations.

Contributor

Maybe mention the Python Data Source API here as well.

Member Author

Good call, done.

Contributor

@peter-toth left a comment

LGTM, just nits.

@HyukjinKwon
Member

Merged to master.
