[SPARK-56175][SQL] FileTable implements SupportsPartitionManagement and V2 catalog table loading #55034
Draft

LuciferYang wants to merge 2 commits into apache:master
…Frame API writes and delete FallBackFileSourceV2

Key changes:
- `FileWrite`: added `partitionSchema`, `customPartitionLocations`, `dynamicPartitionOverwrite`, `isTruncate`; path creation and truncate logic; dynamic partition overwrite via `FileCommitProtocol`
- `FileTable`: `createFileWriteBuilder` with `SupportsDynamicOverwrite` and `SupportsTruncate`; capabilities now include `TRUNCATE` and `OVERWRITE_DYNAMIC`; `fileIndex` skips file existence checks when `userSpecifiedSchema` is provided (write path)
- All file format writes (Parquet, ORC, CSV, JSON, Text, Avro) use `createFileWriteBuilder` with partition/truncate/overwrite support
- `DataFrameWriter.lookupV2Provider`: enabled `FileDataSourceV2` for non-partitioned Append and Overwrite via `df.write.save(path)`
- `DataFrameWriter.insertInto`: V1 fallback for file sources (TODO: SPARK-56175)
- `DataFrameWriter.saveAsTable`: V1 fallback for file sources (TODO: SPARK-56230, needs `StagingTableCatalog`)
- `DataSourceV2Utils.getTableProvider`: V1 fallback for file sources (TODO: SPARK-56175)
- Removed the `FallBackFileSourceV2` rule
- `V2SessionCatalog.createTable`: V1 `FileFormat` data type validation
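The dynamic-partition-overwrite wiring mentioned in the first bullet could look roughly like the sketch below. This is not the PR's exact code; the `path` parameter and the helper name are assumptions for illustration.

```scala
// Rough sketch, not the PR's actual code: instantiating a commit protocol
// with dynamic partition overwrite enabled for a file write.
import java.util.UUID
import org.apache.spark.internal.io.FileCommitProtocol
import org.apache.spark.sql.SparkSession

def newCommitter(spark: SparkSession, path: String): FileCommitProtocol = {
  FileCommitProtocol.instantiate(
    className = spark.sessionState.conf.fileCommitProtocolClass,
    jobId = UUID.randomUUID().toString,
    outputPath = path,
    // With dynamic overwrite, only partitions that receive new data are
    // replaced; other partitions under `path` are left untouched.
    dynamicPartitionOverwrite = true)
}
```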
…catalog table loading, and gate removal

Key changes:
- `FileTable` extends `SupportsPartitionManagement` with `createPartition`, `dropPartition`, `listPartitionIdentifiers`, `partitionSchema`
- Partition operations sync to the catalog metastore (best-effort)
- `V2SessionCatalog.loadTable` returns `FileTable` instead of `V1Table`, sets `catalogTable` and `useCatalogFileIndex` on `FileTable`
- `V2SessionCatalog.getDataSourceOptions` includes `storage.properties` for proper option propagation (header, ORC bloom filter, etc.)
- `V2SessionCatalog.createTable` validates data types via `FileTable`
- `FileTable.columns()` restores NOT NULL constraints from `catalogTable`
- `FileTable.partitioning()` falls back to `userSpecifiedPartitioning` or catalog partition columns
- `FileTable.fileIndex` uses `CatalogFileIndex` when the catalog has registered partitions (custom partition locations)
- `FileTable.schema` checks column name duplication for non-catalog tables only
- `DataSourceV2Utils.getTableProvider`: removed the `FileDataSourceV2` gate
- `DataFrameWriter.insertInto`: enabled V2 for file sources
- `DataFrameWriter.saveAsTable`: V1 fallback (TODO: SPARK-56230)
- `ResolveSessionCatalog`: V1 fallback for FileTable-backed commands (`AnalyzeTable`, `AnalyzeColumn`, `TruncateTable`, `TruncatePartition`, `ShowPartitions`, `RecoverPartitions`, `AddPartitions`, `RenamePartitions`, `DropPartitions`, `SetTableLocation`, CREATE TABLE validation, REPLACE TABLE blocking)
- `FindDataSourceTable`: streaming V1 fallback for `FileTable` (TODO: SPARK-56233)
- `DataSource.planForWritingFileFormat`: graceful V2 handling
This PR takes #54998 as the baseline. It is the second step for SPARK-56170. Commit f853912 contains the actual changes for SPARK-56175. SPARK-56170 seems to involve a considerable number of tasks, and I'm not sure whether all of them can be completed before the 4.2 release.
### What changes were proposed in this pull request?
This PR is part of SPARK-56170 (Remove file source V2 gate and unify V1/V2 file source paths). It makes three major changes:
#### 1. FileTable implements SupportsPartitionManagement
`FileTable` now extends `SupportsPartitionManagement` with filesystem-based partition operations:

- `createPartition`: creates the partition directory and syncs it to the catalog metastore
- `dropPartition`: deletes the partition directory and syncs the deletion to the catalog metastore
- `listPartitionIdentifiers`: discovers partitions from the filesystem directory structure
- `partitionSchema`: returns the partition schema from `fileIndex`, `userSpecifiedPartitioning`, or `catalogTable`
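As a rough illustration of the operations above, a filesystem-backed implementation of the `SupportsPartitionManagement` interface might look like the sketch below. This is not the PR's actual code: `rootPath` and `hadoopConf` are assumed members of the concrete table, metastore sync and the remaining interface methods are elided.

```scala
// Hedged sketch of filesystem-based partition operations.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.catalog.SupportsPartitionManagement
import org.apache.spark.sql.types.StructType

trait FilesystemPartitionOps extends SupportsPartitionManagement {
  def rootPath: Path            // assumed: table root location
  def hadoopConf: Configuration // assumed: Hadoop configuration
  override def partitionSchema(): StructType

  // Hive-style layout: <root>/col1=v1/col2=v2
  private def partitionPath(ident: InternalRow): Path =
    partitionSchema().fields.zipWithIndex.foldLeft(rootPath) {
      case (dir, (field, i)) =>
        new Path(dir, s"${field.name}=${ident.get(i, field.dataType)}")
    }

  override def createPartition(
      ident: InternalRow,
      properties: java.util.Map[String, String]): Unit = {
    val dir = partitionPath(ident)
    dir.getFileSystem(hadoopConf).mkdirs(dir)
    // ...then best-effort sync to the catalog metastore
  }

  override def dropPartition(ident: InternalRow): Boolean = {
    val dir = partitionPath(ident)
    val deleted = dir.getFileSystem(hadoopConf).delete(dir, true)
    // ...then best-effort sync to the catalog metastore
    deleted
  }
}
```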
#### 2. V2SessionCatalog.loadTable returns FileTable

`V2SessionCatalog.loadTable` now returns the V2 `FileTable` (e.g., `ParquetTable`, `OrcTable`, `CSVTable`) instead of `V1Table` for file-based catalog tables. This enables V2 capabilities for catalog tables:

- `catalogTable` and `useCatalogFileIndex` are set on `FileTable` for catalog metadata access
- `getDataSourceOptions` includes `storage.properties` for proper option propagation (CSV header, ORC bloom filter columns, etc.)
- `FileTable.columns()` restores NOT NULL constraints from catalog table metadata
- `FileTable.partitioning()` falls back to catalog partition columns when `fileIndex` has no partition info
- `FileTable.fileIndex` uses `CatalogFileIndex` when the catalog has registered partitions with custom locations
- `FileTable.schema` checks column name duplication for non-catalog tables only (catalog tables are handled by the analyzer)
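Schematically, the new `loadTable` flow could be pictured as below. This is a simplified sketch, not the PR's code: `lookupFileDataSourceV2` is a hypothetical helper, and the mutable members `catalogTable` / `useCatalogFileIndex` are taken from the PR description with assumed signatures.

```scala
// Schematic only: route file-source catalog tables through the V2
// provider instead of wrapping them in V1Table.
override def loadTable(ident: Identifier): Table = {
  val v1Metadata = sessionCatalog.getTableMetadata(ident.asTableIdentifier)
  lookupFileDataSourceV2(v1Metadata) match { // hypothetical helper
    case Some(provider) =>
      // getDataSourceOptions now folds in storage.properties, so per-table
      // options (CSV header, ORC bloom filter columns, ...) propagate.
      val options = getDataSourceOptions(v1Metadata)
      val fileTable = provider.getTable(options) // ParquetTable / OrcTable / CSVTable
      fileTable.catalogTable = Some(v1Metadata)  // catalog metadata access
      fileTable.useCatalogFileIndex = true       // CatalogFileIndex when partitions registered
      fileTable
    case None =>
      V1Table(v1Metadata)                        // non-file sources unchanged
  }
}
```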
#### 3. Gate removal and V1 fallbacks

- Removed the `FileDataSourceV2` gate that prevented V2 provider resolution for file sources
- `DataFrameWriter.saveAsTable` keeps its V1 fallback for file sources, since it needs `ReplaceTableAsSelect`, which requires a `StagingTableCatalog` (TODO: SPARK-56230)
- Since `FileTable` doesn't match the V1 extractors (`ResolvedV1TableIdentifier`, etc.), we intercept these commands and delegate to V1 using `catalogTable` metadata:
  - `AnalyzeTable`, `AnalyzeColumn`
  - `TruncateTable`, `TruncatePartition`
  - `ShowPartitions`
  - `RecoverPartitions`
  - `AddPartitions`, `RenamePartitions`, `DropPartitions`
  - `SetTableLocation`
  - `CREATE TABLE` data type validation and `REPLACE TABLE` blocking for file sources
- Streaming reads of a catalog `FileTable` fall back to V1 `StreamingRelation` in `FindDataSourceTable`, since `FileTable` lacks the `MICRO_BATCH_READ`/`CONTINUOUS_READ` capabilities (TODO: SPARK-56233)
- Changed `.collect{}.head` to `.collectFirst` + `.flatMap` to gracefully handle V2 tables (which resolve to `DataSourceV2Relation` instead of `LogicalRelation`)
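The `.collect{}.head` → `.collectFirst` + `.flatMap` change in the last bullet can be illustrated with a self-contained toy plan ADT; the class names mirror Spark's, but this is not Spark code.

```scala
// Toy illustration of why collect{...}.head fails on V2 plans while
// collectFirst + flatMap degrades gracefully.
sealed trait Plan
case class LogicalRelation(format: String) extends Plan       // V1 leaf
case class DataSourceV2Relation(format: String) extends Plan  // V2 leaf

val v2Plan: Seq[Plan] = Seq(DataSourceV2Relation("parquet"))

// Before: no LogicalRelation in a V2 plan, so .head throws
// NoSuchElementException.
// val rel = v2Plan.collect { case r: LogicalRelation => r }.head

// After: collectFirst returns an Option, so a V2 plan simply yields None
// and downstream logic can flatMap over it.
val rel: Option[LogicalRelation] =
  v2Plan.collectFirst { case r: LogicalRelation => r }
val format: Option[String] = rel.flatMap(r => Option(r.format))

assert(rel.isEmpty && format.isEmpty)
```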
#### Helper: partSpecToMap

Added a `partSpecToMap` helper in `ResolveSessionCatalog` that converts a `PartitionSpec` (either `ResolvedPartitionSpec` or `UnresolvedPartitionSpec`) to V1's `Map[String, String]` format. This is needed because `ResolvePartitionSpec` may resolve partition specs before `ResolveSessionCatalog` runs (both are in the Resolution batch), and calling `asUnresolvedPartitionSpecs` on already-resolved specs would throw a `ClassCastException`.
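A hypothetical sketch of such a helper is below; the real one in `ResolveSessionCatalog` may differ, and the explicit partition-schema parameter is an assumption needed here to render `InternalRow` values as strings.

```scala
// Hedged sketch: convert either form of PartitionSpec to V1's
// Map[String, String] without casting (which would throw
// ClassCastException on already-resolved specs).
import org.apache.spark.sql.catalyst.analysis.{
  PartitionSpec, ResolvedPartitionSpec, UnresolvedPartitionSpec}
import org.apache.spark.sql.types.StructType

def partSpecToMap(
    spec: PartitionSpec,
    partSchema: StructType): Map[String, String] = spec match {
  // Not yet resolved: already carries V1's Map[String, String] shape.
  case UnresolvedPartitionSpec(raw, _) => raw
  // Already resolved by ResolvePartitionSpec: render each InternalRow
  // value back to its string form using the partition column's data type.
  case ResolvedPartitionSpec(names, ident, _) =>
    names.zipWithIndex.map { case (name, i) =>
      name -> String.valueOf(ident.get(i, partSchema(name).dataType))
    }.toMap
}
```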
### Why are the changes needed?

This is a key step toward unifying the V1/V2 file source paths (SPARK-56170). By having `V2SessionCatalog` return native V2 `FileTable` instances for file-based catalog tables, we enable the V2 capabilities listed above for catalog tables. The V1 fallbacks in `ResolveSessionCatalog` ensure backward compatibility for commands that don't yet have V2-native implementations.
### Does this PR introduce any user-facing change?

No. The behavior is functionally equivalent. Internally, catalog file tables are now loaded as V2 `FileTable` instances, but all user-visible operations produce the same results, through V1 command fallbacks where needed.
### How was this patch tested?

Existing tests were updated to handle both V1 and V2 plan structures:

- `DataStreamTableAPISuite`: streaming read with file source tables
- `InMemoryColumnarQuerySuite`: catalog stats after `ANALYZE TABLE` (TODO: SPARK-56232 for V2 stats propagation)
- `CSVv2Suite`: CSV parsing with char/varchar type columns
- `JsonV2Suite`: case sensitivity of filter references
- `OrcSourceV2Suite`: bloom filter creation and selective dictionary encoding
- `OrcV2QuerySuite`: ORC file format detection in query plans
- `ParquetV2QuerySuite`: INT96 to TIMESTAMP_MICROS migration
- `FileSourceSQLInsertTestSuite`: partition spec handling with `keepPartitionSpecAsStringLiteral`

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code 4.6