docs: document datetime rebasing and V2 API limitations for DataFusion-based scans #3259
Merged
comphead merged 1 commit into apache:main on Jan 25, 2026
…n-based scans

Add two new limitations to the shared limitations section for native_datafusion and native_iceberg_compat scan implementations:

1. No support for datetime rebasing detection or the spark.comet.exceptionOnDatetimeRebase configuration. When reading Parquet files with dates/timestamps written before Spark 3.0 (hybrid Julian/Gregorian calendar), these implementations cannot detect legacy values and may produce incorrect results for dates before October 15, 1582.
2. No support for Spark's Datasource V2 API. When V2 is enabled, Comet falls back to native_comet.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
comphead approved these changes on Jan 25, 2026

comphead (Contributor) left a comment:
Thanks @andygrove for documenting this.
Summary
`native_datafusion` and `native_iceberg_compat` do not support datetime rebasing detection.

Background
While investigating `ParquetDatetimeRebaseSuite` tests that explicitly set `native_comet`, we discovered these are intentional limitations of the DataFusion-based scan implementations, not test issues.

Datetime Rebasing
Parquet files written before Spark 3.0 may contain dates/timestamps using the hybrid Julian/Gregorian calendar. The `native_comet` implementation can detect these legacy values and throws a `SparkException` when `spark.comet.exceptionOnDatetimeRebase=true`.

The DataFusion-based implementations (`native_datafusion`, `native_iceberg_compat`) do not have this detection capability and read all dates/timestamps as Proleptic Gregorian, which may produce incorrect results for dates before October 15, 1582.

Datasource V2 API
The DataFusion-based implementations only support Spark's V1 datasource API. When `spark.sql.sources.useV1SourceList` does not include `parquet`, Comet falls back to `native_comet`.

Test plan
🤖 Generated with Claude Code
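To make the two limitations described in this PR concrete, here is a minimal Python sketch. It is an illustration only and is not taken from Comet or Spark. It assumes a legacy writer stores a date as days since 1970-01-01 counted in the hybrid Julian/Gregorian calendar, and it shows what a reader gets when it interprets that number as Proleptic Gregorian. All function names (`hybrid_days_since_epoch`, `read_as_proleptic_gregorian`, `may_need_rebase`, `parquet_uses_v1_source`) are hypothetical helpers, and the V1 check assumes `spark.sql.sources.useV1SourceList` is a comma-separated list of source names.

```python
from datetime import date

UNIX_EPOCH_JDN = 2440588          # Julian Day Number of 1970-01-01
GREGORIAN_START = (1582, 10, 15)  # first day counted in the Gregorian calendar


def _jdn(year, month, day, gregorian):
    """Julian Day Number of a calendar label (standard integer formula)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4
    if gregorian:
        return jdn - y // 100 + y // 400 - 32045
    return jdn - 32083  # Julian calendar: no century leap-year correction


def hybrid_days_since_epoch(year, month, day):
    """Days since 1970-01-01 as a hybrid-calendar writer would store the label."""
    uses_gregorian = (year, month, day) >= GREGORIAN_START
    return _jdn(year, month, day, uses_gregorian) - UNIX_EPOCH_JDN


def read_as_proleptic_gregorian(days):
    """Interpret a stored day count the way a non-rebasing reader would."""
    return date.fromordinal(date(1970, 1, 1).toordinal() + days)


def may_need_rebase(days):
    """Simplified check: only values before the cutover can differ.

    Real detection also needs to know which calendar the writer used;
    that is not modelled here.
    """
    cutover = _jdn(*GREGORIAN_START, True) - UNIX_EPOCH_JDN
    return days < cutover


def parquet_uses_v1_source(use_v1_source_list):
    """Hypothetical helper: is 'parquet' in the comma-separated V1 list?"""
    names = [n.strip().lower() for n in use_v1_source_list.split(",")]
    return "parquet" in names


if __name__ == "__main__":
    stored = hybrid_days_since_epoch(1000, 1, 1)
    print(read_as_proleptic_gregorian(stored))  # 1000-01-06
    print(may_need_rebase(stored))              # True
    print(parquet_uses_v1_source("avro,csv"))   # False
```

Under these assumptions, a value written as 1000-01-01 in the hybrid calendar is read back as 1000-01-06 without rebasing, while dates on or after October 15, 1582 round-trip unchanged.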