fix: enable more Spark SQL tests for native_datafusion (DynamicPartitionPruningSuite / ExplainSuite) #3694
Open
andygrove wants to merge 7 commits into apache:main from
Update CometNativeScan.isDynamicPruningFilter to check for DynamicPruningExpression in addition to PlanExpression. The previous check only caught dynamic DPP (with subqueries) but missed static DPP where Spark resolves the pruning expression to a literal wrapped in DynamicPruningExpression. Closes apache#3313
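To illustrate the distinction this commit message draws, here is a minimal, self-contained sketch. It uses a toy expression tree, not Spark's classes: `Expr`, `SubqueryExpr`, `DynamicPruning`, and the `DppCheck` helpers are hypothetical stand-ins for Spark's expression types, `PlanExpression`, and `DynamicPruningExpression`. It only shows why a subquery-based check would miss a wrapper around a plain literal.

```scala
// Toy expression tree. These types are hypothetical stand-ins, not Spark's classes.
sealed trait Expr { def children: Seq[Expr] }
case class Literal(value: Any) extends Expr { val children: Seq[Expr] = Nil }
case class Attribute(name: String) extends Expr { val children: Seq[Expr] = Nil }
// Stand-in for an expression that embeds a subquery plan (PlanExpression in Spark).
case class SubqueryExpr(planName: String) extends Expr { val children: Seq[Expr] = Nil }
case class In(value: Expr, list: Expr) extends Expr {
  def children: Seq[Expr] = Seq(value, list)
}
// Stand-in for the wrapper placed around a DPP predicate (DynamicPruningExpression in Spark).
case class DynamicPruning(child: Expr) extends Expr {
  def children: Seq[Expr] = Seq(child)
}

object DppCheck {
  // True if the node itself or any descendant satisfies p.
  def anyNode(e: Expr)(p: Expr => Boolean): Boolean =
    p(e) || e.children.exists(c => anyNode(c)(p))

  // Check based only on the presence of a subquery.
  def containsSubquery(e: Expr): Boolean = anyNode(e)(_.isInstanceOf[SubqueryExpr])

  // Check based on the presence of the DPP wrapper.
  def containsWrapper(e: Expr): Boolean = anyNode(e)(_.isInstanceOf[DynamicPruning])

  // "Dynamic" DPP: the wrapper contains a subquery.
  val dynamicDpp: Expr =
    DynamicPruning(In(Attribute("part_col"), SubqueryExpr("dim_filter")))

  // "Static" DPP: the wrapper contains only a literal.
  val staticDpp: Expr = DynamicPruning(Literal(true))

  def main(args: Array[String]): Unit = {
    // prints: dynamic: subquery=true wrapper=true
    println(s"dynamic: subquery=${containsSubquery(dynamicDpp)} wrapper=${containsWrapper(dynamicDpp)}")
    // prints: static: subquery=false wrapper=true
    println(s"static: subquery=${containsSubquery(staticDpp)} wrapper=${containsWrapper(staticDpp)}")
  }
}
```

In this toy model, the subquery-only check returns false for the static case, which is the gap described above. Whether the real scan code needs the wrapper check depends on which expressions actually reach it.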
- Add verboseStringWithOperatorId() with Location abbreviation so EXPLAIN FORMATTED shows scan metadata correctly
- Share driver metrics (numFiles, filesSize, numPartitions, etc.) from the underlying CometScanExec so Spark scan metric tests pass
- Add CometNativeScanExec to SparkPlanInfo metadata extraction and DPP test scan pattern matching in the Spark diff
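As a rough sketch of what "Location abbreviation" means here: when a scan has many root paths, the explain output keeps the first path and summarizes the rest. The helper below is hypothetical (`LocationAbbrev.abbreviate` is not a Spark or Comet function), and the exact output format is an assumption modeled loosely on how Spark's file scan prints its Location entry.

```scala
object LocationAbbrev {
  // Keep the first root path and summarize the remaining ones.
  // The "[first, ... N entries]" format is an assumption for illustration.
  def abbreviate(rootPaths: Seq[String]): String =
    if (rootPaths.length <= 1) rootPaths.mkString("[", ", ", "]")
    else s"[${rootPaths.head}, ... ${rootPaths.length - 1} entries]"

  def main(args: Array[String]): Unit = {
    // prints: Location: [file:/tmp/t/p=1]
    println("Location: " + abbreviate(Seq("file:/tmp/t/p=1")))
    // prints: Location: [file:/tmp/t/p=1, ... 2 entries]
    println("Location: " + abbreviate(
      Seq("file:/tmp/t/p=1", "file:/tmp/t/p=2", "file:/tmp/t/p=3")))
  }
}
```

Abbreviating keeps EXPLAIN FORMATTED output stable and short for partitioned tables, which matters when a test compares the output against an expected string.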
The isDynamicPruningFilter check in CometNativeScan only needs to check for PlanExpression since DPP subqueries always contain one. The DynamicPruningExpression wrapper check is not needed.
This test is now covered by the Spark SQL test suite with native_datafusion scan mode enabled.
No changes to CometNativeScan.scala are needed for this PR.
Which issue does this PR close?
Closes #3313.
Rationale for this change
Two Spark SQL tests were skipped for native_datafusion scan mode via IgnoreCometNativeDataFusion:
- DynamicPartitionPruningSuite - "static scan metrics"
- ExplainSuite - "explain formatted - check presence of subquery in case of DPP"

These tests failed because CometNativeScanExec was missing driver-side scan metrics (numFiles, filesSize, numPartitions, etc.) and proper EXPLAIN FORMATTED output with Location abbreviation.

What changes are included in this PR?
CometNativeScanExec (Comet code):
- Add verboseStringWithOperatorId() with Location path abbreviation, matching FileSourceScanExec and CometScanExec behavior so EXPLAIN FORMATTED output is correct
- Share driver metrics (numFiles, filesSize, numPartitions, metadataTime, staticFilesNum, staticFilesSize, pruningTime) from the underlying CometScanExec so Spark scan metric assertions pass

Spark diff (3.5.8.diff):
- Add CometNativeScanExec to SparkPlanInfo metadata extraction for event logging
- Add CometNativeScanExec to DPP test scan pattern matching
- Remove IgnoreCometNativeDataFusion tags from both tests

How are these changes tested?
Tested locally by running both Spark SQL tests with COMET_PARQUET_SCAN_IMPL=native_datafusion:
- DynamicPartitionPruningV1SuiteAEOff - "static scan metrics" - PASSED
- ExplainSuite - "explain formatted - check presence of subquery in case of DPP" - PASSED

Also verified both tests pass with default (auto) scan mode.
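The "static scan metrics" test asserts on driver-side metrics of the scan node, so the wrapping scan has to expose the metrics that the underlying scan populates. A minimal sketch of that sharing pattern follows. All types here (`Metric`, `WrappedScan`, `NativeScan`) are hypothetical stand-ins rather than Spark's SQLMetric or Comet's scan operators, and `output_rows` is an illustrative metric name.

```scala
// Hypothetical stand-in for a mutable SQL metric.
final class Metric(val name: String) { var value: Long = 0L }

// Stand-in for the underlying scan that lists files on the driver.
class WrappedScan {
  val driverMetrics: Map[String, Metric] =
    Seq("numFiles", "filesSize", "numPartitions").map(n => n -> new Metric(n)).toMap

  // Populate driver-side metrics from a listing of file sizes.
  def listFiles(sizes: Seq[Long]): Unit = {
    driverMetrics("numFiles").value = sizes.length
    driverMetrics("filesSize").value = sizes.sum
  }
}

// Stand-in for the native scan that wraps the scan above.
class NativeScan(underlying: WrappedScan) {
  private val nativeMetrics: Map[String, Metric] =
    Map("output_rows" -> new Metric("output_rows"))

  // Share the same Metric instances instead of copying values, so updates
  // made by the underlying scan are visible through this node.
  val metrics: Map[String, Metric] = nativeMetrics ++ underlying.driverMetrics
}

object MetricsDemo {
  def main(args: Array[String]): Unit = {
    val wrapped = new WrappedScan
    val native = new NativeScan(wrapped)
    wrapped.listFiles(Seq(10L, 20L, 30L))
    // prints: numFiles=3 filesSize=60
    println(s"numFiles=${native.metrics("numFiles").value} " +
      s"filesSize=${native.metrics("filesSize").value}")
  }
}
```

Sharing instances rather than snapshots means the order of operations does not matter: the metrics map can be built before file listing happens and still reflect the final values.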