branch-4.0: [fix](search) Fix implicit conjunction incorrectly modifying preceding term in lucene mode #60814 #61011
Closed
github-actions[bot] wants to merge 1 commit into branch-4.0
…g term in lucene mode (#60814)

### What problem does this PR solve?

Issue Number: close #DORIS-24545

Problem Summary:

In `search()` function's lucene mode, queries with mixed explicit and implicit operators produce different results from Elasticsearch. For example:

- Query: `"Sumer" OR Ptolemaic\ dynasty Limonene` with `default_operator=AND`
- ES result: 1 row
- Doris result: 0 rows (before fix)

**Root cause:**

In Lucene's `QueryParserBase.addClause()`, only explicit `CONJ_AND`/`CONJ_OR` modify the preceding term's occur. Implicit conjunction (`CONJ_NONE`, i.e., space-separated terms without an explicit operator) only affects the **current** term via `default_operator`, without modifying the preceding term.

The FE `SearchDslParser.hasExplicitAndBefore()` incorrectly returned `true` (based on `default_operator`) when no explicit AND token was found. This caused implicit conjunction to be treated identically to explicit AND, making it modify the preceding term's occur, which diverges from Lucene/ES semantics.

**Example of the bug:**

For `a OR b c` with `default_operator=AND`:

- Before fix: `SHOULD(a) MUST(b) MUST(c)` (wrong: the implicit space before `c` incorrectly upgraded `b` from SHOULD to MUST)
- After fix: `SHOULD(a) SHOULD(b) MUST(c)` (correct, matches ES behavior). Only `c` gets MUST (from default_operator); `b` retains SHOULD (from the preceding OR)

**Fix:**

`hasExplicitAndBefore()` now returns `false` when no explicit AND token is found, regardless of `default_operator`. Only explicit AND tokens trigger the "introduced by AND" logic that modifies preceding terms.
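The occur assignment described in this PR can be illustrated with a small, simplified model. The sketch below is not the Doris `SearchDslParser` code or the Lucene API: the function `assign_occurs`, its parameters, and the `implicit_modifies_previous` flag are hypothetical names introduced only for illustration. It assumes a flat list of terms, each tagged with the conjunction that introduces it (`"AND"`, `"OR"`, or `None` for an implicit space), and it ignores modifiers such as `+`, `-`, and `NOT` as well as nested groups.

```python
from typing import List, Optional, Tuple

MUST, SHOULD = "MUST", "SHOULD"


def assign_occurs(
    clauses: List[Tuple[Optional[str], str]],
    default_operator: str = "AND",
    implicit_modifies_previous: bool = False,
) -> List[Tuple[str, str]]:
    """Assign MUST/SHOULD to each term of a flat query.

    clauses: (conjunction, term) pairs; conjunction is "AND", "OR",
             or None for an implicit (space-only) conjunction.
    implicit_modifies_previous: hypothetical switch that reproduces the
             pre-fix behaviour, where an implicit conjunction under
             default_operator=AND was treated like an explicit AND.
    """
    result: List[Tuple[str, str]] = []
    for conj, term in clauses:
        effective = conj
        if conj is None and implicit_modifies_previous and default_operator == "AND":
            effective = "AND"  # pre-fix: implicit space acts as explicit AND

        # An explicit AND makes the preceding term required.
        if result and effective == "AND":
            result[-1] = (result[-1][0], MUST)

        # Under default_operator=AND, an explicit OR makes the preceding
        # term optional.
        if result and default_operator == "AND" and effective == "OR":
            result[-1] = (result[-1][0], SHOULD)

        # The current term's occur: an implicit conjunction falls back to
        # default_operator and never touches the preceding term.
        if default_operator == "OR":
            occur = MUST if effective == "AND" else SHOULD
        else:
            occur = SHOULD if effective == "OR" else MUST
        result.append((term, occur))
    return result


# The query `a OR b c` from the description above.
query = [(None, "a"), ("OR", "b"), (None, "c")]
after_fix = assign_occurs(query, "AND")
before_fix = assign_occurs(query, "AND", implicit_modifies_previous=True)
print(after_fix)
print(before_fix)
```

With the flag off, `b` keeps the SHOULD it received from the explicit OR and only `c` picks up MUST from `default_operator`; with the flag on, the implicit space before `c` upgrades `b` to MUST, which is the pre-fix result described above.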
Cherry-picked from #60814