branch-4.0: [fix](search) Fix implicit conjunction incorrectly modifying preceding term in lucene mode #60814 #61011

Closed

github-actions[bot] wants to merge 1 commit into branch-4.0 from auto-pick-60814-branch-4.0

@github-actions github-actions bot commented Mar 3, 2026

Cherry-picked from #60814

…g term in lucene mode (#60814)

### What problem does this PR solve?

Issue Number: close #DORIS-24545

Problem Summary:

In the lucene mode of the `search()` function, queries that mix explicit
and implicit operators return different results from Elasticsearch. For
example:

- Query: `"Sumer" OR Ptolemaic\ dynasty Limonene` with
`default_operator=AND`
- ES result: 1 row
- Doris result: 0 rows (before fix)

**Root cause:** In Lucene's `QueryParserBase.addClause()`, only an explicit
conjunction modifies the preceding term's occur: `CONJ_AND` upgrades it to
MUST, and `CONJ_OR` relaxes it to SHOULD when the default operator is AND.
Implicit conjunction (`CONJ_NONE`, i.e., space-separated terms without an
explicit operator) affects only the **current** term via
`default_operator` and leaves the preceding term unchanged.
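For reference, the Lucene rules above can be restated as a small, self-contained Java sketch. It is a simplified re-statement of the occur handling in `QueryParserBase.addClause()` for plain terms, not the Lucene source: the `Occur`, `Conj`, and `Mod` enums and the `defaultAnd` flag are stand-ins introduced for illustration, and boosts, null queries, and nested groups are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified restatement of the occur rules in Lucene's QueryParserBase.addClause()
// for plain terms. The enums are illustrative stand-ins, not Lucene's own types.
class AddClauseSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }
    enum Conj { NONE, AND, OR }   // mirrors CONJ_NONE / CONJ_AND / CONJ_OR
    enum Mod { NONE, REQ, NOT }   // mirrors MOD_NONE / MOD_REQ / MOD_NOT

    static void addClause(List<Occur> clauses, Conj conj, Mod mod, boolean defaultAnd) {
        int last = clauses.size() - 1;
        // An explicit AND upgrades the preceding clause to MUST (unless it is MUST_NOT).
        if (last >= 0 && conj == Conj.AND && clauses.get(last) != Occur.MUST_NOT) {
            clauses.set(last, Occur.MUST);
        }
        // With default_operator=AND, an explicit OR relaxes the preceding clause to SHOULD.
        if (last >= 0 && defaultAnd && conj == Conj.OR && clauses.get(last) != Occur.MUST_NOT) {
            clauses.set(last, Occur.SHOULD);
        }
        // Conj.NONE (implicit conjunction) never touches the preceding clause;
        // default_operator only decides the occur of the current clause.
        boolean prohibited = mod == Mod.NOT;
        boolean required = defaultAnd
                ? !prohibited && conj != Conj.OR
                : mod == Mod.REQ || (conj == Conj.AND && !prohibited);
        clauses.add(prohibited ? Occur.MUST_NOT : required ? Occur.MUST : Occur.SHOULD);
    }

    public static void main(String[] args) {
        // a OR b c  with default_operator=AND
        List<Occur> clauses = new ArrayList<>();
        addClause(clauses, Conj.NONE, Mod.NONE, true); // a
        addClause(clauses, Conj.OR, Mod.NONE, true);   // OR b
        addClause(clauses, Conj.NONE, Mod.NONE, true); // c (implicit conjunction)
        System.out.println(clauses);                   // [SHOULD, SHOULD, MUST]
    }
}
```

For `a OR b c` the sketch prints `[SHOULD, SHOULD, MUST]`: the explicit OR relaxes `a` to SHOULD, and the implicit conjunction before `c` leaves `b` untouched.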

The FE `SearchDslParser.hasExplicitAndBefore()` incorrectly returned
`true` (based on `default_operator`) when no explicit AND token was
found. As a result, implicit conjunction was treated identically to
explicit AND and modified the preceding term's occur, diverging from
Lucene/ES semantics.

**Example of the bug:**

For `a OR b c` with `default_operator=AND`:
- Before fix: `SHOULD(a) MUST(b) MUST(c)`. Wrong: the implicit space before
`c` incorrectly upgraded `b` from SHOULD to MUST.
- After fix: `SHOULD(a) SHOULD(b) MUST(c)`. Correct, and matches ES
behavior: only `c` gets MUST (from `default_operator`), while `b` retains
SHOULD (from the preceding OR).
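The occur of `b` matters because it changes which documents match. A minimal sketch of Lucene-style boolean matching makes this concrete, assuming the default `minimum_should_match` behavior: every MUST clause must match, no MUST_NOT clause may match, and at least one SHOULD clause must match only when the query has no MUST clause. The `Clause` record, the `matches()` helper, and the single-term document are hypothetical.

```java
import java.util.List;
import java.util.Set;

// Sketch of boolean clause matching for one document (default minimum_should_match).
// Clause, matches() and the sample document are illustrative, not Lucene or Doris APIs.
class BooleanMatchSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    record Clause(Occur occur, String term) {}

    static boolean matches(List<Clause> query, Set<String> docTerms) {
        boolean hasMust = false;
        boolean anyShould = false;
        for (Clause c : query) {
            boolean hit = docTerms.contains(c.term());
            switch (c.occur()) {
                case MUST -> { hasMust = true; if (!hit) return false; } // every MUST must hit
                case MUST_NOT -> { if (hit) return false; }            // no MUST_NOT may hit
                case SHOULD -> anyShould |= hit;
            }
        }
        // SHOULD clauses are optional when a MUST clause exists; otherwise one must hit.
        return hasMust || anyShould;
    }

    public static void main(String[] args) {
        Set<String> doc = Set.of("c"); // hypothetical document containing only "c"
        List<Clause> before = List.of(new Clause(Occur.SHOULD, "a"),
                new Clause(Occur.MUST, "b"), new Clause(Occur.MUST, "c"));
        List<Clause> after = List.of(new Clause(Occur.SHOULD, "a"),
                new Clause(Occur.SHOULD, "b"), new Clause(Occur.MUST, "c"));
        System.out.println(matches(before, doc)); // false
        System.out.println(matches(after, doc));  // true
    }
}
```

For a document containing only `c`, the pre-fix clause list rejects it because `b` became mandatory, while the post-fix list accepts it.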

**Fix:** `hasExplicitAndBefore()` now returns `false` when no explicit
AND token is found, regardless of `default_operator`. Only explicit AND
tokens trigger the "introduced by AND" logic that modifies preceding
terms.
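The change can be illustrated with a hypothetical sketch. It models the query as a flat list of string tokens; the `hasExplicitAndBeforeOld`/`hasExplicitAndBeforeNew` methods, the `occurs()` helper, and its OR handling (mirroring Lucene with `default_operator=AND`) are assumptions for illustration, not the actual `SearchDslParser` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the hasExplicitAndBefore() change; the token model
// and method names are assumptions, not Doris's SearchDslParser implementation.
class ExplicitAndSketch {
    static boolean isOp(String t) { return t.equals("AND") || t.equals("OR"); }

    // "Before": falls back to default_operator when no explicit AND precedes the term.
    // Assumption: an explicit OR before the term still returns false.
    static boolean hasExplicitAndBeforeOld(List<String> toks, int i, boolean defaultAnd) {
        if (i > 0 && toks.get(i - 1).equals("AND")) return true;
        if (i > 0 && toks.get(i - 1).equals("OR")) return false;
        return defaultAnd; // bug: implicit conjunction treated like explicit AND
    }

    // "After": only an explicit AND token counts.
    static boolean hasExplicitAndBeforeNew(List<String> toks, int i) {
        return i > 0 && toks.get(i - 1).equals("AND");
    }

    // Assigns occurs to a flat "term (op? term)*" query with default_operator=AND.
    static String occurs(List<String> toks, boolean fixed) {
        List<String> occ = new ArrayList<>();
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < toks.size(); i++) {
            String t = toks.get(i);
            if (isOp(t)) continue;
            boolean byAnd = fixed ? hasExplicitAndBeforeNew(toks, i)
                                  : hasExplicitAndBeforeOld(toks, i, true);
            boolean byOr = i > 0 && toks.get(i - 1).equals("OR");
            int last = occ.size() - 1;
            if (last >= 0 && byAnd) occ.set(last, "MUST");  // "introduced by AND"
            if (last >= 0 && byOr) occ.set(last, "SHOULD"); // OR relaxes predecessor
            occ.add(byOr ? "SHOULD" : "MUST");              // current term: default AND
            terms.add(t);
        }
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < occ.size(); k++) {
            if (k > 0) sb.append(' ');
            sb.append(occ.get(k)).append('(').append(terms.get(k)).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> q = List.of("a", "OR", "b", "c");
        System.out.println(occurs(q, false)); // SHOULD(a) MUST(b) MUST(c)
        System.out.println(occurs(q, true));  // SHOULD(a) SHOULD(b) MUST(c)
    }
}
```

Under these assumptions, an explicit AND still upgrades its predecessor after the fix (`a OR b AND c` yields `SHOULD(a) MUST(b) MUST(c)`); only the implicit case changes.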
@github-actions github-actions bot requested a review from yiguolei as a code owner March 3, 2026 16:37
@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring closed this Mar 3, 2026
@dataroaring dataroaring reopened this Mar 3, 2026
@hello-stephen

run buildall

@hello-stephen

FE UT Coverage Report

Increment line coverage 66.67% (2/3) 🎉
Increment coverage report
Complete coverage report

@yiguolei yiguolei closed this Mar 4, 2026