UN-3215 [FIX] Add LLMCompat bridge class to fix retriever LLM compatibility with llama-index#1788
pk-zipstack wants to merge 10 commits into `main`
Conversation
Summary by CodeRabbit
Walkthrough

The changes introduce a compatibility layer to adapt the SDK1 LLM interface for use with llama-index retriever components. This includes new emulated llama-index types (MessageRole, ChatMessage, ChatResponse, CompletionResponse, LLMMetadata) and a wrapper class LLMCompat in SDK1, along with a RetrieverLLM adapter in the prompt service that bridges the two interfaces.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Retriever as BaseRetriever
    participant Converter as _get_llm()
    participant LLMCompat
    participant RetrieverLLM
    participant LlamaIndex as llama-index<br/>Components

    Client->>Retriever: Initialize with LLM
    Retriever->>Converter: _get_llm(llm)
    Converter->>LLMCompat: Wrap LLM instance
    LLMCompat-->>Converter: Return LLMCompat
    Converter->>RetrieverLLM: Initialize with LLMCompat
    RetrieverLLM-->>Converter: Return RetrieverLLM instance
    Converter-->>Retriever: Return RetrieverLLM
    Retriever->>LlamaIndex: Pass RetrieverLLM to retriever ops
    LlamaIndex->>RetrieverLLM: Call chat/complete methods
    RetrieverLLM->>LLMCompat: Delegate to internal LLMCompat
    LLMCompat->>LLMCompat: Route to chat/complete/predict
    LLMCompat-->>RetrieverLLM: Return ChatResponse/CompletionResponse
    RetrieverLLM-->>LlamaIndex: Return response
```
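The delegation flow in the diagram can be sketched with stand-in classes (illustrative only; in the real code `RetrieverLLM` inherits from llama-index's `LLM` base class and `LLMCompat` routes through litellm):

```python
from dataclasses import dataclass


@dataclass
class CompletionResponse:
    text: str


class LLMCompat:
    """Layer 1: plain class, no llama-index dependency (lives in SDK1)."""

    def complete(self, prompt: str) -> CompletionResponse:
        # The real class calls litellm.completion(); stubbed here.
        return CompletionResponse(text=f"echo: {prompt}")


class RetrieverLLM:
    """Layer 2 stand-in: the real class subclasses llama_index.core.llms.llm.LLM."""

    def __init__(self, compat: LLMCompat) -> None:
        self._compat = compat

    def complete(self, prompt: str) -> CompletionResponse:
        # Delegate to LLMCompat; the real code also converts the emulated
        # response back into the real llama-index CompletionResponse type.
        return self._compat.complete(prompt)


bridge = RetrieverLLM(LLMCompat())
print(bridge.complete("hello").text)  # → echo: hello
```

The split keeps llama-index imports out of SDK1 while still satisfying llama-index's type checks in the prompt service.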
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
for more information, see https://pre-commit.ci
[FEAT] Rewrite LLMCompat to emulate llama-index interface without dependency (#1793)

* [FEAT] Rewrite LLMCompat to emulate llama-index interface without dependency
  - Add emulated llama-index types (MessageRole, ChatMessage, ChatResponse, CompletionResponse, LLMMetadata) as dataclasses
  - Rewrite LLMCompat to match llama-index LLM interface without inheritance
  - Implement chat(), complete(), achat(), acomplete() for retriever compatibility
  - Follow EmbeddingCompat initialization pattern (takes adapter params directly)
  - Remove llama-index imports from llm.py
  - Raise NotImplementedError for streaming methods not needed by retrievers

  This allows SubQuestionQueryEngine, QueryFusionRetriever, and other llama-index components to use LLMCompat without requiring llama-index as a dependency.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude <noreply@anthropic.com>
* Added RetrieverLLM class to handle issues with retrievers
* [pre-commit.ci] auto fixes from pre-commit.com hooks

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: pk-zipstack <praveen@zipstack.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Actionable comments posted: 4
🧹 Nitpick comments (6)
unstract/sdk1/src/unstract/sdk1/llm.py (4)
36-43: Prefer `enum.StrEnum` over `(str, Enum)` dual inheritance.

Python 3.11+ provides `StrEnum`, which is the idiomatic replacement. Since the codebase already uses `|` union types (Python 3.10+), `StrEnum` should be available.

♻️ Suggested change

```diff
-from enum import Enum
+from enum import StrEnum

-class MessageRole(str, Enum):
+class MessageRole(StrEnum):
     """Emulates llama_index.core.base.llms.types.MessageRole."""
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/sdk1/src/unstract/sdk1/llm.py` around lines 36 - 43, Replace the MessageRole class to inherit from enum.StrEnum instead of (str, Enum): import StrEnum from enum and change the class definition for MessageRole to subclass StrEnum; keep the same member names and values (SYSTEM, USER, ASSISTANT, FUNCTION, TOOL) so behavior and comparisons remain identical.
659-676: `predict` and `apredict` assume a `PromptTemplate`-like interface on `prompt`.

These methods call `prompt.format_messages(llm=self, ...)` and `prompt.format(llm=self, ...)`, expecting a llama-index `PromptTemplate` object. The type is annotated as `Any`, which hides this contract. Since llama-index's `LLM.predict` receives a `BasePromptTemplate`, this should work — but if ever called with a plain string, it'll raise `AttributeError`.

This is acceptable for the bridge use case, but documenting the expected type (even as a comment) would help maintainability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/sdk1/src/unstract/sdk1/llm.py` around lines 659 - 676, predict and apredict assume a PromptTemplate-like object (they call prompt.format_messages(...) and prompt.format(...)) but are annotated as Any; update the contract by changing the type annotation from Any to the appropriate llama-index type (e.g. BasePromptTemplate or PromptTemplate) or at minimum add a clear inline comment/docstring stating the expected type, and add a simple runtime check in predict and apredict that raises a descriptive TypeError if the prompt lacks format/format_messages (e.g. if not hasattr(prompt, "format") and not hasattr(prompt, "format_messages")), referencing the predict/apredict methods and self.metadata.is_chat_model, so callers get a clear error instead of an AttributeError.
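One way to express the runtime check the prompt suggests, as a hypothetical mixin (the real `predict` also formats the prompt and invokes the model). Note a caveat with the suggested `hasattr` test: a plain `str` has a `.format` method, so only objects lacking both attributes are rejected:

```python
from typing import Any


class PredictGuardMixin:
    """Sketch of the guard suggested above for predict()/apredict().

    The prompt argument is expected to be a llama-index
    BasePromptTemplate-like object exposing format()/format_messages().
    """

    def _require_prompt_template(self, prompt: Any) -> None:
        if not hasattr(prompt, "format") and not hasattr(prompt, "format_messages"):
            raise TypeError(
                "predict()/apredict() expect a PromptTemplate-like object "
                f"with format()/format_messages(), got {type(prompt).__name__}"
            )


guard = PredictGuardMixin()
try:
    # A dict has neither attribute, so the guard rejects it with a clear error.
    guard._require_prompt_template({"not": "a template"})
except TypeError as exc:
    print(exc)

# Caveat: str.format exists, so a bare string still slips past this check.
guard._require_prompt_template("plain string")
```

Checking `format_messages` specifically (or an explicit `isinstance` against `BasePromptTemplate` in the prompt-service layer) would close the string loophole.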
574-628: `LLMCompat` creates a second `LLM` instance — consider accepting an existing one.

`RetrieverLLM.__init__` (in `retriever_llm.py`, Line 33-43) reads private attributes (`llm._adapter_id`, `llm._adapter_metadata`, etc.) from the SDK1 `LLM` to construct `LLMCompat`, which then creates another `LLM` internally (Line 613). This means every retriever bridge creates a duplicate `LLM` with duplicate adapter validation, platform config fetching, and callback setup.

Consider allowing `LLMCompat` to accept an existing `LLM` instance directly rather than always constructing a new one, to avoid this duplication and the reliance on private attributes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/sdk1/src/unstract/sdk1/llm.py` around lines 574 - 628, LLMCompat currently always constructs a new LLM (in LLMCompat.__init__) causing duplicate LLMs; change the constructor to accept an optional existing LLM instance parameter (e.g., llm_instance: LLM | None = None) and if provided set self._llm_instance = llm_instance and skip creating a new LLM, while still setting self._tool, self._adapter_instance_id, self.model_name = self._llm_instance.get_model_name(), and self.callback_manager = self._llm_instance.callback_manager (or None) to preserve behavior; retain the original parameter list and only construct a new LLM when llm_instance is None so RetrieverLLM can pass the SDK1 LLM directly and avoid duplicated adapter validation and config fetching.
804-815: Add comment documenting the duck-typing contract for ChatMessage compatibility.

`_to_litellm_messages` relies on duck typing to accept both the emulated `ChatMessage` (defined in this file) and real `llama_index.core.base.llms.types.ChatMessage` objects, since `achat()` and `acomplete()` may receive either. Both provide `.role` and `.content` attributes, with the `getattr(m.role, "value", ...)` pattern handling both enum and string variants.

While the `.content` attribute is maintained in llama-index as a backward-compatibility property (returning `str | None` from the underlying block-based structure), a brief comment here clarifying the intentional duck-typing contract would improve maintainability and signal to future maintainers why both types are accepted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/sdk1/src/unstract/sdk1/llm.py` around lines 804 - 815, Add a brief doc-comment above the _to_litellm_messages function explaining the duck-typing contract: that the function accepts both the emulated ChatMessage defined in this module and llama_index.core.base.llms.types.ChatMessage objects because both expose .role and .content, and that getattr(m.role, "value", str(m.role)) handles enum or string roles while m.content may be str | None; mention this is intentional to support inputs from achat() and acomplete().

prompt-service/src/unstract/prompt_service/core/retrievers/retriever_llm.py (1)
15-15: Remove unused `noqa` directives flagged by Ruff.

Static analysis (Ruff RUF100) reports 10 unused `noqa` directives across this file (`N811`, `ANN401`). These should be removed to keep the codebase clean.

Also applies to: 33-33, 57-57, 72-72, 80-80, 88-88, 97-97, 112-112, 120-120, 128-128
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prompt-service/src/unstract/prompt_service/core/retrievers/retriever_llm.py` at line 15, Remove the redundant noqa directives reported by Ruff (RUF100) in retriever_llm.py: delete the unused "# noqa: N811" on the import "from llama_index.core.llms.llm import LLM as LlamaIndexBaseLLM" and likewise remove the other unused "# noqa" tags on the affected lines (the import/annotation lines referencing LlamaIndexBaseLLM and any annotated functions/classes flagged with ANN401). Ensure you only remove the "# noqa" comments (not the imports or annotations themselves), run Ruff/pytest to verify no new linter errors, and keep the symbols like LlamaIndexBaseLLM unchanged so references in the file remain valid.

llm-predict-error-investigation.md (1)
1-221: Debug artifact should not be committed to `main`.

This file reads as a transient investigation note, not stable documentation. Committing it to `main` introduces noise with no durable value: it already contains stale content (see below) and will continue diverging from the codebase over time.

If a record of the design decision is needed, consider:

- Converting it to an Architecture Decision Record (ADR) in a `docs/adr/` folder, keeping only the decision, rationale, and final chosen approach.
- Moving the content into the PR description (where it already lives as context) and dropping the file entirely.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@llm-predict-error-investigation.md` around lines 1 - 221, This investigation file is a debug artifact and should not be committed to main; remove llm-predict-error-investigation.md from the branch (or revert the commit) and either convert its essential outcome into a concise ADR under docs/adr/ (summarizing the chosen approach such as the SDK1LLMBridge, use of LiteLLM, or SimpleKeywordTableIndex and the BaseRetriever/KeywordTableIndex impact) or move the full investigative content into the PR description/temporary notes, keeping only the final decision and rationale in the repo.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@llm-predict-error-investigation.md`:
- Around line 150-184: Update the document's Option 1 example to match the
actual two-layer implementation: show LLMCompat (from
unstract/sdk1/src/unstract/sdk1/llm.py) that emulates llama-index types and
RetrieverLLM (from prompt-service/.../core/retrievers/retriever_llm.py) which
inherits from llama_index.core.llms.llm.LLM and delegates to LLMCompat; replace
the single-class SDK1LLMBridge/CustomLLM example with a short description and
class names matching LLMCompat -> RetrieverLLM delegation and note that
RetrieverLLM inherits from llama_index.core.llms.llm.LLM (not CustomLLM).
- Line 5: Add explicit language specifiers to the two fenced code blocks to
satisfy MD040: change the traceback block that starts with "ERROR:
unstract.prompt_service.core.retrievers.keyword_table:" to use ```text and
change the block containing "BaseLLM -> LLM (has predict()) ->
CustomLLM/FunctionCallingLLM -> Provider" to use ```python so markdownlint
recognizes their languages.
In `@prompt-service/src/unstract/prompt_service/core/retrievers/retriever_llm.py`:
- Around line 33-43: Declare _compat as a Pydantic PrivateAttr at the class
level for v2 compatibility: import PrivateAttr from pydantic and add a class
attribute like "_compat: Any = PrivateAttr()" (or similar typed PrivateAttr) on
the retriever class, then keep the existing assignment to self._compat in
__init__ where LLMCompat(...) is created (referencing __init__, LLMCompat, and
the _compat attribute).
In `@unstract/sdk1/src/unstract/sdk1/llm.py`:
- Around line 698-728: LLMCompat.chat() and LLMCompat.complete() call
litellm.completion() directly, skipping the SDK’s usage recording and structured
error wrapping used elsewhere; update these methods to (1) call into the
internal LLM instance’s usage recorder (e.g., invoke
self._llm_instance._record_usage(...) or the same _record_usage flow used by
LLM.complete) after getting the response so usage/audit data is emitted, (2)
wrap litellm.completion() in a try/except and re-raise errors as LLMError to
match callers' expectations, and (3) if metrics capture is required, apply the
same `@capture_metrics` behavior or add a TODO comment documenting intentional
omission; locate changes in the LLMCompat.chat and LLMCompat.complete methods
around the litellm.completion(...) calls and the response handling to implement
these fixes.
---
Nitpick comments:
In `@llm-predict-error-investigation.md`:
- Around line 1-221: This investigation file is a debug artifact and should not
be committed to main; remove llm-predict-error-investigation.md from the branch
(or revert the commit) and either convert its essential outcome into a concise
ADR under docs/adr/ (summarizing the chosen approach such as the SDK1LLMBridge,
use of LiteLLM, or SimpleKeywordTableIndex and the
BaseRetriever/KeywordTableIndex impact) or move the full investigative content
into the PR description/temporary notes, keeping only the final decision and
rationale in the repo.
In `@prompt-service/src/unstract/prompt_service/core/retrievers/retriever_llm.py`:
- Line 15: Remove the redundant noqa directives reported by Ruff (RUF100) in
retriever_llm.py: delete the unused "# noqa: N811" on the import "from
llama_index.core.llms.llm import LLM as LlamaIndexBaseLLM" and likewise remove
the other unused "# noqa" tags on the affected lines (the import/annotation
lines referencing LlamaIndexBaseLLM and any annotated functions/classes flagged
with ANN401). Ensure you only remove the "# noqa" comments (not the imports or
annotations themselves), run Ruff/pytest to verify no new linter errors, and
keep the symbols like LlamaIndexBaseLLM unchanged so references in the file
remain valid.
In `@unstract/sdk1/src/unstract/sdk1/llm.py`:
- Around line 36-43: Replace the MessageRole class to inherit from enum.StrEnum
instead of (str, Enum): import StrEnum from enum and change the class definition
for MessageRole to subclass StrEnum; keep the same member names and values
(SYSTEM, USER, ASSISTANT, FUNCTION, TOOL) so behavior and comparisons remain
identical.
- Around line 659-676: predict and apredict assume a PromptTemplate-like object
(they call prompt.format_messages(...) and prompt.format(...)) but are annotated
as Any; update the contract by changing the type annotation from Any to the
appropriate llama-index type (e.g. BasePromptTemplate or PromptTemplate) or at
minimum add a clear inline comment/docstring stating the expected type, and add
a simple runtime check in predict and apredict that raises a descriptive
TypeError if the prompt lacks format/format_messages (e.g. if not
hasattr(prompt, "format") and not hasattr(prompt, "format_messages")),
referencing the predict/apredict methods and self.metadata.is_chat_model, so
callers get a clear error instead of an AttributeError.
- Around line 574-628: LLMCompat currently always constructs a new LLM (in
LLMCompat.__init__) causing duplicate LLMs; change the constructor to accept an
optional existing LLM instance parameter (e.g., llm_instance: LLM | None = None)
and if provided set self._llm_instance = llm_instance and skip creating a new
LLM, while still setting self._tool, self._adapter_instance_id, self.model_name
= self._llm_instance.get_model_name(), and self.callback_manager =
self._llm_instance.callback_manager (or None) to preserve behavior; retain the
original parameter list and only construct a new LLM when llm_instance is None
so RetrieverLLM can pass the SDK1 LLM directly and avoid duplicated adapter
validation and config fetching.
- Around line 804-815: Add a brief doc-comment above the _to_litellm_messages
function explaining the duck-typing contract: that the function accepts both the
emulated ChatMessage defined in this module and
llama_index.core.base.llms.types.ChatMessage objects because both expose .role
and .content, and that getattr(m.role, "value", str(m.role)) handles enum or
string roles while m.content may be str | None; mention this is intentional to
support inputs from achat() and acomplete().
harini-venkataraman left a comment:

Added some comments. Please address. Otherwise LGTM
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
hari-kuriakose left a comment:
@pk-zipstack LGTM overall.
Let's address the minor comments though, from both humans and AI.
…e_retriever.py

Co-authored-by: Hari John Kuriakose <hari@zipstack.com>
Signed-off-by: Praveen Kumar <praveen@zipstack.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@prompt-service/src/unstract/prompt_service/core/retrievers/base_retriever.py`:
- Around line 28-41: The constructor is using an undefined _get_llm and never
initializes the cache causing NameError/AttributeError and potential
double-wrapping of LLMs; fix by (1) replacing the call to _get_llm with the
correct factory (e.g., get_llm) when assigning self.llm (or import/define
_get_llm if that was intended), (2) initializing self._llama_index_llm = None in
__init__, and (3) adjust the llama_index_llm property to avoid double-bridging
by checking types: if self._llama_index_llm is None then if isinstance(self.llm,
RetrieverLLM) set self._llama_index_llm = get_llama_index_llm(self.llm) (or use
the existing RetrieverLLM directly) otherwise convert once with
get_llama_index_llm(self.llm), then return self._llama_index_llm.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
prompt-service/src/unstract/prompt_service/core/retrievers/base_retriever.py
- Remove llm_helper.py — move conversion to _get_llm staticmethod in BaseRetriever
- Convert SDK1 LLM to RetrieverLLM eagerly in constructor
- Remove llama_index_llm lazy property
- Revert all retrievers to use self.llm directly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🧹 Nitpick comments (1)
prompt-service/src/unstract/prompt_service/core/retrievers/subquestion.py (1)
43-50: Fail fast when `self.llm` is missing before building subquestion components.

Add an explicit guard before constructing `question_gen` and the query engine so error handling is deterministic and immediate.

Suggested patch

```diff
 query_bundle = QueryBundle(query_str=self.prompt)

+if self.llm is None:
+    raise RetrievalError(
+        "SubquestionRetriever requires an initialized LLM instance."
+    )
+
 question_gen = LLMQuestionGenerator.from_defaults(
     llm=self.llm,
 )
 query_engine = SubQuestionQueryEngine.from_defaults(
     query_engine_tools=query_engine_tools,
     question_gen=question_gen,
     use_async=True,
     llm=self.llm,
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prompt-service/src/unstract/prompt_service/core/retrievers/subquestion.py` around lines 43 - 50, Before calling LLMQuestionGenerator.from_defaults and SubQuestionQueryEngine.from_defaults, add an explicit guard that checks self.llm and raises a clear exception (e.g., ValueError) if it is None or falsy so construction fails fast; update the code path where question_gen and query_engine are created (LLMQuestionGenerator.from_defaults, SubQuestionQueryEngine.from_defaults) to perform this check first and return/raise immediately with a descriptive message referencing self.llm.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@prompt-service/src/unstract/prompt_service/core/retrievers/subquestion.py`:
- Around line 43-50: Before calling LLMQuestionGenerator.from_defaults and
SubQuestionQueryEngine.from_defaults, add an explicit guard that checks self.llm
and raises a clear exception (e.g., ValueError) if it is None or falsy so
construction fails fast; update the code path where question_gen and
query_engine are created (LLMQuestionGenerator.from_defaults,
SubQuestionQueryEngine.from_defaults) to perform this check first and
return/raise immediately with a descriptive message referencing self.llm.
📒 Files selected for processing (4)
- prompt-service/src/unstract/prompt_service/core/retrievers/base_retriever.py
- prompt-service/src/unstract/prompt_service/core/retrievers/fusion.py
- prompt-service/src/unstract/prompt_service/core/retrievers/keyword_table.py
- prompt-service/src/unstract/prompt_service/core/retrievers/subquestion.py
✅ Files skipped from review due to trivial changes (2)
- prompt-service/src/unstract/prompt_service/core/retrievers/keyword_table.py
- prompt-service/src/unstract/prompt_service/core/retrievers/fusion.py
Test Results Summary
Runner Tests - Full Report
SDK1 Tests - Full Report
What
- Adds an `LLMCompat` class in SDK1 that emulates the llama-index LLM interface without requiring llama-index as a dependency
- Adds a `RetrieverLLM` class in prompt-service that inherits from llama-index's `LLM` base class and delegates calls to `LLMCompat`
- Retrievers convert the SDK1 LLM to `RetrieverLLM` when passing the LLM to llama-index components
- Fixes `SubQuestionQueryEngine` failing due to the missing `llama-index-question-gen-openai` package

Why
- `unstract.sdk1.llm.LLM` uses `litellm.completion()` directly and does not inherit from `llama_index.core.llms.llm.LLM`
- This causes `'LLM' object has no attribute 'predict'` errors at runtime for all retrieval strategies except Simple
- `resolve_llm()` asserts `isinstance(llm, LLM)`, which fails for plain classes — breaking Fusion, Router, and Subquestion retrievers

How
Two-layer architecture:
- `LLMCompat` (unstract/sdk1/src/unstract/sdk1/llm.py): A plain Python class (no llama-index dependency) that emulates the llama-index LLM interface. Uses local dataclass emulations of llama-index types (`ChatMessage`, `ChatResponse`, `CompletionResponse`, `LLMMetadata`, `MessageRole`). Follows the existing `EmbeddingCompat` initialization pattern — takes raw adapter params and creates an SDK1 `LLM` internally. Implements `chat()`, `complete()`, `predict()`, `achat()`, `acomplete()`, `apredict()`.
- `RetrieverLLM` (prompt-service/.../core/retrievers/retriever_llm.py): Inherits from `llama_index.core.llms.llm.LLM` (passing `resolve_llm()` isinstance checks) and delegates all LLM calls to an internal `LLMCompat` instance. Converts emulated return types back to real llama-index types.

Supporting changes:
- `llm_helper.py`: Helper to convert an SDK1 `LLM` to a `RetrieverLLM` instance
- `base_retriever.py`: Added a `llama_index_llm` property that lazily creates and caches the `RetrieverLLM` wrapper
- Replaced `self.llm` with `self.llama_index_llm` where passed to llama-index components (keyword_table, subquestion, fusion, router)
- Pass an explicit `LLMQuestionGenerator` to `SubQuestionQueryEngine.from_defaults()` to avoid the import of `llama-index-question-gen-openai`

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
`LLMCompat` and `RetrieverLLM` are only used when retrievers pass the LLM to llama-index components. The Simple, Automerging, and Recursive retrievers don't pass the LLM to llama-index and remain unchanged. The bridge delegates all calls to the existing SDK1 LLM, so no behavior changes.

Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
`LLMCompat` uses emulated types (no llama-index imports in SDK1). `RetrieverLLM` uses llama-index core classes already present in the prompt-service.

Notes on Testing
- Set `chunk_size > 0` and run extraction — should no longer error with `'LLM' object has no attribute 'predict'`

Screenshots
Checklist
I have read and understood the Contribution Guidelines.