Fix/gemini json parsing 1886 #1935

Closed
Aparnap2 wants to merge 3 commits into lingodotdev:main from Aparnap2:fix/gemini-json-parsing-1886

@Aparnap2 Aparnap2 commented Jan 29, 2026

Problem

Closes #1886

Gemini 2.5 Flash Lite and potentially other "chatty" LLMs prefix JSON responses with conversational text (e.g., "OK", "Sure, here is..."), causing JSON.parse() to fail with:

Unexpected token 'O', "OK{"source"... is not valid JSON

Solution

Implemented robust JSON extraction as discussed with @maxprilutskiy:

  1. Structural Extraction: Use indexOf('{') and lastIndexOf('}') to locate JSON boundaries
  2. Native Parsing: Attempt standard JSON.parse() first (fast path)
  3. Repair Fallback: Use jsonrepair for malformed JSON (existing behavior preserved)
  4. Better Errors: Include first 200 chars of raw response in error messages
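
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code from explicit.ts: the function name parseChattyJson and the injectable repair parameter are hypothetical, with repair standing in for the jsonrepair library so the sketch stays dependency-free.

```typescript
// Hypothetical helper for illustration; names are not taken from explicit.ts.
// `repair` is a stand-in for a string-to-string fixer such as jsonrepair.
type Repair = (text: string) => string;

function parseChattyJson(raw: string, repair?: Repair): unknown {
  // 1. Structural extraction: slice from the first "{" to the last "}".
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  // If no brace pair is found, keep the raw text so the error path still runs.
  const candidate =
    start !== -1 && end > start ? raw.slice(start, end + 1) : raw;

  // 2. Native parsing first (fast path for well-formed JSON).
  try {
    return JSON.parse(candidate);
  } catch {
    // Fall through to the repair step.
  }

  // 3. Repair fallback for malformed JSON inside the braces.
  if (repair) {
    try {
      return JSON.parse(repair(candidate));
    } catch {
      // Fall through to the error below.
    }
  }

  // 4. Better errors: include the first 200 characters of the raw response.
  throw new Error(
    `Failed to parse JSON from LLM response: ${raw.slice(0, 200)}`,
  );
}

// Usage: a chatty prefix is dropped before parsing.
const parsed = parseChattyJson('OK{"greeting":"Hola"}');
console.log(parsed);
```

In the CLI the repair argument would presumably be the jsonrepair function exported by the jsonrepair package. One limitation of this kind of slicing is that a wrapper containing its own braces (for example "Here is {the} result: {...}") moves the extraction boundaries, so the candidate string is no longer just the JSON payload.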

Changes

  • Modified packages/cli/src/cli/localizer/explicit.ts - localize() function
  • Added packages/cli/src/cli/localizer/explicit.spec.ts - Unit tests for fix (6 tests)
  • Added packages/cli/src/cli/localizer/explicit.e2e.spec.ts - Integration test with real Gemini API
  • Created changeset for patch release

Testing

  • Builds successfully (pnpm build)
  • All existing tests pass (pnpm test - 723 tests)
  • New unit tests added and passing (6 tests)

Unit Tests (6 tests - all passing)

  • OK prefix extraction: OK{"data":...} ✓

  • Conversational prefix: Sure, here's...{"data":...} ✓

  • Clean JSON: {"data":...} ✓

  • jsonrepair fallback ✓

  • Text after JSON ✓

  • Multiline JSON with wrapper ✓

  • Manually tested with Gemini 2.5 Flash Lite (API key provided):

✅ Gemini 2.5 Flash Lite test passed!

greeting: ¡Hola, mundo!
farewell: ¡Adiós, amigo!

  • Changeset created (pnpm new)

Why This Approach

  • Deterministic: No hardcoded prefix lists (as requested by @maxprilutskiy)
  • Robust: Handles any conversational wrapper around JSON
  • Backward Compatible: Doesn't affect well-behaved models
  • Model Agnostic: Works with any LLM provider
  • Maintainable: Simple logic, easy to debug

cc @maxprilutskiy @vrcprl @PatrickHuiskens

Summary by CodeRabbit

  • Bug Fixes

    • More robust JSON parsing for chatty LLM responses (handles conversational wrappers and malformed output, including Gemini 2.5 Flash Lite).
  • Tests

    • Added unit and end-to-end tests validating resilient JSON extraction and explicit localization flows, including optional real API integration.
  • Chores

    • Added a changeset documenting a patch release.

coderabbitai bot commented Jan 29, 2026

Walkthrough

Extracts JSON from chatty LLM responses by taking the substring between the first { and last }, attempting JSON.parse, falling back to jsonrepair on failure, and surfacing detailed errors; adds unit and conditional e2e tests and a changeset for a patch release.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Changelog: .changeset/heavy-queens-share.md | Adds a changeset documenting a patch release for @lingo.dev/cli describing the JSON parsing fix for chatty LLM responses (Gemini 2.5 Flash Lite). |
| Implementation: packages/cli/src/cli/localizer/explicit.ts | Replaces the naive JSON.parse(response.text) with: extract the substring between the first { and last }, attempt JSON.parse, on failure call jsonrepair and re-parse, and if that also fails throw an error containing a response snippet. |
| Unit Tests: packages/cli/src/cli/localizer/explicit.spec.ts | Adds tests covering prefixed, conversational, clean, extraneous, multiline, and malformed LLM responses, and validates the fallback to jsonrepair and the parsing behavior. |
| E2E Test: packages/cli/src/cli/localizer/explicit.e2e.spec.ts | Adds a conditional end-to-end test for Gemini 2.5 Flash Lite that runs only when GOOGLE_API_KEY is set; verifies translations (en → es) against the real API with an extended timeout. |

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant LLM as Gemini LLM
    participant Localizer as Explicit Localizer
    participant Parser as JSON Parser
    participant Repair as jsonrepair
    participant Result as Result Handler

    Client->>LLM: Request translation
    LLM-->>Localizer: Responds (conversational text + JSON)
    Localizer->>Parser: Extract substring between first "{" and last "}"
    Parser->>Parser: Attempt JSON.parse(extracted)
    alt Parse Success
        Parser->>Result: Return parsed JSON object
    else Parse Failure
        Parser->>Repair: Call jsonrepair(extracted)
        Repair->>Parser: Return repaired text
        Parser->>Parser: Attempt JSON.parse(repaired)
        alt Repair Success
            Parser->>Result: Return parsed JSON object
        else Repair Failure
            Parser->>Result: Throw error with response snippet
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested reviewers

  • vrcprl

Poem

🐰 A Gemini chatter came in quite brisk,
"OK{"json"...}" — what a quirky risk.
I hop to the first brace, then slice to the last,
mend broken bits with jsonrepair fast,
now translations arrive tidy and brisk. ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Fix/gemini json parsing 1886' is concise but somewhat generic; while it references the issue number, the phrase 'json parsing' doesn't clearly convey the specific fix of handling chatty LLM responses that prepend conversational text to JSON. | Consider revising the title to be more specific about the fix, such as 'Handle conversational text in Gemini JSON responses' or 'Robust JSON extraction for chatty LLM outputs'. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description includes problem statement, solution approach, changes, and testing results with unit and integration tests documented; however, the Testing section has incomplete checkboxes and lacks clarity on whether all required business logic tests are properly documented. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #1886 by implementing robust JSON extraction logic that handles chatty LLM responses, remaining model-agnostic and backward compatible, with comprehensive unit and integration testing. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to issue #1886: JSON parsing improvements in the localizer, relevant test additions (unit and e2e), and a changeset for version management; no out-of-scope modifications detected. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

vrcprl commented Feb 2, 2026

@Aparnap2 - pls review failed workflow

Aparnap2 force-pushed the fix/gemini-json-parsing-1886 branch from c470823 to 912637d on February 3, 2026 06:57
- Extract JSON using indexOf/lastIndexOf for structural boundaries
- Preserve jsonrepair fallback for malformed JSON internals
- Add comprehensive error messages with raw response preview
- Add unit tests for chatty response scenarios
- Fixes Gemini 2.5 Flash Lite 'OK' prefix parsing error

Fixes lingodotdev#1886
- Extract JSON using indexOf/lastIndexOf for structural boundaries
- Preserve jsonrepair fallback for malformed JSON internals
- Add comprehensive error messages with raw response preview
- Add unit tests for chatty response scenarios
- Add integration test for Gemini 2.5 Flash Lite real API
- Fixes Gemini 2.5 Flash Lite 'OK' prefix parsing error

Fixes lingodotdev#1886
Aparnap2 force-pushed the fix/gemini-json-parsing-1886 branch from 912637d to b59267e on February 3, 2026 07:03

github-actions bot commented Feb 3, 2026

Thank you for your contribution! However, this PR references issue #1886 where you're not currently assigned. To contribute, please either get assigned to the issue first or find an unassigned issue to work on. This helps us coordinate contributions effectively.

github-actions bot closed this on Feb 3, 2026

Development

Successfully merging this pull request may close these issues.

When using Gemini the translation does not work