
feat(core): tolerant JSON parsing for LLM output (repair_json, loads_robust)#3120

Open
Bartok9 wants to merge 1 commit into eosphoros-ai:main from Bartok9:feat/json-repair-robust-loads

Bartok9 commented Jun 29, 2026

Problem

LLM responses very often emit JSON that is almost valid but breaks json.loads:

  • wrapped in a Markdown code fence (```json … ```),
  • containing // line comments or /* … */ block comments,
  • containing trailing commas before } or ].
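A quick way to confirm these defects is to feed representative strings to the standard-library parser. The sample strings below are invented for illustration and are not taken from DB-GPT traffic; because the exact error message can differ between Python versions, the sketch only checks that json.loads raises.

```python
import json

# Illustrative "almost JSON" strings, one per defect listed above.
SAMPLES = {
    "code_fence": '```json\n{"a": 1}\n```',
    "line_comment": '{"a": 1 // answer\n}',
    "block_comment": '{"a": /* answer */ 1}',
    "trailing_comma_object": '{"a": 1,}',
    "trailing_comma_array": "[1, 2,]",
}


def strict_parse_fails(text: str) -> bool:
    """Return True if the standard-library json.loads rejects text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return True
    return False


for name, text in SAMPLES.items():
    print(f"{name}: strict parse fails = {strict_parse_fails(text)}")
```

Every sample reports True, while the same object written as plain '{"a": 1}' parses normally.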

These defects are a recurring source of parse failures in the agent and output-parsing
code paths. Today callers must hand-roll the cleanup, and the existing _format_json_str
only handles newlines and tabs.

Solution

This PR adds two small, stdlib-only helpers to dbgpt.util.json_utils:

  • repair_json(text) — best-effort cleanup that, in order, strips a surrounding
    code fence, removes // and /* */ comments, and removes trailing commas before
    }/]. All transformations preserve the contents of double-quoted strings
    (escape-aware), so commas/slashes inside string values are never touched.
  • loads_robust(text, **kwargs) — tries strict json.loads first and only falls
    back to parsing repair_json(text) if that raises JSONDecodeError. Well-formed
    input parses identically to json.loads; **kwargs are forwarded.
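The behaviour described above can be sketched roughly as follows. This is a minimal, hypothetical implementation, not the code in this PR: the private helpers (_copy_string, _strip_code_fence, _strip_comments, _strip_trailing_commas) are illustrative names, and the sketch assumes, as the description states, that comments and trailing commas only need handling outside double-quoted strings.

```python
import json
from typing import Any


def _copy_string(text: str, i: int, out: list[str]) -> int:
    """Copy the double-quoted string starting at text[i] into out.

    Returns the index just past the closing quote (or len(text) if the
    string is unterminated). A backslash and the character after it are
    copied as a pair, so an escaped quote does not end the string early.
    """
    n = len(text)
    out.append(text[i])  # opening quote
    i += 1
    while i < n:
        ch = text[i]
        out.append(ch)
        if ch == "\\" and i + 1 < n:
            out.append(text[i + 1])  # the escaped character
            i += 2
            continue
        i += 1
        if ch == '"':
            break
    return i


def _strip_code_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence (language tag optional)."""
    stripped = text.strip()
    if not stripped.startswith("```"):
        return text
    newline = stripped.find("\n")
    # Drop the opening fence line, including a tag such as "json".
    body = stripped[3:] if newline == -1 else stripped[newline + 1 :]
    body = body.rstrip()
    if body.endswith("```"):
        body = body[:-3]  # drop the closing fence
    return body


def _strip_comments(text: str) -> str:
    """Remove // line comments and /* */ block comments outside strings."""
    out: list[str] = []
    i, n = 0, len(text)
    while i < n:
        if text[i] == '"':
            i = _copy_string(text, i, out)
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = n if end == -1 else end  # keep the newline itself
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            out.append(" ")  # avoid gluing the tokens on either side
            i = n if end == -1 else end + 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def _strip_trailing_commas(text: str) -> str:
    """Drop a comma whose next non-whitespace character is } or ]."""
    out: list[str] = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            i = _copy_string(text, i, out)
            continue
        if ch == ",":
            j = i + 1
            while j < n and text[j].isspace():
                j += 1
            if j < n and text[j] in "}]":
                i += 1  # skip the comma, keep the following whitespace
                continue
        out.append(ch)
        i += 1
    return "".join(out)


def repair_json(text: str) -> str:
    """Best-effort cleanup: fence, then comments, then trailing commas."""
    return _strip_trailing_commas(_strip_comments(_strip_code_fence(text)))


def loads_robust(text: str, **kwargs: Any) -> Any:
    """Parse strictly first; repair and retry only on JSONDecodeError."""
    try:
        return json.loads(text, **kwargs)
    except json.JSONDecodeError:
        # A second failure propagates, so unrepairable input still raises.
        return json.loads(repair_json(text), **kwargs)
```

The sketch uses a character scanner rather than regular expressions because a regex alone cannot reliably tell whether // or a comma sits inside a string value; tracking the open string (and skipping escaped characters) is what keeps values such as "http://example.com" or "x, ]" intact. For example, loads_robust('```json\n{"a": [1, 2,]}\n```') returns {"a": [1, 2]}.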

Tests

Added 12 tests in packages/dbgpt-core/src/dbgpt/util/tests/test_json_utils.py
covering trailing commas (object/array), fenced blocks (with/without language tag),
line/block comments, a combined case, string-content preservation, valid-JSON
passthrough, and the unrepairable-input raise path.

$ python -m pytest dbgpt/util/tests/test_json_utils.py -q
17 passed in 0.15s

Compatibility

  • No breaking changes — purely additive; existing functions untouched.
  • Pure standard library (no new dependencies).
  • loads_robust only diverges from json.loads when strict parsing already fails.

…ads_robust)

LLM responses frequently emit JSON wrapped in Markdown code fences, with
// or /* */ comments, or with trailing commas before } or ] - all of which
break json.loads. Add two stdlib-only helpers in dbgpt.util.json_utils:

- repair_json(text): strips a surrounding code fence, removes comments, and
  removes trailing commas, all while preserving double-quoted string contents.
- loads_robust(text, **kwargs): json.loads first, then retries on the repaired
  text only if strict parsing fails - so well-formed input is unaffected.

Covered by 12 new parametrized/unit tests. No breaking changes.
github-actions bot added the core (Module: core) and enhancement (New feature or request) labels Jun 29, 2026