feat(core): tolerant JSON parsing for LLM output (repair_json, loads_robust) by Bartok9 · Pull Request #3120 · eosphoros-ai/DB-GPT

Bartok9 · 2026-06-29T04:27:31Z

Problem

LLM responses often contain JSON that is almost valid but breaks json.loads:

- wrapped in a Markdown code fence (```json … ```),
- containing // line comments or /* … */ block comments,
- containing trailing commas before } or ].

These defects are a recurring source of parse failures in the agent and
output-parsing paths. Today callers must hand-roll the cleanup, because the
existing _format_json_str only handles newlines and tabs.
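
The three defects described above can be reproduced with the standard library alone. The following is a minimal sketch; the sample strings are made up for illustration and are not taken from the PR's test suite.

```python
import json

# Built programmatically so this example does not nest a literal code fence.
FENCE = "`" * 3

# Hypothetical samples, one per defect class named in the Problem section.
samples = {
    "fenced": FENCE + 'json\n{"a": 1}\n' + FENCE,
    "line_comment": '{"a": 1 // note\n}',
    "block_comment": '{"a": /* note */ 1}',
    "trailing_comma": '{"a": [1, 2,],}',
}

failures = {}
for name, text in samples.items():
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # Record the decoder's message for each rejected sample.
        failures[name] = exc.msg

for name, msg in failures.items():
    print(f"{name}: {msg}")
```

Each sample is rejected by strict json.loads, which is the behavior the helpers in this PR are meant to work around.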

Solution

This PR adds two small, stdlib-only helpers to dbgpt.util.json_utils:

- repair_json(text): best-effort cleanup that, in order, strips a surrounding
  code fence, removes // and /* */ comments, and removes trailing commas before
  } or ]. All transformations are escape-aware and preserve the contents of
  double-quoted strings, so commas and slashes inside string values are never
  touched.
- loads_robust(text, **kwargs): tries strict json.loads first and falls back to
  parsing repair_json(text) only if that raises JSONDecodeError. Well-formed
  input parses identically to json.loads; **kwargs are forwarded.
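
To make the described behavior concrete, here is a minimal sketch of how two such helpers could be written with only the standard library. It is an independent illustration, not the code from this PR: the regular expressions and the private helper names (_strip_fence, _remove_comments, _remove_trailing_commas) are assumptions, and the real implementation may handle edge cases differently.

```python
import json
import re

FENCE = "`" * 3  # built programmatically to avoid nesting a literal fence

# A double-quoted string (escape-aware), or a // comment, or a /* */ comment.
_TOKEN_RE = re.compile(r'("(?:\\.|[^"\\])*")|//[^\n]*|/\*.*?\*/', re.DOTALL)
_STRING_RE = re.compile(r'"(?:\\.|[^"\\])*"', re.DOTALL)
_TRAILING_COMMA_RE = re.compile(r",(\s*[}\]])")


def _strip_fence(text):
    """Drop a surrounding code fence and its optional language tag."""
    stripped = text.strip()
    if len(stripped) >= 6 and stripped.startswith(FENCE) and stripped.endswith(FENCE):
        body = stripped[3:-3]
        first_line, sep, rest = body.partition("\n")
        tag = first_line.strip()
        if sep and (not tag or tag.isalnum()):
            body = rest
        return body.strip()
    return text


def _remove_comments(text):
    """Remove comments; matched strings are copied back unchanged."""
    return _TOKEN_RE.sub(lambda m: m.group(1) or "", text)


def _remove_trailing_commas(text):
    """Remove commas before } or ], but only outside string literals."""
    parts, pos = [], 0
    for m in _STRING_RE.finditer(text):
        parts.append(_TRAILING_COMMA_RE.sub(r"\1", text[pos:m.start()]))
        parts.append(m.group(0))  # string literal, left untouched
        pos = m.end()
    parts.append(_TRAILING_COMMA_RE.sub(r"\1", text[pos:]))
    return "".join(parts)


def repair_json(text):
    """Best-effort cleanup: fence, then comments, then trailing commas."""
    return _remove_trailing_commas(_remove_comments(_strip_fence(text)))


def loads_robust(text, **kwargs):
    """Strict parse first; retry on the repaired text only on failure."""
    try:
        return json.loads(text, **kwargs)
    except json.JSONDecodeError:
        return json.loads(repair_json(text), **kwargs)


# Hypothetical input combining all three defects, with a comment-like
# sequence inside a string value that must survive the repair.
messy = FENCE + 'json\n{"a": [1, 2,], // c\n"b": "x, // y"}\n' + FENCE
result = loads_robust(messy)
print(result)
```

Because the strict parse runs first, valid input never passes through the repair step in this sketch, and input that is still invalid after repair raises JSONDecodeError from the second json.loads call.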

Tests

Added 12 tests in packages/dbgpt-core/src/dbgpt/util/tests/test_json_utils.py
covering trailing commas (object/array), fenced blocks (with/without language tag),
line/block comments, a combined case, string-content preservation, valid-JSON
passthrough, and the unrepairable-input raise path.

$ python -m pytest dbgpt/util/tests/test_json_utils.py -q
17 passed in 0.15s

Compatibility

- No breaking changes: purely additive; existing functions untouched.
- Pure standard library (no new dependencies).
- loads_robust only diverges from json.loads when strict parsing already fails.

…ads_robust)

LLM responses frequently emit JSON wrapped in Markdown code fences, with // or /* */ comments, or with trailing commas before } or ], all of which break json.loads.

Add two stdlib-only helpers in dbgpt.util.json_utils:

- repair_json(text): strips a surrounding code fence, removes comments, and removes trailing commas, all while preserving double-quoted string contents.
- loads_robust(text, **kwargs): tries json.loads first, then retries on the repaired text only if strict parsing fails, so well-formed input is unaffected.

Covered by 12 new parametrized/unit tests. No breaking changes.

github-actions bot added the core (Module: core) and enhancement (New feature or request) labels on Jun 29, 2026

Bartok9 wants to merge 1 commit into eosphoros-ai:main from Bartok9:feat/json-repair-robust-loads
