Skip to content

fix(repair): point index-read failures to repair --mode from-sqlite (#1843)#1847

Open
mvalentsev wants to merge 1 commit into
MemPalace:developfrom
mvalentsev:fix/1843-repair-from-sqlite-guidance
Open

fix(repair): point index-read failures to repair --mode from-sqlite (#1843)#1847
mvalentsev wants to merge 1 commit into
MemPalace:developfrom
mvalentsev:fix/1843-repair-from-sqlite-guidance

Conversation

@mvalentsev

Copy link
Copy Markdown
Contributor

Addresses #1843.

What does this PR do?

When the chromadb compactor cannot apply the WAL into the drawers HNSW
segment (InternalError: Failed to apply logs to the hnsw segment writer),
mempalace repair (default legacy mode) and the library rebuild_index
both fail on their first read (Collection.count()) and then print:

Cannot recover — palace may need to be re-mined from source files.

That advice is harmful: the drawer rows are intact in chroma.sqlite3, and
repair --mode from-sqlite rebuilds the index from them. Re-mining instead
silently drops drawers added through the MCP server and diary entries, which
have no source file to mine. That is the data loss #1843 reports.

Both legacy read-failure sites now emit shared guidance pointing at the
working recovery:

If a MemPalace server or mine is still running against this palace,
stop it and retry. Otherwise the drawer index is likely corrupt
(for example a failed chromadb HNSW compaction) while your drawer
rows remain in chroma.sqlite3. Rebuild the index from SQLite rather
than re-mining:

    mempalace repair --mode from-sqlite --archive-existing

(Re-mining from source files would drop drawers added via the MCP
server and diary entries, which have no source file.)

The message is worded conditionally (it also covers a live server/mine still
holding the palace open) because the except Exception cannot prove which
case it caught.

Scope

This is the "at minimum, change the message" part of #1843's first request.
The larger asks (auto-fallback to from-sqlite, MCP reconnect resilience,
decoupling the durable SQLite write from HNSW indexing in add_drawer) are
out of scope here; the compaction failure itself lives in chromadb's core.

Two other repair.py re-mine messages are left as-is on purpose:
check_extraction_safety (a row-count truncation abort, not an index read)
and the RebuildCollectionError no-backup-after-live-swap path (where the
collection is genuinely gone and re-mining is a legitimate last resort).

How to test

uv run pytest tests/test_cli.py tests/test_repair.py -q

New regression tests:

  • test_cmd_repair_error_reading_points_to_from_sqlite_not_remine
  • test_rebuild_index_read_failure_points_to_from_sqlite
  • test_index_read_recovery_guidance_recommends_from_sqlite

Checklist

  • Tests pass (python -m pytest tests/ -v)
  • No hardcoded paths
  • Linter passes (ruff check .)

…emPalace#1843)

When the chromadb compactor cannot apply the WAL into the drawers HNSW
segment (InternalError: Failed to apply logs to the hnsw segment writer),
the legacy repair paths fail on their first Collection.count() read and
advise re-mining from source files. The drawer rows are intact in
chroma.sqlite3, so repair --mode from-sqlite rebuilds them; re-mining
silently drops drawers added via the MCP server and diary entries that
have no source file.

Both legacy read-failure sites (cmd_repair and rebuild_index) now emit
shared guidance pointing at the from-sqlite recovery, worded conditionally
so it also covers a live server or mine still holding the palace open.

Co-Authored-By: undeadindustries <9536461+undeadindustries@users.noreply.github.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the recovery guidance provided to users when a drawer-index read fails (e.g., due to a chromadb HNSW compactor failure). Instead of advising users to re-mine from source files—which can cause data loss for drawers added via the MCP server or diary entries—the CLI and index rebuilder now recommend rebuilding the index from SQLite using mempalace repair --mode from-sqlite --archive-existing. Unit tests have been added to verify this behavior. There are no review comments, so I have no feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant