Adds .cs, .csproj, .sln, .razor, and .cshtml so C#/.NET projects are indexed by the project miner. .razor/.cshtml are analogous to the already-supported .jsx/.tsx.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Updated READABLE_EXTENSIONS in miner.py to include ".swift", ".kt", and ".kts".
- Added tests in test_miner.py to ensure scanning includes Swift and Kotlin files.
Add _try_pi_jsonl parser for Pi agent session files stored at
~/.config/pi/agent/sessions/{encoded-cwd}/{timestamp}_{uuid}.jsonl.
Uses type "message" entries with role "user"/"assistant". Skips
toolResult messages, model_change, thinking_level_change, and other
operational events. Requires session header (type "session" with
"version" key) to avoid false positives.
Format documented at github.com/badlogic/pi-mono session.md and
verified via Context7. Sample data provided by tunnckoCore in #59.
Refs: #59
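A minimal sketch of the filtering described above, not the actual `_try_pi_jsonl` code. The function name is hypothetical, and several details are assumptions: that the session header is the first line, that role and content sit under a nested `"message"` key, and that content is either a string or a list of `{"type": "text", "text": ...}` parts.

```python
import json


def parse_pi_session_sketch(text):
    """Return [(role, text), ...], or None if this is not a Pi session."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError:
            return None

    # Require the session header so other JSONL formats are not claimed.
    # Assumption: the header is the first entry in the file.
    if not entries or not isinstance(entries[0], dict):
        return None
    header = entries[0]
    if header.get("type") != "session" or "version" not in header:
        return None

    turns = []
    for entry in entries[1:]:
        if not isinstance(entry, dict) or entry.get("type") != "message":
            continue  # model_change, thinking_level_change, and similar events
        msg = entry.get("message")  # assumed nesting
        if not isinstance(msg, dict):
            continue
        role = msg.get("role")
        if role not in ("user", "assistant"):
            continue  # e.g. toolResult
        content = msg.get("content")
        if isinstance(content, list):  # assumed part layout
            content = "\n".join(
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            )
        if isinstance(content, str) and content.strip():
            turns.append((role, content.strip()))
    return turns or None
```

Returning `None` instead of raising lets a dispatch chain fall through to the next parser when the header is missing.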
Adds _try_gemini_json parser to normalize.py for three layouts:
1. Gemini API contents format (~/.gemini/sessions/*.json):
{"contents": [{"role": "user", "parts": [{"text": "..."}]}, ...]}
2. Messages-wrapper variant:
{"messages": [{"role": "user", ...}, {"role": "model", ...}]}
3. Flat top-level list with role="model".
This complements the existing _try_gemini_jsonl parser (which handles
~/.gemini/tmp/<hash>/chats/session-*.jsonl with session_metadata
sentinel) — JSONL covers Gemini CLI runtime sessions, JSON covers
exported / Studio-saved transcripts.
## Review feedback addressed (PR #204)
bgauryy review:
- #1 Parser-precedence bug: _try_gemini_json runs *before*
_try_claude_ai_json so the {"messages":[..., role=model, ...]}
layout is no longer silently claimed by the Claude parser. The
Gemini parser's has_model_role guard prevents false-positives
against Claude / ChatGPT data.
- #2 Layout 2a coverage: TestGeminiJson.test_messages_wrapper_format
+ test_messages_wrapper_does_not_get_claimed_by_claude pin the
fix in place.
- #3 Test conflicts with current main: rebased onto develop;
tests restructured into TestGeminiJson class.
- #4 tempfile/os.unlink → pytest tmp_path everywhere.
- #5 elif not text → else (the elif branch was dead).
- #6 Module docstring updated to mention Google AI Studio.
Tests: 9 new cases in TestGeminiJson covering all three layouts,
multi-part text joining, non-text part skipping, has_model_role
disambiguation, dispatch-chain regression for review #1.
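The three layouts and the `has_model_role` guard can be sketched roughly as follows. This is an illustration under assumptions, not the code in normalize.py: the function names are hypothetical, the fields of the messages-wrapper and flat-list entries are elided in the description so a `parts` list or a plain `content` string is assumed, multi-part text is joined with a space, and `model` is mapped to `assistant`.

```python
def _has_model_role(items):
    """True if any entry uses role == "model", which Claude/ChatGPT exports do not."""
    return any(isinstance(m, dict) and m.get("role") == "model" for m in items)


def _message_text(message):
    parts = message.get("parts")
    if isinstance(parts, list):
        # Join text parts; skip non-text parts such as inline data.
        texts = [
            p["text"]
            for p in parts
            if isinstance(p, dict) and isinstance(p.get("text"), str)
        ]
        return " ".join(texts).strip()
    content = message.get("content")  # assumed fallback field
    return content.strip() if isinstance(content, str) else ""


def parse_gemini_json_sketch(data):
    """Return [(role, text), ...], or None if the data is not Gemini-shaped."""
    if isinstance(data, dict):
        items = data.get("contents") or data.get("messages")  # layouts 1 and 2
    elif isinstance(data, list):
        items = data  # layout 3
    else:
        return None
    if not isinstance(items, list) or not _has_model_role(items):
        return None  # leave it for the Claude / ChatGPT parsers

    turns = []
    for message in items:
        if not isinstance(message, dict):
            continue
        role = message.get("role")
        if role not in ("user", "model"):
            continue
        text = _message_text(message)
        if text:
            turns.append(("assistant" if role == "model" else "user", text))
    return turns or None
```

Because the guard returns `None` for a `{"messages": [...]}` payload that has no `model` role, running this parser ahead of the Claude parser should not steal Claude data.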
Add _try_continue_json() normalizer for Continue.dev AI assistant sessions (~/.continue/sessions/*.json). Parses history array with role/content pairs, handles tool calls, system messages, and metadata.
Closes #59 (partial: adds Continue.dev format support)
Includes comprehensive test coverage for valid sessions, edge cases, malformed input, and unicode content.
Fixes lint CI: ruff format --check flagged blank-line and long-dict wrapping in the Continue.dev parser tests.
Mirrors the portable fake-client arms of test_pgvector_backend.py
against a real PostgreSQL+pgvector server and adds live-only arms the
in-memory fake cannot exercise: real <=> operator ground truth, JSONB
pushdown vs local-fallback equivalence, cross-namespace isolation on
real tables, 8-connection concurrent writers, and the advisory-lock
serialization of run_maintenance('reindex') under a 2-connection race.
Gated on MEMPALACE_PGVECTOR_LIVE_DSN (same pattern as the qdrant live
gate); skips cleanly when unset. First run: 15/15 pass on PostgreSQL
16.10 + pgvector 0.8.2 (+AGE 1.6.0 in the same server), psycopg 3.3.4.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…actly-one-ran asserts
- Stub _write_marker on the 8 concurrent writer backends: upsert()
rewrites the marker on every call with a plain open('w'), so backends
sharing one local_path race on the same file (sharing violations on
Windows) — a test-design artifact, not the contract under test
- Guard the fixture's created list with a lock for the threaded tests
- Assert exact distance-ordered ids in the query/filter arms
- Reindex race: exactly one 'ran' (index absent beforehand, so the
advisory-lock winner must build)
Re-run live after changes: 15/15 pass (PG 16.10, pgvector 0.8.2).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 5 to 6. - [Release notes](https://github.com/docker/metadata-action/releases) - [Commits](docker/metadata-action@v5...v6) --- updated-dependencies: - dependency-name: docker/metadata-action dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6 to 7. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](docker/build-push-action@v6...v7) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [docker/login-action](https://github.com/docker/login-action) from 3 to 4. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](docker/login-action@v3...v4) --- updated-dependencies: - dependency-name: docker/login-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
…arning
`tests/test_fact_checker.py` imports symbols from `mempalace.fact_checker` at
module top (putting it in sys.modules), then
`TestCLI.test_exits_nonzero_when_issues_found` re-executed the same module as __main__ via
`runpy.run_module("mempalace.fact_checker", run_name="__main__")`. runpy warns
because it re-runs an already-imported module against a half-initialized state:
RuntimeWarning: 'mempalace.fact_checker' found in sys.modules after import of
package 'mempalace', but prior to execution of 'mempalace.fact_checker'
Run the CLI in a fresh process via `subprocess.run([sys.executable, "-m",
"mempalace.fact_checker", ...])` instead — no sys.modules collision, and it
exercises the real `python -m` entry point. Assertions are preserved
(SystemExit code 1 → returncode 1; captured stdout substring → result.stdout).
The child's entity registry (`~/.mempalace/known_entities.json`, resolved via
expanduser at import) is redirected by overriding both HOME and USERPROFILE in
the subprocess env so it works on POSIX and Windows.
Verified: `pytest tests/test_fact_checker.py -W error::RuntimeWarning` passes
(26) with the warning promoted to error — proving it no longer fires.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
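A rough sketch of the fresh-process pattern described in this commit message. The helper name `run_python_isolated` is hypothetical, and the real test may build its command and environment differently.

```python
import os
import subprocess
import sys


def run_python_isolated(argv, fake_home):
    """Run `python <argv...>` in a child process with a redirected home dir."""
    env = dict(os.environ)
    # expanduser("~") reads HOME on POSIX and USERPROFILE on Windows,
    # so both are overridden to cover either platform.
    env["HOME"] = str(fake_home)
    env["USERPROFILE"] = str(fake_home)
    return subprocess.run(
        [sys.executable, *argv],
        env=env,
        capture_output=True,
        text=True,
    )
```

In a test this might be called as `run_python_isolated(["-m", "mempalace.fact_checker", ...], tmp_path)` (arguments elided as in the commit message), then asserted on via `result.returncode` and `result.stdout`.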
Adds end-to-end regression coverage for the migration swap path where os.replace hits EXDEV, the shutil.move fallback fails, and the original palace must be restored from the rename-aside copy.
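The failure path this test covers can be illustrated with a simplified swap routine. Everything here is a sketch under assumptions: the function name, the `.aside` suffix, and the injectable `replace`/`move` parameters are hypothetical and are not taken from the migration code.

```python
import errno
import os
import shutil


def _remove(path):
    """Delete a file or directory tree if it exists."""
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    elif os.path.lexists(path):
        os.remove(path)


def swap_with_restore(new_path, live_path, replace=os.replace, move=shutil.move):
    """Swap new_path into live_path, restoring the original on failure."""
    aside = live_path + ".aside"  # hypothetical rename-aside location
    os.rename(live_path, aside)
    try:
        try:
            replace(new_path, live_path)
        except OSError as exc:
            if exc.errno != errno.EXDEV:
                raise
            # Cross-device rename is not possible; fall back to copy + delete.
            move(new_path, live_path)
    except Exception:
        _remove(live_path)           # drop any partial copy
        os.rename(aside, live_path)  # put the original back
        raise
    _remove(aside)
```

Passing `replace` and `move` as parameters is one way a test could force the EXDEV and fallback failures without needing two real filesystems.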
Adds an MCP tool to remove every drawer mined from a given source_file
(exact match), for cleaning up benchmark/test data accidentally mined into
a user wing (ShareGPT dumps, results_mempal_*.jsonl, language config
JSON) that drowns out real memories in semantic search.
Matching is pushed to the backend via delete(where={"source_file": ...}),
the same idiom the miner and diary-ingest paths already use, so it is not
subject to the SQLite variable limit regardless of how many drawers share
the source. Defaults to a dry run reporting match count and a sample;
dry_run=false commits. Absent source is an idempotent no-op, not an error.
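The dry-run-by-default behavior might look something like this. `FakeCollection` and `delete_by_source_sketch` are hypothetical stand-ins for illustration; the real tool talks to the configured backend and its report fields may differ.

```python
class FakeCollection:
    """In-memory stand-in for a backend collection (illustration only)."""

    def __init__(self, rows):
        self.rows = dict(rows)  # drawer id -> metadata dict

    def _matches(self, where):
        return [
            drawer_id
            for drawer_id, meta in self.rows.items()
            if all(meta.get(key) == value for key, value in where.items())
        ]

    def get(self, where):
        return self._matches(where)

    def delete(self, where):
        # The filter is evaluated by the "backend", so no id list is passed in.
        for drawer_id in self._matches(where):
            del self.rows[drawer_id]


def delete_by_source_sketch(col, source_file, dry_run=True, sample_size=3):
    where = {"source_file": source_file}  # exact match only
    ids = col.get(where)
    report = {
        "matched": len(ids),
        "sample": ids[:sample_size],
        "dry_run": dry_run,
        "deleted": 0,
    }
    if dry_run or not ids:
        return report  # an absent source is a no-op, not an error
    col.delete(where)
    report["deleted"] = len(ids)
    return report
```

Pushing the `where` filter into `delete` avoids building a long id list, which is what would run into a bound-variable limit on SQLite-backed stores.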
…e guard
Address Gemini review on #1729:
- normalize source_file with strip_lone_surrogates so exact matching hits rows mined from non-ASCII paths via cp1252 stdin (#1488), mirroring tool_add_drawer's ingestion-side normalization
- isinstance(str) guard so a non-string source_file returns a clean error instead of AttributeError
- default missing wing/room to "" in the dry-run sample, consistent with the rest of the file
- add tests: non-string rejection + surrogate-normalization match
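For illustration, the guard plus normalization could be sketched as below. The behavior of the project's `strip_lone_surrogates` is not shown in this conversation, so `strip_lone_surrogates_sketch` is an assumed stand-in that simply drops lone surrogates, and `normalize_source_file` is a hypothetical helper name.

```python
def strip_lone_surrogates_sketch(value):
    """Assumed behavior: drop code points in the surrogate range."""
    # Lone surrogates cannot be encoded as UTF-8, so errors="ignore"
    # removes them and leaves every other character untouched.
    return value.encode("utf-8", errors="ignore").decode("utf-8")


def normalize_source_file(source_file):
    """Return (normalized_value, error_message)."""
    if not isinstance(source_file, str):
        # Reject early with a clean error instead of failing on .encode().
        return None, "source_file must be a string"
    return strip_lone_surrogates_sketch(source_file), None
```

Exact matching only works if the same normalization is applied at ingestion and at delete time, which is the point of mirroring the `tool_add_drawer` side.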
…ionable assertion failure message if PreCompact is ever missing
feat(miner): add support for Swift and Kotlin file extensions
fix: detect Java project manifests
feat(normalize): add Continue.dev session parser
# Conflicts: # mempalace/normalize.py
feat: add Gemini CLI / AI Studio session import support
…scroll fix(backends): single-scroll bulk metadata fetch for Qdrant; bump scroll page size (#1796)
fix(mcp): refuse second writer for same palace
feat(search): add an optional source_file filter to mempalace_search (#1815)
…-tool feat: add mempalace_checkpoint batch save tool
fix(claude-plugin): run final mine on SessionEnd
feat(mcp): add mempalace_delete_by_source bulk-cleanup tool (#1722)
…, real tests
The opt-in HTTP transport reuses the stdio dispatcher and binds loopback by default, but /mcp was unauthenticated with no protection against a malicious web page reaching a DNS-rebound localhost server, and its tests reached for Starlette/uvicorn (not project deps) so they were silently skipped in CI; the production _serve_http handler had zero coverage.
Hardening:
- Pin the Host header to loopback literals + the bound host on a loopback bind (DNS-rebinding defense); relaxed for a deliberately non-loopback bind, which is the operator's call and may sit behind a Host-rewriting proxy.
- Reject any browser Origin that isn't a loopback origin (rebinding/SSRF guard); non-browser MCP clients omit Origin and are unaffected.
- Optional bearer token via MEMPALACE_MCP_HTTP_TOKEN (constant-time compare); required on /mcp, never on /healthz so liveness probes work credential-free.
- Warn loudly when bound to a non-loopback host (palace reachable from network).
Testability:
- Split _build_http_server() out of _serve_http() so tests bind 127.0.0.1:0 and drive the real handler over a loopback socket via stdlib http.client.
- Replace the skipped Starlette reimplementation with 12 tests covering dispatch, initialize, /healthz, 404, parse-error, the 16 MiB cap, notification 202, and the Host/Origin/token rejections, with no third-party deps.
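The Host, Origin, and token checks described in this commit could be approximated as below. This is a sketch under assumptions, not the `_serve_http` handler: the function names, the status codes (403/401), the loopback set, and the case-sensitive header dict are all illustrative choices.

```python
import hmac
from urllib.parse import urlsplit

LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def _hostname(host_header):
    """Strip an optional :port, handling bracketed IPv6 literals."""
    host = host_header.strip().lower()
    if host.startswith("["):
        return host[1:host.find("]")] if "]" in host else host
    return host.rsplit(":", 1)[0] if host.count(":") == 1 else host


def check_request(path, headers, bound_host="127.0.0.1", token=None):
    """Return 200 to proceed, or an assumed rejection status code."""
    # Host pinning applies only on a loopback bind (DNS-rebinding defense).
    if bound_host in LOOPBACK_HOSTS:
        if _hostname(headers.get("Host", "")) not in LOOPBACK_HOSTS | {bound_host}:
            return 403
    # Browsers send Origin; non-browser MCP clients normally omit it.
    origin = headers.get("Origin")
    if origin is not None and urlsplit(origin).hostname not in LOOPBACK_HOSTS:
        return 403
    if path == "/healthz":
        return 200  # liveness probes never need the token
    if token:
        supplied = headers.get("Authorization", "")
        expected = "Bearer " + token
        # Constant-time comparison avoids leaking the token via timing.
        if not hmac.compare_digest(supplied.encode(), expected.encode()):
            return 401
    return 200
```

A real handler would also need case-insensitive header lookup, which `http.server` provides through its message object.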
feat(mcp): add opt-in HTTP transport
ChromaDB's rust HNSW core intermittently fails compaction on Windows with "Failed to apply logs to the hnsw segment writer" during add/update — a long-standing, non-reproducible-on-Linux/macOS flake that hits different tests (test_migrate_wings, test_closets) across unrelated commits and has been turning otherwise-green release/CI runs red at random. Add pytest-rerunfailures and wire `--reruns 2 --only-rerun "Failed to apply logs to the hnsw segment writer"` into the test-windows job only. The --only-rerun scope means a real, deterministic failure still fails on the first run; only this specific transient native-dependency error is retried. The Linux and macOS jobs deliberately keep zero reruns so genuine regressions surface there loudly.
…ries ci(test-windows): retry the transient ChromaDB HNSW compaction flake
chore(release): 3.5.0
Code Review
This pull request introduces version 3.5.0 of MemPalace, featuring an opt-in local daemon for queued writes, an opt-in HTTP transport for the MCP server, and new MCP tools like mempalace_checkpoint and mempalace_delete_by_source. It also adds a source_file filter for searches, a final mine on Claude plugin session end, performance optimizations using SQLite aggregates, and expanded language support. Feedback on these changes highlights two issues: first, _purge_source_closets incorrectly calls .get("ids") on a GetResult object, which will cause an AttributeError on modern backends; second, the hardcoded _CLI_MAX_CHUNKS_PER_FILE_DEFAULT ignores the MEMPALACE_MAX_CHUNKS_PER_FILE environment variable.
    if closets_col is None:
        return 0
    try:
        ids = closets_col.get(where={"source_file": source_file}, include=[]).get("ids") or []
The function _purge_source_closets calls .get("ids") directly on the result of closets_col.get(...). However, BaseCollection.get() returns a GetResult object (which is a dataclass/namedtuple) rather than a dict on modern backends (such as sqlite_exact, pgvector, and qdrant). This will raise an AttributeError when trying to delete by source on any of these backends, silently failing the closet purge because the exception is caught and returns 0 closets deleted.
    -        ids = closets_col.get(where={"source_file": source_file}, include=[]).get("ids") or []
    +        batch = closets_col.get(where={"source_file": source_file}, include=[])
    +        ids = batch.ids if hasattr(batch, "ids") else (batch.get("ids") or [])
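To make the difference concrete, here is a small self-contained sketch of the duck-typed access used in the suggestion above. `GetResultSketch` is a hypothetical stand-in modeled as a dataclass; the real `GetResult` type may carry more fields or be defined differently.

```python
from dataclasses import dataclass, field


@dataclass
class GetResultSketch:
    """Hypothetical stand-in for a typed backend result."""

    ids: list = field(default_factory=list)


def extract_ids(batch):
    """Read ids from either a typed result object or a legacy dict."""
    if hasattr(batch, "ids"):
        return list(batch.ids or [])
    return list(batch.get("ids") or [])
```

A plain dict has no `ids` attribute, so it takes the `.get("ids")` branch, while the dataclass has no `.get` method, which is why calling `.get("ids")` on it unconditionally would raise `AttributeError`.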
    # This mirrors miner.MAX_CHUNKS_PER_FILE without importing miner here;
    # importing miner pulls in Chroma dependencies before argparse can handle
    # lightweight exits such as --version.
    _CLI_MAX_CHUNKS_PER_FILE_DEFAULT = 50_000
The constant _CLI_MAX_CHUNKS_PER_FILE_DEFAULT is hardcoded to 50_000. However, the help text for --max-chunks-per-file (line 1675) states that the default is _CLI_MAX_CHUNKS_PER_FILE_DEFAULT (or MEMPALACE_MAX_CHUNKS_PER_FILE). Because argparse uses the hardcoded default, a MEMPALACE_MAX_CHUNKS_PER_FILE environment variable is completely ignored when running via the CLI. The constant should be initialized from the environment variable instead. When parsing it, return a safe value (such as 0 to disable) if the value is unparseable, rather than falling back to the regular default.
    def _get_max_chunks_default():
        val = os.environ.get("MEMPALACE_MAX_CHUNKS_PER_FILE")
        if not val:
            return 50000
        try:
            return int(val)
        except ValueError:
            return 0

    _CLI_MAX_CHUNKS_PER_FILE_DEFAULT = _get_max_chunks_default()

References
- When parsing environment variables for configuration, return a safe default (e.g., 0.0 to disable) if the value is unparseable, rather than falling back to a default value.
## Promote `develop` → `main` for the 3.5.0 release
Releases publish only from `main` (per docs/RELEASING.md). This promotes the 3.5.0 bump (merged via #1853) plus all `develop` work accumulated since v3.4.1 (33 PRs).
## Version
All six sources at 3.5.0 (`version-guard.yml` green on the bump commit). Tool count refreshed 34 → 35 (`delete_by_source` + `checkpoint`).
## Headline changes since v3.4.1
### Features
- `MEMPALACE_MCP_HTTP_TOKEN` (Long-lived stdio server stops answering tools/list after extended uptime (small frames still work) #1801, feat(mcp): add opt-in HTTP transport #1806)
- `mempalace_checkpoint` batch-save tool (feat: add mempalace_checkpoint batch save tool #1851)
- `mempalace_delete_by_source` bulk cleanup: dry-run-by-default, purges drawers and their AAAK index entries (Bug: Benchmark/test data loaded into user wing pollutes semantic search #1722, feat(mcp): add mempalace_delete_by_source bulk-cleanup tool (#1722) #1729)
- `source_file` filter for `mempalace_search` (Expose source_file filtering in mempalace_search #1815, feat(search): add an optional source_file filter to mempalace_search (#1815) #1817)
- `SessionEnd` (Add a SessionEnd hook for a deterministic mine on clean exit (plugin only wires Stop + PreCompact) #1814, fix(claude-plugin): run final mine on SessionEnd #1820)
### Performance
`graph_stats` answered from the SQLite aggregate to kill large-palace timeouts (mempalace_status times out on large palaces under Claude Desktop — full-metadata pagination instead of the existing sqlite aggregation path #1748, MCP overview tools time out on large palaces #1379); embedder ORT thread cap (Background `mempalace mine` pins 400–500 % CPU — ORT intra_op pool ignores OMP env vars #1068); SQL-pushdown pagination for sqlite_exact/pgvector + single-scroll Qdrant metadata (fix(backends): push sqlite_exact get(limit, offset) pagination into SQL (#1841) #1842, fix(pgvector): push get(limit, offset) pagination into SQL (#1830) #1840, fix(backends): single-scroll bulk metadata fetch for Qdrant; bump scroll page size (#1796) #1832)
### Bug fixes
`repair --mode from-sqlite` not a data-losing re-mine (repair (legacy) and MCP server cannot recover from a ChromaDB compactor failure, though --mode from-sqlite can #1843, fix(repair): point index-read failures to repair --mode from-sqlite (#1843) #1847, fix: point diverged-index recovery at from-sqlite, not re-mine (#1843) #1849); MCP refuses a second writer (fix(mcp): refuse second writer for same palace #1823); Windows hook miner uses `CREATE_NO_WINDOW` (fix: use CREATE_NO_WINDOW so Windows hook miner spawns don't flash a console (#1783) #1848)
### CI
`test-windows` now retries only the transient ChromaDB HNSW compaction flake (`--only-rerun`); Linux/macOS keep zero reruns so real regressions stay loud (ci(test-windows): retry the transient ChromaDB HNSW compaction flake #1854)

Full detail in CHANGELOG.md under `## [3.5.0]`.
## After merge
Draft a GitHub Release targeting `main`, tag `v3.5.0` → triggers `publish.yml` (PyPI Trusted Publishing, manual approval on the `pypi` environment).