Port/rust #1846
Closed
2 commits
`@@ -0,0 +1,11 @@` (new file)

```bash
#!/usr/bin/env bash
set -euo pipefail
find specs -name '*.md' | while read -r f; do
  n="$(grep -coE '\([^)]+:L[0-9]+' "$f" || true)"
  printf '%4s %s\n' "${n:-0}" "$f"
done | sort -n | tee /tmp/spec-citation-report.txt
echo "--- coverage ---"
echo "source modules: $(git ls-files 'mempalace/*.py' | wc -l) source specs: $(find specs/src -name '*.md' 2>/dev/null | wc -l)"
echo "test files: $(git ls-files 'tests/*.py' | wc -l) test specs: $(find specs/tests -name '*.md' 2>/dev/null | wc -l)"
echo "--- specs with ZERO citations (must be empty) ---"
awk '$1==0{print $2}' /tmp/spec-citation-report.txt
```
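The per-spec count can also be sketched in Python, for example if the report is later folded into a test. This is a minimal sketch and is not part of the PR. The function names are hypothetical, and it assumes the `specs/` tree holds UTF-8 markdown.

It reuses the script's regular expression and mirrors `grep -c`, which reports the number of matching lines even when combined with `-o`. A line citing two locations therefore counts once. Because `Path.rglob` yields path objects rather than text lines, filenames containing spaces or newlines need no quoting.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Same pattern the script passes to grep -E: "(" followed by at least one
# non-")" character, then ":L" and one or more digits.
CITATION = re.compile(r"\([^)]+:L[0-9]+")


def count_citation_lines(text: str) -> int:
    """Count lines holding at least one citation (what `grep -c` reports)."""
    return sum(1 for line in text.splitlines() if CITATION.search(line))


def citation_report(specs_root: str = "specs") -> List[Tuple[int, Path]]:
    """Return (count, path) for every *.md file under specs_root, lowest count first."""
    rows = [
        (count_citation_lines(path.read_text(encoding="utf-8")), path)
        for path in Path(specs_root).rglob("*.md")
        if path.is_file()
    ]
    # Sort by count, then by path, so ties come out in a stable order.
    return sorted(rows, key=lambda row: (row[0], str(row[1])))


def zero_citation_specs(rows: List[Tuple[int, Path]]) -> List[Path]:
    """Specs with no citations at all; the script expects this list to be empty."""
    return [path for count, path in rows if count == 0]
```

Printing `f"{count:4d} {path}"` for each row of `citation_report("specs")` reproduces the first section of the script's output. `zero_citation_specs(...)` plays the role of the final `awk` filter, without an intermediate file.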
`@@ -0,0 +1,87 @@` (new file)

# Behavior Specification: `mempalace/__init__.py`

This file is the package initialization module for MemPalace. It runs at import
time, performs environment hygiene and telemetry suppression, and exposes the
package version (`mempalace/__init__.py:L1-L1`).
## Public Surface

The package exports exactly one public symbol: `__version__`, a version string
re-exported from the package's version module (`mempalace/__init__.py:L38-L38`,
`mempalace/__init__.py:L60-L60`). The public name list contains only
`__version__`, so no other name is part of the public API contract
(`mempalace/__init__.py:L60-L60`).
## Import-Time Side Effects (Ordering Guarantees)

On import, the module executes the following steps in this exact order:

1. Strips leaked interpreter search-path entries originating from the
   `PYTHONPATH` environment variable (`mempalace/__init__.py:L36-L36`).
2. Imports and binds the version string (`mempalace/__init__.py:L38-L38`).
3. Silences a specific telemetry logger by raising its level to the most
   critical threshold (`mempalace/__init__.py:L44-L44`).

The path-stripping step (1) must run before any further imports so that
subsequent imports resolve packages only from the environment's own
installation rather than from externally injected paths
(`mempalace/__init__.py:L9-L24`, `mempalace/__init__.py:L36-L38`).
## Behavior: PYTHONPATH Search-Path Sanitization

An internal function, called when the package is imported, implements the
following observable contract (`mempalace/__init__.py:L8-L33`):

- **Input:** The current value of the `PYTHONPATH` environment variable and the
  interpreter's current module search path list (`mempalace/__init__.py:L25-L25`,
  `mempalace/__init__.py:L33-L33`).
- **No-op condition:** If `PYTHONPATH` is unset or empty, the function returns
  immediately and the search path is left unchanged
  (`mempalace/__init__.py:L25-L27`).
- **Action:** The `PYTHONPATH` value is split on the platform path separator
  into individual entries, and empty entries are discarded
  (`mempalace/__init__.py:L32-L32`). Every search-path entry that matches one of
  those `PYTHONPATH`-derived entries is then removed from the interpreter search
  path (`mempalace/__init__.py:L33-L33`).
- **Matching rule:** Paths are compared in a normalized form that collapses case
  differences and separator/normalization quirks. As a result, paths that differ
  only in letter case (on case-insensitive filesystems) or in trailing separators
  compare equal (`mempalace/__init__.py:L29-L30`, `mempalace/__init__.py:L13-L14`).
- **Preservation invariant:** The empty-string entry on the search path (the
  marker for the implicit current working directory) is always preserved and
  never removed, even if `PYTHONPATH` contains a value referring to the current
  directory (`mempalace/__init__.py:L15-L17`, `mempalace/__init__.py:L33-L33`).
- **Environment invariant:** The `PYTHONPATH` environment variable itself is NOT
  modified by this function; only the in-process search path is altered. This
  keeps an embedding host application's `PYTHONPATH` intact for its own
  unrelated subprocesses (`mempalace/__init__.py:L19-L24`). (Entry-point
  programs separately drop `PYTHONPATH` from the environment themselves; that
  behavior is external to this file, see `mempalace/__init__.py:L19-L22`.)
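A minimal Python sketch of this contract follows. The function name and its two optional parameters are hypothetical; they let the behavior be exercised without touching the real `sys.path`. Computing the "normalized form" with `os.path.normpath` plus `os.path.normcase` is also an assumption, and note that `os.path.normcase` folds case only on Windows.

```python
import os
import sys
from typing import List, Optional


def _normalize(path: str) -> str:
    # normpath drops trailing separators and collapses "a//b" and "a/./b";
    # normcase additionally lowercases (and flips slashes) on Windows only.
    return os.path.normcase(os.path.normpath(path))


def scrub_pythonpath_entries(
    search_path: Optional[List[str]] = None,
    pythonpath: Optional[str] = None,
) -> None:
    """Drop PYTHONPATH-derived entries from search_path, mutating it in place."""
    if search_path is None:
        search_path = sys.path
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    if not pythonpath:
        return  # unset or empty: the search path is left unchanged

    # Split on the platform separator (":" or ";") and discard empty segments.
    leaked = {_normalize(p) for p in pythonpath.split(os.pathsep) if p}

    # Slice assignment keeps the same list object, so sys.path itself changes.
    # "" (implicit cwd) is always kept: normpath("") == "." would otherwise
    # let a "." entry in PYTHONPATH remove it.
    search_path[:] = [
        entry
        for entry in search_path
        if entry == "" or _normalize(entry) not in leaked
    ]
    # The PYTHONPATH environment variable itself is deliberately not touched.
```

Called with no arguments, the sketch operates on the live `sys.path` and `os.environ`. That matches how an import-time hook would use it.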
## Behavior: Telemetry Logger Suppression

The logger named `chromadb.telemetry.product.posthog` has its level set to the
most critical (highest) severity threshold at import time, which suppresses
noisy telemetry-related warning output on the standard error stream
(`mempalace/__init__.py:L40-L44`).
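In Python the suppression is a single `logging` call. The wrapper function below is hypothetical and exists only so the effect can be checked.

```python
import logging

# Logger name as given in the spec above.
TELEMETRY_LOGGER = "chromadb.telemetry.product.posthog"


def silence_telemetry_logger(name: str = TELEMETRY_LOGGER) -> logging.Logger:
    logger = logging.getLogger(name)
    # CRITICAL is the highest standard level. WARNING and ERROR records sent
    # through this logger (or through descendants that inherit its level)
    # are discarded before reaching any handler.
    logger.setLevel(logging.CRITICAL)
    return logger
```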
## Edge Cases

- Empty or unset `PYTHONPATH`: search path untouched
  (`mempalace/__init__.py:L26-L27`).
- `PYTHONPATH` containing only empty segments (e.g. a lone separator): those
  empty segments are filtered out, leaving nothing to match. The search path,
  including the current-directory marker, is therefore left as it was
  (`mempalace/__init__.py:L32-L33`).
- Paths differing only by letter case or by trailing separators are still
  matched and removed due to the normalization rule
  (`mempalace/__init__.py:L29-L30`).
## Notes for Reimplementation

The version string is the module's only externally observable output; everything
else is environment and process hygiene (`mempalace/__init__.py:L38-L38`,
`mempalace/__init__.py:L60-L60`). The search-path scrubbing mutates
interpreter-global state as a side effect and returns no value
(`mempalace/__init__.py:L8-L8`).
`@@ -0,0 +1,51 @@` (new file)

# Behavior Spec: `mempalace/__main__.py`

## Purpose

This file is the package execution entry point. It makes the package runnable as
an executable module (e.g. `python -m mempalace`), delegating all behavior to the
package's CLI dispatcher (`mempalace/__main__.py:L1-L5`).
## Public Surface

This module exposes no functions, classes, or constants of its own. Its only
observable behavior is the side effect produced when the module is loaded or
executed as the program entry point (`mempalace/__main__.py:L3-L5`).
## Behavior

On execution, the module obtains the CLI dispatcher entry point named `main` from
the package's CLI component and invokes it with no arguments
(`mempalace/__main__.py:L3-L5`). All command-line argument parsing, dispatch,
input/output, exit codes, and side effects are therefore defined entirely by that
CLI entry point, not by this file (`mempalace/__main__.py:L3-L5`).

The invocation occurs unconditionally at module load time. Running the module as
the program entry point therefore runs the CLI immediately
(`mempalace/__main__.py:L5`).
## Inputs / Outputs

- Inputs: none consumed directly by this module; it forwards no explicit arguments
  to the CLI entry point (`mempalace/__main__.py:L5`).
- Outputs / exit code: this module returns or produces nothing of its own; the
  process exit code and all output are determined by the delegated CLI entry point
  (`mempalace/__main__.py:L3-L5`).
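To see the delegation end to end, the sketch below builds a throwaway package of the same shape and runs it with `python -m`. The package name `demo_pkg` and the module `cli` are hypothetical stand-ins, not MemPalace's real layout. The sketch also assumes, as the spec implies, that the CLI entry point reads `sys.argv` itself and sets the exit status, here via `sys.exit`.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_demo_package(exit_code: int) -> subprocess.CompletedProcess:
    """Build a throwaway package shaped like the spec and run it with -m."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        # Hypothetical CLI module: main() takes no arguments, reads sys.argv
        # itself, and decides the exit code.
        (pkg / "cli.py").write_text(
            "import sys\n"
            "\n"
            "def main():\n"
            "    print('argv:', sys.argv[1:])\n"
            f"    sys.exit({exit_code})\n"
        )
        # The entry point: resolve main first, then call it once.
        (pkg / "__main__.py").write_text("from .cli import main\n\nmain()\n")
        env = dict(os.environ, PYTHONPATH=tmp)
        return subprocess.run(
            [sys.executable, "-m", "demo_pkg", "hello"],
            cwd=tmp,
            env=env,
            capture_output=True,
            text=True,
        )
```

This sketch's `__main__.py` calls `main()` without wrapping it in `sys.exit(...)`. A `main` that merely returned an integer would therefore leave the process exit status at 0, so the status has to come from inside the CLI.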
## Invariants / Ordering

- The CLI entry point is resolved before it is called (import precedes invocation)
  (`mempalace/__main__.py:L3-L5`).
- Exactly one CLI invocation happens per module execution
  (`mempalace/__main__.py:L5`).
## Error / Edge-Case Behavior

This module adds no error handling of its own. Any failure to resolve the CLI
entry point, or any error raised by it, propagates unchanged to the caller
(`mempalace/__main__.py:L3-L5`).
## Side Effects

No filesystem, network, process, or environment side effects originate in this
file; all such effects are those of the delegated CLI entry point
(`mempalace/__main__.py:L3-L5`).
`@@ -0,0 +1,58 @@` (new file)

# Behavior Spec: `_stdio.py` (Stdio UTF-8 Reconfiguration Helper)

## Purpose

This module provides a single shared routine that forces the process's standard I/O streams (stdin, stdout, stderr) to use UTF-8 encoding on Windows. This keeps non-ASCII text from being corrupted into mojibake by the platform's default ANSI code page. On all non-Windows platforms the routine does nothing (`mempalace/_stdio.py:L1-L9`, `mempalace/_stdio.py:L49-L50`).
## Public Surface

A single public function:

`reconfigure_stdio_utf8_on_windows(*, stdin_errors, stdout_errors, stderr_errors, on_failure) -> None` (`mempalace/_stdio.py:L31-L37`).
### Parameters (all keyword-only, all optional)

- `stdin_errors`: error-handling policy (a string) applied when reconfiguring stdin. Default: `"surrogateescape"` (`mempalace/_stdio.py:L33`). This default lets malformed bytes from a redirected file or misbehaving client survive as lone surrogates instead of aborting the read with a decode error (`mempalace/_stdio.py:L19-L22`).
- `stdout_errors`: error-handling policy applied when reconfiguring stdout. Default: `"strict"` (`mempalace/_stdio.py:L34`).
- `stderr_errors`: error-handling policy applied when reconfiguring stderr. Default: `"strict"` (`mempalace/_stdio.py:L35`).
- `on_failure`: optional callback invoked as `on_failure(stream_name, exception)` for any stream whose reconfiguration raises an error. If it is not provided (i.e. `None`), the default failure behavior described under "Error handling per stream" is used instead (`mempalace/_stdio.py:L36-L47`).
### Return value

Returns no value (`mempalace/_stdio.py:L37-L38`).
## Behavior

### Platform gating

If the current platform is not Windows (`win32`), the function returns immediately and performs no reconfiguration and no side effects (`mempalace/_stdio.py:L49-L50`).
### Reconfiguration order and processing

On Windows, the function processes exactly three streams in this fixed order: stdin first, then stdout, then stderr. Each stream is paired with its caller-chosen error policy (`mempalace/_stdio.py:L52-L57`).

For each stream, in order:

1. The stream object is looked up by name on the standard I/O namespace. An absent stream is treated as missing and handled like a stream without reconfiguration support in step 2 (`mempalace/_stdio.py:L58`).
2. The stream's reconfigure capability is looked up. If the stream does not support reconfiguration, it is skipped entirely (no error, no callback) and processing continues with the next stream (`mempalace/_stdio.py:L59-L61`).
3. Otherwise the stream is reconfigured to UTF-8 using that stream's error policy (`mempalace/_stdio.py:L62-L63`).
### Error handling per stream

If reconfiguring a given stream raises any exception, the failure is isolated to that stream and does not stop processing of the remaining streams (`mempalace/_stdio.py:L62-L71`). On such a failure:

- If an `on_failure` callback was supplied, it is invoked with the stream name and the raised exception (`mempalace/_stdio.py:L65-L66`).
- If no callback was supplied, a warning line is written to the standard error stream in the exact form `WARNING: Could not reconfigure {name} to UTF-8: {exc}`. Here `{name}` is the stream name (one of `stdin`, `stdout`, `stderr`) and `{exc}` is the textual rendering of the exception (`mempalace/_stdio.py:L67-L71`).
## Caller-policy contract (documented intent)

The per-stream error policy is intentionally caller-chosen so callers can align behavior across entry points (`mempalace/_stdio.py:L11-L22`):

- A server emitting only self-controlled JSON-RPC is expected to use `strict` on stdout/stderr so that any encode failure surfaces loudly as a bug (`mempalace/_stdio.py:L13-L15`).
- A CLI or tool that prints verbatim text possibly containing round-tripped surrogate halves is expected to use `replace` on stdout/stderr to avoid crashing mid-print (`mempalace/_stdio.py:L16-L18`).
- All callers are expected to use `surrogateescape` on stdin so that a single malformed byte does not kill the read loop (`mempalace/_stdio.py:L19-L22`).
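The practical difference between these policies shows up on in-memory streams. In this sketch the helper names are hypothetical, and an `io.TextIOWrapper` is reconfigured the same way the standard streams are. A lone surrogate of the kind `surrogateescape` produces on input makes a `strict` writer raise `UnicodeEncodeError`, while a `replace` writer emits `?` instead.

```python
import io


def write_with_policy(text: str, errors: str) -> bytes:
    """Encode text through a UTF-8 text stream using the given error policy."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii")
    # Same call shape the helper applies to the real standard streams.
    stream.reconfigure(encoding="utf-8", errors=errors)
    stream.write(text)  # raises UnicodeEncodeError under "strict"
    stream.flush()
    return raw.getvalue()


def read_with_policy(data: bytes, errors: str) -> str:
    """Decode bytes through a UTF-8 text stream using the given error policy."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", errors=errors)
    return stream.read()
```

Reading the byte `b"\xff"` with `surrogateescape` yields the lone surrogate `"\udcff"` instead of raising. That is the round-tripped surrogate the `replace` policy on stdout/stderr is meant to absorb.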
## Invariants and Edge Cases

- Off Windows, every call is a no-op (`mempalace/_stdio.py:L49-L50`).
- Missing or non-reconfigurable streams are silently skipped without raising or invoking the failure callback (`mempalace/_stdio.py:L58-L61`).
- A reconfiguration failure on one stream never prevents the remaining streams from being attempted; the loop always covers all three (`mempalace/_stdio.py:L57-L71`).
- The only side effects are (a) reconfiguring the three standard streams to UTF-8 on Windows and (b), on a failure with no callback, writing one warning line per failing stream to standard error (`mempalace/_stdio.py:L62-L71`). The function performs no filesystem, network, or environment access.
`@@ -0,0 +1,42 @@` (new file)

# Spec: `mempalace/backends/__init__.py`

## Purpose

This is the public package facade for MemPalace storage backends (RFC 001). It defines no behavior of its own; it aggregates and re-exports the public surface from sibling modules so that consumers import everything from one stable namespace (`mempalace/backends/__init__.py:L1-L15`). An implementation in any language should expose a single package/namespace that surfaces the symbols listed below, sourced from the corresponding submodules.
## Re-exported Contract Symbols (from the `base` submodule)

The package re-exports the following abstract contract and value types from `base` (`mempalace/backends/__init__.py:L17-L37`):

- Abstract contracts: `BaseBackend` (per-palace factory contract) and `BaseCollection` (per-collection read/write contract) (`mempalace/backends/__init__.py:L5-L6`, `mempalace/backends/__init__.py:L21-L22`).
- Value/identity object: `PalaceRef`, which identifies a palace for a backend (`mempalace/backends/__init__.py:L7`, `mempalace/backends/__init__.py:L32`).
- Typed read returns: `QueryResult`, `GetResult` (`mempalace/backends/__init__.py:L8`, `mempalace/backends/__init__.py:L26`, `mempalace/backends/__init__.py:L33`).
- Lexical/health/maintenance result types: `HealthStatus`, `LexicalHit`, `LexicalResult`, `MaintenanceResult` (`mempalace/backends/__init__.py:L27-L30`).
- Error classes: `BackendError`, `BackendClosedError`, `BackendMismatchError`, `CollectionNotInitializedError`, `DimensionMismatchError`, `EmbedderIdentityMismatchError`, `PalaceNotFoundError`, `UnsupportedCapabilityError`, `UnsupportedFilterError`, `UnsupportedMaintenanceKindError` (`mempalace/backends/__init__.py:L18-L36`).
## Re-exported Concrete Backends (from backend submodules)

The package re-exports concrete backend implementations and their collection classes, one pair per storage engine (`mempalace/backends/__init__.py:L38-L41`):

- `ChromaBackend` / `ChromaCollection`: the in-tree default backend (`mempalace/backends/__init__.py:L14`, `mempalace/backends/__init__.py:L38`).
- `PgVectorBackend` / `PgVectorCollection` (`mempalace/backends/__init__.py:L39`).
- `QdrantBackend` / `QdrantCollection` (`mempalace/backends/__init__.py:L40`).
- `SQLiteExactBackend` / `SQLiteExactCollection` (`mempalace/backends/__init__.py:L41`).
## Re-exported Registry Functions (from the `registry` submodule)

The package re-exports the backend registry API (`mempalace/backends/__init__.py:L42-L52`):

- `get_backend`, `get_backend_class`: resolve a backend instance / class (`mempalace/backends/__init__.py:L46-L47`).
- `register`, `unregister`, `reset_backends`: mutate the registry of available backends (`mempalace/backends/__init__.py:L48-L49`, `mempalace/backends/__init__.py:L51`).
- `available_backends`: enumerate registered backends (`mempalace/backends/__init__.py:L43`).
- `detect_backend_for_path`, `detect_backends_for_path`: infer the backend(s) for a given on-disk palace path (`mempalace/backends/__init__.py:L44-L45`).
- `resolve_backend_for_palace`: resolve the backend for a palace reference (`mempalace/backends/__init__.py:L50`).
## Public Surface Invariant

The exported public namespace is explicitly enumerated and is the authoritative list of symbols this package promises to consumers; it contains exactly the 37 names listed and they must all be importable from the package root (`mempalace/backends/__init__.py:L54-L91`). The enumerated public list is a superset of the docstring summary: every concrete backend pair, every registry function, and every contract/error/result type re-exported above appears in it (`mempalace/backends/__init__.py:L54-L91`). Any symbol not present in this list is not part of the package's public contract.
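The importability half of this invariant is easy to check mechanically. The following is a minimal Python sketch. It assumes the enumerated public list is the conventional `__all__`, although the spec only calls it the exported public namespace, and both helper names are hypothetical.

```python
import importlib
from types import ModuleType
from typing import List


def missing_public_names(module: ModuleType) -> List[str]:
    """Names promised in module.__all__ that cannot be fetched from the module."""
    promised = getattr(module, "__all__", None)
    if promised is None:
        raise AttributeError(f"{module.__name__} does not define __all__")
    return [name for name in promised if not hasattr(module, name)]


def check_package(dotted_name: str) -> List[str]:
    """Import a package by dotted name and report unresolvable public names."""
    return missing_public_names(importlib.import_module(dotted_name))
```

Under that assumption, `check_package("mempalace.backends")` returning an empty list would confirm that every promised name resolves from the package root. It says nothing about whether the list has the expected length.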
## Side Effects

Importing the package transitively imports the `base`, `chroma`, `pgvector`, `qdrant`, `sqlite_exact`, and `registry` submodules; any import-time side effects of those modules (e.g. backend registration) occur as a consequence of loading this facade (`mempalace/backends/__init__.py:L17-L52`). This file itself performs no filesystem, network, process, or environment access beyond importing those submodules.
Issues:

- Writing to a fixed path under `/tmp` (`/tmp/spec-citation-report.txt`) can lead to symlink attacks or conflicts if multiple users or concurrent processes run the script. Using `mktemp` is much safer.
- `find` piped to `while read` without `-print0` and `IFS=` can fail or misbehave if filenames contain spaces, newlines, or backslashes.

Suggestion:

Use `mktemp` to securely create a temporary file, set up a `trap` to clean it up on exit, and use `find -print0` with `read -d ''` to robustly handle any special characters in filenames.