Add Rust core (agent-trace-core) via PyO3 + maturin #218
|
Moving it to draft for now! |
|
Really glad you sent this as a discussion starter. The approach is right and there's a lot to like. A few things worth talking through before this goes further.

The The README calls sequential IDs an intentional choice for test reproducibility. But In The fix is simple. Claude Code's JSONL has a

Hash chain cross-compatibility needs a test

Python's One thing to verify: Python's

Should this replace

That's the question I keep coming back to. Right now there are two implementations of the same import logic. The Rust one has better memory characteristics (no per-event Python allocation, bounded memory on large traces). The Python one is the current canonical path. My instinct: make Rust the canonical path and delete the Python importer once the

What's already right

The

On the streaming parser

Also worth thinking about before this lands: right now

Two things to fix before merge: preserve source |
Rust crate with trace models, NDJSON + SHA-256 hash chain, and Claude Code JSONL import. Builds as both a pure-Rust library and an abi3 Python extension (agent_trace_core) via maturin/PyO3. Output matches the existing agent_trace.jsonl_import format, so the Python tooling reads it unchanged.
event_id was a per-session counter, so ids collided across sessions and broke consumers that treat them as global (e.g. OTLP span ids). Derive it from each JSONL entry's uuid (12 hex), with an intra-entry suffix and a session_end fallback. Add tests/test_rust_core_roundtrip.py: Python<->Rust hash-chain verification in both directions and byte-identical serialization.
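A minimal sketch of the id scheme this commit message describes, written in Python for brevity. The commit only says "12 hex", "intra-entry suffix", and "session_end fallback"; the exact slicing, the `-N` suffix format, and hashing a seed for the fallback are all assumptions made for illustration, and `derive_event_id` is a hypothetical name.

```python
from __future__ import annotations

import hashlib


def derive_event_id(entry_uuid: str | None, index: int = 0, fallback_seed: str = "") -> str:
    """Hypothetical helper: build a globally distinct event id from a JSONL entry uuid."""
    if entry_uuid:
        # Assumption: take the first 12 hex characters of the uuid, dashes removed.
        base = entry_uuid.replace("-", "").lower()[:12]
    else:
        # Assumption: an event with no source entry (such as a synthesized
        # session_end) hashes a stable seed, e.g. the session id, instead.
        base = hashlib.sha256(fallback_seed.encode("utf-8")).hexdigest()[:12]
    # Assumption: the first event of an entry keeps the bare id; later events
    # produced from the same entry get a numeric suffix.
    return base if index == 0 else f"{base}-{index}"
```

Because the id no longer depends on a per-session counter, two sessions can only collide if their source uuids share the same 12-hex prefix.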
- PyO3 is optional behind a `python` feature; the default rlib build is pure Rust and links no libpython.
- write_ndjson recomputes the whole prev_hash chain, discarding stale values; import reuses it instead of duplicating the loop.
- verify_hash_chain hashes lines verbatim, matching the writer exactly.
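The writer and verifier behaviour in the last two bullets can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the crate's code: it assumes each line stores the SHA-256 hex digest of the previous line's exact bytes in `prev_hash`, that the first line uses an empty string, and that lines are compact JSON. Only the names `write_ndjson`, `verify_hash_chain`, and `prev_hash` come from the commit message.

```python
import hashlib
import json


def write_ndjson(events):
    """Serialize events to NDJSON lines, recomputing prev_hash from scratch."""
    lines = []
    prev = ""  # assumption: the first line carries an empty prev_hash
    for event in events:
        event = dict(event)
        event["prev_hash"] = prev  # any stale value on the input is overwritten
        line = json.dumps(event, separators=(",", ":"))
        lines.append(line)
        # Hash the line exactly as written, so the verifier can do the same.
        prev = hashlib.sha256(line.encode("utf-8")).hexdigest()
    return lines


def verify_hash_chain(lines):
    """Check that each line's prev_hash matches the hash of the previous line."""
    prev = ""
    for line in lines:
        if json.loads(line).get("prev_hash") != prev:
            return False
        # Hash the stored text verbatim; re-serializing could change the bytes.
        prev = hashlib.sha256(line.encode("utf-8")).hexdigest()
    return True
```

Hashing the stored text rather than a re-serialized object is what makes cross-language verification depend on byte-identical output from both writers.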
serde_json emits raw UTF-8 in strings; Python escapes non-ASCII as \uXXXX, so traces with accents/CJK/emoji serialized to different bytes and broke hash-chain parity. to_json now escapes non-ASCII (surrogate pairs for astral chars); ASCII output takes a fast path.
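To make the parity requirement concrete, here is a small Python reference model of the escaping rule the commit describes: code points above ASCII become `\uXXXX`, astral characters become a surrogate pair, and pure-ASCII input is returned untouched. `escape_non_ascii` is a hypothetical name; the Rust `to_json` itself is not shown in this thread, so this only illustrates the target behaviour, which is what Python's `json.dumps` produces with its default `ensure_ascii=True`.

```python
import json


def escape_non_ascii(serialized: str) -> str:
    """Escape non-ASCII characters in already-serialized JSON text."""
    if serialized.isascii():
        return serialized  # fast path: nothing to escape
    out = []
    for ch in serialized:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp < 0x10000:
            # Basic Multilingual Plane: one \uXXXX escape (lowercase hex, as Python emits).
            out.append(f"\\u{cp:04x}")
        else:
            # Astral plane: split into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)
```

Applying it to `json.dumps(value, ensure_ascii=False)` should give the same text as plain `json.dumps(value)`, which is the property the hash chain relies on.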
Replace
|
Force-pushed from 15a01ed to eb2c6be
|
Good updates. A few reactions:
The

Feature flags

The three-tier split (

Redaction

Fair point, and I missed this entirely. Python's

Streaming

The two-pass clarification is useful. The win isn't a single-pass rewrite, it's making both passes stream over a

What's left before merge

Two things: the hash chain round-trip test (write in Python, verify in Rust; write in Rust, verify in Python), and the fast-path / fallback wiring. Redaction port and streaming are follow-ups. This is in good shape. |
|
Two new commits worth calling out.
This is the right fix. The old version imported stale
This one would have been a silent production bug. The round-trip test now covers both directions (Python writes / Rust verifies, Rust writes / Python verifies) and the non-ASCII case. That was the last open item from my earlier review. Remaining before merge: fast-path / fallback wiring in |
|
Both pre-merge items are in:
|
|
Pulled the two new commits. Here's what I see.

Fast-path / fallback wiring (

This is exactly what we discussed. The test coverage is solid. Three cases: Rust absent (fallback), redaction enabled (Python forced), workspace active (Python forced). The One small thing:

Cleanup (

One thing I'd push back on: the "Why Rust" section was removed from the README. That section was actually useful: it explained the memory model (stack values dropped deterministically, no GC) in a way that justifies the complexity of maintaining a Rust crate alongside Python. Without it, a new contributor reading the README has no context for why this exists. Worth keeping a shorter version.

State of the PR

The last open item from my earlier review (fast-path / fallback wiring) is done. The round-trip tests cover both directions and the non-ASCII case. Redaction and streaming are correctly scoped as follow-ups. This is ready to come out of draft. |
|
I added a shorter 'Why Rust' back in. As for the return type, _rust_import actually already uses the -> str | None annotation, so the docstring is just prose. I think we're covered there, but let me know if you're seeing something else! |
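For readers following the fast-path discussion, the three cases from the review (Rust absent, redaction enabled, workspace active) could be wired roughly as below. This is a sketch, not the PR's code: `_rust_import` and its `-> str | None` annotation are mentioned in the thread, but `import_session`, its parameters, and the `import_jsonl` entry point on `agent_trace_core` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from typing import Callable


def _rust_import(path: str) -> str | None:
    """Return the Rust importer's output, or None when the extension is missing."""
    try:
        import agent_trace_core  # extension module named in the PR; may be absent
    except ImportError:
        return None
    # Hypothetical entry point; the real function name is not shown in the thread.
    return agent_trace_core.import_jsonl(path)


def import_session(
    path: str,
    python_import: Callable[[str], str],
    *,
    redact: bool = False,
    workspace: str | None = None,
    rust_import: Callable[[str], str | None] = _rust_import,
) -> str:
    """Pick the Rust fast path when allowed, else fall back to Python."""
    # Redaction and workspaces are Python-only for now, so they force the fallback.
    if redact or workspace is not None:
        return python_import(path)
    result = rust_import(path)
    # None signals "Rust unavailable"; use the Python importer instead.
    return python_import(path) if result is None else result
```

Passing the importers in as callables keeps each of the three cases easy to exercise in a test without needing the compiled extension installed.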
Introduce a Rust crate exposing trace models, NDJSON with a SHA-256 hash chain, and Claude Code JSONL import. Built with maturin/PyO3 as an abi3 extension module (agent_trace_core) and usable as a plain Rust library.
Output is byte-for-byte identical to agent_trace.jsonl_import, verified across all local Claude sessions. Moving these hot paths to Rust keeps memory bounded on large traces with no per-event Python allocation.
Do not merge for now. Sending it as a way to start discussion.