Add Rust core (agent-trace-core) via PyO3 + maturin by vishwanath1306 · Pull Request #218 · Siddhant-K-code/agent-trace

vishwanath1306 · 2026-06-17T04:58:20Z

Introduce a Rust crate exposing trace models, NDJSON with a SHA-256 hash chain, and Claude Code JSONL import. Built with maturin/PyO3 as an abi3 extension module (agent_trace_core) and usable as a plain Rust library.

Output is byte-for-byte identical to agent_trace.jsonl_import, verified across all local Claude sessions. Moving these hot paths to Rust keeps memory bounded on large traces with no per-event Python allocation.

Do not merge for now. Sending it as a way to start discussion.

Siddhant-K-code · 2026-06-17T07:34:17Z

Moving it to draft for now!

Siddhant-K-code · 2026-06-17T08:07:15Z

Really glad you sent this as a discussion starter. The approach is right and there's a lot to like. A few things worth talking through before this goes further.

The event_id problem is worse than it looks

The README calls sequential IDs an intentional choice for test reproducibility. But event_id is load-bearing in a way that matters.

In subagent.py, the tree view works by matching a parent session's tool_call.event_id against child sessions' meta.parent_event_id. Import a parent session through Rust and the tool_call that spawned a subagent gets reassigned from "a3f9c2b1d4e0" to "000000000003". The child's meta.json still points to "a3f9c2b1d4e0". The lookup misses. The tree collapses silently. No error, just a flat list where there should be a nested view.

The fix is simple. Claude Code's JSONL has a uuid field on each entry. Use that (truncated to 12 hex chars, same as Python's uuid4().hex[:12]) instead of the counter. For tests, snapshot the expected NDJSON output rather than relying on sequential IDs to make it reproducible.

Hash chain cross-compatibility needs a test

Python's append_event hashes the last line of the file as written to disk. Rust hashes inline during the write pass. That works as long as serialization is byte-for-byte identical, and preserve_order on serde_json is doing a lot of work to make that true.

One thing to verify: Python's to_json() drops fields that are None or empty string, including redacted when it's false. If Rust emits "redacted": false where Python omits the key entirely, verify_hash_chain fails on files written by one and read by the other. A round-trip test catches this before it becomes a production surprise: write in Python, import in Rust, verify chain. Then write in Rust, verify in Python.

Should this replace jsonl_import.py entirely?

That's the question I keep coming back to. Right now there are two implementations of the same import logic. The Rust one has better memory characteristics (no per-event Python allocation, bounded memory on large traces). The Python one is the current canonical path.

My instinct: make Rust the canonical path and delete the Python importer once the event_id issue is fixed and the round-trip test passes. Maintaining two implementations of ~300 lines of careful translation logic is the real long-term risk here, not the Rust complexity.

What's already right

The extension-module feature flag pattern is easy to get wrong and you got it right. Off for cargo test, on for maturin builds, so both work cleanly without fighting libpython linking. The abi3-py310 wheel means one build covers CPython 3.10+. And preserve_order on serde_json to match Python's field ordering is a detail most people would miss.

On the streaming parser

Also worth thinking about before this lands: right now import_jsonl does fs::read_to_string then parses line by line, which loads the whole file into memory before parsing starts. For multi-MB Claude sessions a BufReader + serde_json::Deserializer::from_reader streaming approach would keep memory flat regardless of file size. Not a blocker, but a natural next step once the foundations are solid.

Two things to fix before merge: preserve source uuid as event_id, and add the cross-language hash chain round-trip test. Everything else is either already good or future work.

Rust crate with trace models, NDJSON + SHA-256 hash chain, and Claude Code JSONL import. Builds as both a pure-Rust library and an abi3 Python extension (agent_trace_core) via maturin/PyO3. Output matches the existing agent_trace.jsonl_import format, so the Python tooling reads it unchanged.

event_id was a per-session counter, so ids collided across sessions and broke consumers that treat them as global (e.g. OTLP span ids). Derive it from each JSONL entry's uuid (12 hex), with an intra-entry suffix and a session_end fallback. Add tests/test_rust_core_roundtrip.py: Python<->Rust hash-chain verification in both directions and byte-identical serialization.

- PyO3 is optional behind a `python` feature; the default rlib build is pure Rust and links no libpython. - write_ndjson recomputes the whole prev_hash chain, discarding stale values; import reuses it instead of duplicating the loop. - verify_hash_chain hashes lines verbatim, matching the writer exactly.

serde_json emits raw UTF-8 in strings; Python escapes non-ASCII as \uXXXX, so traces with accents/CJK/emoji serialized to different bytes and broke hash-chain parity. to_json now escapes non-ASCII (surrogate pairs for astral chars); ASCII output takes a fast path.

vishwanath1306 · 2026-06-18T04:42:33Z

Replace `jsonl_import.py` Entirely?

Agreed on the long-term direction: two implementations of ~300 lines of fiddly translation logic is the real maintenance risk, and I’d like to converge on one. But I don’t think we should delete the Python importer in this PR. Two things have to land first:

Redaction parity: The Python TraceStore runs secret redaction on append by default via redaction_enabled(), and the Rust importer does not. When I diff the two across my local Claude sessions, the output is byte-for-byte identical on content + meta only with Python redaction disabled. The sessions that differ do so solely in redacted fields. Making Rust canonical before porting redaction would silently drop a security feature on import.
The zero-dependency promise: Today import works on a pure-Python, stdlib-only install. If Rust becomes the only path, any platform/arch without a prebuilt wheel loses the ability to import. That breaks the project’s zero-dependency guarantee.

So my proposal is the incremental version: make the Rust core the fast path when the extension is importable, with the existing Python importer as the fallback. That gives one observable behavior, two backends, and no capability loss. Then we can delete jsonl_import.py once redaction is ported to Rust and wheels cover the support matrix. I’d do that wiring as a focused follow-up PR rather than expanding this one.

Streaming Parser Follow-Up

Fully agree this is the right next step, and I’d like to take it on as a follow-up so the foundation here stays reviewable.

A couple notes on the shape it should take:

The importer is inherently two-pass: the first pass derives session-level metadata (session_id, started_at from the first entry, ended_at from the last), and the second emits events. So the win is not a single streaming pass. It is making both passes stream over a BufReader line-by-line or via Deserializer, instead of the current read_to_string plus Vec<Value> of every entry held in memory. That gets us to roughly constant memory regardless of file size.
For true bounded memory, the write side should stream too. write_ndjson currently builds the whole NDJSON String before writing. Hashing inline as each line is emitted to a BufWriter keeps output memory bounded as well.
I’ll back it with a benchmark on my largest local session so we can compare before/after rather than assume the win.

Marking this as tracked follow-up unless you’d rather it be part of this PR.

Siddhant-K-code · 2026-06-18T15:58:24Z

Good updates. A few reactions:

event_id fix

The event_id() function with fallback_seed + intra counter is cleaner than what I suggested. Deriving from the source uuid keeps IDs stable across re-imports, and the intra-entry suffix handles the multiple-events-per-entry case properly. The tests cover both. This is done.

Feature flags

The three-tier split (default / python / extension-module) is the right call. A pure Rust consumer of the rlib now pulls in zero Python. That matters if this ever gets used outside the agent-trace wheel context.

Redaction

Fair point, and I missed this entirely. Python's TraceStore.append_event() runs secret redaction by default. Rust doesn't. Making Rust canonical before porting redaction would silently drop credentials from the import path. The fast-path / fallback approach makes sense: use Rust when the wheel is importable, fall back to Python otherwise. One observable behavior, two backends, no capability loss. Delete jsonl_import.py once redaction is ported.

Streaming

The two-pass clarification is useful. The win isn't a single-pass rewrite, it's making both passes stream over a BufReader instead of read_to_string + Vec<Value> in memory. The write_ndjson point is good too — building the whole NDJSON string before writing bounds output memory unnecessarily. Benchmark first, then we'll know if it's worth the complexity.

What's left before merge

Two things: the hash chain round-trip test (write in Python, verify in Rust; write in Rust, verify in Python), and the fast-path / fallback wiring. Redaction port and streaming are follow-ups.

This is in good shape.

Siddhant-K-code · 2026-06-19T10:10:28Z

Two new commits worth calling out.

write_ndjson recomputes the hash chain

This is the right fix. The old version imported stale prev_hash values from the source JSONL. Now it discards them and recomputes the whole chain on write, so the chain is always consistent with what Rust actually serialized. verify_hash_chain hashing lines verbatim closes the loop — it verifies exactly what the writer produced, not a re-serialization.

ensure_ascii parity

This one would have been a silent production bug. serde_json emits raw UTF-8; Python escapes non-ASCII as \uXXXX. Any trace with accents, CJK, or emoji would serialize to different bytes and break hash chain verification across the two implementations. The escape_non_ascii function with encode_utf16 for surrogate pairs matches Python's behavior exactly, including astral characters. The test covers café, Japanese, and emoji in the same event. Good catch.

The round-trip test now covers both directions (Python writes / Rust verifies, Rust writes / Python verifies) and the non-ASCII case. That was the last open item from my earlier review.

Remaining before merge: fast-path / fallback wiring in jsonl_import.py. Everything else is done.

vishwanath1306 · 2026-06-19T19:52:14Z

Both pre-merge items are in:

The Round-trip test is in tests/test_rust_core_roundtrip.py: both directions + redacted:false + non-ASCII parity.
Fast-path/fallback currently has import_jsonl() using the agent_trace_core only when it's importable, redaction is off, no workspace, and AGENT_STRACE_NO_RUST unset; otherwise the Python path (redaction/workspace preserved).
Tests cover each gate + Rust↔Python parity.

Siddhant-K-code · 2026-06-22T06:26:44Z

Pulled the two new commits. Here's what I see.

Fast-path / fallback wiring (7cae28b)

This is exactly what we discussed. _rust_import checks store.redact and store.workspace_id before touching the extension, so the Python path stays canonical for anything that needs redaction or workspace isolation. AGENT_STRACE_NO_RUST=1 as an escape hatch is the right call — useful for debugging and for CI environments where you want to test the Python path explicitly.

The test coverage is solid. Three cases: Rust absent (fallback), redaction enabled (Python forced), workspace active (Python forced). The _failing_rust_core helper that raises AssertionError if called is a clean way to assert the Rust path was never reached.

One small thing: _rust_import returns str | None but the return type annotation says str | None only in the docstring, not as a type hint. Minor, but worth adding for consistency with the rest of the file.

Cleanup (b481369)

char_len removed (unused after the take_chars refactor), obj_get_str / usage_u64 renamed to get_str / get_u64. README trimmed. All fine.

One thing I'd push back on: the "Why Rust" section was removed from the README. That section was actually useful — it explained the memory model (stack values dropped deterministically, no GC) in a way that justifies the complexity of maintaining a Rust crate alongside Python. Without it, a new contributor reading the README has no context for why this exists. Worth keeping a shorter version.

State of the PR

The last open item from my earlier review (fast-path / fallback wiring) is done. The round-trip tests cover both directions and the non-ASCII case. Redaction and streaming are correctly scoped as follow-ups.

This is ready to come out of draft.

vishwanath1306 · 2026-06-23T01:11:25Z

I added a shorter 'Why Rust' back in. As for the return type, _rust_import actually already uses the -> str | None annotation, so the docstring is just prose. I think we're covered there, but let me know if you're seeing something else!

Siddhant-K-code marked this pull request as draft June 17, 2026 07:35

vishwanath1306 added 4 commits June 17, 2026 21:14

vishwanath1306 force-pushed the rust-sdk-pyo3 branch from 15a01ed to eb2c6be Compare June 18, 2026 04:43

vishwanath1306 added 2 commits June 19, 2026 10:39

made changes to the parity

7cae28b

changed README and imports

b481369

Restore shortened Why Rust section to crate README

043c1da

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add Rust core (agent-trace-core) via PyO3 + maturin#218

Add Rust core (agent-trace-core) via PyO3 + maturin#218
vishwanath1306 wants to merge 7 commits into
Siddhant-K-code:mainfrom
vishwanath1306:rust-sdk-pyo3

vishwanath1306 commented Jun 17, 2026

Uh oh!

Siddhant-K-code commented Jun 17, 2026

Uh oh!

Siddhant-K-code commented Jun 17, 2026 •

edited

Loading

Uh oh!

vishwanath1306 commented Jun 18, 2026

Uh oh!

Siddhant-K-code commented Jun 18, 2026

Uh oh!

Siddhant-K-code commented Jun 19, 2026

Uh oh!

vishwanath1306 commented Jun 19, 2026

Uh oh!

Siddhant-K-code commented Jun 22, 2026

Uh oh!

vishwanath1306 commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

vishwanath1306 commented Jun 17, 2026

Uh oh!

Siddhant-K-code commented Jun 17, 2026

Uh oh!

Siddhant-K-code commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vishwanath1306 commented Jun 18, 2026

Replace jsonl_import.py Entirely?

Streaming Parser Follow-Up

Uh oh!

Siddhant-K-code commented Jun 18, 2026

Uh oh!

Siddhant-K-code commented Jun 19, 2026

Uh oh!

vishwanath1306 commented Jun 19, 2026

Uh oh!

Siddhant-K-code commented Jun 22, 2026

Uh oh!

vishwanath1306 commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Siddhant-K-code commented Jun 17, 2026 •

edited

Loading

Replace `jsonl_import.py` Entirely?