universe: improve sync reliability and performance by jtobin · Pull Request #2187 · lightninglabs/taproot-assets

jtobin · 2026-07-01T18:26:19Z

Resolves #2026.

This should substantially improve universe sync reliability and performance. In some cases the performance improvement/bandwidth reduction can be multiple orders of magnitude, and some DB contention/failure modes are eliminated entirely.

Codex's estimate of the reliability/performance wins:

For production scale: on a fresh empty sync, total work is still roughly all 186,377 proofs, so this mostly improves reliability and peak pressure, not total bytes. Peak DB batch pressure falls from roughly GOMAXPROCS * 200 leaves to 2 * 50; on an 8-16 core process that is about 16x-32x less peak write-side batch pressure. Peak nested proof-fetch concurrency should also drop by a factor of about GOMAXPROCS / 2, so it is usually 4x-8x lower.

For steady-state or mostly-synced nodes, the improvement is much larger. Old behavior re-fetched all leaves in each divergent root; new behavior fetches only the delta. If a large 8,622-leaf root gains 1 new proof, that root's leaf fetch/write work drops by about 8,600x. At whole-server scale, if a sync is behind by 100, 1,000, or 10,000 proofs, the theoretical leaf-level work reduction versus worst-case refetching is about 1,864x, 186x, or 19x, respectively.

Net estimate: fresh syncs should stop retry-looping and have far smoother traffic; incremental syncs should see one to three orders of magnitude less redundant proof fetch/verify/write work depending on delta size.
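The ratios quoted in the estimate can be reproduced with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and the inputs (186,377 proofs, batch sizes 200 and 50, 2 workers) are taken from the text above.

```go
package main

import "fmt"

// peakBatchRatio compares peak in-flight leaves before and after the change:
// gomaxprocs*oldBatch leaves before, newWorkers*newBatch leaves after.
func peakBatchRatio(gomaxprocs, oldBatch, newWorkers, newBatch int) float64 {
	return float64(gomaxprocs*oldBatch) / float64(newWorkers*newBatch)
}

// refetchReduction is the ratio between re-fetching every leaf and
// fetching only the leaves that are actually missing.
func refetchReduction(totalLeaves, delta int) float64 {
	return float64(totalLeaves) / float64(delta)
}

func main() {
	for _, cores := range []int{8, 16} {
		fmt.Printf("GOMAXPROCS=%d: %.0fx less peak batch pressure\n",
			cores, peakBatchRatio(cores, 200, 2, 50))
	}

	fmt.Printf("8,622-leaf root, 1 new proof: %.0fx\n",
		refetchReduction(8622, 1))

	for _, behind := range []int{100, 1000, 10000} {
		fmt.Printf("behind by %d proofs: %.0fx\n",
			behind, refetchReduction(186377, behind))
	}
}
```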

The PR itself fixes three issues, each in a different layer of the universe sync pipeline (Opus's summary follows):

- Leaf-key diff previously compared by pointer identity, so a mostly-synced node re-fetched every remote leaf on each sync. Now compared by content hash.
- Roots were synced in the order the remote returned them, letting a transfer race ahead of its issuance and abort on dependency lookup. Now partitioned by proof type, issuance first.
- Sync fan-out was unbounded and the write batch large, exhausting the DB transaction retry budget at scale. Now bounded, with a smaller batch.

Adds a SyncFixture that pairs two SQLite-backed universes (local and remote) with a SimpleSyncer wired to treat one side as remote, so a bench can drive SyncUniverse end-to-end without any network I/O. The direct-write registrar bypasses Archive verification because the seeded corpus is random proofs and the contention we want to observe lives in MultiverseStore.UpsertProofLeafBatch, not the verifier. SyncMetrics reports batches, leaves inserted, DB retry errors, and dependency-missing errors via b.ReportMetric so scenarios laid down in a follow-up commit surface each as its own benchstat column. Fraction is a bounded [0, 1] newtype used for LocalOverlap so a malformed seed spec fails at NewFraction rather than silently drifting the workload. SeedSpec keeps issuance and transfer as separate fields rather than a flat list tagged by proof type, so a caller cannot accidentally interleave the two.
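A bounded newtype of the kind described for Fraction might look roughly like the following. Only the names Fraction, NewFraction and LocalOverlap come from the commit message; the internal field, the error value and the Of helper are assumptions made for this sketch, not the PR's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// ErrFractionOutOfRange is a hypothetical error value for this sketch.
var ErrFractionOutOfRange = errors.New("fraction must be within [0, 1]")

// Fraction wraps a float64 that is guaranteed to lie in [0, 1]. The field is
// unexported so values can only be built through NewFraction.
type Fraction struct {
	v float64
}

// NewFraction rejects NaN and anything outside [0, 1], so a malformed seed
// spec fails at construction time instead of skewing the workload later.
func NewFraction(v float64) (Fraction, error) {
	if math.IsNaN(v) || v < 0 || v > 1 {
		return Fraction{}, fmt.Errorf("%w: got %v", ErrFractionOutOfRange, v)
	}
	return Fraction{v: v}, nil
}

// Of returns the fraction of n, rounded to the nearest integer. It is a
// hypothetical helper showing how LocalOverlap could be applied to a count.
func (f Fraction) Of(n int) int {
	return int(math.Round(f.v * float64(n)))
}

func main() {
	localOverlap, err := NewFraction(0.9)
	if err != nil {
		panic(err)
	}
	fmt.Println(localOverlap.Of(200)) // prints 180

	_, err = NewFraction(1.5)
	fmt.Println(err != nil) // prints true
}
```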

Adds three benches that drive the sync fixture end to end: FreshLocal for the "first sync into an empty node" case, MostlySynced for the "resume after most leaves already present" case (the key one for demonstrating the SetDiff fix), and Mixed for a workload that interleaves issuance and transfer roots.

The benches directly reproduce two of the three symptoms from issue lightninglabs#2026 with no code change:

- FreshLocal/roots=50/leaves=200 crashes with "db tx retries exceeded: database is locked (5) (SQLITE_BUSY)", which is the tx contention Phase 3 targets.
- Every MostlySynced/* iteration inserts every remote leaf even when the local side already has 90% of them (e.g. 200 leaves fetched per root at leaves=200 when only 20 are new). This is the pointer-identity SetDiff bug Phase 1 targets. leaves_inserted reports the over-fetch each time.

Baseline (Apple M4, sqlite, -benchtime=1x):

MostlySynced/roots=10/leaves=50     2.64s   500 leaves (50 expected)
MostlySynced/roots=10/leaves=200    22.1s   2000 leaves (200 expected)
MostlySynced/roots=50/leaves=50     15.9s   2500 leaves (250 expected)
MostlySynced/roots=50/leaves=200    138s    10000 leaves (1000 expected)
FreshLocal/roots=10/leaves=50       2.08s   500 leaves
FreshLocal/roots=10/leaves=200      16.5s   2000 leaves
FreshLocal/roots=50/leaves=50       13.0s   2500 leaves
FreshLocal/roots=50/leaves=200      FAIL (db tx retries exceeded)
Mixed/leaves=50                     10.4s   2000 leaves
Mixed/leaves=200                    84.2s   8000 leaves

bench/results/ is gitignored so the raw output lives in a scratch file locally; the numbers above are what the follow-up phases will compare against.
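Custom columns such as leaves_inserted are reported through the standard library's b.ReportMetric. The sketch below shows that mechanism in isolation; syncOnce is a hypothetical stand-in for a full SyncUniverse run, and the 500/450 split simply mirrors a 90%-overlap scenario with the diff fix in place.

```go
package main

import (
	"fmt"
	"testing"
)

// syncOnce is a hypothetical stand-in for one sync pass. It returns how many
// leaves had to be inserted, assuming only the missing ones are fetched.
func syncOnce(remoteLeaves, alreadyLocal int) int {
	return remoteLeaves - alreadyLocal
}

// benchMostlySynced reports the per-iteration insert count as a custom
// metric, which benchstat then shows as its own column.
func benchMostlySynced(b *testing.B) {
	var inserted int
	for i := 0; i < b.N; i++ {
		inserted += syncOnce(500, 450)
	}
	b.ReportMetric(float64(inserted)/float64(b.N), "leaves_inserted/op")
}

// leavesInsertedPerOp runs the benchmark programmatically and reads the
// custom metric back from the result.
func leavesInsertedPerOp() float64 {
	res := testing.Benchmark(benchMostlySynced)
	return res.Extra["leaves_inserted/op"]
}

func main() {
	fmt.Println(leavesInsertedPerOp()) // prints 50
}
```

In a real bench file the function would be named BenchmarkXxx and run with go test -bench; testing.Benchmark is used here only so the sketch runs as a plain program.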

gemini-code-assist · 2026-07-01T18:26:31Z

Summary of Changes


This pull request addresses reliability and performance issues in the universe sync pipeline. By optimizing how leaves are diffed, enforcing a logical order for root processing, and bounding concurrency to protect database stability, the changes significantly reduce redundant work and eliminate common failure modes observed at production scale.

Highlights

- Leaf-key diffing: Replaced pointer-identity comparison with content-based hashing for leaf-key diffing, preventing redundant re-fetches of already-synced leaves.
- Sync ordering: Partitioned universe roots by proof type, ensuring issuance roots are processed before transfer roots to avoid dependency lookup failures.
- Sync concurrency: Introduced bounded concurrency for root syncing and reduced batch sizes to prevent database transaction retry budget exhaustion at scale.


gemini-code-assist

Code Review

This pull request significantly improves the performance and reliability of universe sync by introducing bounded concurrency, partitioning roots to process issuance before transfer, and implementing content-based leaf key diffing to prevent redundant fetches. It also reduces the default sync batch size to mitigate database transaction contention. The review feedback suggests adding defensive nil checks in the new diffLeafKeys function to prevent potential nil pointer dereference panics when handling interface slices.


fn.SetDiff keys its intermediate maps by the LeafKey interface value, which for the common BaseLeafKey implementation compares the embedded *asset.ScriptKey by pointer address, not by pubkey content. tapdb's mintingKeys allocates a fresh ScriptKey per row per call, so the remote and local sides of the syncer's diff never observe overlapping pointer identities, even when the underlying content is identical. The result: on a mostly-synced node, every remote leaf looks new, and the syncer re-fetches the entire tree on each pass. This is the root cause of the "instantaneous 100MB traffic spikes" reported in issue lightninglabs#2026.

Replace the call at syncer.go:336 with a small diffLeafKeys helper keyed by UniverseKey(), the same 32-byte content hash the multiverse stores under. The helper also preserves remote's relative order, which the downstream error-attribution loop indexes into.

Unit tests in universe/syncer_test.go pin the direct regression (pointer-invariance) plus subset and subsequence semantics. Rapid property tests in syncer_diff_property_test.go elevate the same invariants to universally-quantified form, including extensional agreement with a slower reference impl comparing (outpoint, script pubkey) directly.

Bench (Apple M4, -benchtime=1x), MostlySynced with 90% overlap:

roots=10/leaves=50     2.64s -> 0.39s   500 -> 50 leaves inserted
roots=10/leaves=200    22.1s -> 2.59s   2000 -> 200 leaves inserted
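The difference between pointer-identity and content-hash diffing can be shown with simplified stand-in types. Everything below is an illustrative sketch: leafKey, scriptKey, newLeafKey and naiveDiff are hypothetical, and the SHA-256 of outpoint plus pubkey only stands in for whatever the real UniverseKey() hashes. Only the names diffLeafKeys and UniverseKey come from the commit message.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// scriptKey and leafKey are simplified stand-ins for the real types.
type scriptKey struct {
	pubKey []byte
}

type leafKey struct {
	outpoint string
	script   *scriptKey // pointer field: struct equality compares the address
}

// newLeafKey allocates a fresh scriptKey each time, like a DB row decoder.
func newLeafKey(outpoint, pubKey string) leafKey {
	return leafKey{outpoint: outpoint, script: &scriptKey{pubKey: []byte(pubKey)}}
}

// UniverseKey returns a 32-byte content hash (assumed layout for the sketch).
func (k leafKey) UniverseKey() [32]byte {
	h := sha256.New()
	h.Write([]byte(k.outpoint))
	h.Write(k.script.pubKey)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// naiveDiff keys the map by the struct itself, so two keys with equal
// content but different scriptKey pointers never match.
func naiveDiff(remote, local []leafKey) []leafKey {
	have := make(map[leafKey]struct{}, len(local))
	for _, k := range local {
		have[k] = struct{}{}
	}
	var missing []leafKey
	for _, k := range remote {
		if _, ok := have[k]; !ok {
			missing = append(missing, k)
		}
	}
	return missing
}

// diffLeafKeys keys the map by content hash and keeps remote's order.
func diffLeafKeys(remote, local []leafKey) []leafKey {
	have := make(map[[32]byte]struct{}, len(local))
	for _, k := range local {
		have[k.UniverseKey()] = struct{}{}
	}
	var missing []leafKey
	for _, k := range remote {
		if _, ok := have[k.UniverseKey()]; !ok {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	remote := []leafKey{newLeafKey("a:0", "k1"), newLeafKey("b:0", "k2"), newLeafKey("c:0", "k3")}
	local := []leafKey{newLeafKey("a:0", "k1"), newLeafKey("b:0", "k2")}

	fmt.Println(len(naiveDiff(remote, local)))    // prints 3: every leaf looks new
	fmt.Println(len(diffLeafKeys(remote, local))) // prints 1: only c:0 is missing
}
```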

The previous executeSync ran every remote root through a single fn.ParSlice, in whatever order fetchAllRoots returned them. A transfer root could race ahead of the issuance root it depends on; when the archive verifier fetched the previous asset snapshot it would surface "no universe proof found" and abort the sync, which in issue lightninglabs#2026 the reporter observed as an ongoing loop of partial syncs.

Introduce SortedRoots as a typed partition (Issuance and Transfer are separate fields, structurally impossible to interleave) and run the syncRoots fan-out twice, issuance first. syncRoots itself is factored out for reuse in the follow-up concurrency commit, where the fan-out swaps in a bounded worker pool.

TestExecuteSync_IssuanceBeforeTransfer wires a minimal DiffEngine that records RootNode calls and asserts every issuance-typed call precedes every transfer-typed call. Rapid property tests in syncer_partition_property_test.go pin the three invariants of SortedRoots: soundness (each bucket holds only its own proof type), totality (every recognised root lands in exactly one bucket), and order-preservation within each bucket. ProofTypeUnspecified is included in the generator to cover the drop case matching pre-partition uniIdSyncFilter behaviour.
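A typed partition along these lines could be sketched as follows. The names SortedRoots, Issuance, Transfer and ProofTypeUnspecified appear in the commit message; Root, partitionRoots and the integer ProofType are assumptions introduced here so the example is self-contained.

```go
package main

import "fmt"

// ProofType is a simplified stand-in for the universe proof type.
type ProofType int

const (
	ProofTypeUnspecified ProofType = iota
	ProofTypeIssuance
	ProofTypeTransfer
)

// Root is a hypothetical minimal root descriptor.
type Root struct {
	ID   string
	Type ProofType
}

// SortedRoots keeps the two proof types in separate fields, so a caller
// cannot interleave them by accident.
type SortedRoots struct {
	Issuance []Root
	Transfer []Root
}

// partitionRoots places each recognised root in exactly one bucket,
// preserves the input order within each bucket, and drops roots with an
// unspecified proof type.
func partitionRoots(roots []Root) SortedRoots {
	var sorted SortedRoots
	for _, r := range roots {
		switch r.Type {
		case ProofTypeIssuance:
			sorted.Issuance = append(sorted.Issuance, r)
		case ProofTypeTransfer:
			sorted.Transfer = append(sorted.Transfer, r)
		}
	}
	return sorted
}

func main() {
	sorted := partitionRoots([]Root{
		{ID: "t1", Type: ProofTypeTransfer},
		{ID: "i1", Type: ProofTypeIssuance},
		{ID: "u1", Type: ProofTypeUnspecified},
		{ID: "i2", Type: ProofTypeIssuance},
	})

	// Two passes, issuance first, so a transfer never runs before the
	// issuance it depends on has been synced.
	for _, r := range sorted.Issuance {
		fmt.Println("sync issuance", r.ID)
	}
	for _, r := range sorted.Transfer {
		fmt.Println("sync transfer", r.ID)
	}
}
```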

Two changes wired together to relieve the DB tx contention issue lightninglabs#2026 reports as "db tx retries exceeded":

- SimpleSyncCfg gains an internal SyncRootConcurrency knob and syncRoots swaps its unbounded fn.ParSlice for a bounded errgroup.SetLimit worker pool. NewSimpleSyncer clamps non-positive inputs to 1 so the internal fan-out can trust the "at least one worker" invariant without defensive checks.
- defaultUniverseSyncBatchSize drops from 200 to 50, shortening per-tx write hold times. defaultUniverseSyncRootConcurrency lands at 2, the value the issue reporter observed as retry-free in their environment. The knob is kept internal for now; no CLI flag.

tapcfg wires production to (batch=50, concurrency=2). The bench fixture picks the same defaults so scenarios reflect the shipped workload.

TestSyncRoots_HonoursConcurrencyCap spins up a probe with a live in-flight gauge and asserts peak concurrency never exceeds the cap across cap ∈ {1, 2, 4, 8}. TestNewSimpleSyncer_ClampsNonPositiveConcurrency pins the constructor invariant.

The ordering test introduced in the partition commit is upgraded here now that SyncRootConcurrency exists: it runs at concurrency 8 with a 1ms sleep in the recorder so a hypothetical single-pool refactor collapsing the two syncRoots calls would fail with observably interleaved orderings rather than passing by accident.

Bench (Apple M4, -benchtime=1x):

FreshLocal/roots=50/leaves=200    FAIL (db tx retries) -> 98.2s, 0 retries
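The bounded fan-out and the in-flight gauge used to test it can be illustrated with the standard library alone. Note the substitution: the PR uses an errgroup with SetLimit, while this sketch uses a buffered-channel semaphore so that it has no external dependency. The function names are hypothetical, and unlike an errgroup with a context, this version does not cancel remaining work after the first error.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// clampConcurrency maps non-positive inputs to 1 so the fan-out can rely on
// having at least one worker.
func clampConcurrency(n int) int {
	if n < 1 {
		return 1
	}
	return n
}

// syncRootsBounded runs work for every root with at most limit goroutines in
// flight and returns the first error observed, if any.
func syncRootsBounded(roots []string, limit int, work func(string) error) error {
	sem := make(chan struct{}, clampConcurrency(limit))

	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for _, root := range roots {
		sem <- struct{}{} // blocks while all worker slots are taken
		wg.Add(1)
		go func(root string) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := work(root); err != nil {
				once.Do(func() { firstErr = err })
			}
		}(root)
	}
	wg.Wait()
	return firstErr
}

// measurePeak drives the pool with a probe that tracks the highest number of
// concurrently running workers.
func measurePeak(numRoots, limit int) int64 {
	var inFlight, peak atomic.Int64
	roots := make([]string, numRoots)

	_ = syncRootsBounded(roots, limit, func(string) error {
		n := inFlight.Add(1)
		for {
			p := peak.Load()
			if n <= p || peak.CompareAndSwap(p, n) {
				break
			}
		}
		time.Sleep(time.Millisecond) // hold the slot briefly so overlap is observable
		inFlight.Add(-1)
		return nil
	})
	return peak.Load()
}

func main() {
	for _, limit := range []int{1, 2, 4, 8} {
		fmt.Printf("cap=%d peak=%d\n", limit, measurePeak(32, limit))
	}
}
```

The peak printed for each cap should never exceed the cap; whether it reaches the cap depends on scheduling.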

jtobin added 2 commits July 1, 2026 15:23

jtobin added this to the v0.8.1 milestone Jul 1, 2026

jtobin requested review from GeorgeTsagk and darioAnongba July 1, 2026 18:26

jtobin self-assigned this Jul 1, 2026

jtobin added performance syncing labels Jul 1, 2026

jtobin added this to Taproot-Assets Project Board Jul 1, 2026

github-project-automation Bot moved this to 🆕 New in Taproot-Assets Project Board Jul 1, 2026

gemini-code-assist Bot reviewed Jul 1, 2026


Comment thread universe/syncer.go

jtobin force-pushed the issue2026 branch from 5b19ef0 to c6d58ed Compare July 1, 2026 18:49

jtobin added 4 commits July 1, 2026 17:13

docs: add release note

d255c13

jtobin force-pushed the issue2026 branch from c6d58ed to d255c13 Compare July 1, 2026 19:43

jtobin wants to merge 6 commits into lightninglabs:main from jtobin:issue2026
