fix(eventualreg): converge own names after a process restart #343
Open
wolfy-j wants to merge 1 commit into
A node that restarts re-registers its names with localCounter reset to 0, so a fresh dot is causally stale behind the node-left reap tombstone peers minted at the prior incarnation's counter. Apply drops it (in.Counter < cur.Counter) and the name stays dead cluster-wide (control_rpc/node_query => 'name not registered').

- state: nextCounter() seeds a local mint above cv[localNode] (Lamport advance), so a re-registration dominates the prior incarnation's dot/tombstone.
- service: track owned names and re-assert one when an incoming same-origin dot (a stale reap) overrides it, off the merge hot path (guarded by a lock-free localNode compare).

Tests: Lamport-seed unit + full restart->reap->rejoin convergence. race + vet clean.
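
For readers unfamiliar with the failure mode, the following is a minimal, self-contained model of it, not the eventualreg code itself. The dot and state types, the apply method, and nextCounterNaive are hypothetical stand-ins; only the names localCounter, cv, and nextCounter echo the summary above. It assumes conflicts between dots of the same origin are resolved purely by counter.

```go
package main

import "fmt"

// dot is a hypothetical causal stamp: the origin node plus that node's
// counter at mint time. Dead marks a tombstone.
type dot struct {
	Node    string
	Counter uint64
	Dead    bool
}

// state is a simplified model, not the real eventualreg State.
type state struct {
	localNode    string
	localCounter uint64
	cv           map[string]uint64 // highest counter observed per origin
	names        map[string]dot
}

func newState(node string) *state {
	return &state{localNode: node, cv: map[string]uint64{}, names: map[string]dot{}}
}

// apply records the highest counter seen for the origin and drops an
// incoming dot that is older than the stored same-origin dot.
// Cross-origin conflict resolution is deliberately out of scope here.
func (s *state) apply(name string, in dot) bool {
	if in.Counter > s.cv[in.Node] {
		s.cv[in.Node] = in.Counter
	}
	if cur, ok := s.names[name]; ok && cur.Node == in.Node && in.Counter < cur.Counter {
		return false // causally stale: dropped
	}
	s.names[name] = in
	return true
}

// nextCounterNaive models the pre-fix behaviour: it only looks at the
// in-memory counter, which is 0 after a restart.
func (s *state) nextCounterNaive() uint64 {
	s.localCounter++
	return s.localCounter
}

// nextCounter models the fix: mint strictly above both localCounter and
// the highest counter already observed for the local node.
func (s *state) nextCounter() uint64 {
	if seen := s.cv[s.localNode]; seen > s.localCounter {
		s.localCounter = seen
	}
	s.localCounter++
	return s.localCounter
}

func main() {
	// A peer holds a tombstone for node "a" at the prior incarnation's counter.
	peer := newState("peer")
	peer.apply("svc", dot{Node: "a", Counter: 7, Dead: true})

	// Node "a" restarts with its counter reset and re-registers.
	a := newState("a")
	stale := dot{Node: "a", Counter: a.nextCounterNaive()}
	fmt.Println("naive accepted:", peer.apply("svc", stale))

	// After relearning cv["a"] from peers, the seeded counter dominates.
	a.cv["a"] = 7
	fresh := dot{Node: "a", Counter: a.nextCounter()}
	fmt.Println("seeded accepted:", peer.apply("svc", fresh))
}
```

In this model the naive re-registration is dropped and the seeded one replaces the tombstone, which is the behaviour the state change is meant to produce.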
Problem
A node that restarts re-registers its names with localCounter reset to 0 (NewState). Meanwhile peers, on the node's NodeLeft, reap its bindings into tombstones at the prior incarnation's counter. The restarted node's fresh dot (counter 1) is therefore causally stale: State.Apply drops it via the in.Counter < cur.Counter branch (and the reap comment notes the tombstone is "terminal"). The name stays dead cluster-wide, so cluster-visible names like a control-RPC endpoint or a per-node read service resolve as name not registered indefinitely after a control-node restart.

Fix
Two layers, both off the merge hot path:
- nextCounter() returns a counter strictly above both localCounter and cv[localNode]. A restarted node relearns its prior incarnation's highest counter via anti-entropy (bumpCV), so a re-registration now dominates the prior dot/tombstone instead of losing to it.
- The service tracks the names it owns and re-asserts one when an incoming same-origin dot (a stale reap) overrides it. The check is guarded by a lock-free e.Node == LocalNode() compare; the owned map/lock and re-register run only on the rare self-origin override.

Tests
- TestRegister_SeedsCounterAboveObservedOrigin: the Lamport seed.
- TestRestart_ReclaimsOwnNameAfterReapTombstone: full register → reap → restart → rejoin convergence (fails on main, passes here).
- go test -race + go vet clean.
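
The service-side layer of the fix can be pictured roughly as follows. This is a sketch under assumptions, not the PR's code: the entry and service types, the onMerged hook, and the reassert callback are all hypothetical, and it assumes the overriding same-origin dot is a tombstone. What it tries to show is the shape of the guard: a plain string compare against an immutable localNode on every merged entry, with the mutex and owned map touched only when the entry originates from the local node.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical merged registry entry.
type entry struct {
	Node    string
	Counter uint64
	Dead    bool
}

// service is a simplified model of owned-name tracking.
type service struct {
	localNode string // set once at construction, so reads need no lock

	mu    sync.Mutex
	owned map[string]bool // names this process registered

	reassert func(name string) // hypothetical hook that re-registers a name
}

// register records that this process owns the name.
func (s *service) register(name string) {
	s.mu.Lock()
	s.owned[name] = true
	s.mu.Unlock()
}

// onMerged is called for each entry accepted from a peer and reports
// whether a re-assert was triggered.
func (s *service) onMerged(name string, e entry) bool {
	// Fast path: entries from other origins never take the lock.
	if e.Node != s.localNode {
		return false
	}
	// Assumption: only a same-origin tombstone counts as an override.
	if !e.Dead {
		return false
	}
	s.mu.Lock()
	mine := s.owned[name]
	s.mu.Unlock()
	if !mine {
		return false
	}
	s.reassert(name)
	return true
}

func main() {
	var reasserted []string
	s := &service{localNode: "a", owned: map[string]bool{}}
	s.reassert = func(name string) { reasserted = append(reasserted, name) }
	s.register("svc")

	s.onMerged("svc", entry{Node: "b", Counter: 3, Dead: true}) // other origin: ignored
	s.onMerged("svc", entry{Node: "a", Counter: 7, Dead: true}) // stale reap of an owned name
	fmt.Println(reasserted)
}
```

Keeping the owned-name check behind the origin compare means the common merge traffic, which is dominated by other nodes' entries, pays only for one comparison.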