[BFTree] Add RangeIndex cluster migration #1731
Missing RangeIndex feature-enabled guards
This PR doesn't consistently check whether the RangeIndex feature is enabled (i.e., whether 1. Source side —
  }

- public async Task<bool> TransmitKeysAsync(Dictionary<byte[], byte[]> vectorSetKeysToIgnore)
+ public async Task<bool> TransmitKeysAsync(Dictionary<byte[], byte[]> vectorSetKeysToIgnore, Dictionary<byte[], byte[]> rangeIndexKeysToIgnore)
Is it possible to combine these two instead of having to maintain two dictionaries?
vazois left a comment
libs/cluster/Server/Migration/MigrateScanFunctions.cs:53 — maybe not for this PR, but can't we just get these values from an enum?
await WaitForConfigPropagationAsync().ConfigureAwait(false);

// Discover Vector Sets linked namespaces
var allKeys = migrateTask.sketch.Keys.Select(t => t.Item1.ToArray());
can we avoid the copy here and just iterate over the container?
vazois left a comment
libs/cluster/Server/Migration/MigrateSessionKeys.cs:43 — seems really wasteful to maintain a separate dictionary just for skipping keys. We can potentially store the info for the key type inline and skip the key once we read it from the sketch list
}

/// <summary>Reset state for the next key stream.</summary>
private void Reset()
Ensure that the reset does not race with back-to-back Migration calls or parallel sessions on different slots.
Each ClusterSession has a single RangeIndexMigrationReceiveSession. Within a single ClusterSession, processing is single-threaded, right?
- Update RangeIndexManager.Migration.cs for 7 API changes from main: dataDir->riLogRoot, liveIndexes key nint->Guid, SnapshotToFile->CPR pattern, evicted tree paths, RecoverFromCprSnapshot, LogDataPathFor, stub.ResetFlags()
- Document TRYAGAIN behavior for RI commands during migration instead of spin-wait, with rationale (pipeline stall, client timeout, and connection reset risks)
- Add Redis protocol context and SE.Redis client handling details

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…errors
- Return TRYAGAIN for RI.* commands when key is blocked during migration instead of spin-waiting, avoiding pipeline stall and client timeout risks
- Add returnTryAgainForMigratingKeys field to ClusterSlotVerificationInput
- Add MigratingKeyResult local function in SingleKeySlotVerify
- Thread parameter through MultiKeySlotVerify, VerifyKeysInRange, NetworkIterativeSlotVerify
- Fix migration API visibility: internal -> public for cross-project access (DefaultMigrationChunkSize, RangeIndexRecordType, GetRangeIndexKeysForMigration, SnapshotRangeIndexAndCreateReader, DeriveTempMigrationPath, PublishMigratedIndex)
- Fix RangeIndexManager constructor calls in tests (enabled/dataDir -> riLogRoot)
- Fix broken XML doc cref and remove stale Allure.NUnit usings
- Fix dataDir -> riLogRoot references in RangeIndexManager.cs

- Remove RangeIndexMigrationReceiveState allocation from ClusterSession constructor; create lazily on first SerializedRangeIndexStream record
- Add XML doc on rangeIndexMigrationState explaining why per-session state is needed (chunked BfTree snapshots across CLUSTER MIGRATE calls)
- Separate error paths: RangeIndex not enabled vs ProcessRecord failure
- Use null-conditional pattern for protocol enforcement check

- Remove redundant returnTryAgainForMigratingKeys parameter from SingleKeySlotVerify and NetworkIterativeSlotVerify; read from csvi field
- Fix XML doc cref in ClusterSession
- Remove duplicate clusterSession.Dispose() in RespServerSession

Thread _cts.Token through TransmitRangeIndexAsync chunk read loop and MigrateRangeIndexKeysAsync key iteration for consistent cancellation support with rest of migration session.

Merge snapshot and transmission into single try/catch block with using declaration. Update error message to cover both phases.

Replace direct _cts.Token access with explicit CancellationToken parameter. Remove default chunkSize value.
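
The cancellation change described in the last two commits can be pictured with a minimal sketch. It is not the PR's code: ChunkedTransmitSketch, TransmitAsync, and the sendChunk callback are hypothetical names, and the real TransmitRangeIndexAsync signature is not shown on this page. The sketch only illustrates a chunk read loop that takes an explicit CancellationToken parameter and a chunkSize without a default value.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkedTransmitSketch
{
    // Hypothetical helper: reads `source` in chunks of at most `chunkSize` bytes and
    // hands each chunk to `sendChunk`. The token is an explicit parameter and
    // chunkSize has no default, so every caller must state both.
    public static async Task<long> TransmitAsync(
        Stream source,
        int chunkSize,
        Func<ReadOnlyMemory<byte>, CancellationToken, ValueTask> sendChunk,
        CancellationToken cancellationToken)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var buffer = new byte[chunkSize];
        long total = 0;
        while (true)
        {
            // Observe cancellation between chunks and pass the token to the read itself.
            cancellationToken.ThrowIfCancellationRequested();
            int read = await source.ReadAsync(buffer.AsMemory(0, chunkSize), cancellationToken)
                                   .ConfigureAwait(false);
            if (read == 0) break; // end of stream

            await sendChunk(buffer.AsMemory(0, read), cancellationToken).ConfigureAwait(false);
            total += read;
        }
        return total;
    }
}
```

Passing the token as a parameter keeps the loop independent of any session field, which makes it straightforward to exercise with CancellationToken.None or an already-cancelled token in tests.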
…dex.cs

Use fixed statement to pin the key byte[] before creating PinnedSpanByte, matching the pattern in MigrateSessionSlots.cs.

SnapshotForMigration now receives PinnedSpanByte instead of ReadOnlySpan<byte>, matching the codebase pattern for internal storage methods. The caller pins the key with fixed statement.

Pin keyBytes alongside stubBytes in the fixed block and use FromPinnedPointer instead of FromPinnedSpan.

- Add migrationTempDir field, created in constructor after cleanup
- Remove dot prefix (.migration-tmp -> migration-tmp) for Windows
- Remove per-call Directory.CreateDirectory from DeriveTempMigrationPath
- Simplify DeriveTempMigrationPath to expression-bodied member

Read_MainStore can trigger CopyToTail, which calls PostCopyToTail-cold and tries to take the RI exclusive lock. In SnapshotForMigration the exclusive lock is already held, causing a deadlock. Use Read_RangeIndex (which passes ReadCopyOptions.None) to suppress CTT on both read sites.

Discovery (GetRangeIndexKeysForMigration) now returns HashSet<byte[]> instead of Dictionary: it only identifies which keys are RangeIndex, without capturing stub bytes that could go stale. SnapshotForMigration reads the stub under exclusive lock and returns it via out parameter. SnapshotRangeIndexAndCreateReader uses the fresh stub directly, eliminating the time-of-check/time-of-use gap where concurrent reads could promote the stub via RIPROMOTE.
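
The shape of that time-of-check/time-of-use fix can be sketched roughly as follows. Everything here is an assumption for illustration: StubStoreSketch, ByteArrayComparer, DiscoverKeys, and TrySnapshot are hypothetical names, and a plain monitor lock stands in for the RI exclusive lock. The point is only that discovery returns key identities, while the stub bytes are read at snapshot time under the same lock and handed back through an out parameter.

```csharp
using System;
using System.Collections.Generic;

// Content-based comparer: HashSet<byte[]> and Dictionary<byte[], ...> compare
// array references by default, so byte[] keys need one of these.
public sealed class ByteArrayComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] x, byte[] y) => x.AsSpan().SequenceEqual(y.AsSpan());

    public int GetHashCode(byte[] obj)
    {
        unchecked
        {
            int hash = (int)2166136261; // FNV-1a style hash over the key bytes
            foreach (byte b in obj) hash = (hash ^ b) * 16777619;
            return hash;
        }
    }
}

public sealed class StubStoreSketch
{
    private readonly object exclusiveLock = new object(); // stand-in for the RI exclusive lock
    private readonly Dictionary<byte[], byte[]> stubs =
        new Dictionary<byte[], byte[]>(new ByteArrayComparer());

    public void Upsert(byte[] key, byte[] stub)
    {
        lock (exclusiveLock) stubs[key] = stub;
    }

    // Discovery: returns key identities only, never stub bytes that could go stale.
    public HashSet<byte[]> DiscoverKeys()
    {
        lock (exclusiveLock) return new HashSet<byte[]>(stubs.Keys, new ByteArrayComparer());
    }

    // Snapshot: the stub is read under the lock that guards the snapshot, so the
    // caller always receives the stub that matches the snapshotted state.
    public bool TrySnapshot(byte[] key, out byte[] freshStub)
    {
        lock (exclusiveLock)
        {
            if (!stubs.TryGetValue(key, out var current))
            {
                freshStub = Array.Empty<byte>();
                return false;
            }
            freshStub = (byte[])current.Clone();
            // A real implementation would snapshot the on-disk state here, still under the lock.
            return true;
        }
    }
}
```

If the stub changes between DiscoverKeys and TrySnapshot, the caller still sees the current stub, because nothing captured during discovery is reused.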
These changes belong on the separate tiagonapoli/tryagain-ri-migration branch. Reverts ClusterSlotVerificationInput.returnTryAgainForMigratingKeys, MigratingKeyResult local function, and the IsRangeIndexCommand flag setting.

- Add 4 ExceptionInjectionType enum values for RI migration phases
- Add TriggerException and ResetAndWaitAsync hooks in migration code
- Add 7 new cluster migration tests (on-disk stubs, concurrent R/W, pause during transmit/delete, exception injection, slow transmit)
- Add Stopwatch-based timing to sender overall, sender per-key, and receiver paths (logged in TimeSpan.Ticks)
- Expose TotalFileBytes on RangeIndexChunkedSerializer and reader
- Inject ILogger into RangeIndexMigrationReceiveState

Extract timing variables into TransmitRangeIndexMetrics, MigrateRangeIndexMetrics, and ReceiveRangeIndexMetrics structs, each with a LogSummary method for self-contained logging.

Move TransmitRangeIndexMetrics, MigrateRangeIndexMetrics, and ReceiveRangeIndexMetrics into separate files under the new Server/Migration/RangeIndex/ directory.

Move from lazy ??= initialization to constructor init. Simplify pattern match to plain null-check, add chunk count to protocol violation log message.

…= null

If a concurrent checkpoint already owns the snapshot claim, wait for it and copy the scratch file it produced, which avoids spinning under the exclusive lock. Matches the pattern used by SnapshotForFlushViaCpr. Also changed File.Copy to overwrite: false since migration paths are always unique GUIDs.
Summary
Adds end-to-end cluster migration for RangeIndex keys, supporting both MIGRATE SLOTS and MIGRATE KEYS paths. RangeIndex keys are backed by a native BfTree whose on-disk state (data.bftree) lives outside Tsavorite, so shipping just the 51-byte stub is insufficient. This change streams the entire BfTree snapshot file alongside the stub over the existing migration transport.
Architecture
- RangeIndexChunkedSerializer: Pure state machine over Span<byte>, with no I/O. Takes file data as input via MoveNext(dest, fileData, out consumed).
- RangeIndexMigrationReader: Async wrapper that reads the snapshot file and feeds bytes to the serializer.
- RangeIndexChunkedDeserializer: Sync state machine that writes received file data to a temp file, validates the xxHash64 checksum, recovers the native BfTree, and publishes the stub to the store.
- RangeIndexManager.SnapshotRangeIndexAndCreateReader: Snapshots the BfTree under an exclusive lock to a temp file, then creates the reader.
Wire format
Single MigrationRecordSpanType.SerializedRangeIndexStream (tag 4). Stream format across chunks:
[4B keyLen][key bytes][8B fileSize][file bytes][8B xxHash64][4B stubLen][stub]
Key and file bytes may span chunks; all other elements must fit within a single chunk.
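
To make the layout concrete, here is a minimal sketch that encodes and decodes one fully reassembled stream. It makes several assumptions this description does not state: little-endian length fields, signed 4-byte and 8-byte lengths, and a checksum supplied by the caller rather than computed. RangeIndexStreamSketch, Encode, and Decode are hypothetical names, and chunking is left out entirely.

```csharp
using System;
using System.Buffers.Binary;

public static class RangeIndexStreamSketch
{
    // Lays out [4B keyLen][key][8B fileSize][file][8B checksum][4B stubLen][stub].
    // Byte order (little-endian) is an assumption; the checksum is passed in as-is.
    public static byte[] Encode(ReadOnlySpan<byte> key, ReadOnlySpan<byte> file,
                                ulong checksum, ReadOnlySpan<byte> stub)
    {
        var buffer = new byte[4 + key.Length + 8 + file.Length + 8 + 4 + stub.Length];
        var span = buffer.AsSpan();
        int pos = 0;
        BinaryPrimitives.WriteInt32LittleEndian(span.Slice(pos), key.Length); pos += 4;
        key.CopyTo(span.Slice(pos)); pos += key.Length;
        BinaryPrimitives.WriteInt64LittleEndian(span.Slice(pos), file.Length); pos += 8;
        file.CopyTo(span.Slice(pos)); pos += file.Length;
        BinaryPrimitives.WriteUInt64LittleEndian(span.Slice(pos), checksum); pos += 8;
        BinaryPrimitives.WriteInt32LittleEndian(span.Slice(pos), stub.Length); pos += 4;
        stub.CopyTo(span.Slice(pos));
        return buffer;
    }

    // Reads the fields back in the same order. Slice throws if a length field
    // points past the end of the buffer.
    public static void Decode(ReadOnlySpan<byte> stream, out byte[] key, out byte[] file,
                              out ulong checksum, out byte[] stub)
    {
        int pos = 0;
        int keyLen = BinaryPrimitives.ReadInt32LittleEndian(stream.Slice(pos)); pos += 4;
        key = stream.Slice(pos, keyLen).ToArray(); pos += keyLen;
        int fileSize = checked((int)BinaryPrimitives.ReadInt64LittleEndian(stream.Slice(pos))); pos += 8;
        file = stream.Slice(pos, fileSize).ToArray(); pos += fileSize;
        checksum = BinaryPrimitives.ReadUInt64LittleEndian(stream.Slice(pos)); pos += 8;
        int stubLen = BinaryPrimitives.ReadInt32LittleEndian(stream.Slice(pos)); pos += 4;
        stub = stream.Slice(pos, stubLen).ToArray();
    }
}
```

Because the stub comes last, a receiver working this way would only see it after the file bytes and checksum have arrived, which fits the deserializer description of validating the checksum before publishing the stub.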
SLOTS path
- RecordType == 2, captures to MigrateOperation.RangeIndexes
- MigrateRangeIndexKeysAsync runs a sketch-protected batch cycle:
  - TransmitRangeIndexAsync
  - finally: clear sketch (unblocks clients)
KEYS path
- GetRangeIndexKeysForMigration discovers RI keys via RIGET
- TransmitKeysAsync skips RI keys (in rangeIndexKeysToIgnore)
- TransmitRangeIndexAsync, then marked in sketch for DeleteKeysAsync
Bug fixes
- … GarnetServer.cs)
- Publish deletes existing data file before move
- Publish registration: accept InPlaceUpdated/CopyUpdated status
- TransmitRangeIndexAsync catches all exceptions (never throws)
Tests
TODO