perf(messaging): add ref-counted message pooling #10072

Draft
ReubenBond wants to merge 2 commits into dotnet:main from ReubenBond:split/message-pooling-refcounting

Conversation

ReubenBond commented Apr 30, 2026

Summary

  • Adds a thread-local MessagePool and ref-counted message ownership APIs for safe message reuse.
  • Moves message creation/deserialization onto pooled Message instances and resets messages when they return to the pool.
  • Updates core/runtime send, receive, drop, and callback lifecycle paths to release message references explicitly.
  • Updates activation repartitioning to record message addresses instead of retaining Message instances, and adds focused message pool tests.
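
The ownership model described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `PooledMessage`, `SketchMessagePool`, the `Body` field, and the per-thread cap are hypothetical stand-ins, and the real `Message` and `MessagePool` types may differ in every detail.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for a pooled message; only the ownership logic is modeled.
public sealed class PooledMessage
{
    private int _refCount;

    public string Body { get; set; }

    public int RefCount => Volatile.Read(ref _refCount);

    // Called by the pool when handing out an instance: the caller holds the first reference.
    internal void Initialize() => Volatile.Write(ref _refCount, 1);

    // A new owner takes a reference before it starts using the message.
    public void Acquire() => Interlocked.Increment(ref _refCount);

    // The owner that releases the last reference resets the message and returns it to the pool.
    public void Release()
    {
        var remaining = Interlocked.Decrement(ref _refCount);
        if (remaining < 0)
        {
            throw new InvalidOperationException("Message released more times than it was acquired.");
        }

        if (remaining == 0)
        {
            Body = null; // reset-on-return
            SketchMessagePool.Return(this);
        }
    }
}

// Hypothetical thread-local pool: each thread keeps its own stack, so no locking is needed.
public static class SketchMessagePool
{
    private const int MaxPerThread = 64; // arbitrary cap chosen for this sketch

    [ThreadStatic]
    private static Stack<PooledMessage> t_pool;

    public static PooledMessage Get()
    {
        var pool = t_pool ??= new Stack<PooledMessage>();
        var message = pool.Count > 0 ? pool.Pop() : new PooledMessage();
        message.Initialize();
        return message;
    }

    internal static void Return(PooledMessage message)
    {
        var pool = t_pool ??= new Stack<PooledMessage>();
        if (pool.Count < MaxPerThread)
        {
            pool.Push(message);
        }
    }
}
```

In this sketch a message with two owners survives the first `Release()` untouched and is reset only by the second, which is the property that makes explicit release calls on every drop, timeout, and completion path necessary.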

Validation

  • git diff --check
  • conflict-marker scan
  • dotnet build src\Orleans.Core\Orleans.Core.csproj -m
  • dotnet build src\Orleans.Runtime\Orleans.Runtime.csproj -m
  • dotnet test test\Orleans.Core.Tests\Orleans.Core.Tests.csproj --filter MessagePool (20 passed)

Dependencies / notes

  • Excludes callback pooling/dictionary, invokable pooling, benchmarks, SEDA, and the transport rewrite.
  • NonSilo.Tests.csproj was not present on the current base, so message pool tests live under Orleans.Core.Tests.
  • Additional end-to-end networking lifecycle/leak stress coverage would still be valuable.

Copilot AI left a comment

Pull request overview

This PR introduces ref-counted ownership tracking for Message instances and a thread-local MessagePool to enable safe reuse of messages across the runtime’s send/receive lifecycle, while updating related runtime paths (networking, callbacks, activation repartitioning) to explicitly release message references.

Changes:

  • Add MessagePool plus ref-counted Message.Acquire()/Release() ownership APIs (with DEBUG leak tracking and reset-on-return semantics).
  • Switch message creation/deserialization paths to use pooled Message instances and update many lifecycle paths to call Release()/ReleaseDropped().
  • Update activation repartitioning sampling to record addresses (instead of retaining Message instances) and add focused message pool tests.
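
The repartitioning change in the last bullet can be pictured as copying the addressing data out of the message at sampling time. The following is a hedged sketch under assumed names (`SampledMessage`, `RecordedEdge`, and `EdgeSink` are hypothetical, and grain ids are modeled as plain strings); it is not the PR's `ActivationRepartitioner` code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pooled message whose fields are cleared when it returns to the pool.
public sealed class SampledMessage
{
    public string SendingGrain { get; set; }
    public string TargetGrain { get; set; }

    public void Reset()
    {
        SendingGrain = null;
        TargetGrain = null;
    }
}

// Immutable snapshot of the addressing data; it holds no reference to the message.
public readonly struct RecordedEdge
{
    public RecordedEdge(string source, string target)
    {
        Source = source;
        Target = target;
    }

    public string Source { get; }
    public string Target { get; }
}

public sealed class EdgeSink
{
    private readonly Queue<RecordedEdge> _pending = new Queue<RecordedEdge>();

    public IReadOnlyCollection<RecordedEdge> Pending => _pending;

    // Copy the values synchronously, while the caller still owns the message.
    public void Record(SampledMessage message)
    {
        _pending.Enqueue(new RecordedEdge(message.SendingGrain, message.TargetGrain));
    }
}
```

Because the sink stores values rather than the `SampledMessage` instance, a later reset of the pooled message cannot change what the asynchronous consumer of `Pending` observes.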
Summary per file

| File | Description |
| --- | --- |
| test/Orleans.Placement.Tests/ActivationRepartitioningTests/TestMessageFilter.cs | Updates test filter API to accept grain ids instead of Message. |
| test/Orleans.Core.Tests/Messaging/MessagePoolTests.cs | Adds unit tests for pooling/ref-count behaviors and DEBUG leak tracking. |
| src/Orleans.Runtime/Placement/Repartitioning/RepartitionerMessageFilter.cs | Updates filter API to operate on GrainId pairs rather than Message. |
| src/Orleans.Runtime/Placement/Repartitioning/ActivationRepartitioner.cs | Changes pending buffer type to store recorded message metadata. |
| src/Orleans.Runtime/Placement/Repartitioning/ActivationRepartitioner.MessageSink.cs | Records message addressing data instead of retaining Message objects. |
| src/Orleans.Runtime/Networking/SiloConnection.cs | Releases dropped/expired/rejected messages and marks ownership transfers. |
| src/Orleans.Runtime/Networking/GatewayInboundConnection.cs | Releases dropped/expired/rejected gateway messages. |
| src/Orleans.Runtime/Messaging/MessageCenter.cs | Releases blocked/expired outgoing messages and adjusts observer invocation ordering. |
| src/Orleans.Runtime/Messaging/Gateway.cs | Releases messages rejected due to client drop. |
| src/Orleans.Runtime/Core/InsideRuntimeClient.cs | Releases/marks responses in callback/no-callback/status-response paths. |
| src/Orleans.Runtime/Core/HostedClient.cs | Releases expired messages at dispatch. |
| src/Orleans.Runtime/Catalog/StatelessWorkerGrainContext.cs | Releases dropped messages when context creation fails. |
| src/Orleans.Runtime/Catalog/ActivationData.cs | Releases dropped messages and releases completed requests. |
| src/Orleans.Core/Runtime/OutsideRuntimeClient.cs | Releases/marks responses in callback/no-callback/status-response paths. |
| src/Orleans.Core/Runtime/InvokableObjectManager.cs | Releases messages dropped during observer dispatch/invocation/response. |
| src/Orleans.Core/Runtime/CallbackData.cs | Acquires request message while awaiting completion; releases on completion/timeout/cancel/fail. |
| src/Orleans.Core/Networking/Connection.cs | Releases send-pipeline references after flush and on send failures. |
| src/Orleans.Core/Messaging/MessageSerializer.cs | Deserializes into pooled Message instances. |
| src/Orleans.Core/Messaging/MessagePool.cs | Adds thread-local message pooling and optional DEBUG leak tracking. |
| src/Orleans.Core/Messaging/MessageFactory.cs | Creates request/response messages from the pool instead of allocating new ones. |
| src/Orleans.Core/Messaging/Message.cs | Adds ref-count ownership tracking, drop-release helper, and reset-on-return support. |
| src/Orleans.Core/Messaging/ClientMessageCenter.cs | Releases dropped/rejected client messages. |

Copilot's findings

Comments suppressed due to low confidence (1)

src/Orleans.Runtime/Networking/SiloConnection.cs:256

  • In the expired-send path, msg.ReleaseDropped(...) can return the message to the pool and reset it. The subsequent msg.IsPing() check/logging can become a use-after-release (and may log incorrect data or race with reuse). Capture whether it’s a ping (and any info needed for logging) before releasing, and don’t touch msg after calling ReleaseDropped.
            if (msg.IsExpired)
            {
                this.MessagingTrace.OnDropExpiredMessage(msg, MessagingInstruments.Phase.Send);
                msg.ReleaseDropped("ExpiredAtSend");

                if (msg.IsPing())
                {
                    LogWarningDroppingExpiredPingMessage(this.Log, msg);
                }

                return false;
  • Files reviewed: 22/22 changed files
  • Comments generated: 5

Comment on lines +140 to 142
msg.ReleaseDropped("RejectedObsoleteEpoch");

LogDebugRejectingObsoleteRequest(this.Log, msg.TargetSilo?.ToString() ?? "null", this.LocalSiloAddress.ToString(), msg);

Copilot AI Apr 30, 2026

msg.ReleaseDropped("RejectedObsoleteEpoch") is called before LogDebugRejectingObsoleteRequest(..., msg). Since releasing can reset/return the message to the pool, logging afterwards is a use-after-release and may print incorrect data or race with reuse. Log before releasing (or log using captured values).

This issue also appears on line 246 of the same file.

Suggested change
- msg.ReleaseDropped("RejectedObsoleteEpoch");
- LogDebugRejectingObsoleteRequest(this.Log, msg.TargetSilo?.ToString() ?? "null", this.LocalSiloAddress.ToString(), msg);
+ LogDebugRejectingObsoleteRequest(this.Log, msg.TargetSilo?.ToString() ?? "null", this.LocalSiloAddress.ToString(), msg);
+ msg.ReleaseDropped("RejectedObsoleteEpoch");
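
The ordering concern raised in this comment generalizes to any pooled object: read what is needed, then release, then never touch the instance again. Below is a small self-contained sketch of that pattern. `DroppableMessage`, `RejectSketch`, and the list-based log are hypothetical and stand in for the real message and logging types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pooled message: releasing the last reference clears its fields.
public sealed class DroppableMessage
{
    public string TargetSilo { get; set; }
    public bool IsReleased { get; private set; }

    public void ReleaseDropped(string reason)
    {
        // Models reset-on-return: after this call another owner may reuse the instance.
        TargetSilo = null;
        IsReleased = true;
    }
}

public static class RejectSketch
{
    public static void RejectObsolete(DroppableMessage msg, string localSilo, List<string> log)
    {
        // Read everything the log line needs while this code still owns the message.
        var target = msg.TargetSilo ?? "null";
        log.Add($"Rejecting obsolete request to {target} at {localSilo}");

        // Release last; msg must not be read after this line.
        msg.ReleaseDropped("RejectedObsoleteEpoch");
    }
}
```

Capturing values into locals is the safer of the two options when the log call itself is deferred or conditional, since the locals stay valid regardless of when the message is recycled.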

Comment on lines 385 to 399
@@ -382,6 +388,13 @@ private async Task ProcessOutgoing()
        break;
    }

    // Release the send pipeline's reference after bytes have been flushed.
    foreach (var msg in inflight)
    {
        msg.MarkTransferred("Connection.ProcessOutgoing:Sent");
        msg.Release();
    }

    inflight.Clear();
}

Copilot AI Apr 30, 2026

ProcessOutgoing breaks on flushResult.IsCompleted || flushResult.IsCanceled before releasing the messages still in inflight. With ref-counted pooled messages, that will leak references on connection shutdown/cancel. Ensure inflight messages are released/cleared on all exit paths (including completed/canceled flush and exceptions), e.g., release before break and/or in finally.
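
One way to picture the "release on all exit paths" suggestion is a `try`/`finally` around the send loop. The sketch below is an assumption-laden simplification: `CountedMessage`, `SendLoopSketch`, and the `flushIsCompletedOrCanceled` delegate are hypothetical, the loop is synchronous, and whether in-flight messages should be released or handed elsewhere on cancellation is a decision for the PR author.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical ref-counted message; starts with the send pipeline's single reference.
public sealed class CountedMessage
{
    public int RefCount { get; private set; } = 1;

    public void Release()
    {
        if (RefCount <= 0)
        {
            throw new InvalidOperationException("Double release.");
        }

        RefCount--;
    }
}

public static class SendLoopSketch
{
    // flushIsCompletedOrCanceled models a flush result reporting completion or cancellation.
    public static void ProcessOutgoing(
        Queue<List<CountedMessage>> batches,
        Func<bool> flushIsCompletedOrCanceled)
    {
        var inflight = new List<CountedMessage>();
        try
        {
            while (batches.Count > 0)
            {
                inflight.AddRange(batches.Dequeue());

                if (flushIsCompletedOrCanceled())
                {
                    break; // the finally block releases whatever is still in flight
                }

                ReleaseAll(inflight); // normal path: bytes flushed, drop the pipeline's reference
            }
        }
        finally
        {
            // Runs on break, normal completion, and exceptions; a no-op if inflight is empty.
            ReleaseAll(inflight);
        }
    }

    private static void ReleaseAll(List<CountedMessage> inflight)
    {
        foreach (var msg in inflight)
        {
            msg.Release();
        }

        inflight.Clear();
    }
}
```

Clearing the list inside `ReleaseAll` keeps the `finally` block from releasing a message twice after the normal path has already handled it. Batches that were never dequeued are outside the scope of this sketch.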

Comment on lines +340 to +344
// Ref count should never go negative - indicates a double release.
#if DEBUG
Debug.Fail($"Message ref count went negative. Last transfer tag: '{_lastTransferTag}'");
#else
Debug.Fail("Message ref count went negative.");

Copilot AI Apr 30, 2026

Release() only calls Debug.Fail when the ref count goes negative. In non-DEBUG builds that check is effectively a no-op, so double-release/use-after-release can silently corrupt the pool (e.g., decrementing a message that has been re-acquired by another owner). Consider throwing/fail-fast (or at least gating return-to-pool with an additional state/version check) when newRefCount < 0 to prevent silent corruption in production.

Suggested change
- // Ref count should never go negative - indicates a double release.
- #if DEBUG
- Debug.Fail($"Message ref count went negative. Last transfer tag: '{_lastTransferTag}'");
- #else
- Debug.Fail("Message ref count went negative.");
+ // Ref count should never go negative - indicates a double release or use-after-release.
+ #if DEBUG
+ throw new InvalidOperationException($"Message ref count went negative. Last transfer tag: '{_lastTransferTag}'");
+ #else
+ throw new InvalidOperationException("Message ref count went negative.");
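
For readers unfamiliar with why the comment treats `Debug.Fail` as a no-op in production: the `System.Diagnostics.Debug` methods are conditional on the `DEBUG` symbol, so calls to them are omitted from builds that do not define it. A fail-fast release along the lines suggested could look like the sketch below; `StrictRefCount` is a hypothetical type used only for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical ref count that starts with a single owner.
public sealed class StrictRefCount
{
    private int _refCount = 1;

    public int Current => Volatile.Read(ref _refCount);

    // Returns true when the caller released the last reference.
    public bool Release()
    {
        var newRefCount = Interlocked.Decrement(ref _refCount);
        if (newRefCount < 0)
        {
            // Unlike Debug.Fail, which is compiled out unless DEBUG is defined,
            // this throws in every build configuration.
            throw new InvalidOperationException("Message ref count went negative.");
        }

        return newRefCount == 0;
    }
}
```

Throwing surfaces the double release at the faulty call site, but it does not by itself detect a stale owner releasing a message that has since been re-acquired, which is the case the comment's state/version check would address.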

}

/// <summary>
/// Returns a message to the pool after resetting it.

Copilot AI Apr 30, 2026

The XML doc says Return "Returns a message to the pool after resetting it", but the implementation delegates to message.Release(), which only resets/returns when the ref count reaches 0. Consider adjusting the doc comment (or renaming) to reflect that Return is just a release of the caller’s reference and may not return to the pool if there are other owners.

Suggested change
- /// Returns a message to the pool after resetting it.
+ /// Releases the caller's reference to a message.
+ /// The message is reset and returned to the pool only when its reference count reaches zero.

Comment on lines +23 to +31
[Fact, TestCategory("BVT"), TestCategory("Messaging")]
public void Message_RefCount_InitializedToOne()
{
var message = MessagePool.Get();

Assert.NotNull(message);

message.Release();
}

Copilot AI Apr 30, 2026

Message_RefCount_InitializedToOne doesn’t assert anything about the ref count (it only checks non-null). Either rename it to reflect what it validates, or add an assertion that actually verifies the ref-count initialization behavior (e.g., via a public/internal observable effect).

ReubenBond and others added 2 commits April 30, 2026 08:30
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Snapshot sampled message addressing before asynchronous processing so activation repartitioning does not observe reset pooled messages.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ReubenBond force-pushed the split/message-pooling-refcounting branch from 3e263c6 to 11233d3 April 30, 2026 15:34
ReubenBond changed the title from "Add message ownership tracking with ref-counted pooling" to "perf(messaging): add ref-counted message pooling" May 29, 2026
2 participants