
Commit 0efce2b

ProjectsByJackHe, guhetier, mtfriesen, and Copilot authored
[CP] Add support for multiprocess sharing + Fix a QTIP xdp bug (#5798) (#5747) (#5395) (#6050)
## Description

Cherry picking: #5798 and #5747

## Testing

CI

## Documentation

No

---------

Co-authored-by: Guillaume Hetier <guhetier@microsoft.com>
Co-authored-by: Michael Friesen <3517159+mtfriesen@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 3e1a5c0 commit 0efce2b

33 files changed

Lines changed: 1112 additions & 225 deletions

docs/CIBIR.md

Lines changed: 35 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,35 @@
1+
# CIBIR
2+
3+
## What is it
4+
5+
See the [IETF draft](https://datatracker.ietf.org/doc/html/draft-banks-quic-cibir) for CIBIR.
6+
7+
When CIBIR is used, rather than programming [XDP](./XDP.md) to filter and demux packets based on address and port number,
8+
XDP with CIBIR will instead filter and de-mux packets based on address, port number, and QUIC connection ID.
9+
10+
CIBIR allows 2 or more separate server processes to share a single
11+
port on the same machine, as long as their CIBIR IDs are different.
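
The demux rule described above can be sketched as a port check followed by a byte comparison inside the destination connection ID. This is only an illustration under stated assumptions: `CIBIR_RULE` and `CibirRuleMatches` are hypothetical names, not MsQuic or XDP types, and the offset/length layout is assumed from the CIBIR draft rather than taken from the implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical rule type for illustration; not part of MsQuic or XDP.
typedef struct CIBIR_RULE {
    uint16_t Port;    // Shared well-known UDP port (host byte order).
    uint8_t  Offset;  // Where the CIBIR ID starts inside the destination CID.
    uint8_t  Length;  // Number of CIBIR ID bytes.
    uint8_t  Id[8];   // CIBIR ID bytes.
} CIBIR_RULE;

// Returns true when both the destination port and the CIBIR ID embedded in
// the destination connection ID match the rule.
static bool
CibirRuleMatches(
    const CIBIR_RULE* Rule,
    uint16_t DestPort,
    const uint8_t* DestCid,
    size_t DestCidLength
    )
{
    assert(Rule->Length <= sizeof(Rule->Id));
    if (DestPort != Rule->Port) {
        return false;
    }
    if ((size_t)Rule->Offset + Rule->Length > DestCidLength) {
        return false; // The CID is too short to carry the CIBIR ID.
    }
    return memcmp(DestCid + Rule->Offset, Rule->Id, Rule->Length) == 0;
}
```

Two processes sharing the same port would each install a rule with a different `Id`, so a given packet matches at most one of them.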
12+
13+
## CIBIR port sharing logic
14+
- Applications must provide a well-known local port for server sockets when using CIBIR and XDP.
15+
- **IMPORTANT:** MsQuic will **NOT** reserve an OS port for server sockets when both CIBIR and XDP are enabled and available.
16+
- Client sockets can never share ports, so MsQuic will reserve an OS port in that scenario.
17+
- The responsibility of book-keeping shared ports and ensuring robust protection for those shared ports is delegated to the application.
18+
19+
20+
## Port protection recommendations for shared ports
21+
22+
### Option 1: Persistent port reservations (Recommended)
23+
24+
MsQuic strongly recommends applications leverage the Windows [persistent port reservations API](https://learn.microsoft.com/en-us/windows/win32/api/iphlpapi/nf-iphlpapi-createpersistentudpportreservation) to secure shared CIBIR ports prior to serving multi-process CIBIR traffic on a shared port.
25+
- One time setup by a system admin to create the persistent reservation.
26+
- A good option for book-keeping persistent port reservations is via registry keys.
27+
- Persistent port reservations survive reboots, allowing for robust protection in the event of crashes.
28+
- Having a persistent reservation makes sure CIBIR ports are taken out of the ephemeral port pool and forbids sockets from binding to them unless the socket is associated with a persistent reservation token, which can only happen in an elevated process.
29+
- This way, an unsuspecting application process won't get accidentally assigned an ephemeral port that collides with a CIBIR port.
30+
31+
### Option 2: WFP ALE (Application Layer Enforcement) filters
32+
33+
As an alternative, applications can use the [Windows Filtering Platform (WFP)](https://learn.microsoft.com/en-us/windows/win32/fwp/windows-filtering-platform-start-page) to create ALE filters that block unauthorized bind attempts to CIBIR ports.
34+
35+
ALE filters operate at the [bind and connect authorization layers](https://learn.microsoft.com/en-us/windows/win32/fwp/ale-layers) (`FWPM_LAYER_ALE_AUTH_RECV_ACCEPT_V4/V6`, `FWPM_LAYER_ALE_RESOURCE_ASSIGNMENT_V4/V6`). A filter can be configured to block any process from binding to a specific UDP port unless it matches an allowed application path or security descriptor.

docs/QTIP.md

Lines changed: 19 additions & 2 deletions
@@ -24,7 +24,24 @@ This setting can be overridden per client connection, allowing you to create som
2424
> [!IMPORTANT]
2525
> Crucial information necessary for users to succeed.
2626
27-
Using QTIP will initialize a TCP socket and attempt to bind to your listener's local address. This is to reserve a TCP port for your listener to ensure
28-
XDP does not steal any TCP traffic from your other processes later. That also means you need to ensure no other processes are listening on the same TCP port as your listener's local address prior
27+
Listeners with QTIP enabled will initialize a TCP socket and a UDP socket and attempt to bind both to your listener's local address. This is to reserve a TCP/UDP port for your listener to ensure
28+
XDP does not steal any traffic from other sockets later. That also means you need to ensure no other processes are listening on the same port as your listener's local address prior
2929
to starting your listener, otherwise the OS will throw a socket access denied / address in use error,
3030
and your listener will fail to initialize.
31+
32+
**Client connections with different QTIP enablements CAN exist on the same local port.**
33+
34+
MsQuic connections over XDP/UDP create an OS UDP socket only, and rely on the OS to assign the app an ephemeral UDP port to reserve it. XDP will be configured to intercept UDP packets on that port.
35+
36+
MsQuic connections over XDP/QTIP create an OS TCP socket only, and rely on the OS to assign the app an ephemeral TCP port to reserve it. XDP will be configured to intercept TCP traffic on that port.
37+
38+
Since apps can create many client connections with different QTIP enablements, sometimes the OS assigns
39+
the same TCP and UDP port number. If using client connections with and without QTIP enabled simultaneously, the application should not assume that 4-tuples uniquely identify a connection, and should also track the QTIP state.
40+
41+
42+
**Listeners with different QTIP enablements shall NOT be able to exist on the same local port.**
43+
44+
A QTIP-enabled listener will reserve both UDP/TCP ports equal to the local port of the listener
45+
and configure XDP to intercept UDP/TCP packets on that local port. A non-QTIP listener will just reserve a UDP port
46+
and have XDP intercept UDP packets on that port. To avoid conflicting traffic, we do not allow 2 listeners with the same
47+
local port but different QTIP enablements to exist.
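
The two sharing rules above can be sketched as a lookup over existing bindings. This is a simplified model, not the MsQuic implementation: `SKETCH_BINDING`, `SKETCH_RESULT`, and `SketchLookupBinding` are hypothetical names, and ports stand in for full address tuples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified binding; ports stand in for full addresses.
typedef struct SKETCH_BINDING {
    bool     Connected;    // true for client bindings, false for listeners.
    uint16_t LocalPort;
    uint16_t RemotePort;   // Ignored for listeners.
    bool     QtipEnabled;
} SKETCH_BINDING;

typedef enum SKETCH_RESULT {
    SKETCH_NOT_FOUND,  // No existing binding matches; a new one may be created.
    SKETCH_FOUND,      // A compatible binding already exists.
    SKETCH_CONFLICT    // A listener with a different QTIP setting owns the port.
} SKETCH_RESULT;

static SKETCH_RESULT
SketchLookupBinding(
    const SKETCH_BINDING* Bindings,
    size_t Count,
    const SKETCH_BINDING* Request
    )
{
    assert(Bindings != NULL || Count == 0);
    for (size_t i = 0; i < Count; ++i) {
        const SKETCH_BINDING* B = &Bindings[i];
        if (B->Connected) {
            // Client bindings match on local port, remote port, and transport,
            // so UDP and QTIP clients can coexist on one port number.
            if (Request->Connected &&
                B->LocalPort == Request->LocalPort &&
                B->RemotePort == Request->RemotePort &&
                B->QtipEnabled == Request->QtipEnabled) {
                return SKETCH_FOUND;
            }
        } else if (B->LocalPort == Request->LocalPort) {
            // Listener bindings match on local port alone; a QTIP mismatch
            // is rejected instead of being treated as a different binding.
            return (B->QtipEnabled == Request->QtipEnabled) ?
                SKETCH_FOUND : SKETCH_CONFLICT;
        }
    }
    return SKETCH_NOT_FOUND;
}
```

In this model a QTIP client on local port 50000 does not collide with a UDP client on the same port number, while a non-QTIP listener on a port already held by a QTIP listener is reported as a conflict.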

docs/Settings.md

Lines changed: 1 addition & 1 deletion
@@ -170,7 +170,7 @@ These parameters are accessed by calling [GetParam](./api/GetParam.md) or [SetPa
170170
|-------------------------------------------|---------------------------|-----------|-----------------------------------------------------------|
171171
| `QUIC_PARAM_LISTENER_LOCAL_ADDRESS`<br> 0 | QUIC_ADDR | Get-only | Get the full address tuple the server is listening on. |
172172
| `QUIC_PARAM_LISTENER_STATS`<br> 1 | QUIC_LISTENER_STATISTICS | Get-only | Get statistics specific to this Listener instance. |
173-
| `QUIC_PARAM_LISTENER_CIBIR_ID`<br> 2 | uint8_t[] | Both | The CIBIR well-known idenfitier. |
173+
| `QUIC_PARAM_LISTENER_CIBIR_ID`<br> 2 | uint8_t[] | Both | Sets a [CIBIR](./CIBIR.md) (CID-Based Identification and Routing) well-known identifier. |
174174
| `QUIC_PARAM_DOS_MODE_EVENTS`<br> 2 | BOOLEAN | Both | The Listener opted in for DoS Mode event. |
175175

176176
## Connection Parameters

docs/XDP.md

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
1+
# MsQuic over XDP
2+
3+
To avoid confusion, "XDP" refers to [XDP-for-windows](https://github.com/microsoft/xdp-for-windows).
4+
MsQuic does not support Linux XDP as a datapath.
5+
6+
## What is XDP
7+
8+
XDP enables received packets to completely bypass the OS networking stack.
9+
10+
Applications can subscribe to XDP ring buffers to post packets to send,
11+
and process packets that are received through AF_XDP sockets.
12+
13+
Additionally, applications can program XDP to determine the
14+
logic for which packets to filter for, and what to do with them.
15+
16+
For instance: "drop all packets with a UDP header and destination port
17+
42."
18+
19+
## Port reservation logic
20+
21+
The type of logic MsQuic programs into XDP looks like:
22+
"redirect all packets with a destination port X to an AF_XDP socket."
23+
24+
This runs into the issue of **packet stealing.** If there is an unrelated process
25+
that binds an OS socket to the same port MsQuic used to program XDP, XDP will steal
26+
that traffic from underneath it.
27+
28+
This is why MsQuic creates an OS UDP socket on the same port as the AF_XDP
29+
socket to play nice with the rest of the stack.
30+
31+
There are *exceptions* to this port reservation.
32+
33+
- Sometimes, MsQuic may create a TCP OS socket instead, or both TCP and UDP (see [QTIP](./QTIP.md)).
34+
- Sometimes, MsQuic may NOT create any OS sockets at all (see [CIBIR](./CIBIR.md)).
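
The kind of rule described in this section can be sketched as a small classifier. This is an illustration only: `SKETCH_RULE`, `SKETCH_ACTION`, and `SketchClassify` are hypothetical names and do not reflect the XDP-for-windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical actions and rules for illustration; not the XDP-for-windows API.
typedef enum SKETCH_ACTION {
    SKETCH_PASS,      // Leave the packet to the OS networking stack.
    SKETCH_DROP,      // Discard the packet.
    SKETCH_REDIRECT   // Hand the packet to an AF_XDP socket.
} SKETCH_ACTION;

typedef struct SKETCH_RULE {
    uint8_t       IpProtocol;  // 17 for UDP, 6 for TCP.
    uint16_t      DestPort;    // Destination port (host byte order).
    SKETCH_ACTION Action;
} SKETCH_RULE;

// Returns the action of the first rule matching protocol and destination
// port, or SKETCH_PASS when no rule matches.
static SKETCH_ACTION
SketchClassify(
    const SKETCH_RULE* Rules,
    size_t RuleCount,
    uint8_t IpProtocol,
    uint16_t DestPort
    )
{
    assert(Rules != NULL || RuleCount == 0);
    for (size_t i = 0; i < RuleCount; ++i) {
        if (Rules[i].IpProtocol == IpProtocol &&
            Rules[i].DestPort == DestPort) {
            return Rules[i].Action;
        }
    }
    return SKETCH_PASS;
}
```

The classifier has no notion of which process owns the OS socket on a port, which is why a redirect rule on port X takes traffic away from any unrelated socket bound to X unless that port was reserved first.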
35+
36+
37+
## MsQuic over XDP general architecture:
38+
39+
```mermaid
40+
flowchart TB
41+
42+
%% =========================
43+
%% NIC + RSS
44+
%% =========================
45+
NIC["NIC interface"]
46+
47+
RSS1["RSS queue"]
48+
RSS2["RSS queue"]
49+
50+
NIC --> RSS1
51+
NIC --> RSS2
52+
53+
%% =========================
54+
%% XDP FILTER ENGINE
55+
%% =========================
56+
subgraph XDP_ENGINE["XDP FILTER ENGINE"]
57+
58+
XDP_PROG1["XDP::XDP program"]
59+
XDP_PROG2["XDP::XDP program"]
60+
61+
XDP_RULES["XDP::XDP RULES"]
62+
63+
AFXDP1["AF_XDP Socket"]
64+
AFXDP2["AF_XDP Socket"]
65+
66+
RSS1 -->|packet data| XDP_PROG1
67+
RSS2 -->|packet data| XDP_PROG2
68+
69+
XDP_PROG1 --> XDP_RULES
70+
XDP_PROG2 --> XDP_RULES
71+
72+
XDP_RULES --> AFXDP1
73+
XDP_RULES --> AFXDP2
74+
75+
end
76+
77+
%% =========================
78+
%% PACKET DEMUX
79+
%% =========================
80+
DEMUX["Packet DE-MUX logic"]
81+
82+
AFXDP1 --> DEMUX
83+
AFXDP2 --> DEMUX
84+
85+
%% =========================
86+
%% CXPLAT SOCKET POOL
87+
%% =========================
88+
subgraph CXPLAT_POOL["CXPLAT SOCKET POOL HASH TABLE"]
89+
90+
CX1["CXPLAT Socket"]
91+
CX2["CXPLAT Socket"]
92+
CX3["CXPLAT Socket"]
93+
CX4["CXPLAT Socket"]
94+
95+
end
96+
97+
DEMUX --> CX1
98+
DEMUX --> CX2
99+
DEMUX --> CX3
100+
DEMUX --> CX4
101+
102+
%% =========================
103+
%% FIND BINDING LOGIC
104+
%% =========================
105+
BIND["FIND BINDING LOGIC"]
106+
107+
CX1 --> BIND
108+
CX2 --> BIND
109+
CX3 --> BIND
110+
CX4 --> BIND
111+
112+
%% =========================
113+
%% MSQUIC OBJECTS
114+
%% =========================
115+
subgraph MSQUIC_OBJECTS["MSQUIC OBJECTS"]
116+
117+
CONN1["Connection"]
118+
CONN2["Connection"]
119+
CONN3["Connection"]
120+
LIST1["Listener"]
121+
LIST2["Listener"]
122+
123+
end
124+
125+
BIND --> CONN1
126+
BIND --> CONN2
127+
BIND --> CONN3
128+
BIND --> LIST1
129+
BIND --> LIST2
130+
```

src/core/binding.c

Lines changed: 9 additions & 0 deletions
@@ -299,6 +299,15 @@ QuicBindingGetRemoteAddress(
299299
#endif
300300
}
301301

302+
_IRQL_requires_max_(DISPATCH_LEVEL)
303+
BOOLEAN
304+
QuicBindingGetQtipEnabled(
305+
_In_ const QUIC_BINDING* Binding
306+
)
307+
{
308+
return CxPlatSocketGetQtipEnabled(Binding->Socket);
309+
}
310+
302311
//
303312
// Returns TRUE if there are any registered listeners on this binding.
304313
//

src/core/binding.h

Lines changed: 9 additions & 0 deletions
@@ -321,6 +321,15 @@ QuicBindingGetRemoteAddress(
321321
_Out_ QUIC_ADDR* Address
322322
);
323323

324+
//
325+
// Queries the QTIP settings of the binding.
326+
//
327+
_IRQL_requires_max_(DISPATCH_LEVEL)
328+
BOOLEAN
329+
QuicBindingGetQtipEnabled(
330+
_In_ const QUIC_BINDING* Binding
331+
);
332+
324333
//
325334
// Looks up the listener based on the ALPN list. Optionally, outputs the
326335
// first ALPN that matches.

src/core/connection.c

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -6699,11 +6699,14 @@ QuicConnParamSet(
66996699
memcpy(Connection->CibirId + 1, Buffer, BufferLength);
67006700

67016701
QuicTraceLogConnInfo(
6702-
CibirIdSet,
6702+
CibirIdSetInfo,
67036703
Connection,
6704-
"CIBIR ID set (len %hhu, offset %hhu)",
6704+
"CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)",
67056705
Connection->CibirId[0],
6706-
Connection->CibirId[1]);
6706+
Connection->CibirId[1],
6707+
(unsigned long long)QuicCibirIdToUint64(
6708+
Connection->CibirId + 2,
6709+
Connection->CibirId[0]));
67076710

67086711
return QUIC_STATUS_SUCCESS;
67096712
}
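
The updated trace prints the CIBIR ID as a single integer via `QuicCibirIdToUint64`, whose body is not part of this diff. One plausible way to fold the ID bytes into a 64-bit value is sketched below. `SketchCibirIdToUint64` is a hypothetical stand-in, and the big-endian byte order is an assumption, not something the diff confirms.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-in for the helper used in the trace call above.
// Assumes the ID is at most 8 bytes and is folded most significant byte first.
static uint64_t
SketchCibirIdToUint64(
    const uint8_t* Id,
    uint8_t Length
    )
{
    assert(Length <= sizeof(uint64_t));
    uint64_t Value = 0;
    for (uint8_t i = 0; i < Length; ++i) {
        Value = (Value << 8) | Id[i]; // Shift in one byte at a time.
    }
    return Value;
}
```

Under these assumptions, the bytes `AA 01` would be logged as `0xaa01`.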

src/core/library.c

Lines changed: 32 additions & 27 deletions
@@ -2108,13 +2108,13 @@ MsQuicClose(
21082108
_IRQL_requires_max_(DISPATCH_LEVEL)
21092109
QUIC_BINDING*
21102110
QuicLibraryLookupBinding(
2111-
#ifdef QUIC_COMPARTMENT_ID
2112-
_In_ QUIC_COMPARTMENT_ID CompartmentId,
2113-
#endif
2114-
_In_ const QUIC_ADDR* LocalAddress,
2115-
_In_opt_ const QUIC_ADDR* RemoteAddress
2111+
_In_ const CXPLAT_UDP_CONFIG* UdpConfig
21162112
)
21172113
{
2114+
const QUIC_ADDR* LocalAddress = UdpConfig->LocalAddress;
2115+
const QUIC_ADDR* RemoteAddress = UdpConfig->RemoteAddress;
2116+
BOOLEAN EnableQtip = !!(UdpConfig->Flags & CXPLAT_SOCKET_FLAG_QTIP);
2117+
21182118
for (CXPLAT_LIST_ENTRY* Link = MsQuicLib.Bindings.Flink;
21192119
Link != &MsQuicLib.Bindings;
21202120
Link = Link->Flink) {
@@ -2123,7 +2123,7 @@ QuicLibraryLookupBinding(
21232123
CXPLAT_CONTAINING_RECORD(Link, QUIC_BINDING, Link);
21242124

21252125
#ifdef QUIC_COMPARTMENT_ID
2126-
if (CompartmentId != Binding->CompartmentId) {
2126+
if (UdpConfig->CompartmentId != Binding->CompartmentId) {
21272127
continue;
21282128
}
21292129
#endif
@@ -2134,21 +2134,25 @@ QuicLibraryLookupBinding(
21342134
if (Binding->Connected) {
21352135
//
21362136
// For client/connected bindings we need to match on both local and
2137-
// remote addresses/ports.
2137+
// remote addresses/ports, along with transport type (QTIP). We need to match on the 5-tuple,
2138+
// because client connections cannot share the binding if the underlying transport does not match.
21382139
//
21392140
if (RemoteAddress &&
21402141
QuicAddrCompare(LocalAddress, &BindingLocalAddr)) {
21412142
QUIC_ADDR BindingRemoteAddr;
21422143
QuicBindingGetRemoteAddress(Binding, &BindingRemoteAddr);
2143-
if (QuicAddrCompare(RemoteAddress, &BindingRemoteAddr)) {
2144+
if (QuicAddrCompare(RemoteAddress, &BindingRemoteAddr) &&
2145+
QuicBindingGetQtipEnabled(Binding) == EnableQtip) {
21442146
return Binding;
21452147
}
21462148
}
21472149

21482150
} else {
21492151
//
21502152
// For server (unconnected/listening) bindings we always use wildcard
2151-
// addresses, so we simply need to match on local port.
2153+
// addresses, so we simply need to match on the local port. We need not consider the
2154+
// binding QTIP settings because we always disallow listeners with different QTIP settings to
2155+
// share a binding. This is enforced by the caller.
21522156
//
21532157
if (QuicAddrGetPort(&BindingLocalAddr) == QuicAddrGetPort(LocalAddress)) {
21542158
//
@@ -2179,6 +2183,8 @@ QuicLibraryGetBinding(
21792183
UdpConfig->LocalAddress == NULL || QuicAddrGetPort(UdpConfig->LocalAddress) == 0;
21802184
const BOOLEAN ShareBinding = !!(UdpConfig->Flags & CXPLAT_SOCKET_FLAG_SHARE);
21812185
const BOOLEAN ServerOwned = !!(UdpConfig->Flags & CXPLAT_SOCKET_SERVER_OWNED);
2186+
const BOOLEAN EnableQtip = !!(UdpConfig->Flags & CXPLAT_SOCKET_FLAG_QTIP);
2187+
21822188

21832189
#ifdef QUIC_SHARED_EPHEMERAL_WORKAROUND
21842190
//
@@ -2209,17 +2215,12 @@ QuicLibraryGetBinding(
22092215

22102216
Status = QUIC_STATUS_NOT_FOUND;
22112217
CxPlatDispatchLockAcquire(&MsQuicLib.DatapathLock);
2212-
22132218
Binding =
2214-
QuicLibraryLookupBinding(
2215-
#ifdef QUIC_COMPARTMENT_ID
2216-
UdpConfig->CompartmentId,
2217-
#endif
2218-
UdpConfig->LocalAddress,
2219-
UdpConfig->RemoteAddress);
2219+
QuicLibraryLookupBinding(UdpConfig);
22202220
if (Binding != NULL) {
22212221
if (!ShareBinding || Binding->Exclusive ||
2222-
(ServerOwned != Binding->ServerOwned)) {
2222+
(ServerOwned != Binding->ServerOwned) ||
2223+
(!Binding->Connected && QuicBindingGetQtipEnabled(Binding) != EnableQtip)) {
22232224
//
22242225
// The binding does already exist, but cannot be shared with the
22252226
// requested configuration.
@@ -2291,26 +2292,30 @@ QuicLibraryGetBinding(
22912292
// tuple, so we need to do collision detection based on the whole
22922293
// 4-tuple.
22932294
//
2294-
Binding =
2295-
QuicLibraryLookupBinding(
2295+
CXPLAT_UDP_CONFIG Config = {0};
22962296
#ifdef QUIC_COMPARTMENT_ID
2297-
UdpConfig->CompartmentId,
2297+
Config.CompartmentId = UdpConfig->CompartmentId;
22982298
#endif
2299-
&NewLocalAddress,
2300-
UdpConfig->RemoteAddress);
2299+
Config.LocalAddress = &NewLocalAddress;
2300+
Config.RemoteAddress = UdpConfig->RemoteAddress;
2301+
Config.Flags = UdpConfig->Flags;
2302+
Binding =
2303+
QuicLibraryLookupBinding(&Config);
23012304
} else {
23022305
//
23032306
// The datapath does not support multiple connected sockets on the same
23042307
// local tuple, so we just do collision detection based on the local
23052308
// tuple.
23062309
//
2307-
Binding =
2308-
QuicLibraryLookupBinding(
2310+
CXPLAT_UDP_CONFIG Config = {0};
23092311
#ifdef QUIC_COMPARTMENT_ID
2310-
UdpConfig->CompartmentId,
2312+
Config.CompartmentId = UdpConfig->CompartmentId;
23112313
#endif
2312-
&NewLocalAddress,
2313-
NULL);
2314+
Config.LocalAddress = &NewLocalAddress;
2315+
Config.RemoteAddress = NULL;
2316+
Config.Flags = UdpConfig->Flags;
2317+
Binding =
2318+
QuicLibraryLookupBinding(&Config);
23142319
}
23152320

23162321
if (Binding != NULL) {
