zebra: defer EVPN DAD-freeze delete until dplane kernel read completes#22211
zebra: defer EVPN DAD-freeze delete until dplane kernel read completes#22211Manpreet-k0 wants to merge 1 commit into
Conversation
Greptile SummaryThis PR fixes EVPN duplicate-address-detection (DAD) freeze behaviour by converting the synchronous kernel peek used during frozen-entry remote MAC/IP withdrawal into a properly asynchronous dplane read. The dplane context is tagged with a reason code (
Confidence Score: 5/5The change is safe to merge. The async deferral logic is well-guarded by sequence numbers and pending flags that are cleared in all relevant MAC/neighbor state transitions. The core async mechanism — tagging reads with a reason/sequence and making the keep-vs-delete decision after the kernel reply arrives — is implemented correctly. Stale completions are rejected via sequence-number matching; all MAC AUTO-transition paths explicitly clear the pending flag to prevent late completions from acting on changed entries. The two findings are limited to a defensive early-return path in zebra_evpn_dad_neigh_read_complete that is unreachable given current call sites, and a mildly misleading debug log when no SVI interface is available. zebra/zebra_evpn_neigh.c — the mac_valid early-return and the unconditional debug log both live here and are worth a second look before merging. Important Files Changed
Sequence DiagramsequenceDiagram
participant BGP
participant zebra_evpn as zebra_evpn.c
participant zebra_evpn_neigh as zebra_evpn_neigh.c
participant dplane as zebra_dplane.c
participant kernel as Kernel (netlink)
participant rib as zebra_rib.c
BGP->>zebra_evpn: zebra_evpn_rem_macip_del()
alt MAC-only, frozen REMOTE+DUPLICATE
zebra_evpn->>zebra_evpn: "SET ZEBRA_MAC_DAD_READ_PENDING, dad_read_seq=next_seq"
zebra_evpn->>dplane: "dplane_fdb_read_specific_mac() [reason=EVPN_DAD_MAC, seq, flags]"
dplane->>kernel: FDB read request
zebra_evpn-->>BGP: return (deferred)
kernel-->>dplane: FDB reply
dplane->>rib: DPLANE_OP_FDB_READ result ctx
rib->>zebra_evpn: zebra_evpn_dplane_read_complete()
zebra_evpn->>zebra_evpn: zebra_evpn_dad_mac_read_process() [check PENDING+seq]
alt MAC became LOCAL
zebra_evpn->>zebra_evpn: suppress delete (optionally sync_mac_del)
else MAC still REMOTE
zebra_evpn->>zebra_evpn: zebra_evpn_rem_mac_del()
end
else MACIP, frozen REMOTE+DUPLICATE neighbor
zebra_evpn->>zebra_evpn_neigh: zebra_evpn_neigh_remote_uninstall()
zebra_evpn_neigh->>zebra_evpn_neigh: "SET ZEBRA_NEIGH_DAD_READ_PENDING, dad_read_seq=next_seq"
zebra_evpn_neigh->>dplane: "dplane_neigh_read_specific_ip() [reason=EVPN_DAD_NEIGH, seq, vni, mac]"
dplane->>kernel: Neigh read request
zebra_evpn_neigh-->>BGP: return (deferred)
kernel-->>dplane: Neigh reply
dplane->>rib: DPLANE_OP_NEIGH_READ result ctx
rib->>zebra_evpn: zebra_evpn_dplane_read_complete()
zebra_evpn->>zebra_evpn_neigh: zebra_evpn_dad_neigh_read_complete()
zebra_evpn_neigh->>zebra_evpn_neigh: zebra_evpn_dad_neigh_read_process() [check PENDING+seq+MAC]
alt Neighbor became LOCAL
zebra_evpn_neigh->>zebra_evpn_neigh: suppress delete, sync_neigh_del
else Neighbor still REMOTE
zebra_evpn_neigh->>zebra_evpn_neigh: neigh_uninstall + neigh_del + deref_ip2mac
end
end
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
zebra/zebra_evpn_neigh.c:2333-2334
**Early return leaves `ZEBRA_NEIGH_DAD_READ_PENDING` set on the neighbor**
Returning here before calling `zebra_evpn_dad_neigh_read_process` skips the only code that clears `ZEBRA_NEIGH_DAD_READ_PENDING` on this neighbor. Although `mac_valid` is always `true` for a `ZEBRA_DPLANE_READ_EVPN_DAD_NEIGH` context today (the MAC address is always passed at enqueue time), if that invariant is ever broken the neighbor is left permanently stuck in the pending state — subsequent remote updates will clear it via `zebra_evpn_neigh_remote_macip_add`, but a removal-only sequence would leave the entry wedged. Consider clearing the pending state before returning, or moving the `mac_valid` check so a missing MAC is treated as "no local entry found" (delete the neighbor) rather than an abort.
### Issue 2 of 2
zebra/zebra_evpn_neigh.c:2231-2233
**Debug message always says "queue dplane read" even when no read will be queued**
The log is emitted before the `if (vlan_if)` guard, so when `zevpn_map_to_svi` returns `NULL` the message claims a dplane read is being queued while in reality the function falls through and deletes the neighbor synchronously. The log prints `"Unknown"` for the interface name, which may draw attention during debugging, but the action description itself is misleading. Moving the log (or splitting into two messages) to reflect the actual outcome would make traces easier to follow.
Reviews (3): Last reviewed commit: "zebra: defer EVPN DAD-freeze delete unti..." | Re-trigger Greptile |
5a86111 to
19315a5
Compare
|
@greptileai review |
|
ci:rerun |
|
@mjstapp and @chiragshah6 Can you plz review it |
|
Its good to have topotest for basic MAC mobility capturing the remote neigh state post freeze |
|
Tick the box to add this pull request to the merge queue (same as
|
19315a5 to
ad8ca89
Compare
|
@chiragshah6 Added the topotest |
When duplicate-address-detection (DAD) freeze is enabled and a frozen
(DUPLICATE + REMOTE) MAC or neighbor receives a remote MACIP withdrawal,
zebra is supposed to first read the kernel for a local copy of the
entry: if one exists the frozen remote entry is kept and re-originated,
otherwise it is removed.
The kernel peek used to be a synchronous netlink read whose reply was
applied inline, so the keep-vs-delete decision that immediately followed
could observe the updated state. Once EVPN FDB and neighbor reads were
routed through the dataplane the read became asynchronous: the reply is
now applied later via a dplane result context, so the inline decision
ran before the kernel answer was processed and the peek no longer had
any effect.
Restructure the DAD-freeze delete path to be asynchronous:
- Tag the dplane read with a reason (ZEBRA_DPLANE_READ_EVPN_DAD_MAC /
_NEIGH) and a sequence number, carried on the read context along with
the VNI and (for neighbors) the expected MAC.
- Mark the MAC/neighbor ZEBRA_*_DAD_READ_PENDING, record the sequence,
and return instead of deleting inline.
- Make the keep-vs-delete decision in the read completion
(zebra_evpn_dplane_read_complete), after the kernel reply has been
applied: if the entry became local suppress the delete, otherwise
remove the remote entry.
- Guard the completion with the pending flag and sequence so a stale or
superseded read result is ignored, and clear the pending state when a
fresh remote update arrives or the entry is demoted to AUTO, so a late
completion never acts on a changed entry.
This restores the documented DAD-freeze behaviour with the
dataplane-based reads.
Testing:
- With fix
tests/topotests/bgp_evpn_dad_native_topo1/test_bgp_evpn_dad_native.py::test_dad_remote_mac_del_keeps_local_kernel_mac
Result: PASS
10x consistency run: 10/10 passed
Each run reported: 1 passed, 2 warnings
- Without fix
tests/topotests/bgp_evpn_dad_native_topo1/test_bgp_evpn_dad_native.py::test_dad_remote_mac_del_keeps_local_kernel_mac
Result: FAIL
Summary: 1 failed, 2 warnings in 64.45s
Failure:
AssertionError: tor-11 expected EVPN neighbor 60.1.1.100, got {}
at _wait_evpn_neigh_state(tor11, TEST_IP4, "local", LOCAL_MAC, False)
Signed-off-by: Manpreet Kaur <manpreetk@nvidia.com>
ad8ca89 to
1a82c76
Compare
When duplicate-address-detection (DAD) freeze is enabled and a frozen (DUPLICATE + REMOTE) MAC or neighbor receives a remote MACIP withdrawal, zebra is supposed to first read the kernel for a local copy of the entry: if one exists the frozen remote entry is kept and re-originated, otherwise it is removed.
The kernel peek used to be a synchronous netlink read whose reply was applied inline, so the keep-vs-delete decision that immediately followed could observe the updated state. Once EVPN FDB and neighbor reads were routed through the dataplane the read became asynchronous: the reply is now applied later via a dplane result context, so the inline decision ran before the kernel answer was processed and the peek no longer had any effect.
Restructure the DAD-freeze delete path to be asynchronous:
This restores the documented DAD-freeze behaviour with the dataplane-based reads.