Skip to content

zebra: defer EVPN DAD-freeze delete until dplane kernel read completes#22211

Open
Manpreet-k0 wants to merge 1 commit into
FRRouting:masterfrom
Manpreet-k0:__manpreetk/evpn-dad-async
Open

zebra: defer EVPN DAD-freeze delete until dplane kernel read completes#22211
Manpreet-k0 wants to merge 1 commit into
FRRouting:masterfrom
Manpreet-k0:__manpreetk/evpn-dad-async

Conversation

@Manpreet-k0

Copy link
Copy Markdown
Contributor

When duplicate-address-detection (DAD) freeze is enabled and a frozen (DUPLICATE + REMOTE) MAC or neighbor receives a remote MACIP withdrawal, zebra is supposed to first read the kernel for a local copy of the entry: if one exists the frozen remote entry is kept and re-originated, otherwise it is removed.

The kernel peek used to be a synchronous netlink read whose reply was applied inline, so the keep-vs-delete decision that immediately followed could observe the updated state. Once EVPN FDB and neighbor reads were routed through the dataplane the read became asynchronous: the reply is now applied later via a dplane result context, so the inline decision ran before the kernel answer was processed and the peek no longer had any effect.

Restructure the DAD-freeze delete path to be asynchronous:

  • Tag the dplane read with a reason (ZEBRA_DPLANE_READ_EVPN_DAD_MAC / _NEIGH) and a sequence number, carried on the read context along with the VNI and (for neighbors) the expected MAC.
  • Mark the MAC/neighbor ZEBRA_*_DAD_READ_PENDING, record the sequence, and return instead of deleting inline.
  • Make the keep-vs-delete decision in the read completion (zebra_evpn_dplane_read_complete), after the kernel reply has been applied: if the entry became local suppress the delete, otherwise remove the remote entry.
  • Guard the completion with the pending flag and sequence so a stale or superseded read result is ignored, and clear the pending state when a fresh remote update arrives or the entry is demoted to AUTO, so a late completion never acts on a changed entry.

This restores the documented DAD-freeze behaviour with the dataplane-based reads.

@frrbot frrbot Bot added the zebra label Jun 4, 2026
@Manpreet-k0 Manpreet-k0 marked this pull request as draft June 4, 2026 10:03
@greptile-apps

greptile-apps Bot commented Jun 4, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes EVPN duplicate-address-detection (DAD) freeze behaviour by converting the synchronous kernel peek used during frozen-entry remote MAC/IP withdrawal into a properly asynchronous dplane read. The dplane context is tagged with a reason code (ZEBRA_DPLANE_READ_EVPN_DAD_MAC / _NEIGH), a sequence number, and ancillary VNI/MAC information; the MAC or neighbor is marked DAD_READ_PENDING; and the keep-vs-delete decision is deferred to zebra_evpn_dplane_read_complete, which fires only after the kernel reply has been applied.

  • Adds zd_read_reason, zd_read_seq, zd_read_flags to struct zebra_dplane_ctx and corresponding accessors; extends dplane_fdb_read_specific_mac and dplane_neigh_read_specific_ip signatures for DAD use.
  • Introduces ZEBRA_MAC_DAD_READ_PENDING / ZEBRA_NEIGH_DAD_READ_PENDING flags with sequence guards and clears them in every MAC AUTO-transition and fresh-remote-update path to prevent stale completions from acting on changed entries.
  • Also fixes a pre-existing typo (ZEBRA_NEIGH_REMOTEZEBRA_MAC_REMOTE) in the MAC-only delete fallthrough in zebra_evpn_rem_macip_del.

Confidence Score: 5/5

The change is safe to merge. The async deferral logic is well-guarded by sequence numbers and pending flags that are cleared in all relevant MAC/neighbor state transitions.

The core async mechanism — tagging reads with a reason/sequence and making the keep-vs-delete decision after the kernel reply arrives — is implemented correctly. Stale completions are rejected via sequence-number matching; all MAC AUTO-transition paths explicitly clear the pending flag to prevent late completions from acting on changed entries. The two findings are limited to a defensive early-return path in zebra_evpn_dad_neigh_read_complete that is unreachable given current call sites, and a mildly misleading debug log when no SVI interface is available.

zebra/zebra_evpn_neigh.c — the mac_valid early-return and the unconditional debug log both live here and are worth a second look before merging.

Important Files Changed

Filename Overview
zebra/zebra_dplane.c Extends dplane context with zd_read_reason, zd_read_seq, zd_read_flags and adds VNI/MAC fields to dplane_neigh_read_info; updates enqueue functions to propagate these values; existing callers pass ZEBRA_DPLANE_READ_NONE / zero defaults.
zebra/zebra_dplane.h Introduces zebra_dplane_read_reason enum and ZEBRA_DPLANE_READ_FLAG_EVPN_SYNC_DEL flag; extends dplane_fdb_read_specific_mac and dplane_neigh_read_specific_ip signatures with new reason/seq/flags/vni/mac parameters.
zebra/zebra_evpn.c Core change: zebra_evpn_rem_macip_del now tags and defers the MAC delete via dplane read; adds zebra_evpn_dad_mac_read_process, zebra_evpn_dad_mac_read_complete, and dispatch entry-point zebra_evpn_dplane_read_complete. Also fixes pre-existing bug: ZEBRA_NEIGH_REMOTE → ZEBRA_MAC_REMOTE in the MAC fallthrough path.
zebra/zebra_evpn_mac.c Adds ZEBRA_MAC_DAD_READ_PENDING flag to the debug dump; calls zebra_evpn_mac_clear_dad_read_pending in all MAC AUTO-transition paths to prevent stale completions from firing on a changed entry.
zebra/zebra_evpn_neigh.c Implements async neigh DAD-freeze delete; two issues: the mac_valid guard returns without clearing ZEBRA_NEIGH_DAD_READ_PENDING, and the 'queue dplane read' log is printed even when vlan_if is NULL.
zebra/zebra_evpn_mac.h Defines ZEBRA_MAC_DAD_READ_PENDING (0x2000), adds dad_read_seq field to struct zebra_mac, and introduces the zebra_evpn_mac_clear_dad_read_pending inline helper.
zebra/zebra_evpn_neigh.h Adds ZEBRA_NEIGH_DAD_READ_PENDING (0x400), dad_read_seq field, zebra_evpn_neigh_clear_dad_read_pending helper, and exposes zebra_evpn_dad_neigh_read_complete; extends the flags display legend.
zebra/zebra_rib.c Splits DPLANE_OP_FDB_READ/DPLANE_OP_NEIGH_READ from the TC_QDISC_READ fall-through and dispatches them to zebra_evpn_dplane_read_complete; reads with reason NONE are handled as no-ops.
zebra/zebra_vxlan.c Adds zebra_evpn_mac_clear_dad_read_pending call in zebra_vxlan_check_del_local_mac when the MAC is demoted from local, preventing a stale DAD read from acting on the updated entry.

Sequence Diagram

sequenceDiagram
    participant BGP
    participant zebra_evpn as zebra_evpn.c
    participant zebra_evpn_neigh as zebra_evpn_neigh.c
    participant dplane as zebra_dplane.c
    participant kernel as Kernel (netlink)
    participant rib as zebra_rib.c

    BGP->>zebra_evpn: zebra_evpn_rem_macip_del()
    alt MAC-only, frozen REMOTE+DUPLICATE
        zebra_evpn->>zebra_evpn: "SET ZEBRA_MAC_DAD_READ_PENDING, dad_read_seq=next_seq"
        zebra_evpn->>dplane: "dplane_fdb_read_specific_mac() [reason=EVPN_DAD_MAC, seq, flags]"
        dplane->>kernel: FDB read request
        zebra_evpn-->>BGP: return (deferred)
        kernel-->>dplane: FDB reply
        dplane->>rib: DPLANE_OP_FDB_READ result ctx
        rib->>zebra_evpn: zebra_evpn_dplane_read_complete()
        zebra_evpn->>zebra_evpn: zebra_evpn_dad_mac_read_process() [check PENDING+seq]
        alt MAC became LOCAL
            zebra_evpn->>zebra_evpn: suppress delete (optionally sync_mac_del)
        else MAC still REMOTE
            zebra_evpn->>zebra_evpn: zebra_evpn_rem_mac_del()
        end
    else MACIP, frozen REMOTE+DUPLICATE neighbor
        zebra_evpn->>zebra_evpn_neigh: zebra_evpn_neigh_remote_uninstall()
        zebra_evpn_neigh->>zebra_evpn_neigh: "SET ZEBRA_NEIGH_DAD_READ_PENDING, dad_read_seq=next_seq"
        zebra_evpn_neigh->>dplane: "dplane_neigh_read_specific_ip() [reason=EVPN_DAD_NEIGH, seq, vni, mac]"
        dplane->>kernel: Neigh read request
        zebra_evpn_neigh-->>BGP: return (deferred)
        kernel-->>dplane: Neigh reply
        dplane->>rib: DPLANE_OP_NEIGH_READ result ctx
        rib->>zebra_evpn: zebra_evpn_dplane_read_complete()
        zebra_evpn->>zebra_evpn_neigh: zebra_evpn_dad_neigh_read_complete()
        zebra_evpn_neigh->>zebra_evpn_neigh: zebra_evpn_dad_neigh_read_process() [check PENDING+seq+MAC]
        alt Neighbor became LOCAL
            zebra_evpn_neigh->>zebra_evpn_neigh: suppress delete, sync_neigh_del
        else Neighbor still REMOTE
            zebra_evpn_neigh->>zebra_evpn_neigh: neigh_uninstall + neigh_del + deref_ip2mac
        end
    end
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
zebra/zebra_evpn_neigh.c:2333-2334
**Early return leaves `ZEBRA_NEIGH_DAD_READ_PENDING` set on the neighbor**

Returning here before calling `zebra_evpn_dad_neigh_read_process` skips the only code that clears `ZEBRA_NEIGH_DAD_READ_PENDING` on this neighbor. Although `mac_valid` is always `true` for a `ZEBRA_DPLANE_READ_EVPN_DAD_NEIGH` context today (the MAC address is always passed at enqueue time), if that invariant is ever broken the neighbor is left permanently stuck in the pending state — subsequent remote updates will clear it via `zebra_evpn_neigh_remote_macip_add`, but a removal-only sequence would leave the entry wedged. Consider clearing the pending state before returning, or moving the `mac_valid` check so a missing MAC is treated as "no local entry found" (delete the neighbor) rather than an abort.

### Issue 2 of 2
zebra/zebra_evpn_neigh.c:2231-2233
**Debug message always says "queue dplane read" even when no read will be queued**

The log is emitted before the `if (vlan_if)` guard, so when `zevpn_map_to_svi` returns `NULL` the message claims a dplane read is being queued while in reality the function falls through and deletes the neighbor synchronously. The log prints `"Unknown"` for the interface name, which may draw attention during debugging, but the action description itself is misleading. Moving the log (or splitting into two messages) to reflect the actual outcome would make traces easier to follow.

Reviews (3): Last reviewed commit: "zebra: defer EVPN DAD-freeze delete unti..." | Re-trigger Greptile

Comment thread zebra/zebra_evpn.c
Comment thread zebra/zebra_evpn_neigh.c
Comment thread zebra/zebra_evpn_neigh.c
@Manpreet-k0 Manpreet-k0 force-pushed the __manpreetk/evpn-dad-async branch from 5a86111 to 19315a5 Compare June 4, 2026 11:02
@Manpreet-k0

Copy link
Copy Markdown
Contributor Author

@greptileai review

@Manpreet-k0 Manpreet-k0 marked this pull request as ready for review June 4, 2026 11:19
@Manpreet-k0

Copy link
Copy Markdown
Contributor Author

ci:rerun

@Manpreet-k0

Manpreet-k0 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

@mjstapp and @chiragshah6 Can you plz review it

@chiragshah6

Copy link
Copy Markdown
Member

Its good to have topotest for basic MAC mobility capturing the remote neigh state post freeze

@mergify

mergify Bot commented Jun 19, 2026

Copy link
Copy Markdown

Tick the box to add this pull request to the merge queue (same as @mergifyio queue).

  • Queue this pull request

@Manpreet-k0 Manpreet-k0 force-pushed the __manpreetk/evpn-dad-async branch from 19315a5 to ad8ca89 Compare June 24, 2026 06:54
@github-actions github-actions Bot added size/XXL rebase PR needs rebase and removed size/L labels Jun 24, 2026
@Manpreet-k0

Copy link
Copy Markdown
Contributor Author

@chiragshah6 Added the topotest

When duplicate-address-detection (DAD) freeze is enabled and a frozen
(DUPLICATE + REMOTE) MAC or neighbor receives a remote MACIP withdrawal,
zebra is supposed to first read the kernel for a local copy of the
entry: if one exists the frozen remote entry is kept and re-originated,
otherwise it is removed.

The kernel peek used to be a synchronous netlink read whose reply was
applied inline, so the keep-vs-delete decision that immediately followed
could observe the updated state. Once EVPN FDB and neighbor reads were
routed through the dataplane the read became asynchronous: the reply is
now applied later via a dplane result context, so the inline decision
ran before the kernel answer was processed and the peek no longer had
any effect.

Restructure the DAD-freeze delete path to be asynchronous:

 - Tag the dplane read with a reason (ZEBRA_DPLANE_READ_EVPN_DAD_MAC /
   _NEIGH) and a sequence number, carried on the read context along with
   the VNI and (for neighbors) the expected MAC.
 - Mark the MAC/neighbor ZEBRA_*_DAD_READ_PENDING, record the sequence,
   and return instead of deleting inline.
 - Make the keep-vs-delete decision in the read completion
   (zebra_evpn_dplane_read_complete), after the kernel reply has been
   applied: if the entry became local suppress the delete, otherwise
   remove the remote entry.
 - Guard the completion with the pending flag and sequence so a stale or
   superseded read result is ignored, and clear the pending state when a
   fresh remote update arrives or the entry is demoted to AUTO, so a late
   completion never acts on a changed entry.

This restores the documented DAD-freeze behaviour with the
dataplane-based reads.

Testing:
  - With fix
    tests/topotests/bgp_evpn_dad_native_topo1/test_bgp_evpn_dad_native.py::test_dad_remote_mac_del_keeps_local_kernel_mac

    Result: PASS
    10x consistency run: 10/10 passed
    Each run reported: 1 passed, 2 warnings

  - Without fix
    tests/topotests/bgp_evpn_dad_native_topo1/test_bgp_evpn_dad_native.py::test_dad_remote_mac_del_keeps_local_kernel_mac

    Result: FAIL
    Summary: 1 failed, 2 warnings in 64.45s
    Failure:
      AssertionError: tor-11 expected EVPN neighbor 60.1.1.100, got {}
      at _wait_evpn_neigh_state(tor11, TEST_IP4, "local", LOCAL_MAC, False)

Signed-off-by: Manpreet Kaur <manpreetk@nvidia.com>
@Manpreet-k0 Manpreet-k0 force-pushed the __manpreetk/evpn-dad-async branch from ad8ca89 to 1a82c76 Compare June 24, 2026 12:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants