zebra: EVPN VXLAN Multihome extern mode - Enable --kernel-mac-ext-learn#21863
zebra: EVPN VXLAN Multihome extern mode - Enable --kernel-mac-ext-learn#21863pbrisset wants to merge 10 commits into
Conversation
Greptile SummaryThis PR implements EVPN VXLAN Multihoming External Mode support in zebra, enabling hardware-based MAC learning and aging for platforms like SONiC/Broadcom ASICs via a new
Confidence Score: 4/5The change is feature-gated behind --kernel-mac-ext-learn and existing deployments are unaffected; previous review iterations resolved the most critical concerns around nhg_id and nda_ext_flags. The core ext-learn MAC lifecycle (sync, expiry via flush, BGP notification through zebra_evpn_del_local_mac) is coherent and flows correctly. The interaction between the RTPROT_ZEBRA and NTF_EXT_LEARNED drop filters in rt_netlink.c and graceful-restart FDB dump replays is worth a targeted maintainer review before merge. zebra/rt_netlink.c — the RTPROT_ZEBRA drop filter and NTF_EXT_LEARNED guard interact with graceful-restart FDB replay and deserve a careful second look. Important Files Changed
Sequence DiagramsequenceDiagram
participant HW as Hardware ASIC
participant Kernel as Linux Kernel (Bridge/VxLAN FDB)
participant NL as Netlink (rt_netlink.c)
participant ZN as zebra_neigh.c
participant ZM as zebra_evpn_mac.c
participant BGP as BGP (EVPN)
HW->>Kernel: MAC learned (NTF_EXT_LEARNED set)
Kernel->>NL: RTM_NEWNEIGH (AF_BRIDGE, NDA_PROTOCOL != RTPROT_ZEBRA)
NL->>NL: Filter: drop if proto==RTPROT_ZEBRA or !NTF_EXT_LEARNED
NL->>ZN: dplane ctx (nhg_id=0 in ext-learn mode)
ZN->>ZM: zebra_vxlan_local_mac_add_update()
ZM->>BGP: zebra_evpn_mac_send_add_to_client()
Note over ZM,Kernel: Hold timer expiry (ext-learn mode)
ZM->>Kernel: dplane_local_mac_del() via zebra_evpn_flush_local_mac
ZM->>BGP: zebra_evpn_mac_send_del_to_client()
Note over NL,Kernel: Zebra programs remote/static MAC
BGP->>ZM: MAC update from BGP EVPN
ZM->>NL: netlink_macfdb_update_ctx()
NL->>Kernel: RTM_NEWNEIGH (NTF_EXT_LEARNED + NDA_PROTOCOL=RTPROT_ZEBRA)
Prompt To Fix All With AIFix the following 4 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 4
doc/user/zebra.rst:127
Minor grammatical typo: "its" (possessive) should be "it's" (it is) in the option description.
```suggestion
Signal to zebra that it's operating in MAC external learn mode.
```
### Issue 2 of 4
include/linux/rtnetlink.h:1
The SPDX identifier was changed from a C-style block comment to a C++ single-line comment. This is a vendored Linux kernel UAPI header and upstream uses the `/* */` form. Diverging here makes future upstream syncs noisier, and the `//` style is not valid C89. The cosmetic change also obscures the only intentional line added (`RTPROT_HW`).
```suggestion
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
```
### Issue 3 of 4
tests/topotests/bgp_evpn_mh_l2l3vni_ext_learn/test_evpn_mh_l2l3vni_ext_learn.py:34-40
`functools.partial` is imported twice — once near the top of the stdlib imports and again after `time`. The second import is redundant and should be removed.
```suggestion
from functools import partial
import time
import pytest
import json
import platform
```
### Issue 4 of 4
zebra/rt_netlink.c:4152-4157
The debug message says "Should drop entry" but the code unconditionally returns 0 immediately after — the entry IS dropped. Using "should" implies the action might not happen, which will confuse anyone reading logs. A definitive phrasing is clearer.
```suggestion
if (zebra_mac_ext_learn_mode() && !(ndm->ndm_flags & NTF_EXT_LEARNED)) {
if (IS_ZEBRA_DEBUG_KERNEL)
zlog_debug(" Dropping entry due to missing extern learn, FLAGS = 0x%x",
ndm->ndm_flags);
return 0;
}
```
Reviews (4): Last reviewed commit: "tests: topotests: EVPN VXLAN MH extern l..." | Re-trigger Greptile |
15a63f4 to
e5576c6
Compare
e5576c6 to
1ac6693
Compare
|
@greptile-apps review |
1ac6693 to
e91d840
Compare
|
@greptile-apps review |
8ca0e24 to
9c06ae9
Compare
|
@greptile-apps review |
0d67ee7 to
c108974
Compare
c108974 to
48fe525
Compare
f9a9a34 to
06cbd2d
Compare
Introduction of "--kernel-mac-ext-learn" mode of operation in zebra.
This change enables the option '--kernel-mac-ext-learn' in zebra to
support hardware-based MAC learning and aging for platforms with ASIC
capabilities.
Scope:
This is a GLOBAL zebra option that affects all MAC address handling
system-wide, not limited to EVPN-MH. While designed primarily for
EVPN VXLAN Multihoming scenarios, the implementation modifies kernel
MAC programming behavior for all MACs regardless of their origin or
protocol association.
When enabled, zebra operates in a mode suitable for platforms where:
- Hardware ASIC performs MAC learning and aging
- Kernel aging should be disabled for all MACs
- All MACs are marked as externally learned (NTF_EXT_LEARNED)
Rationale:
Modern data center switches with hardware acceleration can perform
MAC learning and aging directly in the ASIC. This option enables
FRR to cooperate with such hardware by:
- Disabling kernel-based MAC aging
- Marking all MACs as extern_learn in kernel
- Allowing hardware to manage the data plane MAC lifecycle
- Zebra managing the control plane (BGP EVPN advertisements, etc.)
Primary Use Case:
EVPN VXLAN Multihoming deployments benefit most from this feature,
as hardware-based MAC learning provides better performance and
scalability than software-based approaches. However, the infrastructure
is general-purpose and affects all MAC handling when enabled.
MACs in this mode for both data and control plane will be marked and
programmed as 'extern_only' in kernel, so that kernel aging is disabled
for these MACs. Zebra along with Hardware(HW) will control these MACs
for control plane and data plane respectively.
Observability:
The mode is exposed through multiple interfaces for operator visibility:
- CLI: 'show zebra' displays "Kernel MAC External Learn: On/Off"
- YANG: /frr-zebra:zebra/state/kernel-mac-ext-learn operational leaf
- ZAPI: Advertised in ZEBRA_CAPABILITIES message to upper-level protocols
Per File change summary:
zebra/main.c:
- Add option '--kernel-mac-ext-learn' in zebra startup
- Alignment changes for other options to match new option width
and keep description alignment consistent
zebra/zebra_router.c:
- Update zebra_router_init definition to pass kernel_mac_ext_learn mode
- Store kernel_mac_ext_learn in zrouter.zav structure
zebra/zebra_router.h:
- New bool field 'kernel_mac_ext_learn' in zebra_architectural_values
to store extern only mode
zebra/zebra_vxlan.h:
- Accessor function zebra_mac_ext_learn_mode() for Extern Only mode
zebra/zebra_vty.c:
- Add "Kernel MAC External Learn" status to 'show zebra' command output
yang/frr-zebra.yang:
- Add kernel-mac-ext-learn operational state leaf
zebra/zebra_nb.c, zebra/zebra_nb.h, zebra/zebra_nb_state.c:
- YANG northbound callbacks for kernel-mac-ext-learn state
zebra/zapi_msg.c:
- Add kernel_mac_ext_learn to ZEBRA_CAPABILITIES message
lib/zclient.h, lib/zclient.c:
- Add kernel_mac_ext_learn to zclient_capabilities structure
- Decode capability in client HELLO response
Signed-off-by: Mrinmoy Ghosh <mrinmoy_g@hotmail.com>
Signed-off-by: Mrinmoy Ghosh <mrghosh@cisco.com>
Signed-off-by: Patrice Brissette <pbrisset@cisco.com>
Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
1d23aab to
64a774e
Compare
|
I just ran the checkpatch tool again, current rtnetlink content is clean and SPDX is intentionally unchanged. |
|
Tick the box to add this pull request to the merge queue (same as
|
Protocol field is added in bridge FDB, to distinguish between MAC addresses learned via the control plane and those learned via the data plane with hardware aging. Protocol 'hw' (i.e RTPROT_HW aka hardware) for MAC learnt by hardware will be used for data plane(hardware) learnt MAC while existing protocol 'zebra' to be used for control plane learnt ones. Kernel Patch in review: https://lore.kernel.org/netdev/20250818175258.275997-1-mrghosh@cisco.com/ iproute2 patch in review: https://lore.kernel.org/netdev/20250818193756.277327-1-mrghosh@cisco.com/ Signed-off-by: Patrice Brissette <pbrisset@cisco.com> Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
Add support for displaying the NDA_PROTOCOL attribute in zebra's netlink debug output. This allows operators to see the protocol field value when debugging MAC/neighbor entries, which is useful for distinguishing between entries learned via different mechanisms (e.g., zebra control plane vs hardware data plane). The protocol value is displayed as both the numeric value and its symbolic name (e.g., "193 (hw)") using nl_rtproto_to_str() for better readability. Changes: - Add NDA_PROTOCOL case to neigh_rta2str() for attribute name mapping - Add NDA_PROTOCOL parsing in nlneigh_dump() to display protocol value with symbolic name Signed-off-by: Patrice Brissette <pbrisset@cisco.com> Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
ARP/ND suppression is not supported in kernel MAC external learn mode. Signed-off-by: Patrice Brissette <pbrisset@cisco.com> Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
Netlink message handling for MAC operations in kernel MAC external learn mode. In this mode, MACs learned by both control plane and data plane are programmed in the kernel with the 'extern_learn' attribute set, disabling kernel-based MAC learning and aging. This patch handles netlink messages for MAC add/delete/ update operations with proper extern_learn flag handling. Signed-off-by: Patrice Brissette <pbrisset@cisco.com> Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
Dataplane Sync MAC Update: - Install sync local MAC only if it's inactive On hold timer expiry: - Flush the MAC, and no reprogram, as no dynamic learn during static Sync Del: - Explicit MAC flush if MAC is inactive, if it has no PEER flags Sync MAC update: - In Peer Proxy, no additional BGP update computation Netlink MAC Update processing: - Ignore VXLAN Info message in extern mode - Ignore MAC netlink update if interface is down. Presently done only for extern mode Signed-off-by: Patrice Brissette <pbrisset@cisco.com> Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
Topotests added under bgp_evpn_mh_l2l3vni_ext_learn to: - Validate correct ES discovery and advertisement for both local and remote PEs. - Check VTEP peer lists for accuracy, including handling of downed VTEPs and ES state transitions. - Ensure L2VNI and L3VNI are correctly instantiated and associated with the appropriate VRFs and VXLAN interfaces. - Test orphaned hosts, dual-attached hosts, and single-attached hosts in various failure and recovery scenarios. - MAC 'protocol' state transitions i.e data plane learnt to control plane learnt and vice versa, delete, relearn in both peers sequences Utility and Parser Functions: Utility functions in lib/bgp_evpn.py(new) are added. These changes add the test coverage and reliability for EVPN VXLAN multihoming in external mode, making it easier to detect regressions and validate new features. Signed-off-by: Patrice Brissette <pbrisset@cisco.com> Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
During shutdown, bridge interfaces may be deleted before bond member interfaces, leaving stale br_if pointers in bond interface structures. When bond cleanup tries to access the bridge interface name for logging or VLAN member dereferencing, it causes a heap-use-after-free crash. Add defensive check in zebra_evpn_vl_mbr_deref() to verify the bridge interface is still valid before dereferencing it. Clear stale pointers when detected. This crash was exposed by the extern_learn test which exercises different interface cleanup ordering, but the underlying hook ordering issue exists in mainline FRR EVPN MH code. Fixes AddressSanitizer error at zebra/zebra_evpn_mh.c:559 Signed-off-by: Patrice Brissette <patricebrissette@gmail.com>
Add comprehensive test suite for EVPN MH external learn mode using FPM instead of kernel patches (RTPROT_HW). This allows testing the zebra logic without requiring kernel changes. Key Features: - FPM-based MAC injection simulating hardware learning (RTPROT_HW=193) - inject_mac.py: Python netlink tool for MAC injection with NTF_EXT_LEARNED - 20 test cases covering ES, DF election, MAC lifecycle, protocol transitions - Topology: 2 spines (RR), 2 ToRs (VTEP+FPM), 3 hosts (single/dual/orphan) - L2VNI-1000 + L3VNI-500 with VXLAN multihoming Test Coverage: - Infrastructure validation (zebra ext_learn, FPM connection, VXLAN setup) - EVPN ES and DF election - MAC learning, hold timers, and lifecycle (add/del/move) - Protocol field validation (proto hw vs proto zebra) - Active-active MAC behavior with peer-active/peer-proxy flags - Flag transitions (X → XI → PI → P) for local-inactive handling - Quick re-add race conditions - Orphan host behavior (single-attached without ES) Files: - test_evpn_mh_fpm_ext_learn.py: Main test suite (1754 lines, 20 tests) - inject_mac.py: Netlink injection utility (322 lines) - Router configs: spine1, spine2, torm11, torm12, hostd11, hostd12 - README.md: Comprehensive documentation Tested-by: pytest Signed-off-by: Philippe Brisset <pbrisset@example.com> Signed-off-by: Patrice Brissette <pbrisset@cisco.com>
Add pytest.mark.skip to the entire test suite since it requires kernel patches (RTPROT_HW=193) that are not yet merged into the Linux kernel or iproute2. The skip marker directs users to bgp_evpn_mh_fpm_ext_learn as an FPM-based alternative that tests the same zebra logic without requiring kernel changes. Once kernel patches are merged, this skip marker can be removed. Signed-off-by: Patrice Brissette <pbrisset@cisco.com>
64a774e to
8e564be
Compare
|
The CI tool check error is a false positive. This PR is ready for final review |
EVPN VXLAN Multihoming External Mode Support
Summary
This PR series implements complete EVPN VXLAN Multihoming External Mode support in zebra for platforms with Hardware-Based MAC Learning and Aging capabilities.
Motivation
Current EVPN VXLAN implementations rely on software-based MAC learning and aging in the kernel. However, modern data center switches with hardware acceleration capabilities can perform MAC learning and aging directly in the ASIC, which offers:
This PR enables FRR to work in cooperation with hardware-based MAC learning for EVPN VXLAN Multihoming scenarios.
Commit Overview
This PR contains 8 commits that progressively implement the feature:
Each commit compiles independently and adds a logical piece of the functionality.
Implementation Details
When
--kernel-mac-ext-learnmode is enabled:extern_onlyin the kernelKey Changes Across the Series
Commit 1: Infrastructure
Files:
zebra/main.c,zebra/zebra_router.[ch],zebra/zebra_vxlan.hLines: +35, -15 (~50 lines changed)
--kernel-mac-ext-learncommand-line optionzrouter.zav.kernel_mac_ext_learnzebra_mac_ext_learn_mode()Commit 2: Protocol Support
Files:
zebra/zebra_dplane.h,zebra/rt_netlink.cLines: +1, -1 (~2 lines changed)
RTPROT_HWprotocol for hardware-learned MACsCommit 3: Documentation
Files:
doc/user/zebra.rstLines: +10 (~10 lines added)
--kernel-mac-ext-learnoptionCommit 4: Debug Enhancements
Files:
zebra/zebra_evpn_mac.cLines: +7, -1 (~8 lines changed)
Commit 5: ARP/ND Suppression
Files:
zebra/zebra_vxlan.cLines: +11, -3 (~14 lines changed)
Commit 6: Netlink Message Handling
Files:
zebra/rt_netlink.c,zebra/zebra_dplane.cLines: +90, -39 (~129 lines changed)
extern_learnflag handling in netlink messagesCommit 7: MAC Lifecycle Management
Files:
zebra/zebra_evpn_mac.c,zebra/zebra_evpn.cLines: +77, -28 (~105 lines changed)
Commit 8: Kernel-Based Testing (0030)
Files:
tests/topotests/bgp_evpn_mh_l2l3vni_ext_learn/,tests/topotests/lib/bgp_evpn.pyLines: +1666, -48 (~1714 lines changed)
Status: ⏸️ Skipped until kernel patches merge (see Commit 10)
bgp_evpn_mh_l2l3vni_ext_learn(~933 lines)Commit 9: FPM-Based Test Suite (NEW)
Files:
tests/topotests/bgp_evpn_mh_fpm_ext_learn/Lines: +2513 (~2513 lines new)
Status: ✅ Fully functional, no kernel dependencies
Commit 10: Skip Kernel Test (NEW)
Files:
tests/topotests/bgp_evpn_mh_l2l3vni_ext_learn/test_evpn_mh_l2l3vni_ext_learn.pyLines: +10, -1 (~11 lines changed)
Total Changes: ~4,550 lines across zebra core, documentation, and two comprehensive test suites
Usage
Start zebra with the new option:
Or in the configuration/systemd service file:
Testing
Build Verification
Functional Testing (Commit 8)
Test Environment
tests/topotests/bgp_evpn_mh_l2l3vni_ext_learn/tests/topotests/lib/bgp_evpn.pyCompatibility
This change is backward compatible:
Benefits
Performance
Operational
Architecture
Co-authors
Target: FRRouting master branch
Type: Feature enhancement (patch series: 8 commits)
Area: zebra, EVPN, VXLAN
Lines Changed: ~2000+ (infrastructure, implementation, docs, tests)
Test Coverage: Comprehensive topotests included