zebra: EVPN VXLAN Multihome extern mode - Enable --kernel-mac-ext-learn #21863

Open
pbrisset wants to merge 10 commits into
FRRouting:master from
pbrisset:evpn-vxlan-mh-kernel-mac-ext-learn

@pbrisset pbrisset commented May 5, 2026

EVPN VXLAN Multihoming External Mode Support

Summary

This PR series implements complete EVPN VXLAN Multihoming External Mode support in zebra for platforms with Hardware-Based MAC Learning and Aging capabilities.

Motivation

Current EVPN VXLAN implementations rely on software-based MAC learning and aging in the kernel. However, modern data center switches with hardware acceleration capabilities can perform MAC learning and aging directly in the ASIC, which offers:

  • Better performance: Hardware-based MAC learning/aging is faster and more efficient
  • Reduced CPU overhead: Offloads MAC management from the CPU
  • Scalability: Supports larger MAC tables with hardware resources

This PR enables FRR to work in cooperation with hardware-based MAC learning for EVPN VXLAN Multihoming scenarios.

Commit Overview

This PR contains 10 commits that progressively implement the feature:

  1. Enable --kernel-mac-ext-learn - Infrastructure: command-line option and storage
  2. New protocol RTPROT_HW - Add hardware-learned MAC protocol support
  3. Documentation updates - Document --kernel-mac-ext-learn option
  4. Debug print improvements - Enhanced debug output for external learn mode
  5. ARP/ND suppression handling - Disable suppression in external learn mode
  6. Netlink message handling - Handle MAC operations in external learn mode
  7. MAC sync/update/delete/expiry - Complete MAC lifecycle management
  8. Comprehensive topotests - Full test coverage for the feature
  9. FPM-based test suite - Alternative test without kernel dependencies (~2513 lines) - NEW
  10. Skip marker for kernel-dependent test - Disable original test until kernel patches merge

Each commit compiles independently and adds a logical piece of the functionality.

Implementation Details

When --kernel-mac-ext-learn mode is enabled:

  1. MAC entries are marked and programmed as extern_learn in the kernel
  2. Kernel aging is disabled for these MAC entries
  3. Control plane: Zebra manages MAC lifecycle and BGP EVPN advertisements
  4. Data plane: Hardware manages MAC learning, aging, and forwarding

Key Changes Across the Series

Commit 1: Infrastructure

Files: zebra/main.c, zebra/zebra_router.[ch], zebra/zebra_vxlan.h
Lines: +35, -15 (~50 lines changed)

  • Added --kernel-mac-ext-learn command-line option
  • Storage in zrouter.zav.kernel_mac_ext_learn
  • Accessor function zebra_mac_ext_learn_mode()

Commit 2: Protocol Support

Files: zebra/zebra_dplane.h, zebra/rt_netlink.c
Lines: +1, -1 (~2 lines changed)

  • New RTPROT_HW protocol for hardware-learned MACs
  • Netlink integration for hardware protocol

Commit 3: Documentation

Files: doc/user/zebra.rst
Lines: +10 (~10 lines added)

  • Comprehensive documentation of --kernel-mac-ext-learn option
  • Usage examples and behavior description

Commit 4: Debug Enhancements

Files: zebra/zebra_evpn_mac.c
Lines: +7, -1 (~8 lines changed)

  • Enhanced debug output for external learn mode
  • Better visibility into MAC state transitions

Commit 5: ARP/ND Suppression

Files: zebra/zebra_vxlan.c
Lines: +11, -3 (~14 lines changed)

  • Disable ARP/ND suppression when in external learn mode
  • Proper handling of neighbor advertisements

Commit 6: Netlink Message Handling

Files: zebra/rt_netlink.c, zebra/zebra_dplane.c
Lines: +90, -39 (~129 lines changed)

  • Handle MAC add/delete/update operations in external learn mode
  • Proper extern_learn flag handling in netlink messages

Commit 7: MAC Lifecycle Management

Files: zebra/zebra_evpn_mac.c, zebra/zebra_evpn.c
Lines: +77, -28 (~105 lines changed)

  • Complete MAC sync, update, delete, and expiry logic
  • State machine for data plane ↔ control plane transitions
  • Hardware aging coordination with zebra control

Commit 8: Kernel-Based Testing (0030)

Files: tests/topotests/bgp_evpn_mh_l2l3vni_ext_learn/, tests/topotests/lib/bgp_evpn.py
Lines: +1666, -48 (~1714 lines changed)
Status: ⏸️ Skipped until kernel patches merge (see Commit 10)

  • New test suite: bgp_evpn_mh_l2l3vni_ext_learn (~933 lines)
  • Validates ES discovery, VTEP peer lists, L2/L3 VNI operations
  • Tests MAC protocol state transitions
  • Covers orphaned, dual-attached, and single-attached host scenarios
  • New utility library for EVPN testing (366 lines)
  • Requires kernel RTPROT_HW support (not yet merged)

Commit 9: FPM-Based Test Suite (NEW)

Files: tests/topotests/bgp_evpn_mh_fpm_ext_learn/
Lines: +2513 (~2513 lines new)
Status: ✅ Fully functional, no kernel dependencies

  • Alternative test suite using FPM (Forwarding Plane Manager)
  • inject_mac.py: Python netlink tool simulating RTPROT_HW (322 lines)
  • Main test file: 20 comprehensive tests (1,754 lines)
  • Topology: 2 spines, 2 ToRs, 3 hosts (single/dual/orphan)
  • Tests complete MAC lifecycle, flag transitions, protocol validation
  • Production-ready for CI/CD pipelines
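
A rough idea of what a tool like inject_mac.py has to assemble is sketched here. It is not the script from this PR: the function names (`rtattr`, `build_fdb_add`) and the choice of NUD_REACHABLE and NTF_MASTER are assumptions for illustration. The numeric constants come from the Linux UAPI headers, apart from RTPROT_HW = 193, which is the value proposed in this series. The sketch only builds the message bytes; sending them needs an AF_NETLINK socket and suitable privileges.

```python
import struct

# Linux UAPI constants (linux/netlink.h, linux/rtnetlink.h, linux/neighbour.h).
RTM_NEWNEIGH = 28
NLM_F_REQUEST, NLM_F_REPLACE, NLM_F_CREATE = 0x01, 0x100, 0x400
AF_BRIDGE = 7
NUD_REACHABLE = 0x02
NTF_MASTER, NTF_EXT_LEARNED = 0x04, 0x10
NDA_LLADDR, NDA_PROTOCOL = 2, 12
RTPROT_HW = 193  # proposed by this PR, not in a released kernel


def rtattr(attr_type, payload):
    """Encode one netlink attribute, padded to a 4-byte boundary."""
    length = 4 + len(payload)          # rta_len excludes the padding
    pad = (4 - length % 4) % 4
    return struct.pack("=HH", length, attr_type) + payload + b"\0" * pad


def build_fdb_add(ifindex, mac, protocol=RTPROT_HW, seq=1):
    """Build (but do not send) an RTM_NEWNEIGH bridge FDB request."""
    lladdr = bytes.fromhex(mac.replace(":", ""))
    # struct ndmsg: family, pad1, pad2, ifindex, state, flags, type
    ndmsg = struct.pack("=BBHiHBB", AF_BRIDGE, 0, 0, ifindex,
                        NUD_REACHABLE, NTF_MASTER | NTF_EXT_LEARNED, 0)
    attrs = rtattr(NDA_LLADDR, lladdr)
    attrs += rtattr(NDA_PROTOCOL, struct.pack("=B", protocol))
    flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_REPLACE
    total = 16 + len(ndmsg) + len(attrs)   # 16-byte nlmsghdr
    header = struct.pack("=IHHII", total, RTM_NEWNEIGH, flags, seq, 0)
    return header + ndmsg + attrs
```

For example, `build_fdb_add(5, "00:11:22:33:44:55")` returns a 48-byte message whose ndm_flags byte carries NTF_EXT_LEARNED and whose NDA_PROTOCOL attribute carries 193.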

Commit 10: Skip Kernel Test (NEW)

Files: tests/topotests/bgp_evpn_mh_l2l3vni_ext_learn/test_evpn_mh_l2l3vni_ext_learn.py
Lines: +10, -1 (~11 lines changed)

  • Add pytest.mark.skip to kernel-dependent test suite
  • Directs users to FPM-based alternative
  • Will be removed once kernel patches are merged
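
A module-wide skip of this kind is usually expressed with pytest's `pytestmark` variable. The snippet below is a minimal sketch of that pattern, not the lines from the commit; the reason text and the placeholder test name are assumptions.

```python
import pytest

# Module-level marker: pytest applies it to every test in this file.
pytestmark = pytest.mark.skip(
    reason="requires kernel RTPROT_HW support (not yet merged); "
    "see bgp_evpn_mh_fpm_ext_learn for the FPM-based alternative"
)


def test_placeholder():
    # Hypothetical test; pytest does not run it while the marker is present.
    pass
```

Removing the `pytestmark` assignment re-enables the whole module in one step, which fits the plan to drop the marker once the kernel patches merge.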

Total Changes: ~4,550 lines across zebra core, documentation, and two comprehensive test suites

Usage

Start zebra with the new option:

zebra --kernel-mac-ext-learn

Or in the configuration/systemd service file:

DAEMON_ARGS="--kernel-mac-ext-learn"

Testing

Build Verification

  • ✅ All commits compile independently
  • ✅ No build warnings or errors
  • ✅ Option parsing works correctly

Functional Testing (Commit 8)

  • ✅ ES discovery and advertisement validation
  • ✅ VTEP peer list accuracy including failure scenarios
  • ✅ L2VNI and L3VNI instantiation with proper VRF association
  • ✅ MAC protocol state transitions (data plane ↔ control plane)
  • ✅ Orphaned host scenarios
  • ✅ Dual-attached host failover and recovery
  • ✅ Single-attached host operations
  • ✅ MAC delete, relearn, and expiry sequences

Test Environment

  • Topology: Multi-homed EVPN VXLAN with spine-leaf architecture
  • Test suite: tests/topotests/bgp_evpn_mh_l2l3vni_ext_learn/
  • Utility library: tests/topotests/lib/bgp_evpn.py

Compatibility

This change is backward compatible:

  • Default behavior remains unchanged (option disabled by default)
  • Existing deployments are not affected
  • Only platforms that explicitly enable this mode will use the new behavior

Benefits

Performance

  • Hardware-accelerated MAC learning and aging
  • Reduced CPU overhead on control plane
  • Better scalability with larger MAC tables

Operational

  • Seamless integration with hardware platforms (e.g., SONiC, Broadcom ASICs)
  • Maintains full EVPN control plane functionality
  • No impact on existing software-based deployments

Architecture

  • Clean separation: Hardware handles data plane, Zebra handles control plane
  • Kernel aging disabled for extern_learn MACs
  • Proper MAC lifecycle management across hardware and software boundaries

Co-authors


Target: FRRouting master branch
Type: Feature enhancement (patch series: 10 commits)
Area: zebra, EVPN, VXLAN
Lines Changed: ~4,550 (infrastructure, implementation, docs, tests)
Test Coverage: Comprehensive topotests included

@greptile-apps

greptile-apps Bot commented May 5, 2026

Greptile Summary

This PR implements EVPN VXLAN Multihoming External Mode support in zebra, enabling hardware-based MAC learning and aging for platforms like SONiC/Broadcom ASICs via a new --kernel-mac-ext-learn CLI flag. The feature is backward-compatible and disabled by default.

  • Adds --kernel-mac-ext-learn option wired through zebra_router_init, stored in zrouter.zav.kernel_mac_ext_learn, and exposed via zebra_mac_ext_learn_mode() inline accessor.
  • Modifies netlink FDB processing to filter RTPROT_ZEBRA and non-EXT_LEARNED entries in ext-learn mode; adjusts MAC lifecycle (sync, delete, expiry) and suppresses ARP/ND neighbor installs.
  • Adds a comprehensive topotest suite (bgp_evpn_mh_l2l3vni_ext_learn) and a shared EVPN test library (lib/bgp_evpn.py).
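
The FDB filtering summarized above can be pictured as a small predicate. This is an illustrative Python sketch, not the C code in zebra/rt_netlink.c: the function name, the order of the checks, and the pass-through in default mode are assumptions. NTF_EXT_LEARNED and RTPROT_ZEBRA are the Linux UAPI values; RTPROT_HW = 193 is the value proposed by this PR.

```python
NTF_EXT_LEARNED = 0x10   # linux/neighbour.h
RTPROT_ZEBRA = 11        # linux/rtnetlink.h
RTPROT_HW = 193          # proposed by this PR, not in a released kernel


def should_process_fdb_entry(ext_learn_mode, ndm_flags, protocol):
    """Return True if the FDB notification would be processed further."""
    if not ext_learn_mode:
        return True    # default mode: the new filters do not apply
    if protocol == RTPROT_ZEBRA:
        return False   # entry programmed by zebra itself, echoed back
    if not (ndm_flags & NTF_EXT_LEARNED):
        return False   # entry is not marked as externally learned
    return True
```

Under these assumptions, a hardware-learned entry (`NTF_EXT_LEARNED` set, protocol `RTPROT_HW`) passes, while an entry carrying `RTPROT_ZEBRA` or lacking `NTF_EXT_LEARNED` is dropped when the mode is on.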

Confidence Score: 4/5

The change is feature-gated behind --kernel-mac-ext-learn and existing deployments are unaffected; previous review iterations resolved the most critical concerns around nhg_id and nda_ext_flags.

The core ext-learn MAC lifecycle (sync, expiry via flush, BGP notification through zebra_evpn_del_local_mac) is coherent and flows correctly. The interaction between the RTPROT_ZEBRA and NTF_EXT_LEARNED drop filters in rt_netlink.c and graceful-restart FDB dump replays is worth a targeted maintainer review before merge.

zebra/rt_netlink.c — the RTPROT_ZEBRA drop filter and NTF_EXT_LEARNED guard interact with graceful-restart FDB replay and deserve a careful second look.

Important Files Changed

Filename Overview
zebra/rt_netlink.c Core netlink FDB processing: adds RTPROT_ZEBRA and NTF_EXT_LEARNED filters in ext-learn mode, fixes duplicate nhg_id set, and rewrites MAC netlink encoding for ext-learn mode with NTF_EXT_LEARNED/NDA_EXT_FLAGS.
zebra/zebra_evpn_mac.c MAC lifecycle management: ext-learn mode gates sync dp-install on LOCAL_INACTIVE, flushes MACs on hold-expiry, and adjusts BGP-ready transitions for peer-proxy entries.
zebra/zebra_neigh.c FDB update path: in ext-learn mode, VxLAN-sourced entries are returned early and inoperative-interface entries are guarded before calling zebra_vxlan_local_mac_add_update.
zebra/main.c Adds --kernel-mac-ext-learn as no_argument option (corrected from earlier optional_argument), passes flag through to zebra_router_init.
zebra/zebra_router.h Adds kernel_mac_ext_learn field to zebra_architectural_values and the zebra_mac_ext_learn_mode() static inline accessor.
zebra/zebra_evpn_mh.c Promotes zebra_evpn_flush_local_mac from static to extern and adds a forward declaration for zebra_evpn_es_bypass_update_macs to resolve ordering dependencies.
include/linux/rtnetlink.h Adds RTPROT_HW=193 for hardware-learned MACs; also changes the SPDX license comment from C-style /* */ to C++ style //.
tests/topotests/bgp_evpn_mh_l2l3vni_ext_learn/test_evpn_mh_l2l3vni_ext_learn.py New 933-line topotest for ext-learn mode; contains a duplicate import of functools.partial.

Sequence Diagram

sequenceDiagram
    participant HW as Hardware ASIC
    participant Kernel as Linux Kernel (Bridge/VxLAN FDB)
    participant NL as Netlink (rt_netlink.c)
    participant ZN as zebra_neigh.c
    participant ZM as zebra_evpn_mac.c
    participant BGP as BGP (EVPN)

    HW->>Kernel: MAC learned (NTF_EXT_LEARNED set)
    Kernel->>NL: RTM_NEWNEIGH (AF_BRIDGE, NDA_PROTOCOL != RTPROT_ZEBRA)
    NL->>NL: Filter: drop if proto==RTPROT_ZEBRA or !NTF_EXT_LEARNED
    NL->>ZN: dplane ctx (nhg_id=0 in ext-learn mode)
    ZN->>ZM: zebra_vxlan_local_mac_add_update()
    ZM->>BGP: zebra_evpn_mac_send_add_to_client()

    Note over ZM,Kernel: Hold timer expiry (ext-learn mode)
    ZM->>Kernel: dplane_local_mac_del() via zebra_evpn_flush_local_mac
    ZM->>BGP: zebra_evpn_mac_send_del_to_client()

    Note over NL,Kernel: Zebra programs remote/static MAC
    BGP->>ZM: MAC update from BGP EVPN
    ZM->>NL: netlink_macfdb_update_ctx()
    NL->>Kernel: RTM_NEWNEIGH (NTF_EXT_LEARNED + NDA_PROTOCOL=RTPROT_ZEBRA)
Prompt To Fix All With AI
Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 4
doc/user/zebra.rst:127
Minor grammatical typo: "its" (possessive) should be "it's" (it is) in the option description.

```suggestion
   Signal to zebra that it's operating in MAC external learn mode.
```

### Issue 2 of 4
include/linux/rtnetlink.h:1
The SPDX identifier was changed from a C-style block comment to a C++ single-line comment. This is a vendored Linux kernel UAPI header and upstream uses the `/* */` form. Diverging here makes future upstream syncs noisier, and the `//` style is not valid C89. The cosmetic change also obscures the only intentional line added (`RTPROT_HW`).

```suggestion
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
```

### Issue 3 of 4
tests/topotests/bgp_evpn_mh_l2l3vni_ext_learn/test_evpn_mh_l2l3vni_ext_learn.py:34-40
`functools.partial` is imported twice — once near the top of the stdlib imports and again after `time`. The second import is redundant and should be removed.

```suggestion
from functools import partial
import time

import pytest
import json
import platform
```

### Issue 4 of 4
zebra/rt_netlink.c:4152-4157
The debug message says "Should drop entry" but the code unconditionally returns 0 immediately after — the entry IS dropped. Using "should" implies the action might not happen, which will confuse anyone reading logs. A definitive phrasing is clearer.

```suggestion
	if (zebra_mac_ext_learn_mode() && !(ndm->ndm_flags & NTF_EXT_LEARNED)) {
		if (IS_ZEBRA_DEBUG_KERNEL)
			zlog_debug("        Dropping entry due to missing extern learn, FLAGS = 0x%x",
				   ndm->ndm_flags);
		return 0;
	}
```

Reviews (4): Last reviewed commit: "tests: topotests: EVPN VXLAN MH extern l..."

Comment thread zebra/main.c Outdated
Comment thread zebra/zebra_vxlan.h Outdated
Comment thread zebra/zebra_vxlan.h Outdated
@frrbot frrbot Bot added the documentation label May 5, 2026
@github-actions github-actions Bot added size/L and removed size/M labels May 5, 2026
@frrbot frrbot Bot added the tests Topotests, make check, etc label May 6, 2026
@github-actions github-actions Bot added size/XXL and removed size/L labels May 6, 2026
@pbrisset pbrisset force-pushed the evpn-vxlan-mh-kernel-mac-ext-learn branch 3 times, most recently from 15a63f4 to e5576c6 Compare May 6, 2026 13:58
@pbrisset pbrisset force-pushed the evpn-vxlan-mh-kernel-mac-ext-learn branch from e5576c6 to 1ac6693 Compare May 6, 2026 14:18
@pbrisset

pbrisset commented May 6, 2026

@greptile-apps review

Comment thread zebra/rt_netlink.c
@pbrisset pbrisset force-pushed the evpn-vxlan-mh-kernel-mac-ext-learn branch from 1ac6693 to e91d840 Compare May 6, 2026 14:30
@pbrisset

pbrisset commented May 6, 2026

@greptile-apps review

Comment thread zebra/rt_netlink.c Outdated
@pbrisset pbrisset force-pushed the evpn-vxlan-mh-kernel-mac-ext-learn branch 3 times, most recently from 8ca0e24 to 9c06ae9 Compare May 6, 2026 15:57
@pbrisset

pbrisset commented May 6, 2026

@greptile-apps review

Comment thread zebra/zebra_evpn_mh.c Outdated
@pbrisset pbrisset force-pushed the evpn-vxlan-mh-kernel-mac-ext-learn branch from c108974 to 48fe525 Compare May 6, 2026 19:42
Comment thread zebra/debug_nl.c
Comment thread zebra/debug_nl.c Outdated
Comment thread zebra/zebra_evpn_neigh.c
Comment thread zebra/zebra_evpn_neigh.c
Comment thread zebra/rt_netlink.c
Comment thread zebra/rt_netlink.c Outdated
Comment thread zebra/rt_netlink.c
@pbrisset pbrisset requested a review from donaldsharp June 10, 2026 18:52
@pbrisset pbrisset force-pushed the evpn-vxlan-mh-kernel-mac-ext-learn branch from f9a9a34 to 06cbd2d Compare June 10, 2026 19:15
@github-actions github-actions Bot added the rebase PR needs rebase label Jun 10, 2026
@pbrisset pbrisset requested a review from miteshkanjariya June 10, 2026 19:16
Introduction of "--kernel-mac-ext-learn" mode of operation in zebra.

This change enables the option '--kernel-mac-ext-learn' in zebra to
support hardware-based MAC learning and aging for platforms with ASIC
capabilities.

Scope:
This is a GLOBAL zebra option that affects all MAC address handling
system-wide, not limited to EVPN-MH. While designed primarily for
EVPN VXLAN Multihoming scenarios, the implementation modifies kernel
MAC programming behavior for all MACs regardless of their origin or
protocol association.

When enabled, zebra operates in a mode suitable for platforms where:
- Hardware ASIC performs MAC learning and aging
- Kernel aging should be disabled for all MACs
- All MACs are marked as externally learned (NTF_EXT_LEARNED)

Rationale:
Modern data center switches with hardware acceleration can perform
MAC learning and aging directly in the ASIC. This option enables
FRR to cooperate with such hardware by:
- Disabling kernel-based MAC aging
- Marking all MACs as extern_learn in kernel
- Allowing hardware to manage the data plane MAC lifecycle
- Zebra managing the control plane (BGP EVPN advertisements, etc.)

Primary Use Case:
EVPN VXLAN Multihoming deployments benefit most from this feature,
as hardware-based MAC learning provides better performance and
scalability than software-based approaches. However, the infrastructure
is general-purpose and affects all MAC handling when enabled.

In this mode, both data plane and control plane MACs are marked and
programmed as 'extern_learn' in the kernel, which disables kernel aging
for them. Zebra manages these MACs in the control plane, and the
hardware (HW) manages them in the data plane.

Observability:
The mode is exposed through multiple interfaces for operator visibility:
- CLI: 'show zebra' displays "Kernel MAC External Learn: On/Off"
- YANG: /frr-zebra:zebra/state/kernel-mac-ext-learn operational leaf
- ZAPI: Advertised in ZEBRA_CAPABILITIES message to upper-level protocols
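
A test or script could check the CLI status line with a helper along these lines. This is a hedged sketch: only the label "Kernel MAC External Learn" and the On/Off values come from the text above; the function name and the surrounding `show zebra` layout are assumptions.

```python
def kernel_mac_ext_learn_enabled(show_zebra_output):
    """Return True/False from the status line, or None if the line is absent."""
    for line in show_zebra_output.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() == "Kernel MAC External Learn":
            return value.strip().lower() == "on"
    return None
```

Returning None when the line is missing lets a caller tell an older zebra build apart from one where the mode is simply off.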

Per File change summary:

zebra/main.c:
  - Add option '--kernel-mac-ext-learn' in zebra startup
  - Alignment changes for other options to match new option width
    and keep description alignment consistent

zebra/zebra_router.c:
  - Update zebra_router_init definition to pass kernel_mac_ext_learn mode
  - Store kernel_mac_ext_learn in zrouter.zav structure

zebra/zebra_router.h:
  - New bool field 'kernel_mac_ext_learn' in zebra_architectural_values
    to store extern only mode

zebra/zebra_vxlan.h:
  - Accessor function zebra_mac_ext_learn_mode() for Extern Only mode

zebra/zebra_vty.c:
  - Add "Kernel MAC External Learn" status to 'show zebra' command output

yang/frr-zebra.yang:
  - Add kernel-mac-ext-learn operational state leaf

zebra/zebra_nb.c, zebra/zebra_nb.h, zebra/zebra_nb_state.c:
  - YANG northbound callbacks for kernel-mac-ext-learn state

zebra/zapi_msg.c:
  - Add kernel_mac_ext_learn to ZEBRA_CAPABILITIES message

lib/zclient.h, lib/zclient.c:
  - Add kernel_mac_ext_learn to zclient_capabilities structure
  - Decode capability in client HELLO response

Signed-off-by: Mrinmoy Ghosh <mrinmoy_g@hotmail.com>
Signed-off-by: Mrinmoy Ghosh <mrghosh@cisco.com>
Signed-off-by: Patrice Brissette <pbrisset@cisco.com>
Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
@pbrisset pbrisset force-pushed the evpn-vxlan-mh-kernel-mac-ext-learn branch 2 times, most recently from 1d23aab to 64a774e Compare June 17, 2026 19:55
@pbrisset pbrisset requested review from mjstapp and removed request for miteshkanjariya June 18, 2026 15:01
@pbrisset

I just ran the checkpatch tool again, current rtnetlink content is clean and SPDX is intentionally unchanged.

@pbrisset pbrisset requested a review from miteshkanjariya June 18, 2026 15:06
@mergify

mergify Bot commented Jun 19, 2026

Tick the box to add this pull request to the merge queue (same as @mergifyio queue).

  • Queue this pull request

pbrisset and others added 9 commits June 23, 2026 17:15
A protocol field is added to the bridge FDB to distinguish between
MAC addresses learned via the control plane and those learned
via the data plane with hardware aging.

Protocol 'hw' (i.e. RTPROT_HW, for hardware) is used for MACs learned
by the data plane (hardware), while the existing protocol 'zebra' is
used for MACs learned by the control plane.

Kernel Patch in review:
https://lore.kernel.org/netdev/20250818175258.275997-1-mrghosh@cisco.com/
iproute2 patch in review:
https://lore.kernel.org/netdev/20250818193756.277327-1-mrghosh@cisco.com/

Signed-off-by: Patrice Brissette <pbrisset@cisco.com>
Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
Add support for displaying the NDA_PROTOCOL attribute in zebra's
netlink debug output. This allows operators to see the protocol
field value when debugging MAC/neighbor entries, which is useful
for distinguishing between entries learned via different mechanisms
(e.g., zebra control plane vs hardware data plane).

The protocol value is displayed as both the numeric value and its
symbolic name (e.g., "193 (hw)") using nl_rtproto_to_str() for
better readability.

Changes:
- Add NDA_PROTOCOL case to neigh_rta2str() for attribute name mapping
- Add NDA_PROTOCOL parsing in nlneigh_dump() to display protocol value
  with symbolic name

Signed-off-by: Patrice Brissette <pbrisset@cisco.com>
Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
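
The "numeric value plus symbolic name" rendering described in the commit above can be illustrated as follows. This Python sketch is not the C helper nl_rtproto_to_str(): the table is a small subset of the Linux RTPROT_* values plus the proposed RTPROT_HW, and the "unknown" fallback string and the function name are assumptions.

```python
# Subset of RTPROT_* values from linux/rtnetlink.h; 193 is proposed by this PR.
RTPROT_NAMES = {2: "kernel", 3: "boot", 4: "static", 11: "zebra", 193: "hw"}


def format_nda_protocol(value):
    """Render a protocol value as '<number> (<name>)'."""
    name = RTPROT_NAMES.get(value, "unknown")  # fallback text is an assumption
    return f"{value} ({name})"
```

With this table, `format_nda_protocol(193)` yields `193 (hw)`, the form quoted in the commit message, and `format_nda_protocol(11)` yields `11 (zebra)`.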
ARP/ND suppression is not supported in kernel MAC external learn mode.

Signed-off-by: Patrice Brissette <pbrisset@cisco.com>
Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
Netlink message handling for MAC operations in kernel MAC external learn mode.

In this mode, MACs learned by both control plane and data plane are programmed
in the kernel with the 'extern_learn' attribute set, disabling kernel-based
MAC learning and aging. This patch handles netlink messages for MAC add/delete/
update operations with proper extern_learn flag handling.

Signed-off-by: Patrice Brissette <pbrisset@cisco.com>
Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
Dataplane Sync MAC Update:
- Install sync local MAC only if it's inactive

On hold timer expiry:
- Flush the MAC without reprogramming it, as there is no dynamic
  learning while the MAC is static

Sync Del:
- Explicitly flush the MAC if it is inactive and has no PEER flags

Sync MAC update:
- For Peer Proxy entries, skip the additional BGP update computation

Netlink MAC Update processing:
- Ignore VXLAN Info messages in extern mode
- Ignore MAC netlink updates if the interface is down.
  Presently done only for extern mode

Signed-off-by: Patrice Brissette <pbrisset@cisco.com>
Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
Topotests added under bgp_evpn_mh_l2l3vni_ext_learn to:
- Validate correct ES discovery and advertisement for both local and
  remote PEs.
- Check VTEP peer lists for accuracy, including handling of downed VTEPs
  and ES state transitions.
- Ensure L2VNI and L3VNI are correctly instantiated and associated with the
  appropriate VRFs and VXLAN interfaces.
- Test orphaned hosts, dual-attached hosts, and single-attached hosts
  in various failure and recovery scenarios.
- Exercise MAC 'protocol' state transitions, i.e. data plane learnt to
  control plane learnt and vice versa, along with delete and relearn
  sequences on both peers.

Utility and Parser Functions:
New utility functions are added in lib/bgp_evpn.py.
These changes improve test coverage and reliability for EVPN VXLAN
multihoming in external mode, making it easier to detect regressions and
validate new features.

Signed-off-by: Patrice Brissette <pbrisset@cisco.com>
Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
During shutdown, bridge interfaces may be deleted before bond member
interfaces, leaving stale br_if pointers in bond interface structures.
When bond cleanup tries to access the bridge interface name for logging
or VLAN member dereferencing, it causes a heap-use-after-free crash.

Add defensive check in zebra_evpn_vl_mbr_deref() to verify the bridge
interface is still valid before dereferencing it. Clear stale pointers
when detected.

This crash was exposed by the extern_learn test which exercises different
interface cleanup ordering, but the underlying hook ordering issue exists
in mainline FRR EVPN MH code.

Fixes AddressSanitizer error at zebra/zebra_evpn_mh.c:559

Signed-off-by: Patrice Brissette <patricebrissette@gmail.com>
Add comprehensive test suite for EVPN MH external learn mode using FPM
instead of kernel patches (RTPROT_HW). This allows testing the zebra
logic without requiring kernel changes.

Key Features:
- FPM-based MAC injection simulating hardware learning (RTPROT_HW=193)
- inject_mac.py: Python netlink tool for MAC injection with NTF_EXT_LEARNED
- 20 test cases covering ES, DF election, MAC lifecycle, protocol transitions
- Topology: 2 spines (RR), 2 ToRs (VTEP+FPM), 3 hosts (single/dual/orphan)
- L2VNI-1000 + L3VNI-500 with VXLAN multihoming

Test Coverage:
- Infrastructure validation (zebra ext_learn, FPM connection, VXLAN setup)
- EVPN ES and DF election
- MAC learning, hold timers, and lifecycle (add/del/move)
- Protocol field validation (proto hw vs proto zebra)
- Active-active MAC behavior with peer-active/peer-proxy flags
- Flag transitions (X → XI → PI → P) for local-inactive handling
- Quick re-add race conditions
- Orphan host behavior (single-attached without ES)

Files:
- test_evpn_mh_fpm_ext_learn.py: Main test suite (1754 lines, 20 tests)
- inject_mac.py: Netlink injection utility (322 lines)
- Router configs: spine1, spine2, torm11, torm12, hostd11, hostd12
- README.md: Comprehensive documentation

Tested-by: pytest
Signed-off-by: Philippe Brisset <pbrisset@example.com>
Signed-off-by: Patrice Brissette <pbrisset@cisco.com>
Add pytest.mark.skip to the entire test suite since it requires kernel
patches (RTPROT_HW=193) that are not yet merged into the Linux kernel
or iproute2.

The skip marker directs users to bgp_evpn_mh_fpm_ext_learn as an
FPM-based alternative that tests the same zebra logic without requiring
kernel changes.

Once kernel patches are merged, this skip marker can be removed.

Signed-off-by: Patrice Brissette <pbrisset@cisco.com>
@pbrisset pbrisset force-pushed the evpn-vxlan-mh-kernel-mac-ext-learn branch from 64a774e to 8e564be Compare June 23, 2026 21:16
@pbrisset

The CI tool check error is a false positive. This PR is ready for final review.

Labels

documentation master rebase PR needs rebase size/XXL tests Topotests, make check, etc zebra

6 participants