zebra: skip inactive nexthop when building NHG for kernel install #22133

Closed
GaladrielZhao wants to merge 2 commits into FRRouting:master from GaladrielZhao:inactive_path_skip_kernel

Conversation

@GaladrielZhao

Contributor

When a nexthop in an NHG's depends tree becomes inactive (e.g., due to
recursive resolution failure), its singleton NHG is never programmed
into the kernel. However, zebra_nhg_nhe2grp_internal() still references that
NHE's ID in the nh_grp array sent to the kernel, which fails the entire
NHG install and cascades into route install failures for every route
referencing the group.

Add a NEXTHOP_FLAG_ACTIVE check before appending an NHE from the depends
tree to the dataplane install group, so that inactive singletons are
skipped rather than poisoning the whole NHG programming.
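
The effect of the guard can be sketched with a small Python model. This is a toy illustration, not the FRR C code: `Singleton`, `build_group`, and `kernel_accepts` are hypothetical names, the IDs are borrowed from the scenario reported later in this thread, and the kernel is assumed to reject any group that names an ID it was never given.

```python
from dataclasses import dataclass


@dataclass
class Singleton:
    """Toy stand-in for a singleton NHE; not the FRR struct."""
    nhe_id: int
    active: bool  # models NEXTHOP_FLAG_ACTIVE on the entry's nexthop


def build_group(depends, skip_inactive):
    """Collect the member IDs that would go into the nh_grp array."""
    ids = []
    for dep in depends:
        if skip_inactive and not dep.active:
            continue  # models the guard this PR adds
        if dep.nhe_id not in ids:
            ids.append(dep.nhe_id)
    return ids


def kernel_accepts(ids, programmed):
    """Assumed kernel behaviour: reject a group naming an unknown ID."""
    return all(i in programmed for i in ids)


depends = [Singleton(246, active=False), Singleton(256, active=True)]
programmed = {256}  # the inactive singleton was never programmed

unguarded = build_group(depends, skip_inactive=False)
guarded = build_group(depends, skip_inactive=True)
print(unguarded, kernel_accepts(unguarded, programmed))
print(guarded, kernel_accepts(guarded, programmed))
```

In this model the unguarded group still names the unprogrammed member and is rejected as a whole, while the guarded group contains only the active member.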

@frrbot frrbot (bot) added the "tests" (Topotests, make check, etc) and "zebra" labels Jun 1, 2026
@greptile-apps

greptile-apps Bot commented Jun 1, 2026


Greptile Summary

This PR fixes a kernel NHG install failure that occurs when a recursive nexthop in an NHG's depends tree becomes inactive: zebra_nhg_nhe2grp_internal was still including the inactive singleton's ID in the dataplane group array, causing EINVAL from the kernel and cascading route install failures. A NEXTHOP_FLAG_ACTIVE check is added to skip inactive entries before they are appended, accompanied by a new topotest that removes a resolving /32 route to trigger the scenario.

  • zebra/zebra_nhg.c: Inserts an active-flag guard in zebra_nhg_nhe2grp_internal between the recursive-resolution step and the depends_is_empty branch; placed before the group-within-group recursion path, so it tests only the head nexthop of any sub-group encountered.
  • tests/topotests/zebra_nhg_inactive_skip/: New topotest topology with a two-path recursive ECMP static route; exercises initial install, one-nexthop degradation, and full restoration.

Confidence Score: 4/5

The core fix correctly prevents inactive singleton NHEs from poisoning the dataplane group array; the main concern is that the check is placed one branch too early relative to the group-within-group recursion path.

The fix targets singleton NHEs (the common and currently only real case), where checking nhg.nexthop->flags is correct. Positioning the guard before the depends_is_empty branch means that if a sub-group were ever passed as a depend, only its head nexthop would be tested. That could drop an entire sub-group whose head happens to be inactive but whose other members are still active. The existing code comment acknowledges that group-within-group is not used today, so this is not an immediate regression, but the placement leaves a subtle trap for future changes.

zebra/zebra_nhg.c — specifically the placement of the new active check relative to the depends_is_empty branch.

Important Files Changed

| Filename | Overview |
|---|---|
| zebra/zebra_nhg.c | Adds NEXTHOP_FLAG_ACTIVE guard in zebra_nhg_nhe2grp_internal; check is placed before the depends_is_empty branch so it also applies to sub-groups, where it tests only the head nexthop of the group. |
| tests/topotests/zebra_nhg_inactive_skip/test_zebra_nhg_inactive_skip.py | New topotest covering the inactive-nexthop scenario; the log-file assertion has a fragile hard-coded path that silently passes when the file is absent. |
| tests/topotests/zebra_nhg_inactive_skip/r1/frr.conf | Minimal router config setting up two interfaces and a recursive ECMP route for the new topotest; no issues found. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[zebra_nhg_nhe2grp_internal called] --> B[Iterate nhg_depends tree]
    B --> C{NEXTHOP_GROUP_RECURSIVE?}
    C -- Yes --> D[zebra_nhg_resolve]
    D --> E{resolve succeeded?}
    E -- No --> F[log error, continue]
    C -- No --> G
    E -- Yes --> G
    G{NEW: depend->nhg.nexthop && !NEXTHOP_FLAG_ACTIVE?}
    G -- Yes --> H[log debug, skip - continue]
    G -- No --> I{depends_is_empty?}
    I -- No --> J[Group within group: recurse into sub-group]
    J --> B
    I -- Yes --> K{NEXTHOP_GROUP_VALID?}
    K -- No --> L[log, skip - continue]
    K -- Yes --> M{INSTALLED or QUEUED?}
    M -- No --> N[log, skip - continue]
    M -- Yes --> O{Duplicate ID?}
    O -- Yes --> P[skip - continue]
    O -- No --> Q[Append depend ID to nh_grp array]
    Q --> B
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
zebra/zebra_nhg.c:3325-3334
**Active check fires on sub-groups, tests only the head nexthop**

The new guard is placed before the `depends_is_empty` branch, so it also runs when `depend` is a sub-group (group-within-group path). In that case `depend->nhg.nexthop` is the head of the linked list; if that head nexthop is inactive while the remaining nexthops in the sub-group are still active, the entire sub-group is silently skipped instead of letting the recursive call process each member individually. Moving the check inside the `else` branch (singleton case, where `nhg.nexthop` is always a single entry) limits its effect to the intended target and lets the recursion handle partial sub-groups correctly.
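
The difference between the two placements can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the FRR implementation: `Nhe`, `build`, and the IDs are hypothetical, and the group-within-group case is modelled even though the existing code comment says it is not used today.

```python
from dataclasses import dataclass, field


@dataclass
class Nhe:
    """Toy NHE: a singleton has no depends, a sub-group has some."""
    nhe_id: int
    nexthops_active: list  # one flag per nexthop; index 0 is the head
    depends: list = field(default_factory=list)


def build(nhe, guard_before_branch):
    """Collect singleton IDs reachable through the depends tree."""
    ids = []
    for dep in nhe.depends:
        head_active = dep.nexthops_active[0]
        if guard_before_branch and not head_active:
            continue  # early guard: also fires on sub-groups
        if dep.depends:
            # group within group: recurse and merge without duplicates
            for i in build(dep, guard_before_branch):
                if i not in ids:
                    ids.append(i)
            continue
        if not guard_before_branch and not head_active:
            continue  # late guard: singleton branch only
        if dep.nhe_id not in ids:
            ids.append(dep.nhe_id)
    return ids


inactive = Nhe(10, [False])
active_a = Nhe(11, [True])
active_b = Nhe(12, [True])
# Sub-group whose head nexthop is inactive but whose second one is active.
sub_group = Nhe(30, [False, True], depends=[inactive, active_a])
top = Nhe(40, [False, True, True], depends=[sub_group, active_b])

early = build(top, guard_before_branch=True)
late = build(top, guard_before_branch=False)
print(early, late)
```

With the early guard the whole sub-group is dropped because of its head nexthop. With the guard in the singleton branch, the recursion still picks up the sub-group's active member.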

### Issue 2 of 2
tests/topotests/zebra_nhg_inactive_skip/test_zebra_nhg_inactive_skip.py:156-160
**Log-file assertion silently passes when the file doesn't exist**

`r1.net.cmd("grep -c ... /tmp/r1/zebra.log 2>/dev/null || echo 0")` returns `"0"` both when there are no matching log lines AND when the file is absent (or zebra wasn't configured to write to that path). If the log file never materialises the `assert error_count == 0` always passes, making this step a no-op. Consider verifying the file exists before grepping, or use the topotest router log helpers that expose the actual daemon log path rather than hard-coding `/tmp/r1/zebra.log`.
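
One way to make the check fail loudly is to do the counting in Python and refuse to proceed when the log is missing. The sketch below is an assumption about how such a helper might look: `count_log_matches` is a hypothetical name, the path is created in a temporary directory for the demonstration, and the pattern is taken from a log line quoted elsewhere in this thread rather than from the test itself.

```python
import os
import re
import tempfile


def count_log_matches(log_path, pattern):
    """Count matching lines; raise instead of returning 0 if the log is absent."""
    if not os.path.isfile(log_path):
        raise FileNotFoundError(f"log file not found: {log_path}")
    regex = re.compile(pattern)
    with open(log_path, "r", errors="replace") as f:
        return sum(1 for line in f if regex.search(line))


PATTERN = r"Failed to install Nexthop"

with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "zebra.log")
    with open(present, "w") as f:
        f.write("Failed to install Nexthop (387[246/256]) into the kernel\n")
        f.write("unrelated line\n")
    found = count_log_matches(present, PATTERN)

    try:
        count_log_matches(os.path.join(tmp, "absent.log"), PATTERN)
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True

print(found, missing_detected)
```

In a topotest the path would come from wherever the router actually writes its zebra log, so that an absent file is reported as a test failure rather than as zero errors.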

Reviews (1): Last reviewed commit: "tests: add topotest for NHG inactive nex..." | Re-trigger Greptile

Comment thread zebra/zebra_nhg.c Outdated
When a nexthop in an NHG's depends tree becomes inactive (e.g., due to
recursive resolution failure), its singleton NHG is never programmed
into the kernel. However, zebra_nhg_nhe2grp_internal() still references that
NHE's ID in the nh_grp array sent to the kernel, which fails the entire
NHG install and cascades into route install failures for every route
referencing the group.

Add a NEXTHOP_FLAG_ACTIVE check before appending an NHE from the depends
tree to the dataplane install group, so that inactive singletons are
skipped rather than poisoning the whole NHG programming.

Signed-off-by: Yuqing Zhao <galadriel.zyq@alibaba-inc.com>
@GaladrielZhao GaladrielZhao force-pushed the inactive_path_skip_kernel branch from dab5a79 to 50d16c7 June 1, 2026 15:14
@riw777 riw777 added the bugfix label Jun 1, 2026
@riw777

riw777 commented Jun 1, 2026

Member

@donaldsharp does this overlap with some of your existing nhg work?

Add a topotest that verifies zebra correctly skips inactive nexthops
when building a nexthop group for kernel installation.  The test uses
recursive static routes so that removing a resolving route makes one
recursive nexthop unresolvable while the NHE persists in the group's
depends tree.

Signed-off-by: Yuqing Zhao <yuqing.zyq@alibaba-inc.com>
@GaladrielZhao GaladrielZhao force-pushed the inactive_path_skip_kernel branch from 78c8900 to ab9c40e June 2, 2026 04:20
@GaladrielZhao

Contributor Author

ci:rerun

@GaladrielZhao

Contributor Author

ci:rerun

@donaldsharp

Member

I don't even understand how we can have gotten to this state. It makes no sense to me. Please provide a topotest showing the problem at the very least

@GaladrielZhao

GaladrielZhao commented Jun 4, 2026

Contributor Author

> I don't even understand how we can have gotten to this state. It makes no sense to me. Please provide a topotest showing the problem at the very least

Hi @donaldsharp, thanks for the review.

Here's the scenario where we hit this.
We have a static route to 1::1 over two ECMP nexthops, 2064:100::1d (NHG 246) and 2064:200::1e (NHG 256), which together form NHG 387. Both 2064:100::1d and 2064:200::1e resolve recursively via fc06::2 and fc08::2.
When 2064:100::1d is withdrawn, zebra marks NHG 246 inactive. NHG 387 is then rebuilt, but zebra_nhg_nhe2grp_internal() still includes the inactive NHG 246 in the nh_grp[] array.

Output of `show nexthop-group rib` at this point:

ID: 387 (zebra)
     Nexthop Count: 2
     Valid
     Depends: (246) (256)
        via 2064:100::1d (vrf default) inactive, label 16, seg6 fd00:201:201:1::, weight 1
        via 2064:200::1e (vrf default) (recursive), label 32, seg6 fd00:202:202:2::, weight 1
           via fc06::2, Ethernet12 (vrf default), label 32, weight 1
           via fc08::2, Ethernet4 (vrf default), label 32, weight 1

The kernel rejects the install with EINVAL because the inactive nexthop is unreachable:

2026/05/29 18:48:27 ZEBRA: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=4875, pid=4173641508
2026/05/29 18:48:27 ZEBRA: [J7K9Z-9M7DT] Nexthop dplane ctx 0x55aea0e8c7a0, op NH_INSTALL, nexthop ID (387), result FAILURE
2026/05/29 18:48:27 ZEBRA: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (387[246/256]) into the kernel

A topotest will be provided to demonstrate this scenario.

@GaladrielZhao

Contributor Author

Hi @donaldsharp, this issue is caused by the skip_kernel logic in RIB/FIB. There, received NHGs are marked to skip kernel programming and are sent to FPM directly. NEXTHOP_GROUP_INSTALLED is set after they are sent to the dplane, but it is never cleared when the nexthop becomes unreachable. zebra_nhg_nhe2grp_internal() then includes these inactive NHGs in nh_grp[] because they still pass the VALID && INSTALLED check.

Our proposal is to introduce a new flag, NEXTHOP_GROUP_INSTALLED_FPM_ONLY, that tracks the install state of skip-kernel NHGs separately and so avoids this. The implementation will be updated in the RIB/FIB PR #21415. Thanks.
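
The stale-flag problem and the proposed separation can be sketched with a toy Python model. Everything here is an assumption made for illustration: `GroupFlag`, `mark_sent`, and `eligible_for_kernel_group` are hypothetical names, and the behaviour of the new flag is a guess at the idea described above, not the implementation planned for the other PR.

```python
from enum import Flag, auto


class GroupFlag(Flag):
    VALID = auto()               # models NEXTHOP_GROUP_VALID
    INSTALLED = auto()           # models NEXTHOP_GROUP_INSTALLED
    INSTALLED_FPM_ONLY = auto()  # models the proposed new flag


def mark_sent(flags, skip_kernel, use_separate_flag):
    """Record that an NHG was handed to the dataplane."""
    if skip_kernel and use_separate_flag:
        return flags | GroupFlag.INSTALLED_FPM_ONLY
    return flags | GroupFlag.INSTALLED


def eligible_for_kernel_group(flags):
    """Models the VALID && INSTALLED check on a group member."""
    return GroupFlag.VALID in flags and GroupFlag.INSTALLED in flags


# Today (as described above): a skip-kernel NHG gets INSTALLED, and the
# flag is not cleared later, so the member still passes the check.
stale = mark_sent(GroupFlag.VALID, skip_kernel=True, use_separate_flag=False)
stale_eligible = eligible_for_kernel_group(stale)

# Assumed behaviour with a separate flag: INSTALLED is never set for a
# skip-kernel NHG, so it cannot pass the kernel-group check.
separate = mark_sent(GroupFlag.VALID, skip_kernel=True, use_separate_flag=True)
separate_eligible = eligible_for_kernel_group(separate)

print(stale_eligible, separate_eligible)
```

The design point the model tries to capture is that a kernel-facing check should only see state that reflects kernel programming, while FPM-only state is tracked on its own bit.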
