os_dep/linux: usb_ops_linux: decrement per-AC counter on USB submit failure #200
Closed
breakneck-git wants to merge 1 commit into
usb_write_port() unconditionally increments the per-access-category counter (voq_cnt / viq_cnt / beq_cnt / bkq_cnt) under pxmitpriv->lock before calling usb_submit_urb(). The matching decrement happens *only* in the completion callback usb_write_port_complete(), which is invoked when the URB completes (success, peer cancel, kill, etc.).

When usb_submit_urb() itself returns a non-zero error (e.g. -ENOMEM, -ENODEV, -EPIPE, -ESHUTDOWN, transient USB stack errors), the URB is *not* queued and the completion callback will *not* run. The function then takes the error branch and does `goto exit` straight to free_xmitbuf, leaving the counter permanently inflated by one. Each subsequent submit failure compounds the leak.

Once voq_cnt/viq_cnt/beq_cnt/bkq_cnt exceeds the per-AC threshold checked in rtw_os_need_stop_queue() (NR_XMITBUFF/4), the TX path calls netif_stop_subqueue() with no path to wake it, because the wake also lives in the completion callback that never runs. Result: TX permanently stalls and is recoverable only by module reload (rmmod 88XXau / modprobe 88XXau).

In practice this is reachable on any USB link with intermittent errors (marginal cable, heat, EMI, autosuspend race, hub disconnect/reconnect storms). The driver continues to look "associated" because beacons keep arriving and the MLME layer is unaffected; only TX dies.

Fix: take the same lock the increment uses and decrement the same counter (selected by pxmitbuf->flags, which was set under the lock at increment time) before falling through to the existing -ENODEV detection / goto exit. This mirrors the cleanup that usb_write_port_complete() would do if the URB had been queued.
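A minimal userspace sketch of the counter bookkeeping before and after the patch may make the invariant easier to check. None of this is driver code: xmit_priv_model, xmit_buf_model, ac_cnt_adjust(), the AC_* enum and the boolean "lock" are hypothetical stand-ins for pxmitpriv, pxmitbuf, the four per-AC counters and pxmitpriv->lock, and submit_status simply simulates what usb_submit_urb() returns. The model is single-threaded, so the lock only checks that lock/unlock calls are paired.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical AC selector; the driver records the choice in pxmitbuf->flags. */
enum ac_queue { AC_VO, AC_VI, AC_BE, AC_BK, AC_COUNT };

struct xmit_priv_model {
    bool locked;           /* trivial stand-in for pxmitpriv->lock */
    int ac_cnt[AC_COUNT];  /* stand-ins for voq_cnt, viq_cnt, beq_cnt, bkq_cnt */
};

struct xmit_buf_model {
    enum ac_queue flags;   /* which counter was bumped at increment time */
};

/* Single-threaded sketch: these only assert that lock/unlock are paired. */
static void model_lock(struct xmit_priv_model *p)   { assert(!p->locked); p->locked = true; }
static void model_unlock(struct xmit_priv_model *p) { assert(p->locked); p->locked = false; }

/* Adjust one per-AC counter under the (model) lock. */
static void ac_cnt_adjust(struct xmit_priv_model *p, enum ac_queue ac, int delta)
{
    model_lock(p);
    p->ac_cnt[ac] += delta;
    assert(p->ac_cnt[ac] >= 0);  /* a double decrement would trip this */
    model_unlock(p);
}

/* Models usb_write_port_complete(): runs once per URB that was actually
 * queued, including URBs cancelled through usb_kill_urb(). */
static void write_port_complete(struct xmit_priv_model *p, const struct xmit_buf_model *b)
{
    ac_cnt_adjust(p, b->flags, -1);
}

/* Models usb_write_port(). submit_status plays the role of the value
 * returned by usb_submit_urb(): 0 = queued, negative errno = not queued. */
static int write_port(struct xmit_priv_model *p, const struct xmit_buf_model *b,
                      int submit_status, bool patched)
{
    ac_cnt_adjust(p, b->flags, +1);   /* increment before submitting */

    if (submit_status != 0) {
        /* The URB was not queued, so write_port_complete() will never run. */
        if (patched)
            ac_cnt_adjust(p, b->flags, -1);  /* the fix: undo our own increment */
        return submit_status;                /* "goto exit" in the driver */
    }
    return 0;  /* a completion will decrement later */
}
```

With patched == false, three -ENOMEM submits on a best-effort buffer leave ac_cnt[AC_BE] at 3. With patched == true the counter stays at 0, and a successful submit followed by exactly one completion (the usb_kill_urb() case included) also nets to 0, so the assert in ac_cnt_adjust() never observes a double decrement.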
Summary
Fixes a permanent TX stall caused by a per-AC queue counter leak on USB submit failure. The counter inflates by one on every submit error; once it crosses the threshold that triggers netif_stop_subqueue(), TX is stuck until module reload (the wake path lives in the completion callback, which the failed URB never reaches).
Root cause
usb_write_port() increments the per-access-category counter (voq_cnt/viq_cnt/beq_cnt/bkq_cnt) under pxmitpriv->lock before calling usb_submit_urb(). The matching decrement lives only in the completion callback usb_write_port_complete(), which is invoked when a queued URB completes, including the kill / disconnect / surprise-removed paths via usb_kill_urb().

When usb_submit_urb() returns non-zero (-ENOMEM, -ENODEV, -EPIPE, -ESHUTDOWN, transient USB stack errors), the URB was not queued, the completion callback will not run, and the counter is permanently inflated by one.

Each subsequent submit failure compounds the leak. Once any of {voq,viq,beq,bkq}_cnt exceeds the threshold checked in rtw_os_need_stop_queue(), the TX path calls netif_stop_subqueue(). The corresponding wake also lives in usb_write_port_complete(); once the URBs that really are in flight have drained, nothing is left to complete, because the leaked increments correspond to no URB at all. Result: permanent TX stall, recoverable only by rmmod 88XXau / modprobe 88XXau.

This is reachable on any USB link with intermittent errors (marginal cable, heat, EMI, autosuspend races, hub reconnect storms). The driver continues to look "associated" (beacons still arrive and MLME state is fine) while only TX dies, which makes the bug particularly confusing for users.
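To see why a short burst of failures is enough to wedge TX, here is a hypothetical model of one AC queue in the unpatched driver. The stop check mirrors the "counter exceeds NR_XMITBUFF/4" rule described above, but the NR_XMITBUFF value of 32, the txq_model struct and all function names are assumptions for illustration, and netif_stop_subqueue()/netif_wake_subqueue() are reduced to a single boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value; the real NR_XMITBUFF depends on the driver build. */
#define NR_XMITBUFF 32
#define STOP_THRESHOLD (NR_XMITBUFF / 4)

/* One AC queue in the unpatched driver, reduced to three fields. */
struct txq_model {
    int cnt;        /* the per-AC counter, e.g. beq_cnt */
    int in_flight;  /* URBs actually queued to the USB core */
    bool stopped;   /* models netif_stop_subqueue() / netif_wake_subqueue() */
};

/* Models the "counter exceeds NR_XMITBUFF/4" check. */
static bool need_stop(const struct txq_model *q) { return q->cnt > STOP_THRESHOLD; }

/* One transmit attempt. Returns false when the queue is stopped and the
 * stack would not hand the driver a packet at all. */
static bool tx_attempt(struct txq_model *q, bool submit_ok)
{
    if (q->stopped)
        return false;
    q->cnt++;              /* incremented before usb_submit_urb() */
    if (submit_ok)
        q->in_flight++;    /* a completion will arrive later */
    /* else: no completion and no decrement, so the count leaks */
    if (need_stop(q))
        q->stopped = true;
    return true;
}

/* Completion callback: the only place that decrements and wakes. */
static bool complete_one(struct txq_model *q)
{
    if (q->in_flight == 0)
        return false;      /* nothing was queued, so nothing completes */
    q->in_flight--;
    q->cnt--;
    assert(q->cnt >= 0);
    if (q->stopped && !need_stop(q))
        q->stopped = false;
    return true;
}

/* Let every queued URB complete, as the USB core eventually would. */
static void drain(struct txq_model *q)
{
    while (complete_one(q))
        ;
}
```

With these assumed numbers, nine consecutive submit failures (one more than the threshold of 8) stop the queue with zero URBs in flight. drain() then finds nothing to complete, the wake check never runs, and tx_attempt() keeps returning false even after the link is healthy again. By contrast, pushing the queue past the threshold with successful submits also stops it, but draining those URBs brings cnt back under the threshold and wakes it.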
Fix
Decrement the same counter that was incremented at the top of the function, under the same lock, on the submit-failure branch. The counter is selected by pxmitbuf->flags, which was set under the lock at increment time, so the choice is unambiguous.

This mirrors exactly what usb_write_port_complete() would do if the URB had been queued, restoring the invariant that each counter tracks only URBs that were actually submitted and have not yet completed.
Test environment
- 0bda:0811) on Raspberry Pi 2B, kernel 6.12.47
- voq_cnt/viq_cnt/beq_cnt/bkq_cnt are touched in exactly 3 places (initialized to 0, incremented in this function, decremented in the completion callback), verified by exhaustive grep
- usb_kill_urb() triggers the completion callback with an error status, so the kill/cancel paths do hit the existing decrement (this patch adds no double-decrement risk)
- usb_write_port, status=-N lines accumulating, but the counter inflation itself is silent until the queue stops
Disclosure
This patch is the result of a code-review pass conducted with the help of an LLM (Claude). The reasoning was verified by reading the increment site, the completion-callback decrement site, the usb_kill_urb flow, and the Linux USB API contract for usb_submit_urb errors. I am not a kernel engineer.