core: rtw_xmit: gate scan_state TX-block by _FW_UNDER_SURVEY to prevent permanent TX stall #62
…nt permanent TX stall

When rtw_scan_timeout_handler() (core/rtw_mlme.c) fires after a stalled scan, it clears the upper-level _FW_UNDER_SURVEY flag in mlmepriv and notifies cfg80211, but does NOT reset mlmeext->scan_state. The low-level scan_state stays in SCAN_PROCESS forever, because the only path that resets it to SCAN_DISABLE is the SCAN_COMPLETE handler in core/rtw_mlme_ext.c, which the stalled scan never reaches.

After this race, rtw_xmit_ac_blocked() permanently returns true for the scan-state check, blocking all TX. RX continues normally (no scan gate on the RX path), beacons are received, and the station stays associated -- wpa_supplicant reports COMPLETED, signal is fine, but no traffic actually flows. User-visible recovery is `wpa_cli reconnect`: a full re-assoc goes through init_mlme_ext_priv(), which unconditionally resets scan_state to SCAN_DISABLE.

Fix: gate the scan_state TX-block check by _FW_UNDER_SURVEY. If the upper layer no longer thinks a scan is in progress, do not block TX even if the low-level scan_state is stuck. The new condition is strictly a subset of the old one -- when both layers agree a scan is in progress, behaviour is unchanged. Defensive: doesn't touch the state machine, doesn't risk HW init inconsistency.

Tested on Realtek 8821AU (USB ID 0bda:0811) on Raspberry Pi 2B, kernel 6.12.47, against an associated 2.4 GHz AP with periodic NetworkManager-driven scans. Before the patch: random permanent TX stalls after hours of uptime (RX continues, association held). After: TX continues to flow even when scan_state desyncs from the upper layer.
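The desync described in this commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the struct, the `model_*` helpers and the reduced three-value enum are hypothetical stand-ins, not the driver's real types, and the plain `under_survey` boolean stands in for the `_FW_UNDER_SURVEY` flag (which later comments in this thread report as not defined in this tree).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced model; the real driver enum has more states. */
enum scan_state { SCAN_DISABLE, SCAN_PROCESS, SCAN_BACKOP };

struct mlme_model {
    bool under_survey;          /* models the upper-level survey flag */
    enum scan_state scan_state; /* models mlmeext->scan_state */
};

/* Scan start: both layers agree a scan is running. */
static void model_scan_start(struct mlme_model *m)
{
    m->under_survey = true;
    m->scan_state = SCAN_PROCESS;
}

/* Normal completion: both layers are reset. */
static void model_scan_complete(struct mlme_model *m)
{
    m->under_survey = false;
    m->scan_state = SCAN_DISABLE;
}

/* Timeout path as described: only the upper-level flag is cleared. */
static void model_scan_timeout(struct mlme_model *m)
{
    m->under_survey = false;
    /* scan_state is left untouched: this is the desync. */
}

/* Full re-association: scan_state is unconditionally reset. */
static void model_reassociate(struct mlme_model *m)
{
    m->scan_state = SCAN_DISABLE;
}

/* Pre-patch TX gate as described: looks at the low-level state only. */
static bool model_tx_blocked(const struct mlme_model *m)
{
    return m->scan_state != SCAN_DISABLE && m->scan_state != SCAN_BACKOP;
}

struct stall_trace {
    bool blocked_during_scan;
    bool blocked_after_timeout;
    bool blocked_after_reassoc;
    bool blocked_after_clean_scan;
};

/* Walk through a stalled scan, a reconnect, then a clean scan. */
struct stall_trace model_trace_stall(void)
{
    struct stall_trace t;
    struct mlme_model m = { .under_survey = false, .scan_state = SCAN_DISABLE };

    model_scan_start(&m);
    t.blocked_during_scan = model_tx_blocked(&m);

    model_scan_timeout(&m);
    assert(!m.under_survey); /* upper layer believes the scan is over */
    t.blocked_after_timeout = model_tx_blocked(&m);

    model_reassociate(&m);
    t.blocked_after_reassoc = model_tx_blocked(&m);

    model_scan_start(&m);
    model_scan_complete(&m);
    t.blocked_after_clean_scan = model_tx_blocked(&m);
    return t;
}
```

In this model the gate stays closed after the timeout even though the upper layer has moved on, and only the re-association step reopens it, which matches the `wpa_cli reconnect` workaround described above.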
|
Thanks for the PR. Would you be interested in taking over maintenance of this and the rtl8821au repos? I have more projects going than I have time for here at my GitHub site and it would be cool for these old drivers to continue to get some attention as they are good for specific use cases. |
|
Hi @morrownr, Thanks so much for the trust and the offer — it genuinely means a lot, especially given what these drivers have meant for the 8821AU/8812AU community over the years. I have to decline, though, and I want to be honest about why so the offer goes to someone who can do it justice:
What I can realistically commit to is continuing what's already started. The same LLM analysis surfaced a few more concrete bugs in the same defensive shape as this PR — a TX queue counter leak on USB submit failure, VLA + unbounded Thanks again for the offer and for keeping these drivers usable for as long as you have. The 8821AU is still the only working option for a lot of us — me included. |
|
I am not a career programmer either. I took classes in FORTRAN and COBOL when in college back in the dark ages because it was required for my major, economics. I have self-taught several languages over the years but I am far from a wireless programmer. I have learned a lot but I leave the hard stuff to the ones that are good at it. My point is that you can contribute greatly and learn something along the way. Let me point you to a different repo where you might contribute and learn things: https://github.com/lwfinger/rtw88 Three of us started a project to modernize and upstream Realtek USB WiFi 5 support to the Linux kernel in early 2024: Larry Finger, who passed away in May of 2024, Bitterblue Smith (AKA Dubhater) and me. All of the drivers are now in the kernel but testing is always needed, and it is a fully mac80211 repo, which means you would be testing and working on the modern Linux wireless standard, something that will work for a LONG time. I hope to see you over in that repo. |
|
Hi @morrownr, Thanks for the pointer — I'll definitely give it a try. The supported list in lwfinger/rtw88 includes RTL8821AU (USB ID Plan: collect a baseline on the current rtl8812au DKMS driver (RPi 2B, 2.4 GHz home AP, kernel 6.12.47) — TCP/UDP throughput via iperf3, idle latency over 30 min, RX drop rate, signal stability, scan-induced disruption. That's actually running on the box right now. Then install lwfinger/rtw88 alongside, blacklist Realistically what I'll provide is a data point on the 8821AU USB path on low-end ARMv7 with a real-world STA load profile against a typical 2.4 GHz home AP. Not deep kernel work, but should at least surface "works"/"doesn't work" specifics for that chip on that distro, which I gather is part of what testers can usefully add. — breakneck-git |
lwfinger/rtw88 is not just any repo. The drivers at that repo are now in the Linux kernel. rtl8812au and rtl8821/11au have been in the Linux kernel since kernel 6.14. Good testing and reporting is still needed so find what you can and report it. You will also find two more repos here at this site called rtw89 and mt76 which are similar to rtw88 but handle WiFi 6 and 7 adapters. The Main Menu here at this site: https://github.com/morrownr/USB-WiFi You might find it to be interesting. |
|
Thanks — really appreciate the broader context. Heads-up: the patch I sent in this PR (#62) actually introduces a build break on The minimal fix is up in #65 (and the same rename for the 8821au sibling fork is in morrownr/8821au-20210708#201) — pure mechanical On the rtw88 side: a first compatibility / comparison write-up for RTL8821AU (USB Thanks for pointing me at rtw89, mt76 and the USB-WiFi hub. From a quick scan: rtw89 covers Realtek Wi-Fi 6/6E/7 chips (PCIe), and mt76 is MediaTek-only — neither matches my current 8821AU adapter, but the USB-WiFi Plug-and-Play list is exactly what I'll consult before the next purchase. Genuinely useful — thanks. |
|
That’s all well and good. First, an works for me Even in Kernel source 7.0 both defines does not exist. No idea where they come from Found the construct instead: |
Reset embeddedandroid branch to upstream commit 9704072 (Sep 2025), the last good revision before upstream PR morrownr#62 introduced an undefined _FW_UNDER_SURVEY identifier.

Re-apply the kernel 6.16+ compat patches on top:
- Makefile: EXTRA_CFLAGS -> ccflags-y, $(src) -> $(M)
- include/osdep_service_linux.h: del_timer_sync / from_timer compat shim for kernel 6.16+
- os_dep/linux/ioctl_cfg80211.c: radio_idx in set_wiphy_params / set_tx_power / get_tx_power callbacks on kernel >= 6.17
|
It's broken at commit 1be3d39 and 9e91355. I think this needs to be reverted or needs to be fixed as it's currently not compilable. |
|
I am sorry that there is a problem. If you guys can come to a conclusion and one of you submits a PR to fix the problem, I will merge it. My attention is entirely on other repos having to do with modern, mac80211, drivers and information. |
|
I agree with @fishhf. It should be reverted. It hasn't had NEARLY enough testing. This was not tested on actual hardware, nor was it, apparently, even compiled. The fact that the commit summary liberally refers to the wrong variable or a non-existent one terrifies me. |
|
I have reverted the patch that appears to have been the problem. I could use confirmation. Something that is not clear to me is why you have not moved on from this old, not-very-good, driver to the new modern, mac80211 driver that is in kernel 6.14+ and is available to compile/install for older kernels? It is a superior driver compared to the one in this repo and using it would let those who use it become familiar with current Linux wireless programming standards. Here is the repo where the new driver can be installed from if you are on a kernel older than 6.14: |
The whatever Debian in my Raspberry Pi 5 doesn't have the 6.14+ kernel yet. Building this would be simpler; in fact, I was looking for the old repo https://github.com/morrownr/8812au-20210629 with a specific commit 8eb3e30b2f2e29fcbe12bbc171e95180bb966cc0 that was stable for me for years. I also have other projects going on, and I just need to get this particular wifi dongle up and running with the least effort. |
I run Debian 13 on the system I am on right now. It is currently running 7.0.4+deb13-amd64. 6.18 is available also. I don't run pure Debian on my Pi4B but the RasPiOS flowed out kernel 6.18 earlier this week. I'm not trying to push you or anyone else to do something you don't want to do. However, I need you to know that my priorities are with modern versions of the drivers. If you do want to stay with kernel 6.12, the following repo can be installed and it contains the modern driver that is in kernel 6.14+: https://github.com/lwfinger/rtw88 Remember to run the following to ensure this driver is fully removed: sudo sh remove-driver.sh The new driver is much better than this one, especially if you do monitor mode and/or do things that require some capabilities that are not available in this driver. Cheers |
|
I can answer with regard to us. We have a system combination that works very well. Raspberry Pi, USB/WiFi adapter, Ubuntu, NodeJS. It's been frozen. Qualifying a new system is the last thing we wanna do. So when you say "better", it does not mean much standing alone. Does it support more users? New bands? Sync faster? Prolly not. What we DO do is occasionally look at USB/WiFi adapters to see if there is a more performant model with comparable pricing - a significant increase in range would be tempting. That might drive us to a new driver. Your driver repos are lovely; I am grateful for them. It would be nice if they stayed where they are but I have offline copies of what we need. You have done a great service to the industry. |
|
@fishhf @briantbutton @EmbeddedAndroid If you need this specific driver (8812au) instead of the RTW88 for newer kernel (6.14 to 6.19 and new 7.0 branch), you can use the following repo: https://github.com/cybersnow02/8812au-20210820 |
|
Brian,
I understand this and I have worked with a few organizations on projects probably similar to yours. I'm actually not an IT guy, I just play one on TV. My undergrad and grad degrees are in economics.
That is correct. I'll elaborate. I can sum up my use of the word "better" with two words: supportability and flexibility. Stuff happens. What if you need a mod to the rtl8812au driver here in this repo? Getting any help from Realtek for these out-of-kernel drivers is really not going to happen unless you are buying chips in lots of 10k from Realtek. On the other hand, having the modern mac80211 drivers handled in-kernel gives all of us access to information that allows us to work almost any problem.
I'm aware of two new adapters that should be on the market soon and should have really good range. There are a couple of good places to post your requirements in issues and get my attention: https://github.com/morrownr/USB-WiFi https://github.com/morrownr/mt76 Linux wireless is much different than it was even 5 years ago. For USB WiFi, we used to struggle with having drivers. That is a fading memory. We still have two primary players in USB WiFi: Mediatek and Realtek. If you consider the chips from these two companies and modern mac80211 driver availability in the Linux kernel, we are missing only one driver: rtl8922au. And it is available and should go into the kernel soon. Work on it goes on in the following repo: https://github.com/morrownr/rtw89 Let me repeat that: We are one driver away from ALL WiFi 5, 6 and 7 USB chips having modern drivers in-kernel. Let me point you to a product that is available: https://www.amazon.com/BrosTrend-AXE3000-Linux-WiFi-Adapter/dp/B0F6MY7H62 Note that this Brostrend adapter advertises as being for Linux. Notice that the ad says nothing about Windows. I have the box that mine came in. The front of the box says Linux, no mention of Windows. Read the reviews. It seems the only bad reviews are from Windows users that had to go download a driver. I guess the Windows users can't read. More "made for Linux" adapters are on the way. |
Summary
Fixes a permanent TX stall caused by a race between the upper-level (mlmepriv._FW_UNDER_SURVEY) and low-level (mlmeextpriv.scan_state) MLME state when a scan times out.

Same root cause and identical patch shape was also submitted to:
- aircrack-ng/rtl8812au (referencing their open issue #778, "Radio scans introduce lag spikes")
- morrownr/8821au-20210708 (sister repo with shared MLME code)

Root cause
rtw_scan_timeout_handler() (core/rtw_mlme.c) clears the upper-level _FW_UNDER_SURVEY flag and notifies cfg80211, but does not reset mlmeext->scan_state. The only place that resets scan_state to SCAN_DISABLE in the normal completion path is the SCAN_COMPLETE handler in core/rtw_mlme_ext.c, which a stalled scan never reaches.

After this race, rtw_xmit_ac_blocked() (core/rtw_xmit.c) permanently returns _TRUE for the scan-state check, blocking all TX. RX continues unaffected (no scan gate on the RX path), beacons arrive, the station stays associated: wpa_supplicant reports COMPLETED, signal is fine, but no traffic flows. wpa_cli reconnect is the user-visible workaround because a full re-assoc goes through init_mlme_ext_priv(), which unconditionally resets scan_state to SCAN_DISABLE.

Fix
Gate the scan-state TX-block check by _FW_UNDER_SURVEY. If the upper layer no longer thinks a scan is in progress, do not block TX even if the low-level scan_state is stuck.

The new condition is strictly a subset of the old one: when both layers agree a scan is in progress, behaviour is unchanged. Defensive: doesn't touch the state machine, doesn't risk HW init inconsistency.
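The gating described above can be sketched as two predicates in a user-space model. This is not the driver code: `scan_blocks_tx_old`, `scan_blocks_tx_new` and `gate_self_check` are hypothetical names, the enum is a reduced stand-in for the driver's scan states, and `under_survey` is a plain boolean standing in for the `_FW_UNDER_SURVEY` check (an identifier that later comments in this thread report as undefined in this tree).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model; the real driver enum has more states. */
enum scan_state { SCAN_DISABLE, SCAN_PROCESS, SCAN_BACKOP };

/* Pre-patch gate: depends on the low-level scan state only. */
bool scan_blocks_tx_old(enum scan_state s)
{
    return s != SCAN_DISABLE && s != SCAN_BACKOP;
}

/* Patched gate: additionally requires the upper layer to agree
 * that a scan is in progress. */
bool scan_blocks_tx_new(bool under_survey, enum scan_state s)
{
    return under_survey && scan_blocks_tx_old(s);
}

/* Self-check over every combination in this model. */
void gate_self_check(void)
{
    static const enum scan_state states[] = {
        SCAN_DISABLE, SCAN_PROCESS, SCAN_BACKOP
    };
    size_t i;

    for (i = 0; i < sizeof(states) / sizeof(states[0]); i++) {
        /* Upper layer agrees a scan is running: behaviour unchanged. */
        assert(scan_blocks_tx_new(true, states[i]) ==
               scan_blocks_tx_old(states[i]));
        /* Upper layer says no scan: TX is never blocked by this check. */
        assert(!scan_blocks_tx_new(false, states[i]));
    }
}
```

The only input where the two predicates differ in this model is the desync case (upper flag clear, low-level state still scanning), which is where the old gate stalls TX and the new one lets it through.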
Test environment
Tested on the 8821AU sister chip (8821au-20210708 fork). The 8812AU and 8821AU share core/rtw_xmit.c verbatim, so the fix applies identically. If a maintainer of this repo wants to verify on actual 8812AU hardware before merging, that's reasonable: the change is small and the code path identical.

- _FW_UNDER_SURVEY=1, scan_state ∉ {DISABLE, BACKOP}
- _FW_UNDER_SURVEY=0, scan_state ∉ {DISABLE, BACKOP} (the desync)
- _FW_UNDER_SURVEY=1, scan_state ∈ {DISABLE, BACKOP}
- _FW_UNDER_SURVEY=0, scan_state = DISABLE