core: rtw_xmit: gate scan_state TX-block by _FW_UNDER_SURVEY to prevent permanent TX stall#62

Merged: morrownr merged 1 commit into morrownr:main from breakneck-git:fix-tx-stall-scan-state-desync on Apr 25, 2026

@breakneck-git

Contributor

Summary

Fixes a permanent TX stall caused by a race between the upper-level (mlmepriv._FW_UNDER_SURVEY) and low-level (mlmeextpriv.scan_state) MLME state when a scan times out.

Same root cause and identical patch shape was also submitted to:

Root cause

rtw_scan_timeout_handler() (core/rtw_mlme.c) clears the upper-level _FW_UNDER_SURVEY flag and notifies cfg80211, but does not reset mlmeext->scan_state. The only place that resets scan_state to SCAN_DISABLE in the normal completion path is the SCAN_COMPLETE handler in core/rtw_mlme_ext.c, which a stalled scan never reaches.

After this race, rtw_xmit_ac_blocked() (core/rtw_xmit.c) permanently returns _TRUE for the scan-state check, blocking all TX. RX continues unaffected (no scan gate on the RX path), beacons arrive, the station stays associated — wpa_supplicant reports COMPLETED, signal is fine, but no traffic flows.

wpa_cli reconnect is the user-visible workaround because a full re-assoc goes through init_mlme_ext_priv() which unconditionally resets scan_state to SCAN_DISABLE.
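The sequence described above can be modelled in a few lines. This is a minimal sketch, not driver code: every `sketch_` name is hypothetical, the two layers are reduced to one field each, and normal completion is simplified to resetting both layers at once.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model of the two MLME layers. */
enum sketch_scan_state { SKETCH_SCAN_DISABLE, SKETCH_SCAN_PROCESS };

struct sketch_mlme {
    bool under_survey;                 /* upper layer: stands in for the _FW_UNDER_SURVEY bit */
    enum sketch_scan_state scan_state; /* lower layer: stands in for mlmeext->scan_state */
};

/* A scan begins: both layers mark it as in progress. */
static void sketch_scan_start(struct sketch_mlme *m)
{
    m->under_survey = true;
    m->scan_state = SKETCH_SCAN_PROCESS;
}

/* Normal completion (simplified): both layers return to idle together. */
static void sketch_scan_complete(struct sketch_mlme *m)
{
    m->under_survey = false;
    m->scan_state = SKETCH_SCAN_DISABLE;
}

/* Timeout path as described: only the upper layer is cleared. */
static void sketch_scan_timeout(struct sketch_mlme *m)
{
    m->under_survey = false;
    /* scan_state is deliberately left untouched: this is the desync. */
}

/* Models the unconditional reset that a full re-association performs. */
static void sketch_reinit(struct sketch_mlme *m)
{
    m->scan_state = SKETCH_SCAN_DISABLE;
}

/* True when the upper layer is idle but the lower layer still scans. */
static bool sketch_layers_desynced(const struct sketch_mlme *m)
{
    return !m->under_survey && m->scan_state != SKETCH_SCAN_DISABLE;
}
```

In this model, start followed by timeout leaves `sketch_layers_desynced()` true until `sketch_reinit()` runs, which mirrors why a reconnect clears the stall.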

Fix

Gate the scan-state TX-block check by _FW_UNDER_SURVEY. If the upper layer no longer thinks a scan is in progress, do not block TX even if the low-level scan_state is stuck.

The new condition is strictly a subset of the old one — when both layers agree a scan is in progress, behaviour is unchanged. Defensive: doesn't touch the state machine, doesn't risk HW init inconsistency.
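A minimal sketch of the old and new predicates, assuming simplified stand-in types: the `sketch_` names and the bit value are illustrative and are not taken from the driver headers. Only the shape of the condition follows the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's types. */
enum sketch_scan_state {
    SKETCH_SCAN_DISABLE,
    SKETCH_SCAN_BACKOP,
    SKETCH_SCAN_PROCESS
};

#define SKETCH_UNDER_SURVEY 0x00000800u /* illustrative bit value */

struct sketch_adapter {
    uint32_t fw_state;                 /* upper-layer state bits */
    enum sketch_scan_state scan_state; /* low-level scan state */
};

/* Old check: the low-level scan_state alone decides. */
static bool sketch_scan_blocks_tx_old(const struct sketch_adapter *a)
{
    return a->scan_state != SKETCH_SCAN_DISABLE &&
           a->scan_state != SKETCH_SCAN_BACKOP;
}

/* New check: block only while the upper layer also reports a scan. */
static bool sketch_scan_blocks_tx_new(const struct sketch_adapter *a)
{
    return (a->fw_state & SKETCH_UNDER_SURVEY) != 0 &&
           sketch_scan_blocks_tx_old(a);
}
```

Because the new predicate is the old one ANDed with one extra term, it can only return true in cases where the old one already did.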

Test environment

Tested on the 8821AU sister chip (8821au-20210708 fork). The 8812AU and 8821AU share core/rtw_xmit.c verbatim, so the fix applies identically. If a maintainer of this repo wants to verify on actual 8812AU hardware before merging, that's reasonable — the change is small and the code path identical.

| Layer state | Old behaviour | New behaviour |
|---|---|---|
| _FW_UNDER_SURVEY=1, scan_state ∉ {DISABLE, BACKOP} | block TX | block TX (unchanged) |
| _FW_UNDER_SURVEY=0, scan_state ∉ {DISABLE, BACKOP} (the desync) | block TX (forever) | don't block TX |
| _FW_UNDER_SURVEY=1, scan_state ∈ {DISABLE, BACKOP} | don't block TX | don't block TX (unchanged) |
| _FW_UNDER_SURVEY=0, scan_state = DISABLE | don't block TX | don't block TX (unchanged) |

…nt permanent TX stall

When rtw_scan_timeout_handler() (core/rtw_mlme.c) fires after a stalled
scan, it clears the upper-level _FW_UNDER_SURVEY flag in mlmepriv and
notifies cfg80211, but does NOT reset mlmeext->scan_state. The
low-level scan_state stays in SCAN_PROCESS forever, because the only
path that resets it to SCAN_DISABLE is the SCAN_COMPLETE handler in
core/rtw_mlme_ext.c, which the stalled scan never reaches.

After this race, rtw_xmit_ac_blocked() permanently returns true for
the scan-state check, blocking all TX. RX continues normally (no scan
gate on the RX path), beacons are received, and the station stays
associated -- wpa_supplicant reports COMPLETED, signal is fine, but no
traffic actually flows.

User-visible recovery is `wpa_cli reconnect`: a full re-assoc goes
through init_mlme_ext_priv() which unconditionally resets scan_state
to SCAN_DISABLE.

Fix: gate the scan_state TX-block check by _FW_UNDER_SURVEY. If the
upper layer no longer thinks a scan is in progress, do not block TX
even if the low-level scan_state is stuck. The new condition is
strictly a subset of the old one -- when both layers agree a scan is
in progress, behaviour is unchanged. Defensive: doesn't touch the
state machine, doesn't risk HW init inconsistency.

Tested on Realtek 8821AU (USB ID 0bda:0811) on Raspberry Pi 2B,
kernel 6.12.47, against an associated 2.4 GHz AP with periodic
NetworkManager-driven scans. Before the patch: random permanent TX
stalls after hours of uptime (RX continues, association held).
After: TX continues to flow even when scan_state desyncs from the
upper layer.
@morrownr morrownr merged commit 1be3d39 into morrownr:main Apr 25, 2026
@morrownr

Owner

Hi @breakneck-git

Thanks for the PR. Would you be interested in taking over maintenance of this and the rtl8821au repos? I have more projects going than I have time for here at my GitHub site, and it would be cool for these old drivers to continue to get some attention as they are good for specific use cases.

@morrownr

@breakneck-git

breakneck-git commented Apr 26, 2026

Contributor Author

Hi @morrownr,

Thanks so much for the trust and the offer — it genuinely means a lot, especially given what these drivers have meant for the 8821AU/8812AU community over the years.

I have to decline, though, and I want to be honest about why so the offer goes to someone who can do it justice:

  • I'm not a programmer by trade. This patch was produced and reasoned out by an LLM (Claude) acting as a coding assistant — I didn't write or read the code myself. My role was directing it: telling it what to investigate, providing context about my setup, and making it re-check its own work (re-reading code, cross-referencing both forks, verifying each claim against the actual source) before letting it open a PR. I do make a real effort to keep it from doing anything stupid, but I'm under no illusion that being a careful operator of an AI makes me qualified to maintain a kernel driver.
  • My own programming skill is low. I wouldn't be able to review incoming PRs at the depth that the project or its users deserve.
  • I have exactly one test device — a Realtek RTL8821AU dongle (USB ID 0bda:0811) on a Raspberry Pi 2B running Raspberry Pi OS, kernel 6.12. Anything outside that path — 8812AU, AP mode, monitor, P2P, mesh, MCC, ARM64, x86, other kernels — I genuinely can't validate.

What I can realistically commit to is continuing what's already started. The same LLM analysis surfaced a few more concrete bugs in the same defensive shape as this PR — a TX queue counter leak on USB submit failure, VLA + unbounded copy_from_user in ioctl_mp.c reachable under CAP_NET_ADMIN, and a couple of state-desync issues structurally identical to the scan_state one. I'll prepare PRs for those over the next few days, small and well-described, so they're easy for you to evaluate. Beyond that I don't think I'd be a good steward of the repo.

Thanks again for the offer and for keeping these drivers usable for as long as you have. The 8821AU is still the only working option for a lot of us — me included.

@breakneck-git

@morrownr

Owner

@breakneck-git

I am not a career programmer either. I took classes in FORTRAN and COBOL in college back in the dark ages because they were required for my major, economics. I have taught myself several languages over the years but I am far from a wireless programmer. I have learned a lot but I leave the hard stuff to the ones who are good at it. My point is that you can contribute greatly and learn something along the way.

Let me point you to a different repo where you might contribute and learn things:

https://github.com/lwfinger/rtw88

There were 3 of us who started a project to modernize and upstream Realtek USB WiFi 5 support to the Linux kernel in early 2024: Larry Finger, who passed away in May of 2024, Bitterblue Smith (AKA Dubhater) and me. All of the drivers are now in the kernel, but testing is always needed, and it is a fully mac80211 repo, which means you would be testing and working on the modern Linux wireless standard, something that will work for a LONG time.

I hope to see you over in that repo.

@breakneck-git

Contributor Author

Hi @morrownr,

Thanks for the pointer — I'll definitely give it a try. The supported list in lwfinger/rtw88 includes RTL8821AU (USB ID 0bda:0811), which is exactly the chip I have, so it's a relevant test case.

Plan: collect a baseline on the current rtl8812au DKMS driver (RPi 2B, 2.4 GHz home AP, kernel 6.12.47) — TCP/UDP throughput via iperf3, idle latency over 30 min, RX drop rate, signal stability, scan-induced disruption. That's actually running on the box right now. Then install lwfinger/rtw88 alongside, blacklist 88XXau, switch to rtw88_usb, and re-run the same script under matched conditions (same AP, same physical position, similar time-of-day). Diff the numbers, write up a comparison report, and file whatever I find — both wins and rough edges — in lwfinger/rtw88.

Realistically what I'll provide is a data point on the 8821AU USB path on low-end ARMv7 with a real-world STA load profile against a typical 2.4 GHz home AP. Not deep kernel work, but should at least surface "works"/"doesn't work" specifics for that chip on that distro, which I gather is part of what testers can usefully add.

— breakneck-git

@morrownr

Owner

@breakneck-git

> The supported list in lwfinger/rtw88 includes RTL8821AU (USB ID 0bda:0811), which is exactly the chip I have, so it's a relevant test case.

lwfinger/rtw88 is not just any repo. The drivers at that repo are now in the Linux kernel. rtl8812au and rtl8821/11au have been in the Linux kernel since kernel 6.14. Good testing and reporting is still needed so find what you can and report it.

You will also find two more repos here at this site called rtw89 and mt76 which are similar to rtw88 but handle WiFi 6 and 7 adapters.

The Main Menu here at this site:

https://github.com/morrownr/USB-WiFi

You might find it to be interesting.

@morrownr

@breakneck-git

Contributor Author

@morrownr

Thanks — really appreciate the broader context.

Heads-up: the patch I sent in this PR (#62) actually introduces a build break on master of this fork. The check_fwstate(..., _FW_UNDER_SURVEY) identifier doesn't exist here — it only lives as an alias in the aircrack-ng/rtl8812au tree (where it macros to WIFI_SITE_MONITOR). On your forks the canonical name is WIFI_UNDER_SURVEY. Honest mistake on my side: I had the aircrack-ng repo cloned in parallel and the LLM that drafted the patch picked the wrong symbol from there.

The minimal fix is up in #65 (and the same rename for the 8821au sibling fork is in morrownr/8821au-20210708#201) — pure mechanical s/_FW_UNDER_SURVEY/WIFI_UNDER_SURVEY/, no logic change. On the 8821au side CI confirms it builds clean across gcc-10/11/12 (the only red check there is Codespell on pre-existing typos in README.md / FAQ.md, unrelated to the change). Could you take a quick look so master is unblocked on both forks? My follow-up patches (#63, #64 here; #199, #200 on the sibling) sit on top and need this in first.

On the rtw88 side: a first compatibility / comparison write-up for RTL8821AU (USB 0bda:0811) is already up — lwfinger/rtw88#443. Short version: works out-of-the-box on RPi 2B, kernel 6.12; RX-drop ~72% lower than the DKMS driver, idle latency ~49% better once power-save is forced off (mac80211 default PS=on was the dominant idle-latency cost — that surprised me).

Thanks for pointing me at rtw89, mt76 and the USB-WiFi hub. From a quick scan: rtw89 covers Realtek Wi-Fi 6/6E/7 chips (PCIe), and mt76 is MediaTek-only — neither matches my current 8821AU adapter, but the USB-WiFi Plug-and-Play list is exactly what I'll consult before the next purchase. Genuinely useful — thanks.

@mkreisl

mkreisl commented May 2, 2026


That’s all well and good.
In Kernel 6.12.85 (Raspberry Pi Kernel Source), neither _FW_UNDER_SURVEY nor WIFI_SITE_MONITOR is defined

First, an

echo "diff --git a/core/rtw_xmit.c b/core/rtw_xmit.c
index 1e061d5.. 7a4882c 100644
--- a/core/rtw_xmit.c
+++ b/core/rtw_xmit.c
@@ -17,6 +17,10 @@
 #include <drv_types.h>
 #include <hal_data.h>

+#if !defined (_FW_UNDER_SURVEY)
+#define _FW_UNDER_SURVEY 0x00000800
+#endif
+
 static u8 P802_1H_OUI[P80211_OUI_LEN] = { 0x00, 0x00, 0xf8 };
 static u8 RFC1042_OUI[P80211_OUI_LEN] = { 0x00, 0x00, 0x00 };


" | patch -p1 -F3 || :

works for me

Even in kernel source 7.0, neither define exists. No idea where they come from.

Found the construct instead:

drivers/net/wireless/realtek/rtl8192cu/include/rtw_mlme.h:#define	WIFI_SITE_MONITOR			0x00000800		//to indicate the station is under site surveying
drivers/net/wireless/realtek/rtl8192cu/include/rtw_mlme.h:#define _FW_UNDER_SURVEY	WIFI_SITE_MONITOR
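
The shim above hard-codes the bit value. A slightly more defensive variant, sketched here, aliases the missing name to whichever survey flag the tree does define and fails the build otherwise. The stand-in definition of `WIFI_UNDER_SURVEY` (with the 0x00000800 value quoted above, which is an assumption for this tree) and the helper `sketch_check_fwstate` are hypothetical and exist only so the snippet compiles on its own; in the driver they would come from its headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the driver header (assumption): this tree defines only
 * WIFI_UNDER_SURVEY. The value is assumed, copied from the define
 * quoted in the thread, not verified against this tree. */
#define WIFI_UNDER_SURVEY 0x00000800u

/* Compat alias: map the legacy name onto a name the tree provides. */
#if !defined(_FW_UNDER_SURVEY)
#  if defined(WIFI_UNDER_SURVEY)
#    define _FW_UNDER_SURVEY WIFI_UNDER_SURVEY
#  elif defined(WIFI_SITE_MONITOR)
#    define _FW_UNDER_SURVEY WIFI_SITE_MONITOR
#  else
#    error "no survey-in-progress flag found; refusing to guess a bit value"
#  endif
#endif

/* Hypothetical simplified check: true if every bit in 'flag' is set. */
static bool sketch_check_fwstate(uint32_t fw_state, uint32_t flag)
{
    return (fw_state & flag) == flag;
}
```

Aliasing to an existing symbol keeps the shim correct if the header ever changes the bit, whereas a literal value would silently test the wrong flag.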

EmbeddedAndroid added a commit to EmbeddedAndroid/8812au-20210820 that referenced this pull request May 14, 2026
Reset embeddedandroid branch to upstream commit 9704072 (Sep 2025),
the last good revision before upstream PR morrownr#62 introduced an
undefined _FW_UNDER_SURVEY identifier. Re-apply the kernel 6.16+
compat patches on top:
- Makefile: EXTRA_CFLAGS -> ccflags-y, $(src) -> $(M)
- include/osdep_service_linux.h: del_timer_sync / from_timer
  compat shim for kernel 6.16+
- os_dep/linux/ioctl_cfg80211.c: radio_idx in set_wiphy_params /
  set_tx_power / get_tx_power callbacks on kernel >= 6.17
@fishhf

fishhf commented May 14, 2026


It's broken at commit 1be3d39 and 9e91355.

make ARCH=arm64 CROSS_COMPILE= -C /lib/modules/6.12.25+rpt-rpi-2712/build M=/home/fish/opt/8812au-20210820  modules
make[1]: Entering directory '/usr/src/linux-headers-6.12.25+rpt-rpi-2712'
  CC [M]  /home/fish/opt/8812au-20210820/core/rtw_xmit.o
/home/fish/opt/8812au-20210820/core/rtw_xmit.c: In function ‘rtw_xmit_ac_blocked’:
/home/fish/opt/8812au-20210820/core/rtw_xmit.c:6305:41: error: ‘_FW_UNDER_SURVEY’ undeclared (first use in this function); did you mean ‘WIFI_UNDER_SURVEY’?
 6305 |                 if (check_fwstate(mlme, _FW_UNDER_SURVEY)
      |                                         ^~~~~~~~~~~~~~~~
      |                                         WIFI_UNDER_SURVEY

I think this needs to be reverted or needs to be fixed as it's currently not compilable.
I will be running 9704072 instead.
@breakneck-git Although you aren't a programmer, did you verify the code changes made by the LLM to ensure it's doing what it claims it is doing? Because your change is being merged into different downstream projects.

@morrownr

Owner

I am sorry that there is a problem. If you guys can come to a conclusion and one of you submits a PR to fix the problem, I will merge it.

My attention is entirely on other repos having to do with modern, mac80211, drivers and information.

@briantbutton

Collaborator

I agree with @fishhf. It should be reverted. It hasn't had NEARLY enough testing. This was not tested on actual hardware, nor was it, apparently, even compiled.

The fact that the commit summary liberally refers to the wrong variable or a non-existent one terrifies me.

@morrownr

Owner

I have reverted the patch that appears to have been the problem. I could use confirmation.

Something that is not clear to me is why you have not moved on from this old, not-very-good, driver to the new modern, mac80211 driver that is in kernel 6.14+ and is available to compile/install for older kernels? It is a superior driver compared to the one in this repo and using it would let those who use it become familiar with current Linux wireless programming standards.

Here is the repo where the new driver can be installed from if you are on a kernel older than 6.14:

https://github.com/lwfinger/rtw88

@fishhf

fishhf commented May 15, 2026


> Something that is not clear to me is why you have not moved on from this old, not-very-good, driver to the new modern, mac80211 driver that is in kernel 6.14+ and is available to compile/install for older kernels? It is a superior driver compared to the one in this repo and using it would let those who use it become familiar with current Linux wireless programming standards.

The whatever Debian in my Raspberry Pi 5 doesn't have the 6.14+ kernel yet. Building this would be simpler; in fact I was looking for the old repo https://github.com/morrownr/8812au-20210629 with a specific commit 8eb3e30b2f2e29fcbe12bbc171e95180bb966cc0 that was stable for me for years. I also have other projects going on, and I just need to get this particular wifi dongle up and running with the least effort.

@morrownr

Owner

@fishhf

> The whatever Debian in my raspberry pi 5 doesn't have the 6.14+ kernel yet.

I run Debian 13 on the system I am on right now. It is currently running 7.0.4+deb13-amd64. 6.18 is available also. I don't run pure Debian on my Pi4B but the RasPiOS flowed out kernel 6.18 earlier this week.

I'm not trying to push you or anyone else to do something you don't want to do. However, I need you to know that my priorities are with modern versions of the drivers. If you do want to stay with kernel 6.12, the following repo can be installed and it contains the modern driver that is in kernel 6.14+:

https://github.com/lwfinger/rtw88

Remember to run the following to ensure this driver is fully removed:

sudo sh remove-driver.sh

The new driver is much better than this one, especially if you do monitor mode and/or do things that require some capabilities that are not available in this driver.

Cheers

@briantbutton

Collaborator

I can answer with regard to us. We have a system combination that works very well. Raspberry Pi, USB/WiFi adapter, Ubuntu, NodeJS. It's been frozen. Qualifying a new system is the last thing we wanna do.

So when you say "better", it does not mean much stand alone. Does it support more users? New bands? Sync faster? Prolly not. What we DO do is occasionally look at USB/WiFi adapters to see if there is a more performant model with comparable pricing - a significant increase in range would be tempting. That might drive us to a new driver.

Your driver repos are lovely; I am grateful for them. It would be nice if they stayed where they are but I have offline copies of what we need. You have done a great service to the industry.

@cybersnow02


@fishhf @briantbutton @EmbeddedAndroid If you need this specific driver (8812au) instead of the RTW88 for newer kernel (6.14 to 6.19 and new 7.0 branch), you can use the following repo: https://github.com/cybersnow02/8812au-20210820

@morrownr

Owner

Brian,

> It's been frozen. Qualifying a new system is the last thing we wanna do.

I understand this and I have worked with a few organizations on projects probably similar to yours. I'm actually not an IT guy, I just play one on TV. My undergrad and grad degrees are in economics.

> So when you say "better", it does not mean much stand alone.

That is correct. I'll elaborate.

I can sum up my use of the word "better" with two words: supportability and flexibility. Stuff happens. What if you need a mod to the rtl8812au driver here in this repo? Getting any help from Realtek for these out-of-kernel drivers is really not going to happen unless you are buying chips in lots of 10k from Realtek. On the other hand, the modern mac80211 drivers that are handled in-kernel give all of us access to information that allows us to work almost any problem.

> What we DO do is occasionally look at USB/WiFi adapters to see if there is a more performant model with comparable pricing - a significant increase in range would be tempting.

I'm aware of two new adapters that should be on the market soon and should have really good range. There are a couple of good places to post your requirements in issues and get my attention:

https://github.com/morrownr/USB-WiFi

https://github.com/morrownr/mt76

Linux wireless is much different than it was even 5 years ago. For USB WiFi, we used to struggle with having drivers. That is a fading memory. We still have two primary players in USB WiFi: Mediatek and Realtek. If you consider the chips from these two companies and modern mac80211 driver availability in the Linux kernel, we are missing only one driver: rtl8922au. And it is available and should go into the kernel soon. Work on it goes on in the following repo:

https://github.com/morrownr/rtw89

Let me repeat that: We are one driver away from ALL WiFi 5, 6 and 7 USB chips having modern drivers in-kernel.

Let me point you to a product that is available:

https://www.amazon.com/BrosTrend-AXE3000-Linux-WiFi-Adapter/dp/B0F6MY7H62

Note that this Brostrend adapter advertises as being for Linux. Notice that the ad says nothing about Windows. I have the box that mine came in. The front of the box says Linux, no mention of Windows. Read the reviews. It seems the only bad reviews are from Windows users that had to go download a driver. I guess the Windows users can't read.

More "made for Linux" adapters are on the way.
