[BUG REPORT] Flint 3 (GL-BE9300) – System Freeze (no Watchdog Reset) triggered by hostapd EAPOL Retry-Loop & DMS Conflict

Maeddes · February 26, 2026, 4:30pm

Product: GL.iNet Flint 3 (GL-BE9300)
Firmware: Beta (kernel 5.4.213, IPQ5332 SoC)
Severity: High – complete system freeze, web interface and SSH unreachable, requires a manual power cycle (pulling the power plug)
Date: 2026-02-26

Summary

A temporary SSID conflict (a secondary AP briefly broadcasting the same SSID, for only 80 seconds) triggers an uncontrolled EAPOL 4-Way Handshake retry-loop in hostapd. This leads to resource exhaustion (memory/CPU) and, after about 25 minutes without new syslog entries, a complete system freeze. The web interface and SSH become unreachable; the router does not recover on its own. A manual power cycle (pulling the power plug) is required. No kernel panic, no watchdog reset, no automatic recovery: the hardware watchdog either did not fire or is not configured.

The issue is a combination of two bugs that reinforce each other (the resulting unrecoverable freeze is described as Bug 3 below):

The QCA driver accepts a DMS-incompatible client into a "Zombie-Association" state instead of rejecting it cleanly.
hostapd has no retry-limit or backoff for failed 4-Way Handshakes, causing an infinite loop that exhausts system resources.

Environment


Router	GL.iNet Flint 3, GL-BE9300
Kernel	Linux 5.4.213, ARM64, IPQ5332 SoC (4 cores)
RAM	881 MB usable
WiFi chipsets	QCN9224 (5/6 GHz), QCA5332 (2.4 GHz)
Active services	ECM, NSS offload, Netify Agent 5.1.24, NordVPN WireGuard, ZeroTier, Samba 4.18.8
Trigger	AVM FRITZ!Box 7590, same SSID briefly activated via hardware button (80 s)
Uptime at crash	~68 hours (booted 2026-02-23 ~20:32)

Precise Timeline

Both log sources (Flint 3 syslog + FRITZ!Box event log) correlated by wall clock:

16:10–16:16   Clients da:66:ee:d7:f3:93 and 62:de:21:52:e4:ec begin roaming between bands
16:16:29      kernel: ieee80211_recv_asreq – "assoc req from dms not-supported sta" (first occurrence)
16:16:36–48   ⚠ EAPOL loop starts on wlan01: 1/4 → 2/4 → 1/4 → 2/4 → ... never reaches 3/4
16:19:23      ⚡ FRITZ!Box WLAN ON  (2.4+5 GHz, SSID "elias")  ← from FRITZ!Box event log
16:20:43      ⚡ FRITZ!Box WLAN OFF                             ← from FRITZ!Box event log
              [80 seconds of SSID conflict were sufficient to trigger the crash]
16:23:42      ⚠ Last normal syslog entry – syslogd can no longer write (resource exhaustion)
16:23–16:49   ████ 25-minute syslog silence – kernel IRQs still alive, userspace fully frozen
16:49:11      kernel: [245824s] lan1 link down  ← last kernel-level event
16:49:15      kernel: [245828s] lan1 link up, speed 2500  ← absolute last log entry
~16:49        ❌ SYSTEM FROZEN – web interface unreachable, SSH unreachable
              No watchdog reset, no automatic recovery – manual power cycle required
16:33:26      Router boots after manual power cycle (fresh boot, kernel uptime [0s])
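The bracketed kernel timestamps are seconds since boot. A small POSIX shell helper (fmt_uptime is a hypothetical name, not a firmware command) converts them to hours, minutes and seconds, which cross-checks the ~68-hour uptime given in the environment table:

```shell
# Convert a kernel log timestamp (seconds since boot) into hours,
# minutes and seconds. fmt_uptime is a hypothetical helper, not a
# command shipped with the firmware.
fmt_uptime() {
    secs=$1
    printf '%dh%02dm%02ds\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

fmt_uptime 245828   # timestamp of the absolute last log entry; prints 68h17m08s
```

Subtracting 68 h 17 min 8 s from 16:49:15 on 2026-02-26 gives a boot time of about 20:32 on 2026-02-23, consistent with the table.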

Technical Root Cause Analysis

Bug 1 – DMS Incompatibility creates a "Zombie-Association"

The QCA driver identifies a DMS (Directed Multicast Service) request from the STA but does not reject the association with Status Code 10 (Cannot support all requested capabilities). Instead, it logs the error and proceeds, leaving the connection in an unstable state:

kernel: [243869] wlan: [0:E:ANY] ieee80211_recv_asreq: assoc req from dms not-supported sta :62:de:21:52:e4:ec
kernel: [243862] wlan: [0:E:ANY] ieee80211_recv_asreq: assoc req from dms not-supported sta :da:66:ee:d7:f3:93
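To check whether a given setup hits this condition (step 1 of Steps to Reproduce below), the warnings can be counted per client. This is a sketch that assumes the exact message format shown above; count_dms_unsupported is a hypothetical helper name:

```shell
# Count "dms not-supported" association warnings per client MAC.
# Reads kernel log text (from dmesg or logread) on stdin and prints
# "<mac> <count>" lines sorted by MAC. Assumes the message format above.
count_dms_unsupported() {
    sed -n 's/.*assoc req from dms not-supported sta :*\([0-9A-Fa-f][0-9A-Fa-f:]*\).*/\1/p' |
        awk '{ n[$1]++ } END { for (mac in n) print mac, n[mac] }' |
        sort
}
```

On the router, either log source can be piped in, e.g. `dmesg | count_dms_unsupported` or `logread | count_dms_unsupported`.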

The client is now "associated" in a state the driver itself considers invalid – a Zombie-Association that feeds directly into the EAPOL loop below.

Bug 2 – hostapd EAPOL Infinite Retry (no retry-limit, no backoff)

hostapd enters an infinite loop of Message 1/4 → 2/4 without a retry counter, exponential backoff, or temporary client ban. Each retry holds kernel PTK negotiation state:

16:16:36  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: sending 1/4 msg of 4-Way Handshake
16:16:36  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: received EAPOL-Key frame (2/4 Pairwise)
16:16:37  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: sending 1/4 msg of 4-Way Handshake  ← retry, no 3/4
16:16:37  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: received EAPOL-Key frame (2/4 Pairwise)
16:16:38  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: sending 1/4 msg of 4-Way Handshake  ← retry
16:16:38  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: received EAPOL-Key frame (2/4 Pairwise)
16:16:39  hostapd: wlan01: STA 62:de:21:52:e4:ec IEEE 802.11: disassociated
           ... client re-associates immediately, entire loop repeats ...

In parallel, a second client (da:66:ee:d7:f3:93) loops without even sending a 2/4 reply, so hostapd sends msg 1/4 every second without any response.
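The pattern described above (repeated msg 1/4 with no msg 3/4 for the same STA) can be detected in an exported log with a short awk sketch. The function name find_eapol_loops and the default threshold of 5 (borrowed from the example value in Suggested Fix 1) are assumptions, and it only matches the hostapd message wording shown in the excerpt:

```shell
# Report STAs that were sent msg 1/4 at least $1 times (default 5) but
# never msg 3/4, i.e. the 1/4 -> 2/4 -> 1/4 pattern from the excerpt.
# Reads hostapd log lines on stdin; prints "<mac> <count of msg 1/4>".
find_eapol_loops() {
    awk -v limit="${1:-5}" '
        {
            if ($0 ~ /sending 1\/4 msg of 4-Way Handshake/) kind = 1
            else if ($0 ~ /sending 3\/4 msg of 4-Way Handshake/) kind = 3
            else next
            # The MAC address is the field right after "STA".
            sta = ""
            for (i = 1; i < NF; i++)
                if ($i == "STA") { sta = $(i + 1); break }
            if (sta == "") next
            if (kind == 1) m1[sta]++
            else m3[sta]++
        }
        END {
            for (s in m1)
                if (m1[s] >= limit + 0 && !(s in m3))
                    print s, m1[s]
        }' | sort
}
```

For example, `logread | find_eapol_loops 5` prints one line per affected STA. It only reports; it does not disconnect or block anything.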

Bug 3 – Complete System Freeze, no recovery mechanism

Continuous EAPOL retries, roughly once per second from two STAs, accumulate kernel PTK state. After ~7 minutes of looping (16:16:36 to 16:23:42), syslogd stops recording entirely; the only later entries are two kernel link events about 25 minutes afterwards, indicating a full userspace hang. Neither the OOM killer, nor procd, nor the hardware watchdog triggered any recovery. The web interface and SSH both became unreachable, and the only resolution was pulling the power plug manually.

16:23:42  hostapd: wlan02: STA da:66:ee:d7:f3:93 IEEE 802.11: disassociated  ← LAST SYSLOG ENTRY
          [25 minutes of complete silence]
16:49:11  kernel: [245824s] rtl8372-mdio: lan1 link down    ← hardware IRQ still functional
16:49:15  kernel: [245828s] rtl8372-mdio: lan1 link up, speed 2500
          << no kernel panic, no OOM killer, no watchdog reset >>
          << web interface unreachable, SSH unreachable >>
          << router did NOT recover – manual power cycle required >>
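A silence like the one above is easy to miss when scrolling a long log. The following sketch (find_log_gaps is a hypothetical helper; the 300-second default is an arbitrary choice) reports gaps between consecutive entries, using the first HH:MM:SS token on each line so that it works on both the excerpt format here and on logread output. It does not handle midnight rollover:

```shell
# Print gaps longer than $1 seconds (default 300) between consecutive log
# lines. The first HH:MM:SS token on each line is used as its timestamp;
# lines without one are skipped. Midnight rollover is not handled.
find_log_gaps() {
    awk -v limit="${1:-300}" '
        {
            t = -1
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/) {
                    split($i, p, ":")
                    t = p[1] * 3600 + p[2] * 60 + p[3]
                    stamp = $i
                    break
                }
            if (t < 0) next
            if (prevstamp != "" && t - prev > limit + 0)
                printf "%s -> %s: %d s silent\n", prevstamp, stamp, t - prev
            prev = t
            prevstamp = stamp
        }'
}
```

On the three timestamped lines above it reports 16:23:42 -> 16:49:11 as 1529 s, i.e. the ~25-minute silence.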

Steps to Reproduce

Set up the GL-BE9300 with a specific SSID and at least two WiFi clients that trigger the DMS error on association (look for "ieee80211_recv_asreq: assoc req from dms not-supported sta" in the logs).
Briefly activate a second AP nearby broadcasting the same SSID and security settings (even 80 seconds is sufficient).
Trigger client roaming/association attempts between the two APs.
Monitor hostapd logs for continuous 1/4 → 2/4 cycles.
After ~25 minutes without new syslog entries, the system freezes completely: web interface and SSH are unreachable, and a manual power cycle is required.

Expected vs. Actual Behaviour

Aspect	Expected	Actual
DMS-unsupported client	Rejected at association with Status Code 10 (Cannot support all requested capabilities), client not accepted	Accepted with error logged, Zombie-Association state feeds EAPOL loop
EAPOL failure handling	Client banned after N retries, kernel state freed	Infinite retry loop, state accumulates until freeze
Resource exhaustion	OOM killer or procd respawn of hostapd, or watchdog reset as last resort	Complete silent freeze, no OOM, no respawn, no watchdog – requires manual power cycle

Suggested Fixes

1. Implement EAPOL retry-limit in hostapd
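Count per-client 1/4 retransmissions. After N (e.g. 5) consecutive failures, call ap_sta_disconnect() and add the MAC to a temporary deny-list (e.g. 60 s). Note that wpa_ptk_rekey (e.g. wpa_ptk_rekey=60) only sets the PTK lifetime for periodic rekeying and does not cap handshake retries; the closest existing hostapd.conf knob is wpa_pairwise_update_count (retries of 4-Way Handshake messages 1/4 and 3/4, default 4).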
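To see which EAPOL-related settings the generated hostapd configs actually contain, a grep helper is enough. It assumes the /var/run/hostapd-*.conf paths used by the debug script later in this thread and upstream hostapd option names (whether this QSDK build honours them is not confirmed); show_eapol_knobs is a hypothetical name. No output means the option is not set and hostapd's built-in default applies:

```shell
# Show EAPOL-related options in generated hostapd configs (upstream names):
#   wpa_ptk_rekey             - PTK lifetime in seconds (periodic rekey),
#                               not a cap on handshake retries
#   wpa_pairwise_update_count - retries of 4-Way Handshake msg 1/4 and 3/4
#                               (upstream default: 4)
show_eapol_knobs() {
    grep -H -E '^(wpa_ptk_rekey|wpa_pairwise_update_count)=' "$@"
}
```

On the router: `show_eapol_knobs /var/run/hostapd-*.conf`.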

2. Strict DMS handling – prevent Zombie-Association
Ensure the QCA driver/hostapd rejects DMS-requesting STAs with Status Code 10 (Cannot support all requested capabilities) at association if the capability is unsupported, instead of logging and proceeding into an invalid state.

3. procd monitoring of hostapd
Ensure hostapd is correctly monitored by procd with a respawn limit, so a stuck instance is restarted before a full system freeze occurs.

4. Verify hardware watchdog is active and correctly configured
In this incident the hardware watchdog did not fire – the system froze indefinitely without any automatic recovery. The watchdog should be a last-resort safety net. Please verify that the watchdog daemon is running and that the kick interval is correctly configured for the IPQ5332 platform.
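On OpenWrt-based firmware, procd normally feeds the hardware watchdog and exposes its state via `ubus call system watchdog`. Whether the Flint 3 firmware uses this path on IPQ5332 (rather than a kernel-side mechanism) is an assumption to verify; the sketch below only checks the reported status, and watchdog_running is a hypothetical helper name:

```shell
# Succeed if the JSON printed by `ubus call system watchdog` (read on
# stdin) reports the watchdog status as "running".
watchdog_running() {
    grep -q '"status": *"running"'
}

# Only runs where OpenWrt's ubus is available; does nothing elsewhere.
if command -v ubus >/dev/null 2>&1; then
    if ubus call system watchdog | watchdog_running; then
        echo "watchdog: running"
    else
        echo "watchdog: NOT running"
    fi
fi
```

Running `ubus call system watchdog` by itself also shows the configured timeout and frequency (the kick interval).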

will.qiu · February 28, 2026, 8:13am

Hi

Could you please clarify the following:

Can this issue be reproduced consistently?
What firmware version is your Flint 3 currently running? You can find it under System → Upgrade.
Please export the full device logs and send them to us via private message so we can analyze them further.

Also, please try the following to see if it helps:

Try setting the 2.4 GHz Wi-Fi mode to 802.11n/g, since some IoT devices may not work well with ax/be modes.

uci set wireless.wifi0.hwmode='11ng'
uci commit wireless

/etc/init.d/network restart

Try disabling 802.11v on the corresponding SSID, as the log indicates that the issue appears to be related to roaming.

# On Main SSID 
uci set wireless.wifi2g.bss_transition='0'

# On Guest SSID
uci set wireless.guest2g.bss_transition='0'

# On MLO SSID
uci set wireless.wlanmld2g.bss_transition='0'

uci commit wireless
/etc/init.d/network restart

Maeddes · March 6, 2026, 6:45am

Final Update: Workaround confirmed — FRITZ!Box SSID conflict test passed

After confirming the bss_transition=0 fix resolves the POCO and Lenovo issues, we ran the original freeze-trigger test: activating the FRITZ!Box 7530 with the same SSID ("elias") while clients were actively roaming between bands.

Result: No EAPOL loop, no freeze, router remained fully stable throughout.

Log shows clean roaming under SSID conflict conditions — including rapid band switching within the same second that previously triggered the crash:

07:38:37  wlan1: STA 46:39:.. pairwise key handshake completed ✓
07:38:37  wlan0: STA 46:39:.. disassociated → authenticated
07:38:38  wlan0: STA 46:39:.. pairwise key handshake completed ✓

No dms not-supported, no EAPOL retry loop, no syslog silence, no freeze. Web interface and SSH remained reachable throughout.

The complete workaround — apply to all six interfaces

uci set wireless.wifi2g.bss_transition='0'
uci set wireless.wlanmld2g.bss_transition='0'
uci set wireless.wifi5g.bss_transition='0'
uci set wireless.wlanmld5g.bss_transition='0'
uci set wireless.wifi6g.bss_transition='0'
uci set wireless.wlanmld6g.bss_transition='0'
uci commit wireless
wifi reload
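The same six commands can be wrapped in a loop so that no section is forgotten. The section names are the ones from this post; disable_bss_transition is a hypothetical helper, and the early stop assumes `uci set` fails for a section that does not exist:

```shell
# The six wifi-iface sections named in the workaround (check yours with
# `uci show wireless`; names may differ on other setups).
BSS_TRANSITION_SECTIONS="wifi2g wlanmld2g wifi5g wlanmld5g wifi6g wlanmld6g"

# Set bss_transition=0 on every section, stopping at the first failure,
# then commit once so all six change together.
disable_bss_transition() {
    for s in $BSS_TRANSITION_SECTIONS; do
        uci set "wireless.$s.bss_transition=0" || return 1
    done
    uci commit wireless
}
```

Then run `disable_bss_transition && wifi reload`.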

Important: The fix must be applied to all six interfaces simultaneously. Setting only the 2.4 GHz interfaces caused a secondary issue: the Lenovo Tab P11 Pro (a non-MLO device) was authenticated on wlan12 (the MLO 5 GHz link) but immediately disassociated; this repeated three times before the tablet gave up entirely. With inconsistent bss_transition values across the MLO bundle, non-MLO clients could not associate on any interface at all. Setting all six interfaces consistently resolved both the POCO EAPOL loop and the Lenovo association failure, while preserving full MLO functionality on 5/6 GHz.
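To confirm that the six sections really ended up consistent, the output of `uci show wireless` can be summarised. This sketch assumes the section names above and uci's usual `config.section.option='value'` output format; report_bss_transition is a hypothetical name:

```shell
# Summarise bss_transition for the six sections from `uci show wireless`
# (read on stdin, lines like wireless.wifi2g.bss_transition='0').
# Prints "<section> <value>", or "<section> unset" if the option is absent.
report_bss_transition() {
    awk -F '[.=]' -v q="'" \
        -v want="wifi2g wlanmld2g wifi5g wlanmld5g wifi6g wlanmld6g" '
        $1 == "wireless" && $3 == "bss_transition" {
            v = $4
            gsub(q, "", v)          # strip the single quotes uci adds
            val[$2] = v
        }
        END {
            n = split(want, s, " ")
            for (i = 1; i <= n; i++) {
                v = (s[i] in val) ? val[s[i]] : "unset"
                print s[i], v
            }
        }'
}
```

Run it as `uci show wireless | report_bss_transition`; after the workaround every line should end in 0.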

One final remark on the freeze behavior itself

Even though the workaround now prevents the freeze, the original failure mode remains concerning and should be addressed independently. When the EAPOL loop occurred, the router entered a complete silent freeze — web interface unreachable, SSH unreachable, no OOM killer, no procd respawn, no watchdog reset. The only recovery was physically unplugging the power cable.

For a router, having to pull the power cable as the only recovery option is not acceptable. A router should never require manual intervention to recover from a software-level fault. I would strongly recommend:

Hardware watchdog: Verify that the watchdog is correctly configured and actively kicking on the IPQ5332 platform — in this incident it did not fire despite a complete userspace freeze
procd monitoring: Ensure procd properly monitors and respawns hostapd if it becomes unresponsive
EAPOL retry limit: Consider a maximum retry limit in hostapd to prevent unbounded EAPOL loops from consuming kernel state in the first place

The workaround addresses the trigger. The resilience gap remains open.

will.qiu · March 19, 2026, 8:51am

We need to collect additional logs and configuration details at the time the issue occurs so we can work with Qualcomm to diagnose and troubleshoot it.

Could you please help by following the steps below?

Configure the BE9300 to a state where the issue can be reproduced. After this, no further configuration changes should be needed for the following steps.
Run the commands below to place the relevant configurations and logs into a specified directory.
(Since you mentioned that a reboot is required to regain access to the device, the commands below store the data in flash under /root so that it survives the reboot. If storage space is limited, please insert a USB drive and adjust the directory accordingly.)

mkdir -p /root/wifi_debug
cp /etc/config/wireless /root/wifi_debug/ 2>/dev/null
cp /etc/config/network  /root/wifi_debug/ 2>/dev/null
cp /etc/config/firewall /root/wifi_debug/ 2>/dev/null
cp /etc/config/dhcp     /root/wifi_debug/ 2>/dev/null
cp /etc/config/system   /root/wifi_debug/ 2>/dev/null
cp /var/run/hostapd-*.conf /root/wifi_debug/ 2>/dev/null
cp /tmp/run/hostapd-*.conf /root/wifi_debug/ 2>/dev/null
cp /var/run/wpa_supplicant-*.conf /root/wifi_debug/ 2>/dev/null
cp /tmp/run/wpa_supplicant-*.conf /root/wifi_debug/ 2>/dev/null
logread > /root/wifi_debug/logread.txt
dmesg > /root/wifi_debug/dmesg.txt
logread -f >> /root/wifi_debug/logread.txt &
iwinfo > /root/wifi_debug/iwinfo.txt 2>&1
iw dev > /root/wifi_debug/iw_dev.txt 2>&1
ubus call network.wireless status > /root/wifi_debug/wireless_status.json
ip addr > /root/wifi_debug/ip_addr.txt
ip route > /root/wifi_debug/ip_route.txt
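Because `logread -f` keeps appending to /root/wifi_debug until the freeze, it may be worth checking free space first. A minimal sketch (free_kb is a hypothetical helper, and the 4096 KiB threshold in the commented example is arbitrary):

```shell
# Print the free space (KiB) of the filesystem holding directory $1.
# -P forces one line per filesystem, so field 4 is always "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example guard (threshold is an arbitrary illustration):
# if [ "$(free_kb /root)" -lt 4096 ]; then echo "low space: use a USB drive"; fi
```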

Reproduce the issue and wait until it occurs.
After rebooting, archive the logs and configurations and send them to us via private message:

tar -czf /root/wifi_debug_$(date +%Y%m%d_%H%M%S).tar.gz -C /root wifi_debug

Maeddes · March 19, 2026, 12:26pm

will.qiu:

We need to collect additional logs and configuration details at the time the issue occurs so we can work with Qualcomm to diagnose and troubleshoot it.

Reproduce the issue and wait until it occurs.

After rebooting, archive the logs and configurations and send them to us via private message:
tar -czf /root/wifi_debug_$(date +%Y%m%d_%H%M%S).tar.gz -C /root wifi_debug

Thank you for the detailed instructions. I will run the debug collection script and reproduce the issue this weekend when the network is not in active use. I will send the logs via private message afterwards.

will.qiu · March 20, 2026, 4:04am

Thank you for your cooperation, and we look forward to your update!

danp · April 30, 2026, 11:13pm

Hi,

Do you need any additional logs? The same error also occurred on one of my Flint 3 routers.

I have not tried the suggested solution yet.

will.qiu · May 6, 2026, 6:55am

Hi,

Are you experiencing exactly the same issue, including:

Using another router configured with the same SSID as the BE9300
Logs showing wireless client devices encountering a dms not-supported issue

kernel: [243869] wlan: [0:E:ANY] ieee80211_recv_asreq: assoc req from dms not-supported sta :62:de:21:52:e4:ec
kernel: [243862] wlan: [0:E:ANY] ieee80211_recv_asreq: assoc req from dms not-supported sta :da:66:ee:d7:f3:93

Logs showing the wireless client stuck in a Wi-Fi handshake loop (1/4 → 2/4 → 1/4 → 2/4 → … never reaching 3/4)

16:16:36  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: sending 1/4 msg of 4-Way Handshake
16:16:36  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: received EAPOL-Key frame (2/4 Pairwise)
16:16:37  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: sending 1/4 msg of 4-Way Handshake  ← retry, no 3/4
16:16:37  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: received EAPOL-Key frame (2/4 Pairwise)
16:16:38  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: sending 1/4 msg of 4-Way Handshake  ← retry
16:16:38  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: received EAPOL-Key frame (2/4 Pairwise)
16:16:39  hostapd: wlan01: STA 62:de:21:52:e4:ec IEEE 802.11: disassociated
           ... client re-associates immediately, entire loop repeats ...

The device becomes unresponsive (unable to access the web UI or SSH) and requires a manual power cycle to recover

If so, please export the logs following the steps above and send them to us via private message.

Thank you for your cooperation and patience!

danp · May 10, 2026, 1:54pm

I'm experiencing almost the same thing, except that the trigger is not a second router but IoT devices.

After a while, if the bss_transition option is enabled (the default setting), some of them start flooding the Flint 3 (firmware version 4.8.4). At those moments they overload the router so much that even my computer, which is connected via Ethernet, experiences lag and can even be disconnected.

The flooding doesn't happen all the time, but once in a while it starts, and then I have to cut the power to the whole house to reset the devices (these include, for example, Wi-Fi plugs for outdoor blinds). That's why I turned off bss_transition a few days ago, and so far it looks promising.

In that case, should I reconfigure the wireless settings to collect the logs, or are you already aware of this issue?

will.qiu · May 13, 2026, 7:59am

Hi

Sorry for the late reply.

Yes, if possible, please re-configure the wireless settings, reproduce the issue again, and then provide the logs for us to analyze.

This will help us determine whether it is the same issue as previously reported, and also help us collect more cases that may assist with troubleshooting and resolving the problem.

In addition, could you please clarify your network environment?

How is the BE9300 currently connected to the internet?
For example, is it directly connected to the ISP modem for dialing, or is there another upstream router involved?
Besides the BE9300, are there any other wireless routers in the home network?
If so, are there any duplicated SSIDs?
Approximately how many wireless and IoT devices are connected in the home network?

danp · May 14, 2026, 8:18am

Hi,

It is directly connected to GPON and uses PPPoE for authentication.
Currently there are no additional wireless routers at home. My SSID names and channels differ from my neighbour's.
In total, 11 devices are connected to the IoT SSID. Each band has its own SSID name: Nebula_2G, Nebula and Nebula_6G. The MLO SSID is currently disabled, but enabling it does not affect the issue.

I will try to collect logs next week; sorry for the delay.

will.qiu · May 14, 2026, 9:07am

Thank you for providing the detailed information.

That is perfectly fine for us — we will wait for your logs next week.