[BUG REPORT] Flint 3 (GL-BE9300) – System Freeze without Watchdog Reset, triggered by hostapd EAPOL Retry-Loop & DMS Conflict

Product: GL.iNet Flint 3 (GL-BE9300)
Firmware: Beta (kernel 5.4.213, IPQ5332 SoC)
Severity: High – complete system freeze, web interface and SSH unreachable, requires manual power cycle (pull power plug)
Date: 2026-02-26


Summary

A temporary SSID conflict (secondary AP briefly active with same SSID, only 80 seconds) triggers an uncontrolled EAPOL 4-Way Handshake retry-loop in hostapd. This leads to resource exhaustion (Memory/CPU) over ~25 minutes, resulting in a complete system freeze. The web interface and SSH become unreachable; the router does not recover on its own. A manual power cycle (pulling the power plug) is required. No kernel panic, no watchdog reset, no automatic recovery – the hardware watchdog either did not fire or is not configured.

The issue is a combination of two bugs that reinforce each other, compounded by a missing recovery mechanism (Bug 3 below):

  1. The QCA driver accepts a DMS-incompatible client into a "Zombie-Association" state instead of rejecting it cleanly.
  2. hostapd has no retry-limit or backoff for failed 4-Way Handshakes, causing an infinite loop that exhausts system resources.

Environment

Router: GL.iNet Flint 3, GL-BE9300
Kernel: Linux 5.4.213, ARM64, IPQ5332 SoC (4 cores)
RAM: 881 MB usable
WiFi chipsets: QCN9224 (5/6 GHz), QCA5332 (2.4 GHz)
Active services: ECM, NSS offload, Netify Agent 5.1.24, NordVPN WireGuard, ZeroTier, Samba 4.18.8
Trigger: AVM FRITZ!Box 7590, same SSID briefly activated via hardware button (80 s)
Uptime at crash: ~68 hours (booted 2026-02-23 ~20:32)

Precise Timeline

Both log sources (Flint 3 syslog + FRITZ!Box event log) correlated by wall clock:

16:10–16:16   Clients da:66:ee:d7:f3:93 and 62:de:21:52:e4:ec begin roaming between bands
16:16:29      kernel: ieee80211_recv_asreq – "assoc req from dms not-supported sta" (first occurrence)
16:16:36–48   ⚠ EAPOL loop starts on wlan01: 1/4 → 2/4 → 1/4 → 2/4 → ... never reaches 3/4
16:19:23      ⚡ FRITZ!Box WLAN ON  (2.4+5 GHz, SSID "elias")  ← from FRITZ!Box event log
16:20:43      ⚡ FRITZ!Box WLAN OFF                             ← from FRITZ!Box event log
              [80 seconds of SSID conflict were sufficient to trigger the crash]
16:23:42      ⚠ Last normal syslog entry – syslogd can no longer write (resource exhaustion)
16:23–16:49   ████ 25-minute syslog silence – kernel IRQs still alive, userspace fully frozen
16:49:11      kernel: [245824s] lan1 link down  ← last kernel-level event
16:49:15      kernel: [245828s] lan1 link up, speed 2500  ← absolute last log entry
~16:49        ❌ SYSTEM FROZEN – web interface unreachable, SSH unreachable
              No watchdog reset, no automatic recovery – manual power cycle required
16:33:26      Router boots after manual power cycle (fresh boot, kernel uptime [0s])
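
The kernel uptime stamps and the wall-clock times in this timeline can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming both events fall on the same calendar day and taking the last kernel entry (16:49:11 at uptime 245824 s) as the reference point. The function name uptime_to_clock is hypothetical.

```shell
# uptime_to_clock REF_CLOCK REF_UPTIME UPTIME
# Prints the wall-clock time (HH:MM:SS) that corresponds to kernel uptime
# UPTIME, given one known pair of wall-clock time and uptime.
# Assumption: both events happen on the same day (no midnight rollover).
uptime_to_clock() {
    awk -v ref="$1" -v ref_up="$2" -v up="$3" 'BEGIN {
        split(ref, t, ":")
        secs = t[1] * 3600 + t[2] * 60 + t[3] + (up - ref_up)
        printf "%02d:%02d:%02d\n", int(secs / 3600), int((secs % 3600) / 60), secs % 60
    }'
}

# Uptime stamps taken from the two "dms not-supported sta" kernel messages.
uptime_to_clock 16:49:11 245824 243862
uptime_to_clock 16:49:11 245824 243869
```

The two calls print 16:16:29 and 16:16:36, which match the first DMS error and the start of the EAPOL loop in the timeline.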

Technical Root Cause Analysis

Bug 1 – DMS Incompatibility creates a "Zombie-Association"

The QCA driver identifies a DMS (Directed Multicast Service) request from the STA but fails to reject the association with Status Code 43 (Unsupported Capability). Instead it logs the error and proceeds, leaving the connection in an unstable state:

kernel: [243862] wlan: [0:E:ANY] ieee80211_recv_asreq: assoc req from dms not-supported sta :da:66:ee:d7:f3:93
kernel: [243869] wlan: [0:E:ANY] ieee80211_recv_asreq: assoc req from dms not-supported sta :62:de:21:52:e4:ec

The client is now "associated" in a state the driver itself considers invalid – a Zombie-Association that feeds directly into the EAPOL loop below.

Bug 2 – hostapd EAPOL Infinite Retry (no retry-limit, no backoff)

hostapd enters an infinite loop of Message 1/4 → 2/4 without a retry counter, exponential backoff, or temporary client ban. Each retry holds kernel PTK negotiation state:

16:16:36  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: sending 1/4 msg of 4-Way Handshake
16:16:36  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: received EAPOL-Key frame (2/4 Pairwise)
16:16:37  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: sending 1/4 msg of 4-Way Handshake  ← retry, no 3/4
16:16:37  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: received EAPOL-Key frame (2/4 Pairwise)
16:16:38  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: sending 1/4 msg of 4-Way Handshake  ← retry
16:16:38  hostapd: wlan01: STA 62:de:21:52:e4:ec WPA: received EAPOL-Key frame (2/4 Pairwise)
16:16:39  hostapd: wlan01: STA 62:de:21:52:e4:ec IEEE 802.11: disassociated
           ... client re-associates immediately, entire loop repeats ...

In parallel, a second client (da:66:ee:d7:f3:93) loops without ever sending a 2/4 reply, so hostapd resends message 1/4 every second and gets no response.
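
To illustrate what a retry counter with a temporary ban would have flagged here, the following sketch replays hostapd log lines and reports a STA once it has been sent a given number of 1/4 messages without a completed handshake in between. It only models the policy from the outside, based on the log format shown above. It is not how hostapd would implement it internally, and the function name flag_stuck_stas and the "BAN" output are hypothetical.

```shell
# flag_stuck_stas [LIMIT]
# Reads hostapd log lines on stdin. Prints "BAN <mac>" once a STA has been
# sent LIMIT message-1/4 frames with no completed handshake in between.
# LIMIT defaults to 5. Assumption: the MAC is the field right after "STA".
flag_stuck_stas() {
    awk -v limit="${1:-5}" '
        {
            # Locate the MAC address that follows the literal token "STA".
            mac = ""
            for (i = 1; i < NF; i++) if ($i == "STA") { mac = $(i + 1); break }
            if (mac == "") next
        }
        # A completed handshake resets the failure counter for that STA.
        /pairwise key handshake completed/ { fails[mac] = 0 }
        # Each message 1/4 counts as one more unanswered attempt.
        /sending 1\/4 msg of 4-Way Handshake/ {
            if (++fails[mac] == limit) print "BAN", mac
        }
    '
}
```

On the router this could be fed from the system log, for example `logread | flag_stuck_stas 5`.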

Bug 3 – Complete System Freeze, no recovery mechanism

Continuous sub-second EAPOL retries from two STAs appear to accumulate kernel PTK state. About 7 minutes after the loop started (16:16:36), syslogd wrote its last entry (16:23:42). For the following ~25 minutes only kernel link events were logged, which indicates a full userspace hang. Neither the OOM killer, nor procd, nor the hardware watchdog triggered any recovery. The web interface and SSH both became unreachable. The only resolution was pulling the power plug manually.

16:23:42  hostapd: wlan02: STA da:66:ee:d7:f3:93 IEEE 802.11: disassociated  ← LAST SYSLOG ENTRY
          [25 minutes of complete silence]
16:49:11  kernel: [245824s] rtl8372-mdio: lan1 link down    ← hardware IRQ still functional
16:49:15  kernel: [245828s] rtl8372-mdio: lan1 link up, speed 2500
          << no kernel panic, no OOM killer, no watchdog reset >>
          << web interface unreachable, SSH unreachable >>
          << router did NOT recover – manual power cycle required >>
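
The length of the syslog silence can be measured directly from a saved log. This is a minimal sketch, assuming each relevant line starts with an HH:MM:SS timestamp as in the excerpts in this report and that the log does not cross midnight. The function name largest_gap is hypothetical.

```shell
# largest_gap
# Reads log lines on stdin and prints "<gap_seconds> <from> <to>" for the
# longest pause between two consecutive timestamped entries.
# Lines whose first field is not HH:MM:SS are ignored.
largest_gap() {
    awk '
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            split($1, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (have_prev && now - prev > gap) {
                gap = now - prev; from = prev_ts; to = $1
            }
            prev = now; prev_ts = $1; have_prev = 1
        }
        END { if (from != "") print gap, from, to }
    '
}
```

For the three timestamped entries above (16:23:42, 16:49:11, 16:49:15) it reports a gap of 1529 seconds, i.e. roughly 25 minutes.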

Steps to Reproduce

  1. Set up GL-BE9300 with a specific SSID and at least two WiFi clients that trigger the DMS error on association (look for ieee80211_recv_asreq: assoc req from dms not-supported sta in the logs).
  2. Briefly activate a second AP nearby broadcasting the same SSID and security settings (even 80 seconds is sufficient).
  3. Trigger client roaming/association attempts between the two APs.
  4. Monitor hostapd logs for continuous 1/4 → 2/4 cycles.
  5. After ~25 minutes of no new syslog entries: complete freeze. Web interface and SSH unreachable. Manual power cycle required.
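
For step 4, a per-client summary makes the loop easier to spot than reading the raw log. The sketch below counts sent 1/4 messages and received 2/4 replies per STA, assuming the hostapd message wording shown in this report. The function name eapol_summary and its output format are hypothetical.

```shell
# eapol_summary
# Reads hostapd log lines on stdin and prints one line per STA:
#   <mac> sent_1of4=<n> rcvd_2of4=<n>
# sorted by MAC. Assumption: the MAC is the field right after "STA".
eapol_summary() {
    awk '
        {
            mac = ""
            for (i = 1; i < NF; i++) if ($i == "STA") { mac = $(i + 1); break }
            if (mac == "") next
            seen[mac] = 1
        }
        /sending 1\/4 msg of 4-Way Handshake/        { sent[mac]++ }
        /received EAPOL-Key frame \(2\/4 Pairwise\)/ { rcvd[mac]++ }
        END {
            for (m in seen)
                printf "%s sent_1of4=%d rcvd_2of4=%d\n", m, sent[m], rcvd[m]
        }
    ' | sort
}
```

Running `logread | eapol_summary` repeatedly should show the counters of an affected client growing quickly. A client with many 1/4 messages and zero 2/4 replies corresponds to the second pattern described under Bug 2.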

Expected vs. Actual Behaviour

DMS-unsupported client
  Expected: Rejected at association with Status Code 43, client not accepted
  Actual:   Accepted with error logged, Zombie-Association state feeds EAPOL loop

EAPOL failure handling
  Expected: Client banned after N retries, kernel state freed
  Actual:   Infinite retry loop, state accumulates until freeze

Resource exhaustion
  Expected: OOM killer or procd respawn of hostapd, or watchdog reset as last resort
  Actual:   Complete silent freeze, no OOM, no respawn, no watchdog – requires manual power cycle

Suggested Fixes

1. Implement EAPOL retry-limit in hostapd
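Count per-client 1/4 retransmissions. After N (e.g. 5) consecutive failures, call ap_sta_disconnect() and add the MAC to a temporary deny-list (e.g. 60 s). Note that wpa_ptk_rekey only sets the periodic PTK rekey interval and does not limit retries. The hostapd.conf option that bounds message 1/4 and 3/4 retries within a single handshake is wpa_pairwise_update_count (default 4), which by itself does not stop the immediate re-association cycle seen here.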

2. Strict DMS handling – prevent Zombie-Association
Ensure the QCA driver/hostapd rejects DMS-requesting STAs with Status Code 43 (Unsupported Capability) at association if the capability is unsupported, instead of logging and proceeding into an invalid state.

3. procd monitoring of hostapd
Ensure hostapd is correctly monitored by procd with a respawn limit, so a stuck instance is restarted before a full system freeze occurs.

4. Verify hardware watchdog is active and correctly configured
In this incident the hardware watchdog did not fire – the system froze indefinitely without any automatic recovery. The watchdog should be a last-resort safety net. Please verify that the watchdog daemon is running and that the kick interval is correctly configured for the IPQ5332 platform.
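
A quick way to check the watchdog state from a shell on the router is sketched below. It assumes the firmware's procd exposes the standard OpenWrt `ubus call system watchdog` method and that its JSON output contains a "status" field; whether this holds on the GL-BE9300 firmware has not been verified here. The helper name watchdog_status is hypothetical.

```shell
# watchdog_status
# Extracts the value of the "status" field from the JSON printed by
# `ubus call system watchdog` (read from stdin).
watchdog_status() {
    sed -n 's/.*"status": *"\([^"]*\)".*/\1/p'
}

# Query the router only when ubus exists, so the snippet is harmless elsewhere.
if command -v ubus >/dev/null 2>&1; then
    state=$(ubus call system watchdog | watchdog_status)
    echo "procd watchdog status: ${state:-unknown}"
    # The hardware watchdog is normally exposed as a character device.
    [ -c /dev/watchdog ] || echo "warning: /dev/watchdog device node not found"
fi
```

A status other than "running", or a missing device node, would be consistent with the watchdog never firing during the freeze.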

Hi

Could you please clarify the following:

  1. Can this issue be reproduced consistently?
  2. What firmware version is your Flint 3 currently running? You can find it under System → Upgrade
  3. Please export the full device logs and send them to us via private message so we can analyze further

Also, please try the following to see if it helps:

  1. Try setting the 2.4 GHz Wi-Fi mode to 802.11n/g, since some IoT devices may not work well with ax/be modes.
uci set wireless.wifi0.hwmode='11ng'
uci commit wireless

/etc/init.d/network restart
  2. Try disabling 802.11v on the corresponding SSID, as the log indicates that the issue appears to be related to roaming.
# On Main SSID 
uci set wireless.wifi2g.bss_transition='0'

# On Guest SSID
uci set wireless.guest2g.bss_transition='0'

# On MLO SSID
uci set wireless.wlanmld2g.bss_transition='0'

uci commit wireless
/etc/init.d/network restart

Final Update: Workaround confirmed — FRITZ!Box SSID conflict test passed

After confirming the bss_transition=0 fix resolves the POCO and Lenovo issues, we ran the original freeze-trigger test: activating the FRITZ!Box 7530 with the same SSID ("elias") while clients were actively roaming between bands.

Result: No EAPOL loop, no freeze, router remained fully stable throughout.

Log shows clean roaming under SSID conflict conditions — including rapid band switching within the same second that previously triggered the crash:

07:38:37  wlan1: STA 46:39:.. pairwise key handshake completed ✓
07:38:37  wlan0: STA 46:39:.. disassociated → authenticated
07:38:38  wlan0: STA 46:39:.. pairwise key handshake completed ✓

No dms not-supported, no EAPOL retry loop, no syslog silence, no freeze. Web interface and SSH remained reachable throughout.


The complete workaround — apply to all six interfaces

uci set wireless.wifi2g.bss_transition='0'
uci set wireless.wlanmld2g.bss_transition='0'
uci set wireless.wifi5g.bss_transition='0'
uci set wireless.wlanmld5g.bss_transition='0'
uci set wireless.wifi6g.bss_transition='0'
uci set wireless.wlanmld6g.bss_transition='0'
uci commit wireless
wifi reload

Important: The fix must be applied to all six interfaces simultaneously. Setting only the 2.4 GHz interfaces caused a secondary issue: the Lenovo Tab P11 Pro (a non-MLO device) got authenticated on wlan12 (the MLO 5 GHz link) but was immediately disassociated — repeated three times, then gave up entirely. With inconsistent bss_transition values across the MLO bundle, non-MLO clients could not associate on any interface at all. Setting all six interfaces consistently resolved both the POCO EAPOL loop and the Lenovo association failure, while preserving full MLO functionality on 5/6 GHz.
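
Because an inconsistent setting across the six interfaces caused its own failure, it can help to verify the result after applying the workaround. This is a minimal sketch that reads the output of `uci show wireless` and lists every one of the six sections named above whose bss_transition is missing or not '0'. The function name inconsistent_bss_transition is hypothetical, and the section names are the ones from this setup and may differ on other configurations.

```shell
# inconsistent_bss_transition
# Reads `uci show wireless` output on stdin and prints each expected section
# whose bss_transition option is absent or set to something other than '0'.
# Prints nothing when all six sections are consistent.
inconsistent_bss_transition() {
    awk -F= -v q="'" '
        BEGIN {
            n = split("wifi2g wlanmld2g wifi5g wlanmld5g wifi6g wlanmld6g", want, " ")
        }
        # Lines look like: wireless.wifi2g.bss_transition=<quoted value>
        $1 ~ /^wireless\.[^.]+\.bss_transition$/ {
            split($1, part, ".")
            val[part[2]] = $2
        }
        END {
            for (i = 1; i <= n; i++)
                if (val[want[i]] != (q "0" q)) print want[i]
        }
    '
}
```

Usage on the router: `uci show wireless | inconsistent_bss_transition`. Empty output means all six sections carry the same value.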


One final remark on the freeze behavior itself

Even though the workaround now prevents the freeze, the original failure mode remains concerning and should be addressed independently. When the EAPOL loop occurred, the router entered a complete silent freeze — web interface unreachable, SSH unreachable, no OOM killer, no procd respawn, no watchdog reset. The only recovery was physically unplugging the power cable.

For a router, having to pull the power cable as the only recovery option is not acceptable. A router should never require manual intervention to recover from a software-level fault. I would strongly recommend:

  • Hardware watchdog: Verify that the watchdog is correctly configured and actively kicking on the IPQ5332 platform — in this incident it did not fire despite a complete userspace freeze

  • procd monitoring: Ensure procd properly monitors and respawns hostapd if it becomes unresponsive

  • EAPOL retry limit: Consider a maximum retry limit in hostapd to prevent unbounded EAPOL loops from consuming kernel state in the first place

The workaround addresses the trigger. The resilience gap remains open.

We need to collect additional logs and configuration details at the time the issue occurs so we can work with Qualcomm to diagnose and troubleshoot it.

Could you please help by following the steps below?

  1. Configure the BE9300 to a state where the issue can be reproduced. After this, no further configuration changes should be needed for the following steps.

  2. Run the commands below to place the relevant configurations and logs into a specified directory.
    (Since you mentioned that a reboot is required to regain access to the device, the data is currently stored in flash. If storage space is limited, please insert a USB drive and adjust the directory accordingly.)

mkdir -p /root/wifi_debug
cp /etc/config/wireless /root/wifi_debug/ 2>/dev/null
cp /etc/config/network  /root/wifi_debug/ 2>/dev/null
cp /etc/config/firewall /root/wifi_debug/ 2>/dev/null
cp /etc/config/dhcp     /root/wifi_debug/ 2>/dev/null
cp /etc/config/system   /root/wifi_debug/ 2>/dev/null
cp /var/run/hostapd-*.conf /root/wifi_debug/ 2>/dev/null
cp /tmp/run/hostapd-*.conf /root/wifi_debug/ 2>/dev/null
cp /var/run/wpa_supplicant-*.conf /root/wifi_debug/ 2>/dev/null
cp /tmp/run/wpa_supplicant-*.conf /root/wifi_debug/ 2>/dev/null
logread > /root/wifi_debug/logread.txt
dmesg > /root/wifi_debug/dmesg.txt
logread -f >> /root/wifi_debug/logread.txt &
iwinfo > /root/wifi_debug/iwinfo.txt 2>&1
iw dev > /root/wifi_debug/iw_dev.txt 2>&1
ubus call network.wireless status > /root/wifi_debug/wireless_status.json
ip addr > /root/wifi_debug/ip_addr.txt
ip route > /root/wifi_debug/ip_route.txt
  3. Reproduce the issue and wait until it occurs.

  4. After rebooting, archive the logs and configurations and send them to us via private message:

tar -czf /root/wifi_debug_$(date +%Y%m%d_%H%M%S).tar.gz -C /root wifi_debug

Thank you for the detailed instructions. I will run the debug collection script and reproduce the issue this weekend when the network is not in active use. I will send the logs via private message afterwards.

Thank you for your cooperation, and we look forward to your update!