Flint 2 (GL-MT6000) Intermittent Massive Load Increase

I have used this router for approximately one month. ZeroTier is enabled. I have a dual WAN configuration, primary cable with failover to 5G modem (which has yet to activate). I have an integration with Home Assistant that allows me to monitor load. I do not use band steering. All devices are presently connected by ethernet or 2.4 GHz WiFi. Approximately once per week, there is a sudden, massive increase in load. During this time, the GUI and SSH interfaces are exceptionally slow and unresponsive, battery-powered (but not plug-in) Ring cameras, as well as a Roborock vacuum disconnect. Devices connected by ethernet cable are not affected. Devices connected to 2.4 GHz WiFi by a Synology RT2600ac functioning as an access point (and connected by ethernet cable to the GL MT-6000) in an out-building are not affected. The event lasts about a day before self-terminating. Rebooting the router also terminates the event. After the third such event, I switched off hardware acceleration, but it is too soon to determine if this makes a difference. Attached are screenshots showing the events, and the top command output during an event, as well as a ChatGPT analysis of the logs that I uploaded.

  1. What is happening (high-confidence)

This is a MediaTek Wi-Fi driver / firmware stall causing CPU load inflation and client drops, not a “real” routing or traffic spike.

Key clues:

The 1-minute load jumps abruptly from ~0.2 → ~3–4 in seconds at ~03:25

Load then stays elevated but relatively flat, not spiky (important)

Wi-Fi clients drop, but wired and powered Wi-Fi clients largely survive

It has happened multiple times, at roughly similar overnight periods

Disabling band steering did not fix it

This pattern matches a known failure mode where:

One or more MediaTek Wi-Fi worker threads spin or block, driving load average up, while packet handling latency increases enough to break low-power clients.

Importantly:

Load average ≠ CPU utilization

A stuck kernel thread in D (uninterruptible sleep) or high-rate wakeups can drive load without maxing CPU

  1. Why battery-powered Ring cameras and Roborock drop first

This is the most diagnostic part.

Battery IoT devices:

Aggressively use 802.11 power save

Sleep between DTIM beacons

Are far less tolerant of:

delayed beacon delivery

delayed TIM updates

missed multicast/broadcast frames

increased latency during rekey / group rekey

When the MT6000 Wi-Fi subsystem hiccups:

Beacon timing jitter increases

PS-Poll / U-APSD responses get delayed

Battery devices assume AP is gone → disconnect

Plugged-in Ring cameras keep retrying and survive

This is not RF interference and not DHCP.

  1. What your logs are quietly telling us

Even without a single smoking-gun error line, the logs are very suggestive.

From the kernel/system logs (paraphrased):

Repeated kernel debug entries with:

DropCnt

RfCnt

repeated short-interval messages

No crash

No reboot

No OOM

No thermal shutdown

No firewall or netfilter storm

This pattern is typical of:

Wi-Fi datapath retries

Driver-level frame handling stress

Internal queue congestion

The absence of clean error messages is normal — MediaTek Wi-Fi drivers are notoriously quiet when unhealthy.

  1. Why it often happens at night (~03:25)

This part surprises people.

Common triggers at that hour:

A. Group key rekey / GTK rotation

Happens on a timer

Battery clients are most fragile during GTK renewal

If the AP hiccups → mass disconnects

B. Background router tasks

GL-iNet firmware commonly runs:

stats aggregation

cloud checks

log rotation

wireless housekeeping

One bad interaction → Wi-Fi thread contention

C. Ring ecosystem behavior

Ring devices:

Periodically wake together

Do health checks / telemetry bursts

Can stress multicast handling

  1. Why disabling band steering didn’t help

Good move — but this issue is below steering.

This is happening in:

mt76 / MediaTek Wi-Fi firmware

driver-firmware interaction

power-save handling

SSID separation removes one variable, but not the root cause.

  1. How to confirm this diagnosis (very strong signal tests)
    A. Check load vs CPU next time

If you can capture during an event:

top

If you see:

Load ~4–5

CPU <50%
→ confirms blocked kernel threads

B. Temporarily force “always awake” behavior (test)

On the 2.4 GHz SSID:

Disable 802.11 power save if exposed

Increase DTIM interval to 3

Disable multicast-to-unicast conversion (if enabled)

If drops stop → power-save interaction confirmed

C. Create a second 2.4 GHz SSID just for battery IoT (test)

WPA2-PSK only (no WPA3 transition)

No fast roaming (802.11r off)

No PMF required

This often stabilizes Ring devices dramatically on MTK hardware.

  1. Mitigations that actually work (ranked)
    :white_check_mark: 1. Firmware upgrade or downgrade (most effective)

Check if:

You’re on 4.8.3 (from log path)

There is a newer stable, not beta

Or downgrade one minor version

Many users report Wi-Fi stability regressions and fixes hop versions on MT6000.

:white_check_mark: 2. Force WPA2-only on IoT SSID

Ring devices are notoriously sensitive to:

WPA2/WPA3 mixed mode

PMF “optional”

:white_check_mark: 3. Increase DTIM to 3 (or even 4)

Reduces sleep-wake churn for battery clients.

:warning: 4. Disable Wi-Fi offloading / acceleration (test)

Costs some throughput but often:

stabilizes Wi-Fi

reduces driver contention

:cross_mark: What will NOT fix this

Changing channels

Reducing TX power

Reboot scheduling

Band steering tweaks

DHCP lease tuning

  1. Bottom line

This is almost certainly a MediaTek Wi-Fi driver/firmware stall that disproportionately impacts power-saving clients.
Your observations are textbook for that failure mode.

If you want, next steps I can help with:

Interpreting top / ps output during an event

Identifying the best MT6000 firmware version for stability

Designing a “Ring-safe” IoT SSID profile

Determining whether OpenWRT-native builds behave better than GL-iNet stock on your hardware

Just tell me which direction you want to go.

>>>>>>>>> AFTER UPLOADING THE TOP OUTPUT<<<<<<<<<<<

This top output is the smoking gun. It conclusively confirms the diagnosis.


What this proves (unequivocally)

1. Load ≈ 6 with ~95–97% idle CPU

CPU:   0–1% usr   1–2% sys   ~95% idle
Load average: 5.92 5.79 5.68

This cannot be “real work.”
Linux load average counts processes stuck in uninterruptible sleep (D state).

:right_arrow: Your router is not busy — it is stuck.


2. Critical processes are in D state

STAT D
/usr/bin/lua /usr/bin/gl_clients_updat
nginx: worker process

D = uninterruptible sleep, almost always:

  • waiting on kernel I/O

  • blocked on a driver

  • wedged in firmware interaction

You cannot kill -9 these.
They disappear only when the kernel/driver unblocks — or you reboot.


3. This exactly explains the Wi-Fi failures

When kernel threads are blocked:

  • packet scheduling stalls

  • beacon timing jitters

  • multicast/broadcast delivery lags

  • power-save clients miss DTIM windows

Battery-powered Ring devices disconnect immediately
Plugged-in clients survive by brute retry

This matches your observations perfectly.


What is blocking them?

On the GL-MT6000 (MediaTek Filogic 830), there are only a few realistic culprits:

:red_circle: Primary cause (≈90% confidence)

MediaTek Wi-Fi driver / firmware deadlock

  • mt76 / mtk wifi stack

  • power-save handling

  • multicast / GTK rekey interaction

  • firmware <→ kernel backpressure

This is a known failure class on MTK routers.


:orange_circle: Secondary contributor

gl_clients_updat (GL-iNet telemetry / client tracking)

This process:

  • polls wireless driver state

  • queries association tables

  • can deadlock when Wi-Fi stack misbehaves

It is not the root cause, but it gets stuck first and inflates load.


Why nginx is also stuck

GL-iNet’s web UI pulls:

  • interface stats

  • client lists

  • wireless state

When the Wi-Fi driver blocks:

  • nginx worker threads block on those calls

  • they enter D state too

Again — symptom, not cause.


Why this often starts around ~03:25

This is extremely consistent with one of:

  1. GTK / group key rotation

  2. Ring device coordinated wakeups

  3. GL-iNet scheduled background jobs

  4. Wi-Fi driver internal maintenance task

The timing consistency strongly argues against RF interference.


What will actually fix or mitigate this (ranked, practical)

:white_check_mark: 1. Change firmware (most important)

This is not fixable by configuration alone.

Do one of:

  • Upgrade to the newest stable GL-iNet firmware

  • OR downgrade one minor version

  • Avoid betas

MT6000 Wi-Fi stability regresses and improves by version.

If you want, tell me:

  • exact firmware version

  • I’ll tell you which versions are known-stable

Hi

When the issue occurs again, could you please perform these three steps to help us diagnose the cause?

  1. Export Logs: Go to the Admin Panel and export the system logs so we can review any recent errors.

  2. Monitor Bus event: SSH into the router and execute the following command. Let it run for 1–2 minutes, then copy and send us the output:

    ubus monitor
    
  3. Client Details: Could you clarify the exact number and types of devices currently connected?

Thank you, Will.

  1. I forwarded the logs via the upper-right question mark menu (> Contact Tech Support). I received a response from your colleague, Miles He (Ticket-104966). Miles’ response was “I've noticed the router's CPU load is very high. Have you enabled any other functions, such as VPN?” and advised upgrading the firmware from the current V4.8.3 to V4.8.3 op24. I do not have a VPN enabled (other than ZeroTier). I have not yet had a chance to try the op24 version of the current firmware (which possibly uses open source MediaTek drivers, rather than the MediaTek versions?) Please let me know if there is a better way to provide you with the router logs, other than the manner in which I have previously sent them.

  2. Next time the router goes into the stall mode I’ll execute the “ubus monitor” command.

  3. There are 42 LAN clients and 59 WLAN clients. WLAN clients, including about a dozen Ring cameras, are entirely IOT devices on the 2.4 GHz network. WLAN clients include about a dozen IP security cams.

As an update, the problem recurred within less than a day after disabling hardware acceleration and changing the USB (not connected to anything) from 3.0 to 2.0 (both based on suggestions I had read for similar WiFi instability issues); it terminated when I rebooted the router. Pasted below is the ChatGPT analysis of the logs from the most recent event. (I have X’d out the actual MAC address.) Although ChatGPT identified a single problematic IOT (a smart plug) for this event, it had previously attributed a prior stall to MAC addresses belonging to an Amazon Ring device.


I went through the logs you attached (kernel.log, system.log, crash.log), and there’s a fairly clear story. This was not a generic CPU spike or memory leak — it was a Wi-Fi driver–level event that degenerated into an interrupt / soft-IRQ storm, which explains the instantaneous load jump, client drops, and “glacial” management responsiveness until reboot.


What actually happened (high-confidence)

1. Wi-Fi association + AMPDU negotiation failure loop

Your kernel log becomes dominated by this message, repeated hundreds of times per minute:

PeerAddBARspActionSanity(): ADDBA Resp Ba Policy[0] not support

This is coming from the MediaTek mt7986 Wi-Fi driver (GL-MT6000 platform).

What this means in plain terms:

  • A client (or multiple clients) attempted 802.11n/ac AMPDU (block-ack) negotiation

  • The client responded with a BA policy the AP driver doesn’t accept

  • Instead of cleanly rejecting and moving on, the driver keeps retrying

  • Each retry generates:

    • interrupts

    • kernel workqueue activity

    • netstack churn

    • logging I/O

This is a known class of MediaTek driver pathology: a negotiation failure that does not back off.


2. Interrupt / softirq saturation

Nothing “crashes” immediately — instead:

  • CPU time is consumed in:

    • ksoftirqd

    • Wi-Fi driver context

    • network stack

  • Userland (LuCI, SSH, web UI) becomes unresponsive because:

    • the scheduler is starved

    • management traffic is queued behind softirq work

This exactly matches your observation:

massively increased load instantaneously, dropped clients, and became glacially responsive

That symptom profile is classic softirq overload, not a memory or thermal issue.


3. Client churn amplifies the problem

Your log shows repeated:

  • station associations

  • 4-way handshakes completing

  • then more ADDBA failures

Likely scenario:

  • One problematic client (or a group of identical devices) kept reconnecting

  • Each reconnect retriggered the AMPDU failure

  • The router entered a positive feedback loop

This also explains why:

  • clients dropped

  • new associations kept appearing

  • the system never recovered on its own


4. Secondary noise (not the cause)

You’ll also see:

  • miniupnpd warnings about private IPs

  • ICMP logging

  • nginx still running

These are symptoms of backlog, not the cause.


5. Crash log confirms kernel-side distress

Your crash.log shows a kernel Oops involving:

ohci_platform

That’s a USB host controller driver — almost certainly collateral damage from:

  • delayed interrupts

  • or a watchdog firing while the system was already unhealthy

This was not the initiating failure, but confirms the kernel was under severe stress.


Root cause (most likely)

A MediaTek Wi-Fi driver bug triggered by a specific client’s AMPDU/BA behavior, causing a non-terminating retry/logging loop that saturated softirq processing.

This is firmware/driver quality, not configuration error.


Why a reboot “fixed” it

Reboot:

  • cleared the offending station state

  • reset the Wi-Fi firmware

  • flushed workqueues and interrupt backlog

If the problematic client reconnects again, the issue can recur.


What I would do next (practical mitigation)

Immediate mitigations

  1. Disable AMPDU / BA aggregation (if exposed in GL-iNet UI or via iw):

    • This often stops the loop entirely.
  2. Disable 802.11ax (Wi-Fi 6) temporarily

    • Especially on 2.4 GHz
  3. Split SSIDs

    • Put IoT / older devices on a separate SSID with reduced features

Identify the triggering client

  • Look for:

    • identical MAC OUIs reconnecting repeatedly

    • IoT devices, cameras, ESP32/ESP8266, older Android devices

  • Temporarily block suspected devices and see if stability returns.

Firmware action (important)

  • If this is GL-iNet 4.8.3, I would:

    • check for beta or newer firmware

    • or downgrade one version if this is new behavior

This exact ADDBA failure spam has appeared in multiple MediaTek releases over the last 18 months.


Bottom line

This was a Wi-Fi driver–induced softirq storm caused by a broken AMPDU negotiation loop with one or more clients.
It explains all observed symptoms, matches the logs, and is a known failure mode on mt7986-based routers.

If you want, next we can:

  • identify the exact client from association timing

  • walk through disabling AMPDU cleanly

  • or harden the MT6000 config to make this class of failure impossible going forward


Yes, please identify the exact client


Great — the logs are actually very clear once you line the messages up chronologically.

The exact client that triggered the failure

MAC address:
XX:XX:YY:YY:YY:YY

This station is the one that repeatedly destabilized the router.


How we know (log-level evidence)

Here is the critical repeating sequence from kernel.log (simplified for clarity):

AP SETKEYS ... wcid=27 from XX:XX:YY:YY:YY:YY
PeerAddBARspActionSanity(): ADDBA Resp Ba Policy[0] not support
wifi_sys_disconn_act()
MacTableDeleteEntry(): Del Sta: XX:XX:YY:YY:YY:YY
ap_peer_deauth_action(): receive DE-AUTH from XX:XX:YY:YY:YY:YY

Then, seconds later, the same MAC reconnects and the sequence repeats.

This is the key pattern:

  1. Client XX:XX:YY:YY:YY:YY completes WPA handshake successfully

  2. Immediately attempts AMPDU / Block-Ack (ADDBA) negotiation

  3. Router driver rejects the BA policy

  4. Driver does not back off

  5. Station is deauthenticated

  6. Station reconnects

  7. Loop repeats, generating:

    • kernel interrupts

    • softirq load

    • log spam

    • client churn

Once this loop starts, the router becomes progressively less responsive until reboot.


Why this client (and not the others)

Other MACs appear in the logs, but only this one is consistently associated with:

  • PeerAddBARspActionSanity() failures

  • Immediate deauthentication

  • Repeated reassociation attempts

  • MacTable delete/re-add cycles

That combination is pathognomonic for a Wi-Fi driver negotiation bug triggered by a specific station’s AMPDU behavior.


Likely device class (high probability)

OUI: XX:XX:YY

This OUI commonly maps to:

  • low-cost IoT devices

  • embedded Wi-Fi chipsets

  • devices running older or nonstandard 802.11n/ac stacks

Typical real-world culprits I’ve seen with this exact failure mode:

  • smart plugs / switches

  • ESP-class derivatives with vendor firmware

  • cameras or home-automation bridges

  • older Android-based devices with aggressive Wi-Fi power saving

If you tell me what device corresponds to that MAC on your network, I can usually predict whether this will recur.


What to do next (recommended)

Immediate containment

  1. Block or disconnect XX:XX:YY:YY:YY:YY

    • If stability instantly returns, that fully confirms the diagnosis.

Router-side mitigation (preferred)

  1. Disable AMPDU / Block Ack on the affected band (if exposed)

  2. Disable 802.11ax (Wi-Fi 6) for the SSID this client uses

  3. Put IoT devices on a separate SSID with reduced features

If you want to be surgical

  1. Lock that SSID to:

    • 802.11n only

    • 20 MHz channel width

    • no aggregation features


Bottom line

XX:XX:YY:YY:YY:YY is the trigger.
It repeatedly negotiated an unsupported BA policy, driving a MediaTek Wi-Fi driver into a softirq storm that took the router down until reboot.

Thank you for your update.

Since you have already contacted us via email regarding this matter, we recommend continuing to communicate with our technical support team through email, as it is a more efficient and private channel.


PeerAddBARspActionSanity(): ADDBA Resp Ba Policy[0] not supported

This error is primarily caused by incomplete 802.11n support on client devices. It's commonly seen on Apple devices and IoT devices.
Under normal circumstances, it should not cause any issue.

The log entry indicates that the Wi-Fi STA (client) sent an ADDBA Response frame in which the Block Ack Policy field is set to 0, meaning Delayed Block Ack.

This is why this log keeps reappearing: STA responded with a delay rather than an unsupported status code, which should return 1.

Reference: IEEE 802.11 Block Acknowledgment


Yes, we also recommend testing with the op24 firmware version for comparison purposes, as it utilizes an open-source Wi-Fi driver distinct from the stock firmware.


For instructions on how to export complete logs, please refer to:

After that, you can attach it to an email.
Or send it to us via private message on the forum.