GL-MT1300 freezes, needs reboot

Hi,

I recently got a GL-MT1300 (Beryl) router and was quite happy with it, until I noticed that it regularly freezes and sometimes reboots by itself.
By freeze, I mean connected devices cannot ping the router, each other, or anything else.

I have the issue on firmware version 3.211, and I also tried 3.215 beta1, with a similar outcome.

My setup is pretty straightforward:

  • WAN port connected to my ISP modem (in bridge mode)
  • Two LAN ports connected to computers, and some devices on WiFi
  • Only one static DHCP lease
  • DNS configured to use Cloudflare with DNS over TLS

And that’s it. The only thing “specific” is that one device keeps reconnecting to WiFi; it’s a “smart” device and that’s how it works.

To debug this a bit, I looked for logs after the device restarted, but found none. I also tried capturing the log continuously over SSH, and the last time it broke, I only got a hint that something had gone wrong:

Sat Jun 18 01:06:15 2022 daemon.info dnsmasq-dhcp[3125]: DHCPREQUEST(br-lan) 192.168.34.101 --:--:--:--:--:--
Sat Jun 18 01:06:15 2022 daemon.info dnsmasq-dhcp[3125]: DHCPACK(br-lan) 192.168.34.101 --:--:--:--:--:-- espressif
Sat Jun 18 01:06:15 2022 user.notice mtk-wifi: new_station --:--:--:--:--:-- rax0
Sat Jun 18 01:06:15 2022 kern.warn kernel: [65725.284437] e8c, flush one!
Sat Jun 18 01:06:16 2022 kern.warn kernel: [65725.450231] ea6, flush one!
Sat Jun 18 01:06:16 2022 kern.warn kernel: [65725.920512] Rcv Wcid(3) AddBAReq
Sat Jun 18 01:06:16 2022 kern.warn kernel: [65725.924527] Start Seq = 00000000
Sat Jun 18 01:06:17 2022 kern.warn kernel: [65726.890314] f94, flush one!
Sat Jun 18 01:06:21 2022 kern.warn kernel: [65731.052736] 244, flush one!
Sat Jun 18 01:06:23 2022 kern.warn kernel: [65732.497192] 335, flush one!
Sat Jun 18 01:06:25 2022 kern.warn kernel: [65735.142493] 3aa, flush one!
Sat Jun 18 01:06:29 2022 kern.warn kernel: [65739.384709] 793, flush one!
Sat Jun 18 01:06:30 2022 kern.warn kernel: [65739.442472] 4df, flush one!
Sat Jun 18 01:06:30 2022 kern.warn kernel: [65740.183845] 817, flush one!
Sat Jun 18 01:06:31 2022 kern.warn kernel: [65741.388078] 549, flush one!
Sat Jun 18 01:06:34 2022 user.notice mtk-wifi: del_station --:--:--:--:--:-- rax0
Sat Jun 18 01:06:34 2022 kern.warn kernel: [65743.979777] AP SETKEYS DONE - AKMMap=WPA2PSK, PairwiseCipher=AES, GroupCipher=AES, wcid=3 from C4:DD:57:8D:40:BC
Sat Jun 18 01:06:34 2022 kern.warn kernel: [65743.979777]
Sat Jun 18 01:06:34 2022 daemon.info dnsmasq-dhcp[3125]: DHCPDISCOVER(br-lan) --:--:--:--:--:--
Sat Jun 18 01:06:34 2022 daemon.info dnsmasq-dhcp[3125]: DHCPOFFER(br-lan) 192.168.34.101 --:--:--:--:--:--
Sat Jun 18 01:06:34 2022 daemon.info dnsmasq-dhcp[3125]: DHCPREQUEST(br-lan) 192.168.34.101 --:--:--:--:--:--
Sat Jun 18 01:06:34 2022 daemon.info dnsmasq-dhcp[3125]: DHCPACK(br-lan) 192.168.34.101 --:--:--:--:--:-- espressif
Sat Jun 18 01:06:34 2022 user.notice mtk-wifi: new_station --:--:--:--:--:-- rax0
Sat Jun 18 01:06:35 2022 kern.warn kernel: [65744.754855] Rcv Wcid(3) AddBAReq
Sat Jun 18 01:06:35 2022 kern.warn kernel: [65744.758300] Start Seq = 00000000
Sat Jun 18 01:06:35 2022 kern.warn kernel: [65744.989442] b2a, flush one!
Sat Jun 18 01:06:35 2022 kern.warn kernel: [65745.074565] 617, flush one!
Sat Jun 18 01:06:36 2022 kern.warn kernel: [65745.628829] b96, flush one!
Sat Jun 18 01:06:37 2022 kern.warn kernel: [65746.433537] c1c, flush one!
Sat Jun 18 01:06:37 2022 kern.warn kernel: [65746.611247] Unhandled kernel unaligned access[#1]:
Sat Jun 18 01:06:37 2022 kern.warn kernel: [65746.616051] CPU: 2 PID: 31121 Comm: kworker/2:0 Not tainted 4.14.241 #0
Sat Jun 18 01:06:37 2022 kern.warn kernel: [65746.622817] Workqueue: events_long nf_ct_iterate_destroy [nf_conntrack]
Sat Jun 18 01:06:37 2022 kern.warn kernel: [65746.629407] task: 8fe29980 task.stack: 8e252000
Sat Jun 18 01:06:37 2022 kern.warn kernel: [65746.633921] $ 0 : 00000000 00000001 005e673f 00000001
client_loop: send disconnect: Broken pipe

It seems that there was a kernel panic, but only the first few lines got through SSH before the connection stopped working.
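
To compare crashes even when only the first lines of the oops get through, a small script could pull the header fields out of a saved log. This is only a rough sketch: the function name `summarize_oops` and the regular expressions are my own assumptions, based purely on the lines quoted above, and are not part of the firmware or any tool.

```python
import re

# Marker and field layouts copied from the log excerpt above.
OOPS_START = "Unhandled kernel unaligned access"
CPU_RE = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (\S+)")
WQ_RE = re.compile(r"Workqueue: (\S+) (\S+)")


def summarize_oops(lines):
    """Return one dict per oops header found in an iterable of log lines."""
    results = []
    current = None
    for line in lines:
        if OOPS_START in line:
            # A new oops begins; collect its fields in a fresh dict.
            current = {}
            results.append(current)
            continue
        if current is None:
            continue  # Not inside an oops yet.
        match = CPU_RE.search(line)
        if match:
            current["cpu"] = int(match.group(1))
            current["pid"] = int(match.group(2))
            current["comm"] = match.group(3)
        match = WQ_RE.search(line)
        if match:
            current["workqueue"] = match.group(1)
            current["work_fn"] = match.group(2)
    return results
```

Feeding it a saved capture (for example `summarize_oops(open("router.log"))`, where `router.log` is a hypothetical file name) makes it easy to see whether every crash comes from the same work function.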

I have no idea what else to check from there.

My first thought would be: has it got a stable power source?
Sounds to me like an unstable power supply.

I’m using the power adapter that came in the box, plugged into a UPS. I could try looking for another one with the same specs, but I’m not sure the spare ones I have around would be any better.

Also, it’s not easy to troubleshoot; yesterday it crashed three times within two hours, and it’s been running fine since then.
Something else I’m currently trying is to stop the device that keeps disconnecting from and reconnecting to the WiFi, but it is not clear whether that helped or whether I’ve just been lucky so far.

It was just a guess, because I can imagine a lot of people will use another USB-C source … okay.

Next guess: did you keep your settings while jumping between firmware versions? If so, upgrade one last time to the latest stable release, but do not keep the settings.

I did keep settings when moving up to the latest version (when I got the device a few weeks ago), and also kept them when going to the beta version, which might have been a source of problems.

When I reverted to the stable one (3.211) I did clear the settings (and packages, since I don’t use anything beyond the basic features). So far no crash, but I also stopped that pesky WiFi device.

I guess I’ll keep this running for a few days to see if it remains stable. It would be nice if that was all there is to it, since it’s otherwise a pretty neat device.

So, I got this as clean as possible, including clearing all settings, but the issue happened again. Router froze with blue light, then restarted.

The last log I got through SSH before the network went down:

Mon Jun 20 18:34:12 2022 user.notice mtk-wifi: del_station --:--:--:--:--:-- rax0
Mon Jun 20 18:34:12 2022 daemon.info dnsmasq-dhcp[21064]: DHCPDISCOVER(br-lan) --:--:--:--:--:--
Mon Jun 20 18:34:12 2022 daemon.info dnsmasq-dhcp[21064]: DHCPOFFER(br-lan) 192.168.34.151 --:--:--:--:--:--
Mon Jun 20 18:34:12 2022 kern.warn kernel: [173998.835108] AP SETKEYS DONE - AKMMap=WPA2PSK, PairwiseCipher=AES, GroupCipher=AES, wcid=2 from C4:DD:57:8D:40:BC
Mon Jun 20 18:34:12 2022 kern.warn kernel: [173998.835108]
Mon Jun 20 18:34:12 2022 daemon.info dnsmasq-dhcp[21064]: DHCPREQUEST(br-lan) 192.168.34.151 --:--:--:--:--:--
Mon Jun 20 18:34:12 2022 daemon.info dnsmasq-dhcp[21064]: DHCPACK(br-lan) 192.168.34.151 --:--:--:--:--:-- espressif
Mon Jun 20 18:34:12 2022 user.notice mtk-wifi: new_station --:--:--:--:--:-- rax0
Mon Jun 20 18:34:13 2022 kern.warn kernel: [173999.611894] Rcv Wcid(2) AddBAReq
Mon Jun 20 18:34:13 2022 kern.warn kernel: [173999.615321] Start Seq = 00000000
Mon Jun 20 18:34:31 2022 user.notice mtk-wifi: del_station --:--:--:--:--:-- rax0
Mon Jun 20 18:34:31 2022 kern.warn kernel: [174017.789568] AP SETKEYS DONE - AKMMap=WPA2PSK, PairwiseCipher=AES, GroupCipher=AES, wcid=2 from C4:DD:57:8D:40:BC
Mon Jun 20 18:34:31 2022 kern.warn kernel: [174017.789568]
Mon Jun 20 18:34:31 2022 daemon.info dnsmasq-dhcp[21064]: DHCPDISCOVER(br-lan) --:--:--:--:--:--
Mon Jun 20 18:34:31 2022 daemon.info dnsmasq-dhcp[21064]: DHCPOFFER(br-lan) 192.168.34.151 --:--:--:--:--:--
Mon Jun 20 18:34:31 2022 daemon.info dnsmasq-dhcp[21064]: DHCPREQUEST(br-lan) 192.168.34.151 --:--:--:--:--:--
Mon Jun 20 18:34:31 2022 daemon.info dnsmasq-dhcp[21064]: DHCPACK(br-lan) 192.168.34.151 --:--:--:--:--:-- espressif
Mon Jun 20 18:34:31 2022 user.notice mtk-wifi: new_station --:--:--:--:--:-- rax0
Mon Jun 20 18:34:32 2022 kern.warn kernel: [174018.571080] Rcv Wcid(2) AddBAReq
Mon Jun 20 18:34:32 2022 kern.warn kernel: [174018.574719] Start Seq = 00000000
Mon Jun 20 18:34:50 2022 user.notice mtk-wifi: del_station --:--:--:--:--:-- rax0
Mon Jun 20 18:34:50 2022 daemon.info dnsmasq-dhcp[21064]: DHCPDISCOVER(br-lan) --:--:--:--:--:--
Mon Jun 20 18:34:50 2022 daemon.info dnsmasq-dhcp[21064]: DHCPOFFER(br-lan) 192.168.34.151 --:--:--:--:--:--
Mon Jun 20 18:34:50 2022 kern.warn kernel: [174036.432044] AP SETKEYS DONE - AKMMap=WPA2PSK, PairwiseCipher=AES, GroupCipher=AES, wcid=2 from --:--:--:--:--:--
Mon Jun 20 18:34:50 2022 kern.warn kernel: [174036.432044]
Mon Jun 20 18:34:50 2022 daemon.info dnsmasq-dhcp[21064]: DHCPREQUEST(br-lan) 192.168.34.151 --:--:--:--:--:--
Mon Jun 20 18:34:50 2022 daemon.info dnsmasq-dhcp[21064]: DHCPACK(br-lan) 192.168.34.151 --:--:--:--:--:-- espressif
Mon Jun 20 18:34:50 2022 user.notice mtk-wifi: new_station --:--:--:--:--:-- rax0
Mon Jun 20 18:34:51 2022 kern.warn kernel: [174037.205511] Rcv Wcid(2) AddBAReq
Mon Jun 20 18:34:51 2022 kern.warn kernel: [174037.209115] Start Seq = 00000000
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.501549] Unhandled kernel unaligned access[#1]:
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.506440] CPU: 3 PID: 12812 Comm: kworker/3:0 Not tainted 4.14.241 #0
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.513288] Workqueue: events_long nf_ct_iterate_destroy [nf_conntrack]
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.519964] task: 8cdb0000 task.stack: 8b5e0000
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.524556] $ 0   : 00000000 00000001 0103ad77 001fff00
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.529853] $ 4   : 0103ad37 80743c40 900dc7e6 00000001
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.535150] $ 8   : 00000000 00000001 00009e4b 001d6381
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.540447] $12   : 00000001 0000023a 00000000 77f512a0
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.545745] $16   : 8e150558 8e15055c 8b4200e0 fffffff0
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.551042] $20   : 00000000 00000b0a 805e0000 00000009
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.556339] $24   : 00000000 80008f34
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.561639] $28   : 8b5e0000 8b5e1da0 8e150000 8e14b464
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.566937] Hi    : 0000021e
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.569884] Lo    : 3750e000
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.572863] epc   : 803851f0 dst_release+0x20/0xb4
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.577762] ra    : 8e14b464 nf_ct_ext_destroy+0x44/0x64 [nf_conntrack]
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.584431] Status: 11007c03     KERNEL EXL IE
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.588693] Cause : 40800010 (ExcCode 04)
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.592767] BadVA : 0103ad77
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.595718] PrId  : 0001992f (MIPS 1004Kc)
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.599877] Modules linked in: rt2800usb rt2800lib pppoe ppp_async option usb_wwan sierra_net sierra rt2x00usb rt2x00lib rndis_host qmi_wwan pppox ppp_mppe ppp_generic nf_nat_pptp nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet nf_conntrack_pptp mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE huawei_cdc_ncm ebtable_nat ebtable_filter ebtable_broute cp210x cfg80211 cdc_ncm cdc_ether xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_hashlimit xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_IPMARK xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY wireguard usbserial usbnet ts_fsm ts_bm slhc nft_set_rbtree nft_set_hash nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.671418]  nft_reject nft_redir_ipv4 nft_redir nft_quota nft_numgen nft_nat nft_meta nft_masq_ipv4 nft_masq nft_log nft_limit nft_flow_offload nft_exthdr nft_ct nft_counter nft_chain_route_ipv6 nft_chain_route_ipv4 nft_chain_nat_ipv4 nf_tables_ipv6 nf_tables_ipv4 nf_tables_inet nf_tables nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_redirect nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc nf_conntrack_ipv4 nf_nat_ipv4 nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv4 nf_conntrack_tftp nf_conntrack_snmp nf_conntrack_sip nf_conntrack_rtcache nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack_broadcast ts_kmp nf_conntrack_amanda mt_wifi iptable_raw iptable_mangle iptable_filter
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.741972]  ipt_ECN ipheth ip_tables ebtables ebt_vlan ebt_stp ebt_redirect ebt_pkttype ebt_mark_m ebt_mark ebt_limit ebt_among ebt_802_3 crc_ccitt compat_xtables compat cdc_wdm cdc_acm xt_u32 fuse sch_teql sch_sfq sch_red sch_prio sch_pie sch_multiq sch_gred sch_fq sch_dsmark sch_codel sch_cbq em_text em_nbyte em_meta em_cmp act_simple act_police act_pedit act_ipt act_gact act_csum libcrc32c act_connmark sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred i2c_gpio i2c_algo_bit i2c_dev ledtrig_usbport xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.812787]  ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6t_NPT ip6t_MASQUERADE nf_nat_masquerade_ipv6 nf_nat nf_conntrack nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ip6_udp_tunnel udp_tunnel tun vfat fat ntfs nls_utf8 nls_iso8859_1 nls_cp437 sha1_generic ecb uas mmc_block usb_storage sdhci_pltfm sdhci mtk_sd mmc_core leds_gpio xhci_plat_hcd xhci_pci xhci_mtk xhci_hcd uhci_hcd ohci_platform ohci_hcd ahci libahci libata ehci_platform sd_mod scsi_mod ehci_hcd gpio_button_hotplug ext4 mbcache jbd2 exfat usbcore nls_base usb_common mii crc32c_generic
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.875606] Process kworker/3:0 (pid: 12812, threadinfo=8b5e0000, task=8cdb0000, tls=00000000)
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.884267] Stack : 00000000 fffffff0 75015000 0000065c 8b4200e0 8e150558 8e15055c 8e14b464
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.892696]         00000000 00000b0a 805e0000 00000009 8b4200e0 8b420120 00000003 8e140474
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.901120]         8cdb0000 805e0000 8b4200e0 8e1412e4 8b4200e0 803b8b00 8063dda0 805de1f4
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.909542]         8123fda0 00000001 8b4200e0 8e1420b4 8cdb0000 804c5310 8cdb0320 804c0000
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.917965]         805e0000 806012a0 8e150000 8e150000 00000020 00000001 8e1504e4 8e14d184
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.926390]         ...
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.928915] Call Trace:
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.931475] [<803851f0>] dst_release+0x20/0xb4
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.936155] [<8e14b464>] nf_ct_ext_destroy+0x44/0x64 [nf_conntrack]
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.942549] [<8e140474>] nf_conntrack_free+0x30/0x80 [nf_conntrack]
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.948925] [<803b8b00>] nf_conntrack_destroy+0x20/0x2c
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.954272] [<8e1420b4>] nf_ct_iterate_destroy+0x1d4/0x4b4 [nf_conntrack]
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.961143] Code: afb00014  0000000f  24820040 <c0500000> 2603ffff  e0430000  1060fffc  2610ffff  0000000f
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.970957]
Mon Jun 20 18:35:00 2022 kern.warn kernel: [174045.974000] ---[ end trace c71ecd6281b715ce ]---

Searching around doesn’t yield much for this. This thread seems to have a similar stacktrace before the kernel panic, but no solution so far.

All I can say is that it’s unlikely to be an issue related to free memory. Monitoring free/available memory shows no issue there:

Mon Jun 20 18:34:50 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         253124       55704      166464         156       30956      159288
Swap:             0           0           0
Mon Jun 20 18:52:09 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         253124       55900      166264         160       30960      159092
Swap:             0           0           0
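
For anyone who wants to compare such snapshots automatically, here is a minimal sketch that parses the output of `free` and reports the change in available memory. The function names are hypothetical, the values are presumably in KiB, and the parser assumes the text starts with the header row (without the date line).

```python
def parse_free(text):
    """Parse `free` output into {"Mem": {...}, "Swap": {...}} with integer values."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()  # total, used, free, shared, buff/cache, available
    table = {}
    for ln in lines[1:]:
        label, *values = ln.split()
        # zip() stops at the shorter side, so the Swap row only gets total/used/free.
        table[label.rstrip(":")] = dict(zip(header, (int(v) for v in values)))
    return table


def available_delta(before_text, after_text):
    """Change in the 'available' column between two snapshots."""
    before = parse_free(before_text)["Mem"]["available"]
    after = parse_free(after_text)["Mem"]["available"]
    return after - before
```

With the two snapshots above, the difference in the available column is -196, which is negligible next to roughly 159000 available.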

I’m not an expert this deep in the kernel, but from what I can read:

  • nf_conntrack is netfilter’s connection tracking module
  • nf_ct_iterate_destroy wants to clean up a connection
  • Modules linked in: rt2800usb rt2800lib […] is maybe some kind of closed-source issue
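
A few fields of the register dump can be checked by hand. The sketch below is only an illustration with made-up helper names; it relies on two general facts about 32-bit MIPS: the ExcCode field sits in bits 6..2 of the Cause register, and kernel addresses normally live at or above 0x80000000.

```python
def exc_code(cause: int) -> int:
    """Extract the ExcCode field (bits 6..2) from a MIPS Cause register value."""
    return (cause >> 2) & 0x1F


def is_word_aligned(addr: int) -> bool:
    """True if the address is a multiple of 4 bytes."""
    return addr % 4 == 0


def looks_like_kernel_pointer(addr: int) -> bool:
    """True if the address is in the kernel half of the 32-bit MIPS address space."""
    return addr >= 0x80000000


cause = 0x40800010   # "Cause" line from the trace
bad_va = 0x0103AD77  # "BadVA" line from the trace

print(exc_code(cause))                    # 4, address error on a load
print(is_word_aligned(bad_va))            # False
print(looks_like_kernel_pointer(bad_va))  # False
```

So the faulting address is not only misaligned, it is not a plausible kernel address at all. That would be consistent with dst_release being handed a stale or corrupted pointer while conntrack entries are destroyed, but this is just my reading of the dump, not a confirmed diagnosis.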

All this should appear more often if it were a general issue. I can’t reproduce or confirm it on my Beryl (firmware from the latest stable through beta to snapshot).
Another guess: is IPv6 active?

I can’t quickly find the RCU part in the OpenWrt source. Maybe someone else with more insight can take over :)

I did not enable IPv6 (my ISP doesn’t even provide it anyway).

I do not have anything peculiar set up, such as a custom firewall configuration. The only changes I made after a full reset (installing the latest stable release without keeping settings or packages) were setting up some static DHCP leases and changing the IP range.
I doubt this much would have such an impact, especially given the small number of reports.

What you are saying is interesting though. I did mention it because I was suspicious of it, but I have a device that literally connects and disconnects every twenty seconds or so. I could see that triggering a rare issue in this area (I doubt any sane network would have such a situation).
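
To put a number on that reconnect rate, here is a minimal sketch that measures the time between `new_station` lines in a captured log. The function name is hypothetical, and it assumes every line starts with a timestamp in the same format as the excerpts in this thread (English day and month names).

```python
from datetime import datetime

STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Mon Jun 20 18:34:12 2022"


def reconnect_intervals(lines, marker="mtk-wifi: new_station"):
    """Return the seconds elapsed between consecutive lines containing marker."""
    stamps = []
    for line in lines:
        if marker in line:
            # The first five whitespace-separated fields form the timestamp.
            stamp_text = " ".join(line.split()[:5])
            stamps.append(datetime.strptime(stamp_text, STAMP_FORMAT))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]
```

On the three `new_station` lines from the 20 June log (18:34:12, 18:34:31 and 18:34:50) this returns `[19.0, 19.0]`, so the device rejoins about every 19 seconds.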

I will disable this again, and see if I can keep a stable system for longer.

Same issue here. Brand new Beryl with firmware 3.211. I used 3 different power supplies: the original USB-C adapter that came with the router, an Apple 67W MacBook charger, and the output from a Thunderbolt 3 dock (100W). Same thing. It runs great for a few hours or days, then locks up. I can’t ping it or connect to it in any way. Only a hard reboot brings it back.

I’ll post more info from logs the next time it happens now that I’m looking for it.

Cool unit, but I can’t recommend this to anyone because of instability.

I tried the two “fixes” from that other thread: disabling hardware NAT offload and stopping gl_tertf. I figured that, worst case, it would not change anything, so why not.

At the very least the logs are not spammed with the following line anymore.

kern.warn kernel: [ 4595.849410] 545, flush one!

I’ll let it run this way; if it reaches an uptime of more than two days, then maybe these changes do help.

I’m now at 2 days 7 hours of uptime, while it would not go over 10 hours previously. I’m not sure the “fixes” above did anything (I’ll give it a bit more time) but it is encouraging at least.

I will let the developers check your fix. Maybe hardware NAT and gl_tertf are the problem.