I recently got a GL-MT1300 (Beryl) router and was quite happy with it, until I noticed that it regularly freezes and sometimes reboots by itself.
By freeze, I mean connected devices cannot ping the router, each other, or anything else.
I have the issue on firmware version 3.211, and I tried the 3.215 beta1 with a similar outcome.
My setup is pretty straightforward:
WAN port connected to my ISP modem (in bridge mode)
Two LAN ports connected to computers, and some devices on WiFi
Only one static DHCP lease
DNS configured to use Cloudflare with DNS over TLS
And that’s it. The only thing “specific” is that one device keeps reconnecting to WiFi; it’s a “smart” device and that’s how it works.
Trying to debug things a bit, I looked for logs after the router restarted, but found none. I also tried logging continuously over SSH, and the last time it broke, I only got a hint that something had gone wrong:
Sat Jun 18 01:06:15 2022 daemon.info dnsmasq-dhcp[3125]: DHCPREQUEST(br-lan) 192.168.34.101 --:--:--:--:--:--
Sat Jun 18 01:06:15 2022 daemon.info dnsmasq-dhcp[3125]: DHCPACK(br-lan) 192.168.34.101 --:--:--:--:--:-- espressif
Sat Jun 18 01:06:15 2022 user.notice mtk-wifi: new_station --:--:--:--:--:-- rax0
Sat Jun 18 01:06:15 2022 kern.warn kernel: [65725.284437] e8c, flush one!
Sat Jun 18 01:06:16 2022 kern.warn kernel: [65725.450231] ea6, flush one!
Sat Jun 18 01:06:16 2022 kern.warn kernel: [65725.920512] Rcv Wcid(3) AddBAReq
Sat Jun 18 01:06:16 2022 kern.warn kernel: [65725.924527] Start Seq = 00000000
Sat Jun 18 01:06:17 2022 kern.warn kernel: [65726.890314] f94, flush one!
Sat Jun 18 01:06:21 2022 kern.warn kernel: [65731.052736] 244, flush one!
Sat Jun 18 01:06:23 2022 kern.warn kernel: [65732.497192] 335, flush one!
Sat Jun 18 01:06:25 2022 kern.warn kernel: [65735.142493] 3aa, flush one!
Sat Jun 18 01:06:29 2022 kern.warn kernel: [65739.384709] 793, flush one!
Sat Jun 18 01:06:30 2022 kern.warn kernel: [65739.442472] 4df, flush one!
Sat Jun 18 01:06:30 2022 kern.warn kernel: [65740.183845] 817, flush one!
Sat Jun 18 01:06:31 2022 kern.warn kernel: [65741.388078] 549, flush one!
Sat Jun 18 01:06:34 2022 user.notice mtk-wifi: del_station --:--:--:--:--:-- rax0
Sat Jun 18 01:06:34 2022 kern.warn kernel: [65743.979777] AP SETKEYS DONE - AKMMap=WPA2PSK, PairwiseCipher=AES, GroupCipher=AES, wcid=3 from C4:DD:57:8D:40:BC
Sat Jun 18 01:06:34 2022 kern.warn kernel: [65743.979777]
Sat Jun 18 01:06:34 2022 daemon.info dnsmasq-dhcp[3125]: DHCPDISCOVER(br-lan) --:--:--:--:--:--
Sat Jun 18 01:06:34 2022 daemon.info dnsmasq-dhcp[3125]: DHCPOFFER(br-lan) 192.168.34.101 --:--:--:--:--:--
Sat Jun 18 01:06:34 2022 daemon.info dnsmasq-dhcp[3125]: DHCPREQUEST(br-lan) 192.168.34.101 --:--:--:--:--:--
Sat Jun 18 01:06:34 2022 daemon.info dnsmasq-dhcp[3125]: DHCPACK(br-lan) 192.168.34.101 --:--:--:--:--:-- espressif
Sat Jun 18 01:06:34 2022 user.notice mtk-wifi: new_station --:--:--:--:--:-- rax0
Sat Jun 18 01:06:35 2022 kern.warn kernel: [65744.754855] Rcv Wcid(3) AddBAReq
Sat Jun 18 01:06:35 2022 kern.warn kernel: [65744.758300] Start Seq = 00000000
Sat Jun 18 01:06:35 2022 kern.warn kernel: [65744.989442] b2a, flush one!
Sat Jun 18 01:06:35 2022 kern.warn kernel: [65745.074565] 617, flush one!
Sat Jun 18 01:06:36 2022 kern.warn kernel: [65745.628829] b96, flush one!
Sat Jun 18 01:06:37 2022 kern.warn kernel: [65746.433537] c1c, flush one!
Sat Jun 18 01:06:37 2022 kern.warn kernel: [65746.611247] Unhandled kernel unaligned access[#1]:
Sat Jun 18 01:06:37 2022 kern.warn kernel: [65746.616051] CPU: 2 PID: 31121 Comm: kworker/2:0 Not tainted 4.14.241 #0
Sat Jun 18 01:06:37 2022 kern.warn kernel: [65746.622817] Workqueue: events_long nf_ct_iterate_destroy [nf_conntrack]
Sat Jun 18 01:06:37 2022 kern.warn kernel: [65746.629407] task: 8fe29980 task.stack: 8e252000
Sat Jun 18 01:06:37 2022 kern.warn kernel: [65746.633921] $ 0 : 00000000 00000001 005e673f 00000001
client_loop: send disconnect: Broken pipe
It seems that there was a kernel panic, but only the first few lines got through SSH before the connection stopped working.
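In case it helps anyone else capture the same thing: a rough Python sketch of how the SSH logging could write each line to a local file as it arrives and flag lines that look like a kernel fault. The file name, the usage line and the extra marker strings ("Kernel panic", "Oops") are my own assumptions; only the "Unhandled kernel unaligned access" marker comes from the log above.

```python
import re
import sys

# Substrings that suggest a kernel fault. The first is taken from the log
# above; the other two are assumptions about what else might show up.
CRASH_MARKERS = (
    "Unhandled kernel unaligned access",
    "Kernel panic",
    "Oops",
)

# Matches e.g. "kern.warn kernel: [65746.611247] some message"
KERNEL_LINE = re.compile(
    r"kern\.\w+ kernel: \[\s*(?P<uptime>\d+\.\d+)\] (?P<msg>.*)"
)


def parse_kernel_line(line):
    """Return (uptime_seconds, message) for a kernel log line, else None."""
    match = KERNEL_LINE.search(line)
    if match is None:
        return None
    return float(match.group("uptime")), match.group("msg")


def is_crash_marker(line):
    """True if the line is a kernel message containing a known crash marker."""
    parsed = parse_kernel_line(line)
    return parsed is not None and any(m in parsed[1] for m in CRASH_MARKERS)


def main(log_path):
    # Line-buffered append, so the last lines reach the file even if the
    # SSH session dies right after them.
    with open(log_path, "a", buffering=1) as log:
        for line in sys.stdin:
            log.write(line)
            if is_crash_marker(line):
                print("possible kernel crash:", line.strip(), file=sys.stderr)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: watch_log.py LOGFILE < syslog-stream", file=sys.stderr)
```

It would be fed from something like `ssh root@<router> logread -f | python3 watch_log.py beryl.log` (the script name is hypothetical). It cannot recover lines the router never managed to send, so it only helps keep what did get through.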
I’m using the power adapter that came in the box, plugged into a UPS. I could try another one with the same specs, but I’m not sure the spare ones I have around would be any better.
Also, it’s not easy to troubleshoot; yesterday it crashed three times within two hours, and it’s been running fine since then.
Something else I’m currently trying is to stop the device that keeps disconnecting from and reconnecting to the WiFi, but it is not clear whether that helped or whether I’ve just been lucky so far.
It was just a guess, because I can imagine a lot of people will use another USB-C source … okay.
Next guess: Did you keep your settings while jumping between firmware versions? Then just upgrade one last time to the latest stable, but do not keep the settings.
I did keep settings when moving up to the latest version (when I got the device a few weeks ago), and also kept them when going to the beta version, which might have been a source of problems.
When I reverted to the stable one (3.211) I did clean settings (and packages, since I don’t use anything beyond the basic features). So far no crash, but I also stopped that pesky wifi device.
I guess I’ll keep this running for a few days to see if it remains stable. It would be nice if that was all there is to it, since it’s otherwise a pretty neat device.
Searching around doesn’t yield much for this. This thread seems to have a similar stacktrace before the kernel panic, but no solution so far.
All I can say is that it’s unlikely to be an issue related to free memory. Monitoring free/available memory shows no issue there:
Mon Jun 20 18:34:50 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         253124       55704      166464         156       30956      159288
Swap:             0           0           0
Mon Jun 20 18:52:09 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         253124       55900      166264         160       30960      159092
Swap:             0           0           0
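For anyone who wants to track this as a percentage over time rather than eyeballing the numbers, here is a small Python sketch that parses the text printed by `free` and computes how much of the total is used. The function names are made up for illustration, it assumes the column layout shown above, and it expects the `free` output alone, without the date line.

```python
# First sample from above, without the timestamp line.
SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:         253124       55704      166464         156       30956      159288
Swap:             0           0           0
"""


def parse_free(output):
    """Parse `free` output into {row_name: {column_name: value}}."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = {}
    for ln in lines[1:]:
        name, *values = ln.split()
        # zip() stops at the shorter side, so the Swap row simply gets
        # the first three columns (total, used, free).
        rows[name.rstrip(":")] = dict(zip(header, map(int, values)))
    return rows


def used_percent(rows):
    """Percentage of total memory reported as used."""
    mem = rows["Mem"]
    return 100.0 * mem["used"] / mem["total"]


if __name__ == "__main__":
    print(f"used: {used_percent(parse_free(SAMPLE)):.1f}%")
```

With the first sample this works out to roughly 22% used, which matches the impression that memory is nowhere near exhausted.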
I’m not an expert this deep in the kernel, but from what I can read:
nf_conntrack is netfilter’s connection tracking module
nf_ct_iterate_destroy wants to clean up tracked connections
Modules linked in: rt2800usb rt2800lib […] is maybe some kind of closed-source issue
All this should appear more often if it were a general issue. I can’t reproduce or confirm it on my Beryl (firmwares from the last stable over beta to snapshot).
Another guess: is IPv6 active?
I couldn’t quickly find the RCU part in the OpenWrt source. Maybe someone else with more insight can take over.
I did not enable IPv6 (my ISP doesn’t even provide it anyway).
I do not have anything peculiar set up, like a custom firewall configuration. The only changes I made after a full reset (installing the latest stable release without keeping settings or packages) were setting up some static DHCP leases and changing the IP range.
I doubt this much would have such an impact, especially given the small number of reports.
What you are saying is interesting though. I mentioned it because I was suspicious of it: I have a device that literally connects and disconnects every twenty seconds or so. I could see that triggering a rare issue in this area (I doubt any sane network would have such a situation).
I will disable this again, and see if I can keep a stable system for longer.
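To put a number on the "every twenty seconds or so", here is a rough Python sketch that measures the time between successive new_station events per MAC address in a saved log. It assumes the mtk-wifi line format shown in my log above; the function name is just something I made up, and the MAC address in the usage example is a placeholder.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches e.g.
# "Sat Jun 18 01:06:15 2022 user.notice mtk-wifi: new_station <mac> rax0"
STATION_EVENT = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) \S+ mtk-wifi: "
    r"(?P<event>new_station|del_station) (?P<mac>\S+) (?P<iface>\S+)"
)


def reconnect_intervals(lines):
    """Map each MAC to the seconds between its successive new_station events."""
    last_seen = {}
    intervals = defaultdict(list)
    for line in lines:
        match = STATION_EVENT.match(line)
        if match is None or match.group("event") != "new_station":
            continue
        # Collapse the double space syslog uses for single-digit days.
        stamp = " ".join(match.group("ts").split())
        ts = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        mac = match.group("mac")
        if mac in last_seen:
            intervals[mac].append((ts - last_seen[mac]).total_seconds())
        last_seen[mac] = ts
    return dict(intervals)


if __name__ == "__main__":
    sample = [
        "Sat Jun 18 01:06:15 2022 user.notice mtk-wifi: new_station aa:bb:cc:dd:ee:ff rax0",
        "Sat Jun 18 01:06:34 2022 user.notice mtk-wifi: del_station aa:bb:cc:dd:ee:ff rax0",
        "Sat Jun 18 01:06:34 2022 user.notice mtk-wifi: new_station aa:bb:cc:dd:ee:ff rax0",
    ]
    print(reconnect_intervals(sample))
```

The sample lines use the timestamps from my log, where the two joins are 19 seconds apart. Running it over a longer capture would show whether the crashes line up with bursts of reconnects, though that would still only be a correlation.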
Same issue here. Brand new Beryl with firmware 3.211. I used 3 different power supplies: the original USB-C one that came with the router, an Apple 67W MacBook charger, and the output from a Thunderbolt 3 dock (100W). Same thing. Runs great for a few hours/days, then locks up. Can’t ping it, can’t connect to it in any way. Only a hard reboot gets it going again.
I’ll post more info from logs the next time it happens now that I’m looking for it.
Cool unit, but I can’t recommend this to anyone because of instability.
I tried the two “fixes” from that other thread: disabling hardware NAT offload and stopping gl_tertf. I figured that in the worst case it would not change anything, so why not.
At the very least the logs are not spammed with the following line anymore.
kern.warn kernel: [ 4595.849410] 545, flush one!
I’ll let it run this way; if it reaches an uptime of more than two days, then maybe these changes do help.
I’m now at 2 days 7 hours of uptime, while it would not go over 10 hours previously. I’m not sure the “fixes” above did anything (I’ll give it a bit more time) but it is encouraging at least.
I had a similar problem with the Beryl MT1300. The unit freezes randomly, with no internet traffic, and the device cannot be pinged. A hard power OFF/ON is the only way to make it functional again.
I also noticed that the plastic casing is very warm (compared to MikroTik, RasPi, and other SBC devices).
I have experienced desktops freezing where the CPU fan has failed.
My hypothesis is that the freeze is caused by the hardware overheating in the GL.iNet devices. I now keep the MT1300 flipped (bottom facing up) to allow more convection of the warm air coming out of the device. Even then the top plastic is still quite warm. Not sure if GL.iNet is installing heat sinks on the CPU and other chips (network, RAM, etc.).
Please note, one common cause of MT1300 freezes is a bad power adapter or power source. It may crash the kernel. Heat is not a problem, unless there is a hardware failure.
Same issue here. I tried all the chargers, including 2 original ones and 3 other high-quality branded ones. I swapped in a second Beryl that I bought. The same issue persists. My Beryl doesn’t use its WiFi radio, because I have two APs connected to it, plus a USB modem as a backup connection. The uptime is 3 days at most, then it restarts, then after at most another 2 days it freezes and needs a restart.
Have you tried the proposed fixes above? It had no negative impact and I’ve gone from a maximum uptime of a day to not having any random freeze since I last posted here.
I’ve tried, no luck for me. I think that my problem is memory related, because I track my RAM usage. It normally stays at 37-39%; when it goes higher than that, the router reboots or freezes within the next 2-3 hours. I left it alone with only one device connected, with the VPN server, USB tethering, PPPoE, and SD Samba sharing on, and it got to 7 days of uptime, so I was convinced that the popular power supply explanation doesn’t really apply in my case.
I bought one Huawei E3276 so that it would be officially supported by GL.iNet, and two Huawei E3372s (seen as a modem device, AT commands work) and an E3372h (seen as a tethering device), because the tutorial said they work. Found them here (Internet - GL.iNet Docs)