Flint 2 GL-MT6000 crash - gl_sdk4_tertf

I’m running the latest firmware v4.8.3 on my Flint 2. It rebooted today, saving a crash log that indicates that the gl_sdk4_tertf kernel module did something bad during WPA processing.

Oops#1 Part1
<7>[602995.790988]  iptable_raw
<5>[602995.866016] 7986@C15L3,WPABuildPairMsg1() 5310: <=== send Msg1 of 4-way
<7>[602995.878005]  iptable_nat iptable_mangle iptable_filter ipt_ECN ipheth ip6table_raw ip_tables huawei_cdc_ncm exfat crc_ccitt cdc_wdm cdc_ncm cdc_ether cdc_acm asn1_decoder arptable_filter arpt_mangle arp_tables fuse sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6table_nat nf_nat ip6t_NPT nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ifb veth tun ovpn_dco_v2 udp_tunnel ip6_udp_tunnel dns_mark vfat fat ntfs nls_utf8 nls_iso8859_1 nls_cp437 shortcut_fe_ipv6 shortcut_fe seqiv ghash_generic gcm ctr chacha20poly1305 mtdoops mtk_warp mtkhnat leds_gpio
<7>[602995.878079]  uhci_hcd ohci_platform ohci_hcd fsl_mph_dr_of ehci_platform ehci_fsl kmwan ehci_hcd gpio_button_hotplug gl_sdk4_tertf gl_repeater gl_sdk4_black_white_list f2fs ext4 mbcache jbd2 conninfra crc32c_generic crc32_generic gl_sdk4_hw_info
<7>[602995.996734] CPU: 3 PID: 20559 Comm: lua Tainted: P                  5.4.238 #0
<7>[602996.004028] Hardware name: GL.iNet GL-MT6000 (DT)
<7>[602996.008806] pstate: 00000005 (nzcv daif -PAN -UAO)
<7>[602996.013722] pc : subnet_free+0x37c/0xbc0 [gl_sdk4_tertf]
<7>[602996.019136] lr : subnet_free+0x364/0xbc0 [gl_sdk4_tertf]
<7>[602996.024518] sp : ffffffc01198bcc0
<7>[602996.027906] x29: ffffffc01198bcc0 x28: ffffff803a300928
<7>[602996.033290] x27: ffffffc0089735d0 x26: ffffffc010ace938
<7>[602996.038673] x25: ffffffc010ace7b0 x24: ffffffc010ace8c4
<7>[602996.044057] x23: ffffff8032a24000 x22: ffffff803a3008f0
<7>[602996.049440] x21: ffffffc0089735d8 x20: 0000000000000000
<7>[602996.054823] x19: ffffff803a300930 x18: 0000000000000000
<7>[602996.060207] x17: 0000000000000000 x16: 0000000000000000
<7>[602996.065591] x15: 0000000000000000 x14: 0000000000000000
<7>[602996.070975] x13: 0000000000000000 x12: 0000000000000000
<7>[602996.076358] x11: 0000000000000000 x10: ffffffc01198bcc0
<7>[602996.081742] x9 : 00000000ffffffd0 x8 : ffffff8031973000
<7>[602996.087125] x7 : 0000000000000027 x6 : ffffff8031973b0f
<7>[602996.092510] x5 : 0000000000000000 x4 : 0000000000000028
<7>[602996.097894] x3 : 0000000000001000 x2 : 0000000000000000
<7>[602996.103278] x1 : ffffffc0089735d8 x0 : ffffff8032a24000
<7>[602996.108663] Call trace:
<7>[602996.111219]  subnet_free+0x37c/0xbc0 [gl_sdk4_tertf]
<7>[602996.116261]  seq_read+0x13c/0x528
<7>[602996.119655]  proc_reg_read+0x5c/0xc8
<7>[602996.123306]  __vfs_read+0x18/0x40
<7>[602996.126694]  vfs_read+0xc8/0x158
<7>[602996.129996]  ksys_read+0x4c/0xc8
<7>[602996.133298]  __arm64_sys_read+0x18/0x20
<7>[602996.137211]  el0_svc_common.constprop.2+0x7c/0x110
<7>[602996.142074]  el0_svc_handler+0x20/0x80
<7>[602996.145897]  el0_svc+0x8/0x680
<0>[602996.149029] Code: 54000180 aa1403e2 aa1503e1 aa1703e0 (f8410443)
<4>[602996.155195] ---[ end trace 4248425451975989 ]---
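
For anyone reading a similar log: the `pc :` line names the function and, in square brackets, the module that was executing when the fault occurred. The following is a minimal sketch for pulling that out of a saved log. The helper name `faulting_module` is hypothetical, and it assumes the log uses the same `pc : symbol+offset/size [module]` layout shown above. If the fault is in the core kernel rather than a module, there is no bracketed name and the helper prints nothing.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel crash log on stdin and print
# "<module> <symbol+offset/size>" taken from the first "pc :" line.
faulting_module() {
    # \1 captures the symbol, \2 the module name inside [...]
    sed -n 's/^.*pc : \([^ ]*\) \[\([^]]*\)\].*$/\2 \1/p' | head -n 1
}

# Example (the file name is an assumption):
#   faulting_module < crash.log
```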

I believe the faulty kernel module is important for high packet throughput. Is there a fix for this that doesn’t reduce router performance?

Hi

Could you please clarify and provide the following:

  1. Could this issue be reproduced consistently?
  2. If so, could you provide detailed steps to reproduce it?
  3. Please export the full logs and send them to us via private message so we can investigate further.

How to export logs:

How to send private messages:

Thanks, Will.

Clarifications:

  1. This is the first time I have investigated my wifi drop-outs, so I don’t know whether a crash has happened before. The wifi is good most of the time, but every few weeks I notice it dropping out. It may happen more often without my noticing.
  2. I can’t make it fail. It just spontaneously crashed and rebooted.
  3. Logs sent via DM

Thank you for providing the logs.
We’ll have our R&D team review them and will keep you updated if there are any findings.

After reviewing the logs, the R&D team found that a device is repeatedly attempting to connect to the MT6000 at a high frequency, but consistently fails during the handshake process.

Could you try to identify this device and either reconfigure its Wi-Fi settings so that it connects successfully, or temporarily disable its Wi-Fi, to see whether it’s related to the issue?

The device’s MAC address is b8:f0:09:xx:xx:5d, and the wireless chipset vendor is Espressif Inc.

Hi Will. There is no longer anything on my wifi network matching that MAC address. I can see it in the list of offline clients but it’s not live at present, and I have no way of working out what type of device it is.

I’ve set up a probe to periodically check MAC addresses to see if it returns.
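
The probe itself wasn't posted. A minimal sketch of one way such a check might look on the router is below. It assumes the check is done against the kernel ARP table in `/proc/net/arp`, and the helper name `mac_seen` is hypothetical. Because the middle octets of the MAC address were redacted above, the helper matches on a prefix and a suffix only. Note that a client that never completes the Wi-Fi handshake would not get an IP address and so would not appear in the ARP table; this only detects the device once it is actually connected.

```shell
#!/bin/sh
# Hypothetical helper: read ARP-table text (the /proc/net/arp layout) on
# stdin and succeed if any MAC starts with PREFIX and ends with SUFFIX.
# Matching is case-insensitive.
mac_seen() {
    prefix=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    suffix=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    awk -v p="$prefix" -v s="$suffix" '
        NR > 1 {                      # skip the header line
            mac = tolower($4)         # column 4 is the hardware address
            if (index(mac, p) == 1 &&
                substr(mac, length(mac) - length(s) + 1) == s)
                found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example polling loop (the interval and log tag are assumptions):
#   while :; do
#       mac_seen b8:f0:09 5d < /proc/net/arp && logger -t mac-probe "device seen"
#       sleep 60
#   done
```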

I’ve found the device with that MAC address. It’s a light bulb made by Mirabella (model “Genio A60 Bulb”) using the Tuya platform.

So, as mentioned earlier: if you check its configuration to ensure it can establish a proper Wi-Fi connection, or temporarily disable its Wi-Fi for a while, does the issue happen again?

There is no option to disable wifi, but I will turn the bulb off for a while, then on again, to see if the fault returns. There are no wifi settings I can change on it; it’s working now without my having changed anything.

Regardless of the behaviour of the light bulb, it shouldn’t be possible for a (presumably unauthenticated) device to cause a router to crash during a wifi handshake.

We may not have explained this clearly earlier:

At this stage, we’re mainly trying to confirm whether the issue is related to this factor. That’s why we suggest temporarily disabling it, or ensuring it can maintain a stable Wi-Fi connection, and then observing whether the issue still occurs.

If it is confirmed to be related, we can then proceed with further steps to address it.

Thank you for explaining. That makes sense.

I can remove the bulb and replace it with a different brand. If the router crashes again, this may help narrow down the cause. On the other hand, leaving the bulb in place might lead to another crash, which would also help pin down the cause.

Please advise what I can do to help progress this.

For now, please:

  1. As previously suggested, temporarily remove the bulb to narrow down the troubleshooting scope.

  2. Configure the system logs to be stored on Flash, so that after a fault occurs, you can reboot the device, retrieve the complete logs, and send them to us:

uci set system.@system[0].log_size='512'
uci set system.@system[0].log_file='/root/system.log'
uci commit system
/etc/init.d/log restart
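
After applying these settings, it may be worth confirming that messages really do reach the file on flash before waiting for the next crash. The following is a minimal sketch, not an official procedure. The helper name `log_contains` and the marker string are hypothetical; `logger` is the standard syslog command-line tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE exists and contains the literal MARKER.
log_contains() {
    [ -f "$1" ] && grep -qF -- "$2" "$1" 2>/dev/null
}

# Example on the router (the marker text is an assumption):
#   logger -t persist-test "marker-12345"
#   sleep 2
#   log_contains /root/system.log "marker-12345" && echo "log file is being written"
```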

Since this issue is related to a kernel module, it may take some time for us to investigate after receiving the log.
We may also need to compile a special firmware build with kernel debug logging enabled for further analysis.

I have turned off the light and also applied the log config changes. I will advise when the router next reboots, including providing the content of the logfile.

Thanks for letting us know. We’ll wait for your update.