My area only gets VDSL2 (NBN FTTN Australia), so I have to bridge: wall socket -> modem receiving the VDSL2 signal in bridge mode (LAN port) -> Flint 2 (WAN port). My ISP uses DHCP, and this has worked well for months with the Flint 2 as my main router.
About 2 days ago, I updated the Flint 2 to v4.8.2 release 3, and now the DHCP/DSL connection to the Flint 2 frequently fails, resulting in massive latency between the Flint 2 and the internet and plenty of dropped packets, making my internet practically unusable. Not infrequently it then fails completely and I lose internet connectivity altogether, and the homepage says “The interface is connected, but the internet can’t be accessed.” I've tried rebooting everything multiple times with no success; occasionally everything will work great for ~40-60 min before failing dramatically again.
I've contacted my ISP and they've run line tests and isolated the issue to the Flint 2; if I run the modem in router mode I can connect to the internet just fine, and if I bridge it to another router then it also works just fine.
I have tried rolling back to 4.7.7 but after restoring my settings it fails again…
The logs are as follows, where we can see that eth0 and eth1 keep hitting transmit timeouts.
[ 429.478066] mt7530 mdio-bus:1f lan3: Link is Down
[ 429.482837] br-lan: port 3(lan3) entered disabled state
[ 431.526673] mt7530 mdio-bus:1f lan3: Link is Up - 1Gbps/Full - flow control rx/tx
[ 431.534188] br-lan: port 3(lan3) entered blocking state
[ 431.539414] br-lan: port 3(lan3) entered forwarding state
[ 432.932851] mtk_soc_eth 15100000.ethernet eth1: transmit timed out
[ 432.939068] [mtk_pending_work] No need to do FE reset !
[ 438.052854] mtk_soc_eth 15100000.ethernet eth1: transmit timed out
[ 438.059055] [mtk_pending_work] No need to do FE reset !
[ 451.108837] mtk_soc_eth 15100000.ethernet eth0: transmit timed out
[ 451.115036] [mtk_pending_work] No need to do FE reset !
[ 456.996833] mtk_soc_eth 15100000.ethernet eth0: transmit timed out
[ 457.003024] QDMA Tx Info
[ 457.005548] err_cnt = 3
[ 457.005550] is_qfsm_hang = 1
[ 457.010855] is_qfwd_hang = 1
[ 457.013725] -- -- -- -- -- -- --
[ 457.016943] MTK_QDMA_FSM = 0x700
[ 457.020160] MTK_QDMA_FWD_CNT = 0x0
[ 457.023551] MTK_QDMA_FQ_CNT = 0x69e0800
[ 457.027374] ==============================
[ 457.031480] [mtk_pending_work] No need to do FE reset !
[ 458.052845] QDMA Tx Info
[ 458.055380] err_cnt = 4
[ 458.055382] is_qfsm_hang = 1
[ 458.060703] is_qfwd_hang = 1
[ 458.063576] -- -- -- -- -- -- --
[ 458.066802] MTK_QDMA_FSM = 0x700
[ 458.070023] MTK_QDMA_FWD_CNT = 0x0
[ 458.073416] MTK_QDMA_FQ_CNT = 0x69e0800
[ 458.077243] ==============================
[ 459.108824] QDMA Tx Info
[ 459.111358] err_cnt = 5
[ 459.111359] is_qfsm_hang = 1
[ 459.116668] is_qfwd_hang = 1
[ 459.119538] -- -- -- -- -- -- --
[ 459.122757] MTK_QDMA_FSM = 0x700
[ 459.125975] MTK_QDMA_FWD_CNT = 0x0
[ 459.129367] MTK_QDMA_FQ_CNT = 0x69e0800
[ 459.133193] ==============================
[ 460.164854] QDMA Tx Info
[ 460.167389] err_cnt = 6
[ 460.167391] is_qfsm_hang = 1
[ 460.172708] is_qfwd_hang = 1
[ 460.175580] -- -- -- -- -- -- --
[ 460.178801] MTK_QDMA_FSM = 0x700
[ 460.182019] MTK_QDMA_FWD_CNT = 0x0
[ 460.185412] MTK_QDMA_FQ_CNT = 0x69e0800
[ 460.189238] ==============================
[ 461.220830] QDMA Tx Info
[ 461.223368] err_cnt = 7
[ 461.223370] is_qfsm_hang = 1
[ 461.228671] is_qfwd_hang = 1
[ 461.231539] -- -- -- -- -- -- --
[ 461.234756] MTK_QDMA_FSM = 0x700
[ 461.237971] MTK_QDMA_FWD_CNT = 0x0
[ 461.241361] MTK_QDMA_FQ_CNT = 0x69e0800
[ 461.245184] ==============================
[ 462.116858] mtk_soc_eth 15100000.ethernet eth0: transmit timed out
[ 462.123159] [mtk_pending_work] No need to do FE reset !
[ 462.276828] QDMA Tx Info
[ 462.279371] err_cnt = 8
[ 462.279372] is_qfsm_hang = 1
[ 462.284674] is_qfwd_hang = 1
[ 462.287542] -- -- -- -- -- -- --
[ 462.290758] MTK_QDMA_FSM = 0x700
[ 462.293975] MTK_QDMA_FWD_CNT = 0x0
[ 462.297365] MTK_QDMA_FQ_CNT = 0x69e0800
[ 462.301188] ==============================
[ 462.884827] mtk_soc_eth 15100000.ethernet eth1: transmit timed out
[ 462.891025] [mtk_pending_work] No need to do FE reset !
[ 463.332825] QDMA Tx Info
[ 463.335365] err_cnt = 9
[ 463.335366] is_qfsm_hang = 1
[ 463.340668] is_qfwd_hang = 1
[ 463.343537] -- -- -- -- -- -- --
[ 463.346755] MTK_QDMA_FSM = 0x700
[ 463.349971] MTK_QDMA_FWD_CNT = 0x0
[ 463.353361] MTK_QDMA_FQ_CNT = 0x69e0800
[ 463.357185] ==============================
[ 464.165546] mt7530 mdio-bus:1f lan2: Link is Down
[ 464.170319] br-lan: port 2(lan2) entered disabled state
[ 464.388821] QDMA Tx Info
[ 464.391360] err_cnt = 10
[ 464.391362] is_qfsm_hang = 1
[ 464.396751] is_qfwd_hang = 1
[ 464.399619] -- -- -- -- -- -- --
[ 464.402835] MTK_QDMA_FSM = 0x700
[ 464.406052] MTK_QDMA_FWD_CNT = 0x0
[ 464.409444] MTK_QDMA_FQ_CNT = 0x69e0800
[ 464.413267] ==============================
[ 465.444820] QDMA Tx Info
[ 465.447357] err_cnt = 11
[ 465.447359] is_qfsm_hang = 1
[ 465.452762] is_qfwd_hang = 1
[ 465.455633] -- -- -- -- -- -- --
[ 465.458851] MTK_QDMA_FSM = 0x700
[ 465.462070] MTK_QDMA_FWD_CNT = 0x0
[ 465.465462] MTK_QDMA_FQ_CNT = 0x69e0800
Is anyone else having this issue, or any known fix?
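For anyone comparing their own logs against the one above, here is a minimal Python sketch that summarises a pasted kernel log: it counts the "transmit timed out" lines per interface, records the highest `err_cnt` seen, and notes the timestamp of the first reported QDMA hang. The function name `summarize` and the regular expressions are my own assumptions based only on the line formats visible in this thread; other firmware builds may print these lines differently.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the log lines pasted in this thread.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")
TIMEOUT_RE = re.compile(r"(\w+): transmit timed out")
ERR_CNT_RE = re.compile(r"err_cnt\s*=\s*(\d+)")
HANG_RE = re.compile(r"(is_qfsm_hang|is_qfwd_hang)\s*=\s*(\d+)")


def summarize(log_text):
    """Return per-interface timeout counts, the max err_cnt, and first hang time."""
    timeouts = Counter()
    max_err_cnt = 0
    first_hang = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip anything that is not a timestamped kernel line
        timestamp, message = float(match.group(1)), match.group(2)
        timeout = TIMEOUT_RE.search(message)
        if timeout:
            timeouts[timeout.group(1)] += 1
        err = ERR_CNT_RE.search(message)
        if err:
            max_err_cnt = max(max_err_cnt, int(err.group(1)))
        hang = HANG_RE.search(message)
        if hang and hang.group(2) == "1" and first_hang is None:
            first_hang = timestamp
    return {
        "timeouts": dict(timeouts),
        "max_err_cnt": max_err_cnt,
        "first_hang": first_hang,
    }


# A few lines copied from the log in this thread, used as sample input.
SAMPLE = """\
[ 432.932851] mtk_soc_eth 15100000.ethernet eth1: transmit timed out
[ 451.108837] mtk_soc_eth 15100000.ethernet eth0: transmit timed out
[ 456.996833] mtk_soc_eth 15100000.ethernet eth0: transmit timed out
[ 457.005548] err_cnt = 3
[ 457.005550] is_qfsm_hang = 1
"""

print(summarize(SAMPLE))
```

Running it over a full log paste makes it easier to see whether both eth0 and eth1 are affected and how quickly the error counter climbs after the first hang.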
Hi Will, thanks for getting back to me. Eventually the issue resolved after a downgrade to 4.7.7, a full factory reset, and then manual re-entry of all settings rather than a LuCI config restore. It was painful but things seem to be working for now. If the issue comes back I’ll give your suggestions a try.
Could you elaborate on what you mean by physical layer issue/what the errors mean?
The log indicates that the driver encountered errors in the transmit queue, likely caused by external factors such as faulty network cables or strong interference.
This typically pertains to issues at the Physical Layer of the OSI model.
Of course, you mentioned manually configuring it instead of restoring the LuCI backup, so it's also possible that some residual configuration caused the network card driver to malfunction.
Anyway, I'm glad to hear it's working properly now.
I am having a similar issue with internet disconnects. I have FTTP via an ONT and my ISP uses PPPoE.
On 4.7.7 everything was fine and physically nothing else has changed and the ONT itself reports no errors.
Upon upgrading to 4.8.2 I started to get frequent internet disconnects with similar log errors like the QDMA hang. I am using lan2 as WAN. I can PM you the full logs.
Please try upgrading the firmware to the latest stable version 4.8.3 without keeping the configuration and see if the issue persists. GL.iNet download center
I just had the disconnect again this morning so I have now done a clean install of 4.8.3 without keeping the settings. Would you still like me to send the logs?
I have not yet tried those, but given the disconnects have been happening for at least a week now, I'm a bit frustrated with the issue. On the next disconnect I will try to change the cable if I can.
@will.qiu I just had an internet outage on 4.8.3 similar to the ones on 4.8.2. During the outage I changed the WAN cable (LAN2) and restarted the interface several times but failed to get an internet connection. A restart temporarily fixed it for 1 minute before it went off again and then came back.
Unfortunately my SSH session got disconnected from the router right before the outage; is that a clue? The logs I have are the kernel logs from sometime after I restarted the router, plus the previous disconnects on 4.8.2.
I’m going to turn off AdGuard Home now to stop it spamming the logs, so next time I can just copy and paste them from the web interface rather than from an SSH session, which gets disconnected right before the internet outage.
Yes, the logs do contain that. I had another disconnect this morning with 4.8.3; this time I set up a remote Docker container for syslog so I can send you the logs later.
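For readers who want something similar to the remote syslog setup mentioned above without a container, a minimal sketch of a UDP syslog receiver in Python follows. It is not the setup used in this thread: the port 5514, the file name `router-syslog.log`, and the helper names are assumptions for illustration. As far as I know, OpenWrt-based firmware can forward logs to a remote host through the `log_ip` and `log_port` options in its system config, but check that against your firmware before relying on it.

```python
import socketserver
from datetime import datetime, timezone

LOG_PATH = "router-syslog.log"  # hypothetical output file


def format_record(data, sender_ip, now=None):
    """Turn one raw syslog datagram into a timestamped text line."""
    now = now or datetime.now(timezone.utc)
    text = data.decode("utf-8", errors="replace").rstrip("\r\n\x00")
    return f"{now.isoformat()} {sender_ip} {text}"


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (payload, socket) pair.
        data, _sock = self.request
        line = format_record(data, self.client_address[0])
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(line + "\n")


def serve(host="0.0.0.0", port=5514):
    """Listen for syslog datagrams until interrupted (port is an assumption)."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` on a machine that stays up during the outage means the last messages before a disconnect are written to disk even if the SSH session to the router drops. Each line is appended as it arrives, so the file can be read while the receiver is still running.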
I did downgrade the firmware to 4.7.7 during the outage but struggled to get an internet connection, just like on 4.8.3. I'll speak to my ISP to see what it could be.
Before an ISP outage 2 weeks ago my internet was perfectly fine on 4.7.7; now I have the same issue on 4.7.7.
Weirdly, later on the 23rd, when I had a fresh install of 4.7.7, it went down again in the night with the same symptoms as 4.8.2. It then worked fine for the rest of the weekend, which is the longest it has worked since I started having issues.
This morning I did a fresh install of 4.8.2 and it disconnected again later in the morning. During this outage I already had a spare router set up with my ISP's PPPoE details preconfigured, so I took the WAN cable from the Flint 2 and plugged it into the spare router, and it connected to the internet fine.
I noticed that the Flint 2 logs showed no sign of PPPoE authentication actually happening. So I reverted the trick I did here GL-MT6000 (Flint2) How change LAN2 - LAN5 port to WAN - #15 by cyph: I made eth1 the WAN port and set lan2 back to LAN and br-lan, and voila, it started connecting perfectly fine.
Then I tried to set eth1 back to LAN and lan2 back to WAN and it wouldn't connect to the internet, so maybe the new firmware changes how the ports are managed and how the internet connection is established?
I can send you logs later on via DM. I also noticed there are a few comments in that post about changing JSON files; to confirm, I haven't touched those at all.
It looked like 4.8.3 got pulled yesterday morning, which is why I went to 4.8.2, but I am on a fresh install of 4.8.3 today with eth1 and lan1 set as LAN in the GL.iNet settings, in addition to the quoted trick I did previously to make lan2 the WAN port.
I did notice that on 4.8.2, when I set a large log buffer in LuCI such as 10000 KB, the log page would take a long time to load. I think this was related to the hardware acceleration work that he reported fixed in 4.8.3.
I'm not sure trying a dumb switch is going to help at the moment, because there is no attempted PPPoE request at all. The next time the issue happens, I will try toggling the network acceleration: it's currently set to auto, but next time I'll switch to software acceleration, and if that doesn't work I guess I'll have to revert to eth1 as WAN and lan2 as LAN.
Also, why is network acceleration defaulted to hardware acceleration? I thought it was previously auto?
We updated the download server yesterday, and version 4.8.3 was mistakenly hidden for a short time.
As you’ve noticed, this has since been fixed.
That’s expected — larger log buffers can cause longer loading times since the browser needs to retrieve and render more data.
Yes, the issue you encountered was likely related to the updated Ethernet/Wi-Fi driver introduced in 4.8.2.
This driver has been reverted in 4.8.3.
After testing on versions 4.7.7 and 4.6.8, the default setting also appears to be hardware acceleration.
We’re not entirely certain if earlier versions ever defaulted to auto, but in most cases, auto mode would still use hardware acceleration automatically.
I had the outages again this morning, and whilst I had a brief connection up and running I did a clean install of 4.8.3. Before the outage I had not done much configuration on the router, and eth1 was still set as WAN.
I had an idea that something is wrong with the network switch in the Flint 2 itself, because I couldn't see any PPPoE requests actually happening and my TP-Link router keeps connecting fine during the outages.
This morning, even after the clean install of 4.8.3, the internet disconnects were even more frequent. Each time I restart the WAN interface in LuCI I get user_error or user_request (I can't remember which) and nothing else. Anyway, when the next outage happened shortly afterwards I thought to myself, why not just try the Flint 2 with only the eth1 cable plugged in?
So I removed the Ethernet cables one by one, waiting each time with no success for an internet connection, until I removed the LAN2 cable, which is on the LAN interface since I haven't changed it. This cable connects to a 2.5GbE switch with an AMD computer whose mainboard, an MSI X870E Tomahawk, has a Realtek® 8126 5G LAN controller.
Even as I was typing this post I got an internet disconnect; as soon as I unplugged the cable from my AMD PC to the 2.5GbE switch, my internet reconnected. I do have other 2.5GbE devices on this switch, like an MSI Z790 A Pro DDR4 WiFi and another TP-Link router used as an access point, which have never caused this problem.
It's worth noting that whenever I tested another TP-Link router for the internet, I didn't connect the 2.5GbE switch to that router.
I do have wake-on-LAN turned on on my X870E Tomahawk, and it's usually asleep whilst the outages happen. I don't recall exactly whether I have had outages whilst using this AMD computer.
I built this computer in late September or early October, just before my big ISP outage in October; after that, many outages happened on all firmwares, including 4.7.7, which had been stable for months with the 2.5GbE switch.
I will do some investigation into this Realtek chip and work out what's going on here.
Do you by chance have hardware acceleration on or hardware offloading?
If the connection table/firewall table is too full or too busy, packets get discarded on the CPU and also fail to be offloaded.
Among the few things that can be observed are BPDU messages in the logs, lan2 sending its own source address, and topology changes detected on br-lan.
If that goes on long enough, the router locks itself up because it sees traffic from lan2 with the same source MAC address as the router.
You end up with a broken router which also stops requesting a WAN IP and most of the routes break; only a restart fixes it.
You can try to set it to software offloading or turn it off.
Also, if STP is in use, it may also be running on the switch; make sure both sides run the same flavour, not a mix of STP and RSTP, as that can also cause unexpected behaviour.
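To check a saved kernel log for the bridge symptoms described above, a minimal Python sketch like the following could be used. It looks for the Linux bridge messages "received packet on <port> with own address as source address" and "topology change detected". The sample lines are illustrative only (made-up timestamps and MAC address) and are not taken from anyone's logs in this thread; the function name `scan_bridge_warnings` is likewise an assumption.

```python
import re
from collections import Counter

OWN_ADDR_RE = re.compile(
    r"received packet on (\S+) with own address as source address"
)
TOPOLOGY_RE = re.compile(r"topology change detected")


def scan_bridge_warnings(log_text):
    """Count own-address warnings per port and topology-change messages."""
    own_addr_ports = Counter()
    topology_changes = 0
    for line in log_text.splitlines():
        match = OWN_ADDR_RE.search(line)
        if match:
            own_addr_ports[match.group(1)] += 1
        if TOPOLOGY_RE.search(line):
            topology_changes += 1
    return dict(own_addr_ports), topology_changes


# Illustrative input only: timestamps and MAC address are invented.
ILLUSTRATIVE = """\
[ 100.000001] br-lan: received packet on lan2 with own address as source address (addr:02:00:00:00:00:01, vlan:0)
[ 100.500000] br-lan: topology change detected, propagating
[ 101.000001] br-lan: received packet on lan2 with own address as source address (addr:02:00:00:00:00:01, vlan:0)
"""

ports, changes = scan_bridge_warnings(ILLUSTRATIVE)
print(ports, changes)
```

If one port dominates the own-address count, that port and whatever is plugged into it are the first things to unplug or test with a different cable or switch.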