GL-AX1800 failover doesn't work since the 4.5.0 update

I have a GL-AX1800 with an SFR FTTH box (France) for the WAN and a USB 4G modem (Quectel EP06-E Cat6 module) for failover.

There was no problem with 4.4.6.
But since the 4.5.0 update, when I run a test by disconnecting the ONT box (FTTH not available), the GL-AX1800 detects the outage, but after a few seconds it considers the WAN to be OK again.
So it does not switch to the 4G link.

There is something wrong with the rebuilt mwan3 (renamed kmwan).

Did you disconnect the cable to the AX1800's WAN port, or the ONT box's upstream cable, leaving the cable from the ONT to the AX1800 in place?

I didn't touch the cable to the AX1800's WAN port.
The objective of the test is to simulate an FTTH outage on the operator side.

I only disconnected the cable between the ONT box and the operator box,
as can happen (and has already happened to me several times) when the ONT box burns out.
The operator box then goes into the disconnected state.

SSH into the router and run the following commands:

ip route show
cat /proc/gl-kmwan/status
cat /proc/gl-kmwan/debug
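To compare the state before and after the outage, it can help to write all three outputs into one timestamped file. This is a minimal sketch, assuming a POSIX/BusyBox shell on the router; the function name `collect_kmwan_diag` and the default path `/tmp/kmwan-diag.txt` are hypothetical. On OpenWrt, /tmp is RAM-backed, so the file does not wear the flash.

```shell
#!/bin/sh
# Hypothetical helper: append a timestamped snapshot of the routing table
# and the kmwan /proc files to one file, so several snapshots can be compared.
collect_kmwan_diag() {
    out="${1:-/tmp/kmwan-diag.txt}"   # hypothetical default path (RAM on OpenWrt)
    {
        echo "=== snapshot $(date '+%Y-%m-%d %H:%M:%S') ==="
        for cmd in "ip route show" \
                   "cat /proc/gl-kmwan/status" \
                   "cat /proc/gl-kmwan/debug"; do
            echo "### $cmd"
            # Unquoted $cmd is intentional: each entry is a simple command line.
            # Errors (e.g. a missing /proc file) are kept in the snapshot too.
            $cmd 2>&1
            echo
        done
    } >>"$out"
    return 0
}
```

Run `collect_kmwan_diag` once with the fiber connected and again a few seconds after unplugging it, then compare the two snapshots.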

For the moment I have reverted to version 4.4.6 and everything works.
I don't have enough time to run tests at the moment.

I think it makes more sense to stay with the packages provided by OpenWrt.
Version 4.5.0 introduced a homemade, modified mwan3 package which does not seem to work correctly and which cannot be administered from the LuCI interface.

For future versions, we should return to the OpenWrt packages and stay within the OpenWrt philosophy.
This is what most of your customers are looking for:
a simplified "gl-inet" interface, with the possibility of advanced configuration via LuCI. kmwan is not compatible with LuCI, its advanced configuration is opaque, and in my case it does not work.

mwan3 can be used with version 4.5, but you need to disable status tracking for all interfaces on the Multi-WAN page. However, when you do this, the page can no longer display the network status of the interfaces correctly.

So I'm going to stay on version 4.4.6 while waiting for the next update, which I hope will be based exclusively on OpenWrt packages, administrable from the "gl-inet" web GUI, with the LuCI web GUI for advanced configuration :wink:

I took the time to move back up to version 4.5.16.
The bug is still present.
Below are the requested traces:

root@GL-AX1800:/etc/config# ip route show
default via 192.168.1.1 dev eth0 proto static src 192.168.1.254 metric 1
default via 100.68.95.44 dev wwan0 proto static src 100.68.95.43 metric 2
100.68.95.40/29 dev wwan0 proto static scope link metric 2
192.168.1.0/24 dev eth0 proto static scope link metric 1
192.168.13.0/24 dev br-lan proto kernel scope link src 192.168.13.254

root@GL-AX1800:/etc/config# cat /proc/gl-kmwan/debug
debug_level:0:ERR 1:WARN 2:INFO 3:DEBUG 4:DEBUG(packet sending interval:1s)
cur_level : 2
delay : 100
freq : 100
offset_time : 0
ipv4 active nodes : 2
ipv4 node total : 2
ipv6 active nodes : 0
ipv6 node total : 0
g_if_detect_time : 3000000000 ns

root@GL-AX1800:/etc/config# cat /proc/gl-kmwan/status
Interface Netdev Ifindex State TrackMode TX packets TX stamp RX packets RX stamp
modem_1_1 wwan0 8 ACTIVE passive 0 0 1 1969019743235
Track method ip Info
ping 8.8.8.8
ping 208.67.222.222
online:true state_sync:1

wan eth0 2 ACTIVE force 71 1966630042958 66 1968710995178
Track method ip Info
ping 8.8.8.8
ping 208.67.222.222
online:true state_sync:1
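When comparing several snapshots of /proc/gl-kmwan/status, a one-line summary per interface makes counter changes easier to spot. This is a minimal sketch, assuming the column layout shown above (Interface, Netdev, Ifindex, State, TrackMode, TX packets, TX stamp, RX packets, RX stamp) stays the same across firmware builds; `summarize_kmwan_status` is a hypothetical helper, not part of the firmware.

```shell
#!/bin/sh
# Hypothetical helper: print one summary line per kmwan interface block.
# Assumes the 9-column interface rows and "online:" lines shown in the status output.
summarize_kmwan_status() {
    awk '
        # Interface rows have 9 columns and a numeric ifindex in column 3
        # (the header row has 13 words, so it is skipped).
        NF == 9 && $3 ~ /^[0-9]+$/ {
            iface = $1; dev = $2; state = $4; mode = $5; tx = $6; rx = $8
            next
        }
        # The "online:<bool>" line closes each interface block.
        /^online:/ && iface != "" {
            split($1, kv, ":")
            printf "%s dev=%s state=%s mode=%s tx=%s rx=%s online=%s\n", iface, dev, state, mode, tx, rx, kv[2]
            iface = ""
        }
    ' "${1:-/proc/gl-kmwan/status}"
}
```

Applied to the output above, this would report wan on eth0 as online=true with tx=71 and rx=66. If the wan RX counter keeps increasing while the fiber is disconnected, that is consistent with kmwan still seeing traffic on eth0 and keeping the interface online.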

Once you're disconnected from the upstream network, can you SSH into the router and capture the packets on the eth0 interface? The status output shows that the eth0 interface is still receiving packets.

opkg update && opkg install tcpdump
tcpdump -i eth0 -w eth0.pcap

Then you can use a tool such as WinSCP to retrieve the file. I will also try to reproduce the problem in my environment.
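If the full capture grows too large, it can be narrowed to the probe traffic. This sketch assumes that kmwan's "ping" track method sends ordinary ICMP echo requests to the listed track IPs; `build_track_filter` is a hypothetical helper. An unfiltered capture, as requested above, remains the safer choice if the probes turn out to use something else.

```shell
#!/bin/sh
# Hypothetical helper: build a tcpdump filter matching ICMP to/from the track IPs.
# With no IPs given, fall back to all ICMP traffic.
build_track_filter() {
    filter=""
    for ip in "$@"; do
        if [ -z "$filter" ]; then
            filter="host $ip"
        else
            filter="$filter or host $ip"
        fi
    done
    if [ -z "$filter" ]; then
        echo "icmp"
    else
        echo "icmp and ($filter)"
    fi
}

# Example on the router (-n: no DNS lookups, -c: stop after 500 packets,
# /tmp keeps the file in RAM instead of flash):
#   tcpdump -i eth0 -n -c 500 -w /tmp/eth0.pcap \
#       "$(build_track_filter 8.8.8.8 208.67.222.222)"
```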

I tried upgrading from 4.4.6 to 4.5.16, both with and without keeping the configuration, but the problem did not occur. My upstream router is an MT3000, and the modem module is an RM520N. When I disconnect the WAN port of the MT3000, everything works normally and the AX1800 switches to the modem. If possible, can you capture packets when the failover problem occurs and send the capture to us for analysis?

Disconnecting the WAN port is not a valid test.
When you disconnect the WAN port, the physical link is cut, so the ping detection algorithm is not exercised. In that situation failover works, but it is not representative of a real outage.
A realistic test would be to cut off the box's internet access (by disconnecting the optical fiber, for example) without cutting the physical WAN link.
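The difference between the two tests can be expressed as a small decision: unplugging the WAN port takes the link down, while cutting the fiber leaves the link up and only the probes fail. This is a minimal sketch, assuming BusyBox `ping` (with -I, -c and -W) and the Linux sysfs operstate file; `classify_wan` and `probe_via` are hypothetical helpers for checking by hand whether the track IPs really stop answering through eth0 while kmwan still reports the interface as online.

```shell
#!/bin/sh
# Hypothetical helper: classify the WAN situation.
#   $1: operstate of the WAN netdev (e.g. /sys/class/net/eth0/operstate)
#   $2: number of track IPs that answered, $3: number of track IPs probed
classify_wan() {
    operstate=$1 replies=$2 probed=$3
    if [ "$operstate" = "down" ]; then
        echo "link-down"        # WAN cable unplugged: the easy case
    elif [ "$probed" -eq 0 ]; then
        echo "no-probes"        # nothing was tested, no conclusion possible
    elif [ "$replies" -eq 0 ]; then
        echo "upstream-down"    # link up (or "unknown"), no replies: a fiber cut
    else
        echo "online"
    fi
}

# Hypothetical helper: count how many of the given IPs answer a single ping
# sent out of interface $1 (2-second timeout per IP).
probe_via() {
    dev=$1; shift
    ok=0
    for ip in "$@"; do
        ping -I "$dev" -c 1 -W 2 "$ip" >/dev/null 2>&1 && ok=$((ok + 1))
    done
    echo "$ok"
}

# Example on the router, with the fiber unplugged:
#   classify_wan "$(cat /sys/class/net/eth0/operstate)" \
#       "$(probe_via eth0 8.8.8.8 208.67.222.222)" 2
```

If this prints "upstream-down" while /proc/gl-kmwan/status still shows online:true for wan, the problem lies in kmwan's tracking rather than in the upstream network.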


I may not have described it clearly; my topology diagram is as follows:
[Image: topology diagram]
When I disconnect the WAN port of the MT3000, the AX1800's physical port is not affected. The AX1800's WAN port still gets the IP address assigned by the MT3000.

Okay, sorry, I didn't understand.
This test is representative,
and it matches what I did as well.
If I find time, I will go back to 4.5.16 and do a network capture.