GL-MT3600BE: intermittent TCP/TLS failures when routing LAN clients to another gateway on the same LAN; fixed by SNAT

Hi everyone,

I recently replaced my main router with a GL-MT3600BE, and I started seeing intermittent connection failures in a specific routing scenario.

The issue appears when the GL-MT3600BE routes LAN client traffic to another gateway that is also inside the same LAN subnet.

Network topology

My LAN subnet is:

192.168.11.0/24

Main router:

GL-MT3600BE
LAN IP: 192.168.11.1

Client:

Windows client
IP: 192.168.11.144
Default gateway: 192.168.11.1

There are two additional gateways on the same LAN:

192.168.11.16  -> Tailscale subnet router / gateway
192.168.11.21  -> fake-ip / proxy gateway

The GL-MT3600BE has static routes like:

7.0.0.0/10      via 192.168.11.21 dev br-lan
10.21.0.0/16    via 192.168.11.16 dev br-lan
172.18.20.0/24  via 192.168.11.16 dev br-lan
172.30.0.0/16   via 192.168.11.16 dev br-lan
192.168.12.0/24 via 192.168.11.16 dev br-lan

Example route output:

7.0.0.0/10 via 192.168.11.21 dev br-lan proto static
10.21.0.0/16 via 192.168.11.16 dev br-lan proto static
172.18.20.0/24 via 192.168.11.16 dev br-lan proto static
172.30.0.0/16 via 192.168.11.16 dev br-lan proto static
192.168.12.0/24 via 192.168.11.16 dev br-lan proto static
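To make the routing decision concrete, here is a minimal Python sketch of how a router picks a next hop from these static routes by longest-prefix match. The route list is copied from the output above; the `next_hop` helper is only an illustrative name, and the default route is deliberately not modelled.

```python
import ipaddress

# Static routes copied from the post: (destination prefix, next hop).
STATIC_ROUTES = [
    ("7.0.0.0/10", "192.168.11.21"),
    ("10.21.0.0/16", "192.168.11.16"),
    ("172.18.20.0/24", "192.168.11.16"),
    ("172.30.0.0/16", "192.168.11.16"),
    ("192.168.12.0/24", "192.168.11.16"),
]


def next_hop(destination):
    """Return the next hop of the most specific matching route, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in STATIC_ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        # Would fall through to the default route, which is not modelled here.
        return None
    # The longest prefix (most specific route) wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(next_hop("10.21.6.74"))  # 192.168.11.16
print(next_hop("7.0.0.79"))    # 192.168.11.21
print(next_hop("8.8.8.8"))     # None
```

Both test destinations from the curl output resolve to a next hop that sits on br-lan, which is the LAN-in, LAN-out case discussed in this thread.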

Symptom

Some HTTPS connections fail on the first attempt, but succeed immediately on the second attempt.

Example from Windows curl:

curl.exe -v -o NUL -s -w "dns=%{time_namelookup} connect=%{time_connect} starttransfer=%{time_starttransfer} total=%{time_total}`n" https://xxxx.xxx.xxx.com

Failure:

* Host xxxx.xxx.xxx.com:443 was resolved.
* IPv4: 10.21.6.74
*   Trying 10.21.6.74:443...
* schannel: disabled automatic use of client certificate
* ALPN: curl offers http/1.1
* Recv failure: Connection was reset
* schannel: failed to receive handshake, SSL/TLS connection failed
* closing connection #0
dns=0.025663 connect=0.043946 starttransfer=0.000000 total=19.253947

Second attempt succeeds:

* Host xxx.xxx.xxx.com:443 was resolved.
* IPv4: 10.21.6.74
*   Trying 10.21.6.74:443...
* ALPN: server accepted http/1.1
< HTTP/1.1 200 OK
dns=0.054640 connect=0.088751 starttransfer=0.308301 total=0.316747

The same type of failure also happened with fake-ip traffic, for example:

sqlmodel.tiangolo.com -> 7.0.0.79

Failure example:

* Host sqlmodel.tiangolo.com:443 was resolved.
* IPv4: 7.0.0.79
*   Trying 7.0.0.79:443...
* Recv failure: Connection was reset
* schannel: failed to receive handshake, SSL/TLS connection failed
dns=0.030430 connect=0.035898 starttransfer=0.000000 total=19.207708

Packet capture findings

On the fake-ip route, I saw TCP connection establishment, then TLS ClientHello retransmissions:

192.168.11.144 -> 7.0.0.79:443 [SYN]
7.0.0.79:443 -> 192.168.11.144 [SYN, ACK]
192.168.11.144 -> 7.0.0.79:443 [ACK]
192.168.11.144 -> 7.0.0.79:443 TLSv1.2 Client Hello
192.168.11.144 -> 7.0.0.79:443 TCP Retransmission [PSH, ACK]
...
192.168.11.144 -> 7.0.0.79:443 [RST, ACK]

For the Tailscale route, I captured on both the router and the Tailscale gateway.

On the Tailscale interface, I could see the remote host sending SYN/ACK back:

192.168.11.144 -> 10.21.6.74:443 [SYN]
10.21.6.74:443 -> 192.168.11.144 [SYN, ACK]
10.21.6.74:443 -> 192.168.11.144 [SYN, ACK retransmission]
10.21.6.74:443 -> 192.168.11.144 [SYN, ACK retransmission]
192.168.11.144 -> 10.21.6.74:443 [RST, ACK]

This made me suspect an asymmetric routing issue:

Before SNAT:

Client -> GL-MT3600BE -> LAN-side gateway -> remote target
Client <- LAN-side gateway <- remote target

The return path bypasses the GL-MT3600BE.
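A minimal Python sketch of why the reply skips the router: it models the first-hop choice a LAN-side gateway makes for a reply packet. It assumes the gateway has 192.168.11.0/24 as a directly connected subnet and would otherwise send traffic to 192.168.11.1; both are assumptions made for illustration, and `reply_first_hop` is a hypothetical helper name.

```python
import ipaddress

# Assumed view from the LAN-side gateway (for example 192.168.11.16).
LAN = ipaddress.ip_network("192.168.11.0/24")   # directly connected subnet
ROUTER = ipaddress.ip_address("192.168.11.1")   # assumed default gateway


def reply_first_hop(reply_destination):
    """First hop the LAN-side gateway uses for a reply packet.

    A destination inside the connected subnet is delivered on-link,
    so no router is involved; anything else goes to the default gateway.
    """
    dst = ipaddress.ip_address(reply_destination)
    return dst if dst in LAN else ROUTER


hop = reply_first_hop("192.168.11.144")
print(hop, "bypasses router:", hop != ROUTER)  # 192.168.11.144 bypasses router: True
```

Because the client address is on-link for the gateway, the reply goes straight to the client, so the router only ever sees one direction of the flow.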

Important test

If I add a route directly on the Windows client, bypassing the GL-MT3600BE as the intermediate router, the problem disappears.

For example:

route add 7.0.0.0 mask 255.192.0.0 192.168.11.21 metric 1
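Windows `route add` takes a dotted-decimal netmask rather than a prefix length. As a quick sanity check, this small Python sketch derives the mask and address range for the /10 prefix using the standard `ipaddress` module.

```python
import ipaddress

net = ipaddress.ip_network("7.0.0.0/10")

# Dotted-decimal netmask to pass to "route add ... mask ...".
print(net.netmask)  # 255.192.0.0

# First and last address covered by the route.
print(net.network_address, net.broadcast_address)  # 7.0.0.0 7.63.255.255
```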

After this, fake-ip traffic is stable:

Windows client -> 192.168.11.21 -> proxy gateway

This suggests the proxy gateway itself is probably fine.

Workaround that fixed the issue

I added an SNAT rule on the GL-MT3600BE for traffic from LAN clients to non-LAN destinations, when the egress zone is still LAN:

Source:      192.168.11.0/24
Destination: !192.168.11.0/24
Zone:        lan -> lan
Action:      SNAT to 192.168.11.1

In LuCI it shows as:

SNAT all -- 192.168.11.0/24 !192.168.11.0/24 to:192.168.11.1

After adding this rule, the packet counter increases and the intermittent HTTPS failures appear to be fixed.

The resulting path becomes symmetric:

After SNAT:
Client -> GL-MT3600BE -> LAN-side gateway -> remote target
Client <- GL-MT3600BE <- LAN-side gateway <- remote target
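The matching logic of that SNAT rule can be sketched in a few lines of Python. This only models the source/destination match and the address rewrite. It does not model firewall zones or connection tracking, and `apply_snat` is a hypothetical helper, not anything from the router firmware.

```python
import ipaddress

LAN = ipaddress.ip_network("192.168.11.0/24")
SNAT_TO = "192.168.11.1"  # router LAN address used as the new source


def apply_snat(src, dst):
    """Return the (source, destination) pair after the lan -> lan SNAT rule.

    The rule matches when the source is a LAN client and the destination
    is outside the LAN subnet; otherwise the packet is left unchanged.
    """
    s = ipaddress.ip_address(src)
    d = ipaddress.ip_address(dst)
    if s in LAN and d not in LAN:
        return SNAT_TO, dst
    return src, dst


print(apply_snat("192.168.11.144", "10.21.6.74"))
# ('192.168.11.1', '10.21.6.74')
print(apply_snat("192.168.11.144", "192.168.11.21"))
# ('192.168.11.144', '192.168.11.21')
```

With the source rewritten to 192.168.11.1, the LAN-side gateway addresses its replies to the router, which then has to translate them back to the client, so both directions pass through the GL-MT3600BE.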

Question

Is this expected behavior for this kind of topology?

I understand that routing from br-lan back out through br-lan to another gateway in the same subnet can create asymmetric routing, ICMP redirect, or flow offloading edge cases.

However, this worked more reliably on my previous router (OpenWrt 25.12 on x86). On the GL-MT3600BE, I consistently saw first-attempt TCP/TLS failures until I added SNAT.

Could this be related to:

flow offloading / hardware offloading
ICMP redirect
MTK acceleration
LAN-to-LAN routing via br-lan
asymmetric return path handling

Would GL.iNet recommend SNAT for this topology, or is there a better way to configure static routes to LAN-side gateways?

Thanks.

I think it's worth trying this with hardware acceleration disabled to see whether you get the same behaviour.

Thanks. I tried disabling hardware acceleration, but it did not change the behavior.

The only thing that confuses me a bit is this route here.

Doesn't Tailscale use the 100.x range?

7.0.0.0/10 is public IP space. The 100.64.0.0/10 range, on the other hand, is reserved by the IETF as shared address space, mostly used for CGNAT, and that is what Tailscale uses (IMHO the way Tailscale does it is also a bit wrong, but not that breaking).

Also, 7.0.0.0/10 is not an RFC 1918 range.

So what I think is happening here is that you are sinkhole-routing the full 7.0.0.1 to 7.63.255.255 range.

You likely also have a dnsmasq security setting active that discards or blocks bogons that do not follow RFC 1918.

It is also worth noting that Tailscale is based on WireGuard. I personally don't know much about Tailscale, but I do know a lot about WireGuard, and I'm confident that in the default setup these 100.x IPs are just virtual IPs that are statically routed.

It can become dangerous when a virtual IP overlaps a real public IP. I think the bogon detection flags the route as a bogon, and rightfully so, because it is not good for a public IP to behave as local source traffic.

Thanks for pointing this out.

Just to clarify: the 7.0.0.0/10 route is not for Tailscale. It is used by my fake-ip DNS/proxy gateway.

I agree that 7.0.0.0/10 is unusual and not RFC 1918. I have now changed it to 192.18.0.16/15.

192.18 is also wrong, please see:

You can expand the 'Private IPv4 addresses' dropdown to see a nice table :)
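For anyone who wants to check a candidate range quickly, here is a small Python sketch that tests prefixes against the RFC 1918 private ranges and the RFC 6598 shared (CGNAT) range mentioned in this thread. The table is deliberately limited to those two sets, so it is not a complete list of reserved space, and `classify` is just an illustrative helper name.

```python
import ipaddress

# Only the ranges discussed in this thread; not a complete reserved list.
RESERVED = {
    "RFC 1918 private": ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"],
    "RFC 6598 shared (CGNAT)": ["100.64.0.0/10"],
}


def classify(prefix):
    """Return the label of the listed range that fully contains prefix."""
    # strict=False accepts input with host bits set, e.g. 192.18.0.16/15,
    # and normalises it to the enclosing network.
    net = ipaddress.ip_network(prefix, strict=False)
    for label, ranges in RESERVED.items():
        if any(net.subnet_of(ipaddress.ip_network(r)) for r in ranges):
            return label
    return "not in the listed reserved ranges"


for p in ["7.0.0.0/10", "192.18.0.16/15", "100.64.0.0/10", "10.21.0.0/16"]:
    print(p, "->", classify(p))
```

Both 7.0.0.0/10 and 192.18.0.16/15 come back as not being in any of the listed ranges, which matches the objections raised above.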

According to the engineers' analysis, the main cause of the problem is not TLS, nor the target gateway itself, but rather that when the GL-MT3600BE is used as the primary gateway, it forwards LAN client traffic to another gateway within the same LAN. This causes the return route to potentially bypass the GL, forming an asymmetric path.

Your added lan->lan SNAT rule is effective because it rewrites the source address to 192.168.11.1, which forces the return packets to pass through the GL again, restoring a symmetric path and a stable connection.

This asymmetric traffic shouldn't be an issue, though, unless it is crossing a firewall boundary that requires a stateful connection?

Passing in and out of the LAN interface in only one direction should not be a problem: the traffic is not crossing the firewall, so nothing should care that subsequent packets are not part of a stateful flow. There should be nothing there to filter them?