Incorrect Wireguard Routing

I am experiencing a routing problem where an ICMP request comes in to a WireGuard server, but the response is routed to the main gateway instead of back to wg0.

I am using an MT300Nv2 as a VPN server. I want to be able to access the Mango’s LAN through wireguard. I can ping the mango itself, but get no response when pinging any host on the mango’s LAN.

In order to save ports, I reconfigured the mango so that the WAN port is tagged and carries both LAN and WAN traffic. WAN traffic is on VLAN1 (eth0.1) and LAN traffic is on VLAN5 (eth0.5).

Using tcpdump on the mango and the upstream router, I find that the ICMP requests do arrive properly over wg0. If I ping the mango from a host connected via wireguard, the ICMP reply DOES go back to wg0 and the ping is successful. However, if I ping any other host on the mango’s LAN, I see the ICMP request come in over wg0 but the ICMP echo reply is sent to the upstream router (WAN gateway) and NOT back to wg0.

Setup:
Wireguard is on 10.89.5.x. Wireguard client assigned 10.89.5.4.
LAN is 10.86.5.x
WAN is 10.86.1.x
Upstream (main) router is multi-homed at 10.86.1.1 and 10.86.5.1.
Mango router LAN is 10.86.5.2 and is assigned a static WAN 10.86.1.4 address.
Machines on the LAN all have the mango (10.86.5.2) as their gateway.

Tracing:

  • Wireguard client (10.89.5.4) sends ICMP Echo Request to 10.86.5.20 (host on the mango LAN)

  • Mango receives request on wg0 and forwards to LAN (eth0.5)

  • Mango receives response from LAN (eth0.5).

  • Mango FORWARDS RESPONSE TO UPSTREAM ROUTER (eth0.1) NOT wg0!

Of course, the main router has no idea what to do with a 10.89.x.x address as it cannot forward to the internet and certainly won’t forward back to the mango to return via wireguard.

Here’s the request coming into, but not returning to wg0:

   tcpdump: listening on wg0, link-type RAW (Raw IP), capture size 262144 bytes
   11:20:23.345625 ip: (tos 0x0, ttl 64, id 49542, offset 0, flags [none], proto ICMP (1), length 84)
      10.89.5.4 > 10.86.5.157: ICMP echo request, id 43840, seq 0, length 64

Here’s the traffic on the mango LAN (eth0.5). We can see the request going out and the reply coming back:

tcpdump: listening on eth0.5, link-type EN10MB (Ethernet), capture size 262144 bytes
11:52:42.805539 e4:95:6e:45:b5:2c > 50:e5:49:ce:69:e6, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 31350, offset 0, flags [none], proto ICMP (1), length 84)
    10.89.5.4 > 10.86.5.157: ICMP echo request, id 49728, seq 0, length 64
11:52:42.808871 50:e5:49:ce:69:e6 > e4:95:6e:45:b5:2c, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 55848, offset 0, flags [none], proto ICMP (1), length 84)
    10.86.5.157 > 10.89.5.4: ICMP echo reply, id 49728, seq 0, length 64

Here’s the “funny” part… Here’s the traffic on the WAN (eth0.1) interface:

tcpdump: listening on eth0.1, link-type EN10MB (Ethernet), capture size 262144 bytes
11:56:03.734323 e4:95:6e:45:b5:2c > 6c:b0:ce:30:0a:9f, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 24615, offset 0, flags [none], proto ICMP (1), length 84)
    10.86.5.157 > 10.89.5.4: ICMP echo reply, id 51520, seq 0, length 64

[Note: I had to run multiple pings, so this traffic is not a single ping, but the same pattern holds over and over.]

The mango’s routing table looks fine:

Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         10.86.1.1       0.0.0.0         UG    10     0        0 eth0.1
10.86.1.0       *               255.255.255.0   U     10     0        0 eth0.1
10.86.5.0       *               255.255.255.0   U     0      0        0 br-lan
10.89.5.0       *               255.255.255.0   U     0      0        0 wg0
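To make "looks fine" concrete, here is a minimal Python sketch of a longest-prefix lookup over the four routes listed above. It only models the main table (no policy rules, no metrics), and the function name lookup is made up for illustration. Under those assumptions, a reply addressed to 10.89.5.4 should leave via wg0, so whatever sends it to eth0.1 has to come from somewhere other than this table.

```python
import ipaddress

# Main-table routes copied from the routing table above: (destination, interface).
MAIN_TABLE = [
    ("0.0.0.0/0", "eth0.1"),      # default via 10.86.1.1
    ("10.86.1.0/24", "eth0.1"),
    ("10.86.5.0/24", "br-lan"),
    ("10.89.5.0/24", "wg0"),
]

def lookup(table, dst):
    """Return the interface of the most specific route that contains dst."""
    addr = ipaddress.ip_address(dst)
    candidates = [
        (ipaddress.ip_network(prefix), dev)
        for prefix, dev in table
        if addr in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None
    # Longest prefix wins; metric tie-breaking is ignored in this sketch.
    return max(candidates, key=lambda c: c[0].prefixlen)[1]

print(lookup(MAIN_TABLE, "10.89.5.4"))    # WireGuard client
print(lookup(MAIN_TABLE, "10.86.5.157"))  # LAN host
```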

I’ve uploaded a printout of my iptables rules and the network configuration file: MangoWireguardRouting.zip (4.5 KB)

Where is this going wrong?

Any way to trace why replies destined for an address on the wg0 interface are being sent out on the WAN interface instead?

10.86.5.0       *               255.255.255.0   U     0      0        0 br-lan
10.89.5.0       *               255.255.255.0   U     0      0        0 wg0

These two rules result in incorrect routing
Why are your br-lan and WG on the same subnet?

The addresses are very similar, but they are not the same. One is 10.86 and the other is 10.89.

The routing looks proper: 10.86 is the LAN and 10.89 is wireguard.

I’m not seeing the problem, but some suggestions:

  • post the output of ip route list table all
  • what does ip route get 10.89.5.4 on the router say?
  • and ip route get 10.89.5.4 iif eth0.5? (I believe this should tell you the route taken by packets coming in on eth0.5, see ip route help.)
  • can you disable the firewall for a minute and see what happens?
root@CameraRouter:~# ip route list table all
default via 10.86.1.1 dev eth0.1 table 1 
default via 10.86.1.1 dev eth0.1 proto static metric 10 
10.86.1.0/24 dev eth0.1 proto static scope link metric 10 
10.86.5.0/24 dev br-lan proto kernel scope link src 10.86.5.2 
10.89.5.0/24 dev wg0 proto kernel scope link src 10.89.5.1 
broadcast 10.86.1.0 dev eth0.1 table local proto kernel scope link src 10.86.1.4 
local 10.86.1.4 dev eth0.1 table local proto kernel scope host src 10.86.1.4 
broadcast 10.86.1.255 dev eth0.1 table local proto kernel scope link src 10.86.1.4 
broadcast 10.86.5.0 dev br-lan table local proto kernel scope link src 10.86.5.2 
local 10.86.5.2 dev br-lan table local proto kernel scope host src 10.86.5.2 
broadcast 10.86.5.255 dev br-lan table local proto kernel scope link src 10.86.5.2 
broadcast 10.89.5.0 dev wg0 table local proto kernel scope link src 10.89.5.1 
local 10.89.5.1 dev wg0 table local proto kernel scope host src 10.89.5.1 
broadcast 10.89.5.255 dev wg0 table local proto kernel scope link src 10.89.5.1 
broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1 
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1 
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1 
broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1 
fded:72ff:6602:10::/64 dev br-lan proto static metric 1024 pref medium
unreachable fded:72ff:6602::/48 dev lo proto static metric 2147483647 error -148 pref medium
fe80::/64 dev ra0 proto kernel metric 256 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
fe80::/64 dev eth0.1 proto kernel metric 256 pref medium
fe80::/64 dev apcli0 proto kernel metric 256 pref medium
fe80::/64 dev br-lan proto kernel metric 256 pref medium
local ::1 dev lo table local proto kernel metric 0 pref medium
anycast fded:72ff:6602:10:: dev br-lan table local proto kernel metric 0 pref medium
local fded:72ff:6602:10::1 dev br-lan table local proto kernel metric 0 pref medium
anycast fe80:: dev ra0 table local proto kernel metric 0 pref medium
anycast fe80:: dev eth0 table local proto kernel metric 0 pref medium
anycast fe80:: dev apcli0 table local proto kernel metric 0 pref medium
anycast fe80:: dev eth0.1 table local proto kernel metric 0 pref medium
anycast fe80:: dev br-lan table local proto kernel metric 0 pref medium
local fe80::e495:6eff:fe05:b52c dev apcli0 table local proto kernel metric 0 pref medium
local fe80::e695:6eff:fe45:b52c dev ra0 table local proto kernel metric 0 pref medium
local fe80::e695:6eff:fe45:b52c dev eth0 table local proto kernel metric 0 pref medium
local fe80::e695:6eff:fe45:b52c dev eth0.1 table local proto kernel metric 0 pref medium
local fe80::e695:6eff:fe45:b52c dev br-lan table local proto kernel metric 0 pref medium
ff00::/8 dev ra0 table local metric 256 pref medium
ff00::/8 dev eth0 table local metric 256 pref medium
ff00::/8 dev br-lan table local metric 256 pref medium
ff00::/8 dev eth0.1 table local metric 256 pref medium
ff00::/8 dev apcli0 table local metric 256 pref medium
ff00::/8 dev wg0 table local metric 256 pref medium
root@CameraRouter:~# ip route get 10.89.5.4
10.89.5.4 dev wg0 src 10.89.5.1 uid 0 
    cache 
root@CameraRouter:~# ip route get 10.89.5.4 from 10.86.5.157 iif eth0.5
10.89.5.4 from 10.86.5.157 dev wg0 
    cache iif eth0.5 

I am not sure how to disable the firewall, except by going into LuCI and setting all policies to ACCEPT. Right now, however, only the WAN zone (-> LAN, -> Guest and -> Wireguard) has REJECT as the default in/out/forward policy. All others are ACCEPT.

For yucks, here’s the output of “ip rule list”

root@CameraRouter:~# ip rule list
0: from all lookup local 
1001: from all iif eth0.1 lookup main 
2001: from all fwmark 0x100/0x3f00 lookup 1 
2061: from all fwmark 0x3d00/0x3f00 blackhole
2062: from all fwmark 0x3e00/0x3f00 unreachable
32766: from all lookup main 
32767: from all lookup default

Attached are the iptables -L and iptables -S outputs: Detailed IPTABLES.zip (11.3 KB)

One last point to reiterate from the OP - pinging the router (10.86.5.2) DOES correctly route the result back. Only forwarded ICMP responses are being misrouted. This makes me suspicious that the problem is some difference in the iptables between the INPUT or OUTPUT chains and the FORWARD chains.

You can try mwan3 stop.

Stopping mwan3 DID allow pings to other LAN addresses.

When I tried to restart mwan3 (with “mwan3 start”), I got two errors:

root@CameraRouter:~# mwan3 start
uci: Entry not found
uci: Entry not found

It doesn’t seem that the firewall was restarted.

I am in a remote location with nobody on-site. Have to be careful about locking up the devices.

A quick check of the iptables output (ignoring the counts) shows only minor differences. In the FORWARD chain, the ROUTE_POLICY rule is missing:

Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
            ROUTE_POLICY  all  --  any    any     anywhere             anywhere            
            forwarding_rule  all  --  any    any     anywhere             anywhere             /* !fw3: Custom forwarding rule chain */
            ACCEPT     all  --  any    any     anywhere             anywhere             ctstate RELATED,ESTABLISHED /* !fw3 */
            zone_lan_forward  all  --  br-lan any     anywhere             anywhere             /* !fw3 */
            zone_wan_forward  all  --  eth0.1 any     anywhere             anywhere             /* !fw3 */
            zone_wan_forward  all  --  apcli0 any     anywhere             anywhere             /* !fw3 */
            zone_guestzone_forward  all  --  br-guest any     anywhere             anywhere             /* !fw3 */
            zone_wireguard_forward  all  --  wg0    any     anywhere             anywhere             /* !fw3 */

and, not surprisingly, no ROUTE_POLICY chain is defined. Otherwise, the iptables listings are identical.

Ugh… That policy routing stuff is so powerful that the inevitable result is that it takes eons to troubleshoot just about anything :( I try to avoid it as much as possible…
To me it seems that the key is the route table entry default via 10.86.1.1 dev eth0.1 table 1 as that’s the only one I see that can override the 10.89.5.0/24 dev wg0 entry.
The rule list says 2001: from all fwmark 0x100/0x3f00 lookup 1 but I’m hazy on how to link all the iptables stuff to that rule being chosen. Would take another dive into the manuals…
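One way to picture how rule 2001 could override the wg0 route is to walk the rules in priority order, the way the kernel does. The Python sketch below is a simplified model built from the "ip rule list" and "ip route list table all" output earlier in the thread. The assumption, not confirmed anywhere in the thread, is that something in the iptables mangle table stamps forwarded LAN packets with a mark whose 0x3f00 field equals 0x100. The function name route and the reduced local table are made up for illustration.

```python
import ipaddress

# Rules from "ip rule list": (priority, iif, fwmark value, fwmark mask, action).
# None means the rule does not test that field.
RULES = [
    (0,     None,     None,   None,   "local"),
    (1001,  "eth0.1", None,   None,   "main"),
    (2001,  None,     0x100,  0x3f00, "1"),
    (2061,  None,     0x3d00, 0x3f00, "blackhole"),
    (2062,  None,     0x3e00, 0x3f00, "unreachable"),
    (32766, None,     None,   None,   "main"),
    (32767, None,     None,   None,   "default"),
]

# Tables reduced to (prefix, interface); "local" keeps only the router's own addresses.
TABLES = {
    "local": [("10.86.1.4/32", "local"), ("10.86.5.2/32", "local"),
              ("10.89.5.1/32", "local")],
    "1": [("0.0.0.0/0", "eth0.1")],
    "main": [("0.0.0.0/0", "eth0.1"), ("10.86.1.0/24", "eth0.1"),
             ("10.86.5.0/24", "br-lan"), ("10.89.5.0/24", "wg0")],
    "default": [],
}

def route(dst, iif=None, mark=0):
    """Return (rule priority, outcome) for the first rule that yields a result."""
    addr = ipaddress.ip_address(dst)
    for prio, rule_iif, value, mask, action in RULES:
        if rule_iif is not None and rule_iif != iif:
            continue
        if value is not None and (mark & mask) != value:
            continue
        if action in ("blackhole", "unreachable"):
            return prio, action
        hits = [(ipaddress.ip_network(p), dev) for p, dev in TABLES[action]
                if addr in ipaddress.ip_network(p)]
        if hits:
            return prio, max(hits, key=lambda h: h[0].prefixlen)[1]
        # No route in this table: continue with the next rule.
    return None, "no route"

print(route("10.89.5.4", iif="eth0.5", mark=0))      # unmarked reply
print(route("10.89.5.4", iif="eth0.5", mark=0x100))  # reply carrying the assumed mark
```

In this model the unmarked packet reaches rule 32766 and goes out wg0, while the marked one stops at rule 2001, where table 1 only offers the default route via eth0.1. That would also explain why the earlier "ip route get … iif eth0.5" check showed wg0: it was run without a mark. As far as I know, ip route get accepts a mark argument, which would be a way to test this on the router.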

Interesting. Since I could not successfully run a “mwan3 start” command, I rebooted the router. I presume that this would restart the firewall as that is the configured state and manually stopping it should not be “remembered” across a reboot.

Comparing the “iptables -L -vv” output, I find that it is identical to the output of that command when I ran it after stopping the firewall.

Comparing to an earlier iptables listing, I can confirm that the ROUTE_POLICY rule is still gone. That rule earlier was:

Chain ROUTE_POLICY (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DROP       all  --  br-lan any     anywhere             anywhere             mark match 0x40000/0x40000
  50M 4166M ACCEPT     all  --  br-lan any     anywhere             anywhere             mark match 0x80000/0x80000
    0     0 DROP       all  --  br-guest any     anywhere             anywhere             mark match 0x40000/0x40000
    0     0 ACCEPT     all  --  br-guest any     anywhere             anywhere             mark match 0x80000/0x80000
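For anyone reading the listing above, this is roughly how such a chain is evaluated: rules are tried top to bottom, and "mark match VALUE/MASK" means (mark AND MASK) equals VALUE. The Python sketch below is only a model of that first-match logic; the names evaluate and set_bits are made up for illustration. The 50M-packet counter on the second rule suggests LAN traffic was arriving with the 0x80000 bit set.

```python
# Rules from the ROUTE_POLICY listing: (in-interface, mark value, mark mask, target).
ROUTE_POLICY = [
    ("br-lan",   0x40000, 0x40000, "DROP"),
    ("br-lan",   0x80000, 0x80000, "ACCEPT"),
    ("br-guest", 0x40000, 0x40000, "DROP"),
    ("br-guest", 0x80000, 0x80000, "ACCEPT"),
]

def evaluate(chain, iif, mark):
    """Return the target of the first matching rule, or RETURN if none match."""
    for rule_iif, value, mask, target in chain:
        if rule_iif == iif and (mark & mask) == value:
            return target
    return "RETURN"  # fall back to the calling chain (FORWARD)

def set_bits(n):
    """List the bit positions that are set in n."""
    return [i for i in range(n.bit_length()) if (n >> i) & 1]

print(evaluate(ROUTE_POLICY, "br-lan", 0x80000))  # packet with only bit 19 set
print(evaluate(ROUTE_POLICY, "wg0", 0x80000))     # chain has no rule for wg0
print(set_bits(0x40000), set_bits(0x80000), set_bits(0x3f00))
```

Note that 0x40000 and 0x80000 are bits 18 and 19, while the 0x3f00 mask used by the ip rules covers bits 8 to 13. So these marks, on their own, would not match the "fwmark 0x100/0x3f00 lookup 1" rule; whatever set them may have set other bits or installed other rules as well, which the thread does not show.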

Could those -mark fields have been a leftover from some previous VPN routing policy which was once configured on that router, but is no longer used?

Of course, the ACCEPT/DROP selections - as displayed - don’t really route traffic, they just filter it.

However, previous analysis of GL.iNet’s VPN policy routing implementation showed that it does use the ROUTE_POLICY chain and sets marks with a value of 0x80000 and another value.

I am wondering whether this was some remnant that had not been removed, and whether my subsequent “messing around” while trying to solve this problem finally removed that iptables entry.

Are you saying that everything is working correctly now?
Failing to remove some route or policy after a slew of changes is not surprising…

Just to close out this thread…

It is working just fine when the remnants of the VPN policy are removed.