BLUF: 2-3 times a day, LAN host name resolution stops working. I have to log into the router and run a command, after which everything works for a while. It always happens overnight, and usually at some point during the day as well.
I have a GL-iNet AX1800 running in router mode. The router reports that it's running OpenWrt 21.02-SNAPSHOT r16399+173-c67509efd7. It's configured to connect to Mullvad via WireGuard; a couple of LAN machines have static IPs, and DHCP handles everything else. I'm also excluding a small group of IPs from Mullvad, because they're work-related VPN endpoints (no need for VPN-over-VPN). Finally, I've poked around in the shell to add some LAN CNAMEs.
For a long while, I struggled to get the router to resolve named LAN hosts; it was hit-or-miss at best. I tried enough things that I can't remember them all, but one thing I believe I did was replace `dnsmasq` with `dnsmasq-full`, and that's what's currently installed. For a while this seemed to work fine; then, I think, I got a firmware update, and my troubles started.
WAN DNS resolution goes over WireGuard to Mullvad's DNS servers, as I want it to, and this works reliably all the time. However, every day, 1-3 times a day, LAN host resolution stops working. It happens every night, and sometimes during the day; I haven't been able to spot any periodicity.
When LAN resolution starts failing, my workaround is to ssh into the router and run `route_policy 3`. I narrowed it down to that command by tracing: `/etc/init.d/vpnpolicy-apply restart` fixed the problem, so I traced that script and found that both `/usr/bin/vpn_domain_update.sh` and `/usr/bin/route_policy` were being called, which meant my `proxy_mode` must be "3". That led me to try calling `vpn_domain_update.sh` on its own, which didn't fix the issue, and then `route_policy 3`, which does.
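Until the root cause is found, one way to automate the `route_policy 3` workaround and, at the same time, record exactly when failures happen (which would help with the periodicity question) is a small watchdog run from cron. This is a minimal sketch, not something from the GL-iNet firmware: `LAN_HOST` is a placeholder for a LAN name that should always resolve, `RESOLVER` defaults to `127.0.0.1` (the router's LAN address may be a better probe target), and it assumes BusyBox `nslookup` exits non-zero when a lookup fails, which is worth verifying on this build.

```shell
#!/bin/sh
# Hypothetical LAN-DNS watchdog sketch (not part of the GL-iNet firmware).
# LAN_HOST is a placeholder: use a LAN name that should always resolve.
LAN_HOST="${LAN_HOST:-myhost.lan}"
# Probe the router's own resolver (assumed address; adjust if needed).
RESOLVER="${RESOLVER:-127.0.0.1}"

# Succeeds if LAN_HOST resolves via RESOLVER.
# Assumption: BusyBox nslookup exits non-zero on a failed lookup.
lan_lookup_ok() {
    nslookup "$LAN_HOST" "$RESOLVER" >/dev/null 2>&1
}

# On failure, write a timestamped syslog entry and apply the
# route_policy 3 workaround; on success, do nothing.
check_and_repair() {
    if lan_lookup_ok; then
        return 0
    fi
    logger -t lan-dns-watchdog "lookup of $LAN_HOST via $RESOLVER failed; running route_policy 3"
    route_policy 3
}
```

The script only defines functions, so it can be sourced without side effects. Saved as, say, `/root/lan-dns-watchdog.sh` (a hypothetical path), it could run every five minutes from a crontab entry such as `*/5 * * * * . /root/lan-dns-watchdog.sh && check_and_repair`; `logread -e lan-dns-watchdog` then lists the failure times.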
This is as far as I've traced it. I suspect it's not the firewall changes the script makes that resolve the issue, but rather the `ipset` changes and the `set_domain_policy()` shell function.
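One way to test that hypothesis is to capture the relevant state while resolution is broken, run `route_policy 3`, capture it again, and diff the two snapshots. A minimal sketch, assuming `ipset` and `iptables-save` are on the PATH (route_policy already uses ipset, and OpenWrt 21.02's firewall is iptables-based); `snapshot_state` and the `/tmp` paths below are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: dump the state route_policy is suspected of changing
# into a directory, one file per source, so two snapshots can be diffed.
snapshot_state() {
    dir="$1"
    mkdir -p "$dir" || return 1
    ipset list > "$dir/ipset.txt" 2>&1
    iptables-save > "$dir/iptables.txt" 2>&1
    # Running dnsmasq instances; the [d] trick keeps grep from matching itself.
    ps | grep '[d]nsmasq' > "$dir/dnsmasq-ps.txt" 2>&1
    # The two dnsmasq config files seen in the process list.
    for f in /etc/dnsmasq.conf.vpn /var/etc/dnsmasq.conf.cfg01411c; do
        cat "$f" > "$dir/$(basename "$f")" 2>&1
    done
}
```

While resolution is broken, something like `snapshot_state /tmp/dns-broken; route_policy 3; snapshot_state /tmp/dns-fixed; diff -u /tmp/dns-broken/ipset.txt /tmp/dns-fixed/ipset.txt` (and the same for the other files) would show what actually changed. If only the ipset contents differ, that supports the hypothesis; if the dnsmasq files or process list differ too, the dnsmasq restart may be what fixes things.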
I still have no idea what's causing the LAN host resolution to consistently, periodically fail.
I also wonder whether it's odd that there are four `dnsmasq` processes running:
5668 root 2704 S /usr/sbin/dnsmasq -C /etc/dnsmasq.conf.vpn -x /var/run/dnsmasq/dnsmasq.vpn.pid --server=193.138.219.228 --no-resolv
5669 root 2676 S /usr/sbin/dnsmasq -C /etc/dnsmasq.conf.vpn -x /var/run/dnsmasq/dnsmasq.vpn.pid --server=193.138.219.228 --no-resolv
6002 dnsmasq 2724 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
6007 root 2692 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
Within each pair, one process looks like an intentional fork of the other, but the fact that the two pairs use different configuration files looks suspicious. I do know that DNS resolution still works after `service dnsmasq stop`, because that kills only the non-VPN instances; with just the VPN pair left, both WAN and LAN resolution still work fine. However, even after `service dnsmasq disable`, running `vpnpolicy_apply` and/or `route_policy 3` starts all four again.
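To check which configuration each `dnsmasq` instance was started with, and how many processes share it, the BusyBox `ps` output can be grouped by the `-C` argument. A minimal sketch; `summarize_dnsmasq` is a hypothetical helper name, and it assumes the BusyBox `ps` column layout shown in the listing (PID, USER, VSZ, STAT, COMMAND).

```shell
#!/bin/sh
# Hypothetical helper: group dnsmasq processes by the config file passed
# with -C. Reads ps output on stdin so it can also be fed canned text.
summarize_dnsmasq() {
    awk '
        /\/usr\/sbin\/dnsmasq/ {
            conf = "(none)"
            # Find the argument that follows -C on the command line.
            for (i = 1; i < NF; i++)
                if ($i == "-C") conf = $(i + 1)
            count[conf]++
            users[conf] = users[conf] " " $2   # $2 is the USER column
        }
        END {
            for (c in count)
                printf "%s: %d process(es), users:%s\n", c, count[c], users[c]
        }
    ' | sort   # awk iteration order is unspecified, so sort the summary
}
```

On the router this would be run as `ps | summarize_dnsmasq`. For the four processes listed above it prints one line per config file, each with two processes: `/etc/dnsmasq.conf.vpn` (users `root root`) and `/var/etc/dnsmasq.conf.cfg01411c` (users `dnsmasq root`).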
- I've tried stopping and disabling the dnsmasq service. As expected, this kills the non-VPN-config pair, and both LAN and WAN DNS resolution continue to work without them. However, it doesn't prevent the issue, and the service just gets started back up by `/usr/bin/route_policy` when I run `service vpnpolicy-apply restart`.
- I've renamed `/etc/init.d/dnsmasq`. This makes `/usr/bin/route_policy` complain, but it prevents the second pair of dnsmasq instances from running and leaves LAN/WAN DNS resolution working. However, the issue still happens; the only effect is that two of the dnsmasq instances no longer run.
Thanks,