[Bug] Flint 2 (4.8.4) — UDP/53 reverse-NAT silently drops responses to specific LAN client after WAN IP rotation

Device: GL-MT6000 (Flint 2)
Firmware: GL.iNet 4.8.4 (OpenWrt 21.02-SNAPSHOT)
Severity: High (silently breaks DNS resolution for one specific LAN client after the WAN public IP changes)


Summary

After the ISP rotated the public IP assigned to the router (PPPoE WAN), one specific LAN client lost the ability to resolve DNS over UDP/53 — to any upstream server. TCP/443 outbound from the same client kept working perfectly. Other clients on the same VLAN were unaffected. Reboot of the client, reboot of the router, conntrack -F, and nf_conntrack_helper=1 did not fix the issue. Only adding a DoH proxy on the client (bypassing UDP/53 through TCP/443) restored DNS.

Packet captures show that the query reaches Cloudflare and that the response reaches the router's kernel (conntrack registers packets=1 bytes=333 on the reverse tuple). However, no chain in the iptables filter table ever sees a packet with dst=<client IP>, which suggests that the reverse-NAT step (rewriting the reply's destination from the router's public IP back to the client) silently drops or mis-rewrites the response for this specific flow.

Affected device

  • Model: GL-MT6000 (Flint 2)

  • Firmware: GL.iNet 4.8.4 (factory firmware)

  • Underlying OS: OpenWrt 21.02-SNAPSHOT

  • Tailscale: 1.80.3 (factory bundle, subnet-router + exit-node)

  • AdGuard Home: enabled, dns-port 3053, listening on the router

Network topology (relevant)

  • WAN: PPPoE on pppoe-wan, dynamic public IP (ISP: RDS Digi)

  • LAN: 4 VLANs (Trusted 192.168.1.0/24, Crypto 192.168.30.0/24, IoT 192.168.40.0/24, Guest 192.168.9.0/24)

  • Affected client: NAS QNAP TS-673A, MAC 24:5E:BE:83:43:3C, IP 192.168.1.222, connected by cable to lan2

  • Working clients (same VLAN Trusted): laptop on Wi-Fi (different MAC), other LAN devices

Symptom (from the affected client)

$ getent hosts api.cloudflare.com
# (empty, no result)

$ curl https://api.cloudflare.com/
# HTTP 000 / 4-second timeout (DNS resolution fails)

$ curl --resolve api.cloudflare.com:443:104.16.132.229 https://api.cloudflare.com/
# HTTP 403   t=0.038s    -- Direct IP works fine — TCP/443 outbound is OK

$ curl -H 'accept: application/dns-json' 'https://1.1.1.1/dns-query?name=api.cloudflare.com'    # DoH (JSON API)
# {"Status":0, ...}      -- DoH (TCP/443) works perfectly

conntrack on the affected client shows all UDP/53 outbound connections as [UNREPLIED], regardless of the upstream server (Flint 2's own dnsmasq at 192.168.1.1, public 1.1.1.1, public 9.9.9.9 — all fail identically).

From any other client on the same VLAN (e.g. Wi-Fi laptop), DNS via UDP/53 to the same upstreams works fine.
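The [UNREPLIED] check described above can be narrowed to DNS flows with a small filter. This is only a sketch: the function name unreplied_dns is hypothetical, and it assumes the usual conntrack -L line layout, in which the first dport= field belongs to the original (outbound) tuple.

```shell
#!/bin/sh
# Print only UDP flows to port 53 that conntrack marks as [UNREPLIED].
# Reads `conntrack -L -p udp` style lines on stdin.
unreplied_dns() {
    awk '
        $1 == "udp" && /\[UNREPLIED\]/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^dport=/) {      # first dport= is the original tuple
                    if ($i == "dport=53") print
                    break
                }
            }
        }'
}

# Typical use on the client (needs root and the conntrack tool):
#   conntrack -L -p udp 2>/dev/null | unreplied_dns
```

Running the same filter on a working client should print nothing once its queries have been answered, which gives a quick side-by-side comparison.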

Diagnostic data (gathered on the router)

Outbound query path

Counter rules added on the router for the affected client's UDP/53 traffic:

Chain        Table   Filter                           Pkts
PREROUTING   mangle  udp dport 53 src 192.168.1.222   OK (e.g. 19 pkts)
FORWARD      filter  (same flow at FORWARD)           OK
POSTROUTING  mangle  (same flow leaving WAN)          OK

Query leaves the router toward the upstream resolver normally.
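The report does not list the exact counter rules, so the following is only a sketch of how such counting-only rules could be added. The IPT and CLIENT variables and the add_query_counters function are hypothetical names; by default the script is a dry run that prints the commands instead of applying them.

```shell
#!/bin/sh
# Emit (or, with IPT=iptables, apply) counting-only rules for the client's DNS queries.
# A rule without -j has no target: it only increments its packet/byte counters.
IPT="${IPT:-echo iptables}"          # default: dry run, just print the commands
CLIENT="${CLIENT:-192.168.1.222}"    # affected client from this report

add_query_counters() {
    # $IPT is intentionally unquoted so "echo iptables" splits into two words.
    $IPT -t mangle -I PREROUTING  1 -s "$CLIENT" -p udp --dport 53
    $IPT -t filter -I FORWARD     1 -s "$CLIENT" -p udp --dport 53
    $IPT -t mangle -I POSTROUTING 1 -s "$CLIENT" -p udp --dport 53
}
```

After applying the rules for real, the counters can be read with iptables -t mangle -L PREROUTING -v -n -x (and likewise for the other chains).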

Inbound response path (the problem)

After flushing conntrack and triggering 5 fresh queries from the client:

Chain        Table   Filter                           Pkts  Notes
PREROUTING   mangle  udp sport 53 (any)               6     Response arrives
FORWARD      mangle  udp sport 53 (any)               6     Passes mangle FORWARD
FORWARD      filter  udp sport 53 dst 192.168.1.222   0     Never seen with the un-NATted destination
POSTROUTING  mangle  udp sport 53 (any)               6     Goes back out (?)

filter FORWARD never sees a packet whose destination is the original LAN client. The kernel's conntrack table does register the response (packets=1 bytes=333 on the reverse tuple, with dst=<router public IP>), confirming that the packet reached the router, but the reverse-NAT step that should rewrite dst to 192.168.1.222 is not producing the rewritten packet for FORWARD.
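To compare the per-hook counters above in a repeatable way, the packet counts can be pulled out of iptables-save -c output. This is a minimal sketch: rule_pkts is a hypothetical helper, and it assumes the standard "[pkts:bytes] -A CHAIN ..." rule-line format that iptables-save -c prints.

```shell
#!/bin/sh
# Sum the packet counters of rules in TABLE/CHAIN whose text contains PATTERN.
# Reads `iptables-save -c` output on stdin.
rule_pkts() {   # usage: iptables-save -c | rule_pkts TABLE CHAIN PATTERN
    awk -v table="$1" -v chain="$2" -v pat="$3" '
        /^\*/ { cur = substr($0, 2) }            # "*filter", "*mangle", ... opens a table
        cur == table && $2 == "-A" && $3 == chain && index($0, pat) {
            split(substr($1, 2), c, ":")         # "[6:1998]" -> c[1] = "6"
            total += c[1]
        }
        END { print total + 0 }'
}

# Example (on the router, as root):
#   iptables-save -c | rule_pkts mangle FORWARD '--sport 53'
#   iptables-save -c | rule_pkts filter FORWARD '-d 192.168.1.222'
```

A non-zero count in mangle alongside a zero count for the client-specific filter rule would reproduce the discrepancy shown in the table.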

Counter on the affected client's INPUT chain

Chain  Table   Filter              Pkts
INPUT  filter  udp sport 53 (any)  0
INPUT  mangle  udp sport 53 (any)  0

No UDP/53 reply ever reaches the affected client's network stack.

Attempted fixes that did NOT resolve the issue

  1. conntrack -F on the router (twice) — no effect.

  2. sysctl -w net.netfilter.nf_conntrack_helper=1 — no effect (was 0 by default; flushed conntrack after).

  3. Reboot of the affected NAS — no effect.

  4. Reboot of the router — no effect.

  5. Removing the affected MAC from src_mac10 / src_mac11 ipsets (VPN policy) — confirmed not related (counters in TUNNEL10/11_ROUTE_POLICY chains showed 0 packets matched for this client at the time of test).

  6. Per-container DNS override on the affected client's Docker daemon — irrelevant, the host itself fails.

Diagnostic snapshot details

  • iptables-save output captured (~12 KB, available on request).

  • conntrack -L shows ESTABLISHED flow on the reverse tuple but no FORWARD/INPUT delivery.

  • dmesg shows: nf_conntrack: default automatic helper assignment has been turned off for security reasons and CT-based firewall rule not found. (documented OpenWrt warning, but enabling helpers did not fix it.)

The LAN client's DNS started failing at the same time as a forced PPPoE reconnection (router uptime at the time of the bug was 17 h, public IP had changed mid-day).

Suspected root cause

The reply to an outbound UDP/53 flow originated by one specific MAC/IP combination appears to be silently dropped at the nf_conntrack_nat IP-rewrite step after a public-IP rotation on the WAN. Other LAN clients (different MAC, same VLAN) on the same router and WAN are unaffected, which strongly indicates a per-flow conntrack inconsistency that survives conntrack -F. A possible cause is a bug in the GL.iNet-specific pre_dns_deal_conn_zone chain, or in the dns_dispatcher / policy_redirect NAT chains added by the AdGuard / VPN-policy modules, combined with a stale CT zone entry pinned to the old WAN IP.

Workaround applied (works perfectly)

On the affected NAS:

  1. Run klutchell/dnscrypt-proxy Docker container in --network host mode, listening on 127.0.0.2:5354, upstream Cloudflare DoH (TCP/443).

  2. Configure /etc/dnsmasq.conf with: server=127.0.0.2#5354 + no-resolv.

  3. Set /etc/resolv.conf to: nameserver 127.0.1.1 (local dnsmasq).

Result: DNS resolves correctly via DoH; getent hosts api.cloudflare.com → HTTP 301, 51 ms. TCP/443 path is the only working DNS path for this client.

Severity / Impact

  • DNS-dependent services on the affected client (Cloudflare tunnels, Tailscale registration, Docker daemon registry pulls) were all silently failing with cascading errors.

  • Diagnosis took ~5 hours because the drop is silent and the working conntrack registration contradicts the missing FORWARD packet.

  • A user without packet-level diagnostic skill would have no way to recover other than factory-resetting the router.

Suggested investigation areas

  1. Behaviour of pre_dns_deal_conn_zone (raw PREROUTING) and the CT zone 40960 assignment when the public IP changes.

  2. Interaction between the GL.iNet policy_redirect / dns_dispatcher NAT chains and the kernel's UN-NAT for UDP/53 replies post-IP rotation.

  3. Whether conntrack -F actually clears the nat mappings or only the filter state-tracking table; the symptom suggests the NAT mapping is pinned to the old public IP.

  4. Whether a Tailscale + AdGuard + VPN-policy-by-MAC configuration (all enabled) is exercising a code path that triggers this.
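One way to test the "NAT mapping pinned to the old public IP" hypothesis from the list above is to look for the client's NATed flows whose reply tuple is addressed to something other than the current WAN address. The sketch below is an assumption-laden illustration, not a confirmed diagnostic: stale_nat is a hypothetical helper, and it relies on the usual conntrack -L layout in which the second dst= field belongs to the reply tuple.

```shell
#!/bin/sh
# Print conntrack lines for flows that originate from CLIENT_IP, were NATed,
# and whose reply destination is NOT the current WAN address.
stale_nat() {   # usage: conntrack -L -p udp | stale_nat CLIENT_IP CURRENT_WAN_IP
    awk -v client="$1" -v wan="$2" '{
        src = ""; rdst = ""; n = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^src=/ && src == "") src = substr($i, 5)   # original source
            if ($i ~ /^dst=/ && ++n == 2)  rdst = substr($i, 5)  # reply destination
        }
        # rdst == client means the flow was not NATed (e.g. queries to the router itself).
        if (src == client && rdst != "" && rdst != client && rdst != wan) print
    }'
}
```

Any line it prints after a WAN IP change would be a flow still mapped to a previous address; an empty result would argue against this particular explanation.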

Test environment available

I can reproduce on demand and gather any specific data (tcpdump -w, iptables-save -c, conntrack -L, dmesg, etc.) if needed. Happy to run instrumented tests if engineering wants to dig in. Thanks!


Hi

Thanks for your report.

We noticed that you have already reached out to us by email regarding this issue.
Please continue working with our support team through the ticket system so we can assist you more effectively.

This post will remain open in case other users wish to share additional suggestions.