So.. third time in a month I’ve been woken up by a customer emergency because the VPN tunnel failed. All three were due to Brume 3s whose DDNS suddenly stopped working (separate customers / routers). Rebooting the router doesn’t help, but disabling and re-enabling DDNS does fix it (after a few minutes for DNS propagation).
I haven’t figured out what’s killing DDNS to begin with, but I have found why it doesn’t self-heal after a basic router reboot.. in short, GL added a new (broken) flock wrapper in this latest firmware.
Symptom: After any reboot, DDNS shows "enabled" in the admin panel but the daemon is not running.
Root cause: A change was made to the hotplug script sometime between builds 224 (SlateAX 4.8.2) and 294 (Brume3 4.8.4) that introduced a flock wrapper.
Old version (working): /etc/init.d/gl_ddns restart &
New version (broken, build 294): flock -n /var/run/ddns/glddns.lock /etc/init.d/gl_ddns restart && rm /var/run/ddns/glddns.lock &
The flock call requires /var/run/ddns/ to exist in order to create the lock file. But /var is tmpfs and is wiped on every reboot. The new DDNS init script's boot() function is a no-op (return 0), so nothing recreates this directory at boot. The directory is only created when the DDNS updater script actually runs, but it can't run because flock fails first.
The result is a chicken-and-egg problem that repeats on every boot:
boot() returns 0 > nothing happens
WAN comes up > hotplug fires
flock -n /var/run/ddns/glddns.lock fails silently (no such file or directory)
The restart command inside the flock never executes
DDNS daemon never starts, and no log files + no error in the GUI
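The sequence above can be reproduced off the router. This is a minimal sketch, assuming a flock binary is on the PATH (util-linux or BusyBox). It uses a throwaway temp directory rather than the real /var/run/ddns, and `true` stands in for the restart command.

```shell
#!/bin/sh
# Demonstration only: a temp directory stands in for /var/run.
tmp=$(mktemp -d)
lock="$tmp/ddns/glddns.lock"   # parent directory "ddns" does not exist yet

# With the parent directory missing, flock cannot create the lock file,
# so the wrapped command (here: true) is never executed.
if flock -n "$lock" true 2>/dev/null; then
    before="ran"
else
    before="skipped"
fi

# Once the directory exists, the same call succeeds.
mkdir -p "$tmp/ddns"
if flock -n "$lock" true 2>/dev/null; then
    after="ran"
else
    after="skipped"
fi

echo "without directory: $before"
echo "with directory: $after"
rm -rf "$tmp"
```

Stderr is discarded here on purpose, to mirror how the hotplug context hides the "no such file or directory" error.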
How to reproduce:
enable DDNS in GUI
reboot Brume 3
check GUI and DDNS still shows enabled, but:
/var/run/ddns/ - does not exist
/var/log/ddns/ - does not exist
DDNS process - not running
Syslog DDNS entries - zero
in DDNS GUI: Disable > Apply > Enable > Apply
DDNS is working again.
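The post-reboot checks above can be scripted over SSH. This is only a sketch: check_dirs is a hypothetical helper name, and logread is assumed to be the stock OpenWrt log reader. The process check is left out because the exact daemon process name isn't given in this thread.

```shell
#!/bin/sh
# Report whether each given directory exists.
check_dirs() {
    for d in "$@"; do
        if [ -d "$d" ]; then
            echo "$d: present"
        else
            echo "$d: missing"
        fi
    done
}

# The two directories that are gone after a reboot on affected firmware.
check_dirs /var/run/ddns /var/log/ddns

# Count syslog lines mentioning DDNS. Skipped on systems without logread.
if command -v logread >/dev/null 2>&1; then
    echo "syslog DDNS entries: $(logread | grep -ci ddns)"
fi
```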
Conclusion: DDNS is broken on every Brume3 reboot.
FIX: Add mkdir -p /var/run/ddns before the flock call in /etc/hotplug.d/iface/95-gl_ddns
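For anyone patching by hand, here is a sketch of what the patched logic amounts to. LOCK_DIR, RESTART_CMD and restart_ddns_locked are placeholder names of mine (not from the GL script) so the snippet can be tried outside the router. On the router itself, the change is just the single mkdir -p line placed before the existing flock line.

```shell
#!/bin/sh
# Placeholders (assumptions): defaults match the paths quoted in this thread.
LOCK_DIR="${LOCK_DIR:-/var/run/ddns}"
RESTART_CMD="${RESTART_CMD:-/etc/init.d/gl_ddns restart}"

restart_ddns_locked() {
    # The actual fix: recreate the tmpfs directory before flock needs it.
    mkdir -p "$LOCK_DIR" || return 1

    # Same shape as the build 294 line. RESTART_CMD is left unquoted on
    # purpose so "script restart" splits into command and argument.
    flock -n "$LOCK_DIR/glddns.lock" $RESTART_CMD && rm -f "$LOCK_DIR/glddns.lock"
}
```

The original hotplug line backgrounds the call with a trailing &. A caller can do the same with restart_ddns_locked &.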
Still not positive what broke it to start with. Have set up a monitoring script for now.
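A watchdog along these lines is one way to do the monitoring. It is a rough sketch, not the exact script in use: the function names are hypothetical, and PROC_PATTERN is a guess at the daemon's process name that should be checked against ps output on the router.

```shell
#!/bin/sh
RUN_DIR="${RUN_DIR:-/var/run/ddns}"
# Assumption: adjust to whatever the DDNS process is actually called.
PROC_PATTERN="${PROC_PATTERN:-dynamic_dns_updater}"

# Healthy means the runtime directory exists and a matching process is running.
ddns_is_healthy() {
    [ -d "$RUN_DIR" ] && pgrep -f "$PROC_PATTERN" >/dev/null 2>&1
}

# If unhealthy, log it, recreate the directory and restart the service.
ddns_watchdog() {
    if ddns_is_healthy; then
        return 0
    fi
    logger -t ddns-watchdog "DDNS not running, recreating $RUN_DIR and restarting"
    mkdir -p "$RUN_DIR"
    /etc/init.d/gl_ddns restart
}
```

Calling ddns_watchdog from cron every few minutes would cover both the reboot case and whatever is killing DDNS in the first place, at the cost of a short gap before recovery.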
@bruce just experienced this on another Brume3 today. I can now confirm on 4 separate Brume3s that, after being initialized, GL DDNS is broken upon any subsequent reboot.
The only current fix is to disable (Apply) and re-enable Dynamic DNS in the GUI after each reboot.
Agreed, but what is a proper ticket? They have no “official” issue tracker. If they’d give us a proper github then I’d raise an issue.
That said, I do know it’s reported - and they’ve actually had this issue tracked for over a month and didn’t think it was a priority.
This should be a “same day fix, new firmware push” type of issue. Currently the Brume 3 - their premier VPN gateway - is functionally broken for its primary purpose.
@will.qiu @bruce someone at GL should be ahead of this. It’s impacting your customers severely right now.
We have created an internal ticket and submitted to R&D.
I see that R&D has already submitted the fix, and I will urge them in this ticket to build the firmware as soon as possible.
After patching, rebooted each one and DDNS started automatically. Directories were recreated, daemon runs and hostname update is registered. No manual toggle needed.
Thank you, @bruce . Most of these are customer routers in production (relying on them for daily remote work), so I can’t put them on beta until I’ve had time to test it myself more thoroughly, but I’m glad to hear it’s fixed and appreciate the reply!
In the interim I’ll just keep patching the hotplug. I’m assuming there are likely many new Brume3 customers out there sitting on stable who aren’t going to understand why everything suddenly stops working the first time their Brume3 reboots while they’re abroad on travel.. and most aren’t going to know how to trace it to a DDNS bug (or how to work around it, or that they need beta to fix it).
May want to consider a 4.8.4.x hotfix in the interim if 4.8.5 is going to take a while to migrate to stable.
I certainly understand that.. but GL can push a small fix update of the firmware to the gl-download center with a version bump, which would then prompt a firmware update notification to everyone via the admin panel - as is normal practice for most software/firmware vendors with critical fixes.
Obviously not ideal, but maybe worth consideration when the new flagship VPN appliance is broken at a core level for its primary purpose as a VPN server. Right now, every person having just purchased this and setting it up for VPN use is one reboot away from hitting this failure. I’ve deployed a few dozen of these for customers this past month, so I would assume there are hundreds more doing the same right now.
GL has pushed version bumps for mostly single features such as AmneziaWG.. I would argue this is significantly more worthy and impacting to customers.
Regardless. I appreciate the response and updates.
I was looking for that previous Brume 3 DDNS thread on Discord earlier, but I couldn't find it either. Sorry about this.
Regarding the device sharing, yes, you'll need to email us or DM me on Discord with your Brume 3's MAC address and login password so I can access it via SSH to take a look.
There is a new beta firmware available now that fixes the DDNS issue for the Brume3.
You might want to try that first, and if you're still having issues, feel free to reach out to us again.