Problem
The RM-1 is only useful when it is reachable. If its Ethernet uplink stops working, whether due to a physical link issue, DHCP failure, driver glitch, or network stack problem, the device effectively disappears from the operator’s perspective.
In my experience (having almost lost control of my KVM + machine setup), this is a serious weakness. The device itself is still powered and functional, but because it depends on external connectivity, it cannot recover on its own. In practice, a simple reboot often restores connectivity. Unfortunately, without a built-in recovery mechanism, the device can remain offline indefinitely until someone intervenes physically, which defeats the purpose of having remote management hardware in the first place.
The result is a single point of failure:
A transient network fault can permanently remove remote access.
Proposed behavior
The device should periodically check whether its primary wired interface is still operational and attempt to recover if it is not.
Rather than immediately rebooting, recovery could happen in stages. Many failures are software or negotiation-related and can be resolved without restarting the whole system.
A simple recovery ladder could look like this:
First, determine whether the interface is usable. This can be done locally by checking link carrier and address state:
$ cat /sys/class/net/eth0/carrier
$ ip -4 addr show eth0
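Together, these two checks map onto a single health predicate. A sketch (the interface name and the requirement of an IPv4 address are assumptions, matching the defaults used later in this post):

```shell
#!/bin/sh
# Illustrative health predicate: "healthy" means the physical carrier is
# present AND at least one IPv4 address is assigned.
IFACE="${IFACE:-eth0}"

link_ok() {
    [ "$(cat "/sys/class/net/$IFACE/carrier" 2>/dev/null)" = "1" ] || return 1
    # Lines for v4 addresses look like "    inet 192.168.1.10/24 ..."
    ip -4 addr show dev "$IFACE" 2>/dev/null | grep -q ' inet '
}

if link_ok; then
    echo "link healthy"
else
    echo "link degraded"
fi
```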
If carrier is down or no usable address exists, attempt a soft reset of the interface:
$ ip link set eth0 down
$ sleep 3
$ ip link set eth0 up
After a short wait, the system should re-evaluate status.
If the interface is still not functional, the next step would be re-applying the network configuration, for example:
$ connmanctl enable ethernet
If that still doesn’t recover, a stronger step is to restart connmand itself. On my RM-1, connmand is managed by inittab respawn (and S45connman doesn’t actually stop/start the daemon), so the practical “restart” is to kill it and let init respawn it:
$ killall connmand
# init respawns connmand automatically
Only if these recovery attempts fail should the system escalate to a reboot:
$ reboot
Safeguards & Practical Implementation
Reboot escalation needs to be guarded to avoid two failure modes: reboot loops, and disruption of an active virtual USB (ISO) session.
Reboot rate limiting: the watchdog should not reboot repeatedly when the upstream network (router/ISP) is down. Store the last reboot timestamp and only allow a recovery reboot once per window (e.g., 3600 seconds).
LAST_REBOOT_FILE=/etc/kvmd/user/state/net-selfheal/last_reboot
NOW=$(date +%s)
LAST=$(cat "$LAST_REBOOT_FILE" 2>/dev/null || echo 0)

can_reboot() {
    [ $((NOW - LAST)) -gt 3600 ]
}
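The guard pairs with a write on the escalation path: record the timestamp before rebooting, so the cooldown holds even if the reboot itself is interrupted. A self-contained sketch (the /tmp path is illustrative, chosen so this is safe to run anywhere):

```shell
#!/bin/sh
# Sketch of the full rate-limit cycle using a throwaway state file.
LAST_REBOOT_FILE=/tmp/net-selfheal-demo/last_reboot
NOW=$(date +%s)
LAST=$(cat "$LAST_REBOOT_FILE" 2>/dev/null || echo 0)

can_reboot() {
    [ $((NOW - LAST)) -gt 3600 ]
}

if can_reboot; then
    mkdir -p "$(dirname "$LAST_REBOOT_FILE")"
    echo "$NOW" > "$LAST_REBOOT_FILE"   # record first, then escalate
    echo "would reboot now"
else
    echo "reboot suppressed ($((NOW - LAST))s since last one)"
fi
```

Running it twice in a row demonstrates the behavior: the first run is allowed, the second is suppressed until the window expires.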
Do not reboot when USB mass-storage is active: the RM-1 can act as a USB thumb drive to the controlled PC. If an ISO is mounted/presented, rebooting can break an OS install or corrupt an operation on the remote machine. A practical test is to look at USB gadget configfs: if any mass_storage LUN has a backing file configured, treat it as “active” and suppress reboot.
On the RM-1 I’m using, the gadget is under /sys/kernel/config/usb_gadget/rockchip/...:
usb_storage_active() {
    for f in /sys/kernel/config/usb_gadget/rockchip/functions/mass_storage.*/lun.*/file; do
        [ -e "$f" ] || continue
        backing="$(cat "$f" 2>/dev/null | tr -d '\r\n')"
        case "$backing" in
            ""|"none") continue ;;
            *) return 0 ;;
        esac
    done
    return 1
}
If usb_storage_active returns true, the watchdog may still attempt non-disruptive recovery (interface bounce, network service restart), but must not reboot.
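A third safeguard, used below as a “ping veto,” double-checks that the device is genuinely offline before rebooting: if a well-known external address still answers, the local health check was a false positive and a reboot would only cause downtime. A minimal sketch (the probe targets are illustrative):

```shell
#!/bin/sh
# Ping veto: if any probe target answers, connectivity evidently works,
# so the watchdog should log and suppress the reboot.
ping_veto_ok() {
    for ip in 1.1.1.1 8.8.8.8; do
        ping -c 1 -W 2 "$ip" >/dev/null 2>&1 && return 0
    done
    return 1
}

if ping_veto_ok; then
    echo "external host reachable; suppressing reboot"
else
    echo "no external reachability; reboot remains on the table"
fi
```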
Suggested overall flow (pseudo-code):
every 10 minutes:
    if eth0 healthy:
        clear failure counter; exit
    attempt interface reset; re-check
    if healthy: exit
    attempt network restart; re-check
    if healthy: exit
    if usb_storage_active: log + exit (no reboot)
    if ping veto succeeds: log + exit (no reboot)
    if reboot rate-limit allows: reboot
    else: log + exit
This design is straightforward to implement today via a cron job (BusyBox crond) and a shell script. I’d love to see this as an official (verified + supported) feature!
Setup I’m using for now
mkdir -p /etc/kvmd/user
cat >/etc/kvmd/user/net-selfheal.sh <<'EOF'
#!/bin/sh
set -eu

IFACE="eth0"
TAG="net-selfheal"
CHECK_V4=1
RECHECK_SLEEP=15
MAX_FAILS=2
REBOOT_COOLDOWN=3600
PING_VETO=1
PING_VETO_IP1="1.1.1.1"
PING_VETO_IP2="8.8.8.8"
USB_GUARD_REQUIRE_UDC=0
PERSIST_STATE_DIR="/etc/kvmd/user/state/net-selfheal"
VOL_STATE_DIR="/tmp/net-selfheal"
FAIL_FILE="$VOL_STATE_DIR/fails"
LAST_REBOOT_FILE="$PERSIST_STATE_DIR/last_reboot"

log() { logger -t "$TAG" "$*"; }

mkdir -p "$PERSIST_STATE_DIR" "$VOL_STATE_DIR"

carrier_up() {
    [ -r "/sys/class/net/$IFACE/carrier" ] || return 1
    [ "$(cat "/sys/class/net/$IFACE/carrier" 2>/dev/null || echo 0)" = "1" ]
}

has_ipv4() {
    ip -4 addr show dev "$IFACE" 2>/dev/null | grep -q ' inet '
}

link_ok() {
    carrier_up || return 1
    [ "$CHECK_V4" -eq 0 ] && return 0
    has_ipv4
}

usb_gadget_connected() {
    [ -r /sys/kernel/config/usb_gadget/rockchip/UDC ] || return 1
    udc="$(cat /sys/kernel/config/usb_gadget/rockchip/UDC 2>/dev/null | tr -d '\r\n')"
    [ -n "$udc" ] && [ "$udc" != "none" ]
}

usb_storage_presented() {
    for f in /sys/kernel/config/usb_gadget/rockchip/functions/mass_storage.*/lun.*/file; do
        [ -e "$f" ] || continue
        backing="$(cat "$f" 2>/dev/null | tr -d '\r\n')"
        case "$backing" in
            ""|"none") continue ;;
            *) return 0 ;;
        esac
    done
    return 1
}

usb_storage_active() {
    usb_storage_presented || return 1
    if [ "$USB_GUARD_REQUIRE_UDC" -eq 1 ]; then
        usb_gadget_connected
    else
        return 0
    fi
}

ping_veto_ok() {
    [ "$PING_VETO" -eq 1 ] || return 1
    ping -c 1 -W 2 "$PING_VETO_IP1" >/dev/null 2>&1 && return 0
    ping -c 1 -W 2 "$PING_VETO_IP2" >/dev/null 2>&1 && return 0
    return 1
}

read_int_file() {
    if [ -f "$1" ]; then
        v="$(cat "$1" 2>/dev/null | tr -dc '0-9')"
        # An if-statement (not `[ ... ] && { ... }`) so an empty file
        # does not trip `set -e` inside the command substitution.
        if [ -n "$v" ]; then
            echo "$v"
            return
        fi
    fi
    echo "$2"
}

write_int_file() { echo "$2" >"$1"; }

iface_bounce() {
    ip link set "$IFACE" down >/dev/null 2>&1 || true
    sleep 3
    ip link set "$IFACE" up >/dev/null 2>&1 || true
}

restart_kvmd_network() {
    [ -x /etc/init.d/S46kvmd-network ] || return 1
    /etc/init.d/S46kvmd-network restart >/dev/null 2>&1 || return 1
    return 0
}

restart_connman_real() {
    # On this image connmand is respawned by inittab, and S45connman does not manage it.
    # So: kill connmand; init will restart it.
    if pidof connmand >/dev/null 2>&1; then
        killall connmand >/dev/null 2>&1 || true
    fi
    # Give init/respawn a moment
    sleep 2
    pidof connmand >/dev/null 2>&1
}

# Fast path
if link_ok; then
    oldfails="$(read_int_file "$FAIL_FILE" 0)"
    # NB: a bare `[ ... ] && log` would trip `set -e` when oldfails is 0
    if [ "$oldfails" -ne 0 ]; then
        log "Recovered (fails was $oldfails)."
    fi
    write_int_file "$FAIL_FILE" 0
    exit 0
fi

fails="$(read_int_file "$FAIL_FILE" 0)"
fails=$((fails + 1))
write_int_file "$FAIL_FILE" "$fails"
log "Detected degraded link on $IFACE. Failure $fails/$MAX_FAILS. Starting recovery."

# Stage 1: bounce interface
iface_bounce
sleep "$RECHECK_SLEEP"
if link_ok; then
    log "Recovery succeeded after interface bounce."
    write_int_file "$FAIL_FILE" 0
    exit 0
fi

# Stage 2: re-apply network config via kvmd-network (connmanctl config)
restart_kvmd_network || true
sleep "$RECHECK_SLEEP"
if link_ok; then
    log "Recovery succeeded after re-applying kvmd-network configuration."
    write_int_file "$FAIL_FILE" 0
    exit 0
fi

# Stage 3: real connman restart (kill -> inittab respawn)
if restart_connman_real; then
    # After respawn, re-apply config again (DHCP/manual + DNS)
    restart_kvmd_network || true
    sleep "$RECHECK_SLEEP"
    if link_ok; then
        log "Recovery succeeded after restarting connman."
        write_int_file "$FAIL_FILE" 0
        exit 0
    fi
fi

if [ "$fails" -lt "$MAX_FAILS" ]; then
    log "Still degraded; will retry on next scheduled run (no reboot yet)."
    exit 0
fi

if usb_storage_active; then
    log "USB mass-storage is active (ISO/image presented); suppressing reboot."
    exit 0
fi

if ping_veto_ok; then
    log "Ping veto succeeded; suppressing reboot despite degraded local link state."
    exit 0
fi

now="$(date +%s)"
last="$(read_int_file "$LAST_REBOOT_FILE" 0)"
if [ $((now - last)) -lt "$REBOOT_COOLDOWN" ]; then
    log "Reboot suppressed by rate limit (last reboot $((now - last))s ago)."
    exit 0
fi

write_int_file "$LAST_REBOOT_FILE" "$now"
log "Escalating to reboot: link still degraded after recovery attempts."
reboot
EOF
chmod +x /etc/kvmd/user/net-selfheal.sh
mkdir -p /etc/kvmd/user/scripts
cat >/etc/kvmd/user/scripts/S20-net-selfheal-cron.sh <<'EOF'
#!/bin/sh
case "$1" in
    start)
        mkdir -p /var/spool/cron/crontabs
        # NB: this overwrites any existing root crontab entries
        echo '*/10 * * * * /etc/kvmd/user/net-selfheal.sh' > /var/spool/cron/crontabs/root
        chmod 600 /var/spool/cron/crontabs/root
        pidof crond >/dev/null 2>&1 || crond -S -l 8
        ;;
esac
EOF
chmod +x /etc/kvmd/user/scripts/S20-net-selfheal-cron.sh
# Apply immediately
/etc/kvmd/user/scripts/S20-net-selfheal-cron.sh start
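After applying, it is worth sanity-checking that the job is registered and that the watchdog logs its decisions. The commands below assume BusyBox `crontab` and `logread` are available on this image:

```shell
# List root's crontab to confirm the entry is present
crontab -l | grep net-selfheal

# Run the watchdog once by hand; a healthy link should return quietly
/etc/kvmd/user/net-selfheal.sh

# Inspect the watchdog's decisions via BusyBox syslog
logread | grep net-selfheal
```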
I’d exercise the failure scenarios more thoroughly, but I’ve already come close to losing remote access to my RM-1 once today!