Pages go blank (with an error) after a long uptime

I’ve been up for (according to LuCI) 11d 10h 44m 4s.

This morning I was getting the following error in red when I’d go to any of the sidebar pages. The pages seem to come back (i.e., have information in them) from time to time, but the error persists, and the connection is still very much up.

And there’s nothing in the dmesg to indicate any issues:

...
[ 6423.194674] br-lan: port 1(eth1) entered forwarding state
[ 6423.199293] br-lan: port 1(eth1) entered forwarding state
[ 6431.204413] br-lan: port 1(eth1) entered forwarding state
[179474.193860] nss-dp 3a001400.dp3 eth2: PHY Link is down
[179474.194426] br-lan: port 2(eth2) entered disabled state
[179475.193798] nss-dp 3a001400.dp3 eth2: PHY Link up speed: 1000
[179475.194095] br-lan: port 2(eth2) entered forwarding state
[179475.198565] br-lan: port 2(eth2) entered forwarding state
[179483.203675] br-lan: port 2(eth2) entered forwarding state
[179790.203758] nss-dp 3a001400.dp3 eth2: PHY Link is down
[179790.204188] br-lan: port 2(eth2) entered disabled state
[179791.203778] nss-dp 3a001400.dp3 eth2: PHY Link up speed: 1000
[179791.203853] br-lan: port 2(eth2) entered forwarding state
[179791.208548] br-lan: port 2(eth2) entered forwarding state
[179799.223682] br-lan: port 2(eth2) entered forwarding state
root@GL-AXT1800:~# cat /proc/uptime
989305.05 3834103.04
root@GL-AXT1800:~#

Hmm, and does your log show anything about the repeater?

I noticed ghosting on the Wi-Fi only once, when repeater mode was on and I had set the 5G channel higher on the AP side; I got this web error then too, but after reconnecting to the Wi-Fi it was working again.

I’ve been connected by wire to my hotspot this entire time; the LTE signal was really good here (RSRQ -9 dB), so I just used that (315.65 GB so far!).

This error message is displayed when the router API does not respond as expected. That is usually due to either an unstable web connection to the router or an abnormal process on the router.
Process exceptions are usually resolved after a reboot.

If you would like to help us troubleshoot further, please follow these steps to identify which process is abnormal, and we will try to reproduce the problem if possible.

  1. Press Ctrl+Shift+I on this page to open the browser’s debugging tools.
  2. Select “Network”.
  3. Wait for the red “rpc” entry to appear in the list; it appears when the error message is displayed.
  4. Click on the payload on the left and copy the content.
  5. Finally, go to the “Log” page and click “export log” (if the export fails, see the SSH sketch after this list).
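
If exporting from the “Log” page fails, the same information can usually be pulled over SSH instead. A minimal sketch, assuming SSH access to the router is enabled (192.168.8.1 is the address the web UI is served from in this thread):

# Collect the system log and kernel log over SSH; logread and dmesg are
# standard on OpenWrt-based firmware.
ssh root@192.168.8.1 "logread; echo '=== dmesg ==='; dmesg" > router-logs.txt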

@yuxin.zou - sorry for taking so long to reply; my SlateAX is now on the other side of the country, but I’ll be back there soon. The issue is that most of the time I don’t have uptimes long enough for it to manifest, but we’ll see how long I get on my next trip.

I’m seeing similar errors with mine sitting here on the bench.

(screenshot of the same web UI error)

Force-reloading the browser page does not help

BusyBox v1.33.1 (2022-06-03 08:21:49 UTC) built-in shell (ash)

  _______                     ________ __ ______ __
 |       |.-----.-----.-----.|  |  |  |__|   ___|__|
 |   -   ||  _  |  -__|     ||  |  |  |  |   ___|  |
 |_______||   __|_____|__|__||________|__|__|   |__|
          |__| W I R E L E S S   F R E E D O M
 ---------------------------------------------------
 ApNos-349ddbc4-devel
 OpenWrt 21.02-SNAPSHOT, r16273+114-378769b555
 ---------------------------------------------------
root@GL-AXT1800:~# uptime
 21:42:50 up 5 days, 11:24,  load average: 0.09, 0.04, 0.00

Connectivity is present as expected

root@GL-AXT1800:~# ping www.google.com
PING www.google.com (216.58.214.4): 56 data bytes
64 bytes from 216.58.214.4: seq=0 ttl=107 time=146.626 ms
64 bytes from 216.58.214.4: seq=1 ttl=107 time=147.743 ms
64 bytes from 216.58.214.4: seq=2 ttl=107 time=146.702 ms
^C
--- www.google.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 146.626/147.023/147.743 ms

2022-06-06 4.0.0 build
BUILD_ID="r16273+114-378769b555"

The Chrome developer console shows the following repeatedly, roughly every 10 seconds:

app.75d8ac5e.js:25          POST https://192.168.8.1/rpc 500 (Internal Server Error)
(anonymous) @ app.75d8ac5e.js:25
t.exports @ app.75d8ac5e.js:25
t.exports @ app.75d8ac5e.js:23
Promise.then (async)
l.request @ app.75d8ac5e.js:9
l.<computed> @ app.75d8ac5e.js:9
(anonymous) @ app.75d8ac5e.js:9
(anonymous) @ app.75d8ac5e.js:32
n @ app.75d8ac5e.js:32
getSystemStatus @ VM6195:7
getSystemStatus @ VM6195:7
eval @ VM6195:7
setTimeout (async)
getSystemStatusTimeout @ VM6195:7
getSystemStatus @ VM6195:7
eval @ VM6195:7
setTimeout (async)
getSystemStatusTimeout @ VM6195:7
getSystemStatus @ VM6195:7
eval @ VM6195:7
setTimeout (async)
getSystemStatusTimeout @ VM6195:7
getSystemStatus @ VM6195:7
eval @ VM6195:7
setTimeout (async)
getSystemStatusTimeout @ VM6195:7
getSystemStatus @ VM6195:7
eval @ VM6195:7
21:08:58.751 app.75d8ac5e.js:15 Uncaught (in promise) Error: Request failed with status code 500
    at t.exports (app.75d8ac5e.js:15:64567)
    at t.exports (app.75d8ac5e.js:23:93725)
    at o.onreadystatechange (app.75d8ac5e.js:25:102839)

Typical RPC payload when failing

{jsonrpc: "2.0", id: 141, method: "call",…}
id: 141
jsonrpc: "2.0"
method: "call"
params: ["JL7CRbzeZJdUsdJrlkQ9RPZo4rlhUxpC", "system", "get_status", {}]

The ["JL7CRbzeZJdUsdJrlkQ9RPZo4rlhUxpC", "wifi", "get_status", {}] call seems to succeed

Exporting the log fails, so I copied it manually from the window:
2022-07-04-axt1800.log.zip (3.5 KB)

There was an ISP outage on the 3rd as I recall. The AXT1800 would have been getting a valid DHCP assignment with connectivity to the gateway, but there would have been no upstream connectivity past that.

The machine has not been rebooted, and I can explore further if needed.

Unfortunately the log does not contain useful info.

The UI died in your case.

Rebooting now, then, and I’ll continue to watch.

I’ll try to “simulate” an upstream outage later on to see if that is somehow related.

The problem has returned. It may be related to loss of upstream connectivity, as there was a Comcast connectivity loss last night. When I explicitly blocked the connection locally earlier, it did not trigger the problem. This may be because the explicit block was a firewall “drop” without any ICMP being sent. I’ll continue to watch.

Nothing obvious in the logs: periodic DHCP renewal from upstream; mwan3track detects the ping loss, executes ifdown on wan, then brings it back online when the upstream outage clears.
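
Given the drop-versus-ICMP theory above, one way to make a simulated outage closer to the real thing is to reject the traffic instead of silently dropping it, so ICMP errors come back the way a broken upstream would send them. A rough sketch only, assuming the WAN interface is eth0 (check with ip route or the mwan3/firewall config) and the stock iptables firewall of OpenWrt 21.02:

# Reject traffic leaving the WAN with ICMP errors (unlike a silent DROP),
# so mwan3track's pings fail the way they would during a real outage.
iptables -I OUTPUT -o eth0 -p icmp -j REJECT --reject-with icmp-host-unreachable
iptables -I FORWARD -o eth0 -j REJECT --reject-with icmp-host-unreachable

# Remove the rules again after watching whether the web UI error appears.
iptables -D OUTPUT -o eth0 -p icmp -j REJECT --reject-with icmp-host-unreachable
iptables -D FORWARD -o eth0 -j REJECT --reject-with icmp-host-unreachable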

root@GL-AXT1800:~# uptime
 09:08:57 up 5 days, 23:21,  load average: 0.00, 0.01, 0.00
root@GL-AXT1800:~# cat /etc/os-release 
NAME="OpenWrt"
VERSION="21.02-SNAPSHOT"
ID="openwrt"
ID_LIKE="lede openwrt"
PRETTY_NAME="OpenWrt 21.02-SNAPSHOT"
VERSION_ID="21.02-snapshot"
HOME_URL="https://openwrt.org/"
BUG_URL="https://bugs.openwrt.org/"
SUPPORT_URL="https://forum.openwrt.org/"
BUILD_ID="r16273+114-378769b555"
OPENWRT_BOARD="ipq807x/ipq60xx"
OPENWRT_ARCH="arm_cortex-a7"
OPENWRT_TAINTS="no-all busybox override"
OPENWRT_DEVICE_MANUFACTURER="OpenWrt"
OPENWRT_DEVICE_MANUFACTURER_URL="https://openwrt.org/"
OPENWRT_DEVICE_PRODUCT="Generic"
OPENWRT_DEVICE_REVISION="v0"
OPENWRT_RELEASE="OpenWrt 21.02-SNAPSHOT r16273+114-378769b555"

Please show the nginx log:

cat /var/log/nginx/error.log 
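
A small sketch of what could be captured in one go next time the error is on screen, so the state is recorded before any reboot (the error.log path is the one above; the grep filter is only a guess at relevant process names):

# Run on the router while the web UI is showing the RPC 500s.
{
  echo '=== nginx error.log ==='; cat /var/log/nginx/error.log
  echo '=== processes ===';       ps | grep -i -E 'nginx|rpc'
  echo '=== system log (tail) ==='; logread | tail -n 200
} > /tmp/webui-debug.txt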

I’ve since rebooted, but will capture the nginx logs next time.