GL-ARP-SCAN flint2 4.70

dragon2611 · September 25, 2024, 12:30pm

On the 4.70 beta firmware the process gl-arp-scan -i eth1 is using 100-150% CPU and causing the router to be unresponsive when eth1 is connected to a Starlink gen1 dish.

By unresponsive I mean pings to the router were above 800ms with frequent timeouts. The same was seen for forwarded traffic.

I put the Starlink router back in and that seemed to resolve it (although I now have an extra layer of NAT).

I'm not even sure why the Flint wants to run an ARP scan, nor can I find a setting to tell it not to.

LupusE · September 25, 2024, 4:03pm

Since it is Beta and could be a new/changed feature, the GL.iNet staff could be more helpful.

But arp-scan by default is used to discover devices in the LAN. Not really a bad idea to detect if there is something I need to route.

My question would be: why do you connect a Starlink Gen1 to eth1? eth[n] is in general a LAN device. If you connect another network on this port, and arp-scan scans this 'router' with all the networks, it would be the best choice. If you create a loop, it could go berserk.
Please make sure you switched the LAN port to a WAN port, so the local network tools are not active. See the first screenshot, the link on the bottom: Ethernet - GL.iNet Router Docs

dragon2611 · September 25, 2024, 4:20pm

eth1 is the physical interface name. It is set as a WAN in the GL-UI and the underlying OpenWrt, and it's also not on the lan-bridge.

If you look at the interfaces with "ip a" you will see eth0 for the LAN 2.5G port, eth1 for the WAN 2.5G port, and then lan1@eth0 etc. for the ports on the internal 1G switch.

robotluo · September 26, 2024, 6:14am

arp-scan serves AstroWarp for automatic discovery of devices on a LAN. However, it should not run automatically, and its CPU load should not be this high.
This will be fixed in a firmware update.

robotluo · September 26, 2024, 6:45am

My guess is that the Starlink network uses a very large subnet, so the scan takes too long and consumes a lot of CPU resources.

dragon2611 · September 26, 2024, 7:38am

It was allocated an IP from a /22 in the CGN space, so ~1k possible hosts in the subnet. Tailscale is not running on the router (I mention this since Tailscale also uses the CGN space and could potentially cause an IP overlap).
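
For reference, the subnet arithmetic can be checked with Python's standard `ipaddress` module. This is only a sketch: 100.64.0.0/22 is an illustrative subnet picked from the CGN range (100.64.0.0/10, RFC 6598), not the actual allocation, and `usable_hosts` is a made-up helper name.

```python
import ipaddress

# Shared address space used for carrier-grade NAT (RFC 6598).
CGN_SPACE = ipaddress.ip_network("100.64.0.0/10")


def usable_hosts(cidr: str) -> int:
    """Usable host addresses in an IPv4 subnet (network and broadcast excluded)."""
    net = ipaddress.ip_network(cidr, strict=False)
    return max(net.num_addresses - 2, 0)


example = ipaddress.ip_network("100.64.0.0/22")  # hypothetical example subnet
print(example.subnet_of(CGN_SPACE))   # True
print(usable_hosts("100.64.0.0/22"))  # 1022
print(usable_hosts("10.0.0.0/16"))    # 65534
```

So a /22 is roughly 1k addresses to probe, while a /16 is about 64 times that.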

robotluo · September 26, 2024, 7:49am

Strangely, the 16-bit subnets we tested didn't have such high CPU usage either.

admon · September 26, 2024, 7:50am

Could it be that the Starlink network redirects all ARP and answers on every single host, even if there is no host for this IP? Saw something like this on some firewalls before.

dragon2611 · September 26, 2024, 7:53am

It's possible they are doing something weird with it, but that said the easy solution is probably just to avoid doing the ARP scan on the WAN interface.

I can't really think of a good reason for doing it. We'd have already ARP'd for the gateway, and that's probably the only host in the subnet we care about beyond ourselves, especially in a CGNAT subnet.

dragon2611 · October 20, 2024, 11:11am

This is still a problem on Beta4. If the Flint is directly connected to Starlink and it's the primary WAN, gl-arp-scan will cause 100% load on a CPU core, and it also makes the router unable to route properly.

I've temporarily disabled gl-arp-scan by removing the execute bit on it.

clannad · October 21, 2024, 1:36am

Can you give us a rough estimate of how long the 100% load lasts in each minute on your device?
Also, please post the output of the shell command below:
ubus call network.interface.wan status

dragon2611 · October 21, 2024, 6:24am

It seemingly gets stuck like that until I kill the process.

dragon2611 · October 21, 2024, 7:39am

I'm discussing this with @robotluo on Discord, but here are my thoughts on this.

If the GL.iNet device is the primary router it probably doesn't need to proactively ARP scan, but given these devices are also used as travel routers there are some scenarios where it may need to.

However, it can probably be done a bit more intelligently. I'd suggest taking the following into account:

Is the network range RFC1918? If not, it's potentially a WAN and may either have a large allocation or do weird things with ARP requests. Here be dragons; proceed with caution.
Is the subnet allocated to the interface rather large? Scanning a /24 is probably no major load, but what about when the interface gets a /16 or, worse, a /8? (I've seen the odd badly configured network that hands out an address in 10/8 and actually sets the netmask to /8!)
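
As a rough sketch of those two checks (in Python for readability, not how gl-arp-scan actually works): the function name `should_arp_scan` and the /24 cut-off in `MIN_PREFIX_LEN` are my own assumptions, not anything taken from the firmware.

```python
import ipaddress

# RFC 1918 private ranges.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# Assumed cut-off: subnets larger than a /24 are not scanned proactively.
MIN_PREFIX_LEN = 24


def should_arp_scan(interface_cidr: str) -> bool:
    """Return True only for small IPv4 subnets inside RFC 1918 space."""
    net = ipaddress.ip_interface(interface_cidr).network
    if net.version != 4:
        return False  # ARP is IPv4-only
    is_rfc1918 = any(net.subnet_of(private) for private in RFC1918)
    return is_rfc1918 and net.prefixlen >= MIN_PREFIX_LEN


print(should_arp_scan("192.168.8.1/24"))  # True: small private LAN
print(should_arp_scan("100.64.1.10/22"))  # False: CGN space, not RFC 1918
print(should_arp_scan("10.1.2.3/8"))      # False: private but far too large
```

Both conditions have to hold, so a CGN /22 like mine is skipped on the first check and a misconfigured 10/8 on the second.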

For a larger network it's probably not a good idea to proactively ARP scan it, as even if it works you'll fill up the ARP table with hosts that the user probably doesn't even care about. But if you must, then please throttle the scanning to a sensible level so as not to flood the network or overload the router doing the scanning.
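
A minimal sketch of what throttling could look like, again in Python and only as an illustration. `paced_targets` is a hypothetical helper, the default of 50 probes per second is an arbitrary number I picked, and sending the actual ARP request is left to the caller.

```python
import ipaddress
import time
from typing import Callable, Iterator


def paced_targets(
    cidr: str,
    per_second: float = 50.0,  # arbitrary example rate
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield the host addresses of `cidr`, pausing after each one so that
    no more than `per_second` targets are handed out per second."""
    interval = 1.0 / per_second
    for host in ipaddress.ip_network(cidr, strict=False).hosts():
        yield str(host)
        sleep(interval)


# Example: walk a small subnet at 10 targets per second.
for target in paced_targets("192.168.8.0/29", per_second=10):
    print(target)  # a real scanner would send one ARP request here
```

At 50 probes per second a /22 (1022 hosts) takes about 20 seconds to walk and a /16 about 22 minutes, which is one more reason to skip very large subnets entirely.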

If the network's not doing some kind of ARP filtering already, a large subnet is probably noisy enough with naturally occurring ARP requests.

Also, at the moment my Starlink interface is br-lan.503, but when I originally opened this it was on eth1 using the inbuilt WAN/LAN switch option.

clannad · October 21, 2024, 7:52am

This is indeed a very useful suggestion. We will evaluate it before deciding whether to adopt it. Thank you so much.

system · April 19, 2025, 7:52am

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.