GL-MT5000 (Brume 3) upstream OpenWrt support

@bruce Thank you for publishing rtl8366ub_mdio.c/.h — this is genuinely appreciated, and it turns out to be more useful than you may have expected: the wrapper you just published contains the root cause of the VLAN PVID bug I reported. The bug is not in Realtek's NDA'd SDK — it's in this GPL glue layer, so you can fix it today without waiting on Realtek.

The bug — rtl8366ub_sw_set_vlan_ports():

port = &val->value.ports[0];
for (i = 0; i < val->len; i++, port++) {
    RTK_PORTMASK_PORT_SET(vlan.mbr, port->id);
    if (!(port->flags & BIT(SWITCH_PORT_FLAG_TAGGED)))
        RTK_PORTMASK_PORT_SET(vlan.untag, port->id);
    if (EXT_PORT0 == port->id || EXT_PORT1 == port->id) {
        continue;
    }
    /*
    * To ensure that we have a valid MC entry for this VLAN,
    * initialize the port VLAN ID here.
    */
    err = rtk_vlan_portPvid_get(port->id, &pvid, &port_Pri);
    if (err < 0)
        return err;
    err = rtk_vlan_portPvid_set(port->id, val->port_vlan, port_Pri); // <-- BUG
    if (err < 0)
        return err;
}

This sets the PVID of every non-EXT member port to the VLAN currently being applied — including tagged members. Since netifd applies switch_vlan sections in ascending VID order at boot and on every switch reapply, a trunk port that is a tagged member of VLANs 1/20/30/40 ends up with PVID=40 instead of its configured PVID. That's exactly what I captured in my report: dmesg shows the [rtl8366ub_sw_set_vlan_ports] vid=1 … vid=20 … vid=30 … vid=40 cascade (the pr_info a few lines below this loop), after which swconfig dev switch0 port 1 get pvid returns 40. Untagged clients then land on the wrong VLAN and get DHCPNAK'd.
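To make the cascade concrete, here is a small stand-alone model of it. This is not driver code: `model_pvid`, `model_apply_vlan_buggy`, `model_replay_cascade` and the port count are names I made up for illustration, and the plain array only stands in for the switch's PVID registers.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NUM_PORTS 8   /* hypothetical port count, not taken from the driver */

/* Stand-in for the switch's per-port PVID registers. */
static int model_pvid[MODEL_NUM_PORTS];

/* Mirrors the unconditional write in the loop above: every member port,
 * tagged or not, gets its PVID overwritten with the VLAN being applied. */
static void model_apply_vlan_buggy(int vid, const int *member_ports, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(member_ports[i] >= 0 && member_ports[i] < MODEL_NUM_PORTS);
        model_pvid[member_ports[i]] = vid;
    }
}

/* Replays the vid=1, 20, 30, 40 cascade for a trunk on port 1 and
 * returns the PVID that port ends up with. */
static int model_replay_cascade(int configured_pvid)
{
    static const int vids[] = { 1, 20, 30, 40 };
    static const int trunk[] = { 1 };

    model_pvid[1] = configured_pvid;
    for (size_t i = 0; i < sizeof(vids) / sizeof(vids[0]); i++)
        model_apply_vlan_buggy(vids[i], trunk, 1);
    return model_pvid[1];
}
```

Whatever value is passed as `configured_pvid`, the model ends with the highest VID applied, which matches the pvid 40 I read back from swconfig.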

The fix is one line, and upstream already has it. This loop, including the "To ensure that we have a valid MC entry" comment verbatim, comes from OpenWrt's GPL rtl8366_smi.c. But upstream guards the write so it only initializes a port that doesn't have a PVID yet (openwrt-21.02, rtl8366_sw_set_vlan_ports in target/linux/generic/files/drivers/net/phy/rtl8366_smi.c, openwrt/openwrt on GitHub):

err = rtl8366_get_pvid(smi, port->id, &pvid);
if (err < 0)
    return err;
if (pvid == 0) { // <-- the guard your fork dropped
    err = rtl8366_set_pvid(smi, port->id, val->port_vlan);
    if (err < 0)
        return err;
}

Your version calls rtk_vlan_portPvid_get() but then only uses the priority and overwrites the PVID unconditionally. Restoring the equivalent guard (if (pvid == 0) — or simply not touching PVID for tagged members) fixes the trunk-port behavior; set_port_pvid already exists for explicit PVID configuration.
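As a rough sketch of the two options, again as a stand-alone model rather than a patch against your file: `sketch_port`, `sketch_pvid` and both functions are hypothetical names, the array stands in for the PVID registers, and I am assuming 0 means "no PVID set yet", as it does in the upstream guard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NUM_PORTS 8   /* hypothetical port count */

/* Simplified stand-in for one entry of val->value.ports. */
struct sketch_port {
    int id;
    bool tagged;
};

/* Stand-in for the per-port PVID registers; 0 is assumed to mean "unset". */
static int sketch_pvid[SKETCH_NUM_PORTS];

/* Option A: upstream-style guard. Only initialize a PVID that is still 0. */
static void sketch_apply_guarded(int vid, const struct sketch_port *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(p[i].id >= 0 && p[i].id < SKETCH_NUM_PORTS);
        if (sketch_pvid[p[i].id] == 0)
            sketch_pvid[p[i].id] = vid;
    }
}

/* Option B: leave the PVID of tagged members alone entirely. */
static void sketch_apply_skip_tagged(int vid, const struct sketch_port *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(p[i].id >= 0 && p[i].id < SKETCH_NUM_PORTS);
        if (p[i].tagged)
            continue;
        sketch_pvid[p[i].id] = vid;
    }
}
```

In this model both options leave a trunk port's existing PVID untouched through the 1/20/30/40 cascade. They differ for untagged members: option A never changes a PVID that is already nonzero, while option B rewrites it on every apply, so which one fits depends on how you want set_port_pvid to interact with it.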

A second, smaller bug while you're in this file — the enable_vlan get/set handler bodies are swapped:

static int rtl8366ub_sw_set_vlan_enable(...) {
    ...
    val->value.i = gsw->global_vlan_enable; // "set" handler only READS
    return 0;
}

static int rtl8366ub_sw_get_vlan_enable(...) {
    ...
    gsw->global_vlan_enable = val->value.i != 0; // "get" handler only WRITES
    return 0;
}

This is why swconfig dev switch0 set enable_vlan 1 is a no-op and get enable_vlan always reads 0 on fw 4.8.6, despite uci network.@switch[0].enable_vlan='1' — something several of us independently hit while debugging this.
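For clarity, this is roughly what I would expect the two bodies to look like once swapped back. It is a simplified sketch: `sketch_gsw` and `sketch_val` are hypothetical stand-ins for your driver state and swconfig's value struct, and the real handlers take different arguments (elided as `...` above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's private state. */
struct sketch_gsw {
    bool global_vlan_enable;
};

/* Hypothetical stand-in for the integer member of swconfig's value struct. */
struct sketch_val {
    int i;
};

/* "set" stores the value requested by the caller. */
static int sketch_set_vlan_enable(struct sketch_gsw *gsw, const struct sketch_val *val)
{
    assert(gsw != NULL && val != NULL);
    gsw->global_vlan_enable = val->i != 0;
    return 0;
}

/* "get" reports the stored value back to the caller. */
static int sketch_get_vlan_enable(const struct sketch_gsw *gsw, struct sketch_val *val)
{
    assert(gsw != NULL && val != NULL);
    val->i = gsw->global_vlan_enable;
    return 0;
}
```

With the bodies this way round, a set of 1 followed by a get reads back 1 in the model. Whether the flag also needs to be pushed to the hardware at that point is something I can't tell from the published wrapper alone.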
