Posted: Sat Jan 07, 2023 23:40 Post subject: Guest network fails every few months - needs factory reset
On one of my Netgear R6700 v3 routers, every few months, the devices connected to the Guest network fail to connect. I can see them appearing temporarily for a few seconds as Wireless Clients (either wl0.1 or wl1.1) on the Status/Wireless page but they disappear before trying again after a couple of seconds. There is no entry on the log page mentioning these aborted connect attempts.
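Since nothing shows up on the log page, one way to record the aborted attempts is to poll the association list of the guest interfaces directly. This is a minimal sketch, assuming the Broadcom `wl` utility is present in the build and that `wl -i IFACE assoclist` prints one `assoclist <MAC>` line per client. The function name and the log path are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print one timestamped line per client currently
# associated to the given interface.
# Assumption: `wl -i IFACE assoclist` prints lines like
#   assoclist AA:BB:CC:DD:EE:FF
log_assoc_once() {
    iface="$1"
    now="$(date '+%Y-%m-%d %H:%M:%S')"
    wl -i "$iface" assoclist 2>/dev/null | while read -r tag mac; do
        if [ -n "$mac" ]; then
            echo "$now $iface $mac"
        fi
    done
    return 0
}

# Example loop, run by hand on the router while a guest device retries:
# while true; do
#     for i in wl0.1 wl1.1; do log_assoc_once "$i"; done >> /tmp/guest_assoc.log
#     sleep 1
# done
```

The resulting log at least shows which MAC addresses get as far as associating and how long they stay, which helps separate an association problem from a DHCP or bridge problem.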
No changes have occurred to the nvram (other than trivial things like timestamps), as I can verify by comparing the current nvram settings with prior ones (or even by comparing the current and previous bin backups).
The only way I can get Guest networking to work again is to do a FIRMWARE UPGRADE (with the same firmware version) with a RESET TO FACTORY DEFAULTS, followed by reloading the CURRENT BACKUP (i.e. a backup of the same settings that worked previously but are now not working).
Note that:
- Rebooting alone doesn't help
- Restoring nvram settings alone with previous good copy doesn't help
- Factory reset of nvram followed by restoring settings with previous good copy doesn't help
- Upgrading firmware (with same firmware version) but without resetting to factory defaults doesn't work.
- Sequentially upgrading firmware (without nvram reset) and then subsequently restoring nvram (or vice-versa) doesn't work
So it seems there is some "flakiness" that creeps in only every few months that is only reset by simultaneously doing: firmware upgrade + reset to factory defaults followed by a restore of current settings
- If it's an nvram corruption, not sure why resetting the nvram alone won't work
- If it's a flash corruption, not sure why a firmware upgrade alone (without resetting nvram) won't work.
So what is getting corrupted that only gets reset when new firmware and nvram are flashed simultaneously?
Note that I have 4 Netgear R6700 v3's on my home network: one is a router and the other 3 are gateways, but the problem only occurs on one of them (the one configured as router).
Guest networking is installed and working on all routers (each with a different 2.4GHz and 5GHz SSID and a different, isolated 192.168.x subnetwork). All have the same firmware settings except that one is a router and the others are in gateway mode.
Note all routers are running dd-wrt r49532 (but I have seen the problem with older firmwares too)
I’d capture the output of “nvram show | sort” when you reset the device, and again when it has a problem. Using diff to compare the two files will tell you exactly which nvram variables have changed.
Do you use the captive portal (nocatsplash), or just WPA-protected guest VAP? I’ve noticed that DD’s implementation of nocatsplash has some problems dealing with other changes to the firewall settings, and potentially can be far enough down the list that its tables become blocked by other entries.
I use a script run from cron to automatically put the guest access tables at the top of the firewall rules if it becomes blocked by other rules, added later.
Maybe capture “iptables -L -nv” before and after, also.
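The capture-and-compare suggestion above can be sketched as two small shell functions. This is only an illustration: the function names and file layout are hypothetical, and the comparison uses `sort` and `uniq -u` instead of `diff`, in case the router's `diff` is missing or formats its output differently. The packet counters in the `iptables -L -nv` output will always differ between captures, so only the rule lines are worth comparing there.

```shell
#!/bin/sh
# Hypothetical helpers; names and file layout are illustrative only.

# Save sorted nvram and the current firewall tables as DIR/LABEL.*
# DIR should survive a reboot (/tmp is RAM on the router).
snapshot_state() {
    dir="$1"; label="$2"
    mkdir -p "$dir"
    nvram show 2>/dev/null | sort > "$dir/$label.nvram"
    iptables -L -nv > "$dir/$label.filter"
    iptables -t nat -L -nv > "$dir/$label.nat"
}

# Print the nvram keys whose key=value line appears in only one snapshot.
# Multi-line values (certificates, scripts) may show up as odd keys.
changed_keys() {
    sort "$1" "$2" | uniq -u | sed -n 's/^\([^=]*\)=.*/\1/p' | sort -u
}

# Example on the router:
# snapshot_state /jffs/snap good     # right after the reset, while guests work
# snapshot_state /jffs/snap broken   # when the problem is back
# changed_keys /jffs/snap/good.nvram /jffs/snap/broken.nvram
```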
Joined: 18 Mar 2014 Posts: 12913 Location: Netherlands
Posted: Sun Jan 08, 2023 19:20 Post subject:
When you upgrade do NOT do a firmware upgrade with a RESET TO FACTORY DEFAULTS.
It can lead to bricking your router.
Upgrade first, and reset to defaults *after* the upgrade.
Do not restore from a backup made on a different build, and do not restore one if you are experiencing problems (garbage out, garbage in); put the settings in manually.
You mention Gateway setup, Router setup, it is unclear to me what you mean.
Do you have one Main router connected to the internet and 3 Wireless Access Points (WAP): A secondary router connected wired LAN<>LAN on the same subnet as the primary router?
If you have then see my personal notes how I setup these things maybe they are helpful
On occasion I have seen strange things with NVRAM corruption if the NVRAM is almost all used. That can hardly be the case here, as you should have 128K, but it is worth a check.
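A quick way to do that check from the command line, as a sketch: it assumes `nvram show` ends with a summary line of the form `size: 41234 bytes (24302 left)` (the numbers here are made up), and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `nvram show` output on stdin and print the
# number of free bytes from the summary line, assumed to look like
#   size: 41234 bytes (24302 left)
nvram_bytes_left() {
    sed -n 's/^size: [0-9]* bytes (\([0-9]*\) left).*/\1/p'
}

# On the router (the summary line may be written to stderr, hence 2>&1):
# nvram show 2>&1 | nvram_bytes_left
```

If the function prints nothing, the summary line has a different format on that build and the raw `nvram show` output should be checked by eye.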
Quote:
I’d capture the output of “nvram show | sort” when you reset the device, and again when it has a problem. Using diff to compare the two files will tell you exactly which nvram variables have changed.
As I mentioned in the original post, the diff of "nvram show" (as well as a decoded binary diff of the configuration backup) shows *no* changes in the nvram variables
Quote:
Do you use the captive portal (nocatsplash), or just WPA-protected guest VAP? I’ve noticed that DD’s implementation of nocatsplash has some problems dealing with other changes to the firewall settings, and potentially can be far enough down the list that its tables become blocked by other entries.
Just WPA
Quote:
I use a script run from cron to automatically put the guest access tables at the top of the firewall rules if it becomes blocked by other rules, added later.
Maybe capture “iptables -L -nv” before and after, also.
That shouldn't make a difference after reboot since iptables should start with the same initial state given that my nvram (and firmware) have not changed. Right?
Quote:
When you upgrade do NOT do a firmware upgrade with a RESET TO FACTORY DEFAULTS.
It can lead to bricking your router.
Upgrade first, and reset to defaults *after* the upgrade.
Thanks for the warning.
HOWEVER, if I do it sequentially, then the Guest networking stays broken.
It only gets fixed if I do the firmware upgrade WITH the factory reset (though technically it's not a firmware upgrade so much as a rewriting of my existing firmware).
This is really perplexing me.
If it's just a non-volatile memory corruption issue then separately rewriting the firmware and/or the NVRAM should solve the issue.
Is it possible that doing a firmware upgrade WITH a factory reset clears/resets some bits that are not cleared/reset when done separately?
The only alternative I can think of is that the flash is flaky causing some bit rot over time and that the rotten bits are only restored when the flash is reset all at once.
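One way to probe the bit-rot idea is to checksum the flash partitions while everything works and again when Guest networking breaks. This is a rough sketch: the function name is hypothetical, `md5sum` is assumed to be present in the build, and the device paths in the example are assumptions that should be checked against `cat /proc/mtd` on the actual router.

```shell
#!/bin/sh
# Hypothetical helper: print an md5 line for every path given.
checksum_parts() {
    for p in "$@"; do
        md5sum "$p"
    done
}

# Example on the router (partition numbers are assumptions):
# cat /proc/mtd                                  # list partitions and names
# checksum_parts /dev/mtd0 /dev/mtd1 > /jffs/flash.good
# ...months later, when guests can no longer connect...
# checksum_parts /dev/mtd0 /dev/mtd1 > /jffs/flash.broken
# Then compare the two listings line by line.
```

The nvram partition is expected to change between captures. A changed checksum on a partition that should be static (for example the one holding the firmware image) would point toward flash problems, while identical checksums would suggest looking elsewhere.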
But really I am just perplexed...
Quote:
Do not restore from a backup made on a different build, and do not restore one if you are experiencing problems (garbage out, garbage in); put the settings in manually.
Don't worry. Not doing that.
Quote:
You mention Gateway setup, Router setup, it is unclear to me what you mean.
Ignore what I wrote there
Quote:
Do you have one Main router connected to the internet and 3 Wireless Access Points (WAP): A secondary router connected wired LAN<>LAN on the same subnet as the primary router?
Yes
Quote:
If you have then see my personal notes how I setup these things maybe they are helpful
Do you have a link?
Quote:
On occasion I have seen strange things with NVRAM corruption if the NVRAM is almost all used. That can hardly be the case here, as you should have 128K, but it is worth a check.
An attachment is only visible when you are logged in.
That was the problem. Thanks.
BTW, my VAP setup is pretty much the same as yours, except:
I add a bridge network 'br1' to bridge wl0.1 and wl1.1.
Specifically,
- Under Wireless/Basic, I keep the wl0.1 and wl1.1 Network Configuration = Bridged (i.e., no dhcp here)
- Under Setup/Networking, I add br1 to bridge wl0.1 and wl1.1 and then configure br1 and dhcpd the way you do for wl0.1
I also add the following to the firewall:
[Case 1: VAP on Router]
Code:
iptables -t nat -A POSTROUTING -s $(nvram get openvpn_net)/24 -o $(nvram get wan_ifname) -j MASQUERADE
ebtables -I FORWARD --logical-in br1 --logical-out br1 -j DROP
where the first line is needed to give guest access to the WAN
where the 2nd line prevents guests from communicating with each other (e.g., from a device on wl0.1 to one on wl1.1)
[Not sure if this is still necessary but in the past guests on wl0.1 were blocked from each other and guests on wl1.1 were blocked from each other, so this second line was required to block users on wl0.1 from wl1.1 and vice-versa]
where the first line NAT's the guest network traffic over the primary router network (e.g., 192.168.1.1) [this corresponds to the first line in Case 1 giving guests access to the WAN]
where the 2nd line then prevents the guest from accessing resources on the primary network that is NAT'ed to above (i.e., traffic traverses the primary LAN but can't access nodes on it)
where the final line prevents guests from communicating with each other on br1 (as in Case 1)
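For reference, three rules matching the descriptions of this second case (NAT over the primary LAN, block access to the primary subnet, isolate guests on br1) could look like the sketch below. These are not necessarily the exact rules used in this setup. The sketch assumes br0 is the primary LAN bridge, br1 is the guest bridge, and the `lan_ipaddr` and `lan_netmask` nvram variables hold this device's address and mask on the primary LAN.

```shell
# Sketch only; br0/br1 and the nvram variable names are assumptions.

# 1. NAT guest traffic onto the primary LAN using this device's LAN address
iptables -t nat -I POSTROUTING -o br0 -j SNAT --to-source $(nvram get lan_ipaddr)

# 2. Reject new connections from the guest bridge to the primary subnet
iptables -I FORWARD -i br1 -d $(nvram get lan_ipaddr)/$(nvram get lan_netmask) -m state --state NEW -j REJECT

# 3. Drop guest-to-guest frames bridged across br1 (same rule as in Case 1)
ebtables -I FORWARD --logical-in br1 --logical-out br1 -j DROP
```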
This is what has always worked for me but if you have any suggestions or if anything has changed to do the above automatically, please let me know.
Quote:
The only suggestion I have is reset to default and rebuild manually (restoring a backup is garbage out, garbage in)
Check with my notes for errors in your setup.
I generally try to avoid garbage-out/garbage-in by comparing the nvram settings before and after a restore to make sure nothing has been corrupted, but yes, redoing from scratch is always cleaner/safer.