For as-yet-unknown reasons, the rekey cycle will sometimes generate thousands of user.info 'timer' log entries, along with high system load and the occasional wifi and/or system hang.
When a 'timer' entry is caught, the script increments the rekey values by 1 (so each event is easy to track) and restarts nas. Temporarily setting the rekey values to '360' (seconds) makes for a quick sanity check, since rekeys then happen every few minutes.
Tested with Asus RT-N66U / v3.0-r36808 mega.
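For anyone curious about the mechanics, here is a minimal sketch of that kind of watchdog. It is not the script from this thread. The log path (/var/log/messages), the matched text ('user.info' plus 'timer'), the *_gtk_rekey key names and the file name rekey_watch.sh are all assumptions; stopservice/startservice are the DD-WRT service commands.

```sh
#!/bin/sh
# Hypothetical 'timer' watchdog sketch, NOT the script from this thread.
# Assumed: syslog goes to /var/log/messages, runaway lines contain
# "user.info" and "timer", rekey intervals live in *_gtk_rekey nvram keys.
LOG=${LOG:-/var/log/messages}

# Print how many 'timer' lines the log holds (0 if it is unreadable).
count_timer_entries() {
  [ -r "$1" ] || { echo 0; return 0; }
  grep -c 'user\.info.*timer' "$1" || true
}

# Print the value plus one (3600 -> 3601) so each event leaves a trace.
bump_value() {
  echo $(( $1 + 1 ))
}

# DD-WRT only: bump every positive *_gtk_rekey value, then restart nas.
handle_timer_event() {
  for key in $(nvram show 2>/dev/null | sed -n 's/^\([^=]*_gtk_rekey\)=.*/\1/p'); do
    val=$(nvram get "$key")
    # Skip 0 (rekeying off), empty or non-numeric values; bumping 0 to 1
    # would request a rekey every second.
    [ "$val" -gt 0 ] 2>/dev/null || continue
    nvram set "$key=$(bump_value "$val")"
  done
  stopservice nas
  startservice nas
}

# Poll once a minute and react only when new 'timer' lines appear.
watch_loop() {
  last=$(count_timer_entries "$LOG")
  while sleep 60; do
    now=$(count_timer_entries "$LOG")
    if [ "$now" -gt "$last" ]; then
      handle_timer_event
    fi
    last=$now
  done
}

# Only loop when started as: sh /tmp/rekey_watch.sh run &
if [ "${1:-}" = run ]; then
  watch_loop
fi
```

Counting lines is a rough trigger: if the log rotates, the count drops and a burst right after rotation can be missed. The values are set without 'nvram commit', so a reboot restores the saved ones.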
Cheers,
Update: still present in 39xxx builds; timer events seem to degrade system usability if they are not caught and reset.
Last edited by lazardo on Mon Jul 29, 2019 20:23; edited 11 times in total
Joined: 17 Apr 2014 Posts: 135 Location: SF Bay Area
Posted: Tue Sep 11, 2018 19:53
jpp wrote:
Do I have to enter this script (the first part under Code; I imagine the second part is a log) in Administration -> Commands? Then Save Startup?
JP
Yes, BUT check the code before running it: the GUI 'Commands' section has escaping rules that can be tricky, so I don't use it for anything but one-liners.
I would suggest copy/pasting it on a different computer, then using ssh/scp/PuTTY to move it to /tmp on the router.
Remember that /tmp contents do not survive a reboot.
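If you go the ssh/scp route, a small helper along these lines keeps the copy-and-start steps in one place. It is hypothetical: the router address root@192.168.1.1 and the script name are placeholders, and by default it only prints the commands (set DRY_RUN=0 to actually run them).

```sh
#!/bin/sh
# Hypothetical helper: copy a script to the router's /tmp and start it.
# ROUTER is a placeholder; DRY_RUN=1 (default) only prints the commands.
ROUTER=${ROUTER:-root@192.168.1.1}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

deploy_to_router() {
  name=$(basename "$1")
  # /tmp is RAM-backed on the router, so the copy is gone after a reboot.
  run scp "$1" "$ROUTER:/tmp/$name"
  # Start it in the background with output discarded so ssh can return.
  run ssh "$ROUTER" "sh /tmp/$name >/dev/null 2>&1 &"
}
```

Source it, check what 'deploy_to_router ./myscript.sh' prints, then repeat with DRY_RUN=0 once the commands look right.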
Joined: 08 May 2018 Posts: 14125 Location: Texas, USA
Posted: Wed Sep 12, 2018 0:37
The thing is, to the best of my understanding, the nas/GTK/KRACK fixes mainly concern client modes. In router/gateway/AP modes this shouldn't be an issue IMHO, but it is. It seems the timer fixes haven't really fixed things (yet). Anyhow, can this also be saved as a custom script in the web UI and used, if one chooses? I'm not sure I want to add it to my startup script since I already have one, and since I'm not using any client modes, it's probably moot. Awesome workaround for this problem, though; thanks for the info!
Posted: Wed Oct 17, 2018 13:10
OK, the 'update' at the top of your last post is a little confusing when compared with the 'updated' script towards the bottom of the post. Could you please verify the changes and the updated script?
EDIT: 'nvram show | grep _gtk_rekey' shows values for VIFs/VAPs that are not configured or enabled. I am currently testing 'nvram set' via the CLI, setting those values to 0 to see whether that affects the Wi-Fi stability problems that seem to have arisen recently. Even though I have re-keying set to 0 on the main interfaces (wl0 and wl1), I think those other values were being read and affecting things.
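A sketch for zeroing all of those leftover values in one go, assuming 'nvram show' prints plain key=value lines. It only prints the 'nvram set' commands so they can be reviewed first, and it does not run 'nvram commit', so the change is lost on reboot unless you commit it yourself.

```sh
#!/bin/sh
# Sketch: read 'nvram show' output on stdin and PRINT an 'nvram set ...=0'
# line for every *_gtk_rekey key that is not already 0. Nothing changes
# until the output is piped to sh, and nothing is committed.
zero_gtk_rekey_cmds() {
  sed -n 's/^\([^=]*_gtk_rekey\)=\(.*\)$/\1 \2/p' |
  while read -r key val; do
    if [ "$val" != 0 ]; then
      echo "nvram set $key=0"
    fi
  done
}

# On the router: review first, then apply.
#   nvram show 2>/dev/null | zero_gtk_rekey_cmds
#   nvram show 2>/dev/null | zero_gtk_rekey_cmds | sh
```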
I don't have any NAS, and yet there are plenty of these messages, and LAN/WiFi sporadically get stuck for 2-20 seconds. Would it be enough for me to just stop the nas and wlconf services?
stopservice nas
stopservice wlconf
Router Model: Asus RT-AC66U
Firmware Version: DD-WRT v3.0-r37442 giga (10/19/1
Kernel Version: Linux 3.10.108-d6 #21410 Fri Oct 19 15:34:31 CEST 2018 mips
Posted: Tue Dec 25, 2018 9:57
danielwritesback wrote:
P.S.
The error was introduced in 35667, so earlier versions don't need the script.
Supposedly, that was a 'bugfix' to $linuxver/net/ipv4/etherip.c that was backported all the way to 3.2 (in DD-WRT)... but it introduced another bug. Now the question is: was it a fix from upstream vanilla kernels or from the Linux-MIPS community? Also, I am curious: how did you determine that it was that specific revision?
Version 35531 didn't have many errors, and it is well known as a recent stable 'go-to' version. After that point, the new build threads can be roughly summed up as something about hell and a handbasket. So I concluded that the error was introduced in the version(s) after 35531. It was a somewhat educated guess, also involving a bit of research (not very conclusive, given the small number of reports posted).
Speaking of educated guesses! Oh, I need your help with those top-secret dnsmasq commands, as in whatever makes it work well aboard the E4200.
I'd love to try it.
Posted: Thu Dec 27, 2018 15:28
danielwritesback wrote:
Speaking of educated guesses! Oh, I need your help with those top-secret dnsmasq commands, as in whatever makes it work well aboard the E4200.
I'd love to try it.
The only problems I am having right now are due to the APs always being 'isolated', or something. DNSMasq tends to be a little screwy (again), but ONLY on wi-fi, not wired. I think the last good running build I had was 37442 or 37582 K3X; it didn't seem to be too problematic. Below is how I have it set up. I don't use local DNS or the DNSSEC cache (maybe that is where my problems are?). Also, I am using OpenDNS servers via the additional options box with the no-resolv directive:
...DNSMasq tends to be a little screwy (again), but ONLY on wi-fi, not wired....
I bet that is related to wi-fi connectivity in some way. If a wifi client wants DNS but can't get through, it can end up in a cyclic retry loop.
A forced redirect may make that worse: if you have a Google Home that really wants 8.8.8.8, the connection count skyrockets out of control (unless you block 8.8.8.8 with a REJECT rule). In case you need the redirect, here's a CPU-efficient version without DNAT:
iptables -t nat -A PREROUTING -i br0 -p udp --dport 53 -j REDIRECT
This is an example of a redirect-to-myself command (dnsmasq answers on port 53). If you have an optional time server installed on your router, the same command will do for port 123. The DNAT version (the DD-WRT checkbox) is only needed if you want to redirect to a different host.
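Two related startup-script lines, as an untested sketch that assumes br0 and the udp/53 REDIRECT rule just above: the same REDIRECT for NTP on port 123, and one way to reject 8.8.8.8. Because the nat REDIRECT happens before the filter table ever sees the packet, 8.8.8.8 has to be exempted from the redirect (RETURN) before a FORWARD REJECT can catch it.

```sh
# NTP: same redirect-to-myself idea, only useful if a time server runs on the router.
iptables -t nat -A PREROUTING -i br0 -p udp --dport 123 -j REDIRECT
# Exempt 8.8.8.8 from the DNS redirect (-I puts this above the REDIRECT rule)...
iptables -t nat -I PREROUTING -i br0 -d 8.8.8.8 -p udp --dport 53 -j RETURN
# ...so the forwarded packet reaches the filter table and is rejected outright.
iptables -I FORWARD -i br0 -d 8.8.8.8 -p udp --dport 53 -j REJECT
```

A client that insists on 8.8.8.8 then gets an immediate ICMP reject rather than an answer from somewhere it didn't ask.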
Client on a retry spree + dnat + wifi = wee little mushroom cloud.
Here's a per-client restriction to prevent UDP bombardment over wifi (startup script):
iptables -I INPUT -i eth1 -p udp -s 192.168.1.0/24 -m connlimit --connlimit-mask 32 --connlimit-above 60 -j REJECT
iptables -I INPUT -i eth2 -p udp -s 192.168.1.0/24 -m connlimit --connlimit-mask 32 --connlimit-above 60 -j REJECT
nvram set ip_conntrack_udp_timeouts=60
The only client that gets 'punished' is the one out of control on a retry spree. The suppression lasts a minute at worst (the 60-second UDP conntrack timeout set above), but probably just the couple of seconds it takes for the client to back off when it gets hammered with rejects (a QoS method of sorts).
I actually set my connlimit range to 192.168.1.1/25, and my DHCP auto range ends at 192.168.1.126, so connlimit applies to every DHCP client (I don't specify an -i interface). This leaves the upper half (.128 and up) for static assignments without connlimit, and that's where I put server-oriented equipment that is supposed to make more connections.
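To make the /25 split concrete: iptables masks 192.168.1.1/25 down to 192.168.1.0/25, which covers .0 through .127, so a DHCP range ending at .126 falls inside it and statics from .128 up fall outside. The helper below is only an illustration for checking addresses in a desktop shell (it assumes 64-bit shell arithmetic, as in bash or dash); the rule itself, assuming a 192.168.1.0/24 LAN, is shown as a comment.

```sh
#!/bin/sh
# Rule covering only the lower half, with no -i so wired and wireless match:
#   iptables -I INPUT -p udp -s 192.168.1.0/25 -m connlimit \
#     --connlimit-mask 32 --connlimit-above 60 -j REJECT

# Pack a dotted quad into one integer (needs 64-bit shell arithmetic).
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_cidr ADDR NET/PREFIX: exit status 0 if ADDR lies inside NET/PREFIX.
in_cidr() {
  net=${2%/*}
  prefix=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

For example, 'in_cidr 192.168.1.126 192.168.1.0/25' succeeds while 'in_cidr 192.168.1.200 192.168.1.0/25' fails, matching the DHCP/static split described above.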
You can do other things with connlimit, such as assigning a client's connections to the very worst QoS class once it exceeds x connections, and/or limiting the overage to x new connections per second (trickling just enough through to prevent timeouts). That would be very transparent, but I'm not quite that skilled with iptables.
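For the 'trickle' idea, one possible approach is the hashlimit match. This is an untested sketch, not something from this thread: it assumes the router's kernel has xt_hashlimit and its iptables supports --hashlimit-upto, and 192.168.1.0/25, 60 and 5/sec are example values. It limits packets per second per client rather than new connections, so it is only a rough stand-in for 'x connections per second'.

```sh
# Both rules go to position 1, so insert the REJECT first: the ACCEPT
# inserted afterwards lands above it and is evaluated first.
iptables -I INPUT 1 -p udp -s 192.168.1.0/25 -m connlimit --connlimit-mask 32 --connlimit-above 60 -j REJECT
# Over-limit clients still get up to 5 packets/sec each (tracked per source IP);
# anything beyond that falls through to the REJECT above.
iptables -I INPUT 1 -p udp -s 192.168.1.0/25 -m connlimit --connlimit-mask 32 --connlimit-above 60 \
  -m hashlimit --hashlimit-upto 5/sec --hashlimit-mode srcip --hashlimit-name udp_overage -j ACCEPT
```

Note that the ACCEPT sits ahead of the router's other INPUT rules, which is why it is restricted to the LAN source range.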
Meanwhile, back to wifi,
DNS can be the first thing to complain, even if DNS isn't the actual cause. If the '53 unreplied' connection count gets extreme... personally, I have to restart my cable modem to fix that. There are a variety of commands that could make DNSMasq slightly more resilient to connectivity losses; however, fixes are less effective when applied too far away from the location of the actual problem. So it turns into more of a question: cause or effect?
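One way to keep an eye on that '53 unreplied' count from a shell, assuming it corresponds to port-53 conntrack entries flagged [UNREPLIED] in /proc/net/nf_conntrack or /proc/net/ip_conntrack (which file exists depends on the kernel). This is a sketch; a path can be passed to read a saved copy instead.

```sh
#!/bin/sh
# Count conntrack entries whose original direction targets port 53 and
# that never saw a reply. Optional $1 overrides the table path.
count_unreplied_dns() {
  table=${1:-}
  if [ -z "$table" ]; then
    for f in /proc/net/nf_conntrack /proc/net/ip_conntrack; do
      if [ -r "$f" ]; then
        table=$f
        break
      fi
    done
  fi
  [ -r "$table" ] || { echo 0; return 0; }
  # The trailing space keeps 'dport=53 ' from matching dport=5353 etc.
  grep 'dport=53 ' "$table" | grep -c 'UNREPLIED' || true
}
```

Running it every few seconds (for example in a 'while sleep 5' loop) shows whether the count climbs steadily, which points at a connectivity problem rather than at dnsmasq itself.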
According to my records, 36104 has working WDS, which, possibly, means more stable wifi.