Posted: Sun Nov 27, 2022 6:34 Post subject: When SFE is disabled, DNS stops working. How to reenable?
As soon as I disable SFE, DNS stops working and external resolution fails. Internal resolution works. I can ping the external DNS host 8.8.8.8 and can connect using:
nc -vuz 8.8.8.8 53
yet no reply comes back from the external DNS server. That is, until I enable SFE, though I would rather keep SFE disabled.
What do I need to do in place of SFE to allow external DNS resolution once more when SFE is disabled? _________________ Cheers, TK
------------------------
Joined: 26 Mar 2013 Posts: 1855 Location: Hung Hom, Hong Kong
Posted: Sun Nov 27, 2022 14:53 Post subject: Re: When SFE is disabled, DNS stops working. How to reenabl
Sometimes services are not restarted properly after you change certain settings. A reboot usually fixes it.
You might also try running 'service start dnsmasq' from Admin->Command. By default, DNSmasq serves both DHCP and DNS. This method requires some knowledge of what goes on under the hood of DD-WRT.
In fact, DD-WRT's web UI has no explicit buttons to restart individual services. It relies entirely on the enable/disable buttons. _________________ Router: Asus RT-N18U (rev. A1)
Drink, Blink, Stretch! Live long and prosper! May the Force and farces be with you!
I would not have expected SFE to mess up DNS either, TBH. Yet it appears to do just that in my setup.
What I tried: I enabled DNSmasquerade two nights ago after I sent that reply, since I used to have it enabled earlier as well. DNS resolution does work, though very slowly, and likely only because of caching from when SFE was enabled. After a short while, DNS resolution stops.
Essentially, when I run an nslookup off the routers, just a short while after SFE is disabled, this is the result:
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
*** Request to dns01.dom.xyz timed-out
C:\>
192.168.0.100 being my main internal DNS server, which has caching enabled. Whereas off my laptop, it still resolves, maybe due to cache for the time being:
when these two options are set on my main router for DNSmasquerade:
Code:
no-resolv
server=172.87.80.1,172.87.81.1
these being the external DNS servers of the provider. Though after so much back and forth, I'm no longer 100% sure whether the above setting had anything to do with my issues.
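One thing that may be worth double-checking in the options above: the dnsmasq man page documents `server=` as taking one upstream address per option, with the option repeated for each additional resolver. How a comma-separated pair is parsed is not something I can vouch for, so one line per server is probably the safer form. A minimal sketch that produces those lines (the `expand_servers` helper is hypothetical, not part of dnsmasq or DD-WRT):

```shell
#!/bin/sh
# Expand a comma-separated resolver list into one dnsmasq "server=" line each.
# expand_servers is an illustrative helper name, not an existing tool.
expand_servers() {
    printf '%s\n' "$1" | tr ',' '\n' | sed 's/^/server=/'
}

# The two provider resolvers quoted in the post.
expand_servers "172.87.80.1,172.87.81.1"
```

The resulting lines would go under `no-resolv` in the additional DNSmasq options in place of the single comma-separated line.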
So SFE is definitely doing something, with the routing at least, that is affecting DNS. The tcpdump output on the router shows that the packet from the nslookup command is sent out but nothing returns, whereas as soon as I enable SFE, resolution works just fine. Lost packet example below:
Code:
root@DD-WRT-INET:~# tcpdump -s0 -na "port 53 and ( host 192.168.0.100 or host 192.168.0.101 or host 192.168.0.102) and host 8.8.8.8"
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
13:37:53.112919 IP 192.168.0.101.51396 > 8.8.8.8.53: Flags [SEW], seq 4266102126, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
13:37:56.114436 IP 192.168.0.101.51396 > 8.8.8.8.53: Flags [SEW], seq 4266102126, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
13:38:02.114027 IP 192.168.0.101.51396 > 8.8.8.8.53: Flags [S], seq 4266102126, win 8192, options [mss 1460,nop,nop,sackOK], length 0
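Reading the capture above: all three lines are TCP SYNs (Flags [S]) from 192.168.0.101 to 8.8.8.8 port 53, retransmitted after 3 and then 6 seconds, and no packet from 8.8.8.8 appears. To tally this over a longer capture, something like the following sketch could help. The `count_dns_directions` name is made up for illustration, and it assumes tcpdump's default one-line text format as shown above:

```shell
#!/bin/sh
# Count packets sent to, and received from, a resolver on port 53.
# Reads tcpdump text output on stdin; the resolver defaults to 8.8.8.8.
count_dns_directions() {
    awk -v r="${1:-8.8.8.8}.53" '
        $2 == "IP" && $5 == (r ":") { out++ }   # field 5 is "dst.port:"
        $2 == "IP" && $3 == r       { back++ }  # field 3 is "src.port"
        END { printf "out=%d in=%d\n", out, back }
    '
}
```

On the router this could be fed with `tcpdump -n -c 20 "port 53 and host 8.8.8.8" | count_dns_directions`, which prints the totals once 20 packets have been captured. A result with `in=0` would match the symptom described here.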
The internal DNS servers responsible for resolving internal and external hosts:
Now, after some trial and error, I reenabled SFE, which brings resolution back to a working state. While doing so, I noticed that SIRQ/SYS is very high once more:
I've poked around for a bit longer. The 3 DNS servers above are behind the router where I'm disabling SFE. I've now disabled DNSmasquerade as well. I've also opened up the F/W on this router, as it was showing drops on port 53. Ping works, but DNS resolution still doesn't when SFE is disabled, despite the F/W change. I tried the DNSmasq options below:
That doesn't fix DNS resolution either when SFE is turned off.
Fast forward a few hours of troubleshooting. Now when SFE is enabled again, the load goes up progressively. It's now 19+. It seems as if a process is locking everything up, raising both the load and SIRQs, if that makes sense. I can't save or kill anything on the router. The "OSPF Router" setting keeps getting reverted to the "vtysh OSPF BGP RIPD Zebra" router (it appears briefly while load is high); the UI switching inexplicably between the two modes has been an issue for some time. This load/SIRQ behavior is new, however. I can't SSH into the unit, which hints that SSH is waiting on something. Not yet sure what.
What's interesting is that this happens when visiting news pages from various networks, particularly the Yahoo home page or other popular news outlets like CNN and ABC: pages with plenty of advertisements.
I need to read more about SFE (Shortcut Forwarding Engine) to understand this DNS behavior better, and a bit more on why SIRQ and load are consistently climbing.
Very weird! _________________ Cheers, TK
------------------------
Quick update. Recabled around the main router. Now everything works much better. (See image please).
When cabled as in the first diagram with SFE enabled, resolution is fast and SIRQ is kept to a minimum. I'm not yet 100% sure what was happening with the second cabling job. I need more time to dig into it than I had in this session.
Summary:
- Running tcpdump certainly increases SIRQs and load significantly. So does tailing log files and tracing connections, it appears.
- A less than optimal (ad hoc) cabling job will lead to a router's load increasing until it locks up. (It can probably be fixed with some software tweaks, but who has time for that when things work?) Not necessarily a router issue.
- Switching routers from one topology to another AFTER they were originally set up will cause these issues.
- DNS caching works wonders for speed!
Joined: 18 Mar 2014 Posts: 12837 Location: Netherlands
Posted: Mon Nov 28, 2022 7:22 Post subject:
SFE is getting confused in your setup and does not route properly. That can even happen when you are using a VAP or port forwarding via more than one interface, etc.
SFE only works reliably for a simple setup with one WAN interface and one LAN (br0) interface. In all other setups there is a risk of routing problems.
So SFE is masking the fact that there is something wrong in your setup.
It would be good if I could drop SFE, but it doesn't want to go. Job security. It makes all my routing work for the time being.
Need to know how SFE works first to see what it could be doing to affect things this significantly. Then I can disable it. _________________ Cheers, TK
------------------------
I'm at 101,475 interrupts every 30 seconds, which is about 3382.5/s at idle. vlan2@eth0 is my Internet-facing NIC with the public IP. _________________ Cheers, TK
------------------------
I guess I'm getting closer to figuring out how SFE routes traffic. Just to focus the discussion, I've boiled this down to a simplified view and setup to figure out the immediate 'ping' issue when SFE is disabled:
Whereas if I run a ping from the SECONDARY ROUTER (R2):
Code:
# ping 9.9.9.9
I see requests but no replies, meaning machines behind the INET router can't ping out when SFE is disabled, hence the DNS resolution issue (the F/W is disabled or fully permissive in these tests):
My goal is to remove the dependence on SFE in order to eliminate the high SIRQ it introduces. I need to find an alternative way to route traffic when SFE is disabled (e.g. DNSmasq or other means). I haven't found one yet. _________________ Cheers, TK
------------------------
Did a bit more interim digging after reaching even higher numbers. The interrupt delta spiked to 115,927/30s, or 3864.23/s. Another sample revealed 154,341/30s, which is 5144.7/s. (I suppose now I know what this router is capable of.)
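The per-second figures are just the counter delta divided by the sampling window. A small sketch of that arithmetic follows. The `irq_rate` name is made up, and how the delta itself was sampled (for example, two readings of /proc/interrupts 30 seconds apart) is an assumption, since the posts don't say:

```shell
#!/bin/sh
# Average interrupts per second from a counter delta and the interval in seconds.
# irq_rate is an illustrative helper, not an existing command.
irq_rate() {
    awk -v delta="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", delta / secs }'
}

irq_rate 101475 30   # idle figure from the earlier post
irq_rate 115927 30   # first spike
irq_rate 154341 30   # second spike
```

This prints 3382.50, 3864.23 and 5144.70, matching the numbers quoted in the thread.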
Firewall logging was set to Enabled, level High, with Dropped, Rejected and Accepted all Enabled.
I also had a handful of 'logaccept' targets in my iptables, which I just switched to 'accept'. Since I had remote logging enabled as well, the router was basically drinking from a fire hose every time a page got loaded.
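For anyone wanting to check their own rule set for the same thing, a rough sketch that counts rules jumping to a logging chain. The `count_target_rules` name is hypothetical, and it assumes the rules are dumped in `iptables-save` style text, which your build may or may not provide:

```shell
#!/bin/sh
# Count rules on stdin whose jump target is the given chain (default: logaccept).
# Expects iptables-save style lines such as "-A FORWARD ... -j logaccept".
count_target_rules() {
    # grep -c prints 0 but exits non-zero when nothing matches; "|| :" hides that.
    grep -c -E -- "-j ${1:-logaccept}( |\$)" || :
}
```

Usage would be along the lines of `iptables-save | count_target_rules`. A non-zero count on a busy forwarding chain means a log entry for each packet that hits those rules, which fits the load described here.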
So things are stable and working well with SFE right now.
You'd suppose that would eliminate SFE as the culprit. I suppose. However, now that @egc put me up to removing SFE, I would still like to eliminate it to see what the behavior is like without it. I've disabled SFE on all my other DD-WRT routers but can't do so on the main one (INET / PRIMARY ROUTER) without losing the ability to browse (or in this case ping) IPs from other connected routers, such as the SECONDARY ROUTER in the above image. _________________ Cheers, TK
------------------------