Posted: Fri Jul 15, 2022 4:09 Post subject: [SOLVED] Help needed to use DD-WRT as a failover router
Hello fellow DD-WRT people !
I am thinking of setting up an old DD-WRT router I have lying around (a good old TP-Link TL-WDR3600 running DD-WRT v3.0-r49492 std (07/14/22)) as a failover/backup router + WIFI AP for when the power goes out and/or when I lose my pfSense box or LAN switch for whatever reason. I already have a UPS set up for the LAN switch and the pfSense box, but the server running pfSense drains it in about 15 minutes, and I am looking for something just to keep WIFI devices up during power outages.
Here is my setup:
- Modem -> current router (pfSense in a computer) -> LAN switch -> WIFI AP, PC, etc
I would like to add a second link from the modem to my old DD-WRT router (WAN port) to act as a failover router, and a link between the DD-WRT router (LAN port) to the LAN switch.
The DD-WRT would have a reserved LAN IP on the same subnet as the regular LAN. I would like it to monitor connectivity to the pfSense gateway (ping every X seconds), and as long as the pfSense box is up, the DD-WRT's DHCP server and WIFI would be disabled. If the pfSense box stops responding (because of a power outage, maintenance, or anything else), the DD-WRT router would turn on its WIFI and DHCP server. The DHCP server would give out IPs in a reserved range not included in the pfSense DHCP server range, and provide a different gateway (the DD-WRT's LAN IP address instead of the pfSense IP) so that traffic could still go out, making the switch as transparent as possible. Once connectivity to the pfSense box is re-established, the WIFI and DHCP server would be disabled again.
I'm pretty sure there must be some kind of shell scripting / cron job combo to do this, but I am unsure about the right commands. Do you guys have any idea on how to achieve this ?
Something along the lines of
Code:
ping 192.168.1.1
if ping = true {
    verify if dhcp is running {
        true = kill dhcp
    }
    verify if wifi is running {
        true = kill wifi
    }
}
if ping = false {
    verify if dhcp is running {
        false {
            start dhcp
            wait 5 seconds
        }
    }
    verify if wifi is running {
        false = start wifi
    }
}
end
Thanks for your help !
viz
Last edited by vizi0n on Mon Jul 18, 2022 0:23; edited 2 times in total
Posted: Fri Jul 15, 2022 4:21 Post subject:
Please consider upgrading to the current release. We generally don't focus on anything until you've upgraded since everything in the router database is outdated and not generally supported in the forum.
Thanks for the heads up. First thing I did when I dug out the router from the storage bin was to upgrade the 2016 firmware to the latest one on the database. It is now upgraded to DD-WRT v3.0-r49492 std (07/14/22)
For failover purposes, I'd be more inclined to have the primary router manage this process entirely. Let me explain.
If we're talking about a power failure, it seems to me we can safely assume NOTHING is going to work. IOW, this isn't typically a situation where some things work and others don't. Either everything's up and running, or nothing is, at least locally.
Given the above, what I would do is simply establish another gateway on the existing network for failover purposes. The primary router would use its own WAN until such time it became unresponsive, then change the default gateway to the other router. Similarly, it would monitor the WAN for recovery and change the default gateway back to itself.
All in all, NOT very complicated. Certainly no more complicated than what happens when you configure a change in the default gateway on the primary router w/ the OpenVPN client. It just happens that the change in the default gateway is a virtual network interface on the same device. But the logic is exactly the same in terms of monitoring.
What the OP is describing is something vastly more complicated, because it does more than just change the default gateway. It changes the DHCP server, wifi APs, perhaps DNS, etc. IOW, it's a wholesale change in what is effectively the primary router, including all its services. That just seems to overly complicate matters. And as I said before, I don't see where you're going to have the primary router w/o power, and the failover router w/ power anyway in order to justify this configuration.
The ONU and the TP-Link are on a 12v UPS connected to a car battery. If power fails, the ONU and the TP-Link will remain powered on for hours, even maybe a day due to the very low load they create and the huge capacity of the battery. The failover is for electrical outage, not network outage. The detection of the power outage is actually done by probing a 24/7 device on the network (the regular router) with ping.
The purpose of this is to keep WIFI services functional so the wife and kids can use their tablet and I can still work with my laptop. I can't rely on the switch to have hot/standby routes as the switch and server hosting the router are on the same UPS, and they will power down at the same time. Nothing else from the network will go through them, because nothing else will have power. That is why I need to find the commands to start/stop the wifi and DHCP.
I think that you are seeing this as much more complex than it actually is. In bash scripting this would be relatively easy with the right syntax, which I don't have in this case.
I could just leave the DD-WRT with its own SSID and connect to it "if needed", but I would prefer the other approach to lower the chances of interference and of devices sticking to the wrong SSID.
Code:
# NOTE: the line that sets pingResult is missing from the post. The two lines below are an
# assumption: they take the packet-loss percentage from ping's summary line, with the
# main router assumed to be at 192.168.1.1. No summary at all is treated as 100% loss.
pingResult=$(ping -c 3 -W 1 192.168.1.1 | sed -n 's/.* \([0-9]*\)% packet loss.*/\1/p')
pingResult=${pingResult:-100}
dhcpRunning=$(grep -c "dhcp" /tmp/dnsmasq.conf)
# dhcpRunning returns 1 if not running, returns 6 if running
wlanRunning=$(ifconfig | grep -c "wlan")
# wlanRunning returns 0 if WIFI disabled, returns 2 if WIFI enabled (wlan0 wlan1)
if [ "$pingResult" -eq 100 ]
then
    echo "Main router DOWN"
    # If 100% packet loss, enable DHCP server and enable WLAN interfaces
    if [ "$dhcpRunning" -eq 1 ]
    then
        stopservice dnsmasq
        cp /tmp/root/dnsmasq.conf.dns-dhcp /tmp/dnsmasq.conf
        startservice dnsmasq
        echo "DHCP server is now ENABLED"
    fi
    if [ "$wlanRunning" -lt 2 ]
    then
        ifconfig wlan0 up
        ifconfig wlan1 up
        echo "WIFI is now ENABLED"
    fi
else
    echo "Main router UP"
    # If less than 100% packet loss, make sure DHCP server and WLAN interfaces are disabled
    if [ "$dhcpRunning" -gt 1 ]
    then
        stopservice dnsmasq
        cp /tmp/root/dnsmasq.conf.dns-only /tmp/dnsmasq.conf
        startservice dnsmasq
        echo "DHCP server is now DISABLED"
    fi
    if [ "$wlanRunning" -gt 0 ]
    then
        ifconfig wlan0 down
        ifconfig wlan1 down
        echo "WIFI is now DISABLED"
    fi
fi
I also made a copy of /tmp/dnsmasq.conf with and without DHCP enabled in /root, which I copy to overwrite the existing config file used to launch dnsmasq.
Now I just need to figure out the cron entry to run this script every 15 seconds and I'm all set. As long as the files don't disappear after a reboot!
You should be able to use:
stopservice dnsmasq
startservice dnsmasq
I've got it quite functional. I just need to set up the storage and crontab. It seems this router does not have enough memory for JFFS storage, so I'll be getting a USB stick.
So, I have finally completed my watchdog/failover DD-WRT router. I had to connect a USB flash drive so the router keeps the scripts over reboots because this router does not support a built-in JFFS2 partition. The files are all located in the /opt directory.
My ONU (modem) has 2 ethernet ports and is just a layer 2 device.
- ONU Port 1 is connected to my main router
- ONU Port 2 is connected to my backup (DD-WRT router)
- Both routers are connected to the same LAN switch and are on the same subnet. Main router has .1 IP and DD-WRT has .254
- DHCP provides the corresponding gateway (main router provides .1 as gateway, DD-WRT provides .254 as gateway).
- Each DHCP has its own range of IPs that do not conflict with each other (main router .100-.149 and DD-WRT .200-.249)
The transition is smooth and should happen within 20 seconds of failure of the main router and/or main WIFI access points.
It essentially works like this:
Step 1
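A wrapper script is created so that the verification runs more often than cron's one-minute minimum. As written below, it launches the check three times per cron run, roughly 25 seconds apart. This script is located in /opt/monitor_cronjob.sh and it calls the actual monitoring script.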
Content of monitor_cronjob.sh
Code:
#!/bin/sh
for i in 0 1; do /opt/monitor_lan.sh & sleep 25; done; /opt/monitor_lan.sh
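One thing to watch with a wrapper like this: if a run of the monitoring script takes longer than expected (pings timing out, dnsmasq restarting), the next run can start while the previous one is still working. Below is a minimal sketch of a guard against that, using a lock directory. The names run_locked and LOCK_DIR and the lock path are hypothetical, not part of the setup described in this thread.

```shell
#!/bin/sh
# Sketch only: run a command unless another instance already holds the lock.
# LOCK_DIR and run_locked are illustrative names (assumptions).
LOCK_DIR="${LOCK_DIR:-/tmp/monitor_lan.lock}"

run_locked() {
    # mkdir is atomic: it fails if the directory already exists,
    # so only one caller at a time can take the lock.
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        "$@"
        status=$?
        rmdir "$LOCK_DIR"
        return "$status"
    fi
    echo "previous run still active, skipping" >&2
    return 0
}

# Possible use inside monitor_cronjob.sh:
#   run_locked /opt/monitor_lan.sh
```

If the script is killed mid-run, the lock directory stays behind and has to be removed by hand (or by a reboot, if /tmp is RAM-backed on your build).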
Step 2
What the /opt/monitor_lan.sh script essentially does is the following:
- If the primary router is unreachable, DD-WRT stops dnsmasq, replaces the dnsmasq config file with the one that includes the DHCP config, and restarts dnsmasq to enable the DHCP server
- If the primary router is reachable, DD-WRT stops dnsmasq, replaces its config with the one without the DHCP config, and restarts dnsmasq to keep only the DNS server feature
- If both WIFI access points are unreachable, DD-WRT starts both WIFI interfaces
- If at least one WIFI access point is reachable, DD-WRT shuts down both WIFI interfaces
Please bear with me, I am not a programmer and I know there is room for optimization and combining stuff. I just don't know enough to do that.
Code:
# ping_ap1, ping_ap2, the colour variables and verifyWIFI are not shown in this excerpt;
# verifyWIFI presumably sets wlanRunning to "ENABLED" or "DISABLED".
wlanRunning=$(ifconfig | grep -c "wlan")
# wlanRunning returns 0 if WIFI is disabled, returns 1 or 2 if WIFI is enabled (interfaces wlan0 wlan1)
if [ "$ping_ap1" -eq 100 ] && [ "$ping_ap2" -eq 100 ]
then
    echo -e "${YELLOW}LAN Access Points${ENDCOLOR} : [ ${RED}DOWN${ENDCOLOR} ]"
    verifyWIFI
    if [ "$wlanRunning" = "DISABLED" ]
    then
        echo -e "${YELLOW}ENABLING${ENDCOLOR} : DD-WRT WIFI"
        ifconfig wlan0 up
        ifconfig wlan1 up
        echo -e "${YELLOW}DD-WRT WIFI${ENDCOLOR} : [ ${GREEN}ENABLED${ENDCOLOR} ]"
    fi
else
    echo -e "${YELLOW}LAN Access Points${ENDCOLOR} : [ ${GREEN}UP${ENDCOLOR} ]"
    verifyWIFI
    if [ "$wlanRunning" = "ENABLED" ]
    then
        echo -e "${YELLOW}DISABLING${ENDCOLOR} : DD-WRT WIFI"
        ifconfig wlan0 down
        ifconfig wlan1 down
        echo -e "${YELLOW}DD-WRT WIFI${ENDCOLOR} : [ ${RED}DISABLED${ENDCOLOR} ]"
    fi
fi
echo -e "==============================================="
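The checks above compare packet-loss percentages against 100, but the step that produces those numbers is not shown. Here is a minimal sketch of how that part could look, assuming ping prints a summary line such as "3 packets transmitted, 0 packets received, 100% packet loss". The function names parse_loss and desired_state are hypothetical, and so is the example IP in the comment.

```shell
#!/bin/sh
# Sketch only: parse_loss and desired_state are illustrative names (assumptions).

# Read ping output on stdin and print the packet-loss percentage (integer part).
# Prints 100 when no summary line is found, so an unusable ping counts as "down".
parse_loss() {
    loss=$(sed -n 's/.*[ ,]\([0-9][0-9]*\)[.0-9]*% packet loss.*/\1/p')
    echo "${loss:-100}"
}

# Given one or more loss percentages, print ENABLED only when every
# target shows 100% loss (i.e. all of them are unreachable).
desired_state() {
    for loss_pct in "$@"; do
        [ "$loss_pct" -eq 100 ] || { echo DISABLED; return; }
    done
    echo ENABLED
}

# Possible use (192.168.1.2 is a placeholder AP address):
#   ping_ap1=$(ping -c 3 -W 1 192.168.1.2 2>/dev/null | parse_loss)
#   desired_state "$ping_ap1" "$ping_ap2"
```

Keeping the decision in one small function means the "all access points down" rule for WIFI and the "main router down" rule for DHCP share the same test.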
Step 3
I also needed to keep a copy of the dnsmasq config with and without DHCP enabled. These files are also located in /opt but keep in mind that your config in these files will vary.
The DHCP server lease time is very short (5 minutes), and its range differs from the one on the primary router. The lease is this short to allow a smooth switchover back to the regular router once it is back online.
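For anyone wondering what the difference between the two files could look like, here is a rough sketch that builds the DHCP-enabled variant from the DNS-only one. The 192.168.1.x addresses and the /24 netmask are assumptions based on the numbers in this thread (.200-.249 range, .254 gateway, 5 minute lease), and make_dhcp_conf is a hypothetical helper name. Your real dnsmasq.conf will contain other lines that must be kept.

```shell
#!/bin/sh
# Sketch only: append DHCP settings to a copy of the DNS-only config.
# Addresses and netmask are assumptions; adjust them to your LAN.
make_dhcp_conf() {
    base=$1   # existing DNS-only config file
    out=$2    # DHCP-enabled config file to create
    {
        cat "$base"
        # Address pool, netmask and lease time handed out by the backup router
        echo "dhcp-range=192.168.1.200,192.168.1.249,255.255.255.0,5m"
        # DHCP option 3 = default gateway: point clients at the DD-WRT router
        echo "dhcp-option=3,192.168.1.254"
    } > "$out"
}

# Possible use:
#   make_dhcp_conf /opt/dnsmasq.conf.dns-only /opt/dnsmasq.conf.dns-dhcp
```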
Step 4
A cronjob (added in the Administration page of DD-WRT) runs every minute
Code:
* * * * * root /opt/monitor_cronjob.sh
And that's it! It just works.
I thought I would share this in case someone else wants to do something similar in the future. It's always nice to find some info in the forums!