Posted: Sun May 08, 2022 20:57 Post subject: [SOLVED] WireGuard Failover member/Watchdog
I noticed that the WireGuard reset does not work entirely correctly on my router.
After all the tunnels in the list are down, the router resets the WG service (as described in the instructions, and as can be seen in the logs),
but after the WG service is reset the first tunnel does not connect; only the second one connects, and this keeps repeating.
But as soon as I click “Apply Settings”, everything works again as it should!
I also noticed that after frequent switching between tunnels some ports stop working, and only a full reboot of the router helps.
How can I fix this?
My router: Asus RT-AC68U C1
Firmware: DD-WRT v3.0-r48741 std (04/26/22)
Last edited by urasic on Wed May 11, 2022 11:27; edited 1 time in total
Joined: 18 Mar 2014 Posts: 12837 Location: Netherlands
Posted: Mon May 09, 2022 6:37 Post subject:
First you have to research why your tunnels are failing.
WireGuard is very resilient and should pick up a lost connection within seconds.
Some providers have very busy servers that kick users off, or they let your configuration expire if you do not use it.
Proton's free WireGuard configuration even expires after one day (but it is free).
So sometimes it is necessary to use fail over and/or watchdogs, but having multiple tunnels fail is strange; perhaps you need a better provider.
In the Troubleshooting section (it is in the WireGuard Server setup guide) there are some tips to look into, especially this:
Quote:
When running multiple tunnels make sure the Local (Listen) port is unique.
Also note that running multiple tunnels to the same provider with the same Local Public key will not always work, as the Local Public key is part of the (crypto-key) routing.
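The uniqueness requirement for the Local (Listen) port can be checked with a small helper. This is only a sketch: the function name `duplicate_listen_ports` is made up for illustration, and it assumes its input has the "<interface> <port>" layout that `wg show all listen-port` prints for interfaces that are up at the same time. The oet3 line in the sample is invented to show a clash.

```shell
# Print every listen port that is used by more than one interface.
# Input (stdin): lines of "<interface> <port>".
# duplicate_listen_ports is a hypothetical helper, not part of DD-WRT.
duplicate_listen_ports() {
    awk 'NF >= 2 { count[$2]++ }
         END { for (p in count) if (count[p] > 1) print p }' | sort -n
}

# Sample input: the oet1/oet2 ports are taken from the log in this thread,
# the oet3 entry is made up so that it clashes with oet1.
printf 'oet1\t53849\noet2\t49497\noet3\t53849\n' | duplicate_listen_ports
# prints 53849
```

On a router the input could come from `wg show all listen-port | duplicate_listen_ports`; no output means no clash among the interfaces that are currently up.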
Things you can try:
1. Upgrade to the latest build (there are some changes in WireGuard, see the changelog), reset to defaults *after* the upgrade and put the settings in manually; never restore from a backup.
2. Instead of only resetting WireGuard, let the router reboot when the last tunnel fails with:
nvram set wg_onfail_reboot=1
I just did a test with my own router (R6400v1 running 48831) and it works as advertised, but without further information it is difficult to see what the problem is.
In the troubleshooting section there is a list of what information to provide; as a minimum, provide a screenshot of the tunnels page and the logs:
grep -i wireguard /var/log/messages
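To make the failover sequence easier to spot in that output, the log can be reduced to the lines where the watchdog declares a tunnel down and where an interface is enabled. A minimal sketch, assuming the syslog layout shown in this thread; `failover_events` is a made-up name and the field positions depend on that layout.

```shell
# Reduce WireGuard syslog lines (stdin) to a short failover timeline.
# failover_events is a hypothetical helper; field numbers assume lines like
# "May 9 22:13:37 DD-WRT user.warn root: WireGuard watchdog: oet1 is DOWN, ..."
failover_events() {
    awk '
        /watchdog: oet[0-9]+ is DOWN/ { print $3, $9, "down" }
        /Enable WireGuard interface/  { print $3, $10, "up on port", $NF }
    '
}

# Sample lines copied from the log in this thread.
failover_events <<'EOF'
May 9 22:13:37 DD-WRT user.warn root: WireGuard watchdog: oet1 is DOWN, Reboot or Reset of WireGuard is executed
May 9 22:13:37 DD-WRT user.warn root: WireGuard watchdog: oet1 set to fail
May 9 22:13:37 DD-WRT user.info root: WireGuard DNS reset
May 9 22:13:37 DD-WRT user.info root: Enable WireGuard interface oet2 on port 49497
EOF
# prints:
# 22:13:37 oet1 down
# 22:13:37 oet2 up on port 49497
```

On the router itself this would be used as `grep -i wireguard /var/log/messages | failover_events`.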
The tunnels are failing because I turn them off myself (on my servers), simulating outages to test the automatic tunnel switching feature.
The tunnel switching problem is solved, thanks to your recommendations.
The problem was the «Listen Port» assignment: I specified the port that is assigned on the server in the WG settings, and switching began to work as it should.
But the port forwarding problem remains, so I will describe it again.
After switching to the second tunnel, port forwarding does not work. For example, the ssh connection stops working (but there is internet).
If the second tunnel goes down, WG resets. The first tunnel connects (as it should), but the ports are not forwarded either. Only restarting the router helps.
You recommended a reboot on fail of the last tunnel. Unfortunately, this will not help, since the reboot happens only after the last tunnel fails!
My settings and log
Code:
grep -i wireguard /var/log/messages
Jan 1 03:00:19 DD-WRT user.info root: WireGuard tunnel oet1 is the fail over group start tunnel
Jan 1 03:00:19 DD-WRT user.info root: WireGuard number of non failed tunnels in fail set: 2
Jan 1 03:00:20 DD-WRT user.info root: Enable WireGuard interface oet1 on port 53849
Jan 1 03:00:20 DD-WRT user.info root: Establishing WireGuard tunnel with peer endpoint 188.72.209.146:53849
Jan 1 03:00:20 DD-WRT user.info root: WireGuard setting route for oet1 to endpoint 188.72.209.146:53849 via 0.0.0.0 dev vlan2
Jan 1 03:00:20 DD-WRT user.info root: WireGuard 10.66.66.3/32 added to oet1
Jan 1 03:00:20 DD-WRT user.info root: WireGuard fd42:42:42::3/128 added to oet1
Jan 1 03:00:21 DD-WRT user.info root: WireGuard set /tmp/oet.lock
Jan 1 03:00:21 DD-WRT user.info root: WireGuard Killswitch activated for all clients!
Jan 1 03:00:25 DD-WRT user.info root: WireGuard tunnel oet1 is the fail over group start tunnel
Jan 1 03:00:25 DD-WRT user.info root: WireGuard number of non failed tunnels in fail set: 2
Jan 1 03:00:25 DD-WRT user.info root: Enable WireGuard interface oet1 on port 53849
Jan 1 03:00:25 DD-WRT user.info root: Establishing WireGuard tunnel with peer endpoint 188.72.209.146:53849
Jan 1 03:00:25 DD-WRT user.info root: WireGuard setting route for oet1 to endpoint 188.72.209.146:53849 via 192.168.100.1 dev vlan2
Jan 1 03:00:25 DD-WRT user.info root: WireGuard 10.66.66.3/32 added to oet1
Jan 1 03:00:25 DD-WRT user.info root: WireGuard fd42:42:42::3/128 added to oet1
Jan 1 03:00:26 DD-WRT user.info root: WireGuard set /tmp/oet.lock
Jan 1 03:00:26 DD-WRT user.info root: WireGuard Killswitch activated for all clients!
May 9 22:10:37 DD-WRT user.info root: WireGuard tunnel oet1 is the fail over group start tunnel
May 9 22:10:37 DD-WRT user.info root: WireGuard number of non failed tunnels in fail set: 2
May 9 22:10:38 DD-WRT user.info root: Enable WireGuard interface oet1 on port 53849
May 9 22:10:38 DD-WRT user.info root: Establishing WireGuard tunnel with peer endpoint 188.72.209.146:53849
May 9 22:10:38 DD-WRT user.info root: WireGuard setting route for oet1 to endpoint 188.72.209.146:53849 via 192.168.100.1 dev vlan2
May 9 22:10:38 DD-WRT user.info root: WireGuard 10.66.66.3/32 added to oet1
May 9 22:10:38 DD-WRT user.info root: WireGuard fd42:42:42::3/128 added to oet1
May 9 22:10:38 DD-WRT user.info root: WireGuard set /tmp/oet.lock
May 9 22:10:38 DD-WRT user.info root: WireGuard waited 9 seconds to set routes for oet
May 9 22:10:39 DD-WRT user.info root: WireGuard route 0.0.0.0/1 added via oet1
May 9 22:10:39 DD-WRT user.info root: WireGuard route 128.0.0.0/1 added via oet1
May 9 22:10:39 DD-WRT user.info root: WireGuard DNS server 208.67.222.222 routed via oet1
May 9 22:10:39 DD-WRT user.info root: WireGuard DNS server 208.67.220.220 routed via oet1
May 9 22:10:39 DD-WRT user.info root: WireGuard Killswitch activated for all clients!
May 9 22:10:40 DD-WRT user.info root: WireGuard waited 0 sec. for DNSMasq
May 9 22:10:40 DD-WRT user.info root: WireGuard: wireguard-fwatchdog 1 not running yet
May 9 22:10:40 DD-WRT user.info root: WireGuard released /tmp/oet.lock
May 9 22:10:40 DD-WRT user.info root: WireGuard watchdog /usr/bin/wireguard-fwatchdog.sh on tunnel oet1 running
May 9 22:10:40 DD-WRT user.info root: WireGuard waited 1 seconds to set routes for oet
May 9 22:10:40 DD-WRT user.info root: WireGuard route 0.0.0.0/1 added via oet1
May 9 22:10:40 DD-WRT user.info root: WireGuard route 128.0.0.0/1 added via oet1
May 9 22:10:40 DD-WRT user.info root: WireGuard DNS server 208.67.222.222 routed via oet1
May 9 22:10:40 DD-WRT user.info root: WireGuard DNS server 208.67.220.220 routed via oet1
May 9 22:10:41 DD-WRT user.info root: WireGuard waited 0 sec. for DNSMasq
May 9 22:10:41 DD-WRT user.warn root: WireGuard DNS WARNING, already set when running oet1 will overwrite
May 9 22:10:41 DD-WRT user.info root: WireGuard: wireguard-fwatchdog 1 already running will be killed first
May 9 22:10:41 DD-WRT user.info root: WireGuard watchdog /usr/bin/wireguard-fwatchdog.sh on tunnel oet1 running
May 9 22:10:41 DD-WRT user.info root: WireGuard released /tmp/oet.lock
May 9 22:10:41 DD-WRT user.info root: WireGuard waited 1 seconds to set routes for oet
May 9 22:10:41 DD-WRT user.info root: WireGuard route 0.0.0.0/1 added via oet1
May 9 22:10:41 DD-WRT user.info root: WireGuard route 128.0.0.0/1 added via oet1
May 9 22:10:41 DD-WRT user.info root: WireGuard DNS server 208.67.222.222 routed via oet1
May 9 22:10:41 DD-WRT user.info root: WireGuard DNS server 208.67.220.220 routed via oet1
May 9 22:10:43 DD-WRT user.info root: WireGuard waited 0 sec. for DNSMasq
May 9 22:10:43 DD-WRT user.warn root: WireGuard DNS WARNING, already set when running oet1 will overwrite
May 9 22:10:43 DD-WRT user.info root: WireGuard: wireguard-fwatchdog 1 already running will be killed first
May 9 22:10:43 DD-WRT user.info root: WireGuard watchdog /usr/bin/wireguard-fwatchdog.sh on tunnel oet1 running
May 9 22:10:43 DD-WRT user.info root: WireGuard released /tmp/oet.lock
May 9 22:13:37 DD-WRT user.warn root: WireGuard watchdog: oet1 is DOWN, Reboot or Reset of WireGuard is executed
May 9 22:13:37 DD-WRT user.warn root: WireGuard watchdog: oet1 set to fail
May 9 22:13:37 DD-WRT user.info root: WireGuard tunnel oet2 is the fail over group start tunnel
May 9 22:13:37 DD-WRT user.info root: WireGuard number of non failed tunnels in fail set: 1
May 9 22:13:37 DD-WRT user.info root: WireGuard DNS reset
May 9 22:13:37 DD-WRT user.info root: Enable WireGuard interface oet2 on port 49497
May 9 22:13:37 DD-WRT user.info root: Establishing WireGuard tunnel with peer endpoint 185.236.78.11:49497
May 9 22:13:37 DD-WRT user.info root: WireGuard setting route for oet2 to endpoint 185.236.78.11:49497 via 192.168.100.1 dev vlan2
May 9 22:13:37 DD-WRT user.info root: WireGuard 10.66.66.3/32 added to oet2
May 9 22:13:37 DD-WRT user.info root: WireGuard fd42:42:42::3/128 added to oet2
May 9 22:13:37 DD-WRT user.info root: WireGuard watchdog: tunnel restarted
May 9 22:13:37 DD-WRT user.info root: WireGuard set /tmp/oet.lock
May 9 22:13:37 DD-WRT user.info root: WireGuard waited 1 seconds to set routes for oet
May 9 22:13:38 DD-WRT user.info root: WireGuard route 0.0.0.0/1 added via oet2
May 9 22:13:38 DD-WRT user.info root: WireGuard route 128.0.0.0/1 added via oet2
May 9 22:13:38 DD-WRT user.info root: WireGuard DNS server 94.140.14.14 routed via oet2
May 9 22:13:38 DD-WRT user.info root: WireGuard DNS server 94.140.15.15 routed via oet2
May 9 22:13:39 DD-WRT user.info root: WireGuard waited 0 sec. for DNSMasq
May 9 22:13:39 DD-WRT user.info root: WireGuard: wireguard-fwatchdog 2 not running yet
May 9 22:13:39 DD-WRT user.info root: WireGuard released /tmp/oet.lock
May 9 22:13:39 DD-WRT user.info root: WireGuard watchdog /usr/bin/wireguard-fwatchdog.sh on tunnel oet2 running
May 9 22:13:39 DD-WRT user.info root: WireGuard Killswitch activated for all clients!
May 9 22:13:39 DD-WRT user.info root: WireGuard watchdog: firewall restarted
Posted: Tue May 10, 2022 5:57 Post subject:
About your setup, you are using the same IP address for the tunnels.
That is also a potential source of trouble. Usually it will work, but depending on the setup and router (cores/threads) it is possible that one thread is still taking the tunnel and routes down while another is already trying to start up the next tunnel (WireGuard is multithreaded, and some processes on the router run forked/asynchronously to speed things up).
Like I said, it should work, but to avoid possible conflicts it is advised to use unique IP addresses for the tunnels.
But if it works for you, just keep it the way it is.
Now about port forwarding and VPN clients. As I do not know exactly what and how you port forwarded, I will discuss this in general; if this does not give you enough pointers to solve the problem, then I need to know the specific rules/settings.
You are using a VPN client with kill switch enabled.
The kill switch will stop any traffic from LAN to WAN, thus effectively making it impossible to port forward to a LAN client.
For that you have to disable the Kill switch in the GUI and do it manually as described in the Client setup guide.
Another reason why you cannot use the WAN for port forwarding or for accessing services on the router (SSH, administration or a VPN *Server*) is that you route everything via the VPN.
Packets arriving at the WAN port will be routed back via the VPN and the firewall will not allow that.
So you have to free up the WAN port from the VPN using Policy Based routing.
You can use Source routing to route all local IP addresses, but not the router itself, via the VPN, or use destination based routing, e.g. for a specific port.
I am also running an OpenVPN server on port 1194 and have Destination routing: "Routed selected destinations via the WAN" and simply added: "port 1194" in the "Destination for PBR" field.
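Whether everything is currently routed via the VPN can be read from the routing table: the log in this thread shows the two half-default routes 0.0.0.0/1 and 128.0.0.0/1 being added via the tunnel. A rough sketch that reports the device carrying both of them; `full_tunnel_dev` is a made-up helper name, and the sample lines assume the usual `ip route show` layout rather than output captured from this router.

```shell
# Print the device that carries both half-default routes, if any.
# Input (stdin): output of `ip route show`.
# full_tunnel_dev is a hypothetical helper name.
full_tunnel_dev() {
    awk '
        $1 == "0.0.0.0/1" || $1 == "128.0.0.0/1" {
            # remember the word that follows "dev" on this line
            for (i = 2; i < NF; i++) if ($i == "dev") dev[$1] = $(i + 1)
        }
        END {
            if (dev["0.0.0.0/1"] != "" && dev["0.0.0.0/1"] == dev["128.0.0.0/1"])
                print dev["0.0.0.0/1"]
        }
    '
}

# Assumed sample routing table (addresses taken from the log in this thread).
full_tunnel_dev <<'EOF'
default via 192.168.100.1 dev vlan2
0.0.0.0/1 dev oet1 scope link
128.0.0.0/1 dev oet1 scope link
188.72.209.146 via 192.168.100.1 dev vlan2
EOF
# prints oet1
```

If `ip route show | full_tunnel_dev` prints a tunnel name, replies to connections arriving on the WAN are sent into that tunnel unless Policy Based routing exempts them.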
The first problem was the DNS on the WG server; after changing the DNS, my services began to work (with the settings I attached above).
But it turned out that ssh access is blocked only for the IP addresses of the tunnels in the list! (Any other IP works.)
There is a workaround: for example, you can connect to the server of the currently connected tunnel via its internal IP (10.66.66.1 and so on). But connecting to the server of the second tunnel, and so on, via its external IP is not possible.
And this problem only shows up after WG has switched through all the tunnels in the list.
When you first start the router, or after rebooting, connecting via ssh to all tunnel servers is possible except for the first one.
I decided to describe this problem because I think this is still a small switching error.
I apologize if I'm not describing the problem well. I'm trying.
I will try again:
I set up 2 tunnels (as described above).
From the internal network of the router, can I connect via ssh to any server (which is on the Internet and has its own external IP) on the standard port (22) with these settings?
I think yes!?
If one of the tunnels goes down, WG switches to the second (everything is fine, as it should be). But after that, I cannot connect via ssh from the internal network to the servers on which these 2 tunnels are configured. Specifically to these two servers.
This is not a critical issue! It is rather for my own understanding.
Posted: Wed May 11, 2022 6:03 Post subject:
OK, I am starting to get some idea of what you are doing and what could be happening.
From either tunnel you should be able to reach anything on the internet; that should be no problem (provided your server is routing the traffic, as traffic goes from client to server and then out onto the internet and back).
But if your servers are running on the same platform, and when logged into one server it is possible to reach the other server internally, you might have a routing problem.
When a tunnel is made, a route is made to the endpoint via the WAN (necessary of course, otherwise it would not work).
When a tunnel goes down and the next one in line is started only minimal changes are made so no complete reset of routing and firewall to not disturb other traffic on the router.
The route to the endpoint of the failing server is thus not removed (not impossible to do, but actually not necessary).
So in your case, when the first tunnel is down and the second one has kicked in, there is still a route via the WAN to the first server.
So yes, you can reach your first server, not via the tunnel but via the WAN.
If your server is not reachable via the WAN but only internally via the second server, then that would explain your problem.
Not sure how you test, but if it is by blocking WAN access of the first server then that would explain it.
I test by blocking the server's IP address on the client with:
iptables -I OUTPUT -d <server-ip-address> -j REJECT
You can test this by removing the route via the WAN to the first server while the first server is down.
From the CLI (telnet/PuTTY) of the client, first check the routing table with:
ip route show
You should see the route to the first server and you can remove it with:
ip route del 188.72.209.146
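The manual check above can be sketched as a helper that compares the routing table with the endpoints of the fail over group and prints the matching `ip route del` commands instead of running them. `stale_endpoint_routes` is a made-up name, and the sample table assumes the usual `ip route show` layout; the addresses are the two endpoints from the log in this thread.

```shell
# Print an `ip route del` command for every endpoint host route that is
# still present although its tunnel is not the active one.
#   $1     endpoint IP of the tunnel that is currently up
#   $2...  endpoint IPs of all tunnels in the fail over group
#   stdin  output of `ip route show`
# stale_endpoint_routes is a hypothetical helper name.
stale_endpoint_routes() {
    active=$1
    shift
    table=$(cat)
    for ip in "$@"; do
        if [ "$ip" != "$active" ] &&
           printf '%s\n' "$table" |
               awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }'
        then
            echo "ip route del $ip"
        fi
    done
}

# Assumed sample: oet2 (185.236.78.11) is up, the oet1 route was left behind.
stale_endpoint_routes 185.236.78.11 188.72.209.146 185.236.78.11 <<'EOF'
default via 192.168.100.1 dev vlan2
185.236.78.11 via 192.168.100.1 dev vlan2
188.72.209.146 via 192.168.100.1 dev vlan2
EOF
# prints: ip route del 188.72.209.146
```

Printing the commands rather than executing them leaves the decision to the user, since the leftover route is harmless in most setups.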
Quote:
When a tunnel goes down and the next one in line is started only minimal changes are made so no complete reset of routing and firewall to not disturb other traffic on the router.
And a simple proof of this:
You are right: if I delete the route (ip route del 188.72.209.146), everything starts working as it should.
Again, this is not a serious problem.
But I have used OpenVPN before (also switching my servers on and off). Both on old builds (where I used your killswitch and watchdog script) and on new builds (with the built-in killswitch and watchdog), switching between servers never caused such routing problems.
Maybe it makes sense in the future to do a more complete routing reset for WG tunnels (if possible)?
Thanks for your detailed answers, they always help!
Today I imported a 3rd configuration for a WireGuard/Windscribe fail-over group member and noticed that the Local Public Key was blank and not set as it was for the first two imports.
I suspect this is an error.
So, I manually added:
nvram set oet3_public="xxLocalPublicKeyxx"
nvram commit
_________________
Current: Netgear R9000 DD-WRT v3.0-r55363 std (03/13/24)
Retired: Linksys WRT32X r39296, TP-Link Archer C7 v2, LinkSys WRT54G v5
Posted: Thu Apr 13, 2023 6:15 Post subject:
Totally unnecessary and even wrong.
Have a look at the conf file you are uploading; there is no Local Public key in it.
That is because the Local Public key is calculated from the Private Key, which can take a couple of seconds, so it is not always visible the first time you look at the GUI.
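That the conf file carries no Local Public key can be seen by listing the settings in its [Interface] section. A small sketch with a made-up helper name (`interface_settings`) and a made-up sample file; the key values are placeholders, not real keys.

```shell
# List the setting names in the [Interface] section of a WireGuard
# configuration file read from stdin.
# interface_settings is a hypothetical helper name.
interface_settings() {
    awk -F'[ \t]*=[ \t]*' '
        /^\[/ { section = $0; next }
        section == "[Interface]" && NF >= 2 { print $1 }
    '
}

# Made-up sample configuration with placeholder values.
interface_settings <<'EOF'
[Interface]
PrivateKey = <private-key>
Address = 10.0.0.2/32
DNS = 10.0.0.1

[Peer]
PublicKey = <server-public-key>
Endpoint = <server-address>:<port>
AllowedIPs = 0.0.0.0/0
EOF
# prints:
# PrivateKey
# Address
# DNS
```

The only PublicKey in such a file belongs to the peer. Where the `wg` tool is available, the local public key can be derived from the private key by piping the private key into `wg pubkey`.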
Update: Was lucky. It worked because the private keys were the same.