SIP phone extensions over dd-wrt/openvpn revisited

DD-WRT Forum Index -> Advanced Networking
tedm
DD-WRT Guru


Joined: 13 Mar 2009
Posts: 554

Posted: Fri Jul 02, 2021 6:01    Post subject: SIP phone extensions over dd-wrt/openvpn revisited
This all originally started in the thread
https://forum.dd-wrt.com/phpBB2/viewtopic.php?t=329037
but you don't have to read that one.

Instead, here is the setup and problem:

At the main site is the SIP PBX. That site has a Netgear AC1450 running r46974, acting as the OpenVPN server.

At the remote site, 80 miles away, are the SIP phones. That site has a Netgear R6300v2 running r46788, acting as the OpenVPN client. The SIP phones register to the PBX over a site-to-site OpenVPN VPN.

I set this link up a month ago using r46788 on both ends, and it was solid for a week. Then I tried loading a newer firmware revision on the R6300v2, and the SIP phones would NOT remain registered. I tried factory resetting, screwing with OpenVPN parameters, the works. They would NOT remain registered for more than an hour, even with the exact same config on the remote and even with different firmware builds.

After a couple of days of this I reverted the R6300v2 back to r46788. The phones have been rock solid since, even after changing the OpenVPN parameters to use TCP instead of UDP and other experiments.

Updating the firmware on the OpenVPN server end of the link does not seem to affect the phones. But if I try any DD-WRT version newer than r46788 on the remote, I get problems.

I've looked through the SVN changes and I cannot figure out what in the world was changed after 46788 that could possibly have anything to do with this. What is so special about r46788?

It's probably worth mentioning that if I go back to builds older than r46788, the phones seem to work for a while, but if I go many months back they become problematic.


Last edited by tedm on Thu Jul 22, 2021 7:03; edited 1 time in total
egc
DD-WRT Guru


Joined: 18 Mar 2014
Posts: 12835
Location: Netherlands

Posted: Fri Jul 02, 2021 7:07
So 46788 works, and the next builds (e.g. 46816 or 48636) do not.

The commits indeed show nothing about OpenVPN, but a lot about VLANs/ports and SFE/CTF.

Are you using SFE or CTF? If so, could it be related to that?

I have seen strange latency problems with SFE.

A recent OpenVPN change was 46978: https://svn.dd-wrt.com/changeset/46978, an upgrade to OpenVPN 2.5.3, but neither your server nor your client is using that?

tedm
Posted: Sat Jul 03, 2021 23:07
Yes on SFE; no on CTF or FA.

I'm at the main end right now, so it would be fairly easy to try running the latest version with OpenVPN 2.5.3 and see if that screws anything up. But the main end is ONLY acting as an OpenVPN server, not as an Internet router; no traffic other than the VPN traffic goes through it.

The remote end is acting as both a VPN client -and- an Internet router.

Unfortunately, it seems that just when all the OpenVPN black hole nonsense was fixed by the change to the default tun MTU, now it's time for SFE to break things. Sigh.
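Finding where a packet-size black hole starts is a search for the largest packet that still makes it through the tunnel. A minimal sketch using bisection, assuming a hypothetical `probe(size)` callback that sends one packet of that size with fragmentation disallowed and reports whether it arrived, and assuming that once a size fails, every larger size fails too:

```python
from typing import Callable


def largest_passing_size(probe: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the largest size in [lo, hi] for which probe(size) is True.

    Assumes probe is monotonic (if a size gets through, every smaller size
    does too), which is the usual black-hole pattern. Returns lo - 1 if even
    the smallest size fails.
    """
    best = lo - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if probe(mid):
            best = mid      # mid got through; look for something bigger
            lo = mid + 1
        else:
            hi = mid - 1    # mid vanished; the threshold is below it
    return best


# Simulated path that silently drops anything over 1400 bytes.
print(largest_passing_size(lambda size: size <= 1400, 500, 1500))  # 1400
```

On a Linux host, `probe` could wrap iputils `ping -M do -s SIZE` (`-s` sets the ICMP payload, 28 bytes less than the IPv4 packet), though ICMP may not be treated exactly like the UDP traffic that matters here.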
tedm
Posted: Fri Jul 09, 2021 17:54
I loaded build r47000 (6/28/21) on the server end of the VPN a week ago, and the phones behind the VPN have been stable since. I did need to make a few changes to the VPN: both Encryption Algorithm settings were changed from "None" to "not set", since otherwise OpenVPN filled the logs with "link not encrypted" nonsense, and the tunnel protocol was changed from TCP to UDP. The first data cipher is AES-128-GCM.

It would be interesting to know whether AES-128 is faster or slower than CHACHA20-POLY1305 here, as I've figured out that the BCM4708 CPU used in the Netgear AC1450 is just a tad too old to have the special AES instructions, so there's no hardware benefit to running AES.

It also appears that the new OpenVPN version enforces a TLS key renegotiation every hour. So I raised the TLS renegotiation interval with

reneg-sec 28800

since renegotiating the key every hour is too often; each renegotiation is a chance to interrupt a phone call. Besides, I doubt I'll be able to send 256 exabytes of data over this link in less than 8 hours (the point at which a birthday attack would become feasible). /s
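The 256-exabyte figure matches the usual birthday-bound rule of thumb for a 128-bit block cipher such as AES: collisions become likely after roughly 2^64 blocks of 16 bytes, i.e. 2^68 bytes, which is 256 EiB (binary exabytes). A quick arithmetic check (a rule of thumb only, not a formal security bound for GCM):

```python
def birthday_bound_bytes(block_bits: int) -> int:
    """Rule-of-thumb data volume (bytes) after which block collisions become
    likely for a cipher with `block_bits`-bit blocks: about 2**(block_bits/2)
    blocks of block_bits/8 bytes each."""
    return 2 ** (block_bits // 2) * (block_bits // 8)


EIB = 2 ** 60                            # one exbibyte

aes_limit = birthday_bound_bytes(128)    # AES: 128-bit blocks -> 2**68 bytes
print(aes_limit / EIB)                   # 256.0

reneg_sec = 28800                        # the 8-hour reneg-sec value above
needed_rate = aes_limit / reneg_sec      # bytes/s needed to reach it in 8 h
print(f"{needed_rate:.2e} bytes/s")      # 1.02e+16 bytes/s (about 10 PB/s)
```

For comparison, `birthday_bound_bytes(64)` gives only 32 GiB, which is why 64-bit block ciphers such as Blowfish need far more frequent rekeying.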

So it does appear that the issue centers on the Netgear R6300v2 acting as the OpenVPN client, since updating it beyond r46788 triggered the dropping. I have another Netgear router with a faster 1 GHz CPU, and I'm thinking it might be worth testing that at the remote site to see whether it would let me run current builds.
tedm
Posted: Sat Jul 10, 2021 8:25
Hmmm... setting the key renegotiation period to 8 hours was not good. The phones started losing their registrations, and the OpenVPN log on the server side started showing "bad source address from client [172.16.1.16] packet dropped" error messages from the PBX.
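In OpenVPN server mode, this message generally means the server received a packet from a client whose source address is neither that client's tunnel address nor inside one of the networks assigned to it with `iroute`, so the packet is dropped. A minimal sketch of that check, useful for reasoning about which source addresses will be accepted (the tunnel address and network below are hypothetical):

```python
import ipaddress


def source_allowed(src: str, client_tunnel_ip: str, iroutes: list[str]) -> bool:
    """Model of the server-side source check: accept a packet from a client
    only if its source is the client's tunnel IP or inside one of its iroutes."""
    addr = ipaddress.ip_address(src)
    if addr == ipaddress.ip_address(client_tunnel_ip):
        return True
    return any(addr in ipaddress.ip_network(net) for net in iroutes)


# Hypothetical values: client tunnel IP 10.8.0.6, phone LAN 172.16.100.0/24.
print(source_allowed("172.16.100.122", "10.8.0.6", ["172.16.100.0/24"]))  # True
print(source_allowed("172.16.1.16", "10.8.0.6", ["172.16.100.0/24"]))     # False
```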

So now I think I finally have a working theory of what is actually going on (at least it's my current working theory; it will do until something better comes along).

When the phones register, they initiate the registration from the client side of the VPN. That causes the client OpenVPN to send a host route to the OpenVPN server, which then adds it with the message:

house/1.2.3.4:3xxx10 MULTI: Learn: 172.16.100.122 -> house/1.2.3.4:3xxx10

It does that for any device that sends a packet through the VPN from the client side. That creates the "back channel" for packets going from the PBX back to the phones.

But for reasons unknown, this entry fades out over time. So the PBX sends SIP keepalives back to the extensions, and they get dropped by OpenVPN.

The TLS renegotiation resets all of these entries, keeping everything fresh.

What I can't explain is why it only seems to affect SIP packets. If I ping the phones or telnet into them from the network behind the server, it works.
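To make the theory concrete, here is a toy model of it (not OpenVPN's actual code; the ageing time is a hypothetical parameter, and whether OpenVPN really ages these entries is exactly what is unproven here). In this simple form the model would also drop pings sent from the server side, which doesn't match what was observed, so it captures the theory rather than proving it:

```python
from dataclasses import dataclass, field


@dataclass
class LearnedRoutes:
    """Toy model of the theory: the server learns a host route when a packet
    arrives *from* a host behind the client, and forgets it if nothing from
    that host refreshes it within ttl_s seconds (ttl_s is hypothetical)."""
    ttl_s: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def learn(self, host: str, now: float) -> None:
        # e.g. a phone's REGISTER passing through the tunnel
        self.last_seen[host] = now

    def can_reach(self, host: str, now: float) -> bool:
        # e.g. a PBX keepalive heading back to the phone: it is only
        # delivered while the learned entry is still fresh
        seen = self.last_seen.get(host)
        return seen is not None and now - seen <= self.ttl_s


table = LearnedRoutes(ttl_s=300)                   # hypothetical 5-minute ageing
table.learn("172.16.100.122", now=0)               # phone registers at t=0
print(table.can_reach("172.16.100.122", now=60))   # True
print(table.can_reach("172.16.100.122", now=600))  # False: keepalive dropped
```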
tedm
Posted: Tue Jul 13, 2021 14:08
Further info on this, as the phones started losing registration again. I reverted the server end back to r46974, but I don't think that is the key.

I think there are several things that have to be set up perfectly for the phone registrations to stay up:

1) If the phone uses SIP-over-UDP, then the VPN must use TCP transport. If the phone uses SIP-over-TCP, the VPN can use either UDP or TCP transport. Some phones (like the Cisco 7940) can only do SIP-over-UDP, so if you are using those you have to run OpenVPN over TCP transport.

This is because the underlying public Internet has too much loss for plain SIP over UDP. Loss on the Internet is not constant; some days it's higher than others. And it is easy to miss: you can have maybe 10-20% loss on UDP, yet ping (which uses ICMP) may show nothing, and an app like FTP (which uses TCP) will also show nothing, because TCP retransmits whatever is lost. As far as I can tell, public routers prioritize ICMP over TCP and TCP over UDP, although that is a matter of router configuration rather than anything the TCP/IP standards require. SIP over UDP is not tolerant of loss, at least not very much.

It was maddening to figure this out because some days there IS no loss on a UDP path. If you configure the VPN on one of those days, SIP-over-UDP works perfectly over OpenVPN-over-UDP.

2) If the phone uses UDP for SIP, there must be no black holes at any packet size, because UDP has no means of path MTU discovery.

This was covered extensively in the earlier thread referenced at the beginning of this one.

3) The OpenVPN server installs individual host routes for hosts on the other end of the link. It does this when it first gets a packet from one of those hosts via the OpenVPN client. However, the host routes time out and are expired on the server end if there are no TCP connections through them. So if the PBX sends a UDP keepalive to a phone on the client end and the host route is gone, the OpenVPN server and client take time sorting it out and re-establishing the route. That is OK for SIP over TCP, because the TCP stack on the PBX will just retransmit. But with UDP, the keepalive packet is lost and the phone registration can fail.

The fix is to give any phone using SIP-over-UDP a static address at the client end, then install a host route (mask 255.255.255.255) for that address in the OpenVPN server configuration.

4) For SIP-over-UDP, registration expiration timers appear to need to be set well below the default of 3600 seconds. As near as I can tell, this is because DD-WRT's Linux-level IP Filter settings seem to have lower UDP timeouts, and there seems to be an interaction with OpenVPN even though we are not running NAT in this configuration. Note that I am not 100% sure about this and will be doing further experimentation on it.
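The host routes in item 3 take the same form for every phone, so they are easy to generate. A minimal sketch: the `route` lines match the server-side host routes described in item 3; the matching `iroute` lines for the client's client-config-dir file are an assumption based on OpenVPN's usual site-to-site setup, where `iroute` tells the server which client owns an address. The phone addresses are hypothetical:

```python
import ipaddress


def phone_route_lines(phone_ips: list[str]) -> tuple[list[str], list[str]]:
    """Build /32 host-route directives for statically numbered phones.

    Returns (server_lines, ccd_lines): `route` lines for the server config
    and matching `iroute` lines for the client's client-config-dir file.
    """
    server_lines, ccd_lines = [], []
    for ip in phone_ips:
        addr = ipaddress.IPv4Address(ip)  # rejects typos like "172.16.100.300"
        server_lines.append(f"route {addr} 255.255.255.255")
        ccd_lines.append(f"iroute {addr} 255.255.255.255")
    return server_lines, ccd_lines


server, ccd = phone_route_lines(["172.16.100.122", "172.16.100.123"])
print("\n".join(server))
print("\n".join(ccd))
```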

To do: more testing with newer DD-WRT/OpenVPN versions on the server side, and more testing with UDP SIP timers.
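For the UDP SIP timer testing, one way to frame the suspected interaction in item 4 is that each refresh of a phone's UDP flow has to arrive before the router's UDP connection-tracking entry for that flow times out. A minimal sketch, assuming a Linux-style conntrack table (DD-WRT's IP Filter settings presumably control the same timeouts, but that mapping is an assumption) and assuming, hypothetically, that a phone re-registers after `refresh_fraction` of its expiry time; real phones vary:

```python
from pathlib import Path

# Standard Linux location of the conntrack UDP timeouts; whether a given
# DD-WRT build exposes them here is an assumption.
CONNTRACK_DIR = Path("/proc/sys/net/netfilter")


def read_udp_timeouts(base: Path = CONNTRACK_DIR) -> dict[str, int]:
    """Return whichever UDP conntrack timeouts (in seconds) exist under base."""
    found = {}
    for name in ("nf_conntrack_udp_timeout", "nf_conntrack_udp_timeout_stream"):
        path = base / name
        if path.exists():
            found[name] = int(path.read_text().strip())
    return found


def max_safe_expiry(udp_timeout_s: int, refresh_fraction: float = 0.5,
                    margin_s: int = 5) -> int:
    """Largest SIP registration expiry (s) whose refresh still arrives
    margin_s seconds before the UDP timeout, assuming the phone re-registers
    after expiry * refresh_fraction seconds (hypothetical phone behaviour)."""
    if udp_timeout_s <= margin_s:
        raise ValueError("UDP timeout too short for any safe expiry")
    return int((udp_timeout_s - margin_s) / refresh_fraction)


print(read_udp_timeouts())   # empty dict if the files don't exist
print(max_safe_expiry(180))  # 350
```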
tedm
Posted: Thu Jul 29, 2021 5:42
I thought I would post a follow-up on this.

I'm now running 47040 on the AC1450 on the server end, and am still running 46788 on the R6300v2 router at the client end that's also serving as the main router for that site.

I enabled CTF instead of SFE on both the server and the client, and on the client end I also turned on FA (flow acceleration). The phone registrations have been very stable.

I discovered that if the client-end router was heavily used, the phones would drop registration. Turning on FA and CTF fixed that.
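Since item 1 of the earlier post argues that ping can hide UDP loss, one way to check whether load on the client router is dropping UDP is to measure UDP loss directly. A minimal sketch, assuming a UDP echo responder (hypothetical; any service that sends each datagram straight back will do) on a host at the other end of the tunnel:

```python
import socket
import time


def measure_udp_loss(host: str, port: int, count: int = 100,
                     timeout_s: float = 0.5, interval_s: float = 0.02) -> float:
    """Send `count` numbered UDP probes to an echo responder and return the
    fraction that never came back (0.0 = no loss, 1.0 = total loss)."""
    echoed = set()

    def record(data: bytes) -> None:
        seq = int.from_bytes(data[:4], "big")
        if seq < count:          # ignore stray datagrams
            echoed.add(seq)

    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        for seq in range(count):
            sock.sendto(seq.to_bytes(4, "big"), (host, port))
            try:
                record(sock.recvfrom(64)[0])
            except socket.timeout:
                pass             # lost, or late (the drain below catches late ones)
            time.sleep(interval_s)
        try:                     # drain echoes that arrived after their timeout
            while True:
                record(sock.recvfrom(64)[0])
        except socket.timeout:
            pass
    return 1.0 - len(echoed) / count
```

Running it once while the client router is idle and again while it is busy would show whether load alone is costing UDP packets.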
tedm
Posted: Sun Aug 29, 2021 19:40
Further update on this:

Updated the server end of the OpenVPN tunnel to r47117 a month ago, and the SIP registrations on the phones have remained completely stable.

Updated the remote client end to r47206 a week ago, and the SIP registrations on the phones have continued to remain stable.

I suppose it's time for a wiki article. Fundamentally, SIP is timing-sensitive at the millisecond level, and to keep packet delays low enough that they won't interfere with it, you need a fast CPU in the router.
kyanox
DD-WRT Novice


Joined: 17 Sep 2006
Posts: 30

Posted: Mon Sep 20, 2021 16:33
One thing I recommend, if you can: switch from OpenVPN to SoftEther. WAY better. I had tons of issues with OpenVPN, and SoftEther basically just works.
tedm
Posted: Tue Oct 19, 2021 6:05
Now on r47528 on the OpenVPN server side, and the phones are still stable.
All times are GMT