OpenVPN defect, bug, on MTU handling - you decide

egc
DD-WRT Guru


Joined: 18 Mar 2014
Posts: 12834
Location: Netherlands

Posted: Mon May 10, 2021 6:59
The kernel is tied to the Broadcom platform because their drivers are closed source, so there is no K4.9 for Broadcom; most other third-party Broadcom firmwares are still using K2.6.

Atheros/Qualcomm is open source, and DD-WRT is currently on K4.9 for Qualcomm/Atheros (OpenWRT is already using K5.4 for Qualcomm/Atheros).

So the testbed I am now setting up consists of two Qualcomm/Atheros routers: the venerable Netgear R7800 (dual core Arm A15 1725 MHz with two little cores, which runs circles around an R7000) and its little brother, the Linksys EA8500.
Both are running build 46446, which uses K4.9; I will upgrade later.

I am already in contact with BS about this; he is also concerned and is considering lowering the MTU from its default of 1500 for the client and server setup.

WireGuard is much easier in this respect: they tell you to take the router's MTU and subtract 28 for UDP and IPv4 and 32 for the WG encryption, so the WG MTU is 1440 (assuming an MTU of 1500 and IPv4 only).
But of course that is possible because it has fixed encryption (always Chacha-Poly) and is UDP only.
(Of course that does not exclude all MTU problems if something is borked along the way, and I have seen that.)
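
To make the WireGuard arithmetic concrete, here is a minimal Python sketch. The names are illustrative only, and the IPv6 figure (a 40-byte header instead of 20) is an assumption added for comparison; the post itself only covers IPv4.

```python
# Overhead figures: 20-byte IPv4 header + 8-byte UDP header = 28,
# plus 32 bytes for the WireGuard encapsulation (as quoted in the post).
IPV4_HEADER = 20
IPV6_HEADER = 40   # assumption added for comparison, not from the post
UDP_HEADER = 8
WG_OVERHEAD = 32


def wireguard_mtu(link_mtu: int = 1500, ipv6: bool = False) -> int:
    """Return the largest tunnel MTU that still fits inside link_mtu."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return link_mtu - ip_header - UDP_HEADER - WG_OVERHEAD


print(wireguard_mtu(1500))             # 1440
print(wireguard_mtu(1500, ipv6=True))  # 1420
```

The same function can be fed a smaller WAN MTU (for example 1492 on PPPoE) to see how the tunnel MTU shrinks with it.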

_________________
Routers:Netgear R7000, R6400v1, R6400v2, EA6900 (XvortexCFE), E2000, E1200v1, WRT54GS v1.
Install guide R6400v2, R6700v3,XR300:https://forum.dd-wrt.com/phpBB2/viewtopic.php?t=316399
Install guide R7800/XR500: https://forum.dd-wrt.com/phpBB2/viewtopic.php?t=320614
Forum Guide Lines (important read):https://forum.dd-wrt.com/phpBB2/viewtopic.php?t=324087
tedm
DD-WRT Guru


Joined: 13 Mar 2009
Posts: 554

Posted: Mon May 10, 2021 9:01
OK here are the results of my tun mtu testing. This is actually on K4.4 on both sides of the link.

I'm setting the tun size on both sides to the same value

For each value I set it to, the maximum size of packet appears to be n-28 when travelling from the client network to the server network up to a max tun mtu size of 1447.

So if I set tun = 1447 then the max packet I can get through the link is 1419

If I set tun = 1446 then max packet is 1418, tun=1445 max packet is 1417, and so on

SOMETIMES I will get a message back saying "Frag needed and DF set" for the packet that is the max size for the link, but if I then send that size packet again, it will pass and I won't get that message. For any packet I send above the max size, a message comes back saying it's too large.

If I set tun=1448, then I start getting a black hole all the way up to a packet size of 1500, and then at 1501 and above it will respond back with a packet too large.

From a machine on the server network sending to the client network, the max packet size is n-20 where n is the tun size. What is interesting though is that for any packet above that I'll get a message back saying "frag needed and DF set" all the way up to a packet size of 1472. For packets 1473 and larger I just get a message back saying "Message too long". This is sending from a FreeBSD machine which has a much better ping - I can sweep sizes with it, for example:

ping -G 1500 -g 1350 -D 172.16.100.2

that sends a ping and then increments the size by one and sends another up to the max size you want to check without the messiness of a do loop.
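
For anyone without FreeBSD's sweep flags, the same idea can be sketched in Python. This is only a rough sketch under stated assumptions: it assumes the Linux iputils ping (where -M do sets the DF bit and -s is the ICMP payload size), which is not the BusyBox ping found on many routers, and the function names are made up for illustration. The probe is passed in as a parameter so the sweep logic can be tried without touching the network.

```python
import subprocess
from typing import Callable, Optional


def ping_df(host: str, payload: int) -> bool:
    """Send one ping with DF set; return True if a reply came back.

    Assumes Linux iputils ping: -M do sets DF, -s is the ICMP payload size,
    -c 1 sends a single probe, -W 2 waits up to two seconds for the reply.
    """
    result = subprocess.run(
        ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(payload), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def sweep(
    host: str,
    low: int,
    high: int,
    probe: Callable[[str, int], bool] = ping_df,
) -> Optional[int]:
    """Probe every payload size from low to high, one byte at a time.

    Returns the largest payload that got a reply, or None if none did.
    """
    largest = None
    for size in range(low, high + 1):
        if probe(host, size):
            largest = size
    return largest


# Equivalent of the FreeBSD command above (needs a reachable host):
#   sweep("172.16.100.2", 1350, 1500)
# The on-the-wire IP packet size is the returned payload plus 28 bytes.
```

A plain linear sweep is used on purpose: with intermittent replies and black holes like the ones described above, a binary search could easily be misled by a single odd result.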

ALSO:

I went ahead and ran some MTU tests from my subnet (on Comcast) to my subnet (on Spectrum). The max ping size on that path is 1472; above that I get an error message from the Linux kernel that the packet is too large and the max MTU is 1500. Going the other direction I get the same message, that is, it is symmetrical. So, I am getting a full 1500 MTU over the Internet (1472 plus 20 bytes for the IP header and 8 bytes for the ICMP header).

Here's a partial trace of that path so you can see how many Internet routers it is going through (and all of them are passing the 1500 byte packet without trouble)

2 10.102.46.1 (10.102.46.1) 9.863 ms 9.238 ms 15.283 ms
3 096-034-059-006.biz.spectrum.com (96.34.59.6) 9.352 ms 12.596 ms 13.913 ms
4 dtr01lnbhwa-tge-0-0-0-0.lnbh.wa.charter.com (96.34.111.208) 14.642 ms 12.245 ms 16.457 ms
5 096-034-108-175.biz.spectrum.com (96.34.108.175) 22.682 ms 17.680 ms 24.319 ms
6 096-034-108-160.biz.spectrum.com (96.34.108.160) 31.368 ms 19.717 ms 28.289 ms
7 bbr02sttlwa-bue-4.sttl.wa.charter.com (96.34.0.56) 29.757 ms 38.428 ms 32.213 ms
8 prr01sttlwa-bue-2.sttl.wa.charter.com (96.34.3.39) 27.559 ms 27.438 ms 31.503 ms
9 be-204-pe02.seattle.wa.ibone.comcast.net (23.30.207.69) 25.637 ms 30.784 ms 25.898 ms
10 be-10847-cr01.seattle.wa.ibone.comcast.net (68.86.86.225) 26.738 ms 27.893 ms 34.039 ms
11 ae-72-ar01.troutdale.or.bverton.comcast.net (68.86.92.218) 30.856 ms 29.636 ms 30.828 ms
12 ae-1-rur202.troutdale.or.bverton.comcast.net (68.87.222.250) 35.167 ms 31.135 ms 30.733 ms
13 po-2-1-cbr05.troutdale.or.bverton.comcast.net (162.151.214.122) 30.165 ms 29.756 ms 29.872 ms
14 c-71-238-89-173.hsd1.or.comcast.net (71.238.89.173) 41.514 ms 44.128 ms 50.921 ms
15 *^C
tedm
DD-WRT Guru


Joined: 13 Mar 2009
Posts: 554

Posted: Mon May 10, 2021 10:04
Well, it so happens that I had a D-Link DIR 825 rev B router here. I just finished flashing it with the latest r46604 code from yesterday and guess what - the bug is still present. And that is an Atheros AR7161 with K3.10 kernel.

And, one last test - I put the original K2.6 WNDR4000 back and ran ping tests back and forth over the VPN. From a host on the main network pinging a host on the remote network I can get a MTU of 1480 before it starts rejecting the packet as too large, from a host on the remote network pinging a host on the main network I can get a max MTU of 1472 that direction.

Let me know if you want me to test an older openVPN version on the K3 kernel.
egc
DD-WRT Guru


Joined: 18 Mar 2014
Posts: 12834
Location: Netherlands

Posted: Mon May 10, 2021 12:39
Fired up R7800 with K4.9

First I connected to KeepSolid VPN just to test the maximum unfragmented ping.

The maximum ping packet size is 1342,
and as with K4.4 there are no messages (packet size 1473 got the right message, as my router's MTU is 1500).
So for this connection to this VPN provider a tun-mtu of 1370 was working: a packet size of 1342 got through, and at 1343 I got the message.

Bottom line: K4.9 is behaving just as badly as 3.10 and 4.4; only K2.6 seems to behave better according to your testing.

The "standard" tests (speedtest, packet loss test, ICMP blackhole test, watching streaming media) did not show any difference between a tun-mtu of 1500 and 1370.

My VPN provider also has WireGuard, so I fired up WireGuard to the same provider; the servers are located at the same spot in England, so the path is the same (I checked).
The WG tun MTU is 1440; packets of 1412 get through and 1413 gets the message, so all fine and dandy.

But what if I set the WG tun MTU to 1500?
Guess what: exactly the same. 1412 is OK and 1413 gets the fragmentation message, so no blackhole here.
So it seems it is not something in the kernel, as WG seems to deal with it nicely.

Will test K4.9 OpenVPN client/server later.

Still not sure what would be the best value to use as OpenVPN default, your proposed value of 1430 might even be too high?

egc
DD-WRT Guru


Joined: 18 Mar 2014
Posts: 12834
Location: Netherlands

Posted: Mon May 10, 2021 12:46
tedm wrote:
Well, it so happens that I had a D-Link DIR 825 rev B router here. I just finished flashing it with the latest r46604 code from yesterday and guess what - the bug is still present. And that is an Atheros AR7161 with K3.10 kernel.

And, one last test - I put the original K2.6 WNDR4000 back and ran ping tests back and forth over the VPN. From a host on the main network pinging a host on the remote network I can get a MTU of 1480 before it starts rejecting the packet as too large, from a host on the remote network pinging a host on the main network I can get a max MTU of 1472 that direction.

Let me know if you want me to test an older openVPN version on the K3 kernel.


If you can spare the time you could test OpenVPN version 2.4

We moved to 2.5 in build 44553 so if you use a build before that it should have OpenVPN 2.4

tedm
DD-WRT Guru


Joined: 13 Mar 2009
Posts: 554

Posted: Tue May 11, 2021 6:26
Results of some more testing:

MTUs above 1430 on the tun interface cause the VoIP extensions at the remote site to drop off, even though they seem to permit larger packets through (up to an MTU of 1447) in the ping tests.

I also switched the routers at both ends to a Netgear R6300V2 at the remote and a Netgear AC1450 at the main end. Faster dual core CPUs and more ram. And I did this testing on version r46604 5/9/21

And lastly - I am now not entirely sure that the K2.6 kernel does not have an issue. While it is true that replacing the OpenVPN SERVER router with a K2.6 version does allow MTU's up to 1472 AND does not introduce any black holes in MTU's above that up to 1500 MTU, I tried running this version last night without tun restrictions and the phones started dropping registration after an hour or two. I am now leaning towards the version of OpenVPN being the culprit and I'm going to test an older version now.

Here is what I THINK is a better outline of the bug. I am starting to believe that the phones (two Polycom VVX models and a bunch of Cisco 1790's running Cisco's SIP code) are completely ignoring ICMP PTB (Packet Too Big) and Fragmentation Needed error messages. Instead, they are assuming any routers with a restricted MTU are going to fragment and forward packets they send.

So what happens is that when OpenVPN starts it has to look at the MTU of its outbound interfaces and calculate an actual, real tun MTU. That is of course going to be smaller than 1500 because it needs to make room for the OpenVPN encapsulating packet. OpenVPN is screwing this up and setting too large a tun MTU, and when a big packet comes in it fragments it and then just truncates it when it tries putting the frags into the VPN. Since it thinks it successfully sent the packet it does not report back a PTB message, which is why we are getting black holes with large packet sizes. When the MTU on tun is set at 1430 or below OpenVPN is still doing this, but the frags are small enough to not be truncated, and OpenVPN is also sending back PTB messages when a packet comes in that's in that 1400-1500 region. If tun is set in between 1430 and 1447 OpenVPN still sends back the PTB messages but it also truncates the packet.

This is why the ping tests worked on K2.6 and on the tun mtu set to 1447, since there is no error checking on ping and when you tell ping to make the packet larger, all it does is tack on zeros and nothing actually checks the padding at the remote end to see if the echo reply packet was truncated. So the PTB responses when the DF flag is set really do not mean anything, unfortunately, since they are coming from OpenVPN and its borked idea of an oversize packet.

With the phone UDP packets if the received UDP packets or fragments are truncated, the checksum does not match and the PBX in this case is just assuming the packets are errors and ignoring them.
tedm
DD-WRT Guru


Joined: 13 Mar 2009
Posts: 554

Posted: Tue May 11, 2021 8:52
Results on testing OpenVPN 2.4

Main site: Firmware: DD-WRT v3.0-r44483 mega (10/02/20) Netgear WNDR4000

Client site: Firmware: DD-WRT v3.0-r46604 std (05/09/21)

Something is seriously wrong with this combo. I tried every combination of ciphers, tun MTU's, etc. While the VPN would come up I could not get any packets larger than 80 bytes (yes, eighty) through the link and everything above 80 bytes was black holed.
SurprisedItWorks
DD-WRT Guru


Joined: 04 Aug 2018
Posts: 1444
Location: Appalachian mountains, USA

Posted: Tue May 11, 2021 17:14
You folks are doing nicely with the testing approach, so to complement that, here is a backgrounder based on a day with the OpenVPN 2.5 man page (https://build.openvpn.net/man/openvpn-2.5/openvpn.8.html) and numbers from my OpenVPN client log in dd-wrt. This approach does yield the fastest speed tests through the OpenVPN client that I've seen here. FWIW, I'm running a Linksys WRT1900ACSv2 with build 46069. Here goes.

The OpenVPN client using the UDP protocol has two absolutely key packet-size numbers and two others that are also important. The man page explains that tun-mtu is the maximum packet size at the point where your application's packets enter the OpenVPN tunnel to be encrypted and further packaged into UDP packets to send to the server. We set this number in the GUI as just MTU, and we can do ifconfig tun1 (or whatever your tunnel interface is called) in the CLI to see that number installed into the running system.

There is another, larger number, link-mtu, that is like MTU but also includes the encryption overhead. The man page makes clear it does NOT include the overhead for the final outer packaging into UDP packets. That final step will add 8 bytes for the UDP header and, assuming IPv4 transmission, 20 additional bytes for the IP header. This final packet needs to fit in your WAN channel, which in the standard ethernet case handles packets of up to 1500 bytes. I'll assume that 1500 number below. Adapt if yours is different.

In principle you can size your packets correctly in two very different ways. The first is to set MTU to 1500 as the man page recommends and as dd-wrt defaults, even though this obviously results in an overlarge packet on the WAN if nothing further is done. The idea is to then have OpenVPN fragment application UDP packets correctly to a smaller size using as guidance the OpenVPN fragment parameter set by the GUI number for "Tunnel UDP Fragment." The man page states that this fragment number should include encryption overhead but not the final overhead for packaging into UDP packets for transmission. In other words, this number should be set to what would be the correct link-mtu value, here 1500 - 28 = 1472.

The problem with using this approach in dd-wrt is this in the OpenVPN client log:

Fails: FRAG_IN error flags=0xfa2a187b: FRAG_TEST not implemented

Many dd-wrt users then disable the fragmentation option by not providing a value, but the underlying Linux IP stack is only going to fragment based on the specified MTU of 1500, which is simply wrong. You can hope your applications using UDP are conservative in packet sizing so that few packets will be lost due to being overlarge, but this is not an ideal solution.

The alternative approach is to set MTU correctly in the first place. Set an above-default level of logging by including verb 4 (or higher) in your Additional Config, so that when you start up the OpenVPN client (successfully) with a default MTU of 1500, you can look in the log a ways AFTER the "Peer Connection Initiated with..." line and find another line like

Data Channel MTU parms [ L:XXX D:YYY ... ]

The first number, XXX, is link-mtu as calculated internally from the basic tun-mtu number. If XXX is 1553 and MTU is 1500, this is telling you that 53 bytes are required for encryption overhead. This overhead number depends on what encryption cipher is used. I calculate 53 bytes for AES-256-GCM but only 38 bytes for CHACHA20-POLY1305. Once you have this difference number, subtract it from 1472 (itself 1500 - 28) to obtain the MTU setting you should actually use, the one that will result in the desired link-mtu = 1472. For my CHACHA20-POLY1305 setup to AirVPN, I set my MTU in the GUI to 1500 - 28 - 38 = 1434 bytes. With that MTU setting, the log shows XXX = 1472.
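
The calculation above can be sketched in a few lines of Python. Treat this as an illustration only: the function names are made up, the sample log line is a hypothetical one built from the 1553 example (real lines carry more fields), and the 1500-byte WAN MTU is the same assumption used throughout this post.

```python
import re

WAN_MTU = 1500        # standard Ethernet, as assumed in the post
UDP_IP_HEADERS = 28   # 20-byte IPv4 header + 8-byte UDP header

# Matches the start of a "Data Channel MTU parms [ L:XXX D:YYY ... ]" entry.
_MTU_PARMS = re.compile(r"Data Channel MTU parms \[ L:(\d+) D:(\d+)")


def encryption_overhead(log_line: str, tun_mtu: int) -> int:
    """Return link-mtu (the L: value) minus the tun-mtu in effect."""
    match = _MTU_PARMS.search(log_line)
    if match is None:
        raise ValueError("no 'Data Channel MTU parms' entry in this line")
    return int(match.group(1)) - tun_mtu


def recommended_tun_mtu(overhead: int, wan_mtu: int = WAN_MTU) -> int:
    """Return the tun-mtu that makes link-mtu come out at wan_mtu - 28."""
    return wan_mtu - UDP_IP_HEADERS - overhead


# Hypothetical sample line; the D: value here is a placeholder.
line = "Data Channel MTU parms [ L:1553 D:1450 ]"
overhead = encryption_overhead(line, tun_mtu=1500)
print(overhead, recommended_tun_mtu(overhead))  # 53 1419
print(recommended_tun_mtu(38))                  # 1434
```

The second print reproduces the 1434 figure for the 38-byte CHACHA20-POLY1305 overhead.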

I do see residual log issues with this approach. There are warnings that tun-mtu and link-mtu are used inconsistently between server and client. I believe the peers negotiate the packet sizes actually used such that both of their limits are satisfied, so this is not critical. Remember M in MTU stands for "Maximum."

More mysterious is that I also see a few messages, shortly after connection, like "tun packet too large on write (tried=1436,max=1434)." Those messages do not change at all if I lower mssfix (see below), and if I lower MTU, the messages appear with new numbers, the second one being the new MTU value. The gap may change from 2 bytes. I have no idea where these messages are coming from. Could be some application specific to my setup here. I'll come back and redo this paragraph in blue if I get it figured out.

In addition to XXX, the log line above also shows a YYY number. This is the current mssfix value. You can choose YYY and set it in Additional Config with mssfix YYY. This value is communicated upstream to applications to advise them to limit TCP packet sizes to what will survive encapsulation and transmission. If you leave MTU or the mssfix YYY value too large, you may obtain a connection that slows severely or, more likely, simply hangs now and then.

To me the man page appears ambiguous on the precise interpretation of the mssfix value. It may be that in mssfix YYY, the YYY value should be identical to a properly computed link-mtu value, in our example 1500 - 28 = 1472. Or it may be that YYY should be tun-mtu less 28 bytes for UDP/IP overhead of the application packet. For my CHACHA20-POLY1305 example tun-mtu value of 1434, this YYY value would be 1406. I think the man-page wording leans towards the first interpretation, but my ping tests lean towards the latter one, so I am using 1406.
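
The two readings of mssfix give different numbers, which a short Python sketch makes explicit. The function names are invented for illustration, and the rendered text simply reuses the verb and mssfix directives mentioned above for Additional Config; nothing here settles which reading is the right one.

```python
UDP_IP_HEADERS = 28  # 20-byte IPv4 header + 8-byte UDP header


def mssfix_candidates(tun_mtu: int, wan_mtu: int = 1500) -> dict:
    """Return the mssfix value under each of the two interpretations."""
    return {
        "link-mtu reading": wan_mtu - UDP_IP_HEADERS,
        "tun-mtu reading": tun_mtu - UDP_IP_HEADERS,
    }


def additional_config(mssfix: int, verb: int = 4) -> str:
    """Render the lines to paste into Additional Config."""
    return f"verb {verb}\nmssfix {mssfix}\n"


candidates = mssfix_candidates(1434)
print(candidates)  # {'link-mtu reading': 1472, 'tun-mtu reading': 1406}
print(additional_config(candidates["tun-mtu reading"]), end="")
```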

Thank you for your patience with this digression.

_________________
2x Netgear XR500 and 3x Linksys WRT1900ACSv2 on 53544: VLANs, VAPs, NAS, station mode, OpenVPN client (AirVPN), wireguard server (AirVPN port forward) and clients (AzireVPN, AirVPN, private), 3 DNSCrypt providers via VPN.


Last edited by SurprisedItWorks on Tue May 11, 2021 18:26; edited 2 times in total
tedm
DD-WRT Guru


Joined: 13 Mar 2009
Posts: 554

Posted: Tue May 11, 2021 17:22
Results of more testing

I put a D-Link 825 version B in service overnight as the OpenVPN server: same version as the Netgear (r46604), same config with the restricted MTU, but a slower CPU and less RAM. Everything came up fine, but over the next 7 hours most of the phone extensions gradually lost registration. This morning only the remote Polycom was still connected; the Ciscos were not.

I switched back to the Netgear AC1450 (did not reboot the remote end) and all phones immediately registered and came up. I'm going to give it another long block of time to see what they do.

It looks like the VoIP data is sensitive to CPU power on the routers as well. Perhaps the 825 was too slow and dropping packets?
tedm
DD-WRT Guru


Joined: 13 Mar 2009
Posts: 554

Posted: Tue May 11, 2021 17:45
SurprisedItWorks wrote:
[...] default MTU of 1500, look in the log AFTER the "Peer Connection Initiated with..." line for a line like

Data Channel MTU parms [ L:XXX D:YYY ... ]

The first, XXX number is link-mtu as calculated internally from the basic tun-mtu number. If XXX is 1553 and MTU is 1500, this is telling you that 53 bytes are required for encryption overhead.


The problem I have with the r46604 code is in the log I don't get the [ L:XXX D:YYY ... ] printout. Here's what I get:

20210511 10:11:05 I [server] Peer Connection Initiated with [AF_INET]50.18.10.13:1194
20210511 10:11:05 PUSH: Received control message: 'PUSH_REPLY route 172.16.254.0 255.255.255.0 172.16.20.1 1 route 172.16.1.0 255.255.255.0 172.16.20.1 1 route 192.168.1.0 255.255.255.0 172.16.20.1 1 route-gateway 172.16.20.1 topology subnet ping 10 ping-restart 120 ifconfig 172.16.20.2 255.255.255.0 peer-id 0 cipher AES-128-GCM'
20210511 10:11:05 OPTIONS IMPORT: timers and/or timeouts modified
20210511 10:11:05 OPTIONS IMPORT: --ifconfig/up options modified
20210511 10:11:05 NOTE: --mute triggered...
20210511 10:11:05 5 variation(s) on previous 3 message(s) suppressed by --mute
20210511 10:11:05 Outgoing Data Channel: Cipher 'AES-128-GCM' initialized with 128 bit key
20210511 10:11:05 Incoming Data Channel: Cipher 'AES-128-GCM' initialized with 128 bit key
20210511 10:11:05 net_route_v4_best_gw query: dst 0.0.0.0
20210511 10:11:05 net_route_v4_best_gw result: via 68.15.1.17 dev vlan2
20210511 10:11:05 I TUN/TAP device tun1 opened
20210511 10:11:05 I net_iface_mtu_set: mtu 1430 for tun1
20210511 10:11:05 I net_iface_up: set tun1 up
20210511 10:11:05 I net_addr_v4_add: 172.16.20.2/24 dev tun1
20210511 10:11:05 net_route_v4_add: 172.16.254.0/24 via 172.16.20.1 dev [NULL] table 0 metric 1
20210511 10:11:05 net_route_v4_add: 172.16.1.0/24 via 172.16.20.1 dev [NULL] table 0 metric 1
20210511 10:11:05 net_route_v4_add: 192.168.1.0/24 via 172.16.20.1 dev [NULL] table 0 metric 1
20210511 10:11:05 I Initialization Sequence Completed
20210511 10:13:10 MANAGEMENT: Client connected from [AF_INET]127.0.0.1:16
20210511 10:13:10 D MANAGEMENT: CMD 'state'
20210511 10:13:10 MANAGEMENT: Client disconnected
20210511 10:13:10 MANAGEMENT: Client connected from [AF_INET]127.0.0.1:16
20210511 10:13:10 D MANAGEMENT: CMD 'state'
20210511 10:13:10 MANAGEMENT: Client disconnected
20210511 10:13:10 MANAGEMENT: Client connected from [AF_INET]127.0.0.1:16
20210511 10:13:10 D MANAGEMENT: CMD 'state'
(the disconnected/connected lines just repeat over and over that is normal I think)

I'll try later with MTU 1500 and see if that causes the line to spit out since your explanation would indicate the MTU of 1430 is still too high for the cipher I'm using.
SurprisedItWorks
DD-WRT Guru


Joined: 04 Aug 2018
Posts: 1444
Location: Appalachian mountains, USA

Posted: Tue May 11, 2021 18:08
I forgot to mention that I use verb 4 in Additional Config to get more log detail! I'll fix the post shortly.
tedm
DD-WRT Guru


Joined: 13 Mar 2009
Posts: 554

Posted: Tue May 11, 2021 19:34
I think you may have hit it. Barely an hour after switching back to the AC1450 router my extensions were dropping off. I then set tun MTU to 1419 a couple hours ago and they have remained stable since.

I just set verb 4 on both sides and this is what I get on the server

20210511 12:01:19 W 68.185.12.178:33025 WARNING: normally if you use --mssfix and/or --fragment you should also set --tun-mtu 1500 (currently it is 1419)
20210511 12:01:19 68.185.12.178:33025 Control Channel MTU parms [ L:1541 D:1212 EF:38 EB:0 ET:0 EL:3 ]
20210511 12:01:19 68.185.12.178:33025 Data Channel MTU parms [ L:1541 D:1450 EF:122 EB:392 ET:0 EL:3 ]
20210511 12:01:19 68.185.12.178:33025 Local Options String (VER=V4): 'V4 dev-type tun link-mtu 1469 tun-mtu 1419 proto UDPv4 comp-lzo cipher AES-128-GCM auth [null-digest] keysize 128 key-method 2 tls-server'

And on the client

I SIGUSR1[soft ping-restart] received process restarting
20210511 12:01:14 Restart pause 5 second(s)
20210511 12:01:19 W WARNING: No server certificate verification method has been enabled. See http://openvpn.net/howto.html#mitm for more info.
20210511 12:01:19 W NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
20210511 12:01:19 W WARNING: normally if you use --mssfix and/or --fragment you should also set --tun-mtu 1500 (currently it is 1419)
20210511 12:01:19 Control Channel MTU parms [ L:1541 D:1212 EF:38 EB:0 ET:0 EL:3 ]
20210511 12:01:19 Data Channel MTU parms [ L:1541 D:1450 EF:122 EB:392 ET:0 EL:3 ]
20210511 12:01:19 Local Options String (VER=V4): 'V4 dev-type tun link-mtu 1469 tun-mtu 1419 proto UDPv4 comp-lzo cipher AES-128-GCM auth [null-digest] keysize 128 key-method 2 tls-client'
20210511 12:01:19 Expected Remote Options String (VER=V4): 'V4 dev-type tun link-mtu 1469 tun-mtu 1419 proto UDPv4 comp-lzo cipher AES-128-GCM auth [null-digest] keysize 128 key-method 2 tls-server'

Now, does L:1541 mean that I need to crank the MTU down from 1419 to 1378 to get the link-mtu to be proper for Ethernet? I tried knocking it down from 1419 to 1418 and L dropped from 1541 to 1540.
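
As a cross-check on those numbers, here is a small Python sketch that pulls the MTU figures out of the two server log lines quoted above. The function and key names are invented for illustration. It only does the subtraction; which overhead figure actually governs the packets on the wire is not something the sketch (or the log) settles.

```python
import re

SERVER_LOG = """\
Data Channel MTU parms [ L:1541 D:1450 EF:122 EB:392 ET:0 EL:3 ]
Local Options String (VER=V4): 'V4 dev-type tun link-mtu 1469 tun-mtu 1419 proto UDPv4 comp-lzo cipher AES-128-GCM auth [null-digest] keysize 128 key-method 2 tls-server'
"""


def mtu_figures(log: str) -> dict:
    """Extract tun-mtu and the two overhead values implied by the log."""
    parms_l = int(re.search(r"Data Channel MTU parms \[ L:(\d+)", log).group(1))
    link_mtu = int(re.search(r"\blink-mtu (\d+)", log).group(1))
    tun_mtu = int(re.search(r"\btun-mtu (\d+)", log).group(1))
    return {
        "tun_mtu": tun_mtu,
        "overhead_from_parms": parms_l - tun_mtu,     # L: minus tun-mtu
        "overhead_from_options": link_mtu - tun_mtu,  # link-mtu minus tun-mtu
    }


figures = mtu_figures(SERVER_LOG)
print(figures)
# {'tun_mtu': 1419, 'overhead_from_parms': 122, 'overhead_from_options': 50}
print(1500 - figures["overhead_from_parms"])       # 1378
print(1500 - 28 - figures["overhead_from_parms"])  # 1350
```

The 122 matches the EF:122 field on the same line, while the Options String implies only 50 bytes, so the two log lines do not agree with each other on the overhead.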

What this all is telling me is that OpenVPN has 2 bugs. The first bug is that on the K3 and later kernels it cannot properly obtain the correct MTU for the interfaces in the router, so its internal calculation results for the tun MTU and link MTU are wrong. Hard-coding the tun MTU to what it SHOULD calculate if it did properly obtain the interface MTU works around that bug.

The second bug is that OpenVPN is internally ignoring the DF flag and fragmenting the packet anyway at some point. Is it possible that it is paying attention to the DF flag when it gets the packet, and then after it encrypts the packet it encapsulates it in a UDP packet that does NOT have the DF flag set? In which case it's then fragmenting the encrypted packet before sending it into the tunnel? That would explain why pings with the DF bit set that are clearly too large for the tun MTU are not causing OpenVPN to send back a Packet Too Big ICMP response.
SurprisedItWorks
DD-WRT Guru


Joined: 04 Aug 2018
Posts: 1444
Location: Appalachian mountains, USA

Posted: Tue May 11, 2021 21:30
You are looking at the wrong "data channel mtu parms" line. Look for the word "AFTER" in my long post. There is this preliminary decoy line that appears before the cipher is decided on, and as far as I can tell, it doesn't matter. Along with it is a control-channel line, and that one might mean more, except that in my case I'd have to seriously reduce MTU to get that one to show a link-mtu of 1472, and I'm not convinced it's necessary. I'm guessing the control packets are naturally small.

Anyway, look for the last "data-channel mtu parms" line, not the first one.

tedm
DD-WRT Guru


Joined: 13 Mar 2009
Posts: 554

Posted: Wed May 12, 2021 5:45
There is only ONE "data channel mtu parms" line in the entire log and it is what I posted. There is no "data channel mtu parms" line _after_ the peer connection entry in the log on either router.
egc
DD-WRT Guru


Joined: 18 Mar 2014
Posts: 12834
Location: Netherlands

Posted: Wed May 12, 2021 7:32
Just for my understanding a recap, correct me if I am wrong:

PMTUD on OpenVPN seems broken, so the default tun-mtu value of 1500 is not adequately lowered and we end up in a blackhole situation.
So it is better to lower the default tun-mtu value to a more realistic size.
The problem manifests itself when using UDP: packets should not be fragmented, so we need the maximum packet size that does not fragment (maximum because that gives the highest throughput).

In daily usage this does not seem like a problem, as most applications factor this in and I have not seen a packet size over 1350, but applications using UDP (like VoIP, streaming media) which just try to use the full MTU do not work, as their packets are fragmented.

The problem is that the tun-mtu depends on the encryption/hash used (try AES-256-CBC/SHA512 and you know why you should use the newer ciphers which have the hash "included") and on compression; even if compression is not used but the compression module is loaded, it will cost you a byte.

My testing with a clean path, with compression disabled and Chacha-Poly, gives me a maximum packet size of 1420, which translates to a tun-mtu of 1448 (checked, this works).

I use DHCP WAN, but if we factor in users using PPPoE we have a maximum tun-mtu of 1440; factor in compression and different encryption, and even that is too high.

So how about a tun-mtu value of 1420 as a more realistic default?
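
For reference, the arithmetic behind those numbers as a small Python sketch. The names are illustrative only, and the 8-byte PPPoE figure is the usual 1500 to 1492 reduction, which is what the 1448 to 1440 step above implies.

```python
ICMP_IP_HEADERS = 28  # 20-byte IPv4 header + 8-byte ICMP header
PPPOE_OVERHEAD = 8    # PPPoE lowers a 1500-byte Ethernet MTU to 1492


def tun_mtu_from_ping(max_payload: int) -> int:
    """Convert the largest unfragmented ping payload into a tun-mtu."""
    return max_payload + ICMP_IP_HEADERS


dhcp_wan = tun_mtu_from_ping(1420)     # measured on a clean DHCP WAN path
pppoe_wan = dhcp_wan - PPPOE_OVERHEAD  # same path behind PPPoE
proposed_default = 1420

print(dhcp_wan, pppoe_wan)            # 1448 1440
print(pppoe_wan - proposed_default)   # 20
```

So the proposed 1420 leaves 20 bytes of headroom below the PPPoE case for compression and heavier ciphers.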

All times are GMT