Joined: 18 Mar 2014 Posts: 12887 Location: Netherlands
Posted: Sat Feb 18, 2023 13:33 Post subject:
First, one step back: I assume you need to port forward through your VPN because there is no WAN access?
Using a WireGuard or OpenVPN server is the preferred way.
But I did a quick test of port forwarding through my WireGuard client connected to one of my VPN providers, in this case Mullvad.
Running build 51741 on a Linksys EA6900.
However, this router is heavily patched with more than 20 upgrades, including full WireGuard IPv6 compatibility, but that should not matter.
I acquired a port forward from Mullvad: 56517
My Linux box is listening on 192.168.13.71:13131
The tunnel to Mullvad is oet4
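For readers following along: a port forward like the one in this test is normally two rules, a DNAT in the nat table's PREROUTING chain and an ACCEPT in FORWARD that matches the already-translated destination. The rules actually used in the test are not shown here, so the following is only a sketch assembled from the values above (tunnel oet4, forwarded port 56517, target 192.168.13.71:13131):

```shell
#!/bin/sh
# Sketch only: values taken from the test description, not the exact rules used.
# Rewrite the destination of new TCP connections arriving on oet4 port 56517
iptables -t nat -I PREROUTING -i oet4 -p tcp --dport 56517 \
    -j DNAT --to-destination 192.168.13.71:13131
# FORWARD is evaluated after PREROUTING, so it has to match the
# translated address and port, not the original ones
iptables -I FORWARD -i oet4 -p tcp -d 192.168.13.71 --dport 13131 -j ACCEPT
```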
Quote:
First, one step back: I assume you need to port forward through your VPN because there is no WAN access?
NO. That is NOT correct. There is WAN access. WAN access has never been the problem.
The WAN access on all 4 tunnels (2 at each site) is WORKING PERFECTLY, and has never been part of this problem.
I will try to clarify. This problem is very specific. It ONLY affects NEW sessions, i.e. incoming packets that do arrive across the wg tunnel. Repeat: the packets do arrive, even in this specific problem case. For example, port 8011 is used to host a webserver. So from <out there on internet> someone types into their browser: "http://<my dedicated ip address>:8011"
In that example, what do I see? In the R7000 site, the webpage is served to them. In the Linksys site, the packets ARRIVE, and they hit the "-t nat -I PREROUTING" firewall rule (its counter goes up). However, even with the SAME configuration, the observed behavior is DIFFERENT in the Linksys site: there the "FORWARD" rule never gets hit, while in the R7000 site it does. Obviously, <my-dedicated-ip-address> is different for each site.
I will closely read the other ideas in your post (after I fill my body with coffee). Thanks again for replying!
I am suspicious about the TIMING of when the rules in "Firewall" get executed.
One of the last iptables commands I have in there was a "D 20", i.e. delete line 20. It ran fine until the upgrade to 51741. Since then it still runs, but finds nothing to delete: when I later look at INPUT, it shows 23 lines, and the rule that was successfully being deleted in r51306 is still there.
@SurprisedItWorks explained to me that it could be about the timing of execution at boot: some commands stored in "Firewall" execute too soon, when some of the other, system-generated iptables rules have not yet been put into place. He works around this by using manual scripts instead of putting everything in the Firewall commands; that way he can delay execution until he knows everything is already up.
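A minimal sketch of that delay approach, assuming the router's iptables supports the -C (check) option: a small helper that retries a command once per second until it succeeds or gives up. The name wait_for is hypothetical, not a DD-WRT built-in.

```shell
#!/bin/sh
# Hypothetical helper (not a DD-WRT built-in): run a command once per
# second until it succeeds, giving up after the given number of tries.
# Returns 0 as soon as the command succeeds, 1 if it never does.
wait_for() {
    tries=$1
    shift
    while ! "$@" >/dev/null 2>&1; do
        tries=$((tries - 1))
        if [ "$tries" -le 0 ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}
```

On the router this could be started from the Firewall commands as, for example, `(wait_for 60 iptables -C INPUT <rule you depend on> && <your own iptables commands>) &`, so the rest of the firewall setup is not blocked. Which system-generated rule to wait for depends on the setup and is left open here.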
I just want to get my Linksys working in the same manner as my R7000 (albeit at a different site).
Your test with a Linksys EA6900 is not comparable.
I believe that has a Broadcom processor, as does my R7000. And the vlan definition is critical to this problem. The whole point is that the vlans had to be created/defined differently on routers with Marvell processors than on the R7000.
I do not really understand what you tried to recreate when you tested it: just a simple port forward from a wg tunnel to an address on the same subnet, with no vlans involved, on a processor that is known to work? I am trying to understand what specific item you were testing (as that configuration is far removed from the problem I described). I guess it was a valid test to confirm that the fundamentals still work in the latest release, which is important. But it really says nothing about the problem I am seeing.
Quote:
Using line numbers in firewall rules is always a bad idea as line numbers are relative.
Yes, I know that. It was just a curious observation made during troubleshooting (another strange little difference between the 2 sites). It is not part of the original problem, nor of the permanent configuration, so it is not worth a diversion in this conversation.
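For anyone else reading: iptables can delete a rule by its full specification instead of its position, which keeps working when rules above it are added or removed. A sketch; the rule shown is hypothetical, substitute the one you actually want removed:

```shell
#!/bin/sh
# Fragile: removes whatever rule currently sits at position 20 in INPUT
# iptables -D INPUT 20

# Robust: remove the rule matching this exact specification, and only if it
# exists (-C checks without changing anything). The rule itself is hypothetical.
if iptables -C INPUT -p tcp --dport 23 -j ACCEPT 2>/dev/null; then
    iptables -D INPUT -p tcp --dport 23 -j ACCEPT
fi
```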
Comparing the R7000 (different site) and the Linksys wrt1900: both should be forwarding the incoming packets to 192.168.33.11 (i.e. a different subnet from the router's 192.168.9.1), and 192.168.33.1 is defined on "vlan3" at both sites.
The SPECIFIC DIFFERENCE observed is:
In both sites the incoming packets hit the first Firewall rule: "iptables -t nat -A PREROUTING -i oet2 -p tcp --destination-port 8011 -j DNAT --to-destination 192.168.33.11".
In the Linksys router, they DO NOT go on to also hit the second rule: "iptables -I FORWARD -p tcp --dst 192.168.33.11 --dport 8011 -j ACCEPT". As explained, in the R7000 router they do go on to hit that rule.
What might make Marvell routers ACT DIFFERENTLY like that?? As the vlans are defined differently from non-Marvell routers, that was the original focus of my investigation.
As stated previously, I can 'make it' hit the "FORWARD -j ACCEPT" rule in the Linksys by leaving out the --dst. This leads me to believe that, for some strange reason, the -t nat DNAT rule, while definitely being hit (the counter goes up), is not succeeding in rewriting the destination to the new IP address (i.e. the "DNAT" itself). Obviously on other routers, like my R7000 and your Linksys (non-Marvell), it does succeed in DNAT-ing to an IP address on vlan3.
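One way to test that hypothesis directly is a temporary LOG rule at the top of FORWARD that matches the port but not the address, so the kernel log records the destination FORWARD actually sees. A sketch, assuming the LOG target is available in this build and kernel messages can be read with dmesg:

```shell
#!/bin/sh
# Temporary diagnostic: log forwarded TCP packets to port 8011,
# whatever their destination address
iptables -I FORWARD 1 -p tcp --dport 8011 -j LOG --log-prefix "fwd8011: "

# ...trigger a connection from outside, then inspect the logged DST= field
dmesg | grep "fwd8011: "

# Remove the diagnostic rule again, by specification rather than position
iptables -D FORWARD -p tcp --dport 8011 -j LOG --log-prefix "fwd8011: "
```

If the DNAT worked, the logged DST= should be 192.168.33.11; if the original destination shows up instead, the translation is not being applied.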
Joined: 04 Aug 2018 Posts: 1447 Location: Appalachian mountains, USA
Posted: Sat Feb 25, 2023 16:34 Post subject:
Quote:
In the Linksys router, they DO NOT go on to also hit the second rule: "iptables -I FORWARD -p tcp --dst 192.168.33.11 --dport 8011 -j ACCEPT" ... As explained, in the R7000 router, they do go on to hit that rule.
Well, at the risk of playing Captain Obvious (maybe it will trigger a thought), let me ask: what rules are above the non-firing rule in the FORWARD chain? Is something else disposing of these packets first?
_________________
2x Netgear XR500 and 3x Linksys WRT1900ACSv2 on 53544: VLANs, VAPs, NAS, station mode, OpenVPN client (AirVPN), wireguard server (AirVPN port forward) and clients (AzireVPN, AirVPN, private), 3 DNSCrypt providers via VPN.
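The evaluation order and per-rule counters of both chains can be listed with standard iptables options. In this sketch, -x prints exact counts and -Z resets the counters so that a single test connection stands out:

```shell
#!/bin/sh
# Reset counters in both chains (does not affect traffic)
iptables -t nat -Z PREROUTING
iptables -Z FORWARD
# ...trigger one test connection from outside, then list the rules
# in evaluation order with exact packet/byte counters
iptables -t nat -L PREROUTING -v -n -x --line-numbers
iptables -L FORWARD -v -n -x --line-numbers
```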
Hi kernel-panic69, and thanks for replying. Yes, I have updated DD-WRT; I am now on r51887.
The latest status of my symptoms: I no longer see cases where a packet hits the nat table PREROUTING rule but then does not hit FORWARD. It now seems fairly consistent that if it hits PREROUTING, it also hits FORWARD. BUT... some very strange twilight-zone things are still going on.
I cannot see any difference in those lines. I have even resorted to changing the line order (to check my sanity), but the same port numbers give the same symptoms, no matter the order. First, the PREROUTING counter for ports 8011 and 5479 now DOES get hit consistently; for port 8012 mostly not, but occasionally yes; and for 5480 the counter never moves. I just do not understand that! Anyway, those 2 good ports ARE now being routed to my server on 192.168.33.11, so yes, those 2 are also (now) hitting the FORWARD counter. My server is replying... the reply gets to the router... and "seems to" be sent back down oet2... but the sender never receives that reply. So that is now the MAIN problem.
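To track the per-port behaviour described above without reading the whole table each time, a small filter can pull the packet counter for each dpt: match out of the listing. A sketch; prerouting_hits is a hypothetical helper name, and it assumes the listing is produced without --line-numbers, so the packet count is the first column:

```shell
#!/bin/sh
# Hypothetical helper: print "port packets=N" for every rule in an
# `iptables -L <chain> -v -n -x` listing that matches a single dpt:
# Assumes no --line-numbers, so the packet counter is field 1.
prerouting_hits() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^dpt:/) printf "%s packets=%s\n", substr($i, 5), $1
    }'
}
```

Usage on the router: `iptables -t nat -L PREROUTING -v -n -x | prerouting_hits`, repeated after each test connection.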
Hi SurprisedItWorks 😊
Those rules are at the top of the FORWARD chain, nothing is before them.
When the reply is sent back via oet2 it seems like it is never received.
Take an ssh session as an example. The first incoming packet is a SYN, to which a SYN-ACK is sent back. The client should then complete the handshake with an ACK and the session is established. Instead, because the SYN-ACK never arrives, the client starts over and sends another SYN, and it goes back and forth like that over and over, SYN, then SYN-ACK, until it times out.
I am monitoring oet2 from the router using "tcpdump -i oet2 -nnvvS".
As I mentioned before, I have a near-identical configuration at another site. The main difference is that it uses an R7000, not a Linksys, and its version of DD-WRT is about 15 months old. But that site is working perfectly. Note: I use port 5479 for ssh in this example.
So I compared the tcpdump traces for both sites. Here is the summary. See those zeros in the Linksys site? The addresses do NOT appear like that in the working (R7000) site. Is it caused by a bad NAT configuration?? Could that explain the problem?? Will the sender even know where the response came from?
What do you see in tcpdump for your outgoing (i.e. reply) packets on your oet tunnel?
Do you see a sender address of 0.0.0.0, as I do in my non-working Linksys site?
OR
Do you see a sender address of <tunnel peer IP address>, as I do in my working R7000 site?
If other people have an oet tunnel that is SERVING, and their sender IP address also shows as 0.0.0.0, then I will know that is not the cause of the problem.
Here is an example trace of oet2 trying to serve a webpage on port 8011.
Code:
root@magnest:~# tcpdump -i oet2 -nnvvS port 8011
tcpdump: listening on oet2, link-type RAW (Raw IP), snapshot length 262144 bytes
10:22:03.339289 IP (tos 0x28, ttl 48, id 37532, offset 0, flags [DF], proto TCP (6), length 60)
103.216.222.107.16449 > 0.0.0.0.8011: Flags [S], cksum 0x7b1a (correct), seq 3024460575, win 65535, options [mss 1460,sackOK,TS val 3345813569 ecr 0,nop,wscale 9], length 0
10:22:03.340241 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
0.0.0.0.8011 > 103.216.222.107.16449: Flags [S.], cksum 0x462b (correct), seq 3112720424, ack 3024460576, win 65160, options [mss 1256,sackOK,TS val 2116265806 ecr 3345813569,nop,wscale 7], length 0
10:22:04.352550 IP (tos 0x28, ttl 48, id 37533, offset 0, flags [DF], proto TCP (6), length 60)
103.216.222.107.16449 > 0.0.0.0.8011: Flags [S], cksum 0x7725 (correct), seq 3024460575, win 65535, options [mss 1460,sackOK,TS val 3345814582 ecr 0,nop,wscale 9], length 0
10:22:04.353004 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
0.0.0.0.8011 > 103.216.222.107.16449: Flags [S.], cksum 0x4236 (correct), seq 3112720424, ack 3024460576, win 65160, options [mss 1256,sackOK,TS val 2116266819 ecr 3345813569,nop,wscale 7], length 0
10:22:05.426588 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
0.0.0.0.8011 > 103.216.222.107.16449: Flags [S.], cksum 0x3e04 (correct), seq 3112720424, ack 3024460576, win 65160, options [mss 1256,sackOK,TS val 2116267893 ecr 3345813569,nop,wscale 7], length 0
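With longer captures it helps to summarise rather than eyeball. A sketch (summarize_handshake is a hypothetical name) that reads `tcpdump -nn` output on stdin, counts SYN and SYN-ACK packets, and counts packets with 0.0.0.0 as source or destination. Repeated SYN-ACKs with no final ACK from the client are the retransmission pattern described above:

```shell
#!/bin/sh
# Hypothetical helper: summarise the TCP handshake in tcpdump -nn output.
summarize_handshake() {
    awk '
    /Flags \[/ && / > / {
        # The field before ">" is the source, the field after it
        # (minus its trailing colon) is the destination
        for (j = 2; j <= NF; j++) if ($j == ">") break
        src = $(j - 1)
        dst = $(j + 1)
        sub(/:$/, "", dst)
        # Extract the text between "Flags [" and the next "]"
        flags = substr($0, index($0, "Flags [") + 7)
        flags = substr(flags, 1, index(flags, "]") - 1)
        if (flags == "S") syn++
        if (flags == "S.") synack++
        if (src ~ /^0\.0\.0\.0\./ || dst ~ /^0\.0\.0\.0\./) zero++
    }
    END { printf "SYN=%d SYN-ACK=%d zero-address=%d\n", syn, synack, zero }'
}
```

Usage on the router: `tcpdump -i oet2 -nnvvS -c 20 port 8011 | summarize_handshake`.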
I also notice bad udp checksum messages when I monitor the outgoing packets.
These ONLY seem to occur in the response packets that I send out on oet2, which, coincidentally, is where the communication breaks down.
In this trace extract:
120.XX.XX.88 is my WAN.
146.XX.XX.116 is wg endpoint for oet2.
"XX" added by me.
Code:
root@magnest:~# tcpdump -i eth0 -nnvvS port 443
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
-
-
12:26:44.116346 IP (tos 0x0, ttl 64, id 19608, offset 0, flags [none], proto UDP (17), length 60)
120.XX.XX.88.51821 > 146.XX.XX.116.443: [bad udp cksum 0x7263 -> 0x8ab2!] UDP, length 32
12:26:44.450857 IP (tos 0x0, ttl 48, id 54495, offset 0, flags [DF], proto UDP (17), length 60)
146.XX.XX.116.443 > 120.XX.XX.88.51821: [udp sum ok] UDP, length 32
12:26:45.815866 IP (tos 0x0, ttl 48, id 55192, offset 0, flags [DF], proto UDP (17), length 108)
146.XX.XX.116.443 > 120.XX.XX.88.51821: [udp sum ok] UDP, length 80
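To check over a longer capture that only outgoing packets carry bad checksums, a filter can tally the checksum verdicts per sender address. A sketch (checksum_by_sender is a hypothetical name) that expects `tcpdump -vv` lines like the ones above. One caveat worth keeping in mind: when tcpdump runs on the sending host, outgoing packets are often reported with bad checksums simply because the network hardware fills in the checksum after the capture point (checksum offload), so a bad checksum on outgoing packets is not by itself proof of corruption.

```shell
#!/bin/sh
# Hypothetical helper: tally tcpdump -vv UDP checksum verdicts per sender.
# Reads tcpdump output on stdin; the port is stripped from each address.
checksum_by_sender() {
    awk '
    / > / && /udp/ {
        # The field before ">" is the sender address.port
        for (j = 2; j <= NF; j++) if ($j == ">") break
        src = $(j - 1)
        sub(/\.[0-9]+$/, "", src)
        if ($0 ~ /bad udp cksum/) bad[src]++
        else if ($0 ~ /udp sum ok/) ok[src]++
    }
    END {
        for (s in bad) printf "%s bad=%d\n", s, bad[s]
        for (s in ok) printf "%s ok=%d\n", s, ok[s]
    }' | sort
}
```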
Having the wrong source address (0.0.0.0) will probably give it a bad checksum. The SNAT rule that is supposed to give it the expected source address is in place (line 1), but it is not being hit:
Code:
Chain POSTROUTING (policy ACCEPT 9 packets, 452 bytes)
num pkts bytes target prot opt in out source destination
1 0 0 SNAT all -- * oet2 0.0.0.0/0 0.0.0.0/0 to:192.168.140.3
2 216 11794 SNAT all -- * oet1 0.0.0.0/0 0.0.0.0/0 to:192.168.140.2
3 412 99826 SNAT all -- * eth0 192.168.9.0/24 0.0.0.0/0 to:120.XX.XX.88
4 0 0 SNAT all -- * eth0 192.168.33.0/28 0.0.0.0/0 to:120.XX.XX.88
5 0 0 RETURN all -- * br1 0.0.0.0/0 0.0.0.0/0 PKTTYPE = broadcast
6 0 0 MASQUERADE all -- * br1 192.168.33.0/28 192.168.33.0/28
7 0 0 RETURN all -- * br0 0.0.0.0/0 0.0.0.0/0 PKTTYPE = broadcast
8 164 17114 MASQUERADE all -- * br0 192.168.9.0/24 192.168.9.0/24
root@magnest:~#
"the POSTROUTING hook will never see the reply packet because the connection is not NEW: the nat table is not called anymore for this flow and the packets identified as part of this flow."
So that is the reason the SNAT is not, and never will be, hit: the nat table is only consulted for the first packet of a NEW connection, and this problem is about the REPLY, which conntrack treats as part of the already tracked (ESTABLISHED) connection. The source address in the reply packets has to be changed somehow in the CONNTRACK system.
So currently conntrack is changing the source address from 192.168.33.11 to 0.0.0.0, but I need it to change it to 192.168.140.3, the same as the SNAT (in -t nat POSTROUTING) does for NEW connections.
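The conntrack entry for the connection shows this directly: it stores the original tuple and the tuple expected for replies, and replies are rewritten so that their source becomes the original destination. A sketch (conntrack_nat_view is a hypothetical helper) that prints both tuples from lines of /proc/net/nf_conntrack; that path is an assumption, older kernels use /proc/net/ip_conntrack instead:

```shell
#!/bin/sh
# Hypothetical helper: show the original and reply tuples of conntrack
# entries read on stdin. Tuple 1 is the original direction, tuple 2 the
# reply direction; a reply leaves with the original destination as source.
conntrack_nat_view() {
    awk '{
        n = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^src=/) { n++; s[n] = substr($i, 5) }
            else if ($i ~ /^dst=/) d[n] = substr($i, 5)
        }
        if (n < 2) next    # not a complete two-tuple entry
        printf "orig %s -> %s | reply %s -> %s | reply leaves with source %s\n", s[1], d[1], s[2], d[2], d[1]
    }'
}
```

Usage on the router: `grep 'dport=8011' /proc/net/nf_conntrack | conntrack_nat_view`. If the first tuple already shows dst=0.0.0.0, the 0.0.0.0 in the replies is inherited from the incoming packets rather than produced by the NAT rules.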
After all investigations - the problem comes down to this:
I have 2 sites. They are completely unrelated to each other, and there is no desire to connect them.
However, as they have the same configuration, I can compare them. That is helpful when one site behaves differently from the other. The addresses are identical at both sites.
Both sites have oet1 and oet2 connecting to an external VPN provider. Only oet2 is used for port forwarding (i.e. inbound connections). The (internal) peer address for oet2 is 192.168.140.2. Obviously, the endpoint IP address is different for each of these 4 tunnels. But all tunnels connect, are active, exchange data without errors, stay up, and are very stable.
Let's focus on just the first incoming packet (from the internet) to oet2:
Site 1 R7000, build r47911
Code:
root@nighthawk:~# tcpdump -i oet2 -nnvvS port 8011
tcpdump: listening on oet2, link-type RAW (Raw IP), snapshot length 262144 bytes
09:47:21.842468 IP (tos 0x0, ttl 49, id 27794, offset 0, flags [DF], proto TCP (6), length 60)
84.17.37.217.59153 > 192.168.140.2.8011: Flags [S], cksum 0xf0e7 (correct), seq 925171440, win 65535, options [mss 1460,sackOK,TS val 1028200133 ecr 0,nop,wscale 9], length 0
Site 2 Linksys wrt1900acsv2, build r51887
Code:
root@magnest:~# tcpdump -i oet2 -nnvvS port 8011
tcpdump: listening on oet2, link-type RAW (Raw IP), snapshot length 262144 bytes
11:50:07.835877 IP (tos 0x28, ttl 48, id 61006, offset 0, flags [DF], proto TCP (6), length 60)
103.216.222.107.32919 > 0.0.0.0.8011: Flags [S], cksum 0xe5c3 (correct), seq 2744974535, win 65535, options [mss 1460,sackOK,TS val 3437498059 ecr 0,nop,wscale 9], length 0
Why does the peer address not show up in Site 2 (it shows as 0.0.0.0 instead)? Can anyone explain why tcpdump would show that?
I cannot find ANY difference between the configurations. The peer address is certainly set up in both. Perhaps there is a configuration difference that I cannot find, even after extensive searching for many days. Can anyone guess which config setting?? OR offer an alternative explanation??
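A systematic way to hunt for a config difference is to dump the tunnel-related settings on both routers and diff them by key. A sketch; compare_dumps is a hypothetical helper, and it assumes the settings can be listed with `nvram show` and that their names contain oet2. Settings such as endpoint addresses and WireGuard keys will legitimately differ, so the output still needs manual review:

```shell
#!/bin/sh
# Hypothetical helper: print, sorted, every key whose value differs between
# two "key=value" files, or that appears in only one of them.
# Note: the FNR == NR idiom assumes the first file is not empty.
compare_dumps() {
    awk '
        function key(line) { return substr(line, 1, index(line, "=") - 1) }
        FNR == NR { a[key($0)] = $0; next }
        { b[key($0)] = $0 }
        END {
            for (k in a) if (!(k in b) || a[k] != b[k]) print k
            for (k in b) if (!(k in a)) print k
        }' "$1" "$2" | sort
}
```

Usage: on each router run `nvram show 2>/dev/null | grep oet2 | sort > /tmp/oet2-site.txt`, copy both files to one machine, then run `compare_dumps site1.txt site2.txt`.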