[TESTING] 3xR7800 upgrade from r41813 to r50012 - issues

DD-WRT Forum Index -> Atheros WiSOC based Hardware
LaimisV
DD-WRT User


Joined: 01 Mar 2016
Posts: 63

Posted: Thu Oct 06, 2022 20:54    Post subject: [TESTING] 3xR7800 upgrade from r41813 to r50012 - issues
Hey,

Three R7800s at different ISPs are connected to each other over WireGuard (WG). After upgrading the firmware, I ran into the issues below:

1) Random router reboots roughly every 10-20 days.

2) If I access the router via SSH and run commands such as `ip` or `wg`, it reboots 5-30 minutes later, so it can reboot several times a day. I mostly use OpenSSH; I've switched to the default Dropbear to see how that goes.

3) WiFi disconnects every 30-60 minutes on both 2.4 and 5 GHz, depending on the client computer. E.g. two WiFi clients work fine all the time, while another one has this issue.

4) When I connected additional WG clients (laptops), I noticed WG dropouts (a stale "latest handshake") lasting around 5-10 minutes every few hours. The dropouts occur between the R7800 units, not on the clients such as the Macs. Removing those computers from the WG tunnels fixed the issue.

P.S. I did run `nvram erase && reboot`.

I had none of these issues on r41813; the R7800s used to reach 1+ year of uptime on UPS power, so I'm sure these are software issues.

I would like to use the latest firmware and am happy to spend time debugging if anyone can offer hints.

Many thanks


Last edited by LaimisV on Mon Oct 17, 2022 18:34; edited 1 time in total
LaimisV
DD-WRT User


Joined: 01 Mar 2016
Posts: 63

Posted: Thu Oct 06, 2022 21:39
WiFi settings on DD-1, DD-2 and DD-3:

nvram set wlan0_wl_advanced=1
nvram set wlan0_regdomain=LITHUANIA
nvram set wlan0_mode=ap
nvram set wlan0_net_mode=mixed
nvram set wlan0_channelbw=20
nvram set wlan0_ssid=XXXX-5G
nvram set wlan0_channel=0
nvram set wlan0_closed=0
nvram set wlan0_bridged=1

nvram set wlan1_wl_advanced=1
nvram set wlan1_regdomain=LITHUANIA
nvram set wlan1_mode=ap
nvram set wlan1_net_mode=mixed
nvram set wlan1_channelbw=20
nvram set wlan1_ssid=XXXX-2G
nvram set wlan1_channel=0
nvram set wlan1_closed=0
nvram set wlan1_bridged=1

nvram set wlan0_security_mode=psk2
nvram set wlan0_psk2=1
nvram set wlan0_ccmp=1
nvram set wlan0_wpa_psk="XXXX"

nvram set wlan1_security_mode=psk2
nvram set wlan1_psk2=1
nvram set wlan1_ccmp=1
nvram set wlan1_wpa_psk="XXXX"

Everything else is at r50012 defaults.

I used the browser's inspector on the GUI to find the right parameter names. They should be correct; I double-checked them both ways.

BTW, I disabled WiFi on the DD-3 unit, which doesn't need it, using `ifconfig wlan0 down; ifconfig wlan1 down`; it now has 22 days of uptime.

DD-2 has 20 days of uptime and is testing these additional settings:

Code:
nvram set wlan0_channel=5825
nvram set wlan0_wl_advanced=1
nvram set wlan0_fwtype=vanilla

nvram set wlan1_channel=2462
nvram set wlan1_wl_advanced=1
nvram set wlan1_fwtype=vanilla
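When comparing these values with the GUI, note that the channel is stored as a centre frequency in MHz: 5825 is channel 165 and 2462 is channel 11. A minimal sketch of the conversion, assuming the standard 802.11 channel plan and treating 0 as automatic selection (as `wlan0_channel=0` above implies); `freq_to_channel` is a hypothetical helper name, not a DD-WRT command:

```shell
#!/bin/sh
# Hypothetical helper (not part of DD-WRT): translate the MHz value stored in
# wlanX_channel into an 802.11 channel number. 0 is reported as "auto".
freq_to_channel() {
    f=$1
    if [ "$f" -eq 0 ]; then
        echo auto
    elif [ "$f" -eq 2484 ]; then
        echo 14                          # 2.4 GHz channel 14 is the odd one out
    elif [ "$f" -ge 2412 ] && [ "$f" -le 2472 ]; then
        echo $(( (f - 2407) / 5 ))       # 2.4 GHz: 2407 + 5 * channel
    elif [ "$f" -ge 5000 ] && [ "$f" -le 5900 ]; then
        echo $(( (f - 5000) / 5 ))       # 5 GHz: 5000 + 5 * channel
    else
        echo "unknown frequency: $f" >&2
        return 1
    fi
}
```

On the router, `freq_to_channel "$(nvram get wlan0_channel)"` would print the channel currently configured for wlan0.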


Last edited by LaimisV on Thu Oct 06, 2022 22:58; edited 1 time in total
LaimisV
DD-WRT User


Joined: 01 Mar 2016
Posts: 63

Posted: Thu Oct 06, 2022 22:01
SSH server:

DD-1 uptime is now 4 days; I rebooted it 4 days ago to run it without OpenSSH for testing.

Here is the OpenSSH configuration in rc_startup on DD-2 and DD-3:

Code:
opkg install openssh-server
# OpenSSH_9.0p1, OpenSSL 1.1.1q  5 Jul 2022
opkg install sudo
# Sudo version 1.8.31

echo 'HostKey /tmp/root/.ssh/openssh_host_rsa_key
ClientAliveInterval 60
ClientAliveCountMax 3
AuthorizedKeysFile /tmp/home/xxxx/.ssh/authorized_keys
PermitRootLogin no
Port 22
' > /opt/etc/ssh/sshd_config

kill ....dropbear....here....
/opt/sbin/sshd -f /opt/etc/ssh/sshd_config -p 22


There is more to the OpenSSH setup, but I believe it's irrelevant to the crashes. In short, the sshd and sudo binaries are used, and an additional non-root user was added just to SSH in and switch to root.
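Since Dropbear and this sshd both want port 22, it can help to check whether something is already listening before starting the second daemon. A minimal sketch, assuming BusyBox `netstat -ltn` output with the local address in the fourth column; `port_listening` is a hypothetical helper that reads that output on stdin and succeeds only if the given port is in LISTEN state:

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -ltn` output on stdin and exit 0 if the
# port given as $1 is in LISTEN state, 1 otherwise.
port_listening() {
    awk -v p="$1" '
        $NF == "LISTEN" {
            addr = $4
            sub(/.*:/, "", addr)     # keep only the text after the last colon
            if (addr == p) found = 1
        }
        END { exit !found }'
}
```

In rc_startup it could guard the launch, e.g. `netstat -ltn | port_listening 22 || /opt/sbin/sshd -f /opt/etc/ssh/sshd_config -p 22`.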


Last edited by LaimisV on Thu Oct 06, 2022 22:39; edited 3 times in total
LaimisV
DD-WRT User


Joined: 01 Mar 2016
Posts: 63

Posted: Thu Oct 06, 2022 22:11
WG on DD-1 (equivalent settings on DD-2 and DD-3):

Code:
# wireguard-tools v1.0.20210424

nvram set oet1_en=1
nvram set oet1_peers=2
nvram set oet1_private=XXXX
nvram set oet1_public=XXXX
nvram set oet1_proto=2
nvram set oet1_mit=1
nvram set oet1_natout=0
nvram set oet1_port=51820
nvram set oet1_mtu=1440
nvram set oet1_pbr=""
nvram set oet1_firewallin=0
nvram set oet1_killswitch=0
nvram set oet1_ipaddr="10.1.250.1"
nvram set oet1_netmask=255.255.255.0
nvram set oet_tunnels=1
nvram set oet1_psk0=""
nvram set oet1_bridged=1
nvram set oet1_id=1
nvram set oet1_nat=0

nvram set oet1_namep0="to-240"
nvram set oet1_endpoint0=1
nvram set oet1_peerkey0=XXXX
nvram set oet1_peerport0=51820
nvram set oet1_aip_rten0=1
nvram set oet1_usepsk0=0
nvram set oet1_ip0=10.1.240.1
nvram set oet1_rem0=X.X.X.X
nvram set oet1_aip0="10.1.240.1/32,192.168.240.0/24"
nvram set oet1_ka0=20

nvram set oet1_namep1="to-230"
nvram set oet1_endpoint1=1
nvram set oet1_peerkey1=XXXX
nvram set oet1_peerport1=51820
nvram set oet1_aip_rten1=1
nvram set oet1_usepsk1=0
nvram set oet1_ip1=10.1.230.1
nvram set oet1_rem1=X.X.X.X
nvram set oet1_aip1="10.1.230.1/32,192.168.230.0/24"
nvram set oet1_ka1=20
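The "latest handshake" dropouts from point 4 can be timestamped instead of spotted by hand. `wg show <interface> latest-handshakes` prints one peer public key and a Unix timestamp per line (0 if there has been no handshake). A sketch, assuming the interface for this tunnel is named `oet1` (inferred from the nvram prefix) and that a handshake older than about 180 seconds means trouble, since with `oet1_ka0=20` traffic keeps flowing and WireGuard normally re-handshakes roughly every two minutes while it does; `stale_peers` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical monitor helper: read "pubkey<TAB>unix-time" lines as printed by
# `wg show <iface> latest-handshakes` and report peers whose last handshake is
# older than MAX_AGE seconds, or that never completed a handshake.
# Usage: stale_peers MAX_AGE [NOW]   (NOW defaults to the current time)
stale_peers() {
    max_age=$1
    now=${2:-$(date +%s)}
    awk -v max="$max_age" -v now="$now" '
        NF >= 2 {
            if ($2 == 0)
                print $1, "never"
            else if (now - $2 > max)
                print $1, (now - $2) "s"
        }'
}
```

For example, `wg show oet1 latest-handshakes | stale_peers 180 | logger -t wgwatch` inside a `while sleep 60` loop would record in syslog every minute during which a peer is stale.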
LaimisV
DD-WRT User


Joined: 01 Mar 2016
Posts: 63

Posted: Thu Oct 06, 2022 22:18
Additional tunnels mentioned in point 4:

I tried adding a few tunnels with the same configuration as above, differing only in these 4 parameters:

Code:
nvram set oet1_rem2=
nvram set oet1_ip2=10.1.1.1
nvram set oet1_aip2=10.1.1.1/32,192.168.234.0/24
nvram set oet1_cldns2=1.1.1.1


I have removed these tunnels to see whether the units are stable without them, i.e. no reboots and no WG tunnel downtime.
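One cheap check when several peers are cloned from one template: WireGuard's cryptokey routing assigns each AllowedIPs prefix to only one peer on an interface, so if two peers list the same prefix, traffic for it goes to only one of them. This is not evidence that it caused the dropouts. A sketch that reads "index allowed-ips" lines and reports exact duplicate prefixes (overlaps such as a /24 inside a /16 are not detected); `dup_allowed_ips` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical check: stdin lines are "<peer index> <comma-separated prefixes>".
# Prints every prefix that is listed by more than one peer.
dup_allowed_ips() {
    awk '{
        n = split($2, prefixes, ",")
        for (i = 1; i <= n; i++) {
            p = prefixes[i]
            if (p in owner)
                print p, "claimed by peers", owner[p], "and", $1
            else
                owner[p] = $1
        }
    }'
}
```

On the router: `for i in 0 1 2 3; do echo "$i $(nvram get oet1_aip$i)"; done | dup_allowed_ips`.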


Last edited by LaimisV on Thu Oct 06, 2022 22:42; edited 1 time in total
LaimisV
DD-WRT User


Joined: 01 Mar 2016
Posts: 63

Posted: Thu Oct 06, 2022 22:26
Sorry for the flood. I think I should stop here and see how the R7800s perform (:

In general, CPU load is fine, and there is plenty of RAM, plenty of TCP connections left, etc.

When an R7800 reboots, nothing unusual is written to the messages log (it reboots suddenly). The log is captured with:

Code:
tail -f /var/log/messages > /jffs/messages.crash.8127318273 &


I can create a script that collects some stats, or enable klogd logging to a remote server. Would that be useful?
Let me know what to include. I'm comfortable with Linux debugging and scripting.
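A stats snapshot along those lines could look like the sketch below: a hypothetical helper, `snapshot_stats`, that reads only /proc (so it should also work with BusyBox) and appends one line per call. The conntrack count path is an assumption for this kernel; `n/a` is logged if it is missing.

```shell
#!/bin/sh
# Hypothetical stats snapshot: append one line with uptime, load averages,
# free memory and the conntrack count to the file given as $1.
snapshot_stats() {
    out=$1
    up=$(cut -d' ' -f1 /proc/uptime)
    load=$(cut -d' ' -f1-3 /proc/loadavg | tr ' ' ',')
    memfree=$(awk '/^MemFree:/ { print $2 }' /proc/meminfo)
    conns=$(cat /proc/sys/net/netfilter/nf_conntrack_count 2>/dev/null || echo n/a)
    echo "$(date '+%Y-%m-%d %H:%M:%S') up=${up}s load=${load} memfree_kb=${memfree} conntrack=${conns}" >> "$out"
}
```

It could run from rc_startup as, e.g., `while sleep 300; do snapshot_stats /jffs/stats.log; done &`, so after an unexpected reboot the last line shows the state a few minutes before. Writing to /jffs every few minutes adds flash wear, so a USB stick or the remote syslog may be a better target.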

I bet there are fewer than 4 underlying issues (maybe 1 or 2 in total); I guess the 4 issues I initially listed are somehow connected.
blkt
DD-WRT Guru


Joined: 20 Jan 2019
Posts: 5700

Posted: Fri Oct 07, 2022 6:35
blkt wrote:
Use current builds and reset, then explore what can already be done in the WebUI before diving back into commands.

For wireless issues, tick the Advanced Settings checkbox for both radios and change each Firmware Type from DD-WRT to VANILLA.
https://forum.dd-wrt.com/phpBB2/viewtopic.php?p=1262148#1262147
https://forum.dd-wrt.com/phpBB2/viewtopic.php?p=1272454#1272454
https://forum.dd-wrt.com/phpBB2/viewtopic.php?p=1272576#1272570
Final thoughts: set the regulatory domain and do not use automatic channel selection, i.e. specify the channel width and extension channel.

Sorry for the copy-pasta, but I do not see enough of your configuration to check it against the various wlan suggestions in the reply or the links above.

A full list is important if you have not reset, as there will be a mix of old and new variables, making the defaults less predictable.

Firmware Type: VANILLA (not DD-WRT):

wlan0_fwtype_use=vanilla
wlan0_fwtype=vanilla
wlan1_fwtype=vanilla
wlan1_fwtype_use=vanilla

Specify Channels: 36 & 1 (examples):

wlan0_channel=5180
wlan1_channel=2412

https://en.wikipedia.org/wiki/List_of_WLAN_channels
https://w.wol.ph/2015/08/28/maximum-wifi-transmission-power-country/
https://git.kernel.org/pub/scm/linux/kernel/git/sforshee/wireless-regdb.git/tree/db.txt#n1037
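In these examples the channel is given as a centre frequency in MHz: 5 GHz channels are 5000 + 5 * channel, and 2.4 GHz channels 1 to 13 are 2407 + 5 * channel (channel 14 is 2484). A minimal sketch of that arithmetic; `channel_to_freq` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: convert an 802.11 channel number into the MHz value
# used for wlanX_channel. Regulatory limits are NOT checked here.
channel_to_freq() {
    ch=$1
    if [ "$ch" -eq 14 ]; then
        echo 2484
    elif [ "$ch" -ge 1 ] && [ "$ch" -le 13 ]; then
        echo $((2407 + 5 * ch))          # 2.4 GHz band
    elif [ "$ch" -ge 32 ] && [ "$ch" -le 177 ]; then
        echo $((5000 + 5 * ch))          # 5 GHz band
    else
        echo "unsupported channel: $ch" >&2
        return 1
    fi
}
```

For example, `nvram set wlan0_channel=$(channel_to_freq 36)` sets 5180. Which channels are actually allowed still depends on the regulatory domain (see the db.txt link above).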

WPA2 Personal CCMP-128 (AES):

wlan0_security_mode=wpa
wlan0_psk2=1
wlan0_akm=psk2
wlan0_ccmp=1

wlan1_security_mode=wpa
wlan1_psk2=1
wlan1_akm=psk2
wlan1_ccmp=1

I do not have the two below in nvram. Advanced Settings are WebUI checkboxes whose only function is to expand the section.

wlan0_wl_advanced=1
wlan1_wl_advanced=1

The list below also includes the other suggestions from the links, so hopefully you get the idea (including why to use the web interface).

wlan0_uapsd=0
wlan0_d_lowack=1
wlan0_intmit=0
wlan0_qboost=0
wlan0_protmode=None
wlan0_rts=0
wlan0_preamble=1
wlan0_bcn=100
wlan0_dtim=2
wlan0_distance=900 (or 1350)

wlan1_turbo_qam=0

wlan1_uapsd=0
wlan1_d_lowack=1
wlan1_intmit=0
wlan1_qboost=0
wlan1_protmode=None
wlan1_rts=0
wlan1_preamble=1
wlan1_bcn=100
wlan1_dtim=2
wlan1_distance=900 (or 1350)

Edit: swapped Turbo QAM and the wlan channels to match the R7800 layout.


Last edited by blkt on Fri Oct 07, 2022 7:11; edited 1 time in total
ho1Aetoo
DD-WRT Guru


Joined: 19 Feb 2019
Posts: 2975
Location: Germany

Posted: Fri Oct 07, 2022 6:58
This is swapped.

On the R7800, wlan0 is the 5 GHz radio and, e.g., has no Turbo-QAM option.

What is this nonsense with the 100 variables, actually?
Is someone here supposed to make the effort of working through all of them to uncover errors like "wlan0_security_mode=psk2" <- not a valid security mode?
blkt
DD-WRT Guru


Joined: 20 Jan 2019
Posts: 5700

Posted: Fri Oct 07, 2022 7:13
Argh, I posted too soon. I fixed the channel examples and Turbo QAM to match wlan0 and wlan1 on the R7800.
Yes, wlan0_security_mode=psk2 and wlan1_security_mode=psk2 are wrong; they should be wpa.

To disable wlan0 and wlan1, set Wireless Network Mode (renamed Network Mode) to Disabled for each.

And yes, it took some effort to go through all of these. I really should have just said to use the web interface.
Also, after a reset, use egc's documentation and advice; a lot of work went into avoiding the need for commands.
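Hand-typed nvram values are where the psk2 mistake crept in, so a quick scan of `nvram show` output can catch it. A sketch, assuming the accepted values are `wpa` (confirmed above for WPA2 Personal) and `disabled` (assumed here to be the value for an open network); `check_security_modes` is a hypothetical helper and the list is deliberately short:

```shell
#!/bin/sh
# Hypothetical check: read "key=value" lines (e.g. from `nvram show`) and flag
# wlanX_security_mode values outside a small known-good list.
check_security_modes() {
    awk -F= '
        $1 ~ /^wlan[0-9]+_security_mode$/ {
            if ($2 != "wpa" && $2 != "disabled")
                print "suspicious: " $0
        }'
}
```

On the router: `nvram show 2>/dev/null | check_security_modes`.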
LaimisV
DD-WRT User


Joined: 01 Mar 2016
Posts: 63

Posted: Fri Oct 07, 2022 9:08
Thanks blkt

DD-2 WiFi ran for 7 hours without disconnects after this addition:

Code:
nvram set wlan0_channel=5825
nvram set wlan0_wl_advanced=1
nvram set wlan0_fwtype=vanilla

nvram set wlan1_channel=2462
nvram set wlan1_wl_advanced=1
nvram set wlan1_fwtype=vanilla


Previously it was disconnecting every hour. It needs more testing to confirm, but so far it is staying connected longer. I will apply the additional settings you suggested if it disconnects again.

----

I have enabled remote logging on DD-1, DD-2 and DD-3 (`syslogd -Z -L -R 192.168.250.110`) and receive test messages sent with:

Code:
echo MESSAGE1 > /dev/kmsg
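With remote logging in place, the kernel timestamp in brackets (seconds since boot) can mark reboots in the collected log: when it drops between consecutive kernel lines, the router restarted in between, and the earlier line is the last thing logged before the restart. A sketch for the log server, assuming the remote copy keeps the `kernel: [ seconds ]` form seen in the local messages; `find_reboots` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: scan syslog lines on stdin and report each place where
# the kernel's seconds-since-boot timestamp goes backwards (i.e. a reboot).
find_reboots() {
    awk '
        # [[] is a bracket expression matching a literal "["
        match($0, /kernel: [[] *[0-9]+[.][0-9]+]/) {
            ts = substr($0, RSTART, RLENGTH)
            gsub(/[^0-9.]/, "", ts)          # keep only the seconds value
            ts += 0
            if (seen && ts < last_ts) {
                print "possible reboot between:"
                print "  " last_line
                print "  " $0
            }
            last_ts = ts; last_line = $0; seen = 1
        }'
}
```

Run it as, e.g., `find_reboots < /var/log/remote/dd-2.log` on the log server (the path is hypothetical). Echoing a periodic heartbeat into /dev/kmsg, as with MESSAGE1 above, narrows the window in which the crash happened.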


----

BTW, I'm running the new firmware after `nvram erase && reboot` on all three routers, so nvram is in a clean state.

----

If it's easier, please post screenshots or the parameters as shown in the GUI. It's just me (a DevOps guy) preferring everything on the command line: perfectly tracked and easily automated when needed (:
LaimisV
DD-WRT User


Joined: 01 Mar 2016
Posts: 63

Posted: Fri Oct 07, 2022 10:00
Adding some system information:

Code:
root@dd-2:~# cat /var/log/messages | grep -vE '(\.info|\.debug)' | grep -v auth.err
Oct  7 11:26:35 dd-2 kern.warn kernel: [   30.363366] ath10k_pci 0000:01:00.0: wmi debug print truncated: 128
...line repeated...
Oct  7 11:26:35 dd-2 kern.notice kernel: [   68.783981] SCSI subsystem initialized
Oct  7 11:26:35 dd-2 kern.warn kernel: [1796990.049949] ath10k_pci 0000:01:00.0: could not get mac80211 beacon
Oct  7 11:26:35 dd-2 kern.warn kernel: [1796990.149702] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 0
# line repeated ...
Oct  7 11:26:35 dd-2 kern.warn kernel: [1796990.316020] ath10k_pci 0001:01:00.0: peer-unmap-event: unknown peer id 0
Oct  7 11:26:35 dd-2 kern.warn kernel: [1796996.295131] ath10k_pci 0000:01:00.0: wmi debug print truncated: 128
# line repeated ...
Oct  7 11:26:35 dd-2 kern.warn kernel: [1797003.424785] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0-vanilla/firmware-6.bin failed with error -2
Oct  7 11:26:35 dd-2 kern.warn kernel: [1797012.978948] ath10k_pci 0000:01:00.0: wmi command 36872 timeout, restarting hardware
Oct  7 11:26:35 dd-2 kern.warn kernel: [1797012.979052] ath10k_pci 0000:01:00.0: ani_enable failed from debugfs: -11
Oct  7 11:26:35 dd-2 kern.warn kernel: [1797013.008675] ath10k_pci 0000:01:00.0: cannot restart a device that hasn't been started
# lines repeated ...
Oct  7 11:26:39 dd-2 daemon.warn dnsmasq[945]: ignoring nameserver 192.168.240.240 - local interface
Oct  7 11:26:39 dd-2 authpriv.warn dropbear[2897]: Failed listening on '22': Error listening: Address in use
Oct  7 11:27:13 dd-2 user.warn kernel: [1831652.657097] MESSAGE
# line repeated ...
root@dd-2:~# free -m
              total        used        free      shared  buff/cache   available
Mem:         476208      105612      272448       64064       98148      296092
Swap:             0           0           0
root@dd-2:~# uptime
 12:48:28 up 21 days,  6:08,  load average: 0.52, 0.43, 0.40
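Before posting logs like this, repeated kernel lines can be collapsed automatically instead of marked by hand. A sketch that strips the syslog prefix and kernel timestamp from kernel lines and counts identical messages; other lines pass through with their timestamps, so they are effectively not merged. `summarize_log` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: drop "... kernel: [ 123.456] " from each line, then
# count identical messages, most frequent first.
summarize_log() {
    # [[] matches a literal "["; the greedy .* also removes the syslog prefix
    sed -e 's/^.*kernel: [[] *[0-9.]*] //' | sort | uniq -c | sort -rn
}
```

For example: `summarize_log < /var/log/messages | head -20`.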
ho1Aetoo
DD-WRT Guru


Joined: 19 Feb 2019
Posts: 2975
Location: Germany

Posted: Fri Oct 07, 2022 10:51
LaimisV wrote:
Adding some system information:

Code:
root@dd-2:~# cat /var/log/messages | grep -vE '(\.info|\.debug)' | grep -v auth.err
Oct  7 11:26:35 dd-2 kern.warn kernel: [   30.363366] ath10k_pci 0000:01:00.0: wmi debug print truncated: 128
...line repeated...
Oct  7 11:26:35 dd-2 kern.notice kernel: [   68.783981] SCSI subsystem initialized
Oct  7 11:26:35 dd-2 kern.warn kernel: [1796990.049949] ath10k_pci 0000:01:00.0: could not get mac80211 beacon
Oct  7 11:26:35 dd-2 kern.warn kernel: [1796990.149702] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 0
# line repeated ...
Oct  7 11:26:35 dd-2 kern.warn kernel: [1796990.316020] ath10k_pci 0001:01:00.0: peer-unmap-event: unknown peer id 0
Oct  7 11:26:35 dd-2 kern.warn kernel: [1796996.295131] ath10k_pci 0000:01:00.0: wmi debug print truncated: 128
# line repeated ...
Oct  7 11:26:35 dd-2 kern.warn kernel: [1797003.424785] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0-vanilla/firmware-6.bin failed with error -2
Oct  7 11:26:35 dd-2 kern.warn kernel: [1797012.978948] ath10k_pci 0000:01:00.0: wmi command 36872 timeout, restarting hardware
Oct  7 11:26:35 dd-2 kern.warn kernel: [1797012.979052] ath10k_pci 0000:01:00.0: ani_enable failed from debugfs: -11
Oct  7 11:26:35 dd-2 kern.warn kernel: [1797013.008675] ath10k_pci 0000:01:00.0: cannot restart a device that hasn't been started
# lines repeated ...



All irrelevant; these are only debug messages.

LaimisV wrote:

Code:
Oct  7 11:26:39 dd-2 daemon.warn dnsmasq[945]: ignoring nameserver 192.168.240.240 - local interface
Oct  7 11:26:39 dd-2 authpriv.warn dropbear[2897]: Failed listening on '22': Error listening: Address in use
Oct  7 11:27:13 dd-2 user.warn kernel: [1831652.657097] MESSAGE
# line repeated ...


Also irrelevant, since you configured it yourself.
Port 22 is already in use because you still have OpenSSH running in parallel.

And the dnsmasq message points to a configuration error: dnsmasq itself is configured as the upstream server, so that is an infinite loop.

And again, please post your settings as screenshots or don't post them at all.
I'm not reading tea leaves here.

Edit: oh, and build r50012 has bugs, e.g. dnsmasq is not restarted correctly when the configuration is changed:

https://svn.dd-wrt.com/changeset/50126
LaimisV
DD-WRT User


Joined: 01 Mar 2016
Posts: 63

Posted: Fri Oct 07, 2022 11:22
Sure, whatever is easier to investigate. As for dnsmasq, should I run this to fix it?

Code:

stopservice dnsmasq
startservice dnsmasq
ho1Aetoo
DD-WRT Guru


Joined: 19 Feb 2019
Posts: 2975
Location: Germany

Posted: Fri Oct 07, 2022 11:27
Yes, or just flash a newer build where the service is fixed.

No idea what effect this has on your setup, but Tatsuya also reported that WireGuard does not work properly with the defective dnsmasq.

https://svn.dd-wrt.com/ticket/7576
LaimisV
DD-WRT User


Joined: 01 Mar 2016
Posts: 63

Posted: Fri Oct 07, 2022 12:52
I suppose you are talking about this dnsmasq & WG issue reported by Tatsuya:

https://svn.dd-wrt.com/ticket/7576

I'll add a fix to rc_startup.

dnsmasq is responsible for DNS and DHCP, and I don't use either over WG, except for an attempt to connect a few MacBooks using "Peer Tunnel DNS" set to "1.1.1.1", if that is related.

So I believe there is no direct relation to the issue reported by Tatsuya.
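A sketch of such an rc_startup fix, assuming it is enough to restart dnsmasq once after boot with the `stopservice`/`startservice` commands from this thread. The 30 second delay and the helper name `restart_dnsmasq_once` are assumptions, and per the reply above a newer build with the fixed service should make this unnecessary:

```shell
#!/bin/sh
# Hypothetical rc_startup workaround for the r50012 dnsmasq restart bug.
# Waits DELAY seconds (default 30, an arbitrary choice), then restarts dnsmasq
# using DD-WRT's stopservice/startservice. Returns 1 if those are missing.
restart_dnsmasq_once() {
    delay=${1:-30}
    if ! command -v stopservice >/dev/null 2>&1; then
        echo "stopservice not found; is this a DD-WRT shell?" >&2
        return 1
    fi
    sleep "$delay"
    stopservice dnsmasq
    startservice dnsmasq
}
```

In rc_startup it would be started in the background, e.g. `restart_dnsmasq_once 30 &`, so the sleep does not hold up the rest of the boot sequence.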
All times are GMT