Question about ath10k driver load and crashing

CR_Apollo
DD-WRT User


Joined: 25 Dec 2020
Posts: 90
Location: Toronto - Canada

Posted: Thu Sep 02, 2021 22:07    Post subject: Question about ath10k driver load and crashing
Hi there, hopefully someone can provide some insight.

I have been monitoring the logs closely since r47282. On every boot, I see that the direct driver load fails for ath10k, and on some boots I also see vanilla fail to load once. I assume that is on 2.4 GHz, since it mostly happens when I set 2.4 GHz to vanilla, but it has also happened a couple of times on 5 GHz. Is this something similar to the pre-config CPU 800 MHz setting during POST, given that I see it load 10.4 later?

The other thing is that I am seeing a lot of ath10k WMI crashes lately. I know this is power-saving related, but I do not have any of the APs set to use power saving. Is there an automatic setting for the PCI side, or is this related to my clients' settings? I have read many posts where this happens on OpenWrt and DD-WRT, and as long as the system recovers, it seems to be treated as acceptable/normal behavior. For example, the only time I saw a dev (KONG) pay any attention to a report of this was when the user mentioned that their system froze during a crash. But should I really be seeing this 2 to 3 times a day? I went 36 hours without it happening once, as far as I can tell (it's possible I lost the logs, as I was setting up syslog during that 36-hour stretch), but now it has happened 3 times within 24 hours.

So to recap, my two questions are:

1. What's up with the failed direct ath10k driver load on every boot, and the occasional vanilla direct driver load failure that sometimes follows the other two failures?
2. Are WMI crashes normal, and are they acceptable as long as the system recovers, even if they happen often?

Logs of the crash are below. Also, I sometimes end up with duplicated routing table entries after a crash. Nothing that halts the system, and of course a reboot fixes it when the table gets rebuilt. However, in one case the crash happened right after a reboot and I had some of the table duplicated.

kern warning kernel [132704.489621] ath10k_pci 0000:01:00.0: wmi command 36872 timeout, restarting hardware
kern warning kernel [132704.489656] ath10k_pci 0000:01:00.0: ani_enable failed from debugfs: -11
kern info kernel [132706.899034] ath10k_pci 0000:01:00.0: boot get otp board id result 0x00040400 board_id 1 chip_id 0 ext_bid_support 1
kern info kernel [132706.951481] ath10k_pci 0000:01:00.0: boot upload otp to 0x1234 len 9317
kern info kernel [132710.493058] ath10k_pci 0000:01:00.0: Init Max Stations to 512
kern info kernel [132710.569600] ath10k_pci 0000:01:00.0: wmi event ready sw_version 0x01000000 abi_version 3 mac_addr XX:XX:XX:XX:XX:XX status 0
kern info kernel [132710.693634] ath10k_pci 0000:01:00.0: bdf failsafe status event received from fw: 0
kern info kernel [132710.791351] ieee80211 phy0: Hardware restart was requested
kern warning kernel [132713.849629] ath10k_pci 0000:01:00.0: wmi command 36864 timeout, restarting hardware
kern warning kernel [132713.849666] ath10k_pci 0000:01:00.0: failed to start hw scan: -11
kern info kernel [132716.269207] ath10k_pci 0000:01:00.0: boot get otp board id result 0x00040400 board_id 1 chip_id 0 ext_bid_support 1
kern info kernel [132716.321699] ath10k_pci 0000:01:00.0: boot upload otp to 0x1234 len 9317
kern info kernel [132719.872392] ath10k_pci 0000:01:00.0: Init Max Stations to 512
kern info kernel [132719.948930] ath10k_pci 0000:01:00.0: wmi event ready sw_version 0x01000000 abi_version 3 mac_addr XX:XX:XX:XX:XX:XX status 0
kern info kernel [132720.072865] ath10k_pci 0000:01:00.0: bdf failsafe status event received from fw: 0
kern info kernel [132720.075079] ath10k_pci 0000:01:00.0: device successfully recovered
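
To put a number on how often this happens, log lines like the ones above can be tallied with a short script. This is only a sketch in Python: the regular expressions are assumptions based on the message format shown in this post, and `summarize` is a hypothetical helper, not part of DD-WRT or ath10k. For reference, the -11 in two of the lines is the Linux errno EAGAIN.

```python
import re

# Lines taken from the log above (syslog prefix trimmed).
LOG = """\
[132704.489621] ath10k_pci 0000:01:00.0: wmi command 36872 timeout, restarting hardware
[132704.489656] ath10k_pci 0000:01:00.0: ani_enable failed from debugfs: -11
[132713.849629] ath10k_pci 0000:01:00.0: wmi command 36864 timeout, restarting hardware
[132713.849666] ath10k_pci 0000:01:00.0: failed to start hw scan: -11
[132720.075079] ath10k_pci 0000:01:00.0: device successfully recovered
"""

# Assumed line shape: "[<uptime seconds>] ath10k_pci <pci address>: <message>"
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+ath10k_pci\s+(?P<dev>[0-9a-f:.]+):\s+(?P<msg>.*)"
)
TIMEOUT_RE = re.compile(r"wmi command (\d+) timeout")


def summarize(log_text):
    """Group WMI timeouts and recoveries by PCI device (hypothetical helper)."""
    summary = {}
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # not an ath10k_pci line
        entry = summary.setdefault(m["dev"], {"timeouts": [], "recovered": []})
        ts = float(m["ts"])
        timeout = TIMEOUT_RE.match(m["msg"])
        if timeout:
            # Record the uptime and the WMI command id that timed out.
            entry["timeouts"].append((ts, int(timeout.group(1))))
        elif m["msg"].startswith("device successfully recovered"):
            entry["recovered"].append(ts)
    return summary


if __name__ == "__main__":
    for dev, entry in summarize(LOG).items():
        print(dev, "timeouts:", len(entry["timeouts"]),
              "recoveries:", len(entry["recovered"]))
```

Run against a full syslog file, the same function would show whether the timeouts cluster on one radio and how many of them end in a recovery.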
ho1Aetoo
DD-WRT Guru


Joined: 19 Feb 2019
Posts: 2927
Location: Germany

Posted: Fri Sep 03, 2021 6:34
Quote:
1. What's up with the failed direct ath10k driver load on every boot, and the occasional vanilla direct driver load failure that sometimes follows the other two failures?


There is nothing wrong with that at all; these are debug messages.

If you want to know exactly, there are different firmware API versions:

https://wireless.wiki.kernel.org/en/users/drivers/ath10k/firmware

The driver tries to load the different firmware API versions, in our case also API 6 (a firmware version that simply does not exist).

QCA9984 currently uses firmware API 5.

So "direct firmware load error" messages don't matter, especially if it's firmware-6.bin.

If it couldn't load any firmware, the WLAN network controllers would not work at all.

The reason it initializes the hardware multiple times is that the router loads the dd-wrt firmware first and then dynamically loads the vanilla firmware afterwards.

That's just how it is implemented.
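
The fallback described above can be pictured roughly as follows. This is a minimal Python sketch of the idea, not the ath10k source: `pick_firmware`, the `api_max`/`api_min` bounds, and the temporary directory standing in for the firmware directory are all assumptions made for illustration. Probing the highest API first is likewise an assumption, chosen to be consistent with the logs in this thread.

```python
import os
import tempfile


def pick_firmware(fw_dir, api_max=6, api_min=2):
    """Return (api, path, missed) for the newest firmware-<api>.bin present.

    Hypothetical helper: the bounds and file layout are assumptions for
    illustration, not values read from the driver.
    """
    missed = []
    for api in range(api_max, api_min - 1, -1):  # try the highest API first
        path = os.path.join(fw_dir, f"firmware-{api}.bin")
        if os.path.isfile(path):
            return api, path, missed
        # A miss here corresponds to "failed with error -2" (ENOENT: no such file).
        missed.append(os.path.basename(path))
    raise FileNotFoundError(f"no firmware-<api>.bin found in {fw_dir}")


def demo():
    """Simulate a firmware directory that ships only firmware-5.bin."""
    with tempfile.TemporaryDirectory() as fw_dir:
        open(os.path.join(fw_dir, "firmware-5.bin"), "wb").close()
        api, _path, missed = pick_firmware(fw_dir)
        return api, missed


if __name__ == "__main__":
    api, missed = demo()
    print("loaded API", api, "after missing", missed)
```

In the simulated directory the only miss is firmware-6.bin, and the load still succeeds with API 5, which is why the message on its own says nothing about a fault.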
CR_Apollo
DD-WRT User


Joined: 25 Dec 2020
Posts: 90
Location: Toronto - Canada

Posted: Fri Sep 03, 2021 13:32
That's what I figured for that part: that it was just due to the different models supported by the firmware, so that multiple devices are supported. It just seems odd that I get the regular firmware-6 error and then also the vanilla firmware-6 error. Shouldn't it attempt to load only one type of firmware based on my settings, even if it does try 6 before 5?

In some cases I get two of each failure, "hw1.0" and "hw1.0-vanilla", like so:

x2, always early in the POST:
Dec 31 19:00:47 Apollo_Network kern.warn kernel: [ 7.279232] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-6.bin failed with error -2
Dec 31 19:00:22 Apollo_Network kern.warn kernel: [ 13.790916] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-6.bin failed with error -2

These sometimes follow the first two failed direct load errors: in some cases x2, sometimes x1, and sometimes not at all. The last boot was x2.
Dec 31 19:00:48 Apollo_Network kern.warn kernel: [ 48.972905] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0-vanilla/firmware-6.bin failed with error -2
Dec 31 19:00:23 Apollo_Network kern.warn kernel: [ 23.499762] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0-vanilla/firmware-6.bin failed with error -2
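
One way to make the pattern in these messages easier to see is to group them by device and firmware directory. The following is a rough Python sketch: `count_failures` and the regular expression are assumptions based on the four lines quoted in this post, and the errno names are the standard Linux meanings of -2 and -11.

```python
import re
from collections import Counter

# The four boot messages quoted above (syslog prefix trimmed).
LOG = """\
[ 7.279232] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-6.bin failed with error -2
[ 13.790916] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-6.bin failed with error -2
[ 48.972905] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0-vanilla/firmware-6.bin failed with error -2
[ 23.499762] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0-vanilla/firmware-6.bin failed with error -2
"""

FAIL_RE = re.compile(
    r"ath10k_pci (?P<dev>\S+): Direct firmware load for (?P<path>\S+) "
    r"failed with error (?P<err>-?\d+)"
)

# Linux errno values that appear in this thread.
ERRNO_NAMES = {-2: "ENOENT (no such file)", -11: "EAGAIN (try again)"}


def count_failures(log_text):
    """Count failed loads per (device, directory variant, file, errno)."""
    counts = Counter()
    for m in FAIL_RE.finditer(log_text):
        parts = m["path"].split("/")
        variant, filename = parts[-2], parts[-1]  # e.g. "hw1.0-vanilla", "firmware-6.bin"
        counts[(m["dev"], variant, filename, int(m["err"]))] += 1
    return counts


if __name__ == "__main__":
    for (dev, variant, filename, err), n in sorted(count_failures(LOG).items()):
        print(f"{dev} {variant}/{filename}: {n}x {ERRNO_NAMES.get(err, err)}")
```

Grouped this way, the sample shows one miss per device per directory (hw1.0 and hw1.0-vanilla), all for firmware-6.bin and all with error -2, which on Linux means the file was not found.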

And how about the crashes I am seeing? I have not seen one since I posted this, but I also disconnected all 2.4 GHz clients for testing. I'm going to see if I notice anything different in the next 24-48 hours with the 2.4 GHz clients off the network.

Thanks for the info, that makes total sense. I am just not sure why I see it on some boots both with and without vanilla, and on others not at all for vanilla, or only one failed instance. Cheers.

Sorry, I keep using the word POST in my example, but I know it's not the POST; it's more like, in DOS terms, command.com and config.sys, so to speak.
ho1Aetoo
DD-WRT Guru


Joined: 19 Feb 2019
Posts: 2927
Location: Germany

Posted: Fri Sep 03, 2021 15:03
So I'll try to explain it again; this is also the last time.

The R7800 has two radios, 2x QCA9984.
The radios also have different PCI IDs:

ath10k_pci 0000:01:00.0: <- Radio 1
ath10k_pci 0001:01:00.0: <- Radio 2

Each radio is initialized individually and loads its firmware individually.

System Startup:

dd-wrt firmware is loaded ->

ath10k_pci 0000:01:00.0: firmware ver 10.4-ddwrt-9984-tW-13-6284M api 5 features mfp,peer-flow-ctrl,allows-mesh-bcast,peer-fixed-rate crc32 88f56884 <-- Radio 1

ath10k_pci 0001:01:00.0: firmware ver 10.4-ddwrt-9984-tW-13-6284M api 5 features mfp,peer-flow-ctrl,allows-mesh-bcast,peer-fixed-rate crc32 88f56884 <-- Radio 2

vanilla firmware is loaded ->

ath10k_pci 0000:01:00.0: user requested fw restart
ath10k_pci 0000:01:00.0: firmware ver 10.4-3.15-00023 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps,peer-fixed-rate crc32 d67456e2 <-- Radio 1

ath10k_pci 0001:01:00.0: user requested fw restart
ath10k_pci 0001:01:00.0: firmware ver 10.4-3.15-00023 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps,peer-fixed-rate crc32 d67456e2 <- Radio 2

and again:

Direct firmware load for ath10k/QCA9984/hw1.0/firmware-6.bin failed with error
Direct firmware load for ath10k/QCA9984/hw1.0-vanilla/firmware-6.bin failed with error

The firmware-6.bin messages are not errors.
Do not search for errors where there are none.

There are enough people here who know the firmware and the drivers very well, and if something were wrong we would have noticed it long ago.

And I don't see a real crash in the log, only a chip reset, and the log does not show the context in which it occurred.
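
The startup sequence described above (dd-wrt firmware first, then the vanilla firmware after a restart, once per radio) can be read back out of the log with a few lines of Python. This is only a sketch: `firmware_sequence` and the regular expression are assumptions based on the "firmware ver" lines quoted in this post, and the radio labels are the ones given above.

```python
import re

# "firmware ver" and restart lines from the startup log above, annotations removed.
LOG = """\
ath10k_pci 0000:01:00.0: firmware ver 10.4-ddwrt-9984-tW-13-6284M api 5 features mfp,peer-flow-ctrl,allows-mesh-bcast,peer-fixed-rate crc32 88f56884
ath10k_pci 0001:01:00.0: firmware ver 10.4-ddwrt-9984-tW-13-6284M api 5 features mfp,peer-flow-ctrl,allows-mesh-bcast,peer-fixed-rate crc32 88f56884
ath10k_pci 0000:01:00.0: user requested fw restart
ath10k_pci 0000:01:00.0: firmware ver 10.4-3.15-00023 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps,peer-fixed-rate crc32 d67456e2
ath10k_pci 0001:01:00.0: user requested fw restart
ath10k_pci 0001:01:00.0: firmware ver 10.4-3.15-00023 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps,peer-fixed-rate crc32 d67456e2
"""

# Radio labels as given in the post.
RADIOS = {"0000:01:00.0": "Radio 1", "0001:01:00.0": "Radio 2"}

FW_RE = re.compile(
    r"ath10k_pci (?P<dev>\S+): firmware ver (?P<ver>\S+) api (?P<api>\d+) "
    r"features (?P<features>\S+) crc32 (?P<crc>[0-9a-f]+)"
)


def firmware_sequence(log_text):
    """Return {pci address: [(firmware version, api), ...]} in load order."""
    sequence = {}
    for m in FW_RE.finditer(log_text):
        sequence.setdefault(m["dev"], []).append((m["ver"], int(m["api"])))
    return sequence


if __name__ == "__main__":
    for dev, loads in firmware_sequence(LOG).items():
        versions = " -> ".join(f"{ver} (api {api})" for ver, api in loads)
        print(f"{RADIOS.get(dev, dev)} [{dev}]: {versions}")
```

For the lines quoted here, each radio shows two loads, both with API 5: the dd-wrt build first and the vanilla 10.4-3.15-00023 build second.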
CR_Apollo
DD-WRT User


Joined: 25 Dec 2020
Posts: 90
Location: Toronto - Canada

Posted: Fri Sep 03, 2021 15:28
Okay, I certainly did not mean to offend, and I know this is not a problem with the drivers; I was not implying that. So now I understand that regardless of the firmware I choose in my settings for both radios, both firmwares (dd-wrt and vanilla) are loaded. That was the only thing I was confused about; I was not saying I had issues with the drivers.

As for the second thing, while I was not able to record the previous crash logs (red), I did see them when that happened. It does not always crash, and even when it does, I recover from it. My only question there is: what is likely causing this? Is it most likely a client that does not respond well and causes the firmware to restart, or is it the firmware itself? That's all I am asking. As I mentioned, it has not happened since I disabled my 2.4 GHz clients (2.4 GHz is still up, just with no active clients at present). So I guess that, as long as this stays the same over the next 24-48 hours, it is likely the client(s) causing the firmware to restart?

Again, I was not intending to frustrate anyone, and I was not reporting an issue with the firmware; I was just asking in order to understand it better. I apologize if my questions were not clear enough. Thanks.
ho1Aetoo
DD-WRT Guru


Joined: 19 Feb 2019
Posts: 2927
Location: Germany

Posted: Fri Sep 03, 2021 15:32
No idea: bad client, firmware bug, or whatever.
There are ~100 different firmware versions for the chip, and there is no guarantee that all of them are completely bug free.

I can't observe anything like this on my R7800 at the moment.
CR_Apollo
DD-WRT User


Joined: 25 Dec 2020
Posts: 90
Location: Toronto - Canada

Posted: Fri Sep 03, 2021 15:45
10-4. I'll keep an eye on it and see if I can pinpoint what's causing it. Thanks again!
All times are GMT