Posted: Thu Jun 16, 2022 20:29 Post subject: r6300v1 Kernel Panic r49197 giga (06/14/22)
EDIT: See posts below. Settings don't matter; the crash is reproducible even on a fresh boot with all defaults after factory resetting the router.
Background:
Unlikely to be hardware: I had this router up with Tomato fw for 600+ days prior to loading this build. I loaded 06-07-2022-r49113 and the crash happened after ~4 days of uptime. I had no serial set up, so it is unclear whether it was the exact same crash. I updated to 06-14-2022-r49197 and the crash happened after ~18 hours or so. Both of these happened while I was not logged into the router. I then hooked up serial, and after about 16 hours the crash happened while I was applying settings in the GUI. Serial output below.
Both builds were set up from scratch. Load build, then Administration -> Factory Defaults.
Setup specifics:
DHCP on
IPv4 used / IPv6 off
SFE set to CTF
DNS forwarded to piHole (WAN DNS ignored)
OpenVPN server set up
~20 statically allocated IPs
Access restrictions set for some MAC addresses
Dynamic DNS set up via Custom selection
2.4GHz radio ON (WPA2, forced AES)
5.0GHz radio OFF
16GB USB formatted FAT32 plugged in
UPnP OFF with some ports manually forwarded
SSH + Remote SSH are turned on
CFE for R6300 version: v1.0.2
Build Date: Wed Apr 25 16:29:10 CST 2012
Init Arena
Init Devs.
Boot partition size = 262144(0x40000)
Found an ST compatible serial flash with 32 64KB blocks; total size 2MB
Found a Samsung NAND flash with 2048B pages or 128KB blocks; total size 128MB
et0: Broadcom BCM47XX 10/100/1000 Mbps Ethernet Controller 5.100.138
CPU type 0x19749: 600MHz
Tot mem: 131072 KBytes
Another dump (easily reproduced by switching from CTF back to SFE, then to CTF a couple of times):
Code:
Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
ip6tables v1.8.5 (legacy): can't initialize ip6tables table `mangle': Address family not supported by protocol
Perhaps ip6tables or your kernel needs to be upgraded.
Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
ip6tables v1.8.5 (legacy): can't initialize ip6tables table `mangle': Address family not supported by protocol
Perhaps ip6tables or your kernel needs to be upgraded.
Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Another app is currently holding the xtables lock. [ 429.374824] CPU 0 Unable to handle kernel paging request at virtual address 6f70707e, epc ==0
Perhaps you want to use the -w option?
[ 429.389918] Oops[#1]:
[ 429.395831] CPU: 0 PID: 7282 Comm: awk Tainted: P 4.4.302-st14 #16720
[ 429.403826] task: 8794e450 ti: 860c4000 task.ti: 860c4000
[ 429.409402] $ 0 : 00000000 00000001 6f707070 000006d8
[ 429.414841] $ 4 : 00000000 0000325f 00000000 700043a1
[ 429.420270] $ 8 : 00000001 860fe244 00000000 0000f4c0
[ 429.425700] $12 : 0000035e 00000d71 0000add4 00000019
[ 429.431130] $16 : 86f95200 86077b00 854f7680 87b4f500
[ 429.436569] $20 : 860fe23e 0000008a 87b48884 87b27000
[ 429.441998] $24 : 00000001 8002a6e0
[ 429.447428] $28 : 860c4000 87809d58 00000000 87b42c50
[ 429.452859] Hi : 00000049
[ 429.455833] Lo : d1eb8646
[ 429.458824] epc : 87b42898 0x87b42898
[ 429.462791] ra : 87b42c50 0x87b42c50
[ 429.466747] Status: 1100fc03 KERNEL EXL IE
[ 429.471103] Cause : 00800008 (ExcCode 02)
[ 429.475241] BadVA : 6f70707e
[ 429.478217] PrId : 00019749 (MIPS 74Kc)
[ 429.482265] Modules linked in: nf_nat_pptp nf_conntrack_pptp nf_nat_proto_gre nf_conntrack_proto_gre msdos vfat fat nls_utf8 nls_iso8859_2 nls_]
[ 429.517269] Process awk (pid: 7282, threadinfo=860c4000, task=8794e450, tls=77762824)
[ 429.525352] Stack : 860fe23e 860fe244 00000000 00000000 00000011 00009411 00009411 00000000
00000000 00000000 860fe23e 860fe244 00000002 86cb7000 00000800 00000000
00000000 00000800 86d37800 854f7680 86fed1f0 00000001 86f95200 00000001
87b27420 87809df0 0000001e 00000001 87ade000 860fe220 00000000 86cad7a0
85fb99ac 00000001 00000000 0000001e 86cb0000 87ade200 86f95200 00020148
...
[ 429.562342] Call Trace:
[ 429.564872] [<87b42898>] 0x87b42898
[ 429.568470]
[ 429.570009]
Code: 8c42009c 50400013 3c0487b4 <9443000e> 50600010 3c0487b4 0043b821 12e0000c 8fa20054
[ 429.580385] ---[ end trace 7aa59f82f4fa932d ]---
[ 429.585166] Kernel panic - not syncing: Fatal exception in interrupt
[ 429.591724] Please stand by while rebooting the system...
Decompressing...done
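A side note on reading the Cause line in the dump above: on MIPS32 the exception code sits in bits 6..2 of the Cause register, so the "(ExcCode 02)" the kernel prints can be recomputed from the raw value. The sketch below is illustrative only. `decode_cause` and `EXC_CODES` are names made up here, and the table lists just a few of the architecture's codes. The second value, 00800010, is the one from the later dump in this thread.

```python
# Subset of MIPS32 Cause.ExcCode values (names as used in the architecture manuals).
EXC_CODES = {
    0: "Int (interrupt)",
    1: "Mod (TLB modification)",
    2: "TLBL (TLB miss on load or instruction fetch)",
    3: "TLBS (TLB miss on store)",
    4: "AdEL (address error on load or instruction fetch)",
    5: "AdES (address error on store)",
}


def decode_cause(cause):
    """Return (exc_code, description) for a raw 32-bit Cause register value."""
    exc_code = (cause >> 2) & 0x1F  # ExcCode lives in bits 6..2
    return exc_code, EXC_CODES.get(exc_code, "other (not listed in this sketch)")


# Raw Cause values copied from the two serial dumps in this thread.
for raw in ("00800008", "00800010"):
    code, name = decode_cause(int(raw, 16))
    print(f"Cause {raw}: ExcCode {code:02d} = {name}")
```

Read this way, the first dump is a load from an unmapped address (matching "Unable to handle kernel paging request") and the later one is a misaligned load.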
No IPv6 and yet logs seem to tell something else with the ip6tables spread all over... mmm
Yup, I think those errors in the log are due to IPv6 being off. They are there on a completely fresh boot after Factory Reset (since IPv6 is off by default).
Code:
ip6_tables: (C) 2000-2006 Netfilter Core Team
ip6tables v1.8.5 (legacy): can't initialize ip6tables table `mangle': Address family not supported by protocol
Perhaps ip6tables or your kernel needs to be upgraded.
ip6tables v1.8.5 (legacy): can't initialize ip6tables table `mangle': Address family not supported by protocol
Perhaps ip6tables or your kernel needs to be upgraded.
ip6tables v1.8.5 (legacy): can't initialize ip6tables table `mangle': Address family not supported by protocol
Perhaps ip6tables or your kernel needs to be upgraded.
ip6tables v1.8.5 (legacy): can't initialize ip6tables table `mangle': Address family not supported by protocol
Perhaps ip6tables or your kernel needs to be upgraded.
ip6tables v1.8.5 (legacy): can't initialize ip6tables table `mangle': Address family not supported by protocol
Perhaps ip6tables or your kernel needs to be upgraded.
ip6tables v1.8.5 (legacy): can't initialize ip6tables table `mangle': Address family not supported by protocol
Perhaps ip6tables or your kernel needs to be upgraded.
ip6tables v1.8.5 (legacy): can't initialize ip6tables table `mangle': Address family not supported by protocol
Perhaps ip6tables or your kernel needs to be upgraded.
And nope, I upgraded from stock firmware to 06-07-2022-r49113, then when it crashed updated to 06-14-2022-r49197, each time doing a Factory Reset after flashing and setting up from scratch.
It appears that settings don't matter; this is easily reproducible with defaults after a fresh factory reset. Simply set to CTF (apply), then to SFE (apply), then back to CTF (apply). After 3 to 5 times it will kernel panic (I've seen it on both SFE and CTF).
I can reproduce this all day long. But remember that it also crashes when nobody is touching it; that just takes a long time. This is a way to crash it within a minute or two of a fresh boot.
Here is another one on last year's build (I wanted to see whether this is a new issue). Firmware: DD-WRT v3.0-r46788 giga (05/28/21), all settings at default after a fresh reset:
Code: 8c57009c 12e00011 00000000 <96e2000e> 1040000e 02e2b821 12e0000c 8fa20054 24040001
e upgraded.
---[ end trace 9c096458fff5e8ab ]---
Kernel panic - not syncing: Fatal exception in interrupt
Please stand by while rebooting the system...
What's the outcome of saving and rebooting?
The original issue was observed when everything was set up, saved, and the router rebooted. Then after 4 days it crashed.
Swapping the option and saving is just a fast way to induce a crash.
the-joker wrote:
There are others with this router on latest fw without issues.
From my personal observation most people update their builds regularly. So if it takes many days to crash they may never get to the point where crash occurs.
Does this happen without either SFE or CTF enabled, whatsoever?
Don't know; I would have to turn it off and let it run for a week or more to be sure.
But if you look at the logs from successful applies, it appears to crash when it tries to execute the code for:
Quote:
ip6tables v1.8.5 (legacy): can't initialize ip6tables table `mangle': Address family not supported by protocol
Perhaps ip6tables or your kernel needs to be upgraded.
sfe : shortcut forwarding engine successfully started<-- CRASH HAPPENS HERE
device vlan2 left promiscuous mode
Quote:
ip6tables v1.8.5 (legacy): can't initialize ip6tables table `mangle': Address family not supported by protocol
Perhaps ip6tables or your kernel needs to be upgraded.
ip6tables v1.8.5 (legacy): can't initialize ip6tables table `mangle': Address family not supported by protocol
Perhaps ip6tables or your kernel needs to be upgraded.
ctf : fast path forwarding successfully started <-- CRASH HAPPENS HERE
Posted: Thu Jun 16, 2022 23:06 Post subject:
Duxa wrote:
From my personal observation most people update their builds regularly. So if it takes many days to crash they may never get to the point where crash occurs.
Very few people update regularly, so I disagree. That comes from experience with people asking for help on these boards versus people reporting in the build threads.
Mostly people run some ancient, outdated build, or insist on running something outdated because it somehow works for their specific setup. I quite believe that scenario, since a build can show no issues with one specific setup while something goes awry with another on the same machine.
There are several others with your router who run CTF and switch it around to SFE without your results; otherwise I'm sure they would have said something. So it has to be some specific user setup or something else.
So good luck, sorry I can't be of help to you.
There is literally nothing unique about my setup. I confirmed that the issue happens with everything set to defaults after a fresh clean to defaults. Not a single setting is changed from defaults (other than changing the CTF/SFE).
So, I don't see how it can be my "setup".
I thought maybe it was having the FAT32 USB stick plugged in, but it happens without it as well.
Here is another dump with the crash in a different spot, again with everything at default and the USB stick unplugged. The iptables complaints below come from the default iptables rules:
Code:
[ctf] : fast path forwarding successfully started
[process_monitor] : daemon successfully stopped
[process_monitor] : set timer: 3600 seconds, callback: ntp_main()[process_monitor] : cleanup timers[process_monitor] : successfully started
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
[ 2031.172308] Unhandled kernel unaligned access[#1]:
[ 2031.177281] CPU: 0 PID: 0 Comm: swapper Tainted: P 4.4.302-st14 #16792
[ 2031.185373] task: 8046b320 ti: 8045e000 task.ti: 8045e000
[ 2031.190948] $ 0 : 00000000 80520000 005b96a5 00000000
[ 2031.196387] $ 4 : 00000238 0002a413 00000002 76406eb8
[ 2031.201825] $ 8 : 00000000 00000000 00000000 00000000
[ 2031.207256] $12 : 0000035e 000004bf 0000c750 00000000
[ 2031.212694] $16 : 85aa2400 86f26200 858bf080 87bcd500
[ 2031.218124] $20 : 860e1f1e 00000085 87bc8884 879e3000
[ 2031.223554] $24 : 00000000 8002a6e0
[ 2031.228992] $28 : 8045e000 87809d58 00000001 87bc32bc
[ 2031.234423] Hi : 004938cf
[ 2031.237397] Lo : 8c8b68ab
[ 2031.240383] epc : 87bc2898 0x87bc2898
[ 2031.244345] ra : 87bc32bc 0x87bc32bc
[ 2031.248301] Status: 1100bc03 KERNEL EXL IE
[ 2031.252649] Cause : 00800010 (ExcCode 04)
[ 2031.256787] BadVA : 005b96b3
[ 2031.259763] PrId : 00019749 (MIPS 74Kc)
[ 2031.263810] Modules linked in: nf_nat_pptp nf_conntrack_pptp nf_nat_proto_gre nf_conntrack_proto_gre ip6_tables tun b5301x_srab b5301x_common w]
[ 2031.281773] Process swapper (pid: 0, threadinfo=8045e000, task=8046b320, tls=00000000)
[ 2031.289946] Stack : 860e1f1e 860e1f24 00000000 00000000 00000006 000086c4 0000bb01 00000000
a8fe573f a7ff0444 860e1f1e 860e1f24 00000001 879e3000 00000800 00000000
00000002 00000800 8734e000 858bf080 860354d8 00000000 85aa2400 00000001
879e3420 87809df0 0000001e 00000001 8716b000 860e1f00 00000000 86dcd7a0
871bc4a0 00000001 871bc4cc 0000001e 86dd0000 8716b400 85aa2400 87809df0
...
[ 2031.326936] Call Trace:
[ 2031.329465] [<87bc2898>] 0x87bc2898
[ 2031.333063]
[ 2031.334603]
Code: 8c42009c 50400013 3c0487bc <9443000e> 50600010 3c0487bc 0043b821 12e0000c 8fa20054
[ 2031.344972] ---[ end trace cee3785b21292ea5 ]---
[ 2031.349741] Kernel panic - not syncing: Fatal exception in interrupt
[ 2031.356300] Please stand by while rebooting the system...
Decompressing...done
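Both dumps bracket the same faulting instruction word, `<9443000e>`, in their Code: lines. Assuming standard MIPS32 I-type encoding, that word decodes to `lhu $3, 14($2)` (a 16-bit load), and adding 14 to the `$2` value from each register dump reproduces the reported BadVA. A minimal sketch of that check follows; `decode_itype` is a helper name made up for this note.

```python
def decode_itype(word):
    """Split a MIPS32 I-type instruction word into (opcode, rs, rt, signed imm)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:  # sign-extend the 16-bit immediate
        imm -= 0x10000
    return opcode, rs, rt, imm


LHU = 0x25  # MIPS32 opcode for "load halfword unsigned"

opcode, rs, rt, imm = decode_itype(0x9443000E)
print(f"opcode=0x{opcode:02x} rs=${rs} rt=${rt} imm={imm}")

# ($2 from the register dump, BadVA) for the oopses at 429 s and 2031 s.
dumps = [(0x6F707070, 0x6F70707E), (0x005B96A5, 0x005B96B3)]
for base, bad_va in dumps:
    addr = (base + imm) & 0xFFFFFFFF
    print(f"$2={base:08x} +{imm} -> {addr:08x} "
          f"matches BadVA={addr == bad_va} odd address={bool(addr & 1)}")
```

If that reading is right, the code at the fault site is the same in both crashes and only the pointer in `$2` is bad: once an unmapped address, once an odd address that a halfword load cannot use.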
Posted: Fri Jun 17, 2022 6:43 Post subject:
Duxa wrote:
I confirmed that the issue happens with everything set to defaults after a fresh clean to defaults. Not a single setting is changed from defaults (other than changing the CTF/SFE).
So, I don't see how it can be my "setup".
Fair enough, so it's not your setup. I only mentioned it because some things only happen after some particular setting is enabled.
As for logs: the voluntarily provided snippets of the serial logs are greatly appreciated.
Please provide complete serial logs from boot to the time the issue happens and, if it crashes, all the extra lines until the reboot starts logging again. That way we can see the full picture. (Obviously you are invited to replace sensitive information with e.g. xxxx, as long as nothing else is omitted.)
Attach those to your reply as a text file.
Inline smaller snippets are fine as a reference to a specific portion (one you believe has bearing, for instance) that is present in the full attached log.
ATM I can't see any clear indication of anything being wrong. What I will do is ask someone in the forums who has the same router to chime in with his testing, because I don't think he ever had this issue.
I have currently set it up to output everything to the console and am recording the console to a file. I want to see whether a crash that happens naturally (not induced by SFE/CTF switching) looks similar to an induced one. This may take several days (the first time it happened after 4 days). I'll post logs once it happens.
Right now I'd like to know if this is a bug affecting all routers (not just the r6300v1). One thing that is not obvious unless you are watching serial is that when you switch CTF or SFE and press Apply, it takes about 5 seconds for things to actually start applying, and then another 30 seconds or so to finish (with interfaces going down/up, etc.).
So the process is not: set SFE, Apply, CTF, Apply, SFE, Apply, etc.
It's more like...
set CTF, Apply,
wait 1 minute.
set SFE, Apply
wait 1 minute
set CTF, apply
wait 1 minute
etc...
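The toggle-and-wait procedure above could be scripted along these lines. This is only a sketch under assumptions: `apply_mode` and `is_alive` are hypothetical callables you would have to supply yourself (for example something that drives the GUI, and a ping check), since nothing in this thread shows a scripting interface for the Apply button. The 60-second wait mirrors the "wait 1 minute" step.

```python
import time
from typing import Callable, Optional


def toggle_until_crash(apply_mode: Callable[[str], None],
                       is_alive: Callable[[], bool],
                       cycles: int = 10,
                       settle_seconds: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Alternate CTF/SFE, waiting after every Apply.

    Returns the number of applies done when the router stopped responding,
    or None if it survived all cycles.
    """
    modes = ("CTF", "SFE")
    for attempt in range(1, cycles + 1):
        apply_mode(modes[(attempt - 1) % 2])  # hypothetical: set mode and click Apply
        sleep(settle_seconds)                 # let the interfaces come back up
        if not is_alive():                    # hypothetical: e.g. a ping to the router
            return attempt
    return None


# Dry run with stand-ins: no real router is touched and no real waiting happens.
applied = []
result = toggle_until_crash(applied.append,
                            is_alive=lambda: len(applied) < 4,
                            sleep=lambda seconds: None)
print(applied, result)
```

Passing `sleep` in as a parameter is only there so the loop can be dry-run without waiting a minute per step.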
You can see a log of what happens when you apply here. Note that line 1 is when Apply is clicked; line 223 is the crash.
From the various dumps I posted you can see that it's almost always a different process that is crashing, and the crash is an unaligned access. This tells me something is writing beyond its malloc'd memory.
One of the crashes was attributed to PID 0 (swapper). There is pretty much no way something is wrong with swapper itself; something else wrote where it wasn't supposed to.
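To compare crashes across a long capture, something like the sketch below could pull the crash type and the `Comm:` name out of each oops header. The regular expressions are assumptions based only on the two header formats visible in this thread, and `summarize_oopses` is a made-up name. Note also that every panic here ends with "Fatal exception in interrupt", so the named process may simply be whatever task happened to be running when the fault hit, which would fit it being a different one each time.

```python
import re

# Header formats taken from the dumps in this thread; other kernels may differ.
OOPS_RE = re.compile(r"(Oops|Unhandled kernel unaligned access)\[#\d+\]")
COMM_RE = re.compile(r"PID: (\d+) Comm: (\S+)")


def summarize_oopses(log_text):
    """Return a list of (crash kind, pid, comm) for each oops found in the log."""
    results = []
    kind = None
    for line in log_text.splitlines():
        header = OOPS_RE.search(line)
        if header:
            kind = header.group(1)
            continue
        comm = COMM_RE.search(line)
        if comm and kind is not None:
            results.append((kind, int(comm.group(1)), comm.group(2)))
            kind = None
    return results


SAMPLE = """\
[ 429.389918] Oops[#1]:
[ 429.395831] CPU: 0 PID: 7282 Comm: awk Tainted: P 4.4.302-st14 #16720
[ 2031.172308] Unhandled kernel unaligned access[#1]:
[ 2031.177281] CPU: 0 PID: 0 Comm: swapper Tainted: P 4.4.302-st14 #16792
"""

for entry in summarize_oopses(SAMPLE):
    print(entry)
```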
That's question 1. Can you take whatever router you have, set CTF, apply, wait 1 minute, set SFE, apply, wait 1 minute, and so on until you have done it about 10 times? It's possible you will see the same issue if it's a malloc issue.
Question 2, which I'd like to get BS to look at: these errors are seen with default values, right after a factory reset.
iptables: No chain/target/match by that name. <-- This one appears while DD-WRT is simply idling. I did some poking around and it looks like it may be related to blocking WAN (outside) pings. Perhaps the binaries got updated and discontinued switches are being used (or there is a case-sensitivity bug).
ip6tables v1.8.5 (legacy): can't initialize ip6tables table `mangle': Address family not supported by protocol
Perhaps ip6tables or your kernel needs to be upgraded.
This is seen every time during boot or a restart of interfaces, even though IPv6 is turned off. So the question is twofold:
a) Should this IPv6 business be bypassed if it is toggled off?
b) Should ip6tables be updated for newer kernels?
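For (a), the kind of guard I have in mind looks roughly like this. It is a sketch, not how the firmware actually builds its firewall: `firewall_commands` and `ipv6_usable` are made-up names, the rule arguments are placeholders, and using the presence of `/proc/net/if_inet6` as the test for a loaded IPv6 stack is a common heuristic rather than something confirmed in this thread. The `-w` flag is the wait-for-lock option that the xtables message in the earlier dump suggests.

```python
import os


def ipv6_usable(ipv6_enabled, proc_path="/proc/net/if_inet6"):
    """True only if IPv6 is enabled in settings AND the kernel exposes an IPv6 stack."""
    return bool(ipv6_enabled) and os.path.exists(proc_path)


def firewall_commands(rule_args, ipv6_enabled, proc_path="/proc/net/if_inet6"):
    """Build the command lines to run; ip6tables is skipped when IPv6 is unusable."""
    cmds = [["iptables", "-w", *rule_args]]
    if ipv6_usable(ipv6_enabled, proc_path):
        cmds.append(["ip6tables", "-w", *rule_args])
    return cmds


# Placeholder rule: list the mangle table. With IPv6 off, only iptables is emitted.
for cmd in firewall_commands(["-t", "mangle", "-L", "-n"], ipv6_enabled=False):
    print(" ".join(cmd))
```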
PS - Oh hey, since the UI guy is on the line: perhaps showing a 30-second "Applying" spinner when settings are applied in the Setup -> Basic Setup tab would be a good UI/UX move? Currently it's misleading: you hit Apply and the settings won't be applied for another minute, so you shouldn't change another thing and hit Apply before then.