Posted: Thu Jul 30, 2009 21:25 Post subject: Non booting WL-500W and fix
Hi, my WL-500W wasn't booting and rescue mode wouldn't work. I didn't find anything helpful so I'm documenting my case here.
The symptoms were pretty strange. The router wouldn't boot. During the boot process it would answer 2 pings and then stop. It went into rescue mode with the rescue button, but when I started tftp it would just reboot. This is with DD-WRT v24 (05/24/0 mega (pretty ancient; it stopped working and I shelved it).
I got around to attaching a serial console and it would go through CFE, load the kernel and then just stop after "Closing network". When I interrupted the boot process, any kind of network traffic, pings or tftp included, would make CFE stop responding. Rescue mode would throw an exception after the first tftp packet.
It was getting pretty dire till I started playing around with CFE and noticed that nvram show would crash it just after the sshd_dss_host_key line. Same thing for sshd_rsa_host_key. Dunno how they made it into the nvram or why they crashed it but after a
Does that code provide any insight into why the WL500W routers, in particular, brick with the late v24/SP1 builds like no other routers do?
Joined: 04 Jan 2007 Posts: 11564 Location: Wherever the wind blows- North America
Posted: Thu Jul 30, 2009 21:53 Post subject:
Ditto on the "Nice job". This is great info... thanks for sharing it.
I have added a link to this thread in the wiki for this model.
barryware wrote:
You on a Linux box? I can't bust into the boot with a Windows box using HT or PuTTY.
The Ctrl-C window to break into the CFE is very narrow... it's only about 250 ms right at power on.
Start with a rapid-fire Ctrl-C as you plug it in.
Some models are good about giving you a wide window (like 2 seconds)... then others are so quick you have to try 15 times to get it right... this model is one of the latter (as are many of the Netgear units).
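The rapid-fire Ctrl-C described above can also be scripted rather than typed by hand. What follows is a minimal sketch in Python, not a tested recipe for this router: the function name, the 20 ms interval and the 3 second duration are all illustrative choices, and `port` is assumed to be any object with a write() method (for example a pyserial Serial object opened on the serial adapter).

```python
import io
import time


def send_ctrl_c_burst(port, duration_s=3.0, interval_s=0.02):
    """Write Ctrl-C (byte 0x03) to `port` repeatedly for `duration_s` seconds.

    `port` is any object with a write(bytes) method. Returns the number
    of Ctrl-C bytes that were written.
    """
    sent = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        port.write(b"\x03")
        sent += 1
        # A 20 ms gap puts several presses inside a ~250 ms window.
        time.sleep(interval_s)
    return sent


if __name__ == "__main__":
    # Dry run against an in-memory buffer, so no hardware is needed.
    fake_port = io.BytesIO()
    count = send_ctrl_c_burst(fake_port, duration_s=0.2, interval_s=0.02)
    print(f"sent {count} Ctrl-C bytes")
```

With pyserial installed, the same function could be handed something like serial.Serial("/dev/ttyUSB0", 115200); both the device path and the baud rate are assumptions to adjust for the adapter and console in use. Start the script first, then plug the router in.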
redhawk
I have tracked the CFE 'nvram show' crash down to the SSH and VPN keys and certs used in my config:
nvram show would crash with all of them except openvpn_dh.
The fact that you debricked your router by simply removing the two ssh keys indicates that CFE has problems reading/parsing them at boot time in certain instances, for some reason.
This causes a problem because most folks who want a hefty router such as the wl500w want to enable things such as VPN and SSH.
Initially I thought the problem was associated with the end-of-line or variable-terminating characters, because nvram show would always display the entire variable, including its concluding pattern (i.e. end certificate), before crashing.
However, if you simply remove the data from the variable and assign aaaa to it, nvram show works and completes successfully.
It seems that a closer look into CFE's parsing method is needed.
There are 2 issues here it seems:
1. While normally parsing via nvram show, there are issues raised with certain patterns or sets of text. What patterns are they and why does it break the code?
2. In some instances during the sequence that loads nvram at boot time these patterns appear to crash cfe as well. What causes this? Is it a particular memory location? The first components of the next variable?
So either there is a parsing error in the CFE code, or the parsing algorithm in the code wasn't made to handle strings like the ones being used.
Why does it only sometimes strike in such a manner to brick the router?
How can it be fixed?
If we can identify the parsing error then perhaps we can get the dd-wrt developers to change the algorithm used to save the keys, but that seems like a _lot_ of work for one router model.
The other option is to fix the bug in CFE and recompile, or compile a newer version of CFE for the router... which the general public will not want to do, and I'm not sure you can do it at all without voiding the warranty.
Posted: Thu Sep 24, 2009 14:51 Post subject: Re: Non booting WL-500W and fix
chbm wrote:
Hi, my WL-500W wasn't booting and rescue mode wouldn't work. I didn't find anything helpful so I'm documenting my case here.
The symptoms were pretty strange. The router wouldn't boot. During the boot process it would answer 2 pings and then stop. It went into rescue mode with the rescue button, but when I started tftp it would just reboot. This is with DD-WRT v24 (05/24/0 mega (pretty ancient; it stopped working and I shelved it).
I got around to attaching a serial console and it would go through CFE, load the kernel and then just stop after "Closing network". When I interrupted the boot process, any kind of network traffic, pings or tftp included, would make CFE stop responding. Rescue mode would throw an exception after the first tftp packet.
It was getting pretty dire till I started playing around with CFE and noticed that nvram show would crash it just after the sshd_dss_host_key line. Same thing for sshd_rsa_host_key. Dunno how they made it into the nvram or why they crashed it but after a
nvram show started working again and the router booted right up.
This is AWESOME info!
I am sure this may be the case with many people that use the WL500W.
I also was using SSH and VPN when my router stopped working after a firmware upgrade.
I simply assumed that my issue was caused because I did not restore to factory defaults during the upgrade. Factory defaults would wipe out my ssh / VPN configurations.
It may be prudent to make a note in the WIKI that if people do use SSH they need to wipe out that info before an update.
Joined: 04 Nov 2006 Posts: 89 Location: The Dalles, Oregon USA
Posted: Thu Sep 24, 2009 16:14 Post subject:
The problem is that sometimes bricking occurs even if an upgrade has not occurred. Sometimes it happens randomly (perhaps during a traffic graph nvram commit) and sometimes on reboot.
In another thread someone has recommended trying a build newer than 12533.
I can test newer versions to see if nvram show at the CFE prompt still crashes CFE with keys set, and report back in the WIKI or elsewhere.
Hopefully I will find out how to get a newer build and be able to test it soon; I will post results.
Should I be grabbing VINT, NEWD or NEWD-2, or how do I find out? (I see that 12533 is only VINT... that would indicate that maybe I need to stay with VINT.)
You don't want a VINT build, it should be a NEWD.
(Eko is the only one also doing VINT builds, all Brainslayer builds are NEWD)
CFE version 1.0.37 for BCM947XX (32bit,SP,LE)
Build Date: Thu Jul 26 16:41:16 CST 2007 (root@localhost.localdomain)
Copyright (C) 2000,2001,2002,2003 Broadcom Corporation.
Initializing Arena
Initializing Devices.
et0: Broadcom BCM47xx 10/100 Mbps Ethernet Controller 3.90.23.0
rndis0: Broadcom USB RNDIS Network Adapter (P-t-P)
et1: Broadcom BCM47xx 10/100 Mbps Ethernet Controller 3.90.23.0
CPU type 0x29006: 264MHz
Total memory: 33554432 KBytes
Totally crashes and reboots the router.
This means that even with the newest builds, the wl500w will still randomly brick as long as people are enabling sshd and/or entering vpn keys... and perhaps other instances.
The key here is to figure out what is causing the crash... I will do some more investigation into the pattern and perhaps the source code.
Joined: 24 Aug 2009 Posts: 2070 Location: South Florida
Posted: Fri Sep 25, 2009 3:38 Post subject:
Wow! Some awesome work going on here. Keep it up!
Personally I don't use VPN so therefore haven't experienced any of the problems evident in your discoveries...
What I can say though, is that every Brainslayer build I have tested on my WL500W has caused problems. Eko's builds (NEWD-2) cause NO issues whatsoever. Would like to know why...
This means that even with the newest builds, the wl500w will still randomly brick as long as people are enabling sshd and/or entering vpn keys... and perhaps other instances.
I don't agree with that logic of yours..
The CFE obviously has problems with parsing such a long key and I suspect that it is a buffer overflow problem when copying the key to the serial buffer.
You could test that theory by entering a 64 byte key, a 128 byte key, a 256 byte key and finally a 512 byte key and see at which key size the problem occurs.
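The size test suggested above could be prepared with a small script that prints one nvram set line per key length. This is only a sketch: test_key is a made-up variable name, and the values are plain letters and digits, so it probes length alone and not the content of a real key or certificate.

```python
import string

# Key lengths suggested for the experiment.
SIZES = (64, 128, 256, 512)


def make_value(length):
    """Return a value of exactly `length` characters.

    Only letters and digits are used, so no quoting or escaping is needed.
    """
    alphabet = string.ascii_letters + string.digits
    repeats = length // len(alphabet) + 1
    return (alphabet * repeats)[:length]


def nvram_set_command(name, length):
    """Build an `nvram set` line that assigns a value of `length` characters."""
    return f"nvram set {name}={make_value(length)}"


if __name__ == "__main__":
    for size in SIZES:
        # "test_key" is a hypothetical variable name for this experiment.
        print(nvram_set_command("test_key", size))
```

Each printed line would be entered at the prompt, followed by nvram show, noting the first size at which the crash appears.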
But what you are doing here is something that never takes place: the CFE does not use vpn keys and does not normally touch the nvram.
It is only when you are doing a hard reset (reset button) that the CFE will populate the nvram with variables predefined in the CFE data area.
During normal operation, all nvram writes are handled by routines in dd-wrt.
No CFE routines are called from the kernel so if you are getting a crash caused by nvram writes then it is because of dd-wrt and not the CFE.
Joined: 04 Nov 2006 Posts: 89 Location: The Dalles, Oregon USA
Posted: Fri Sep 25, 2009 4:26 Post subject:
Lom-
Thanks for the thoughts. I had asked about the interaction between CFE and the kernel before and had not received any reply.
What would be some possible reasons that unsetting the offending variables from CFE in a bricked state and then running nvram commit would fix the bricking problem (as chbm wrote at the very beginning of this thread)?
Does CFE never load or touch nvram during the boot process?
I have been chasing down different lengths of variable trying to find some combination that causes CFE to puke...
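One way to narrow down the length, sketched here under assumptions, is a binary search between a length known to work and a length known to crash. The `crashes` callback stands in for the manual step (set the variable, run nvram show, see whether CFE dies), and the search only makes sense if the crash depends on length alone, which this thread has not established.

```python
def smallest_failing_length(crashes, low, high):
    """Return the smallest length in (low, high] for which crashes(length) is True.

    Assumes crashes(low) is False, crashes(high) is True, and that the
    behaviour is monotonic: once a length crashes, longer ones do too.
    """
    while high - low > 1:
        mid = (low + high) // 2
        if crashes(mid):
            high = mid  # crash reproduced, threshold is at or below mid
        else:
            low = mid   # no crash, threshold is above mid
    return high


if __name__ == "__main__":
    # Stand-in for the manual test; 300 is an invented threshold.
    def simulated_crash(length):
        return length >= 300

    print(smallest_failing_length(simulated_crash, 4, 512))
```

Starting from 4 (the aaaa value that works) and 512, this needs about seven manual tries instead of hundreds.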