Non booting WL-500W and fix

DD-WRT Forum Index -> Broadcom SoC based Hardware
chbm
DD-WRT Novice


Joined: 30 Jul 2009
Posts: 2

Posted: Thu Jul 30, 2009 21:25    Post subject: Non booting WL-500W and fix
Hi, my WL-500W wasn't booting and rescue mode wouldn't work. I didn't find anything helpful so I'm documenting my case here.

The symptoms were pretty strange. The router wouldn't boot. During the boot process it would answer 2 pings and then stop. It went into rescue mode with the rescue button but when I started tftp it would just reboot. This is with DD-WRT v24 (05/24/08) mega (pretty ancient; it stopped working and I shelved it).

I got around to attaching a serial console and it would go through CFE, load the kernel and then just stop after "Closing network". When I interrupted the boot process any kind of network traffic, pings or tftp included, would make CFE stop responding. Rescue mode would throw an exception after the first tftp packet.

It was getting pretty dire till I started playing around with CFE and noticed that nvram show would crash it just after the sshd_dss_host_key line. Same thing for sshd_rsa_host_key. Dunno how they made it into the nvram or why they crashed it, but after a
Code:

nvram unset sshd_dss_host_key
nvram unset sshd_rsa_host_key
nvram commit

nvram show started working again and the router booted right up.
barryware
DD-WRT Guru


Joined: 26 Jan 2008
Posts: 13049
Location: Behind The Reset Button

Posted: Thu Jul 30, 2009 21:29
Nice job.. Good info.

You on a linux box? I can't bust into the boot with a windows box using HT or putty.

Murrkf
DD-WRT Guru


Joined: 22 Sep 2008
Posts: 12675

Posted: Thu Jul 30, 2009 21:32
Does that code provide any insight into why the WL500W routers, in particular, brick with the late v24/SP1 builds like no other routers do?
_________________
SIG:
I'm trying to teach you to fish, not give you a fish. If you just want a fish, wait for a fisherman who hands them out. I'm more of a fishing instructor.
LOM: "If you show that you have not bothered to read the forum announcements or to follow the advices in them then the level of help available for you will drop substantially, also known as Murrkf's law.."
redhawk0
DD-WRT Guru


Joined: 04 Jan 2007
Posts: 11564
Location: Wherever the wind blows- North America

Posted: Thu Jul 30, 2009 21:53
Ditto on the "Nice job" This is great info....thanx for sharing it.

I have added a link to this thread in the wiki for this model.



barryware wrote:
You on a linux box? I can't bust into the boot with a windows box using HT or putty.


The Ctrl-C window to break into the CFE is very narrow... it's only about 250 ms right at power on.

Start with a rapid-fire Ctrl-C as you plug it in.

Some models are good about giving you a wide window (like 2 seconds)... then others are so quick you have to try 15 times to get it right... this model is one of the latter (as well as many of the Netgear units).
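
If fingers aren't fast enough, the rapid-fire Ctrl-C can be scripted. Below is a rough Python sketch of the idea, not a tested tool for this router. It assumes the serial adapter has already been opened by something like pyserial (not shown here, and nobody in this thread has confirmed using it); `spam_ctrl_c` and its parameters are made-up names. All it does is write the Ctrl-C byte (0x03) to any file-like port at a fixed interval.

```python
import io
import time

CTRL_C = b"\x03"  # ASCII ETX, the byte a terminal sends for Ctrl-C


def spam_ctrl_c(port, duration_s=5.0, interval_s=0.02, sleep=time.sleep):
    """Write Ctrl-C to `port` every `interval_s` seconds for about `duration_s` seconds.

    `port` is any object with write() (and optionally flush()). For a real
    router this would be an already opened serial port (assumption).
    Returns the number of Ctrl-C bytes written.
    """
    count = max(1, round(duration_s / interval_s))
    for _ in range(count):
        port.write(CTRL_C)
        if hasattr(port, "flush"):
            port.flush()
        sleep(interval_s)
    return count


if __name__ == "__main__":
    # Dry run against an in-memory buffer instead of real hardware.
    fake_port = io.BytesIO()
    sent = spam_ctrl_c(fake_port, duration_s=1.0, interval_s=0.25, sleep=lambda s: None)
    print(sent, "Ctrl-C bytes written to the fake port")
```

The idea would be to start the script first and then plug in the power. With a 20 ms interval several bytes should land inside a 250 ms window, but the right interval for this CFE is a guess.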

redhawk

_________________
The only stupid question....is the unasked one.
thenextdon13
DD-WRT User


Joined: 04 Nov 2006
Posts: 89
Location: The Dalles, Oregon USA

Posted: Thu Sep 24, 2009 6:54
chbm--

What exception was cfe throwing when it received traffic?
Was it panic:out of memory?

I have been documenting my testing of nvram show problems in this thread:
http://dd-wrt.com/phpBB2/viewtopic.php?t=58413&highlight=

I have tracked the CFE 'nvram show' crashing down to all of the ssh and vpn keys and certs used in my config--

nvram show would crash with all of them except the openvpn_dh

The fact that you debricked your router by simply removing the two ssh keys indicates that cfe is having problems reading/parsing them during boot time in certain instances for some reason.

This causes a problem because most folks that want a hefty router such as the wl500w want to enable things such as VPN and SSH.

Initially I thought the problem was associated with the endline or variable terminating characters because nvram show would always display the entire variable, including its concluding pattern (i.e. end certificate) before crashing.

However, if you simply remove the data out of the variable and assign aaaa to it, nvram show works and completes successfully.

It seems that a closer look into the cfe's parsing method is needed.

There are 2 issues here it seems:
1. While normally parsing via nvram show, there are issues raised with certain patterns or sets of text. What patterns are they and why does it break the code?

2. In some instances during the sequence that loads nvram at boot time these patterns appear to crash cfe as well. What causes this? Is it a particular memory location? The first components of the next variable?

So, either way there is a parsing error in the cfe code, or the parsing algorithm in the code wasn't made to separate strings like those that are being used.

Why does it only sometimes strike in such a manner to brick the router?

How can it be fixed?

If we can identify the parsing error then perhaps we can get the dd-wrt developers to change the algorithm used to save the keys-- but that seems like a _lot_ of work for one router model.

The other option is to fix the bug in cfe and recompile, or compile a newer version of CFE for the router... which the general public will not want to do, and i'm not sure if you can do at all without voiding warranty.

Thoughts/Suggestions?
thenextdon13
DD-WRT User


Joined: 04 Nov 2006
Posts: 89
Location: The Dalles, Oregon USA

Posted: Thu Sep 24, 2009 6:58
chbm indicates in here:
http://www.dd-wrt.com/phpBB2/viewtopic.php?t=55736&highlight=

that he simply unset the variables causing the nvram show crash, then ran nvram commit, and that fixed his bricking problem...

This indicates the two are linked. Not sure how i missed this on the first time through the wiki...
t3chm@n
DD-WRT Novice


Joined: 02 Oct 2008
Posts: 20
Location: New York

Posted: Thu Sep 24, 2009 14:51    Post subject: Re: Non booting WL-500W and fix
This is AWESOME info!

I am sure this may be the case with many people that use the WL500W.

I also was using SSH and VPN when my router stopped working after a firmware upgrade.
I simply assumed that my issue was caused because I did not restore to factory defaults during the upgrade. Factory defaults would wipe out my ssh / VPN configurations.

It may be prudent to make a note in the WIKI that if people do use SSH they need to wipe out that info before an update.

Just my 2 cents.

Kind regards,

_________________
Kind Regards,

t3chm@n
thenextdon13
DD-WRT User


Joined: 04 Nov 2006
Posts: 89
Location: The Dalles, Oregon USA

Posted: Thu Sep 24, 2009 16:14
The problem is that sometimes bricking occurs even if an upgrade has not occurred. Sometimes it happens randomly (perhaps during a traffic graph nvram commit) and sometimes on reboot.

In another thread someone has recommended trying a build newer than 12533.

I can verify newer versions to see if nvram show at cfe crashes cfe with keys, and report back in WIKI or other.

Hopefully will find out how to get newer build and will be able to test it soon, will post results.
autobot
DD-WRT Guru


Joined: 07 May 2009
Posts: 1596

Posted: Thu Sep 24, 2009 16:24
thenextdon13 wrote:

Hopefully will find out how to get newer build and will be able to test it soon, will post results.


Click one of the links in my signature, grab the latest and give it a go.

_________________
Eko Builds

BrainSlayer Builds

DD-WRT Changelog RSS Feed
thenextdon13
DD-WRT User


Joined: 04 Nov 2006
Posts: 89
Location: The Dalles, Oregon USA

Posted: Thu Sep 24, 2009 16:46
Hmm I don't see a Brainslayer build in your sig that would work (all of the newest dirs exclude asus or generic)

Now I see that the EKO builds must be where 12533 comes from..


There are three builds newer than 12533
Code:

   svn12533   <DIR>   16 items   27-07-09
   svn12548   <DIR>   51 items   22-07-09
   svn12714   <DIR>   29 items   23-08-09
   svn12774   <DIR>   36 items   01-09-09


Should I be grabbing VINT, NEWD or NEWD-2, or how do I find out? (I see that 12533 is only VINT... that would indicate that maybe I need to stay with VINT.)

Hmm i think.. this looks like the newest build of VINT driver and mega..
http://www.dd-wrt.com/dd-wrtv2/downloads/others/eko/V24_TNG/svn12774/dd-wrt.v24-12774_VINT_mega.bin

Please confirm this is correct build for asus wl500w, and i will test it tonight...

thanks
LOM
DD-WRT Guru


Joined: 28 Dec 2008
Posts: 7647

Posted: Thu Sep 24, 2009 16:52
Your old build could just as well be a Brainslayer build; you can check that by clicking on the version number in the logo field of the router's GUI.

Here is Brainslayers 12533, it will help you navigate in his recent builds:

http://www.dd-wrt.com/dd-wrtv2/down.php?path=downloads/others/eko/BrainSlayer-V24-preSP2/07-21-09-r12533/broadcom/

You don't want a VINT build, it should be a NEWD.
(Eko is the only one also doing VINT builds, all Brainslayer builds are NEWD)

_________________
Kernel panic: Aiee, killing interrupt handler!
thenextdon13
DD-WRT User


Joined: 04 Nov 2006
Posts: 89
Location: The Dalles, Oregon USA

Posted: Fri Sep 25, 2009 2:42
Thanks for the help!

Have now given BS 12966 a try from here:

http://www.dd-wrt.com/dd-wrtv2/downloads/others/eko/BrainSlayer-V24-preSP2/09-24-09-r12966/broadcom/dd-wrt.v24_mega_generic.bin

Unfortunately the CFE still crashes while parsing keys during nvram show :(

First, here is what happens when I unset the keys in the RAM copy of nvram and then run nvram show:

Code:

Null Rescue Flag.
Reading :: TFTP Server.
Failed.: Interrupted
CFE> ^C
CFE> ^C
CFE> nvram unset sshd_dss_host_key
*** command status = 0
CFE> nvram unset sshd_rsa_host_key
*** command status = 0
CFE> nvram show
...
...
oet1_fragment=0
oet5_rem=192.168.90.1
size: 24577 bytes (8191 left)
*** command status = 0
CFE>



Works fine.

Now here is nvram show after a clean boot. I am including the key in this post because I will be changing it.
Code:

Reading :: TFTP Server.
Failed.: Interrupted
CFE> ^C
CFE> ^C
CFE> ^C
CFE> nvram show
...
...
sshd_dss_host_key=-----BEGIN DSA PRIVATE KEY-----
MIIBuwIBAAKBgQC09J8yMD3C9HsuoDSSAPZ0r8I40ETUkgNPARhlRbdpy2QrubLD
ILeMSBuRkDqe56l/cI73aEjsEOsYCRs71mOVj+kx87u/KEn2BD/zhNAwux9Lr/xA
ho6WlklB90NSGXTgndWsLhzmx1Yw84yeduusLI2Lwl5rPDJgkwXE4zr2IwIVALGP
tTklXVKMFIfgBrcf3eb8RHlhAoGBAIBxfDbzNxh+x+ecCRx+Xv/yyTHoME9FvU23
ByeJBS5V5Y7ge7i/NPDTP1M51gF57lbKlOQFnvimOKgZVv/Me/l0BsdjsjFahxKl
92yCOConw2uWiTx8a95rvVQkDHEA0a3QCbLoRodKGlD86M662YtsPTpXwwdGVSgV
aNcsCiinAoGAZ0dcY+62zbXSvFjGQyKicHqjhSri6Pk+En5XHTEn8eR8iYEJ/s+9
TzSWZ5aIZ7cgpiBZlxO/wYLhkRLggXPgKopTeQmRI5MdilK3k6LFr31zhYV5Fatm
4LQ99I5MHEF8xa2tzSs+O4vHgF5Oqs+M5PHywFrBKEkjKw5jWeebBowCFDg/aIrE
ek4XNBA+2CzjUnptnWIK
-----END DSA PRIVATE KEY-----

[MANUAL REBOOT OF ROUTER HERE AT HANG]



CFE version 1.0.37 for BCM947XX (32bit,SP,LE)
Build Date: |  7� 26 16:41:16 CST 2007 (root@localhost.localdomain)
Copyright (C) 2000,2001,2002,2003 Broadcom Corporation.

Initializing Arena
Initializing Devices.
et0: Broadcom BCM47xx 10/100 Mbps Ethernet Controller 3.90.23.0
rndis0: Broadcom USB RNDIS Network Adapter (P-t-P)
et1: Broadcom BCM47xx 10/100 Mbps Ethernet Controller 3.90.23.0
CPU type 0x29006: 264MHz
Total memory: 33554432 KBytes

Total memory used by CFE:  0x80800000 - 0x8089B840 (636992)
Initialized Data:          0x808319B0 - 0x80834090 (9952)
BSS Area:                  0x80834090 - 0x80835840 (6064)
Local Heap:                0x80835840 - 0x80899840 (409600)
Stack Area:                0x80899840 - 0x8089B840 (8192)
Text (code) segment:       0x80800000 - 0x808319B0 (203184)
Boot area (physical):      0x0089C000 - 0x008DC000
Relocation Factor:         I:00000000 - D:00000000

Device eth0:  hwaddr 00-22-15-89-D5-74, ipaddr 192.168.11.254, mask 255.255.255.0
        gateway not set, nameserver not set
Automatic startup canceled via Ctrl-C
CFE> ^C



Oh, and for the fun part, here's a clean boot followed by nvram get to grab (a different) sshd_dss_host_key:
Code:

CFE> nvram get sshd_dss_host_key
-----BEGIN DSA PRIVATE KEY-----
MIIBugIBAAKBgQCZiAw2ldieiSAXW3o0Ei3qnnfcz1R0JM9S1shU3roOqqlmTA9o
F0v+MdhEJAaLcaDOEO6Q8A6Uasv1ldKwJLneTU7vNtC60jNtDCBq8Z+1Piily2p1
5LTP/Kj+oZGzzFRt2WoNnHFnoz9PpVznb34GnVeaZyxLIsyCbOTk4cMFnwIVAKDx
3TFJAY+IzrzUBKy7BQiBFqZPAoGAcrIqBiy76d6f5V4/6v6+xttQvRMAwbT8pxd/
OB/GELa3GJx9Px4fumNvfK+CU8T/KZnBFTN9VEVmXy8x0wnzuYwUcNDYdCxMLirv
YYeipGmP5R/01FybmwJBnIaJNS7/Ia7ZFJNfahJ5ua5hyenX3nfTM8gfmBISiQfM
VZzrpmYCgYBCK/YrHQTTbM/XZJ+a3rfl/z5E1N54LaCMwB105d8vyxJpsCxQcPSv
jlChSyq1b7ja/RSe8lFN4iiUNyKtjT3uXoWXPkV+uUPk9e18CSPZH16HGqiDuaN5
WYT0s0WNON1iHRkq7Uhw9xv2yHXoUw1gmE5S1ARyP/20InF3AnhmAQIUZYHpyEqr
J0b4uo8eGVuzdEi+AS4=
-----END DSA PRIVATE KEY-----

**Exception 8: EPC=58753354, Cause=00000008 (TLBMissRd)
                RA=58753354, VAddr=58753354

        0  ($00) = 00000000     AT ($01) = 80830000
        v0 ($02) = 0000029D     v1 ($03) = 00000000
        a0 ($04) = 00000000     a1 ($05) = 8089B190
        a2 ($06) = 80835888     a3 ($07) = 00000000
        t0 ($08) = 8089B4FD     t1 ($09) = 80835858
        t2 ($10) = 00000000     t3 ($11) = 00000005
        t4 ($12) = B8000000     t5 ($13) = 00000000
        t6 ($14) = 00000000     t7 ($15) = 00000000
        s0 ($16) = 6A744B79     s1 ($17) = 8089B4A0
        s2 ($18) = FFFFFFFD     s3 ($19) = 8089B6C8
        s4 ($20) = 00000080     s5 ($21) = 00000000
        s6 ($22) = 00000000     s7 ($23) = 00000001
        t8 ($24) = 10000000     t9 ($25) = 00000000
        k0 ($26) = 795A6165     k1 ($27) = 4F624379
        gp ($28) = 808399B0     sp ($29) = 8089B468
        fp ($30) = 00000000     ra ($31) = 58753354



CFE version 1.0.37 for BCM947XX (32bit,SP,LE)
Build Date: |  7� 26 16:41:16 CST 2007 (root@localhost.localdomain)
Copyright (C) 2000,2001,2002,2003 Broadcom Corporation.

Initializing Arena
Initializing Devices.
et0: Broadcom BCM47xx 10/100 Mbps Ethernet Controller 3.90.23.0
rndis0: Broadcom USB RNDIS Network Adapter (P-t-P)
et1: Broadcom BCM47xx 10/100 Mbps Ethernet Controller 3.90.23.0
CPU type 0x29006: 264MHz
Total memory: 33554432 KBytes


Totally crashes and reboots the router.

This means that even with the newest builds, the wl500w will still randomly brick as long as people are enabling sshd and/or entering vpn keys... and perhaps other instances.

The key here is to figure out what is causing the crash... i will do some more investigation into the pattern and perhaps source code.

Other comments/suggestions?
Masterman
DD-WRT Guru


Joined: 24 Aug 2009
Posts: 2070
Location: South Florida

Posted: Fri Sep 25, 2009 3:38
Wow! Some awesome work going on here. Keep it up!

Personally I don't use VPN so therefore haven't experienced any of the problems evident in your discoveries...


What I can say though, is that every Brainslayer build I have tested on my WL500W has caused problems. Eko's builds (NEWD-2) cause NO issues whatsoever. Would like to know why...

_________________
Optware, the Right Way
Asus RT-AC68U
Asus RT-N66U
Asus RT-N10
Asus RT-N12
Asus RT-N16 x5
Asus WL520gU
Engenious ECB350
Linksys WRT600Nv1.1
Linksys WRT610Nv1
Linksys E2000
Netgear WNDR3300
SonicWall NSA220W
SonicWall TZ215W
SonicWall TZ205W
SonicWall TZ105W
LOM
DD-WRT Guru


Joined: 28 Dec 2008
Posts: 7647

Posted: Fri Sep 25, 2009 3:40
thenextdon13 wrote:

This means that even with the newest builds, the wl500w will still randomly brick as long as people are enabling sshd and/or entering vpn keys... and perhaps other instances.



I don't agree with that logic of yours..
The CFE obviously has problems with parsing such a long key and I suspect that it is a buffer overflow problem when copying the key to the serial buffer.
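
The exception dump from the nvram get test earlier in the thread seems to fit an overrun of some kind. Several of the register values look like printable ASCII rather than addresses. The CFE banner says the build is little-endian (LE), so reading each register low byte first gives four-character strings, and those strings can be looked up in the key that had just been printed. A small Python sketch of that check follows. The helper names are made up; the register values and the two key lines are copied from that output.

```python
def reg_to_ascii(value):
    """Return the 4 bytes of a 32-bit register as they would sit in little-endian memory."""
    return value.to_bytes(4, "little").decode("ascii")


# Register values copied from the exception dump.
REGS = {"s0": 0x6A744B79, "ra": 0x58753354, "k0": 0x795A6165, "k1": 0x4F624379}

# The two lines of the printed key that are searched for matches.
KEY_LINES = [
    "5LTP/Kj+oZGzzFRt2WoNnHFnoz9PpVznb34GnVeaZyxLIsyCbOTk4cMFnwIVAKDx",
    "jlChSyq1b7ja/RSe8lFN4iiUNyKtjT3uXoWXPkV+uUPk9e18CSPZH16HGqiDuaN5",
]


def find_in_key(text):
    """Return the index of the first key line containing `text`, or None."""
    for i, line in enumerate(KEY_LINES):
        if text in line:
            return i
    return None


if __name__ == "__main__":
    for name, value in REGS.items():
        text = reg_to_ascii(value)
        print(name, hex(value), repr(text), "in key:", find_in_key(text) is not None)
```

s0 and ra decode to "yKtj" and "T3uX", which sit back to back in the key ("...NyKtjT3uX..."), and EPC in the dump equals ra. That looks like key text ending up in saved registers and the CPU then trying to jump to it. It is consistent with a buffer overrun while handling the value, but it does not show which buffer or which routine.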

You could test that theory by entering a 64 byte key, a 128 byte key, a 256 byte key and finally a 512 byte key and see at which key size the problem occurs.
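
To avoid typing the test values by hand, the commands could be generated. This is only a minimal Python sketch with several assumptions: `test_key` is a made-up variable name, the filler is a repeating base64-style alphabet rather than a real key, and it assumes the CFE prompt accepts `nvram set name=value` in the same way it accepts the unset/get/show/commit commands seen in this thread (not verified on this CFE build).

```python
import itertools

# Characters that appear in base64-encoded key bodies (64 of them).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def filler(length):
    """Return a deterministic base64-looking string of exactly `length` characters."""
    return "".join(itertools.islice(itertools.cycle(ALPHABET), length))


def build_commands(sizes=(64, 128, 256, 512), name="test_key"):
    """Yield (size, set_cmd, get_cmd) tuples, one per value size to try."""
    for size in sizes:
        yield size, f"nvram set {name}={filler(size)}", f"nvram get {name}"


if __name__ == "__main__":
    for size, set_cmd, get_cmd in build_commands():
        print(f"# {size} byte value")
        print(set_cmd)
        print(get_cmd)
    # Clean up the hypothetical test variable afterwards.
    print("nvram unset test_key")
```

Each set/get pair would be pasted at the CFE prompt in turn, without running nvram commit, so that a power cycle discards the test values. Real keys also contain line breaks, which this filler leaves out; if single-line values of the same length turn out to be fine, that would point at the line breaks or the content rather than the length.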

But what you are doing here is something that never takes place; the CFE does not use vpn keys and normally does not touch the nvram.
It is only when you are doing a hard reset (reset button) that the CFE will populate the nvram with variables predefined in the CFE data area.

During normal operation, all nvram writes are handled by routines in dd-wrt.
No CFE routines are called from the kernel so if you are getting a crash caused by nvram writes then it is because of dd-wrt and not the CFE.

_________________
Kernel panic: Aiee, killing interrupt handler!
thenextdon13
DD-WRT User


Joined: 04 Nov 2006
Posts: 89
Location: The Dalles, Oregon USA

Posted: Fri Sep 25, 2009 4:26
Lom-

Thanks for the thoughts, i had asked about the interaction between cfe and the kernel before and not received any reply.

What would be some possible reasons that unsetting the offending variables from CFE in a bricked state, followed by nvram commit, would fix the bricking problem (as chbm wrote in the very beginning of this thread)?

Does cfe not ever load or touch nvram during the boot process?

I have been chasing down different lengths of variable trying to find some combination that causes CFE to puke...

Thanks for the comments and info
Page 1 of 2. All times are GMT.
