BUG - ddwrt locks up with high active connection count

miker
DD-WRT User


Joined: 17 Mar 2009
Posts: 50

Posted: Fri Mar 19, 2010 17:57
dd-wrt v24-sp1 mega build 10012 and v24-sp2 mega build 13064; I have not checked others.

With high connection counts for TCP and UDP, dd-wrt reliably locks up and stops forwarding packets. HTTP and telnet access to the router are locked out as well. Easily repeatable.

root@DD-WRT:~# grep -c ^tcp /proc/net/ip_conntrack
2748
root@DD-WRT:~# grep -c ^udp /proc/net/ip_conntrack
864
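
The same counts can be wrapped in a small helper so they are easy to repeat while the load is switched on and off. This is a minimal sketch, assuming a BusyBox-style shell; `count_proto` is a name made up for illustration, and the file defaults to the path used above.

```shell
#!/bin/sh
# count_proto PROTO [FILE]: count conntrack entries whose line starts with PROTO.
# count_proto is a hypothetical helper; FILE defaults to the path used in this post.
count_proto() {
    grep -c "^$1" "${2:-/proc/net/ip_conntrack}"
}

# Only print when the table is readable (it is absent on hosts without ip_conntrack).
if [ -r /proc/net/ip_conntrack ]; then
    echo "tcp: $(count_proto tcp)  udp: $(count_proto udp)"
fi
```

Passing a saved copy of the table as the second argument lets the counts be compared offline.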

CPU usage remains low (per top), but the lockout occurs anyway once there are many connections.
When the HTTP service is requested, CPU usage goes to 100%, which makes the problem worse.

This is only a problem where the connections are actually active. If they are just "remembered" by conntrack but not active then the lockout does not occur.

I am running a daemon on a host on the LAN side of the router that is the target of large numbers of both TCP and UDP connections. I can turn them all on and off easily, so this problem is easy to replicate.

Apparently conntrack does not shed excess connections gracefully when there are many that are actually active.

Contact me for more info on this.

Michael@insulin-pumpers.org
Skydiver
DD-WRT User


Joined: 23 Feb 2009
Posts: 298
Location: Germany

Posted: Fri Mar 19, 2010 18:20
Hi,

Have you enabled QoS?
I've seen something like this with QoS enabled.
QoS seems buggy in DD-WRT; if you can, leave it disabled.

_________________
Netgear WNR834B v2 - Eko build v24-sp2 15943M mini NEWD K2.4 (running MINIUPNPD)
Tested with BS 15943 mini build with my 32/1 line over wireless:
miker
DD-WRT User


Joined: 17 Mar 2009
Posts: 50

Posted: Fri Mar 19, 2010 18:26
No, QoS is not enabled.

I'm guessing this lockup is due to the hash bucket ratio. The default is 8:1, and for this kind of load it should be lowered by a factor of 2 to 4 to improve throughput with a large number of connections, since the inner loop of the Linux hash table implementation gets unhappy if the hash table is too small. Searching Google indicates that more than 2000 or so active connections can trigger this issue with the default hash bucket ratio.

i.e. __ip_conntrack_find()

Can someone tell me how to change a parameter at boot time for the conntrack modules, to change the number of buckets for the connection hash table?

e.g. options ip_conntrack hashsize=2048

This would lower the ratio to 2:1 for 4096 connection-max.
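
To make the arithmetic concrete: the ratio is the connection maximum divided by the number of hash buckets. The following is a rough sketch using the figures from this post (4096 max, 8:1 default, hashsize=2048 proposed); `bucket_ratio` is a name invented here, not a dd-wrt tool.

```shell
#!/bin/sh
# bucket_ratio MAX BUCKETS: average entries per bucket when the table is full
# (integer division). bucket_ratio is a hypothetical helper, not a dd-wrt tool.
bucket_ratio() {
    echo $(( $1 / $2 ))
}

bucket_ratio 4096 512    # the 8:1 default implies 512 buckets; prints 8
bucket_ratio 4096 2048   # proposed hashsize=2048; prints 2
```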
miker
DD-WRT User


Joined: 17 Mar 2009
Posts: 50

Posted: Fri Mar 19, 2010 18:43
pretty sure the problem is the hash buckets

syslog says:

ip_conntrack version 2.1 (512 buckets, 4096 max)

Since my last post I've found numerous complaints on the net about throughput being compromised with active connections above 1300 - 2000 on dd-wrt. I would like to try this with more hash buckets. I will see if I can figure out how to load the conntrack module with the option for a larger hash table.
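
The bucket count and maximum can be pulled out of that syslog line with sed, which is handy when comparing builds. This is a small sketch; the `conntrack_sizes` name is an assumption, and the pattern only targets the message format quoted above.

```shell
#!/bin/sh
# conntrack_sizes: read log text on stdin and print "BUCKETS MAX" for each
# ip_conntrack banner line. Hypothetical helper; matches only the format above.
conntrack_sizes() {
    sed -n 's/.*(\([0-9][0-9]*\) buckets, \([0-9][0-9]*\) max).*/\1 \2/p'
}

echo 'ip_conntrack version 2.1 (512 buckets, 4096 max)' | conntrack_sizes
# prints: 512 4096
```

On the router, piping the kernel log into the function (for example `dmesg | conntrack_sizes`) should give the same two numbers, provided the banner is still in the log.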
phuzi0n
DD-WRT Guru


Joined: 10 Oct 2006
Posts: 10143

Posted: Fri Mar 19, 2010 22:23
With kernel 2.4 based builds you can only change it with a compilation option. With kernel 2.6 based builds it can be changed dynamically with the appropriate /proc entries.

What hardware are you using?

_________________
Read the forum announcements thoroughly! Be cautious if you're inexperienced.
Available for paid consulting. (Don't PM about complicated setups otherwise)
Looking for bricks and spare routers to expand my collection. (not interested in G spec models)
miker
DD-WRT User


Joined: 17 Mar 2009
Posts: 50

Posted: Fri Mar 19, 2010 22:45
Broadcom
WRT54G-TM

It has the 2.4.37 kernel.
I see that it has no module for conntrack, so I am downloading the source. That seems like the hard way, and I'm not sure my tools are current enough to do the build.

Is there a more recent build that uses the 2.6 kernel?
socal87
DD-WRT Guru


Joined: 30 Jun 2009
Posts: 944
Location: Here

Posted: Fri Mar 19, 2010 23:03
What is your connection timeout set to? Try decreasing it to 300. The TM doesn't have a whole lot of RAM, so you're probably flooding memory with old connections that don't get flushed...
_________________
I do NOT offer personal assistance.
Please do not PM me for help.

phuzi0n
DD-WRT Guru


Joined: 10 Oct 2006
Posts: 10143

Posted: Fri Mar 19, 2010 23:30
You can run k2.6 builds: http://www.dd-wrt.com/phpBB2/viewtopic.php?t=63757

ftp://dd-wrt.com/others/eko/BrainSlayer-V24-preSP2/02-23-10-r13972/broadcom_K26/
ftp://dd-wrt.com/others/eko/V24-K26/

miker
DD-WRT User


Joined: 17 Mar 2009
Posts: 50

Posted: Sat Mar 20, 2010 0:17
Re: TCP timer, it is already set to 300. This router is handling the front end for a busy mail and DNS server. The number of ACTIVE connections can easily reach 4k and hovers around 3k most of the time. With some of the tools we use, that number can be driven to over 10k some of the time, which definitely crashes dd-wrt. The current problem is that it does not handle moderate connection loads above 1500 very well (not old connections, real active ones). dd-wrt is quite happy keeping track of connections that are not in use right up to its max. However, if they are active connections, it is not happy.

Ram usage is fine, less than 60% according to top and the web info page.


RE 2.6 kernel. Can this be loaded directly into the router or does the tiny bootloader need to be put in first?

I've got this thing configured with a public DMZ /28 and a private 192/24 on separate VLANs.
miker
DD-WRT User


Joined: 17 Mar 2009
Posts: 50

Posted: Sat Mar 20, 2010 1:36
OK, I have the 13972 build running with 2.6.

Looking around in /proc, the conntrack_buckets parameters are marked as read-only by the kernel. There do not appear to be any "hash" parameters at all. I seem to remember that this can be set somewhere, but I cannot find it.

Clue???
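
One way to look for a tunable is to test a list of candidate paths for write access. This is a sketch only: the two sysfs paths below are where mainline 2.6 kernels expose the conntrack `hashsize` module parameter, but whether this dd-wrt build mounts sysfs or exposes either path is an assumption, and `first_writable` is a made-up helper name.

```shell
#!/bin/sh
# first_writable PATH...: print the first path that exists and is writable.
# Hypothetical helper; returns non-zero when none of the paths qualify.
first_writable() {
    for p in "$@"; do
        if [ -w "$p" ]; then
            echo "$p"
            return 0
        fi
    done
    return 1
}

# Candidate locations are assumptions for this build; nothing is written here.
first_writable \
    /sys/module/nf_conntrack/parameters/hashsize \
    /sys/module/ip_conntrack/parameters/hashsize \
    || echo "no writable hashsize parameter found"
```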
phuzi0n
DD-WRT Guru


Joined: 10 Oct 2006
Posts: 10143

Posted: Sat Mar 20, 2010 2:24
The site currently seems to be down, but here is the document along with a Google cache link.

http://www.wallfire.org/misc/netfilter_conntrack_perf.txt
http://74.125.155.132/search?q=cache:qHgVZIcaz8sJ:www.wallfire.org/misc/netfilter_conntrack_perf.txt

miker
DD-WRT User


Joined: 17 Mar 2009
Posts: 50

Posted: Tue Mar 23, 2010 22:35
I've loaded 13575 and 13972. Both of these have issues with configuring a DMZ using the VLANs. Something is broken in the build or in the 2.6 kernel routing, so the new VLAN assignment will not route.

Perhaps this needs a new thread. In any event, I am unable to get to the hash table size issue because I cannot configure the router to work the same way as build 13064 with the 2.4.37 kernel (older builds also work with the VLAN DMZ).

Re: howto here:

http://www.bizsystems.com/howto/DD-WRT_DMZ-with-static-public-subnet.html

What system were those two builds compiled on? Perhaps I can take a look at the code and config, figure out what changes have been made, and make the VLAN code work properly. I'm mid-level at this sort of thing, no guru, so some help would be appreciated.
phuzi0n
DD-WRT Guru


Joined: 10 Oct 2006
Posts: 10143

Posted: Tue Mar 23, 2010 23:34
Wow that guide makes my head want to explode. I'm not sure where to begin explaining what's wrong with it. I'm not even sure what, if anything, it really accomplished for you in the past. From what I can see it's telling you to move 2 LAN ports into the WAN VLAN (vlan1), but then it has you assign a "DMZ subnet" to br0:1 which is a virtual interface for the LAN bridge, and finally it has a bunch of iptables rules being appended to the INPUT chain when they appear to be meant for the FORWARD chain.

So I presume you either want to move some ports to the WAN VLAN so that they can have public IPs and direct switched access to the WAN, or you want to segment some hosts into another LAN subnet? Please be as thorough as you can in explaining exactly what you want to do.

miker
DD-WRT User


Joined: 17 Mar 2009
Posts: 50

Posted: Fri Mar 26, 2010 1:38
Sure, I've been using 3 routers configured this way for a year or two now.

The ISP provides a dynamic connection on the WAN side. They route a public CIDR block that is unrelated to the WAN address over the connection. This is the IP space that is routed to the DMZ. In addition, the router provides private IP space so that office staff can reach the internet and the DMZ, but the DMZ cannot reach the private space.

e.g.
example wan gate          123.234.111.1
example routed wan side   123.234.111.122
example routed CIDR       66.77.88.0/28
example lan side IP       66.77.88.1
example DMZ usable        66.77.88.2 - 14
example lan side CIDR     192.168.0.0/24

Public-facing servers are on the DMZ side of the router, with IP addresses in the 2-14 range.


Last edited by miker on Fri Apr 23, 2010 1:21; edited 2 times in total
phuzi0n
DD-WRT Guru


Joined: 10 Oct 2006
Posts: 10143

Posted: Fri Mar 26, 2010 3:11
Alright then, start out by properly segmenting the physical medium by moving however many ports you want for the DMZ into VLAN2 using the UI. After you save your VLAN assignments you must reboot for them to take effect. Then go to the basic setup->networking page, make sure VLAN2 is unbridged, and assign it an IP / netmask that matches the DMZ subnet. Then, to prevent it from being NAT'd and to lock it down, add a few iptables commands to your firewall script on the admin->commands page.

# disable NAT for DMZ
iptables -t nat -I POSTROUTING -s 66.77.88.0/28 -j ACCEPT
# block DMZ->LAN but not LAN->DMZ
iptables -I FORWARD -i vlan2 -o br0 -j DROP
iptables -I FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
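
For other subnets or interface names, the same three commands can be generated from variables. This is a minimal sketch: `dmz_rules` is a hypothetical helper that only prints the commands for review and applies nothing. Note that `-I` inserts at the top of a chain, so the ESTABLISHED,RELATED rule, added last, ends up above the DROP rule and lets replies to LAN-initiated connections through.

```shell
#!/bin/sh
# dmz_rules SUBNET DMZ_IF LAN_IF: print the firewall commands from this post
# for the given DMZ subnet and interfaces. Hypothetical helper; it prints only.
dmz_rules() {
    subnet="$1"; dmz_if="$2"; lan_if="$3"
    # disable NAT for the DMZ subnet
    echo "iptables -t nat -I POSTROUTING -s $subnet -j ACCEPT"
    # block DMZ->LAN but not LAN->DMZ
    echo "iptables -I FORWARD -i $dmz_if -o $lan_if -j DROP"
    echo "iptables -I FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT"
}

# Values from this thread; paste the printed lines into the firewall script.
dmz_rules 66.77.88.0/28 vlan2 br0
```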

All times are GMT