From: Dave Seddon
To: freebsd-net@freebsd.org
Date: Wed, 03 Aug 2005 13:49:32 +1000
Subject: running out of mbufs?
Message-ID: <1123040973.95445.TMDA@seddon.ca>

Greetings,

I'm trying to do some performance testing of a content filtering
system, so I'm trying to get very high HTTP throughput.  I've got 4 HP
DL380s with 3.4 GHz Xeon processors (hyper-threading), 1 GB of RAM, 2
onboard bge interfaces, and 2 dual-port em cards.  Using FreeBSD
5.4-STABLE (as of 2005/08/02) and device polling, I've configured a
large number (246) of VLAN interfaces on two machines, with apache on
one box and siege on the other.  Using
'siege -f /home/my_big_list_of_urls -c 50 --internet', one host makes
a large number of requests to the other machine.

I've been trying to tune for maximum performance and have been using
lots of example /etc/sysctl.conf settings and so on from the web.
Adjusting these settings and running the siege, I've found the apache
server completely loses network connectivity when device polling is
enabled.  I've adjusted HZ a lot and found the system survives the
longest when it is set at 15000 (yes, it seems very large, doesn't
it).  The problem now seems to be that I'm running out of mbufs:

--------------------------------------
4294264419 mbufs in use
4294866740/2147483647 mbuf clusters in use (current/max)
0/3/6656 sfbufs in use (current/peak/max)
3817472 KBytes allocated to network
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
--------------------------------------
host228# sysctl kern.polling
kern.polling.burst: 671
kern.polling.each_burst: 100
kern.polling.burst_max: 1000
kern.polling.idle_poll: 0
kern.polling.poll_in_trap: 0
kern.polling.user_frac: 70
kern.polling.reg_frac: 40
kern.polling.short_ticks: 3523
kern.polling.lost_polls: 49996588
kern.polling.pending_polls: 1
kern.polling.residual_burst: 0
kern.polling.handlers: 2
kern.polling.enable: 1
kern.polling.phase: 0
kern.polling.suspect: 1768262
kern.polling.stalled: 9
kern.polling.idlepoll_sleeping: 1
-------------------------------------

For some reason, the 'current' count can be WAY higher than the 'max',
which seems very odd.  I've tried putting the 'max' right up to 5
billion, however it only goes to 2.1 billion.  How should I proceed
further?
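(The boot-time route I'd try next is /boot/loader.conf; as far as I
know both kern.hz and kern.ipc.nmbclusters are loader tunables on 5.x.
The values below are only an illustrative sketch, not something I've
tested:)

--------------------------------------
# /boot/loader.conf -- sketch only, illustrative values
kern.hz="1000"                   # set the clock rate at boot
kern.ipc.nmbclusters="65536"     # reserve mbuf clusters at boot
--------------------------------------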
How come the box loses all connectivity, rather than just some TCP
streams failing?

Why doesn't the network recover when I stop the siege?

Why does kern.polling.burst_max only go to 1000 when I try setting it
to 1500?

Settings:
----------------------------------------------------------
host228# sysctl kern.polling
kern.polling.burst: 684
kern.polling.each_burst: 100
kern.polling.burst_max: 1000
kern.polling.idle_poll: 0
kern.polling.poll_in_trap: 0
kern.polling.user_frac: 70
kern.polling.reg_frac: 40
kern.polling.short_ticks: 97
kern.polling.lost_polls: 8390
kern.polling.pending_polls: 0
kern.polling.residual_burst: 0
kern.polling.handlers: 2
kern.polling.enable: 1
kern.polling.phase: 0
kern.polling.suspect: 3642
kern.polling.stalled: 0
kern.polling.idlepoll_sleeping: 1
------------------------------------------------------------
host228# cat /etc/sysctl.conf
#kern.polling.enable=1
kern.polling.enable=1
#kern.polling.user_frac: 50
#kern.polling.reg_frac: 20
kern.polling.user_frac=70
kern.polling.reg_frac=40
#kern.polling.burst: 5
#kern.polling.each_burst: 5
#kern.polling.burst_max: 150   #default for 100MB/s
kern.polling.burst=1000
kern.polling.each_burst=100
kern.polling.burst_max=2000
# example I found on the web
#kern.polling.burst: 1000
#kern.polling.each_burst: 80
#kern.polling.burst_max: 1000
#net.inet.tcp.sendspace: 32768
#net.inet.tcp.recvspace: 65536
net.inet.tcp.sendspace=1024000
net.inet.tcp.recvspace=1024000
#sysctl net.inet.tcp.rfc1323=1
# Activate window scaling and timestamp options according to RFC 1323.
net.inet.tcp.rfc1323=1
net.inet.tcp.delayed_ack=0
#kern.ipc.maxsockbuf: 262144
kern.ipc.maxsockbuf=20480000
# The kern.ipc.somaxconn sysctl variable limits the size of the listen
# queue for accepting new TCP connections.  The default value of 128
# is typically too low for robust handling of new connections in a
# heavily loaded web server environment.
#kern.ipc.somaxconn: 128
kern.ipc.somaxconn=1024
# TCP Bandwidth Delay Product Limiting is similar to TCP/Vegas in
# NetBSD.  It can be enabled by setting the net.inet.tcp.inflight.enable
# sysctl variable to 1.  The system will attempt to calculate the
# bandwidth delay product for each connection and limit the amount of
# data queued to the network to just the amount required to maintain
# optimum throughput.
# This feature is useful if you are serving data over modems, Gigabit
# Ethernet, or even high speed WAN links (or any other link with a high
# bandwidth delay product), especially if you are also using window
# scaling or have configured a large send window.  If you enable this
# option, you should also be sure to set net.inet.tcp.inflight.debug to
# 0 (disable debugging), and for production use setting
# net.inet.tcp.inflight.min to at least 6144 may be beneficial.
# these are the defaults
#net.inet.tcp.inflight.enable: 1
#net.inet.tcp.inflight.debug: 0
#net.inet.tcp.inflight.min: 6144
#net.inet.tcp.inflight.max: 1073725440
#net.inet.tcp.inflight.stab: 20
# Disable entropy harvesting for ethernet devices and interrupts.
# There are optimizations present in 6.x that have not yet been
# backported that improve the overhead of entropy harvesting, but you
# can get the same benefits by disabling it.  In your environment,
# it's likely not needed.  I hope to backport these changes in a
# couple of weeks to 5-STABLE.
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.interrupt=0
--------------------------------------------------
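While siege is running I keep an eye on the mbuf counters with a small
sh loop like the one below (just my monitoring sketch; I'm assuming
the mbuf zones show up under names containing "mbuf" in vmstat -z on
5.4):

--------------------------------------
#!/bin/sh
# crude monitoring loop -- sketch only, run from another terminal
while true; do
        date
        netstat -m                    # mbuf / cluster / sfbuf usage
        vmstat -z | grep -i mbuf      # UMA zone stats (assumed zone names)
        sysctl kern.polling.lost_polls kern.polling.suspect
        sleep 5
done
--------------------------------------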
host228# sysctl -a | grep ipc | grep nm
kern.ipc.nmbclusters: 25600
host228# sysctl kern.ipc.nmbclusters=5000000000
kern.ipc.nmbclusters: 25600 -> 2147483647
host228# sysctl -a | grep ipc | grep nm
kern.ipc.nmbclusters: 2147483647
-------------------------------------------------
host228# sysctl -a | grep hz
kern.clockrate: { hz = 15000, tick = 66, profhz = 1024, stathz = 128 }
debug.psmhz: 20
--------------------------------------------------
THE PHYSICAL INTERFACES ONLY (I'm only using one interface per
two-port card, and only running performance tests on the em cards):

bge0: flags=8843 mtu 1500
        options=1a
        inet 192.168.1.228 netmask 0xffffff00 broadcast 192.168.1.255
        ether 00:12:79:cf:d0:bf
        media: Ethernet autoselect (1000baseTX)
        status: active
bge1: flags=8802 mtu 1500
        options=1a
        ether 00:12:79:cf:d0:be
        media: Ethernet autoselect (none)
        status: no carrier
em0: flags=18843 mtu 1500
        options=4b
        ether 00:11:0a:56:ab:3a
        media: Ethernet autoselect (1000baseTX)
        status: active
em1: flags=8843 mtu 1500
        options=4b
        ether 00:11:0a:56:ab:3b
        media: Ethernet autoselect
        status: no carrier
em2: flags=18843 mtu 1500
        options=4b
        ether 00:11:0a:56:b2:4c
        media: Ethernet autoselect (1000baseTX)
        status: active
em3: flags=8843 mtu 1500
        options=4b
        ether 00:11:0a:56:b2:4d
        media: Ethernet autoselect
        status: no carrier
lo0: flags=8049 mtu 16384
        inet 127.0.0.1 netmask 0xff000000
---------------------------------------
Regards,
Dave Seddon
das-keyword-net.6770cb@seddon.ca