From: Eirik Øverby
Date: Sat, 28 Nov 2009 08:46:12 +0100
To: freebsd-current@freebsd.org
Cc: weldon@excelsusphoto.com, Gavin Atkinson
Subject: Re: FreeBSD 8.0 - network stack crashes?

Hi,

Gavin Atkinson wrote:
> On Tue, 2009-11-03 at 08:32 -0500, Weldon S Godfrey 3 wrote:
> >
> > If memory serves me right, sometime around Yesterday, Gavin Atkinson
> > told me:
> >
> > Gavin, thank you A LOT for helping us with this, I have answered as
> > much as I can from the most recent crash below. We did hit max mbufs.
> > It is at 25K clusters, which is the default. I have upped it to 32K
> > because a rather old article mentioned that as the top end, and I need
> > to get into work so I am not trying to do this with a remote console
> > to go higher. I have already set it to reboot next with 64K clusters.
> > I already have kmem maxed to what is bootable (or at least at one
> > time) in 8.0, 4GB; how high can I safely go? This is a NFS server
> > running ZFS with sustained 5 min averages of 120-200Mb/s running as a
> > store for a mail system.
> >
> > > Some things that would be useful:
> > >
> > > - Does "arp -da" fix things?
> >
> > no, it hangs like ssh, route add, etc
> >
> > > - What's the output of "netstat -m" while the networking is broken?
> >
> > Tue Nov  3 07:02:11 CST 2009
> > 36971/2033/39004 mbufs in use (current/cache/total)
> > 24869/731/25600/25600 mbuf clusters in use (current/cache/total/max)
> > 24314/731 mbuf+clusters out of packet secondary zone in use (current/cache)
> > 0/35/35/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> > 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> > 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> > 58980K/2110K/61091K bytes allocated to network (current/cache/total)
> > 0/201276/90662 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> > 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> > 0/0/0 sfbufs in use (current/peak/max)
> > 0 requests for sfbufs denied
> > 0 requests for sfbufs delayed
> > 0 requests for I/O initiated by sendfile
> > 0 calls to protocol drain routines
>
> OK, at least we've figured out what is going wrong then. As a
> workaround to get the machine to stay up longer, you should be able to
> set kern.ipc.nmbclusters=256000 in /boot/loader.conf - but hopefully we
> can resolve this soon.
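(Spelled out for reference - the 256000 is just the value Gavin suggests
above; adjust it to whatever your RAM allows:)

  # /boot/loader.conf -- takes effect on the next boot
  kern.ipc.nmbclusters="256000"

  # after boot: confirm the new limit and keep an eye on the denial counters
  sysctl kern.ipc.nmbclusters
  netstat -m | grep denied

If the "requests for mbufs denied" counters start climbing again, the
underlying problem is still there; raising the limit only buys time.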
I'll chip in with a report of exactly the same situation, and I'm on
8.0-RELEASE.

We've been struggling with this for some time; as late as yesterday the
box was rebooted, and already last night it wedged again. We're at a
whopping

  kern.ipc.nmbclusters: 524288

and I've just doubled it once more, which means we're allocating 2GB to
networking (1,048,576 clusters at 2KB each).

Much like the original poster, we're seeing this on an amd64 storage
server with a large ZFS array shared through NFS, and the network
interfaces are two em(4) combined in a lagg(4) interface (lacp). Using
either of the two em interfaces without lagg shows the same problem,
just lower performance.
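(For anyone wanting to reproduce the topology, this is roughly what such
a lagg setup looks like in rc.conf; the address below is a placeholder,
not our real one:)

  # /etc/rc.conf -- two em(4) ports aggregated with LACP
  ifconfig_em0="up"
  ifconfig_em1="up"
  cloned_interfaces="lagg0"
  ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 192.0.2.10 netmask 255.255.255.0"

Since the mbuf exhaustion shows up with or without the lagg, the
aggregation itself doesn't look like the culprit.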
> Firstly, what kernel was the above output from? And what network card
> are you using? In your initial post you mentioned testing both bce(4)
> and em(4) cards, be aware that em(4) had an issue that would cause
> exactly this issue, which was fixed with a commit on September 11th
> (r197093). Make sure your kernel is from after that date if you are
> using em(4). I guess it is also possible that bce(4) has the same
> issue, I'm not aware of any fixes to it recently.

We're on GENERIC.

> So, from here, I think the best thing would be to just use the em(4) NIC
> and an up-to-date kernel, and see if you can reproduce the issue.

em(4) on 8.0-RELEASE still shows this problem.

> How important is this machine? If em(4) works, are you able to help
> debug the issues with the bce(4) driver?

We have no bce(4), but we have the problem on em(4) so we can help debug
there. The server is important, but making it stable is more important.
See below the sig for some debug info.

/Eirik

Output from sysctl dev.em.[0,1].debug=1:

em0: Adapter hardware address = 0xffffff80003ac530
em0: CTRL = 0x140248 RCTL = 0x8002
em0: Packet buffer = Tx=20k Rx=12k
em0: Flow control watermarks high = 10240 low = 8740
em0: tx_int_delay = 66, tx_abs_int_delay = 66
em0: rx_int_delay = 32, rx_abs_int_delay = 66
em0: fifo workaround = 0, fifo_reset_count = 0
em0: hw tdh = 92, hw tdt = 92
em0: hw rdh = 225, hw rdt = 224
em0: Num Tx descriptors avail = 256
em0: Tx Descriptors not avail1 = 0
em0: Tx Descriptors not avail2 = 0
em0: Std mbuf failed = 0
em0: Std mbuf cluster failed = 11001
em0: Driver dropped packets = 0
em0: Driver tx dma failure in encap = 0

em1: Adapter hardware address = 0xffffff80003be530
em1: CTRL = 0x140248 RCTL = 0x8002
em1: Packet buffer = Tx=20k Rx=12k
em1: Flow control watermarks high = 10240 low = 8740
em1: tx_int_delay = 66, tx_abs_int_delay = 66
em1: rx_int_delay = 32, rx_abs_int_delay = 66
em1: fifo workaround = 0, fifo_reset_count = 0
em1: hw tdh = 165, hw tdt = 165
em1: hw rdh = 94, hw rdt = 93
em1: Num Tx descriptors avail = 256
em1: Tx Descriptors not avail1 = 0
em1: Tx Descriptors not avail2 = 0
em1: Std mbuf failed = 0
em1: Std mbuf cluster failed = 17765
em1: Driver dropped packets = 0
em1: Driver tx dma failure in encap = 0

Output from netstat -m (note that I just doubled the mbuf cluster count,
thus max is > total, and the box currently works):

544916/3604/548520 mbufs in use (current/cache/total)
543903/3041/546944/1048576 mbuf clusters in use (current/cache/total/max)
543858/821 mbuf+clusters out of packet secondary zone in use (current/cache)
0/77/77/262144 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/131072 9k jumbo clusters in use (current/cache/total/max)
0/0/0/65536 16k jumbo clusters in use (current/cache/total/max)
1224035K/7291K/1231326K bytes allocated to network (current/cache/total)
0/58919/29431 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
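(In case it helps correlate the growth with load, we can leave something
like this running until the next wedge; the interval and log path are
arbitrary:)

  #!/bin/sh
  # append a timestamped netstat -m snapshot every five minutes
  while true; do
      date >> /var/log/mbuf-usage.log
      netstat -m >> /var/log/mbuf-usage.log
      sleep 300
  done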