From: Markus Gebert
Date: Fri, 21 Mar 2014 14:04:07 +0100
Subject: Re: 9.2 ixgbe tx queue hang
To: Christopher Forgeron
Cc: FreeBSD Net, Rick Macklem, Jack Vogel

On 21.03.2014, at 12:47, Christopher Forgeron wrote:

> Hello all,
>
> I ran Jack's ixgbe MJUM9BYTES removal patch, and let iometer hammer away
> at the NFS store overnight - But the problem is still there.
>
> From what I read, I think the MJUM9BYTES removal is probably good cleanup
> (as long as it doesn't trade performance on a lightly memory loaded system
> for performance on a heavily memory loaded system). If I can stabilize my
> system, I may attempt those benchmarks.
>
> I think the fix will be obvious at boot for me - My 9.2 has a 'clean'
> netstat
> - Until I can boot and see a 'netstat -m' that looks similar to that, I'm
> going to have this problem.
>
> Markus: Do your systems show denied mbufs at boot like mine does?

No. Our systems never show denied mbufs. Not on boot, not during normal
operations and also not when the problem is occurring. I don’t know what
you do differently, but in our case neither 4k nor 9k mbufs get used,
only the normal ones.

I’m beginning to think that we are looking at different problems, or at
least at quite different symptoms of a similar problem. Have you had any
luck finding out where EFBIG originates in your case?


Markus

> Turning off TSO works for me, but at a performance hit.
>
> I'll compile Rick's patch (and extra debugging) this morning and let you
> know soon.
>
>
>
>
> On Thu, Mar 20, 2014 at 11:47 PM, Christopher Forgeron wrote:
>
>> BTW - I think this will end up being a TSO issue, not the patch that Jack
>> applied.
>>
>> When I boot Jack's patch (MJUM9BYTES removal) this is what netstat -m
>> shows:
>>
>> 21489/2886/24375 mbufs in use (current/cache/total)
>> 4080/626/4706/6127254 mbuf clusters in use (current/cache/total/max)
>> 4080/587 mbuf+clusters out of packet secondary zone in use (current/cache)
>> 16384/50/16434/3063627 4k (page size) jumbo clusters in use
>> (current/cache/total/max)
>> 0/0/0/907741 9k jumbo clusters in use (current/cache/total/max)
>>
>> 0/0/0/510604 16k jumbo clusters in use (current/cache/total/max)
>> 79068K/2173K/81241K bytes allocated to network (current/cache/total)
>> 18831/545/4542 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>>
>> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>> 15626/0/0 requests for jumbo clusters denied (4k/9k/16k)
>>
>> 0 requests for sfbufs denied
>> 0 requests for sfbufs delayed
>> 0 requests for I/O initiated by sendfile
>>
>> Here is an un-patched boot:
>>
>> 21550/7400/28950 mbufs in use (current/cache/total)
>> 4080/3760/7840/6127254 mbuf clusters in use (current/cache/total/max)
>> 4080/2769 mbuf+clusters out of packet secondary zone in use (current/cache)
>> 0/42/42/3063627 4k (page size) jumbo clusters in use
>> (current/cache/total/max)
>> 16439/129/16568/907741 9k jumbo clusters in use (current/cache/total/max)
>>
>> 0/0/0/510604 16k jumbo clusters in use (current/cache/total/max)
>> 161498K/10699K/172197K bytes allocated to network (current/cache/total)
>> 18345/155/4099 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>>
>> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>> 3/3723/0 requests for jumbo clusters denied (4k/9k/16k)
>>
>> 0 requests for sfbufs denied
>> 0 requests for sfbufs delayed
>> 0 requests for I/O initiated by sendfile
>>
>>
>>
>> See how removing the MJUM9BYTES is just pushing the problem from the 9k
>> jumbo cluster into the 4k jumbo cluster?
>>
>> Compare this to my FreeBSD 9.2 STABLE machine from ~ Dec 2013 : Exact same
>> hardware, revisions, zpool size, etc. Just it's running an older FreeBSD.
>>
>> # uname -a
>> FreeBSD SAN1.XXXXX 9.2-STABLE FreeBSD 9.2-STABLE #0: Wed Dec 25 15:12:14
>> AST 2013 aatech@FreeBSD-Update Server:/usr/obj/usr/src/sys/GENERIC
>> amd64
>>
>> root@SAN1:/san1 # uptime
>> 7:44AM up 58 days, 38 mins, 4 users, load averages: 0.42, 0.80, 0.91
>>
>> root@SAN1:/san1 # netstat -m
>> 37930/15755/53685 mbufs in use (current/cache/total)
>> 4080/10996/15076/524288 mbuf clusters in use (current/cache/total/max)
>> 4080/5775 mbuf+clusters out of packet secondary zone in use (current/cache)
>> 0/692/692/262144 4k (page size) jumbo clusters in use
>> (current/cache/total/max)
>> 32773/4257/37030/96000 9k jumbo clusters in use (current/cache/total/max)
>>
>> 0/0/0/508538 16k jumbo clusters in use (current/cache/total/max)
>> 312599K/67011K/379611K bytes allocated to network (current/cache/total)
>>
>> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>> 0/0/0 sfbufs in use (current/peak/max)
>> 0 requests for sfbufs denied
>> 0 requests for sfbufs delayed
>> 0 requests for I/O initiated by sendfile
>> 0 calls to protocol drain routines
>>
>> Lastly, please note this link:
>>
>> http://lists.freebsd.org/pipermail/freebsd-net/2012-October/033660.html
>>
>> It's so old that I assume the TSO leak that he speaks of has been patched,
>> but perhaps not. More things to look into tomorrow.
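
Comparing the "denied" counters across the quoted boots by eye is error-prone, so a small script may help. The following is only a minimal sketch, not something posted in this thread: the function name `denied_counters`, the regular expression, and the `total` label for lines without a label list are assumptions based solely on the `netstat -m` output format quoted above, and the format may differ on other FreeBSD versions.

```python
import re

# Assumed line shapes, taken from the netstat -m output quoted in this thread:
#   18831/545/4542 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
#   15626/0/0 requests for jumbo clusters denied (4k/9k/16k)
#   0 requests for sfbufs denied
DENIED_RE = re.compile(
    r"^\s*(?P<counts>\d+(?:/\d+)*) requests for (?P<what>.+?) denied"
    r"(?: \((?P<labels>[^)]*)\))?\s*$"
)


def denied_counters(netstat_output):
    """Return {"<what>:<label>": count} for every 'denied' line.

    Hypothetical helper: 'delayed' lines and everything else are ignored.
    """
    result = {}
    for line in netstat_output.splitlines():
        match = DENIED_RE.match(line)
        if not match:
            continue
        counts = [int(c) for c in match.group("counts").split("/")]
        labels_text = match.group("labels")
        # Lines without a "(a/b/c)" suffix carry a single unlabeled counter.
        labels = labels_text.split("/") if labels_text else ["total"]
        for label, count in zip(labels, counts):
            result[match.group("what") + ":" + label] = count
    return result


# Counters from the patched boot (MJUM9BYTES removal) quoted above.
SAMPLE = """\
18831/545/4542 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
15626/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
"""

if __name__ == "__main__":
    # Print only the counters that are non-zero.
    for key, value in denied_counters(SAMPLE).items():
        if value:
            print(key, value)
```

Feeding it the output of each boot and comparing the resulting dictionaries would show directly whether denials moved from the 9k zone to the 4k zone, as the patched and un-patched boots above suggest.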