Date:      Fri, 21 Mar 2014 14:04:07 +0100
From:      Markus Gebert <markus.gebert@hostpoint.ch>
To:        Christopher Forgeron <csforgeron@gmail.com>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>, Rick Macklem <rmacklem@uoguelph.ca>, Jack Vogel <jfvogel@gmail.com>
Subject:   Re: 9.2 ixgbe tx queue hang
Message-ID:  <D1B4320A-DFFD-4647-8A43-238A088D7EF1@hostpoint.ch>
In-Reply-To: <CAB2_NwBSc3KWPYD-xbWYpRFTxpsKnXEr4V1ySP5g83aZM59MvQ@mail.gmail.com>
References:  <CAB2_NwB=21H5pcx=Wzz5gV38eRN%2BtfwhY28m2FZhdEi6X3JE7g@mail.gmail.com> <1543350122.637684.1395368002237.JavaMail.root@uoguelph.ca> <CAB2_NwCGsAHdMFPoST05azb9K_O-K_khk3Bi1sF2om3puCcyCw@mail.gmail.com> <CAB2_NwC3on1xP3UAutkQa-3zu_JhK0%2B-ZjVb6_3NVemw2Or-KQ@mail.gmail.com> <CAB2_NwBSc3KWPYD-xbWYpRFTxpsKnXEr4V1ySP5g83aZM59MvQ@mail.gmail.com>


On 21.03.2014, at 12:47, Christopher Forgeron <csforgeron@gmail.com> wrote:

> Hello all,
>
> I ran Jack's ixgbe MJUM9BYTES removal patch, and let iometer hammer away
> at the NFS store overnight - But the problem is still there.
>
> From what I read, I think the MJUM9BYTES removal is probably good cleanup
> (as long as it doesn't trade performance on a lightly memory loaded system
> for performance on a heavily memory loaded system). If I can stabilize my
> system, I may attempt those benchmarks.
>
> I think the fix will be obvious at boot for me - My 9.2 has a 'clean'
> netstat - Until I can boot and see a 'netstat -m' that looks similar to
> that, I'm going to have this problem.
>
> Markus: Do your systems show denied mbufs at boot like mine does?

No. Our systems never show denied mbufs: not at boot, not during normal
operation, and not while the problem is occurring. I don't know what you
do differently, but in our case neither 4k nor 9k mbufs get used, only
the normal ones.
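
To compare machines quickly, the denied counters can be pulled out of
'netstat -m' output with a few lines of Python. This is only a sketch:
the function name denied_counters is made up for illustration, and the
regular expression assumes the 9.x output format shown in the quoted
text further down in this message. Other releases may word these lines
differently.

```python
import re

# Matches lines such as
#   "18831/545/4542 requests for mbufs denied (mbufs/clusters/mbuf+clusters)"
#   "0 requests for sfbufs denied"
# The format is assumed from the netstat -m output quoted in this thread.
DENIED_RE = re.compile(r"^([\d/]+) requests for (.+?) denied(?: \(([^)]*)\))?$")


def denied_counters(netstat_m_output):
    """Return a dict mapping 'what:label' to the denied count (hypothetical helper)."""
    counters = {}
    for line in netstat_m_output.splitlines():
        match = DENIED_RE.match(line.strip())
        if not match:
            continue  # "in use", "delayed" and other lines are ignored
        values = [int(v) for v in match.group(1).split("/")]
        what = match.group(2)
        # Lines without a "(a/b/c)" legend carry a single counter.
        labels = match.group(3).split("/") if match.group(3) else [what]
        for label, value in zip(labels, values):
            counters[f"{what}:{label}"] = value
    return counters


if __name__ == "__main__":
    import subprocess

    # Runs the real command; this only works on a host that has netstat -m.
    output = subprocess.run(
        ["netstat", "-m"], capture_output=True, text=True, check=True
    ).stdout
    for name, count in sorted(denied_counters(output).items()):
        flag = "  <-- non-zero" if count else ""
        print(f"{name}: {count}{flag}")
```

A host that is not affected should report zero for every counter, as
the Dec 2013 box quoted below does.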

I'm beginning to think that we are looking at different problems, or at
least at quite different symptoms of a similar problem. Have you had any
luck finding out where EFBIG originates in your case?


Markus


> Turning off TSO works for me, but at a performance hit.
>
> I'll compile Rick's patch (and extra debugging) this morning and let you
> know soon.
>
> On Thu, Mar 20, 2014 at 11:47 PM, Christopher Forgeron <csforgeron@gmail.com> wrote:
>
>> BTW - I think this will end up being a TSO issue, not the patch that Jack
>> applied.
>>
>> When I boot Jack's patch (MJUM9BYTES removal) this is what netstat -m
>> shows:
>>
>> 21489/2886/24375 mbufs in use (current/cache/total)
>> 4080/626/4706/6127254 mbuf clusters in use (current/cache/total/max)
>> 4080/587 mbuf+clusters out of packet secondary zone in use (current/cache)
>> 16384/50/16434/3063627 4k (page size) jumbo clusters in use (current/cache/total/max)
>> 0/0/0/907741 9k jumbo clusters in use (current/cache/total/max)
>> 0/0/0/510604 16k jumbo clusters in use (current/cache/total/max)
>> 79068K/2173K/81241K bytes allocated to network (current/cache/total)
>> 18831/545/4542 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>> 15626/0/0 requests for jumbo clusters denied (4k/9k/16k)
>> 0 requests for sfbufs denied
>> 0 requests for sfbufs delayed
>> 0 requests for I/O initiated by sendfile
>>
>> Here is an un-patched boot:
>>
>> 21550/7400/28950 mbufs in use (current/cache/total)
>> 4080/3760/7840/6127254 mbuf clusters in use (current/cache/total/max)
>> 4080/2769 mbuf+clusters out of packet secondary zone in use (current/cache)
>> 0/42/42/3063627 4k (page size) jumbo clusters in use (current/cache/total/max)
>> 16439/129/16568/907741 9k jumbo clusters in use (current/cache/total/max)
>> 0/0/0/510604 16k jumbo clusters in use (current/cache/total/max)
>> 161498K/10699K/172197K bytes allocated to network (current/cache/total)
>> 18345/155/4099 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>> 3/3723/0 requests for jumbo clusters denied (4k/9k/16k)
>> 0 requests for sfbufs denied
>> 0 requests for sfbufs delayed
>> 0 requests for I/O initiated by sendfile
>>
>> See how removing the MJUM9BYTES is just pushing the problem from the 9k
>> jumbo cluster into the 4k jumbo cluster?
>>
>> Compare this to my FreeBSD 9.2 STABLE machine from ~ Dec 2013 : Exact same
>> hardware, revisions, zpool size, etc. Just it's running an older FreeBSD.
>>
>> # uname -a
>> FreeBSD SAN1.XXXXX 9.2-STABLE FreeBSD 9.2-STABLE #0: Wed Dec 25 15:12:14 AST 2013     aatech@FreeBSD-Update Server:/usr/obj/usr/src/sys/GENERIC amd64
>>
>> root@SAN1:/san1 # uptime
>> 7:44AM  up 58 days, 38 mins, 4 users, load averages: 0.42, 0.80, 0.91
>>
>> root@SAN1:/san1 # netstat -m
>> 37930/15755/53685 mbufs in use (current/cache/total)
>> 4080/10996/15076/524288 mbuf clusters in use (current/cache/total/max)
>> 4080/5775 mbuf+clusters out of packet secondary zone in use (current/cache)
>> 0/692/692/262144 4k (page size) jumbo clusters in use (current/cache/total/max)
>> 32773/4257/37030/96000 9k jumbo clusters in use (current/cache/total/max)
>> 0/0/0/508538 16k jumbo clusters in use (current/cache/total/max)
>> 312599K/67011K/379611K bytes allocated to network (current/cache/total)
>> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>> 0/0/0 sfbufs in use (current/peak/max)
>> 0 requests for sfbufs denied
>> 0 requests for sfbufs delayed
>> 0 requests for I/O initiated by sendfile
>> 0 calls to protocol drain routines
>>
>> Lastly, please note this link:
>>
>> http://lists.freebsd.org/pipermail/freebsd-net/2012-October/033660.html
>>
>> It's so old that I assume the TSO leak that he speaks of has been patched,
>> but perhaps not. More things to look into tomorrow.
>>