From owner-freebsd-net@FreeBSD.ORG  Sun Mar 23 14:54:56 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 326DC646;
 Sun, 23 Mar 2014 14:54:56 +0000 (UTC)
Received: from mail-qg0-x22a.google.com (mail-qg0-x22a.google.com
 [IPv6:2607:f8b0:400d:c04::22a])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id D3D8A15E;
 Sun, 23 Mar 2014 14:54:55 +0000 (UTC)
Received: by mail-qg0-f42.google.com with SMTP id q107so13448236qgd.1
 for <multiple recipients>; Sun, 23 Mar 2014 07:54:54 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=imHw7gs6vO9gS5lHL8nqeqcUdLU9sVBEhPQKR2M3x6c=;
 b=TMwcmC98hvgxLx4eSy+Zn8jD/yGUyCelxznKPcnEAS8snWKq8xKkmUN1mSDvghNV4y
 HmtYNDzEACoh+O+YDNUSRbPHZFg5x2xkmn3z5M9yeRkE1DMO2xLW3IHwGY8/pAjFB8He
 pGPFBFljHiUJCAZi/Q6TjlMwXq3FrbH2PADSChAO62jkuZ9qLSpsf0KbLQMaXewhE2Zp
 jmrzvWcw8hS5Sgh3QYoJ9eUwF+LMtizGR99t85tLqFDEVzqZM79cIHv/JdUefv0gy+pd
 enaBOwpji0oYBCv//g3OGMIFTpDyr0Vpji9jms0CmaGBoSaGx8fywkQuJAqoivq6x/gR
 qgSw==
MIME-Version: 1.0
X-Received: by 10.224.147.206 with SMTP id m14mr10354026qav.41.1395586494566; 
 Sun, 23 Mar 2014 07:54:54 -0700 (PDT)
Received: by 10.96.79.97 with HTTP; Sun, 23 Mar 2014 07:54:54 -0700 (PDT)
In-Reply-To: <1055107814.1401328.1395523094565.JavaMail.root@uoguelph.ca>
References: <CAB2_NwDRAxmnszh7jKKPfvxBdgaA9Z0CcJ9c1wSNncKb55Td5w@mail.gmail.com>
 <1055107814.1401328.1395523094565.JavaMail.root@uoguelph.ca>
Date: Sun, 23 Mar 2014 11:54:54 -0300
Message-ID: <CAB2_NwAEzgs1u7GkueKrhMT7iSRqZqkHObrOrXeaLC_EW7Nnwg@mail.gmail.com>
Subject: Re: 9.2 ixgbe tx queue hang
From: Christopher Forgeron <csforgeron@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.17
Cc: FreeBSD Net <freebsd-net@freebsd.org>,
 Garrett Wollman <wollman@freebsd.org>, Jack Vogel <jfvogel@gmail.com>,
 Markus Gebert <markus.gebert@hostpoint.ch>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 23 Mar 2014 14:54:56 -0000

Hi Rick, very helpful as always.


On Sat, Mar 22, 2014 at 6:18 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Christopher Forgeron wrote:
>
> Well, you could try making if_hw_tsomax somewhat smaller. (I can't see
> how the packet including ethernet header would be more than 64K with the
> patch, but?? For example, the ether_output() code can call ng_output()
> and I have no idea if that might grow the data size of the packet?)
>

That's what I was thinking - I was going to drop it down to 32k, which is
extreme, but I wanted to see if it cured it or not. Something would have to
be very broken to be adding nearly 32k to a packet.


> To be honest, the optimum for NFS would be setting if_hw_tsomax == 56K,
> since that would avoid the overhead of the m_defrag() calls. However,
> it is suboptimal for other TCP transfers.
>

I'm very interested in NFS performance, so this is interesting to me - do
you have the time to educate me on this? I was going to spend this week
hacking out the NFS server cache, as I feel ZFS does a better job, and my
cache stats are always terrible, as is to be expected when I have such wide
data usage on these SANs.
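
For context on the 56K figure quoted above, here is a rough sketch of why a
smaller TSO limit could avoid m_defrag(). It rests on assumptions that are
not stated in this message: that the adapter accepts at most 32 DMA segments
per packet, that NFS payload sits in 2K mbuf clusters, and that a couple of
extra mbufs carry headers. ASSUMED_MAX_SEGS, ASSUMED_CLUSTER, chain_len() and
needs_defrag() are hypothetical names used only for illustration.

```c
/* Assumptions for illustration only, not taken from the driver source. */
#define ASSUMED_MAX_SEGS 32u    /* assumed per-packet DMA segment limit */
#define ASSUMED_CLUSTER  2048u  /* assumed mbuf cluster size in bytes */

/* Number of mbufs in a chain that carries payload_bytes in clusters
 * (rounded up) plus hdr_mbufs extra mbufs for headers. */
static unsigned
chain_len(unsigned payload_bytes, unsigned hdr_mbufs)
{
	unsigned clusters =
	    (payload_bytes + ASSUMED_CLUSTER - 1) / ASSUMED_CLUSTER;

	return clusters + hdr_mbufs;
}

/* Non-zero when the chain would exceed the assumed segment limit and
 * would therefore have to be compacted before transmission. */
static int
needs_defrag(unsigned payload_bytes, unsigned hdr_mbufs)
{
	return chain_len(payload_bytes, hdr_mbufs) > ASSUMED_MAX_SEGS;
}
```

Under these assumptions, a 64K payload with two header mbufs needs 34
segments and would have to be compacted, while a 56K payload needs 30 and
would not.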

>
> One other thing you could do (if you still have them) is scan the logs
> for the code with my previous printf() patch and see if there is ever
> a size > 65549 in it. If there is, then if_hw_tsomax needs to be smaller
> by at least that size - 65549. (65535 + 14 == 65549)
>

There were some 65548's for sure. Interestingly enough, the amount that it
ruptures by seems to be increasing slowly. I should possibly let it rupture
and run for a long time to see if there is a steadily increasing pattern...
perhaps something is accidentally incrementing the packet by say 4 bytes in
a heavily loaded error condition.
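
A minimal sketch of the arithmetic in Rick's note, using only the figures he
gives (65535 + 14 == 65549). The helper name tsomax_reduction() is
hypothetical; it only shows how a logged size would translate into the
amount to subtract from if_hw_tsomax.

```c
#include <stdint.h>

/* Figures from the quoted note: a 65535-byte IP datagram plus a
 * 14-byte Ethernet header. */
#define MAX_IP_LEN      65535u
#define ETHER_HDR_BYTES 14u
#define MAX_FRAME_LEN   (MAX_IP_LEN + ETHER_HDR_BYTES)	/* 65549 */

/* Hypothetical helper: given the largest size seen in the printf()
 * logs, return how many bytes if_hw_tsomax would need to shrink by.
 * Returns 0 when the logged size does not exceed 65549. */
static uint32_t
tsomax_reduction(uint32_t largest_logged)
{
	if (largest_logged <= MAX_FRAME_LEN)
		return 0;
	return largest_logged - MAX_FRAME_LEN;
}
```

By this rule a logged size of 65548 calls for no reduction, while a size of
65553 would call for shrinking if_hw_tsomax by 4 bytes.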

>

> I'm not familiar enough with the mbuf/uma allocators to "confirm" it,
> but I believe the "denied" refers to cases where m_getjcl() fails to get
> a jumbo mbuf and returns NULL.
>
> If this were to happen in m_defrag(), it would return NULL and the ix
> driver returns ENOBUFS, so this is not the case for EFBIG errors.
>
BTW, the loop that your original printf code is in, just before the retry:
goto label: that's an error loop, and it looks to me like all/most packets
traverse it at some time?
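
The distinction Rick draws between ENOBUFS and EFBIG can be sketched as a
small userland model. This is not the ix driver's code: struct pkt,
model_dma_load(), model_defrag(), model_xmit() and the 32-segment limit are
all hypothetical stand-ins, chosen to show one retry after a compacting copy.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_MAX_SEGS 32	/* assumed segment limit for this model */

/* Hypothetical packet state for the model. */
struct pkt {
	int  nsegs;	/* segments the chain currently needs */
	bool defrag_ok;	/* whether a compacting copy can be allocated */
};

/* Stand-in for the DMA load step: too many segments gives EFBIG. */
static int
model_dma_load(const struct pkt *p)
{
	return (p->nsegs > MODEL_MAX_SEGS) ? EFBIG : 0;
}

/* Stand-in for m_defrag(): fails when no buffer can be allocated,
 * otherwise compacts the chain to nsegs_after segments. */
static bool
model_defrag(struct pkt *p, int nsegs_after)
{
	if (!p->defrag_ok)
		return false;
	p->nsegs = nsegs_after;
	return true;
}

/* Every packet passes the retry label once; only an EFBIG result
 * sends it around a second time, after a single defrag attempt. */
static int
model_xmit(struct pkt *p, int nsegs_after_defrag)
{
	int error;
	bool remap = true;

retry:
	error = model_dma_load(p);
	if (error == EFBIG && remap) {
		remap = false;
		if (!model_defrag(p, nsegs_after_defrag))
			return ENOBUFS;	/* allocation failure */
		goto retry;
	}
	return error;	/* 0 on success, EFBIG if still too large */
}
```

In this model an allocation failure surfaces as ENOBUFS, and EFBIG is
returned only when the chain is still too large after compaction.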


> I don't know if increasing the limits for the jumbo mbufs via sysctl
> will help. If you are using the code without Jack's patch, which uses
> 9K mbufs, then I think it can fragment the address space and result
> in no 9K contiguous areas to allocate from. (I'm just going by what
> Garrett and others have said about this.)
>
>
I never seem to be running out of mbufs - 4k or 9k - unless it's possible
for starvation to occur without incrementing the counters. Additionally,
netstat -m is recording denied mbufs on boot, so on a 96 Gig system that is
just starting up, I don't think I am running out.. but a large increase in
the buffers is on my list of desperation things to try.
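
For tracking the denied counters over time, a small parser for the "denied"
lines of netstat -m could look like the sketch below. It assumes those lines
have the form "<n>/<n>/<n> requests for ... denied (...)"; struct denied and
parse_denied() are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* The three slash-separated counters at the start of a "denied" line. */
struct denied {
	unsigned long a, b, c;
};

/* Returns 1 and fills *out if line starts with three counters and
 * mentions both "requests for" and "denied"; returns 0 otherwise. */
static int
parse_denied(const char *line, struct denied *out)
{
	struct denied tmp;

	if (sscanf(line, "%lu/%lu/%lu", &tmp.a, &tmp.b, &tmp.c) != 3)
		return 0;
	if (strstr(line, "requests for") == NULL ||
	    strstr(line, "denied") == NULL)
		return 0;
	*out = tmp;
	return 1;
}
```

Feeding it a made-up line such as "0/12/0 requests for jumbo clusters denied
(4k/9k/16k)" yields the counters 0, 12 and 0; other netstat -m lines are
rejected.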

Thanks for the hint on m_getjcl().. I'll dig around and see if I can find
what's happening there. I guess it's time for me to learn basic dtrace as
well. :-)