Date: Sun, 23 Mar 2014 11:54:54 -0300
Subject: Re: 9.2 ixgbe tx queue hang
From: Christopher Forgeron
To: Rick Macklem
Cc: FreeBSD Net, Garrett Wollman, Jack Vogel, Markus Gebert
In-Reply-To: <1055107814.1401328.1395523094565.JavaMail.root@uoguelph.ca>

Hi Rick, very helpful as always.

On Sat, Mar 22, 2014 at 6:18 PM, Rick Macklem wrote:

> Christopher Forgeron wrote:
>
> Well, you could try making if_hw_tsomax somewhat smaller. (I can't see
> how the packet including ethernet header would be more than 64K with the
> patch, but?? For example, the ether_output() code can call ng_output()
> and I have no idea if that might grow the data size of the packet?)
>

That's what I was thinking - I was going to drop it down to 32k, which is
extreme, but I wanted to see if it cured it or not. Something would have to
be very broken to be adding nearly 32k to a packet.

> To be honest, the optimum for NFS would be setting if_hw_tsomax == 56K,
> since that would avoid the overhead of the m_defrag() calls. However,
> it is suboptimal for other TCP transfers.
>

I'm very interested in NFS performance, so this is interesting to me - do
you have the time to educate me on this? I was going to spend this week
hacking out the NFS server cache, as I feel ZFS does a better job, and my
cache stats are always terrible, as is to be expected when I have such wide
data usage on these SANs.

> One other thing you could do (if you still have them) is scan the logs
> for the code with my previous printf() patch and see if there is ever
> a size > 65549 in it. If there is, then if_hw_tsomax needs to be smaller
> by at least that size - 65549. (65535 + 14 == 65549)
>

There were some 65548's for sure. Interestingly enough, the amount that it
ruptures by seems to be increasing slowly. I should possibly let it rupture
and run for a long time to see if there is a steadily increasing pattern...
perhaps something is accidentally incrementing the packet by, say, 4 bytes
in a heavily loaded error condition.
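
The log scan suggested above could look roughly like the following. This is
only a sketch: the "size=<n>" log format and the function names are
hypothetical (the exact output of the printf() patch is not shown in this
message); the 65535 and 14 constants are the ones quoted above.

```python
import re

IP_MAXPACKET = 65535                   # maximum IP packet size
ETHER_HDR_LEN = 14                     # Ethernet header, no VLAN tag
LIMIT = IP_MAXPACKET + ETHER_HDR_LEN   # 65549, as in the quoted text

# Hypothetical log format: any line containing "size=<decimal>".
SIZE_RE = re.compile(r"size=(\d+)")


def max_overshoot(lines):
    """Return how far the largest logged size exceeds LIMIT (0 if none does)."""
    worst = 0
    for line in lines:
        match = SIZE_RE.search(line)
        if match:
            worst = max(worst, int(match.group(1)) - LIMIT)
    return worst


def suggested_tsomax(current_tsomax, lines):
    """Shrink if_hw_tsomax by the worst overshoot seen in the log."""
    return current_tsomax - max_overshoot(lines)
```

Feeding it the saved log lines and the current if_hw_tsomax value gives the
largest value that would have kept every logged packet at or under 65549
bytes; if the overshoot really is creeping upward, rerunning it on later
logs should show the result shrinking.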

> I'm not familiar enough with the mbuf/uma allocators to "confirm" it,
> but I believe the "denied" refers to cases where m_getjcl() fails to get
> a jumbo mbuf and returns NULL.
>
> If this were to happen in m_defrag(), it would return NULL and the ix
> driver returns ENOBUFS, so this is not the case for EFBIG errors.
>

BTW, the loop that your original printf code is in, just before the
"retry:" goto label: that's an error loop, and it looks to me that all/most
packets traverse it at some time?

> I don't know if increasing the limits for the jumbo mbufs via sysctl
> will help. If you are using the code without Jack's patch, which uses
> 9K mbufs, then I think it can fragment the address space and result
> in no 9K contiguous areas to allocate from. (I'm just going by what
> Garrett and others have said about this.)
>

I never seem to be running out of mbufs - 4k or 9k - unless it's possible
for a starvation to occur without incrementing the counters. Additionally,
netstat -m is recording denied mbufs on boot, so on a 96 Gig system that is
just starting up, I don't think I am... but a large increase in the buffers
is on my list of desperation things to try.

Thanks for the hint on m_getjcl()... I'll dig around and see if I can find
what's happening there. I guess it's time for me to learn basic dtrace as
well. :-)
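
To watch the "denied" counters from netstat -m over time (at boot and then
under load), something like the sketch below could be used. It assumes the
"N/N/N requests for ... denied (...)" line layout that netstat -m prints on
FreeBSD 9.x; that layout is an assumption here and should be checked against
real output. The function name and dictionary keys are made up for the
example.

```python
import re

# Assumed netstat -m line shape, e.g.:
#   0/12/3 requests for jumbo clusters denied (4k/9k/16k)
DENIED_RE = re.compile(
    r"^(\d+)/(\d+)/(\d+) requests for (mbufs|jumbo clusters) denied \(([^)]*)\)"
)


def denied_counters(netstat_m_output):
    """Map each label in a 'denied' line to its counter value."""
    counters = {}
    for line in netstat_m_output.splitlines():
        match = DENIED_RE.match(line.strip())
        if not match:
            continue  # skips "delayed" lines and everything else
        values = [int(match.group(i)) for i in (1, 2, 3)]
        labels = match.group(5).split("/")
        prefix = "jumbo " if match.group(4) == "jumbo clusters" else ""
        for label, value in zip(labels, values):
            counters[prefix + label] = value
    return counters
```

Running it on two snapshots and comparing the dictionaries shows whether the
denials only happen once at boot or keep growing while the tx queue is hung.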