Date: Sun, 23 Mar 2014 11:54:54 -0300
From: Christopher Forgeron <csforgeron@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: FreeBSD Net <freebsd-net@freebsd.org>, Garrett Wollman <wollman@freebsd.org>, Jack Vogel <jfvogel@gmail.com>, Markus Gebert <markus.gebert@hostpoint.ch>
Subject: Re: 9.2 ixgbe tx queue hang
Message-ID: <CAB2_NwAEzgs1u7GkueKrhMT7iSRqZqkHObrOrXeaLC_EW7Nnwg@mail.gmail.com>
In-Reply-To: <1055107814.1401328.1395523094565.JavaMail.root@uoguelph.ca>
References: <CAB2_NwDRAxmnszh7jKKPfvxBdgaA9Z0CcJ9c1wSNncKb55Td5w@mail.gmail.com> <1055107814.1401328.1395523094565.JavaMail.root@uoguelph.ca>
Hi Rick, very helpful as always.

On Sat, Mar 22, 2014 at 6:18 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Christopher Forgeron wrote:
>
> Well, you could try making if_hw_tsomax somewhat smaller. (I can't see
> how the packet including ethernet header would be more than 64K with the
> patch, but?? For example, the ether_output() code can call ng_output()
> and I have no idea if that might grow the data size of the packet?)
>

That's what I was thinking - I was going to drop it down to 32k, which is
extreme, but I wanted to see if it cured it or not. Something would have
to be very broken to be adding nearly 32k to a packet.

> To be honest, the optimum for NFS would be setting if_hw_tsomax == 56K,
> since that would avoid the overhead of the m_defrag() calls. However,
> it is suboptimal for other TCP transfers.
>

I'm very interested in NFS performance, so this is interesting to me - do
you have the time to educate me on this? I was going to spend this week
hacking out the NFS server cache, as I feel ZFS does a better job, and my
cache stats are always terrible, as is to be expected when I have such a
wide data usage on these SANs.

> One other thing you could do (if you still have them) is scan the logs
> for the code with my previous printf() patch and see if there is ever
> a size > 65549 in it. If there is, then if_hw_tsomax needs to be smaller
> by at least that size - 65549. (65535 + 14 == 65549)
>

There were some 65548's for sure. Interestingly enough, the amount by
which it ruptures seems to be increasing slowly. I should possibly let it
rupture and run for a long time to see if there is a steadily increasing
pattern... perhaps something is accidentally incrementing the packet by,
say, 4 bytes in a heavily loaded error condition.

> I'm not familiar enough with the mbuf/uma allocators to "confirm" it,
> but I believe the "denied" refers to cases where m_getjcl() fails to get
> a jumbo mbuf and returns NULL.
>
> If this were to happen in m_defrag(), it would return NULL and the ix
> driver returns ENOBUFS, so this is not the case for EFBIG errors.
>

BTW, the loop that your original printf code is in, just before the
"retry:" goto label: that's an error loop, and it looks to me that
all/most packets traverse it at some time?

> I don't know if increasing the limits for the jumbo mbufs via sysctl
> will help. If you are using the code without Jack's patch, which uses
> 9K mbufs, then I think it can fragment the address space and result
> in no 9K contiguous areas to allocate from. (I'm just going by what
> Garrett and others have said about this.)
>

I never seem to be running out of mbufs - 4k or 9k - unless it's possible
for starvation to occur without incrementing the counters. Additionally,
netstat -m is recording denied mbufs on boot, so on a 96 GB system that is
just starting up, I don't think I am... but a large increase in the
buffers is on my list of desperation things to try.

Thanks for the hint on m_getjcl() - I'll dig around and see if I can find
what's happening there. I guess it's time for me to learn basic dtrace as
well. :-)
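
A note for readers following the if_hw_tsomax discussion: it is a
per-interface field in struct ifnet that a driver sets to tell the TCP
stack the largest TSO payload it can accept. A minimal sketch of the
experiment described above, assuming the 9.2-era field name and that the
assignment sits in the driver's attach routine (the exact location is an
assumption, not taken from this thread):

    /* Sketch only: advertise a smaller TSO limit to the stack.        */
    /* 32K is the "extreme" test value; 56K is the NFS-friendly value. */
    ifp->if_hw_tsomax = 32 * 1024;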
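
As I read it, the printf() patch being discussed is essentially a length
check against 65535 + 14 in the transmit path. A hedged sketch of that
kind of diagnostic (the mbuf variable name and its placement are
assumptions; only the 65549 threshold comes from the thread):

    #include <netinet/ip.h>       /* IP_MAXPACKET == 65535 */
    #include <net/ethernet.h>     /* ETHER_HDR_LEN == 14   */

    /* Hypothetical diagnostic: flag any chain longer than 65549 bytes. */
    if (m_head->m_pkthdr.len > IP_MAXPACKET + ETHER_HDR_LEN)
            printf("ix: oversized TSO frame, pkthdr.len=%d\n",
                m_head->m_pkthdr.len);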
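
The EFBIG-versus-ENOBUFS distinction Rick draws follows the usual FreeBSD
driver transmit pattern; a simplified sketch, not the verbatim ixgbe code
(the tag, map, and variable names are placeholders):

    /* On EFBIG the chain had too many DMA segments: defragment and retry. */
    error = bus_dmamap_load_mbuf_sg(txr->txtag, txbuf->map, m_head,
        segs, &nsegs, BUS_DMA_NOWAIT);
    if (error == EFBIG) {
            struct mbuf *m;

            m = m_defrag(m_head, M_NOWAIT);
            if (m == NULL) {
                    /* m_defrag() could not allocate: drop and report. */
                    m_freem(m_head);
                    return (ENOBUFS);
            }
            m_head = m;
            /* ...retry the bus_dmamap_load_mbuf_sg() call... */
    }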
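
For reference, the allocation Rick mentions looks like this; whether a
NULL return here is exactly what "denied" counts in netstat -m is his
hedge, not a confirmed fact:

    /* Illustrative only: a 9K jumbo cluster request of the sort an RX
       refill path makes; NULL means the request was refused. */
    struct mbuf *m;

    m = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MJUM9BYTES);
    if (m == NULL) {
            /* the failure mode under discussion */
    }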