Date:      Sun, 23 Mar 2014 11:54:54 -0300
From:      Christopher Forgeron <csforgeron@gmail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>, Garrett Wollman <wollman@freebsd.org>, Jack Vogel <jfvogel@gmail.com>, Markus Gebert <markus.gebert@hostpoint.ch>
Subject:   Re: 9.2 ixgbe tx queue hang
Message-ID:  <CAB2_NwAEzgs1u7GkueKrhMT7iSRqZqkHObrOrXeaLC_EW7Nnwg@mail.gmail.com>
In-Reply-To: <1055107814.1401328.1395523094565.JavaMail.root@uoguelph.ca>
References:  <CAB2_NwDRAxmnszh7jKKPfvxBdgaA9Z0CcJ9c1wSNncKb55Td5w@mail.gmail.com> <1055107814.1401328.1395523094565.JavaMail.root@uoguelph.ca>

Hi Rick, very helpful as always.


On Sat, Mar 22, 2014 at 6:18 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Christopher Forgeron wrote:
>
> Well, you could try making if_hw_tsomax somewhat smaller. (I can't see
> how the packet including ethernet header would be more than 64K with the
> patch, but?? For example, the ether_output() code can call ng_output()
> and I have no idea if that might grow the data size of the packet?)
>

That's what I was thinking - I was going to drop it down to 32K, which is
extreme, but I want to see whether that cures it. Something would have to
be very broken to be adding nearly 32K to a packet.
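
For reference, the sort of change I have in mind is just a clamp in the
driver setup path. A rough sketch follows, assuming the if_hw_tsomax field
from the patch under test; the ix_set_tsomax() name and the
ix_tso_clamp_32k tunable are made up for illustration:

#include <sys/param.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>
#include <net/ethernet.h>

/* Hypothetical tunable, only for this experiment. */
static int ix_tso_clamp_32k = 1;

static void
ix_set_tsomax(struct ifnet *ifp)
{
	/*
	 * One plausible default: keep payload plus Ethernet/VLAN header
	 * within the 64K the TSO engine can describe.
	 */
	ifp->if_hw_tsomax = 65535 - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);

	/*
	 * Diagnostic clamp: 32K leaves ~32K of slack, so if the EFBIG
	 * path still fires, something is growing the chain enormously.
	 */
	if (ix_tso_clamp_32k)
		ifp->if_hw_tsomax = 32 * 1024;
}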


> To be honest, the optimum for NFS would be setting if_hw_tsomax == 56K,
> since that would avoid the overhead of the m_defrag() calls. However,
> it is suboptimal for other TCP transfers.
>

I'm very interested in NFS performance, so this is interesting to me - do
you have the time to educate me on this? I was going to spend this week
hacking out the NFS server cache, as I feel ZFS does a better job, and my
cache stats are always terrible, as is to be expected when I have such a
wide data usage pattern on these SANs.
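
In the meantime, here is my back-of-envelope reading of where 56K comes
from. The constraints are my assumptions from this thread (a 32-segment
scatter/gather limit on the 82599 and NFS handing TCP a chain of 2K
MCLBYTES clusters), so please correct me if I have them wrong:

#include <stdio.h>

int
main(void)
{
	const int max_segs  = 32;        /* assumed ix scatter/gather limit */
	const int mclbytes  = 2 * 1024;  /* MCLBYTES */
	const int hdr_mbufs = 4;         /* rough allowance for RPC/TCP/IP headers */

	/* Data that fits without forcing the driver into m_defrag(): */
	int tsomax = (max_segs - hdr_mbufs) * mclbytes;

	printf("suggested if_hw_tsomax ~= %d bytes (%dK)\n",
	    tsomax, tsomax / 1024);
	/* (32 - 4) * 2K = 56K, matching the figure above. */
	return (0);
}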

>
> One other thing you could do (if you still have them) is scan the logs
> for the code with my previous printf() patch and see if there is ever
> a size > 65549 in it. If there is, then if_hw_tsomax needs to be smaller
> by at least that size - 65549. (65535 + 14 == 65549)
>

There were some 65548's for sure. Interestingly enough, the amount by which
it overshoots seems to be increasing slowly. I should possibly let it keep
running for a long time to see if there is a steadily increasing pattern...
perhaps something is accidentally growing the packet by, say, 4 bytes under
a heavily loaded error condition.
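
For anyone following along, this is the shape of the check I'm grepping the
logs for - not the actual patch, just an equivalent test with a made-up
function name:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>
#include <net/ethernet.h>

static void
ix_log_oversized_tso(struct mbuf *m)
{
	/* 65535 (IP_MAXPACKET) + 14 (ETHER_HDR_LEN) == 65549 */
	if ((m->m_pkthdr.csum_flags & CSUM_TSO) != 0 &&
	    m->m_pkthdr.len > 65535 + ETHER_HDR_LEN)
		printf("ix: TSO frame too large: %d bytes\n",
		    m->m_pkthdr.len);
}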

>

> I'm not familiar enough with the mbuf/uma allocators to "confirm" it,
> but I believe the "denied" refers to cases where m_getjcl() fails to get
> a jumbo mbuf and returns NULL.
>
> If this were to happen in m_defrag(), it would return NULL and the ix
> driver returns ENOBUFS, so this is not the case for EFBIG errors.
>
BTW, the loop that your original printf code is in, just before the retry:
goto label - that's an error loop, and it looks to me that all/most packets
traverse it at some time?
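
To make sure I'm reading that path correctly, here is my paraphrase of the
EFBIG handling - not the real ixgbe code, just my mental model of it, with
simplified names:

#include <sys/param.h>
#include <sys/mbuf.h>
#include <machine/bus.h>

static int
ix_load_tx_mbuf(bus_dma_tag_t tag, bus_dmamap_t map, struct mbuf **m_headp,
    bus_dma_segment_t *segs, int *nsegs)
{
	struct mbuf *m;
	int error, retried = 0;

retry:
	/* EFBIG here means the chain needs more DMA segments than allowed. */
	error = bus_dmamap_load_mbuf_sg(tag, map, *m_headp, segs, nsegs,
	    BUS_DMA_NOWAIT);
	if (error == EFBIG && retried == 0) {
		m = m_defrag(*m_headp, M_NOWAIT);
		if (m == NULL) {
			/* No cluster available: drop, report ENOBUFS. */
			m_freem(*m_headp);
			*m_headp = NULL;
			return (ENOBUFS);
		}
		*m_headp = m;
		retried = 1;
		goto retry;	/* the retry: label the printf sits beside */
	}
	return (error);
}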


> I don't know if increasing the limits for the jumbo mbufs via sysctl
> will help. If you are using the code without Jack's patch, which uses
> 9K mbufs, then I think it can fragment the address space and result
> in no 9K contiguous areas to allocate from. (I'm just going by what
> Garrett and others have said about this.)
>
>
I never seem to be running out of mbufs - 4K or 9K - unless it's possible
for a starvation to occur without incrementing the counters. Additionally,
netstat -m is recording denied mbufs at boot, so on a 96 GB system that is
just starting up I don't think I'm actually exhausting them... but a large
increase in the buffer limits is on my list of desperation things to try.

Thanks for the hint on m_getjcl()... I'll dig around and see if I can find
out what's happening there. I guess it's time for me to learn basic dtrace
as well. :-)
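
For the record, my (possibly wrong) understanding of the allocation Rick
pointed me at - a sketch with a made-up wrapper name, showing how a NULL
from m_getjcl() would surface as a denied allocation:

#include <sys/param.h>
#include <sys/mbuf.h>

static struct mbuf *
ix_get_jumbo(int mjumsize)	/* MJUMPAGESIZE (4K) or MJUM9BYTES (9K) */
{
	struct mbuf *m;

	/* Asks the 4K/9K zone for a cluster; NULL when the zone can't
	 * supply one, which as I understand it is what shows up as
	 * "denied" in netstat -m. */
	m = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, mjumsize);
	if (m == NULL) {
		/* Caller must cope, e.g. leave the RX descriptor empty
		 * and refill later. */
		return (NULL);
	}
	return (m);
}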


