Date:      Tue, 25 Mar 2014 18:21:50 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Markus Gebert <markus.gebert@hostpoint.ch>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>, Garrett Wollman <wollman@freebsd.org>, Jack Vogel <jfvogel@gmail.com>, Christopher Forgeron <csforgeron@gmail.com>
Subject:   Re: 9.2 ixgbe tx queue hang
Message-ID:  <1212254686.521736.1395786110243.JavaMail.root@uoguelph.ca>
In-Reply-To: <1573EFCE-EFCF-4ABF-A1A5-77714B56F9F1@hostpoint.ch>

Markus Gebert wrote:
>
> On 25.03.2014, at 22:46, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
> > Markus Gebert wrote:
> >>
> >> On 25.03.2014, at 02:18, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> >>
> >>> Christopher Forgeron wrote:
> >>>>
> >>>> This is regarding the TSO patch that Rick suggested earlier. (With
> >>>> many thanks for his time and suggestion)
> >>>>
> >>>> As I mentioned earlier, it did not fix the issue on a 10.0 system.
> >>>> It did make it less of a problem on 9.2, but either way, I think
> >>>> it's not needed, and shouldn't be considered as a patch for
> >>>> testing/etc.
> >>>>
> >>>> Patching TSO to anything other than a max value (and by default
> >>>> the code gives it IP_MAXPACKET) is confusing the matter, as the
> >>>> packet length ultimately needs to be adjusted for many things on
> >>>> the fly like TCP Options, etc. Using static header sizes won't be
> >>>> a good idea.
> >>>>
> >>> If you look at tcp_output(), you'll notice that it doesn't do TSO
> >>> if there are any options. That way it knows that the TCP/IP header
> >>> is just hdrlen.
> >>>
> >>> If you don't limit the TSO packet (including TCP/IP and ethernet
> >>> headers) to 64K, then the "ix" driver can't send them, which is
> >>> the problem you guys are seeing.
> >>>
> >>> There are other ways to fix this problem, but they all may
> >>> introduce issues that reducing if_hw_tsomax by a small amount does
> >>> not. For example, m_defrag() could be modified to use 4K pagesize
> >>> clusters, but this might introduce memory fragmentation problems.
> >>> (I observed what I think are memory fragmentation problems when I
> >>> switched NFS to use 4K pagesize clusters for large I/O messages.)
> >>>
> >>> If setting IP_MAXPACKET to 65518 fixes the problem (no more EFBIG
> >>> error replies), then that is the size that if_hw_tsomax can be set
> >>> to (just can't change IP_MAXPACKET, but that is defined for other
> >>> things). (It just happens that IP_MAXPACKET is what if_hw_tsomax
> >>> defaults to. It has no other effect w.r.t. TSO.)
> >>>
> >>>>
> >>>> Additionally, it seems that setting nic TSO will/may be ignored by
> >>>> code like this in sys/netinet/tcp_output.c:
> >>>>
> >>
> >> Is this confirmed or still an ‘it seems’? Have you actually seen a
> >> tp->t_tsomax value in tcp_output() bigger than if_hw_tsomax or was
> >> this just speculation because the values are stored in different
> >> places? (Sorry, if you already stated this in another email, it’s
> >> currently hard to keep track of all the information.)
> >>
> >> Anyway, this dtrace one-liner should be a good test if other values
> >> appear in tp->t_tsomax:
> >>
> >> # dtrace -n 'fbt::tcp_output:entry
> >>     / args[0]->t_tsomax != 0 && args[0]->t_tsomax != 65518 /
> >>     { printf("unexpected tp->t_tsomax: %i\n", args[0]->t_tsomax);
> >>       stack(); }'
> >>
> >> Remember to adjust the value in the condition to whatever you’re
> >> currently expecting. The value seems to be 0 for new connections,
> >> probably when tcp_mss() has not been called yet. So that seems
> >> normal and I have excluded that case too. This will also print a
> >> kernel stack trace in case it sees an unexpected value.
> >>
> >>
> >>> Yes, but I don't know why.
> >>> The only conjecture I can come up with is that another net driver
> >>> is stacked above "ix" and the setting for if_hw_tsomax doesn't
> >>> propagate up. (If you look at the commit log message for r251296,
> >>> the intent of adding if_hw_tsomax was to allow device drivers to
> >>> set a smaller tsomax than IP_MAXPACKET.)
> >>>
> >>> Are you using any of the "stacked" network device drivers like
> >>> lagg? I don't even know what the others all are?
> >>> Maybe someone else can list them?
> >>
> >> I guess the most obvious are lagg and vlan (and probably carp on
> >> FreeBSD 9.x or older).
> >>
> >> On request from Jack, we’ve eliminated lagg and vlan from the
> >> picture, which gives us plain ixgbe interfaces with no stacked
> >> interfaces on top of it. And we can still reproduce the problem.
> >>
> > This was related to the "did if_hw_tsomax set tp->t_tsomax to the
> > same value?" question. Since you reported that my patch that set
> > if_hw_tsomax in the driver didn't fix the problem, that suggests
> > that tp->t_tsomax isn't being set to if_hw_tsomax from the driver,
> > but we don't know why?
>
> Jack asked us to remove lagg/vlans in the very beginning of this
> thread, and when we had done that, the problem was still there. So my
> answer was not related to your recent patch. I wanted to clarify
> that we have been testing with ixgbe only for quite some time and
> that stacked interfaces could not be a source of problems in our
> test scenario.
>
> We just started testing your patch that sets if_hw_tsomax
> yesterday. So far I have it running on two systems along with some
> printfs and the dtrace one-liner that watches over tp->t_tsomax in
> tcp_output(). So far we haven’t had any problems with these two
> servers, and the dtrace probe never fired, so it looks like
> tp->t_tsomax always gets set from if_hw_tsomax. But it’s too soon to
> draw a conclusion; it may take days to trigger the problem again. It
> might also be fixed with your patch.
>
Righto. Setting if_hw_tsomax in the driver is supposed to set tp->t_tsomax
and I could see it work in a trivial test (I hacked the code so the
assignments are done for the non-tso case and it worked for the non-tso
"re" driver I run.)
{ As an aside, one of these assignments does happen for non-tso cases,
  since although it is indented, there are no {} for the block. In
  tcp_subr.c if I recall. However, doing the assignment for the non-tso
  case seems harmless to me. }
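
A minimal C sketch of the pitfall in that aside. The names here
(struct fake_cap, fill_cap(), CAP_TSO) are made up for illustration and
this is not the actual tcp_subr.c code; it only shows that an if without
{} guards a single statement, so a following assignment runs in the
non-TSO case too, however it is indented.

```c
#include <assert.h>
#include <stddef.h>

#define CAP_TSO 0x1	/* hypothetical capability flag */

/* Hypothetical stand-in for the capability struct; not the kernel's. */
struct fake_cap {
	int flags;
	unsigned int tsomax;
};

static void
fill_cap(struct fake_cap *cap, int tso_enabled, unsigned int hw_tsomax)
{
	assert(cap != NULL);
	cap->flags = 0;
	cap->tsomax = 0;
	if (tso_enabled)
		cap->flags |= CAP_TSO;
	/*
	 * No {} above, so the if guards only the flags line. Even if
	 * the next line were indented to match it, it would still run
	 * for the non-TSO case.
	 */
	cap->tsomax = hw_tsomax;
}
```

Calling fill_cap() with tso_enabled set to 0 leaves flags at 0 but still
stores hw_tsomax in tsomax, which matches the "seems harmless" case above.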

> I’m booting more systems with the test kernel and I will be watching
> all of them with dtrace to see if I find an occurrence where
> tp->t_tsomax is off. I hope that with more systems, I’ll have an
> answer more quickly.
>
> But digging around the code, I still don’t see a way how tp->t_tsomax
> could not have been set from if_hw_tsomax when there are no stacked
> interfaces…
>
It seems to happen where you mentioned before. Since it only gets set
from cap.tsomax and that gets set from if_hw_tsomax, it would be 0
otherwise. Christopher sees it change when he changes IP_MAXPACKET, so
the default setting works, but for him setting it in the driver didn't,
for some reason?
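
A rough sketch of the flow being discussed, using simplified stand-in
structs rather than the real kernel types: the driver's if_hw_tsomax is
copied into cap.tsomax, that becomes tp->t_tsomax, and the length is then
clamped in the same way as the 10.0 tcp_output() lines Christopher quoted.
The helpers set_tsomax() and clamp_len(), and the guard that skips the
clamp when t_tsomax is not larger than hdrlen, are assumptions for
illustration and not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real structs have many more fields. */
struct fake_ifnet { unsigned int if_hw_tsomax; };
struct fake_cap   { unsigned int tsomax; };
struct fake_tcpcb { unsigned int t_tsomax; };

/* Copy the driver's limit through cap into the connection state. */
static void
set_tsomax(struct fake_tcpcb *tp, const struct fake_ifnet *ifp)
{
	struct fake_cap cap;

	assert(tp != NULL && ifp != NULL);
	cap.tsomax = ifp->if_hw_tsomax;
	tp->t_tsomax = cap.tsomax;
}

/*
 * Same shape as the clamp in the quoted tcp_output() lines. Returns
 * the (possibly reduced) length and sets *sendalot when it clamps.
 * Skipping the clamp when t_tsomax <= hdrlen (including the unset
 * value 0) is this sketch's own choice.
 */
static unsigned int
clamp_len(const struct fake_tcpcb *tp, unsigned int len,
    unsigned int hdrlen, int *sendalot)
{
	if (tp->t_tsomax > hdrlen && len > tp->t_tsomax - hdrlen) {
		len = tp->t_tsomax - hdrlen;
		*sendalot = 1;
	}
	return (len);
}
```

With if_hw_tsomax = 65518 and hdrlen = 40, a 70000 byte request is cut to
65478 and sendalot is set; a 1000 byte request passes through unchanged.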

Thanks for doing the testing, rick

>
> Markus
>
> > rick
> >
> >>
> >> Markus
> >>
> >>>
> >>> rick
> >>>>
> >>>> 10.0 Code:
> >>>>
> >>>> if (len > tp->t_tsomax - hdrlen) {        /* !! */
> >>>>         len = tp->t_tsomax - hdrlen;      /* !! */
> >>>>         sendalot = 1;
> >>>> }
> >>>>
> >>>> I've put debugging here, set the nic's max TSO as per Rick's patch
> >>>> (set to say 32k), and have seen that tp->t_tsomax == IP_MAXPACKET.
> >>>> It's being set someplace else, and thus our attempts to set TSO on
> >>>> the nic may be in vain.
> >>>>
> >>>> It may have mattered more in 9.2, as I see the code doesn't use
> >>>> tp->t_tsomax in some locations, and may actually default to what
> >>>> the nic is set to.
> >>>>
> >>>> The NIC may still win, I didn't walk through the code to confirm,
> >>>> it was enough to suggest to me that setting TSO wouldn't fix this
> >>>> issue.
> >>>>
> >>>> However, this is still a TSO related issue, it's just not one
> >>>> related to the setting of TSO's max size.
> >>>>
> >>>> A 10.0-STABLE system with tso disabled on ix0 doesn't have a
> >>>> single packet over IP_MAXPACKET in 1 hour of runtime. I'll let it
> >>>> go a bit longer to increase confidence in this assertion, but I
> >>>> don't want to waste time on this when I could be logging problem
> >>>> packets on a system with TSO enabled.
> >>>>
> >>>> Comments are very welcome..
> >>>>
> >>> _______________________________________________
> >>> freebsd-net@freebsd.org mailing list
> >>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> >>> To unsubscribe, send any mail to
> >>> "freebsd-net-unsubscribe@freebsd.org"


