Date: Tue, 25 Mar 2014 18:21:50 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Markus Gebert <markus.gebert@hostpoint.ch>
Cc: FreeBSD Net <freebsd-net@freebsd.org>, Garrett Wollman <wollman@freebsd.org>, Jack Vogel <jfvogel@gmail.com>, Christopher Forgeron <csforgeron@gmail.com>
Subject: Re: 9.2 ixgbe tx queue hang
Message-ID: <1212254686.521736.1395786110243.JavaMail.root@uoguelph.ca>
In-Reply-To: <1573EFCE-EFCF-4ABF-A1A5-77714B56F9F1@hostpoint.ch>

Markus Gebert wrote:
>
> On 25.03.2014, at 22:46, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
> > Markus Gebert wrote:
> >>
> >> On 25.03.2014, at 02:18, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> >>
> >>> Christopher Forgeron wrote:
> >>>>
> >>>> This is regarding the TSO patch that Rick suggested earlier. (With
> >>>> many thanks for his time and suggestion)
> >>>>
> >>>> As I mentioned earlier, it did not fix the issue on a 10.0 system. It
> >>>> did make it less of a problem on 9.2, but either way, I think it's
> >>>> not needed, and shouldn't be considered as a patch for testing/etc.
> >>>>
> >>>> Patching TSO to anything other than a max value (and by default the
> >>>> code gives it IP_MAXPACKET) is confusing the matter, as the packet
> >>>> length ultimately needs to be adjusted for many things on the fly
> >>>> like TCP Options, etc. Using static header sizes won't be a good idea.
> >>>>
> >>> If you look at tcp_output(), you'll notice that it doesn't do TSO if
> >>> there are any options. That way it knows that the TCP/IP header is
> >>> just hdrlen.
> >>>
> >>> If you don't limit the TSO packet (including TCP/IP and ethernet headers)
> >>> to 64K, then the "ix" driver can't send them, which is the problem
> >>> you guys are seeing.
> >>>
> >>> There are other ways to fix this problem, but they all may introduce
> >>> issues that reducing if_hw_tsomax by a small amount does not.
> >>> For example, m_defrag() could be modified to use 4K pagesize clusters,
> >>> but this might introduce memory fragmentation problems. (I observed
> >>> what I think are memory fragmentation problems when I switched NFS
> >>> to use 4K pagesize clusters for large I/O messages.)
> >>>
> >>> If setting IP_MAXPACKET to 65518 fixes the problem (no more EFBIG
> >>> error replies), then that is the size that if_hw_tsomax can be set
> >>> to (you just can't change IP_MAXPACKET itself, since that is defined
> >>> for other things). (It just happens that IP_MAXPACKET is what
> >>> if_hw_tsomax defaults to. It has no other effect w.r.t. TSO.)
> >>>
> >>>>
> >>>> Additionally, it seems that setting nic TSO will/may be ignored by
> >>>> code like this in sys/netinet/tcp_output.c:
> >>>>
> >>
> >> Is this confirmed or still an ‘it seems’? Have you actually seen a
> >> tp->t_tsomax value in tcp_output() bigger than if_hw_tsomax or was
> >> this just speculation because the values are stored in different
> >> places? (Sorry, if you already stated this in another email, it’s
> >> currently hard to keep track of all the information.)
> >>
> >> Anyway, this dtrace one-liner should be a good test if other values
> >> appear in tp->t_tsomax:
> >>
> >> # dtrace -n 'fbt::tcp_output:entry
> >>     / args[0]->t_tsomax != 0 && args[0]->t_tsomax != 65518 /
> >>     { printf("unexpected tp->t_tsomax: %i\n", args[0]->t_tsomax); stack(); }'
> >>
> >> Remember to adjust the value in the condition to whatever you’re
> >> currently expecting. The value seems to be 0 for new connections,
> >> probably when tcp_mss() has not been called yet. So that seems
> >> normal and I have excluded that case too. This will also print a
> >> kernel stack trace in case it sees an unexpected value.
> >>
> >>> Yes, but I don't know why.
> >>> The only conjecture I can come up with is that another net driver is
> >>> stacked above "ix" and the setting for if_hw_tsomax doesn't propagate
> >>> up. (If you look at the commit log message for r251296, the intent
> >>> of adding if_hw_tsomax was to allow device drivers to set a smaller
> >>> tsomax than IP_MAXPACKET.)
> >>>
> >>> Are you using any of the "stacked" network device drivers like
> >>> lagg? I don't even know what the others all are?
> >>> Maybe someone else can list them?
> >>
> >> I guess the most obvious are lagg and vlan (and probably carp on
> >> FreeBSD 9.x or older).
> >>
> >> On request from Jack, we’ve eliminated lagg and vlan from the
> >> picture, which gives us plain ixgbe interfaces with no stacked
> >> interfaces on top of it. And we can still reproduce the problem.
> >>
> > This was related to the "did if_hw_tsomax set tp->t_tsomax to the
> > same value?" question. Since you reported that my patch that set
> > if_hw_tsomax in the driver didn't fix the problem, that suggests
> > that tp->t_tsomax isn't being set to if_hw_tsomax from the driver,
> > but we don't know why?
>
> Jack asked us to remove lagg/vlans in the very beginning of this
> thread, and when we had done that, the problem was still there. So my
> answer was not related to your recent patch. I wanted to clarify
> that we have been testing with ixgbe only for quite some time and
> that stacked interfaces could not be a source of problems in our
> test scenario.
>
> We have just started testing your patch that sets if_hw_tsomax
> yesterday. So far I have it running on two systems along with some
> printfs and the dtrace one-liner that watches over tp->t_tsomax in
> tcp_output(). So far we haven’t had any problems with these two
> servers, and the dtrace probe never fired, so it looks like
> tp->t_tsomax always gets set from if_hw_tsomax. But it’s too soon to
> draw a conclusion, it may take days to trigger the problem again. It
> might also be fixed with your patch.
>
Righto. Setting if_hw_tsomax in the driver is supposed to set tp->t_tsomax
and I could see it work in a trivial test (I hacked the code so the assignments
are done for the non-tso case and it worked for the non-tso "re" driver I run.)
{ As an aside, one of these assignments does happen for non-tso cases, since
  although it is indented, there are no {} for the block. In tcp_subr.c if I
  recall. However, doing the assignment for the non-tso case seems harmless to me. }

> I’m booting more systems with the test kernel and I will be watching
> all of them with dtrace to see if I find an occurrence where
> tp->t_tsomax is off. I hope that with more systems, I’ll have an
> answer more quickly.
>
> But digging around the code, I still don’t see a way how tp->t_tsomax
> could not have been set from if_hw_tsomax when there are no stacked
> interfaces…
>
It seems to happen where you mentioned before. Since it only gets set
from cap.tsomax and that gets set from if_hw_tsomax, it would be 0 otherwise.
Christopher sees it change when he changes IP_MAXPACKET, so the default
setting works, but for him setting it in the driver didn't, for some reason?

Thanks for doing the testing, rick

>
> Markus
>
> > rick
> >
> >>
> >> Markus
> >>
> >>>
> >>> rick
> >>>>
> >>>> 10.0 Code:
> >>>>
> >>>> if (len > tp->t_tsomax - hdrlen) {      !!
> >>>>         len = tp->t_tsomax - hdrlen;    !!
> >>>>         sendalot = 1;
> >>>> }
> >>>>
> >>>> I've put debugging here, set the nic's max TSO as per Rick's patch
> >>>> (set to say 32k), and have seen that tp->t_tsomax == IP_MAXPACKET.
> >>>> It's being set someplace else, and thus our attempts to set TSO on
> >>>> the nic may be in vain.
> >>>>
> >>>> It may have mattered more in 9.2, as I see the code doesn't use
> >>>> tp->t_tsomax in some locations, and may actually default to what the
> >>>> nic is set to.
> >>>>
> >>>> The NIC may still win, I didn't walk through the code to confirm, it
> >>>> was enough to suggest to me that setting TSO wouldn't fix this issue.
> >>>>
> >>>> However, this is still a TSO related issue, it's just not one related
> >>>> to the setting of TSO's max size.
> >>>>
> >>>> A 10.0-STABLE system with tso disabled on ix0 doesn't have a single
> >>>> packet over IP_MAXPACKET in 1 hour of runtime. I'll let it go a bit
> >>>> longer to increase confidence in this assertion, but I don't want to
> >>>> waste time on this when I could be logging problem packets on a
> >>>> system with TSO enabled.
> >>>>
> >>>> Comments are very welcome..
> >>>>
> >>> _______________________________________________
> >>> freebsd-net@freebsd.org mailing list
> >>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> >>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
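
A minimal sketch of the arithmetic discussed in this thread, not the kernel code itself. It mirrors the clamp quoted from tcp_output() (len limited to tsomax - hdrlen) and then checks whether payload plus TCP/IP header plus ethernet header stays within the 64K that the "ix" driver is said to handle. The function names, the SKETCH_ constants, the 14-byte untagged ethernet header and the 40-byte option-free TCP/IPv4 header used in the note below are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for illustration; not taken from kernel headers. */
#define SKETCH_ETHER_HDR_LEN 14u     /* untagged ethernet header (assumed) */
#define SKETCH_MAX_FRAME     65536u  /* the 64K limit mentioned in the thread */

/*
 * Clamp a TSO payload length the same way the quoted tcp_output() lines do:
 * if len exceeds tsomax - hdrlen, cut it down and flag that more data remains.
 */
static uint32_t
sketch_clamp_tso_len(uint32_t len, uint32_t tsomax, uint32_t hdrlen,
    int *sendalot)
{
	assert(tsomax > hdrlen);	/* avoid unsigned wrap in the subtraction */
	if (len > tsomax - hdrlen) {
		len = tsomax - hdrlen;
		*sendalot = 1;
	}
	return (len);
}

/* Return nonzero if payload + TCP/IP header + ethernet header fits in 64K. */
static int
sketch_frame_fits(uint32_t len, uint32_t hdrlen)
{
	return (len + hdrlen + SKETCH_ETHER_HDR_LEN <= SKETCH_MAX_FRAME);
}
```

Under these assumptions, a tsomax of 65518 with hdrlen 40 clamps the payload to 65478 bytes, and 65478 + 40 + 14 = 65532 fits in 64K. A tsomax of 65535 clamps to 65495 bytes, and 65495 + 40 + 14 = 65549 does not fit.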