Date: Wed, 26 Mar 2014 00:07:18 +0100
From: Markus Gebert <markus.gebert@hostpoint.ch>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: FreeBSD Net <freebsd-net@freebsd.org>, Garrett Wollman <wollman@freebsd.org>, Jack Vogel <jfvogel@gmail.com>, Christopher Forgeron <csforgeron@gmail.com>
Subject: Re: 9.2 ixgbe tx queue hang
Message-ID: <2C869AD2-FC04-4E0B-9CBC-02FFA9AEC0EA@hostpoint.ch>
In-Reply-To: <1212254686.521736.1395786110243.JavaMail.root@uoguelph.ca>
References: <1212254686.521736.1395786110243.JavaMail.root@uoguelph.ca>

On 25.03.2014, at 23:21, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Markus Gebert wrote:
>>
>> On 25.03.2014, at 22:46, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>
>>> Markus Gebert wrote:
>>>>
>>>> On 25.03.2014, at 02:18, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>>>
>>>>> Christopher Forgeron wrote:
>>>>>>
>>>>>> This is regarding the TSO patch that Rick suggested earlier. (With
>>>>>> many thanks for his time and suggestion)
>>>>>>
>>>>>> As I mentioned earlier, it did not fix the issue on a 10.0 system. It
>>>>>> did make it less of a problem on 9.2, but either way, I think it's
>>>>>> not needed, and shouldn't be considered as a patch for testing/etc.
>>>>>>
>>>>>> Patching TSO to anything other than a max value (and by default the
>>>>>> code gives it IP_MAXPACKET) is confusing the matter, as the packet
>>>>>> length ultimately needs to be adjusted for many things on the fly
>>>>>> like TCP Options, etc. Using static header sizes won't be a good
>>>>>> idea.
>>>>>>
>>>>> If you look at tcp_output(), you'll notice that it doesn't do TSO if
>>>>> there are any options. That way it knows that the TCP/IP header is
>>>>> just hdrlen.
>>>>>
>>>>> If you don't limit the TSO packet (including TCP/IP and ethernet headers)
>>>>> to 64K, then the "ix" driver can't send them, which is the problem
>>>>> you guys are seeing.
>>>>>
>>>>> There are other ways to fix this problem, but they all may introduce
>>>>> issues that reducing if_hw_tsomax by a small amount does not.
>>>>> For example, m_defrag() could be modified to use 4K pagesize clusters,
>>>>> but this might introduce memory fragmentation problems. (I observed
>>>>> what I think are memory fragmentation problems when I switched NFS
>>>>> to use 4K pagesize clusters for large I/O messages.)
>>>>>
>>>>> If setting IP_MAXPACKET to 65518 fixes the problem (no more EFBIG
>>>>> error replies), then that is the size that if_hw_tsomax can be set
>>>>> to (you just can't change IP_MAXPACKET itself, since that is defined
>>>>> for other things). (It just happens that IP_MAXPACKET is what
>>>>> if_hw_tsomax defaults to. It has no other effect w.r.t. TSO.)
>>>>>
>>>>>>
>>>>>> Additionally, it seems that setting nic TSO will/may be ignored by
>>>>>> code like this in sys/netinet/tcp_output.c:
>>>>>>
>>>>
>>>> Is this confirmed or still an 'it seems'? Have you actually seen a
>>>> tp->t_tsomax value in tcp_output() bigger than if_hw_tsomax or was
>>>> this just speculation because the values are stored in different
>>>> places? (Sorry if you already stated this in another email; it's
>>>> currently hard to keep track of all the information.)
>>>>
>>>> Anyway, this dtrace one-liner should be a good test of whether other
>>>> values appear in tp->t_tsomax:
>>>>
>>>> # dtrace -n 'fbt::tcp_output:entry
>>>>     / args[0]->t_tsomax != 0 && args[0]->t_tsomax != 65518 /
>>>>     { printf("unexpected tp->t_tsomax: %i\n", args[0]->t_tsomax); stack(); }'
>>>>
>>>> Remember to adjust the value in the condition to whatever you're
>>>> currently expecting. The value seems to be 0 for new connections,
>>>> probably when tcp_mss() has not been called yet. So that seems
>>>> normal and I have excluded that case too. This will also print a
>>>> kernel stack trace in case it sees an unexpected value.
>>>>
>>>>> Yes, but I don't know why.
>>>>> The only conjecture I can come up with is that another net driver is
>>>>> stacked above "ix" and the setting for if_hw_tsomax doesn't propagate
>>>>> up. (If you look at the commit log message for r251296, the intent
>>>>> of adding if_hw_tsomax was to allow device drivers to set a smaller
>>>>> tsomax than IP_MAXPACKET.)
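To make the 64K limit discussed above concrete, here is a minimal C sketch of the size check. The thread only gives the numbers 64K and 65518; reading 65518 as 65536 minus a 14-byte Ethernet header and a 4-byte VLAN tag is an assumption on my part, and all sketch_* names are hypothetical, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for illustration; not taken from kernel headers. */
#define SKETCH_IP_MAXPACKET   65535u /* default if_hw_tsomax, per the thread */
#define SKETCH_TSO_LIMIT      65536u /* the "64K" limit, all headers included */
#define SKETCH_ETHER_HDR_LEN  14u    /* standard Ethernet header */
#define SKETCH_VLAN_ENCAP_LEN 4u     /* 802.1Q tag (assumed to be counted) */

/* Largest IP packet that still fits once l2_hdr_len bytes are prepended. */
static uint32_t
sketch_tsomax_for(uint32_t l2_hdr_len)
{
	assert(l2_hdr_len <= SKETCH_TSO_LIMIT);
	return (SKETCH_TSO_LIMIT - l2_hdr_len);
}

/* True if an IP packet of ip_len bytes plus link-layer headers fits in 64K. */
static bool
sketch_frame_fits(uint32_t ip_len, uint32_t l2_hdr_len)
{
	return (ip_len + l2_hdr_len <= SKETCH_TSO_LIMIT);
}
```

Under these assumptions, a full IP_MAXPACKET-sized packet plus a plain Ethernet header does not fit, while a 65518-byte packet plus Ethernet and VLAN headers fits exactly.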
>>>>>
>>>>> Are you using any of the "stacked" network device drivers like
>>>>> lagg? I don't even know what the others all are.
>>>>> Maybe someone else can list them?
>>>>
>>>> I guess the most obvious are lagg and vlan (and probably carp on
>>>> FreeBSD 9.x or older).
>>>>
>>>> On request from Jack, we've eliminated lagg and vlan from the
>>>> picture, which gives us plain ixgbe interfaces with no stacked
>>>> interfaces on top of them. And we can still reproduce the problem.
>>>>
>>> This was related to the "did if_hw_tsomax set tp->t_tsomax to the
>>> same value?" question. Since you reported that my patch that set
>>> if_hw_tsomax in the driver didn't fix the problem, that suggests
>>> that tp->t_tsomax isn't being set to if_hw_tsomax from the driver,
>>> but we don't know why.
>>
>> Jack asked us to remove lagg/vlans in the very beginning of this
>> thread, and when we had done that, the problem was still there. So my
>> answer was not related to your recent patch. I wanted to clarify
>> that we have been testing with ixgbe only for quite some time and
>> that stacked interfaces could not be a source of problems in our
>> test scenario.
>>
>> We just started testing your patch that sets if_hw_tsomax
>> yesterday. So far I have it running on two systems along with some
>> printfs and the dtrace one-liner that watches over tp->t_tsomax in
>> tcp_output(). So far we haven't had any problems with these two
>> servers, and the dtrace probe never fired, so it looks like
>> tp->t_tsomax always gets set from if_hw_tsomax. But it's too soon to
>> draw a conclusion; it may take days to trigger the problem again. It
>> might also be fixed with your patch.
>>
> Righto. Setting if_hw_tsomax in the driver is supposed to set tp->t_tsomax
> and I could see it work in a trivial test (I hacked the code so the assignments
> are done for the non-tso case and it worked for the non-tso "re" driver I run.)
> { As an aside, one of these assignments does happen for non-tso cases, since
> although it is indented, there are no {} for the block. In tcp_subr.c if I
> recall. However, doing the assignment for the non-tso case seems harmless to me. }
>
>> I'm booting more systems with the test kernel and I will be watching
>> all of them with dtrace to see if I find an occurrence where
>> tp->t_tsomax is off. I hope that with more systems, I'll have an
>> answer more quickly.
>>
>> But digging around the code, I still don't see how tp->t_tsomax
>> could not have been set from if_hw_tsomax when there are no stacked
>> interfaces...
>>
> It seems to happen where you mentioned before. Since it only gets set
> from cap.tsomax and that gets set from if_hw_tsomax, it would be 0
> otherwise.

Sorry, my sentence was probably a bit misleading. What you're saying is
what I meant. There's the tcp_mss() -> tcp_mss_update() -> tcp_maxmtu()
call chain that ultimately sets tp->t_tsomax from if_hw_tsomax (via the
cap struct). tp->t_tsomax is indeed 0 on fresh connections when
tcp_mss() has not been called yet; I could confirm that with dtrace. As
soon as the connection gets running, it's set to whatever the
interface's if_hw_tsomax is.

What I have _not_ found is another place that alters tp->t_tsomax, so I
really don't get how Christopher can see different values for
tp->t_tsomax.

> Christopher sees it change when he changes IP_MAXPACKET, so
> the default setting works, but for him setting it in the driver didn't,
> for some reason?

Christopher, can you run your tests again with default IP_MAXPACKET and
just Rick's if_hw_tsomax patch? I think it's important to confirm that
tp->t_tsomax is really off in that case, which you can easily test with
my dtrace one-liner. I'm running this test too, but it will take me
much more time until I can make a statement.

> Thanks for doing the testing, rick

No problem. Thank you guys!
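
The propagation described above (interface value, then the cap struct, then tp->t_tsomax) can be modelled in a few lines of userland C. This is only a sketch under simplifying assumptions: the sketch_* structs and functions are hypothetical stand-ins for the kernel's ifnet, cap struct and tcpcb, and the real tcp_mss_update() step is folded into sketch_tcp_mss().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct sketch_ifnet {
	uint32_t if_hw_tsomax;	/* set by the driver */
	int	 tso_enabled;	/* non-zero if the interface does TSO */
};

struct sketch_cap {
	int	 tso;		/* TSO capability seen on the route's ifp */
	uint32_t tsomax;	/* copied from if_hw_tsomax */
};

struct sketch_tcpcb {
	uint32_t t_tsomax;	/* 0 until the MSS code has run */
};

/* Models tcp_maxmtu(): report the interface's TSO limit via cap. */
static void
sketch_tcp_maxmtu(const struct sketch_ifnet *ifp, struct sketch_cap *cap)
{
	assert(ifp != NULL && cap != NULL);
	cap->tso = ifp->tso_enabled;
	cap->tsomax = ifp->if_hw_tsomax;
}

/* Models tcp_mss(): the only writer of t_tsomax in this sketch. */
static void
sketch_tcp_mss(struct sketch_tcpcb *tp, const struct sketch_ifnet *ifp)
{
	struct sketch_cap cap = { 0, 0 };

	assert(tp != NULL);
	sketch_tcp_maxmtu(ifp, &cap);
	if (cap.tso)
		tp->t_tsomax = cap.tsomax;
}
```

In this model a zero-initialised sketch_tcpcb keeps t_tsomax at 0 until sketch_tcp_mss() runs, after which it equals the interface's if_hw_tsomax, which matches what the dtrace probe shows.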
>>
>> Markus
>>
>>> rick
>>>
>>>>
>>>> Markus
>>>>
>>>>>
>>>>> rick
>>>>>>
>>>>>> 10.0 Code:
>>>>>>
>>>>>>     if (len > tp->t_tsomax - hdrlen) {      /* !! */
>>>>>>             len = tp->t_tsomax - hdrlen;    /* !! */
>>>>>>             sendalot = 1;
>>>>>>     }
>>>>>>
>>>>>> I've put debugging here, set the nic's max TSO as per Rick's patch
>>>>>> (set to say 32k), and have seen that tp->t_tsomax == IP_MAXPACKET.
>>>>>> It's being set someplace else, and thus our attempts to set TSO on
>>>>>> the nic may be in vain.
>>>>>>
>>>>>> It may have mattered more in 9.2, as I see the code doesn't use
>>>>>> tp->t_tsomax in some locations, and may actually default to what the
>>>>>> nic is set to.
>>>>>>
>>>>>> The NIC may still win, I didn't walk through the code to confirm, it
>>>>>> was enough to suggest to me that setting TSO wouldn't fix this
>>>>>> issue.
>>>>>>
>>>>>> However, this is still a TSO related issue, it's just not one related
>>>>>> to the setting of TSO's max size.
>>>>>>
>>>>>> A 10.0-STABLE system with tso disabled on ix0 doesn't have a single
>>>>>> packet over IP_MAXPACKET in 1 hour of runtime. I'll let it go a bit
>>>>>> longer to increase confidence in this assertion, but I don't want to
>>>>>> waste time on this when I could be logging problem packets on a
>>>>>> system with TSO enabled.
>>>>>>
>>>>>> Comments are very welcome..
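
For readers who want to play with the clamp quoted from tcp_output() above, here is a standalone C sketch of the same comparison. The function and struct names are hypothetical; the 40-byte hdrlen used in the note below assumes an IPv4 header plus a TCP header without options, which fits Rick's remark that TSO is skipped when options are present.

```c
#include <assert.h>
#include <stdint.h>

/* Result of the clamp: the payload length to send now, and whether
 * more data remains for another pass (the kernel's "sendalot"). */
struct sketch_clamp {
	uint32_t len;
	int	 sendalot;
};

/* Same comparison as the quoted 10.0 code, lifted out of tcp_output(). */
static struct sketch_clamp
sketch_tso_clamp(uint32_t len, uint32_t t_tsomax, uint32_t hdrlen)
{
	struct sketch_clamp r = { len, 0 };

	assert(t_tsomax >= hdrlen);	/* avoid unsigned wrap-around */
	if (len > t_tsomax - hdrlen) {
		r.len = t_tsomax - hdrlen;
		r.sendalot = 1;
	}
	return (r);
}
```

With t_tsomax at 65518 and hdrlen at 40, a 100000-byte send is cut to 65478 bytes of payload with sendalot set, so payload plus TCP/IP headers is exactly 65518. If t_tsomax were still IP_MAXPACKET, the same send would be cut to 65495 bytes instead.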
>>>>>>
>>>>> _______________________________________________
>>>>> freebsd-net@freebsd.org mailing list
>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"