Date:      Wed, 26 Mar 2014 00:07:18 +0100
From:      Markus Gebert <markus.gebert@hostpoint.ch>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>, Garrett Wollman <wollman@freebsd.org>, Jack Vogel <jfvogel@gmail.com>, Christopher Forgeron <csforgeron@gmail.com>
Subject:   Re: 9.2 ixgbe tx queue hang
Message-ID:  <2C869AD2-FC04-4E0B-9CBC-02FFA9AEC0EA@hostpoint.ch>
In-Reply-To: <1212254686.521736.1395786110243.JavaMail.root@uoguelph.ca>
References:  <1212254686.521736.1395786110243.JavaMail.root@uoguelph.ca>


On 25.03.2014, at 23:21, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Markus Gebert wrote:
>>
>> On 25.03.2014, at 22:46, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>
>>> Markus Gebert wrote:
>>>>
>>>> On 25.03.2014, at 02:18, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>>>
>>>>> Christopher Forgeron wrote:
>>>>>>
>>>>>> This is regarding the TSO patch that Rick suggested earlier. (With
>>>>>> many thanks for his time and suggestion)
>>>>>>
>>>>>> As I mentioned earlier, it did not fix the issue on a 10.0 system.
>>>>>> It did make it less of a problem on 9.2, but either way, I think
>>>>>> it's not needed, and shouldn't be considered as a patch for
>>>>>> testing/etc.
>>>>>>
>>>>>> Patching TSO to anything other than a max value (and by default
>>>>>> the code gives it IP_MAXPACKET) is confusing the matter, as the
>>>>>> packet length ultimately needs to be adjusted for many things on
>>>>>> the fly like TCP Options, etc. Using static header sizes won't be
>>>>>> a good idea.
>>>>>>
>>>>> If you look at tcp_output(), you'll notice that it doesn't do TSO
>>>>> if there are any options. That way it knows that the TCP/IP header
>>>>> is just hdrlen.
>>>>>
>>>>> If you don't limit the TSO packet (including TCP/IP and ethernet
>>>>> headers) to 64K, then the "ix" driver can't send them, which is the
>>>>> problem you guys are seeing.
>>>>>
>>>>> There are other ways to fix this problem, but they all may
>>>>> introduce issues that reducing if_hw_tsomax by a small amount does
>>>>> not. For example, m_defrag() could be modified to use 4K pagesize
>>>>> clusters, but this might introduce memory fragmentation problems.
>>>>> (I observed what I think are memory fragmentation problems when I
>>>>> switched NFS to use 4K pagesize clusters for large I/O messages.)
>>>>>
>>>>> If setting IP_MAXPACKET to 65518 fixes the problem (no more EFBIG
>>>>> error replies), then that is the size that if_hw_tsomax can be set
>>>>> to (just can't change IP_MAXPACKET, but that is defined for other
>>>>> things). (It just happens that IP_MAXPACKET is what if_hw_tsomax
>>>>> defaults to. It has no other effect w.r.t. TSO.)
>>>>>
>>>>>>
>>>>>> Additionally, it seems that setting nic TSO will/may be ignored by
>>>>>> code like this in sys/netinet/tcp_output.c:
>>>>>>
>>>>
>>>> Is this confirmed or still an 'it seems'? Have you actually seen a
>>>> tp->t_tsomax value in tcp_output() bigger than if_hw_tsomax or was
>>>> this just speculation because the values are stored in different
>>>> places? (Sorry, if you already stated this in another email, it's
>>>> currently hard to keep track of all the information.)
>>>>
>>>> Anyway, this dtrace one-liner should be a good test if other values
>>>> appear in tp->t_tsomax:
>>>>
>>>> # dtrace -n 'fbt::tcp_output:entry
>>>>   / args[0]->t_tsomax != 0 && args[0]->t_tsomax != 65518 /
>>>>   { printf("unexpected tp->t_tsomax: %i\n", args[0]->t_tsomax); stack(); }'
>>>>
>>>> Remember to adjust the value in the condition to whatever you're
>>>> currently expecting. The value seems to be 0 for new connections,
>>>> probably when tcp_mss() has not been called yet. So that seems
>>>> normal and I have excluded that case too. This will also print a
>>>> kernel stack trace in case it sees an unexpected value.
>>>>
>>>>
>>>>> Yes, but I don't know why.
>>>>> The only conjecture I can come up with is that another net driver
>>>>> is stacked above "ix" and the setting for if_hw_tsomax doesn't
>>>>> propagate up. (If you look at the commit log message for r251296,
>>>>> the intent of adding if_hw_tsomax was to allow device drivers to
>>>>> set a smaller tsomax than IP_MAXPACKET.)
>>>>>
>>>>> Are you using any of the "stacked" network device drivers like
>>>>> lagg? I don't even know what the others all are?
>>>>> Maybe someone else can list them?
>>>>
>>>> I guess the most obvious are lagg and vlan (and probably carp on
>>>> FreeBSD 9.x or older).
>>>>
>>>> On request from Jack, we've eliminated lagg and vlan from the
>>>> picture, which gives us plain ixgbe interfaces with no stacked
>>>> interfaces on top of them. And we can still reproduce the problem.
>>>>
>>> This was related to the "did if_hw_tsomax set tp->t_tsomax to the
>>> same value?" question. Since you reported that my patch that set
>>> if_hw_tsomax in the driver didn't fix the problem, that suggests
>>> that tp->t_tsomax isn't being set to if_hw_tsomax from the driver,
>>> but we don't know why?
>>
>> Jack asked us to remove lagg/vlans in the very beginning of this
>> thread, and when we had done that, the problem was still there. So my
>> answer was not related to your recent patch. I wanted to clarify
>> that we have been testing with ixgbe only for quite some time and
>> that stacked interfaces could not be a source of problems in our
>> test scenario.
>>
>> We just started testing your patch that sets if_hw_tsomax
>> yesterday. So far I have it running on two systems along with some
>> printfs and the dtrace one-liner that watches over tp->t_tsomax in
>> tcp_output(). So far we haven't had any problems with these two
>> servers, and the dtrace probe never fired, so for now it looks like
>> tp->t_tsomax always gets set from if_hw_tsomax. But it's too soon to
>> draw a conclusion, it may take days to trigger the problem again. It
>> might also be fixed with your patch.
>>
> Righto. Setting if_hw_tsomax in the driver is supposed to set tp->t_tsomax
> and I could see it work in a trivial test (I hacked the code so the assignments
> are done for the non-tso case and it worked for the non-tso "re" driver I run.)
> { As an aside, one of these assignments does happen for non-tso cases, since
>  although it is indented, there are no {} for the block. In tcp_subr.c if I
>  recall. However, doing the assignment for the non-tso case seems harmless to me. }
>
>> I'm booting more systems with the test kernel and I will be watching
>> all of them with dtrace to see if I find an occurrence where
>> tp->t_tsomax is off. I hope that with more systems, I'll have an
>> answer more quickly.
>>
>> But digging around the code, I still don't see a way how tp->t_tsomax
>> could not have been set from if_hw_tsomax when there are no stacked
>> interfaces...
>>
> It seems to happen where you mentioned before. Since it only gets set
> from cap.tsomax and that gets set from if_hw_tsomax, it would be 0
> otherwise.

Sorry, my sentence was probably a bit misleading. What you're saying is
what I meant. There's the tcp_mss() -> tcp_mss_update() -> tcp_maxmtu()
call chain that ultimately sets tp->t_tsomax from if_hw_tsomax (via the
cap struct). tp->t_tsomax is indeed 0 on fresh connections when tcp_mss()
has not been called yet, I could confirm that with dtrace. As soon as the
connection gets running, it's set to whatever the interface's
if_hw_tsomax is.
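
For anyone following along without the source open, here is a minimal C
sketch of that propagation as I understand it. It is not the kernel code:
all `*_sketch` types and functions are made up for illustration and only
model the fields mentioned in this thread. The real functions also look at
the interface capabilities and the route, which is left out here.

```c
#include <assert.h>
#include <stddef.h>

#define IP_MAXPACKET 65535 /* maximum IP packet size, as in netinet/ip.h */

/* Hypothetical stand-ins for struct ifnet, struct tcp_ifcap and struct tcpcb. */
struct ifnet_sketch     { unsigned int if_hw_tsomax; };
struct tcp_ifcap_sketch { unsigned int tsomax; };
struct tcpcb_sketch     { unsigned int t_tsomax; };

/* Attach time: a driver that set nothing ends up with IP_MAXPACKET. */
static void
if_attach_sketch(struct ifnet_sketch *ifp)
{
	assert(ifp != NULL);
	if (ifp->if_hw_tsomax == 0)
		ifp->if_hw_tsomax = IP_MAXPACKET;
}

/* Stand-in for tcp_maxmtu(): report the interface limit via the cap struct. */
static void
tcp_maxmtu_sketch(const struct ifnet_sketch *ifp, struct tcp_ifcap_sketch *cap)
{
	assert(ifp != NULL && cap != NULL);
	cap->tsomax = ifp->if_hw_tsomax;
}

/* Stand-in for tcp_mss(): copy the limit from cap into the connection. */
static void
tcp_mss_sketch(struct tcpcb_sketch *tp, const struct ifnet_sketch *ifp)
{
	struct tcp_ifcap_sketch cap = { 0 };

	assert(tp != NULL);
	tcp_maxmtu_sketch(ifp, &cap);
	tp->t_tsomax = cap.tsomax;
}
```

In this model a fresh connection has t_tsomax == 0 until tcp_mss_sketch()
runs, and afterwards it always equals the interface value, which matches
what the dtrace probe has shown on our systems so far.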

What I have _not_ found is another place that alters tp->t_tsomax, so I
really don't get how Christopher can see different values for
tp->t_tsomax.


> Christopher sees it change when he changes IP_MAXPACKET, so
> the default setting works, but for him setting it in the driver didn't,
> for some reason?

Christopher, can you run your tests again with default IP_MAXPACKET and
just Rick's if_hw_tsomax patch? I think it's important to confirm that
tp->t_tsomax is really off in that case, which you can easily test with
my dtrace one-liner. I'm running this test too, but it will take me
much more time until I can make a statement.
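
To spell out what I would expect the two cases to look like, here is a
small C sketch of the clamp from tcp_output() combined with the 64K limit
Rick described. The helper names are hypothetical, and the 14 byte
Ethernet header (no VLAN tag) is an assumption for illustration. This is
not the driver's actual check.

```c
#include <assert.h>

#define IX_MAX_FRAME_SKETCH (64 * 1024) /* 64K limit including all headers */
#define ETHER_HDR_SKETCH    14          /* assumed: untagged Ethernet header */

/* The clamp from tcp_output(): keep payload + TCP/IP header <= t_tsomax. */
static unsigned int
tso_clamp_len_sketch(unsigned int len, unsigned int t_tsomax,
    unsigned int hdrlen)
{
	assert(t_tsomax > hdrlen);
	if (len > t_tsomax - hdrlen)
		len = t_tsomax - hdrlen;
	return (len);
}

/* Does payload + TCP/IP header + Ethernet header stay within 64K? */
static int
frame_fits_sketch(unsigned int payload_len, unsigned int hdrlen)
{
	return (payload_len + hdrlen + ETHER_HDR_SKETCH <=
	    IX_MAX_FRAME_SKETCH);
}
```

Assuming a 40 byte TCP/IP header: with tp->t_tsomax at the default 65535
the payload is clamped to 65495, the frame comes to 65549 bytes and does
not fit. With 65518 the payload is clamped to 65478, the frame comes to
65532 bytes and fits. So if the patch is in place and problems still show
up, the probe should catch a tp->t_tsomax other than 65518.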

> Thanks for doing the testing, rick

No problem. Thank you guys!


>>
>> Markus
>>
>>
>>> rick
>>>
>>>>
>>>> Markus
>>>>
>>>>
>>>>>
>>>>> rick
>>>>>>
>>>>>> 10.0 Code:
>>>>>>
>>>>>> if (len > tp->t_tsomax - hdrlen) {
>>>>>>         len = tp->t_tsomax - hdrlen;
>>>>>>         sendalot = 1;
>>>>>> }
>>>>>>
>>>>>> I've put debugging here, set the nic's max TSO as per Rick's patch
>>>>>> (set to say 32k), and have seen that tp->t_tsomax == IP_MAXPACKET.
>>>>>> It's being set someplace else, and thus our attempts to set TSO on
>>>>>> the nic may be in vain.
>>>>>>
>>>>>> It may have mattered more in 9.2, as I see the code doesn't use
>>>>>> tp->t_tsomax in some locations, and may actually default to what
>>>>>> the nic is set to.
>>>>>>
>>>>>> The NIC may still win, I didn't walk through the code to confirm,
>>>>>> it was enough to suggest to me that setting TSO wouldn't fix this
>>>>>> issue.
>>>>>>
>>>>>> However, this is still a TSO related issue, it's just not one
>>>>>> related to the setting of TSO's max size.
>>>>>>
>>>>>> A 10.0-STABLE system with tso disabled on ix0 doesn't have a
>>>>>> single packet over IP_MAXPACKET in 1 hour of runtime. I'll let it
>>>>>> go a bit longer to increase confidence in this assertion, but I
>>>>>> don't want to waste time on this when I could be logging problem
>>>>>> packets on a system with TSO enabled.
>>>>>>
>>>>>> Comments are very welcome..
>>>>>>
>>>>> _______________________________________________
>>>>> freebsd-net@freebsd.org mailing list
>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>>>> To unsubscribe, send any mail to
>>>>> "freebsd-net-unsubscribe@freebsd.org"



