Date:      Wed, 21 Jan 2015 22:26:33 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Hans Petter Selasky <hps@selasky.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>, Adrian Chadd <adrian@freebsd.org>, "src-committers@freebsd.org" <src-committers@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>
Subject:   Re: svn commit: r277213 - in head: share/man/man9 sys/kern sys/ofed/include/linux sys/sys
Message-ID:  <7C692107-51CF-4DFA-BD6C-623D56893150@bsdimp.com>
In-Reply-To: <54BE21F0.6010602@selasky.org>
References:  <201501151532.t0FFWV2Y037455@svn.freebsd.org> <CAJ-Vmok0GXZoojyi=jE=b5D-d338APztaf3Pw0_AAQ-173XSWw@mail.gmail.com> <54BDD9E1.6090505@selasky.org> <20150120075126.GA42409@kib.kiev.ua> <54BE0AAA.4050104@selasky.org> <20150120090057.GD42409@kib.kiev.ua> <54BE21F0.6010602@selasky.org>

> On Jan 20, 2015, at 2:37 AM, Hans Petter Selasky <hps@selasky.org> wrote:
>
> On 01/20/15 10:00, Konstantin Belousov wrote:
>> On Tue, Jan 20, 2015 at 08:58:34AM +0100, Hans Petter Selasky wrote:
>>> On 01/20/15 08:51, Konstantin Belousov wrote:
>>>> On Tue, Jan 20, 2015 at 05:30:25AM +0100, Hans Petter Selasky wrote:
>>>>> On 01/19/15 22:59, Adrian Chadd wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Would you please check what the results of this are with CPU specific
>>>>>> callwheels?
>>>>>>
>>>>>> I'm doing some 10+ gig traffic testing on -HEAD with RSS enabled (on
>>>>>> ixgbe) and with this setup, the per-CPU TCP callwheel stuff is
>>>>>> enabled. But all the callwheels are now back on clock(0) and so is the
>>>>>> lock contention. :(
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> Like stated in the manual page, callout_reset_curcpu/on() does not work
>>>>> with MPSAFE callouts any more!
>>>> I.e. you 'fixed' some undeterminate bugs in callout migration by not
>>>> doing migration at all anymore.
>>>>
>>>>>
>>>>> You need to use callout_init_{mtx,rm,rw} and remove the custom locking
>>>>> inside the callback in the TCP stack to get it working like before!
>>>>
>>>> No, you need to do this, if you think that whole callout KPI must be
>>>> rototiled.  It is up to the person who modifies the KPI, to ensure that
>>>> existing code is not broken.
>
> Hi,
>
> It is not very hard to update existing callout clients and you can do it too, if you need the extra bits of performance.
>
> Are there more API's than the TCP stack which you think needs an update and are performance critical?
>
>>>>
>>>> As I understand, currently we are back to the one-cpu callouts.
>>>> Do other people consider this situation acceptable ?
>
> For the TCP stack - yes, but not for other clients like cv_timedwait() and such.
>
> If you think you have a better way to solve the callout problems, please tell me! In order for a callout to change its CPU you need a lock to protect which CPU the callout is on. Instead of introducing a third lock in the callout path, which will be a congestion point, to protect against changing the CPU number, I decided that we will use the client's mutex and the MPSAFE implies the client doesn't have any mutex. So it won't work with callout clients which use the CALLOUT_MPSAFE flag. Honestly CALLOUT_MPSAFE should not be used, because it leads to extra complexity in the clients catching the race when tearing down the callouts and any pending callbacks.

Then it is incumbent on you to fix them. You can't just fix one instance and wash your hands of the problem.

Maybe this is a real and legitimate bug. However, until you've followed your solution through by actually fixing the abusers of it, my confidence that another issue won't present itself is quite low. The code seems half baked to me. And from reading this thread, it seems like perhaps I'm not the only one.

>>> Please read the callout 9 manual page first.
>>
>> Assume I read it.  How this changes any of my points above ?
>> """
>> A change in the CPU selection cannot happen if this function is
>> re-scheduled inside a callout function. Else the callback function given
>> by the func argument will be executed on the same CPU like previously
>> done.
>> """
>> You cannot do this without fixing consumers.
>>
>
> The code simply needs an update. It is not broken in any ways - right? If it is not broken, fixing it is not that urgent.

Radically changing the performance characteristics is breaking the code. Performance regression in the TCP stack is urgent to fix. Not being able to enumerate what all the consumers are that use this and provide an analysis about why they aren't important to fix is a bug in your process, and in your interaction with the project. We simply do not operate that way.

Warner


