Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 18 Mar 2016 11:16:21 -0700
From:      Stanislav Sedov <stas@FreeBSD.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Adrian Chadd <adrian.chadd@gmail.com>, Konstantin Belousov <kostikbel@gmail.com>, "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: Peformance issues with r278325
Message-ID:  <8EE51E0E-41F4-4B5A-A755-B58E8E1D1776@FreeBSD.org>
In-Reply-To: <3277812.DVsZx4uMun@ralph.baldwin.cx>
References:  <FA50A68E-7F3D-4361-8A8A-EB7F97EF3D00@FreeBSD.org> <8403291.NqUNo0Qq5W@ralph.baldwin.cx> <CAJ-VmonUdHRJ0jcFphqXi1w2PwbDycLG190Yd5z8JQWGxW_1iQ@mail.gmail.com> <3277812.DVsZx4uMun@ralph.baldwin.cx>

next in thread | previous in thread | raw e-mail | index | archive | help

> On Mar 18, 2016, at 10:37 AM, John Baldwin <jhb@freebsd.org> wrote:
>=20
> On Thursday, March 17, 2016 10:45:39 AM Adrian Chadd wrote:
>> On 17 March 2016 at 10:23, John Baldwin <jhb@freebsd.org> wrote:
>>=20
>>>=20
>>> I had not expected this commit to have this impact, but Konstantin =
is correct.
>>> I wonder if simply reducing the DELAY() from 5 down to 1 or so would =
be
>>> sufficient?  (Note that you need to adjust the prior loop to use +=3D =
1 instead
>>> of +=3D 5 in that case.)
>>>=20
>>> Note that DETECT_DEADLOCK is not enabled by default, so the =
AFTER_SPIN case
>>> (which waits for an IPI just sent to be delivered) shouldn't be =
enabled (and
>>> in fact I'd like to just remove that code entirely).  This means =
that only
>>> BEFORE_SPIN should be spinning, and it should only be spinning if a =
CPU sends
>>> IPIs back to back such that the previous IPI is still pending (not =
yet
>>> delivered) when the CPU wants to send another IPI.
>>>=20
>>> We can probably assume a TSC if we have SMP, so if changing the =
delay from 5
>>> to 1 doesn't work we can try just using the TSC directly to control =
the
>>> spin length and go back to using a simple pause.
>>>=20
>>> I have an old set of changes that might also be interesting that =
permit
>>> TLB shootdown IPI handlers to run while spinlocks are held (by using =
cr8/TPR
>>> to control interrupts when a spinlock is held instead of disabling =
all
>>> interrupts).  I haven't found a workload where that helped yet.  =
However,
>>> yours might be an interesting workload to try those changes out on.
>>=20
>> Do you think it's worth just reverting it for now just so it lands in
>> 10.3-RELEASE?
>=20
> Probably.  If the '1' change fixes it that is a simple test, otherwise =
we
> can revert in 10.x.  I think I'll likely just convert it to use a =
direct
> TSC delay loop always in HEAD (assuming that verifies ok in testing as =
well).
>=20

FWIW we are currently testing the delay '1' change.  Unfortunately, the =
test is
not easy to repeat (we didn't find a synthetic one yet that results in =
the same
outcome), so it does take more time that I would like.  Will follow up =
with the=20
results.

We did try HEAD as well a while ago, and although it exhibited the same =
pattern.
However it did not utilize the x2apic unfortunately, as it does seem to =
be disables
in the BIOS (FreeBSD reports it being disables in the DMAR table).

Thanks for looking into it!

--
Stanislav Sedov
ST4096-RIPE





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?8EE51E0E-41F4-4B5A-A755-B58E8E1D1776>