Date:      Thu, 1 Nov 2012 11:36:13 -0700
From:      Jim Harris <jim.harris@gmail.com>
To:        Andre Oppermann <andre@freebsd.org>
Cc:        Attilio Rao <attilio@freebsd.org>, freebsd-arch@freebsd.org
Subject:   Re: CACHE_LINE_SIZE on x86
Message-ID:  <CAJP=Hc_mEcO6wStbcuRp_3McUGhEp06nvKXh8QO+Q0x67KrM7w@mail.gmail.com>
In-Reply-To: <50928AE5.4010107@freebsd.org>
References:  <CAJP=Hc_F+-RdD=XZ7ikBKVKE_XW88Y35Xw0bYE6gGURLPDOAWw@mail.gmail.com> <201210250918.00602.jhb@freebsd.org> <5089690A.8070503@networx.ch> <201210251732.31631.jhb@freebsd.org> <CAJP=Hc_98G=77gSO9hQ_knTedhNuXDErUt34=5vSPmux=tQR1g@mail.gmail.com> <CAJP=Hc8mVycfjWN7_V4VAAHf+0AiFozqcF4Shz26uh5oGiDxKQ@mail.gmail.com> <50928AE5.4010107@freebsd.org>

On Thu, Nov 1, 2012 at 7:44 AM, Andre Oppermann <andre@freebsd.org> wrote:

> On 01.11.2012 01:50, Jim Harris wrote:
>
>>
>>
>> On Thu, Oct 25, 2012 at 2:40 PM, Jim Harris <jim.harris@gmail.com> wrote:
>>
>>     On Thu, Oct 25, 2012 at 2:32 PM, John Baldwin <jhb@freebsd.org> wrote:
>>      >
>>      > It would be good to know though if there are performance benefits
>> from
>>      > avoiding sharing across paired lines in this manner.  Even if it
>> has
>>      > its own MOESI state, there might still be negative effects from
>> sharing
>>      > the pair.
>>
>>     On 2S, I do see further benefits by using 128 byte padding instead of
>>     64.  On 1S, I see no difference.  I've been meaning to turn off
>>     prefetching on my system to see if it has any effect in the 2S case -
>>     I can give that a shot tomorrow.
>>
>>
>> So tomorrow turned into next week, but I have some data finally.
>>
>> I've updated to HEAD from today, including all of the mtx_padalign
>> changes.  I tested 64 v. 128 byte alignment on 2S amd64 (SNB Xeon).
>> My BIOS also has a knob to disable the adjacent line prefetching
>> (MLC spatial prefetcher), so I ran both 64b and 128b against this
>> specific prefetcher both enabled and disabled.
>>
>> MLC prefetcher enabled: 3-6% performance improvement, 1-5% decrease in
>> CPU utilization by using 128b padding instead of 64b.
>>
>
> Just to be sure.  The numbers you show are just for the one location you've
> converted to the new padded mutex and a particular test case?
>

There are actually two locations: the struct tdq lock in the ULE
scheduler, and the callout_cpu lock in kern_timeout.c.
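
For reference, the conversion is really just switching the lock member
over to the padded mutex type.  A minimal sketch is below; the struct
and field names are illustrative, not the exact sched_ule.c or
kern_timeout.c definitions:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

/*
 * Per-CPU state whose lock is padded and aligned to CACHE_LINE_SIZE so
 * that two heavily contended locks never share (or pair-share) a line.
 */
struct percpu_queue {
	struct mtx_padalign	pq_lock;
	int			pq_load;	/* frequently written state */
} __aligned(CACHE_LINE_SIZE);

static void
percpu_queue_init(struct percpu_queue *pq)
{

	mtx_init(&pq->pq_lock, "pq lock", NULL, MTX_SPIN);
}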

And yes, I've only been running a custom benchmark I developed here to
help uncover some of these areas of spinlock contention.  It was
originally written for NVMe driver performance testing, but has been
helpful in uncovering some other issues outside of the NVMe driver
itself (such as these contended spinlocks).  It spawns a large number
of kernel threads, each of which submits an I/O and then sleeps until
it is woken by the interrupt thread when the I/O completes.  It
stresses both the scheduler and the callout code, since I start and
stop a timer for each I/O.
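
The per-thread loop looks roughly like the following.  The real tool is
a custom, unpublished test, so the names and helpers here (struct
worker, submit_io, io_timeout) are only an approximation of what it
does:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/proc.h>
#include <sys/kernel.h>
#include <sys/callout.h>

struct worker {
	struct callout	w_timer;	/* per-I/O timeout */
	/* driver-specific request state ... */
};

static void	submit_io(struct worker *);	/* stand-in for the NVMe submit path */
static void	io_timeout(void *);		/* stand-in timeout handler */

static void
io_worker(void *arg)
{
	struct worker *w = arg;

	for (;;) {
		submit_io(w);
		/* Arm a per-I/O timer; this is what hammers callout_cpu. */
		callout_reset(&w->w_timer, hz, io_timeout, w);
		/* Sleep until the interrupt thread completes the I/O. */
		tsleep(w, PRIBIO, "iowait", 0);
		callout_stop(&w->w_timer);
	}
}

/* Run from the interrupt thread on I/O completion. */
static void
io_done(struct worker *w)
{

	wakeup(w);
}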

I think the only thing this proves is that there is still benefit to
having x86 CACHE_LINE_SIZE set to 128.
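
That corresponds to the existing definitions in the x86
<machine/param.h> headers, which size the "line" as two 64-byte lines
so that padded structures also clear the adjacent-line prefetcher pair:

#define	CACHE_LINE_SHIFT	7
#define	CACHE_LINE_SIZE		(1 << CACHE_LINE_SHIFT)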

Thanks,

-Jim


