Date: Thu, 1 Nov 2012 11:36:13 -0700
From: Jim Harris <jim.harris@gmail.com>
To: Andre Oppermann <andre@freebsd.org>
Cc: Attilio Rao <attilio@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: CACHE_LINE_SIZE on x86
Message-ID: <CAJP=Hc_mEcO6wStbcuRp_3McUGhEp06nvKXh8QO+Q0x67KrM7w@mail.gmail.com>
In-Reply-To: <50928AE5.4010107@freebsd.org>
References: <CAJP=Hc_F+-RdD=XZ7ikBKVKE_XW88Y35Xw0bYE6gGURLPDOAWw@mail.gmail.com>
 <201210250918.00602.jhb@freebsd.org>
 <5089690A.8070503@networx.ch>
 <201210251732.31631.jhb@freebsd.org>
 <CAJP=Hc_98G=77gSO9hQ_knTedhNuXDErUt34=5vSPmux=tQR1g@mail.gmail.com>
 <CAJP=Hc8mVycfjWN7_V4VAAHf+0AiFozqcF4Shz26uh5oGiDxKQ@mail.gmail.com>
 <50928AE5.4010107@freebsd.org>
On Thu, Nov 1, 2012 at 7:44 AM, Andre Oppermann <andre@freebsd.org> wrote:
> On 01.11.2012 01:50, Jim Harris wrote:
>>
>> On Thu, Oct 25, 2012 at 2:40 PM, Jim Harris <jim.harris@gmail.com> wrote:
>>
>>   On Thu, Oct 25, 2012 at 2:32 PM, John Baldwin <jhb@freebsd.org> wrote:
>>   >
>>   > It would be good to know though if there are performance benefits from
>>   > avoiding sharing across paired lines in this manner. Even if it has
>>   > its own MOESI state, there might still be negative effects from sharing
>>   > the pair.
>>
>>   On 2S, I do see further benefits by using 128 byte padding instead of
>>   64. On 1S, I see no difference. I've been meaning to turn off
>>   prefetching on my system to see if it has any effect in the 2S case -
>>   I can give that a shot tomorrow.
>>
>> So tomorrow turned into next week, but I have some data finally.
>>
>> I've updated to HEAD from today, including all of the mtx_padalign
>> changes. I tested 64 v. 128 byte alignment on 2S amd64 (SNB Xeon). My
>> BIOS also has a knob to disable the adjacent line prefetching (MLC
>> spatial prefetcher), so I ran both 64b and 128b against this specific
>> prefetcher both enabled and disabled.
>>
>> MLC prefetcher enabled: 3-6% performance improvement, 1-5% decrease in
>> CPU utilization by using 128b padding instead of 64b.
>
> Just to be sure. The numbers you show are just for the one location you've
> converted to the new padded mutex and a particular test case?

There are two locations actually - the struct tdq lock in the ULE
scheduler, and the callout_cpu lock in kern_timeout.c. And yes, I've
only been running a custom benchmark I developed here to help uncover
some of these areas of spinlock contention. It was originally used for
NVMe driver performance testing, but has been helpful in uncovering
some other issues outside of the NVMe driver itself (such as these
contended spinlocks).
It spawns a large number of kernel threads, each of which submits an
I/O and then sleeps until it is woken by the interrupt thread when the
I/O completes. It stresses the scheduler and also callout, since I
start and stop a timer for each I/O. I think the only thing this proves
is that there is benefit to having x86 CACHE_LINE_SIZE still set to 128.

Thanks,

-Jim