Date:      Tue, 27 Mar 2018 13:26:00 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        "Kononov, Oleksandr" <oleksandr.kononov@intel.com>
Cc:        "freebsd-drivers@freebsd.org" <freebsd-drivers@freebsd.org>, "Vanco, Juraj" <juraj.vanco@intel.com>
Subject:   Re: FreeBSD 11.1 contigfree performance issue
Message-ID:  <20180327102600.GY76926@kib.kiev.ua>
In-Reply-To: <865AA1660A1A014C99D99B800FA40800813681@IRSMSX101.ger.corp.intel.com>
References:  <865AA1660A1A014C99D99B800FA40800813681@IRSMSX101.ger.corp.intel.com>

On Tue, Mar 27, 2018 at 09:53:11AM +0000, Kononov, Oleksandr wrote:
> 
> I am using FreeBSD 11.1-RELEASE-amd64 running on a single 32-core CPU and am having issues with contigmalloc performance.
> Timing the function with rdtsc shows that it spends about 10 million cycles on average in that function alone.
> Using the same code, FreeBSD version and timing method, I ran it on another machine with two CPUs and a total of 32 cores.
> This gave about 12 thousand cycles for that function.
> 
> Digging through the source code (on the single-CPU machine) I found that the
> smp_targeted_tlb_shootdown function in /usr/src/sys/x86/x86/mp_x86.c
> causes the majority of the performance hit, because some cores remain in a paused state long after the
> interrupt was sent to them.
> 
> 
> I attached sample code and a Makefile to this email.
> 
> Steps to recreate (and show rdtsc cycles):
> 
> $ make
> $ kldload ./test.ko
> $ dmesg
> 
> If anyone has any idea what is the cause of this issue, it would be greatly appreciated.
You noted that the CPUs take a very long time to come out of the idle state.
What is the output of sysctl dev.cpu? In particular, look at the cx_*
MIBs, and if you have configured deep sleep modes, try stepping them back.
Also look at which idle method (cx_method) is used, and if MWAIT is problematic,
switch to legacy idling; see acpi(4) and the debug.acpi.disable="mwait"
knob.
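
For reference, the checks above could look roughly like the following on a
FreeBSD system. This is only a sketch: the CPU index 0 and the C1 limit are
illustrative choices, not values taken from this thread, and the available
cx_* entries depend on the hardware and ACPI tables.

```shell
# List the per-CPU C-state MIBs (cx_supported, cx_lowest, cx_usage, cx_method).
sysctl dev.cpu | grep cx_

# Illustrative: limit CPU 0 to the shallowest sleep state (run as root).
sysctl dev.cpu.0.cx_lowest=C1

# To fall back from MWAIT to legacy idling, add this line to
# /boot/loader.conf and reboot:
#   debug.acpi.disable="mwait"
```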

What are those CPUs? Wouldn't such high latency of waking up cores already
make the interactive performance of the machine a mess?

In principle, we can avoid sending shootdowns to the idle cores, instead
doing a global TLB flush on wakeup of the core if some per-CPU flag is set.
But as I noted in the previous paragraph, the machine should behave quite
badly even with the IPI latency fixed, for other reasons.
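
A minimal user-space sketch of the deferred-flush idea described above. It is
not FreeBSD kernel code: struct cpu_state, tlb_shootdown(), cpu_idle_enter()
and cpu_idle_exit() are hypothetical names invented for illustration, the
"IPI" and the "flush" are just counters, and the model is single-threaded, so
it ignores the race between testing the idle flag and the core waking up that
a real implementation would have to close.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NCPU 4  /* illustrative CPU count */

/* Hypothetical per-CPU bookkeeping for the simulation. */
struct cpu_state {
    atomic_bool idle;           /* CPU is currently in its idle loop */
    atomic_bool flush_pending;  /* a TLB flush was deferred for this CPU */
    int ipis_received;          /* simulated shootdown IPIs delivered */
    int full_flushes;           /* simulated global flushes done on wakeup */
};

static struct cpu_state cpus[NCPU];

/* Mark a CPU as entering the idle state. */
static void cpu_idle_enter(int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    atomic_store(&cpus[cpu].idle, true);
}

/* Leave idle; if a shootdown was deferred, do one global flush now. */
static void cpu_idle_exit(int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    atomic_store(&cpus[cpu].idle, false);
    if (atomic_exchange(&cpus[cpu].flush_pending, false))
        cpus[cpu].full_flushes++;
}

/*
 * Shootdown initiated by CPU 'self': busy CPUs get an IPI, idle CPUs
 * only get their per-CPU flag set. Returns the number of IPIs sent.
 */
static int tlb_shootdown(int self)
{
    int sent = 0;

    assert(self >= 0 && self < NCPU);
    for (int i = 0; i < NCPU; i++) {
        if (i == self)
            continue;
        if (atomic_load(&cpus[i].idle)) {
            atomic_store(&cpus[i].flush_pending, true);
        } else {
            cpus[i].ipis_received++;
            sent++;
        }
    }
    return sent;
}
```

With CPUs 2 and 3 idle, a shootdown from CPU 0 in this model interrupts only
CPU 1; CPUs 2 and 3 each perform a single global flush when they next leave
idle, regardless of how many shootdowns were deferred in the meantime.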


