Date:      Thu, 26 Aug 2010 04:50:08 GMT
From:      Jeff Roberson <jroberson@jroberson.net>
To:        freebsd-bugs@FreeBSD.org
Subject:   Re: kern/145385: [cpu] Logical processor cannot be disabled for some SMT-enabled Intel procs
Message-ID:  <201008260450.o7Q4o8I7021196@freefall.freebsd.org>

The following reply was made to PR kern/145385; it has been noted by GNATS.

From: Jeff Roberson <jroberson@jroberson.net>
To: Garrett Cooper <gcooper@FreeBSD.org>
Cc: bug-followup@freebsd.org, jkim@freebsd.org, 
    Attilio Rao <attilio@freebsd.org>, jeff@freebsd.org
Subject: Re: kern/145385: [cpu] Logical processor cannot be disabled for some
 SMT-enabled Intel procs
Date: Wed, 25 Aug 2010 18:44:16 -1000 (HST)

 On Wed, 25 Aug 2010, Garrett Cooper wrote:
 
 > On Tue, Aug 24, 2010 at 9:53 PM, Jeff Roberson <jroberson@jroberson.net> wrote:
 >> On Tue, 24 Aug 2010, Garrett Cooper wrote:
 >>
 >>> On Tue, Aug 24, 2010 at 3:45 PM, Garrett Cooper <gcooper@freebsd.org>
 >>> wrote:
 >>>>
 >>>> On Tue, Aug 24, 2010 at 2:51 PM, Garrett Cooper <yanegomi@gmail.com>
 >>>> wrote:
 >>>>>
 >>>>> On Aug 24, 2010, at 2:03 PM, Jeff Roberson wrote:
 >>>>>
 >>>>>
 >>>>> On Tue, 24 Aug 2010, Garrett Cooper wrote:
 >>>>>
 >>>>> On Tue, Aug 24, 2010 at 12:22 PM, Jeff Roberson
 >>>>> <jroberson@jroberson.net>
 >>>>> wrote:
 >>>>>
 >>>>> On Tue, 24 Aug 2010, Garrett Cooper wrote:
 >>>>>
 >>>>> On Mon, Aug 23, 2010 at 6:33 AM, John Baldwin <jhb@freebsd.org> wrote:
 >>>>>
 >>>>> On Sunday, August 22, 2010 4:17:37 am Garrett Cooper wrote:
 >>>>>
 >>>>>       The following trivial patch fixes the issue on my W3520 processor;
 >>>>> AFAICS it's what should be done after reading several of the specs
 >>>>> because the logical count that's tracked with ebx is exactly what is
 >>>>> needed for logical_cpus (it's an absolute quantity). I need to verify it
 >>>>> with a multi-cpu topology at work (the two r710s I was testing with
 >>>>> E-series Xeons on aren't available remotely right now).
 >>>>>
 >>>>> Thanks!
 >>>>> -Garrett
 >>>>>
 >>>>> Jung-uk Kim and Attilio Rao have both been looking at this code recently
 >>>>> and are in a better position to review the patch in the PR.
 >>>>>
 >>>>> (Moving jhb@ to BCC, adding jeff@ for possible input on ULE)
 >>>>>
 >>>>> The patch works as expected (it now properly detects the SMT CPUs as
 >>>>> logical CPUs), but setting machdep.hlt_logical_cpus=1 causes other
 >>>>> problems with scheduling tasks because certain kernel threads get
 >>>>> stuck at boot when netbooting (in particular I've seen problems with
 >>>>> usbhub* and a few other bits), so in order for
 >>>>> machdep.hlt_logical_cpus to be fixed on SMT processors, it might
 >>>>> require some changes to the ULE scheduler to shuffle around the
 >>>>> threads to available cores/processors?
 >>>>>
 >>>>>
 >>>>> hlt_logical_cpus should be rewritten to use cpusets to change the
 >>>>> default system set rather than specifically halting those cpus.  There
 >>>>> are a number of loops in the kernel that iterate over all cpus and
 >>>>> attempt to bind and perform some task.  I think there are a number of
 >>>>> other reasons to prefer a less aggressive approach to avoiding the
 >>>>> logical cpus as well.  Simply preventing user thread scheduling will
 >>>>> achieve the intent of the sysctl in any event.
 >>>>>
 >>>>>   Ok... in that event then the bug is ok, but maybe I should add
 >>>>> some code to the patch to warn the user about functional issues
 >>>>> associated with halting logical CPUs?
 >>>>>
 >>>>> I don't think the bug is ok.  We probably shouldn't have sysctls which
 >>>>> readily break the kernel.  As I said we should instead have the sysctl
 >>>>> backend to cpuset.  It shouldn't take more than an hour to code and
 >>>>> test.
 >>>>
 >>>>    Ok.. I'll look at this once I have my other system back online so
 >>>> I can actively break something until I get it to work.
 >>>
 >>>   BTW... there's a lot of code in machdep.c that does the same thing
 >>> to idle the CPU, for instance, cpu_idle_hlt, cpu_idle_acpi,
 >>> cpu_idle_amdc1e (on amd64). What should be done about those cases
 >>> (same thing, or different)?
 >>
 >> Those are the actual idle functions that the scheduler uses.  Those are
 >> safe.
 >
 >    I'll look into running this on a Nehalem processor machine, but
 > this appears to work as expected on my Penryn processor test machine with
 > machdep.hlt_cpus = { 110, 101, 11, 0 } and with machdep.idle=acpi; I'm
 > not sure if the loop is supposed to be there still, but it
 > wouldn't make sense because the CPU would be spinning in the kernel.
 
 This doesn't actually idle the cores.  You need to change the root cpuset 
 to remove cpus.
 
 Jeff
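
 [Editor's sketch, not part of Jeff's message.]  The cpuset approach
 described in this thread can be tried from userland with cpuset(1), which
 can restrict the default set (set id 1) to a list of CPUs.  The Python
 below only builds that command line; it does not run it.  The helper names
 are hypothetical, and the assumption that SMT siblings have adjacent
 logical CPU ids (0/1, 2/3, ...) does not hold on every machine, so check
 the real topology before applying the result.

```python
"""Sketch: build a cpuset(1) command that keeps one hardware thread per core.

Assumption (not from the thread): SMT siblings have adjacent logical CPU ids.
"""


def primary_threads(ncpus, threads_per_core):
    """Return the CPU ids to keep: the first hardware thread of each core."""
    if ncpus <= 0 or threads_per_core <= 0:
        raise ValueError("counts must be positive")
    if ncpus % threads_per_core != 0:
        raise ValueError("ncpus must be a multiple of threads_per_core")
    return list(range(0, ncpus, threads_per_core))


def default_set_command(cpus, setid=1):
    """Build the argv for `cpuset -l <list> -s <setid>`.

    Set id 1 is the default set that threads start in; restricting it
    leaves the omitted CPUs free of ordinary user threads.
    """
    cpu_list = ",".join(str(c) for c in cpus)
    return ["cpuset", "-l", cpu_list, "-s", str(setid)]


if __name__ == "__main__":
    # Example: 4 cores with 2 threads each (8 logical CPUs).
    keep = primary_threads(ncpus=8, threads_per_core=2)
    print(" ".join(default_set_command(keep)))
```

 Run as root, the printed command would shrink the default set rather than
 halt the sibling CPUs, so kernel loops that bind to every CPU in turn can
 still make progress there.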
 
 > Thanks,
 > -Garrett
 >


