Date: Wed, 25 Aug 2010 05:00:18 GMT
Message-Id: <201008250500.o7P50IvS090136@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Jeff Roberson
Subject: Re: kern/145385: [cpu] Logical processor cannot be disabled for some SMT-enabled Intel procs

The following reply was made to PR kern/145385; it has been noted by GNATS.

From: Jeff Roberson
To: Garrett Cooper
Cc: bug-followup@freebsd.org, jkim@freebsd.org, Attilio Rao, jeff@freebsd.org
Subject: Re: kern/145385: [cpu] Logical processor cannot be disabled for some SMT-enabled Intel procs
Date: Tue, 24 Aug 2010 18:53:25 -1000 (HST)

On Tue, 24 Aug 2010, Garrett Cooper wrote:

> On Tue, Aug 24, 2010 at 3:45 PM, Garrett Cooper wrote:
>> On Tue, Aug 24, 2010 at 2:51 PM, Garrett Cooper wrote:
>>> On Aug 24, 2010, at 2:03 PM, Jeff Roberson wrote:
>>>
>>> On Tue, 24 Aug 2010, Garrett Cooper wrote:
>>>
>>> On Tue, Aug 24, 2010 at 12:22 PM, Jeff Roberson wrote:
>>>
>>> On Tue, 24 Aug 2010, Garrett Cooper wrote:
>>>
>>> On Mon, Aug 23, 2010 at 6:33 AM, John Baldwin wrote:
>>>
>>> On Sunday, August 22, 2010 4:17:37 am Garrett Cooper wrote:
>>>
>>>       The following trivial patch fixes the issue on my W3520
>>> processor; AFAICS it's what should be done after reading several of
>>> the specs because the logical count that's tracked with ebx is
>>> exactly what is needed for logical_cpus (it's an absolute quantity).
>>> I need to verify it with a multi-cpu topology at work (the two r710s
>>> I was testing with E-series Xeons on aren't available remotely right
>>> now).
>>>
>>> Thanks!
>>> -Garrett
>>>
>>> Jung-uk Kim and Attilio Rao have both been looking at this code
>>> recently and are in a better position to review the patch in the PR.
>>>
>>> (Moving jhb@ to BCC, adding jeff@ for possible input on ULE)
>>>
>>> The patch works as expected (it now properly detects the SMT CPUs as
>>> logical CPUs), but setting machdep.hlt_logical_cpus=1 causes other
>>> problems with scheduling tasks because certain kernel threads get
>>> stuck at boot when netbooting (in particular I've seen problems with
>>> usbhub* and a few other bits), so in order for
>>> machdep.hlt_logical_cpus to be fixed on SMT processors, it might
>>> require some changes to the ULE scheduler to shuffle around the
>>> threads to available cores/processors?
>>>
>>> hlt_logical_cpus should be rewritten to use cpusets to change the
>>> default system set rather than specifically halting those cpus.
>>> There are a number of loops in the kernel that iterate over all cpus
>>> and attempt to bind and perform some task.  I think there are a
>>> number of other reasons to prefer a less aggressive approach to
>>> avoiding the logical cpus as well.  Simply preventing user thread
>>> scheduling will achieve the intent of the sysctl in any event.
>>>
>>>   Ok... in that event then the bug is ok, but maybe I should add
>>> some code to the patch to warn the user about functional issues
>>> associated with halting logical CPUs?
>>>
>>> I don't think the bug is ok.  We probably shouldn't have sysctls which
>>> readily break the kernel.  As I said we should instead have the sysctl
>>> backend to cpuset.  It shouldn't take more than an hour to code and test.
>>
>>    Ok.. I'll look at this once I have my other system back online so
>> I can actively break something until I get it to work.
>
> BTW... there's a lot of code in machdep.c that does the same thing
> to idle the CPU, for instance, cpu_idle_hlt, cpu_idle_acpi,
> cpu_idle_amdc1e (on amd64). What should be done about those cases
> (same thing, or different)?

Those are the actual idle functions that the scheduler uses.  Those are
safe.

Thanks,
Jeff

> Thanks,
> -Garrett
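
For illustration only, here is a minimal sketch of the cpuset-style idea
discussed above: rather than halting the logical CPUs, compute a default
mask that leaves them out, so user threads are kept off them while the
CPUs stay online for kernel loops that bind to every CPU.  This is a
userland model, not FreeBSD kernel code.  The function name
default_mask_without_logical and the CPU numbering (dense ids, with the
threads of one core contiguous) are assumptions made for the example; a
real implementation would derive the logical CPUs from the detected
topology.

```c
#include <stdint.h>

/*
 * Hypothetical model, not the kernel's implementation.
 *
 * Build a CPU mask that keeps only the first hardware thread of each
 * core.  Assumes CPU ids are dense and that the threads of one core are
 * numbered contiguously, so (id % threads_per_core) == 0 identifies the
 * first thread of a core.  Every other id is treated as a "logical" CPU
 * and left out of the mask instead of being halted.
 *
 * Returns 0 for unusable input (no threads per core, or more CPUs than
 * fit in a 64-bit mask).
 */
static uint64_t
default_mask_without_logical(unsigned ncpus, unsigned threads_per_core)
{
	uint64_t mask = 0;
	unsigned cpu;

	if (threads_per_core == 0 || ncpus > 64)
		return (0);

	for (cpu = 0; cpu < ncpus; cpu++) {
		/* Keep the first thread of each core, skip its siblings. */
		if (cpu % threads_per_core == 0)
			mask |= (uint64_t)1 << cpu;
	}
	return (mask);
}
```

Under these assumptions, a 4-core, 8-thread part such as the W3520 gives
default_mask_without_logical(8, 2) == 0x55, i.e. CPUs 0, 2, 4 and 6.
Applying such a mask as the default set only restricts where user
threads may run; it does not take the other CPUs offline.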