From owner-freebsd-arch Mon Dec 13 16:33:51 1999
From: Terry Lambert
Message-Id: <199912140033.RAA27290@usr08.primenet.com>
Subject: Re: Thread scheduling
To: nate@mt.sri.com
Date: Tue, 14 Dec 1999 00:33:12 +0000 (GMT)
Cc: chuckr@picnic.mat.net, adsharma@sharmas.dhs.org, freebsd-arch@freebsd.org
In-Reply-To: <199912110455.VAA24095@mt.sri.com> from "Nate Williams" at Dec 10, 99 09:55:29 pm
Sender: owner-freebsd-arch@FreeBSD.ORG

> > OK, let me state it again.  I wasn't asking if it was a good thing to
> > share out threads among multiple processors, because the advantages of
> > using a multiple CPU system *as* a multiple CPU seem obvious enough
> > not to need asking.
> Except that it's possible that you may want to limit multiple CPUs to
> multiple processes for cache reasons.  One could argue that for certain
> classes of threaded applications, you'd get a better hit rate by
> sticking with a single CPU for *all* of the threads, although this
> wouldn't be true for all threaded applications.

The MESI cache coherency protocol solves most of this; the reload would
only be from L2 to L1 cache (slow enough on most systems, I grant you).

I don't really know what cache coherency protocol is used by the 4 CPU
Alpha box Mike Smith has.  I do know that the dual PPC-603 "BeBox" and
several similar boxes didn't have anything like APIC support, so they
removed the L2 cache entirely and used the cache signalling lines to
implement MEI coherency.

> It depends on the application.

Right.  I think, though, that for most threaded applications what you'd
really want is negative affinity, where the quantum reservations for
different threads are most likely to land on different CPUs.  This has
two effects:

o	It allows multiple CPUs to be executing in user space in the
	same program simultaneously.

o	With a per-CPU scheduler, you get rid of The Big Giant Lock(tm)
	in favor of a per-CPU scheduler lock that is only held by a
	different CPU when migrating processes for load-balancing
	reasons.  Even so, only the two CPUs involved in the migration
	get stalled, and, frankly, they most likely do not get stalled
	at all, so long as you stagger your quantum clock between the
	CPUs.

This assumes that each thread is not pounding heavily on state shared
globally with other threads (i.e. each thread has limited locality, and
most locality is not in a contention domain).  For the rare application
that doesn't meet this (most likely as a result of a bad design; very
few problems would require this as part of their solution), it might
make sense to lockstep all of the threads, or more likely just the
badly behaved ones, onto the same processor.
> > I was asking to see if it would be a good thing to add a strong
> > bias to the system, in such a way as to make the co-scheduling of
> > threads among the different processors so that all processors that
> > are made available to the program's threads are executing in that
> > address space at the same moment in time.

> I wouldn't think it would help for cache hit rate and/or CPU usage,
> but that's just a gut feeling and not based on anything else.  Each
> CPU in an SMP system has its own cache, so what happens on another
> CPU shouldn't affect how the one CPU performs.

> Adding this bias wouldn't help, and may in fact make things worse
> (see above).

I would go further: I believe that it would make the scheduler
significantly more complicated than it has to be, as well as setting
The Big Giant Lock(tm) deeper into cement.

It would also tend to result in starvation of other processes, unless
you delayed scheduling, since accelerating the process each time
another of its threads became ready to run would have the effect of
double-booking quantum for the process.

Finally, it would also tend to leave the processors partially idle
(stalled) as a result of voluntary context switches not occurring at
exactly the same place in all threads (i.e. a cache miss on a read
from disk putting the caller to sleep).

> > Not a guarantee, but would it be a good thing to have them
> > "co-scheduled" (or a bias towards that likelihood)?

> But I can't see any advantage to having them co-scheduled.

Gang scheduling is more appropriate to non-NUMA multiprocessors
working on data flow problems.  8-).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.