From owner-freebsd-arch  Fri Sep 20 13:53: 5 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id AAB8337B401; Fri, 20 Sep 2002 13:53:02 -0700 (PDT)
Received: from snipe.mail.pas.earthlink.net (snipe.mail.pas.earthlink.net [207.217.120.62])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 3D70543E3B; Fri, 20 Sep 2002 13:53:02 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0248.cvx40-bradley.dialup.earthlink.net ([216.244.42.248] helo=mindspring.com)
	by snipe.mail.pas.earthlink.net with esmtp (Exim 3.33 #1)
	id 17sUln-0004J7-00; Fri, 20 Sep 2002 13:53:00 -0700
Message-ID: <3D8B8A63.9B3DE20B@mindspring.com>
Date: Fri, 20 Sep 2002 13:51:47 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Jon Mini <mini@freebsd.org>
Cc: Daniel Eischen <eischen@pcnet1.pcnet.com>,
	Bill Huey <billh@gnuppy.monkey.org>, freebsd-arch@FreeBSD.ORG
Subject: Re: New Linux threading model
References: <Pine.GSO.4.10.10209200002280.2162-100000@pcnet1.pcnet.com> <3D8B62DB.C27B7E07@mindspring.com> <20020920191244.GY24394@elvis.mu.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-arch.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-arch>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-arch>
X-Loop: FreeBSD.ORG

Jon Mini wrote:
> It's not scontext(2), but setcontext(2) -- of Solaris fame. Currently,
> we have {get,set,swap}context(3), but being in userland causes some
> interesting race conditions. Really, these functions need to be
> atomic from the process's perspective, and since they have to call
> sigprocmask(2) anyways, the best solution is to just move them into
> the kernel. This is how Solaris does it, among others.

Well, it's *a* solution, anyway.  ;^).  The reason Solaris does it,
though, is because it can't know about the existing register frames
when it comes to a push, and so it has to put an explicit, rather
than implicit, stall barrier in there to make sure.  Otherwise, you
would need to unwind the context switches in the reverse order they
were originally made.  The Keppel paper has details:

	http://citeseer.nj.nec.com/keppel91register.html
	Register Windows and User-Space Threads on the SPARC
	David Keppel
	Department of Computer Science and Engineering
	University of Washington


> Under KSE, we needn't consult the kernel for thread context swaps,
> because we can enter a critical section and avoid the race conditions
> endemic with setcontext(2). Also, we don't modify the process signal
> mask when we swap thread contexts, so we don't need to call
> sigprocmask(2).

Which kind of begs the question of why it needs to be there, or be
called, which is what I was saying.  I think there are legitimate
reasons for having it, so it's not avoidable, like the Linux paper
implies, but I don't agree with the reasons the Linux paper gives for
needing it for N:M threads.  Your argument here invalidates a couple
of theirs, from the paper, but not all of them.
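
For anyone who hasn't used them, a userland context swap with the
{get,set,swap}context(3) family looks roughly like the sketch below.
The names run_context_demo, worker, and STACK_SIZE are made up for
illustration.  This is not KSE code, and it does nothing about the
race conditions or the signal mask handling being argued about here;
it only shows the basic mechanism.

```c
#include <assert.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)   /* hypothetical stack size for the sketch */

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[STACK_SIZE];
static int steps;                /* records how far the worker has run */

static void worker(void)
{
    steps = 1;
    swapcontext(&worker_ctx, &main_ctx);   /* yield back to the caller */
    steps = 2;
    /* Returning from here resumes uc_link, i.e. main_ctx. */
}

/* Runs the worker in two slices; returns 12 if both slices ran in order. */
int run_context_demo(void)
{
    int rc, after_first;

    steps = 0;
    rc = getcontext(&worker_ctx);          /* initialize before makecontext */
    assert(rc == 0);
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;
    makecontext(&worker_ctx, worker, 0);

    swapcontext(&main_ctx, &worker_ctx);   /* run worker until it yields */
    after_first = steps;
    swapcontext(&main_ctx, &worker_ctx);   /* resume worker to completion */
    return after_first * 10 + steps;
}
```

Calling run_context_demo() from main() is enough to exercise it.
Whether each swapcontext() call costs a system call for the signal
mask depends on the libc in question, so don't read a performance
claim into this.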


> > I think the "the Linux scheduler is O(1), so we don't have to
> > care" argument is invalid; O(1) means that for N entries, there
> > need to be N elements traversed 1 time.  What this really means
> > is that, as the number of schedulable entities goes up, the time
> > required goes up linearly, as opposed to going up exponentially;
> > or, better, to *not* going up in the first place.
> 
> Terry? You must have misspoken here. O(N) is linear, O(1) is constant.

See Rik's posting; my N in this case is not the N in N:M, it's
what Rik's calling 'n'.  I've upcased it to make it visually
distinct in my text; sorry if that confused things.  Over the
set of all processes, it *is* a linear algorithm.  Scheduling
the next thing to run is not as interesting as scheduling the
thing you are descheduling now so that it's run *again*.  The
distance that needs consideration is the distance between the
times that it's scheduled.  If you think about this in the
context of my microbenchmarking comments, this should be more
clear.
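
One way to read the "distance" point is with a toy round-robin run
queue.  This is a sketch under loud assumptions: every entity is
runnable, all have equal priority, and each uses exactly one quantum.
struct runq, pick_next, and reschedule_distance are hypothetical names
and have nothing to do with the real Linux or FreeBSD schedulers.
Picking the next entity is a constant-time operation, yet the number
of other dispatches between two runs of the same entity still grows
with the number of entities.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical round-robin run queue holding entities 0..n-1. */
struct runq {
    size_t n;      /* number of runnable entities */
    size_t head;   /* index of the next entity to dispatch */
};

/* Constant-time pick: return the head and advance it. */
static size_t pick_next(struct runq *q)
{
    size_t id = q->head;

    q->head = (q->head + 1) % q->n;
    return id;
}

/*
 * Count how many *other* entities are dispatched between two
 * consecutive runs of entity `id`.
 */
size_t reschedule_distance(size_t n, size_t id)
{
    struct runq q = { n, 0 };
    size_t others = 0;

    assert(n > 0 && id < n);       /* otherwise the loops never finish */
    while (pick_next(&q) != id)    /* advance to the first run of id */
        ;
    while (pick_next(&q) != id)    /* count dispatches until it runs again */
        others++;
    return others;
}
```

Under these assumptions reschedule_distance(n, id) comes out as n - 1
for any valid id, which is the linear term, even though pick_next()
itself never looks at more than one entry.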


> > One exception is the use of
> > "futex" wakeup in order to improve thread joins: FreeBSD should
> > look closely at this.
> 
> "Futexes" are not new. We had this at Be, but we called them Benaphores.

I didn't mean looking closely at it as a new technology, I meant
looking closely at it because the current FreeBSD recursion-able
mutex implementation is really too heavyweight for the problem
at hand.  The "futex" (or "benaphore" or whatever) implementation
differs in that it has significantly lower overhead, with the cost
being that you can't just regrab a lock and expect it to be
magically counted up and down.
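
To make the tradeoff concrete, here is a minimal sketch of the general
idea: an atomic counter in front of a semaphore, so that the
uncontended path never touches the semaphore.  It assumes C11 atomics
and POSIX unnamed semaphores, and struct benaphore and the
benaphore_*() functions are hypothetical names.  It is neither the Be
code nor the Linux futex interface.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdatomic.h>

struct benaphore {
    atomic_int count;  /* 0 = free, 1 = held, >1 = held with waiters */
    sem_t      sem;    /* only used when there is contention */
};

int benaphore_init(struct benaphore *b)
{
    atomic_init(&b->count, 0);
    return sem_init(&b->sem, 0, 0);
}

void benaphore_lock(struct benaphore *b)
{
    /* Uncontended case: the old count was 0 and we are done. */
    if (atomic_fetch_add(&b->count, 1) > 0) {
        /* Contended: sleep until an unlock posts; retry on EINTR. */
        while (sem_wait(&b->sem) == -1 && errno == EINTR)
            ;
    }
}

/* Returns 1 if the lock was taken, 0 if it was already held. */
int benaphore_trylock(struct benaphore *b)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&b->count, &expected, 1);
}

void benaphore_unlock(struct benaphore *b)
{
    int prev = atomic_fetch_sub(&b->count, 1);

    assert(prev > 0);           /* unlocking a free lock is a bug */
    if (prev > 1)
        sem_post(&b->sem);      /* somebody is waiting: wake one */
}
```

Note that there is no owner field and no recursion count.  A thread
that calls benaphore_lock() twice without unlocking simply blocks on
itself, and benaphore_trylock() by the holder fails; that is the cost
side of the tradeoff.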

If you've ever programmed timer code in the Windows 95/98/NT/XP/2000
kernels, you'll know that the timers basically run on whatever kernel
thread is available to run on, rather than a specific thread (kernel
threads only provide context).  This basically means that you have to
build non-reentrant semaphores on top of the kernel services that are
already there.  Otherwise you can grab a semaphore in a normal
operation, have a timer fire, and, even though the timer is
technically a separate context in theory, in practice it ends up
being allowed to grab a semaphore that is already held by the kernel
context that the timer is "borrowing" to run itself.  Matt Day ran
into this with the soft updates syncer in our port of the Heidemann
stacking VFS code to Windows 95 (different soft updates implementation
than Kirk's code; it predates Kirk's work by a couple of years).  The
upshot is
that things you think are protected aren't really protected, under
certain conditions that, while uncommon, are still possible.

My personal preference is for the tradeoff that Linux made here,
where they ate the code refactoring overhead implied by failure
to permit recursive acquisition.

-- Terry
