Date:      Mon, 02 Nov 1998 00:22:11 +0800
From:      Peter Wemm <peter@netplex.com.au>
To:        Chuck Robey <chuckr@mat.net>
Cc:        Daniel Eischen <eischen@vigrid.com>, lists@tar.com, current@FreeBSD.ORG, jb@cimlogic.com.au
Subject:   Re: Kernel threading (was Re: Thread Scheduler bug) 
Message-ID:  <199811011622.AAA25247@spinner.netplex.com.au>
In-Reply-To: Your message of "Sun, 01 Nov 1998 10:54:28 EST." <Pine.BSF.4.05.9811011041430.306-100000@picnic.mat.net> 

Chuck Robey wrote:
> On Sun, 1 Nov 1998, Peter Wemm wrote:
> 
> > > I'd like to help in this effort, but I'd first like to see   
> > > exactly what threading model is desired.  Do we want a Solaris
> > > lightweight process model with the ability to have both bound
> > > and unbound user threads?  Or do we want libpthread to keep
> > > a one-one mapping of threads to kernel threads?
> > 
> > I'm not familiar with Solaris threads but have seen the man pages for the 
> > SVR4.2-whatever-it-is-this-week threads as used in UnixWare.
> > 
> > What I had in mind was something along the lines of:
> > - the kernel context switching entity would become a 'thread' rather than 
> > a 'proc'.
> > - a "process" (struct proc) would have one or more threads, all using the 
> > same address space, pid, signals, etc.
> > - The logistics of doing this are ugly.  I don't expect that a global
> > 's/struct proc/struct thread/g' would go down well.
> > - the boundaries between the present 'struct proc', pcb, upages, etc would 
> > get muddied a bit in the process.  The context that a thread would need 
> > would be something like a superset of the pcb at present.
> > - A thread would have just enough context for making syscalls etc.
> > - context switching between threads would be bloody quick, as good or 
> > better than switching between rfork shared address space siblings.
> > - There would be one kernel stack per thread, up to a limit of the number 
> > of CPUs in operation.  If you had 4 CPUs and 1000 threads, you 
> > still only need 4 stacks and other associated things (PTD etc).
> > - It would be nice to have some sort of cooperative kernel<->user 
> > scheduling interface so that it would be possible to have the libpthread 
> > "engine" schedule it's pthreads onto kernel threads.  Suppose one wants a 
> > few thousand pthreads, but only realistically needs 10 or 20 of them to 
> > block in syscalls at any given time.
> 
> The way it's put, it *seems* like you're saying that a thread needs more
> context than a proc, but since the proc context really must be shared
> among all its threads, you wouldn't want to duplicate the proc context,
> you'd just want to make the right proc context available to the right
> threads, right?

Err, a thread needs much less context than a proc, otherwise large numbers
of them become very expensive.

A process, as it presently exists, consists of all the VM, permissions, 
state, execution context, signals, etc etc.  Part of the process context 
is in the struct proc, part is in the pcb and upages with the kernel stack.

What would be ideal is that a 'process' could gain multiple execution 
contexts, each of which could make syscalls, block on IO, and so on.
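
As a very rough sketch (illustrative names only, this isn't real or
proposed source), the split might look something like:

#include <sys/types.h>          /* pid_t */

struct vmspace;                 /* the shared address space */
struct pcb;                     /* register/FPU save area */

/* Per-execution-context state: kept as small as possible. */
struct thread {
        struct proc    *td_proc;        /* back pointer to shared state */
        struct pcb     *td_pcb;         /* saved registers etc */
        void           *td_kstack;      /* kernel stack for traps/syscalls */
        int             td_state;       /* runnable, sleeping, ... */
        struct thread  *td_next;        /* sibling threads in this proc */
};

/* Everything shared stays put: one copy per process. */
struct proc {
        pid_t           p_pid;          /* one pid for all threads */
        struct vmspace *p_vmspace;      /* VM, shared by all threads */
        /* creds, signal state, fd table, limits, ... all shared */
        struct thread  *p_threads;      /* one or more execution contexts */
};

Switching between two threads of the same proc would then only touch the
pcb and kernel stack, which is where the rfork-sibling-or-better switch
times would come from.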

> The idea is that threads have less context, meaning less complicated
> context switching (and lower overhead for that context switching),
> right?

Yes.

> The idea about each thread having its own kernel stack seems
> unavoidable, but that stack could be pretty limited in size, and
> actually a settable size, right?  I'm wondering about memory allocation
> here for threads ... in a proc now, if you run out of stack, it grows
> for you.  I think that would be too big a hammer on the system for each
> thread, wouldn't it?  If a thread ran out of stack, would it just raise a
> signal indicating the problem, and let the thread itself (or a
> thread-managing thread) handle new stack problems?  I mean, threads handle
> stack setup themselves when they start, unlike processes, and it ought
> to be as lightweight as possible.

The tricky part comes when you try to get this to fly on an SMP system, 
where the same process could really be executing in parallel on multiple 
CPUs at once.

The kernel stack and user stack are two different things.  A process 
switches to the kernel stack for interrupts, traps and syscalls.  This is in 
the UPAGES at present, 8K of kvm per process.  The pcb and signal state 
live in the first part of the first page, so there's less than 8K of 
kernel stack per process.  Actually, there is no guard at all: the kernel 
stack could conceivably grow down and clobber the snot out of the signal 
handlers and the pcb, but the system would be in big trouble by then and a 
process coredump would be the last thing on your mind.
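
Pictorially (rough sketch of the layout just described, not to scale):

/*
 *  +-------------------------+  <- top of the 8K UPAGES
 *  |  kernel stack           |
 *  |  (grows downward)       |
 *  |          . . .          |  <- no guard page in between!
 *  |  signal state           |
 *  |  struct pcb             |
 *  +-------------------------+  <- base of the UPAGES
 */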

You need a kernel stack per thread in the lightweight model, up to a limit 
of the number of CPUs running, because one is needed for each possibly 
active thread to make a syscall.
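
One way to picture it (very hand-wavy C, hypothetical names, and it
ignores threads that go to sleep inside the kernel): a small pool of
stacks sized to the machine, borrowed at kernel entry:

#include <stdlib.h>

#define KSTACK_SIZE 8192        /* illustrative; UPAGES-sized today */

/* Hypothetical pool: one kernel stack per CPU, not per thread. */
struct kstack_pool {
        int     nfree;
        void  **free;           /* the free stacks */
};

static int
kstack_pool_init(struct kstack_pool *kp, int ncpus)
{
        int i;

        kp->free = malloc(ncpus * sizeof(void *));
        if (kp->free == NULL)
                return (-1);
        for (i = 0; i < ncpus; i++)
                if ((kp->free[i] = malloc(KSTACK_SIZE)) == NULL)
                        return (-1);
        kp->nfree = ncpus;
        return (0);
}

/* Conceptually called at trap/syscall entry and exit. */
static void *
kstack_borrow(struct kstack_pool *kp)
{
        return (kp->nfree > 0 ? kp->free[--kp->nfree] : NULL);
}

static void
kstack_return(struct kstack_pool *kp, void *ks)
{
        kp->free[kp->nfree++] = ks;
}

With 4 CPUs and 1000 threads you allocate 4 of these, not 1000.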

> I'm not an expert, these are questions.  I don't want threads to become
> little processes with implicitly shared memories, I want them to be
> lightweight, as they were originally intended.  The only reason to move
> to kernel threads at all is to make the signal management lighter in
> weight, and get single thread blocking on syscalls, right?

Two reasons: blocking on syscalls, and SMP support.  A select() buzz 
loop scheduler like the one in libc_r gets no benefit from an SMP system, 
and that's the main gain to be had.
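
For contrast, the libc_r engine today is shaped vaguely like this
(grossly simplified, helper names made up) - one kernel-visible context
no matter how many user threads, so a second CPU never helps:

#include <sys/select.h>

struct uthread;                         /* a userland thread */

/* Stand-ins for the real libc_r machinery. */
extern struct uthread *pick_runnable(void);
extern void switch_to(struct uthread *);
extern int build_fd_sets(fd_set *, fd_set *);

static void
scheduler_loop(void)
{
        fd_set rfds, wfds;
        struct uthread *t;
        int maxfd;

        for (;;) {
                if ((t = pick_runnable()) != NULL) {
                        switch_to(t);   /* user-level context switch */
                        continue;
                }
                /*
                 * Everybody is "blocked": all I/O was made non-blocking,
                 * so sit in select() until some fd becomes ready.
                 */
                maxfd = build_fd_sets(&rfds, &wfds);
                select(maxfd + 1, &rfds, &wfds, NULL, NULL);
        }
}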

There are other problems that need to be resolved for SMP address space
sharing.  Activating the same PTD on two CPUs at once blows away
the per-cpu private pages, and this really screws things up since both CPUs
acquire the same cpuid and explode.  Teaching the pmap code about multiple
PTDs per process (per shared rfork() thread, up to a max of numcpus again)
is an interesting problem - I wonder if it might be easier to simply have 
a different PTD for the kernel for each CPU and switch from the user PTD 
to the kernel one at trap/syscall/interrupt time.  That would mean major 
changes to the protection interface: copyin/copyout etc. are presently done 
by having the kernel read the user pages directly and faulting if needed.  
Under a per-cpu kernel space PTD, the kernel would have to do much more 
work to access the current user process address space.
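
To make the copyin/out point concrete: today it is essentially a bcopy
from the user address with a fault handler armed, because user space is
visible through the current PTD.  With a split PTD, something like the
following (completely hypothetical) shape would be needed instead:

#include <stddef.h>
#include <string.h>

struct proc;

/* Hypothetical helpers for a split user/kernel PTD world. */
extern void *map_user_range(struct proc *, const void *, size_t);
extern void unmap_user_range(void *, size_t);

int
copyin_split_ptd(struct proc *p, const void *uaddr, void *kaddr,
    size_t len)
{
        void *window;

        /* Map the user pages somewhere the kernel PTD can see them. */
        if ((window = map_user_range(p, uaddr, len)) == NULL)
                return (-1);            /* EFAULT, roughly */
        memcpy(kaddr, window, len);
        unmap_user_range(window, len);
        return (0);
}

That per-copy mapping step is the "much more work" above.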

> I guess I'm trying to emphasize "lightweight".

I know, so am I.

Cheers,
-Peter
--
Peter Wemm <peter@netplex.com.au>   Netplex Consulting
"No coffee, No workee!" :-)


