From owner-freebsd-bugs  Mon Nov  2 21:00:02 1998
Return-Path:
Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id VAA11287 for freebsd-bugs-outgoing; Mon, 2 Nov 1998 21:00:02 -0800 (PST) (envelope-from owner-freebsd-bugs@FreeBSD.ORG)
Received: from freefall.freebsd.org (freefall.FreeBSD.ORG [204.216.27.21]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id UAA11237 for ; Mon, 2 Nov 1998 20:59:59 -0800 (PST) (envelope-from gnats@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.8.8/8.8.5) id VAA02637; Mon, 2 Nov 1998 21:00:00 -0800 (PST)
Date: Mon, 2 Nov 1998 21:00:00 -0800 (PST)
Message-Id: <199811030500.VAA02637@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.ORG
From: Peter Wemm
Subject: Re: kern/8500: FreeBSD 3.0 thread scheduler is broken
Reply-To: Peter Wemm
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

The following reply was made to PR kern/8500; it has been noted by GNATS.

From: Peter Wemm
To: Tony Finch
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/8500: FreeBSD 3.0 thread scheduler is broken
Date: Tue, 03 Nov 1998 12:54:48 +0800

Tony Finch wrote:
> Peter Wemm wrote:
> > HighWind Software Information wrote:
> >
> > > The only alternatives are to use the aio/lio syscalls (which work), or
> > > rfork().  libc_r could probably be modified to use rfork() to have the
> > > read/write/open/close/etc done in parallel.
> > >
> > > I don't think that is necessary.
> >
> > It is if you want the threading to continue while the disk is grinding
> > away.  aio_read() and aio_write() would probably be enough to help file
> > IO, but open() will still be blocking.
> >
> > Squid has some fairly extensive async disk-IO routines.  They happen to
> > use pthreads as a mechanism of having child processes do the blocking
> > work.  FreeBSD could use rfork() for arranging the blocking stuff in child
> > processes with shared address space.  It would be a lot of work though,
> > and would be a problem on SMP systems.
>
> We have been trying out Squid on 3.0 because of the possibilities
> offered by async IO, but so far we haven't managed to get it to work
> satisfactorily.  I was also thinking about the possibility of using
> rfork() to implement threads -- the Linux pthreads implementation does
> this (except that Linux has clone() instead of rfork() and the
> interface is slightly different).
>
> What are the SMP issues?

Pretty dramatic, i.e. it doesn't work. :-(

The reason is that under SMP there is a per-CPU page table directory slot
that is changed at each context switch.  We store a heap of per-CPU
variables there (with more to come), including the virtual cpuid.  With a
shared-address-space rfork(), the same PTD, page tables and pages are used
in both processes.  If the two processes happened to be scheduled on both
CPUs at the same time, one CPU would clobber the other CPU's private PTD
slot and they would both end up using the same private pages on both CPUs.
This kills the system on the spot, as they both think they are the same
CPU.  For this reason, fast vfork is disabled and rfork() in shared address
space mode returns an error.

There is not a simple fix for this.  There is a possibility that loading
the MPPTDI slot after gaining the giant kernel lock could be made to work
as a short-term fix, but that obviously fails when the giant kernel lock
starts to go away, and something needs to be done about fast interrupts and
the boundary code that runs outside the kernel lock.
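For context, the aio_read()/aio_write() path mentioned in the quoted
discussion (the aio/lio syscalls that do work) looks roughly like the
minimal sketch below.  This is only an illustration, not code from the PR;
the file name, buffer size and polling loop are arbitrary choices, and as
noted above open() itself still blocks.

    /*
     * Minimal sketch of the POSIX AIO path discussed above.  Illustration
     * only; error handling is trimmed to the bare minimum.
     */
    #include <aio.h>
    #include <err.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
        static char buf[8192];
        struct aiocb cb;
        int fd;

        /* open() itself is still synchronous and can block on disk. */
        fd = open("/etc/motd", O_RDONLY);
        if (fd == -1)
            err(1, "open");

        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf = buf;
        cb.aio_nbytes = sizeof(buf);
        cb.aio_offset = 0;

        /* Queue the read; the calling thread is free to keep working. */
        if (aio_read(&cb) == -1)
            err(1, "aio_read");

        /* Poll for completion; aio_suspend() would also do. */
        while (aio_error(&cb) == EINPROGRESS)
            usleep(1000);

        printf("read %ld bytes\n", (long)aio_return(&cb));
        close(fd);
        return (0);
    }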
Longer term fixes include drastic VM (pmap and support) modifications:

- Have separate address spaces for the kernel and user.  This isn't such a
  bad option, as it positions us very well for very large memory systems.
  The kernel would load and run at 0x00100000 rather than 0xf0100000, and
  would have one PTD[] for each CPU.  Each process could have 4GB of
  address space, rather than having to leave room for the kernel to live
  at the top of it.  Needless to say, this is a fair amount of work. :-)

- Have multiple PTDs for each shared address space, up to the number of
  present CPUs.  I.e. if an address space was rforked for 20 threads, but
  you had 4 CPUs, then you need 4 PTDs.  (A rough sketch of this approach
  is appended at the end of this message.)

Neither of these has been attempted yet, but the second is probably the
simpler of the two, while the first is probably the best for future
capabilities.  It would give us a lot more room to move on the large-memory
PPro and PII systems with 36 bits (64GB of RAM) of physical address space.

> Tony.

Cheers,
-Peter

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message
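A rough sketch of the second option above (one page table directory per CPU
for an rfork()-shared address space) follows, as referenced in the message.
It is an illustration only: the names are hypothetical stand-ins rather than
the real FreeBSD 3.0 pmap symbols, and the CPU count and slot index are
arbitrary.  The point is just that each CPU installs its own copy of the
directory at context-switch time, so its private MPPTDI mapping can no
longer clobber another CPU's.

    /*
     * Sketch only: one PTD per CPU for a shared address space.  All names
     * here are illustrative stand-ins, not the real pmap code.
     */
    #define SKETCH_NCPU     4       /* assumed 4-way SMP box, as in the example */
    #define SKETCH_MPPTDI   1023    /* stand-in index for the per-CPU private slot */

    typedef unsigned int pd_entry_t;    /* 32-bit i386 page directory entry */

    struct shared_pmap {
        /* One page table directory per CPU instead of one per address space. */
        pd_entry_t *pm_pdir[SKETCH_NCPU];
    };

    /*
     * At context-switch time the switching CPU installs its own directory,
     * so its private mapping lands in its own copy of the MPPTDI slot and
     * cannot clobber a second CPU that is running another thread of the
     * same rfork()-shared address space.
     */
    static void
    sketch_activate(struct shared_pmap *pm, int cpuid, pd_entry_t private_pde)
    {
        pd_entry_t *ptd = pm->pm_pdir[cpuid];

        ptd[SKETCH_MPPTDI] = private_pde;
        /* ...and %cr3 would then be reloaded with the physical address of ptd. */
    }

Presumably the per-CPU copies would also have to be kept in sync whenever
the shared address space's top-level mappings change, which is part of why
even the simpler option is still a fair amount of work.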