From owner-freebsd-bugs  Mon Nov  2 21:00:02 1998
Return-Path:
Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id VAA11287 for freebsd-bugs-outgoing; Mon, 2 Nov 1998 21:00:02 -0800 (PST) (envelope-from owner-freebsd-bugs@FreeBSD.ORG)
Received: from freefall.freebsd.org (freefall.FreeBSD.ORG [204.216.27.21]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id UAA11237 for ; Mon, 2 Nov 1998 20:59:59 -0800 (PST) (envelope-from gnats@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.8.8/8.8.5) id VAA02637; Mon, 2 Nov 1998 21:00:00 -0800 (PST)
Date: Mon, 2 Nov 1998 21:00:00 -0800 (PST)
Message-Id: <199811030500.VAA02637@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.ORG
From: Peter Wemm
Subject: Re: kern/8500: FreeBSD 3.0 thread scheduler is broken
Reply-To: Peter Wemm
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

The following reply was made to PR kern/8500; it has been noted by GNATS.

From: Peter Wemm
To: Tony Finch
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/8500: FreeBSD 3.0 thread scheduler is broken
Date: Tue, 03 Nov 1998 12:54:48 +0800

Tony Finch wrote:
> Peter Wemm wrote:
> > HighWind Software Information wrote:
> >
> > > The only alternatives are to use the aio/lio syscalls (which work), or
> > > rfork().  libc_r could probably be modified to use rfork() to have the
> > > read/write/open/close/etc done in parallel.
> > >
> > > I don't think that is necessary.
> >
> > It is if you want the threading to continue while the disk is grinding
> > away.  aio_read() and aio_write() would probably be enough to help file
> > IO, but open() will still be blocking.
> >
> > Squid has some fairly extensive async disk-IO routines.  They happen to
> > use pthreads as a mechanism of having child processes do the blocking
> > work.  FreeBSD could use rfork() for arranging the blocking stuff in child
> > processes with shared address space.  It would be a lot of work though,
> > and would be a problem on SMP systems.
>
> We have been trying out Squid on 3.0 because of the possibilities
> offered by async IO, but so far we haven't managed to get it to work
> satisfactorily.  I was also thinking about the possibility of using
> rfork() to implement threads -- the Linux pthreads implementation does
> this (except that Linux has clone() instead of rfork() and the
> interface is slightly different).
>
> What are the SMP issues?

Pretty dramatic, i.e. it doesn't work. :-(

The reason is that under SMP there is a per-CPU page table directory slot
that is changed at each context switch.  We store a heap of per-CPU
variables there (with more to come), including the virtual cpuid.  With a
shared-address-space rfork(), the same PTD, page tables and pages are used
in both processes.  If the two processes happened to be scheduled on both
CPUs at the same time, one CPU would clobber the other CPU's private PTD
slot and they would both end up using the same private pages on both CPUs.
This kills the system on the spot, as they both think they are the same
CPU.  For this reason, fast vfork is disabled and rfork() in shared address
space mode returns an error.

There is not a simple fix for this.  There is a possibility that loading
the MPPTDI slot after gaining the giant kernel lock could be made to work
as a short-term fix, but that obviously fails when the giant kernel lock
starts to go away, and something needs to be done about fast interrupts and
the boundary code that runs outside the kernel lock.
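For context, the aio_read()/aio_write() path mentioned in the quoted
discussion (the aio/lio syscalls that do work) looks roughly like the
minimal sketch below.  This is only an illustration, not code from the PR;
the file name, buffer size and polling loop are arbitrary choices, and as
noted above open() itself still blocks.

    /*
     * Minimal sketch of the POSIX AIO path discussed above.  Illustration
     * only; error handling is trimmed to the bare minimum.
     */
    #include <aio.h>
    #include <err.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
        static char buf[8192];
        struct aiocb cb;
        int fd;

        /* open() itself is still synchronous and can block on disk. */
        fd = open("/etc/motd", O_RDONLY);
        if (fd == -1)
            err(1, "open");

        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf = buf;
        cb.aio_nbytes = sizeof(buf);
        cb.aio_offset = 0;

        /* Queue the read; the calling thread is free to keep working. */
        if (aio_read(&cb) == -1)
            err(1, "aio_read");

        /* Poll for completion; aio_suspend() would also do. */
        while (aio_error(&cb) == EINPROGRESS)
            usleep(1000);

        printf("read %ld bytes\n", (long)aio_return(&cb));
        close(fd);
        return (0);
    }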
Longer term fixes include drastic VM (pmap and support) modifications:

- Have separate address spaces for the kernel and user.  This isn't such a
  bad option, as it positions us very well for very large memory systems.
  The kernel would load and run at 0x00100000 rather than 0xf0100000, and
  would have one PTD[] for each CPU.  Each process could have 4GB of
  address space, rather than having to leave room for the kernel to live
  at the top of it.  Needless to say, this is a fair amount of work. :-)

- Have multiple PTDs for each shared address space, up to the number of
  present CPUs.  I.e. if an address space was rforked for 20 threads, but
  you had 4 CPUs, then you need 4 PTDs.  (A rough sketch of this approach
  is appended at the end of this message.)

Neither of these has been attempted yet, but the second is probably the
simpler of the two, while the first is probably the best for future
capabilities.  It would give us a lot more room to move on the large-memory
PPro and PII systems with 36 bits (64GB of RAM) of physical address space.

> Tony.

Cheers,
-Peter

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message
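A rough sketch of the second option above (one page table directory per CPU
for an rfork()-shared address space) follows, as referenced in the message.
It is an illustration only: the names are hypothetical stand-ins rather than
the real FreeBSD 3.0 pmap symbols, and the CPU count and slot index are
arbitrary.  The point is just that each CPU installs its own copy of the
directory at context-switch time, so its private MPPTDI mapping can no
longer clobber another CPU's.

    /*
     * Sketch only: one PTD per CPU for a shared address space.  All names
     * here are illustrative stand-ins, not the real pmap code.
     */
    #define SKETCH_NCPU     4       /* assumed 4-way SMP box, as in the example */
    #define SKETCH_MPPTDI   1023    /* stand-in index for the per-CPU private slot */

    typedef unsigned int pd_entry_t;    /* 32-bit i386 page directory entry */

    struct shared_pmap {
        /* One page table directory per CPU instead of one per address space. */
        pd_entry_t *pm_pdir[SKETCH_NCPU];
    };

    /*
     * At context-switch time the switching CPU installs its own directory,
     * so its private mapping lands in its own copy of the MPPTDI slot and
     * cannot clobber a second CPU that is running another thread of the
     * same rfork()-shared address space.
     */
    static void
    sketch_activate(struct shared_pmap *pm, int cpuid, pd_entry_t private_pde)
    {
        pd_entry_t *ptd = pm->pm_pdir[cpuid];

        ptd[SKETCH_MPPTDI] = private_pde;
        /* ...and %cr3 would then be reloaded with the physical address of ptd. */
    }

Presumably the per-CPU copies would also have to be kept in sync whenever
the shared address space's top-level mappings change, which is part of why
even the simpler option is still a fair amount of work.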