From owner-freebsd-hardware  Mon Apr  1 17:49:10 1996
Return-Path: owner-hardware
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id RAA22419
          for hardware-outgoing; Mon, 1 Apr 1996 17:49:10 -0800 (PST)
Received: from lserver.infoworld.com (lserver.infoworld.com [192.216.48.4])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id RAA22397
          Mon, 1 Apr 1996 17:49:06 -0800 (PST)
Received: from ccgate.infoworld.com by lserver.infoworld.com with smtp
	(Smail3.1.29.1 #12) id m0u3vWg-000wt0C; Mon, 1 Apr 96 18:08 PST
Received: from cc:Mail by ccgate.infoworld.com
	id AA828409505; Mon, 01 Apr 96 18:38:32 PST
Date: Mon, 01 Apr 96 18:38:32 PST
From: "Brett Glass" <Brett_Glass@ccgate.infoworld.com>
Message-Id: <9603018284.AA828409505@ccgate.infoworld.com>
To: Garth Kidd <garth@dogbert.systems.sa.gov.au>, msmith@atrad.adelaide.edu.au
Cc: hdalog@zipnet.net, davidg@Root.COM, hardware@FreeBSD.org, bugs@FreeBSD.org
Subject: Re: Cannot boot after install
Sender: owner-hardware@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

> That might be fine for normal filesystem access, Brett, but consider your
> poor swap partition.  If a page fault occurs, the kernel's just going to
> have to sit around and wait for the drive to spin up, ne?

Not necessarily. It is possible for tasks that have NOT page faulted to
keep running while others are waiting for pages to arrive. (It's even
better if the swapper can postpone the choosing of the "victim" page until
the drive is ready to go, but this is rarely done because most bus
mastering controllers need to be pointed at a block of RAM in advance.)

If a task that's running in the interim suddenly needs a page, it will
simply block and be queued up for it.
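
To make that concrete, here is a rough sketch in C of the bookkeeping I
have in mind. It is not FreeBSD's scheduler or VM code; the names
(struct task, page_fault, page_arrived, pick_next) are made up for
illustration, and the run queue is just a small array walked round-robin.

```c
#include <assert.h>

enum task_state { TASK_RUNNABLE, TASK_WAIT_PAGE };

struct task {
    enum task_state state;
    int wanted_page;            /* meaningful only in TASK_WAIT_PAGE */
};

/* On a page fault: mark the task blocked and return; no spinning. */
static void page_fault(struct task *t, int page)
{
    t->state = TASK_WAIT_PAGE;
    t->wanted_page = page;
}

/* From the I/O completion path: wake every task queued for this page.
 * Returns the number of tasks woken. */
static int page_arrived(struct task *tasks, int n, int page)
{
    int woken = 0;
    for (int i = 0; i < n; i++) {
        if (tasks[i].state == TASK_WAIT_PAGE &&
            tasks[i].wanted_page == page) {
            tasks[i].state = TASK_RUNNABLE;
            woken++;
        }
    }
    return woken;
}

/* Round-robin: index of the next runnable task after `last`
 * (pass -1 to start from task 0), or -1 if every task is blocked. */
static int pick_next(const struct task *tasks, int n, int last)
{
    assert(n > 0 && last >= -1 && last < n);
    for (int step = 1; step <= n; step++) {
        int i = (last + step) % n;
        if (tasks[i].state == TASK_RUNNABLE)
            return i;
    }
    return -1;
}
```

The point is that a faulting task only changes its own state. pick_next
keeps handing the CPU to whatever else is runnable, and the disk's
spin-up time is paid only by the tasks that actually need the page.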

> Unless you want to re-write the scheduler so that processes can only get
> CPU when all of their pages are in memory at the time, you're going to
> have to either put up with 30-second hangs or run with no swap.

Again, this is not necessary. The key is not to eliminate disk accesses
but to let them proceed concurrently with other tasks.

It's a fundamental rule of concurrent programming that critical sections
should never block or busy-wait. FreeBSD's disk access code breaks that
rule, potentially hanging the machine if a peripheral goes offline
and leaving it vulnerable to all sorts of hardware quirks.  To avoid this,
one needs to create a finite state machine.  (There is one, of a sort, in
the disk code now -- but it is not implemented to avoid busy-waits.) In a
correct implementation, each state is a non-blocking critical section. When
the state machine must wait for an event (in this case, for the disk to
come ready), it alters its state variables in preparation for a state
transition and immediately yields control to the scheduler.  When it is
re-awakened either by a clock tick or by an I/O completion interrupt, it
proceeds with the code for the next state. If the clock tick method is
used, there will be "polling" states that branch to themselves -- yielding
control each time -- until a condition is satisfied.
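
Here is a minimal sketch in C of the kind of state machine described
above, using the clock-tick (polling) variant. The state names, struct
disk_fsm, disk_start and disk_step are hypothetical and not taken from
the existing disk driver. The poll limit is my own assumption, added so
that a drive that never comes ready produces an error instead of a hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states; not the names used in any real driver. */
enum disk_state { DS_IDLE, DS_SPINUP_POLL, DS_TRANSFER, DS_DONE, DS_FAILED };

struct disk_fsm {
    enum disk_state state;
    int polls_left;             /* assumed limit on spin-up polls */
};

/* Begin a request: the spin-up command would be issued here (not
 * modeled), then control returns to the caller immediately. */
static void disk_start(struct disk_fsm *f, int max_polls)
{
    assert(max_polls > 0);
    f->state = DS_SPINUP_POLL;
    f->polls_left = max_polls;
}

/* One state transition, run on each clock tick or completion
 * interrupt. It never loops or sleeps: it inspects the inputs,
 * updates the state variables, and returns to the scheduler. */
static enum disk_state disk_step(struct disk_fsm *f,
                                 bool drive_ready, bool xfer_complete)
{
    switch (f->state) {
    case DS_SPINUP_POLL:
        if (drive_ready)
            f->state = DS_TRANSFER;     /* transfer would start here */
        else if (--f->polls_left == 0)
            f->state = DS_FAILED;       /* drive offline: fail, don't hang */
        /* otherwise the polling state branches to itself */
        break;
    case DS_TRANSFER:
        if (xfer_complete)
            f->state = DS_DONE;
        break;
    case DS_IDLE:
    case DS_DONE:
        break;
    case DS_FAILED:
        break;
    }
    return f->state;
}
```

Each call to disk_step is one short, non-blocking critical section. With
the interrupt-driven method the DS_SPINUP_POLL self-loop goes away and
the step is simply invoked from the completion handler instead.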

There's a very good and surprisingly interesting exposition of this topic
in the book "The Soul of a New Machine." It even covers the case of double
faults.

--Brett