From owner-freebsd-hackers Fri Jan 27 09:32:53 1995
Return-Path: hackers-owner
Received: (from root@localhost) by freefall.cdrom.com (8.6.9/8.6.6) id JAA23067 for hackers-outgoing; Fri, 27 Jan 1995 09:32:53 -0800
Received: from cs.weber.edu (cs.weber.edu [137.190.16.16]) by freefall.cdrom.com (8.6.9/8.6.6) with SMTP id JAA23059 for ; Fri, 27 Jan 1995 09:32:51 -0800
Received: by cs.weber.edu (4.1/SMI-4.1.1) id AA29338; Fri, 27 Jan 95 10:26:53 MST
From: terry@cs.weber.edu (Terry Lambert)
Message-Id: <9501271726.AA29338@cs.weber.edu>
Subject: Re: Help with SCSI development (fwd)
To: dufault@hda.com (Peter Dufault)
Date: Fri, 27 Jan 95 10:26:52 MST
Cc: freebsd-hackers@FreeBSD.org, vernick@cs.sunysb.edu
In-Reply-To: <199501271539.KAA08510@hda.com> from "Peter Dufault" at Jan 27, 95 10:39:51 am
X-Mailer: ELM [version 2.4dev PL52]
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

> Michael Vernick writes:

[ ... disk striping project ... ]

[ ... async I/O ... ]

> > Are there any direct calls to the scsi driver from user space?  I saw
> > something about user calls in scsi_ioctl.c but couldn't figure out how
> > to access it.  If there are user level calls, are they asynchronous?

There are user space SCSI calls.  Check -current, per Peter's
recommendation.

The read calls in the kernel are synchronous.  The write calls are
either sync or async, depending on parameters.  For the most part, the
async nature of the calls is *not* exposed to the user space process.
Generally, this is held in the kernel as a preemption point, and the
sync writes are used simply to ensure writes of file system meta-data.

Both types of sync operations cause the process to be suspended while
they are serviced.

Many user space threading mechanisms depend on async read and write
operations to provide preemption points for user space thread
scheduling algorithms.  The specific calls dealing with async I/O are
aioread, aiowrite, aiowait, and aiocancel (a rough usage sketch appears
below).  This is for the SunOS 4.x LWP system and for the SVR4 and
SunOS 5.x N->M mapping of user to kernel threads, where N > M.

In addition, there are primitives required for stack switching and
register set saving, as well as pipeline synchronization.  SunOS and
SVR4 provide system calls to accomplish these tasks.

These facilities are not generally implemented in *BSD at this time.
You may want to look into the port of the pthreads library.  This will
allow you to get started on a user space implementation.

Providing asynchronous reads within a single thread of control within
the kernel is unlikely to be portable to other than SCSI devices, and
will probably require a great deal of work on the SCSI devices
themselves.  If you indeed plan to move this into the kernel at a later
date, you may wish to consider implementing or waiting for kernel
threading.  Otherwise you may see speed penalties for your striping.

The NFS nfsd and biod processes, which make system calls and never
return, as well as the swapper and update daemons, are in effect kernel
threads, albeit threads with a proc structure attached to them for no
good reason.

A kernel threaded implementation would not *require* a divorce of this
type, but is likely to suffer greatly increased complexity if this does
not occur.

Kernel thread spawning is perhaps the single easiest way of forcing
sync operations to act as if they were async.  In simplest parlance, a
kernel thread is nothing more than the kernel stack, registers, and
program counter of a process.
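As an aside, here is the rough usage sketch promised above for the
SunOS-style aioread()/aiowait() interface.  This is NOT available on
*BSD; the signatures here are from the SunOS/Solaris manual pages as I
remember them, so check <sys/asynch.h> and aioread(3) locally before
relying on any of it.  The device name is just an example.

    /*
     * Rough sketch only: SunOS-style async I/O, not available on *BSD.
     * Queue several reads, then reap completions with aiowait().
     */
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/asynch.h>     /* aioread(), aiowait(), aio_result_t */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define NREQ  2
    #define BUFSZ (8 * 1024)

    int
    main(void)
    {
        aio_result_t    res[NREQ];
        char            buf[NREQ][BUFSZ];
        int             fd, i, outstanding;

        /* Example raw device name only. */
        if ((fd = open("/dev/rsd0c", O_RDONLY)) < 0)
            return (1);

        /* Queue the reads; control returns before the I/O completes. */
        for (i = 0; i < NREQ; i++)
            if (aioread(fd, buf[i], BUFSZ, (off_t)(i * BUFSZ), SEEK_SET,
                &res[i]) < 0)
                return (1);

        /* Reap completions; a null timeout means block until one is done. */
        for (outstanding = NREQ; outstanding > 0; outstanding--) {
            aio_result_t *done = aiowait((struct timeval *)0);

            if (done == (aio_result_t *)-1)     /* error return */
                break;
            printf("read completed: %d bytes (errno %d)\n",
                (int)done->aio_return, done->aio_errno);
        }
        return (0);
    }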
Most of the work for supporting kernel threading is, or will be, taking
place for medium and fine grained SMP support, which is directly
analogous to kernel multithreading, or requires it (respectively;
coarse grained SMP does not require that more than one processor be
allowed to execute kernel code at a time, only that there is anonymity
as to which processor is actually doing the kernel code execution).

I would suggest a user space implementation at this time, written with
an eye to coding techniques that do as little as possible to prevent
moving the code into the kernel in the future.

> > What I'd like to do is issue a bunch of reads (1 for each disk) to the
> > controller all at once and then wait for the data to be returned.  If I
> > can't make asynchronous calls in user space, I guess that I could have
> > multiple processes, one for each disk, where each process sends a
> > synchronous read to the driver, and once all of the processes finish,
> > accumulate the data.

For a user space implementation, the pthreads library is an ideal way
to achieve this (a minimal sketch is appended after the signature).

> > Eventually, I want to move this into the kernel.  When it becomes a
> > kernel level process, at what level do I issue calls.  At the
> > physio/bread level or lower?

This is a conundrum.  The problem is presented because the "slice"
abstraction, and therefore the per-slice device abstraction, is
embedded in the SCSI driver itself.  This is probably a mistake, if
what you want to do approaches RAID I or the Zebra FS striping
techniques (I would strongly advise *against* the Zebra approach for
performance reasons).  This bottlenecks async reads, especially,
through the device/slice abstraction.

Really, you want a cleaner logical separation of layers, and to stick
at least part of your code between the layers.  If you do this, please
ensure that your layering is clean, so that you can impose logical
device management between your layer and the underlying driver code.
AIX style logical volume management has long been a goal of at least
several people on these lists, myself included.


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.
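The minimal pthreads sketch referred to above: one thread per disk,
each doing a synchronous read, with the main thread collecting the
results via pthread_join().  The device names, stripe size, and disk
count are made-up examples, and it assumes the pthreads port provides
the standard pthread_create()/pthread_join() interface.

    /*
     * Minimal sketch: one thread per disk, each issuing a synchronous
     * read; the main thread waits for all of them before assembling
     * the stripe.  Names and sizes here are illustrative only.
     */
    #include <sys/types.h>
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define NDISK      4
    #define STRIPESIZE (64 * 1024)          /* bytes read from each disk */

    struct chunk {
        const char *dev;                    /* raw device to read from    */
        off_t       offset;                 /* where to read on that disk */
        char        buf[STRIPESIZE];        /* result lands here          */
        ssize_t     nread;                  /* bytes actually read, or -1 */
    };

    static void *
    read_chunk(void *arg)
    {
        struct chunk *c = arg;
        int fd;

        c->nread = -1;
        if ((fd = open(c->dev, O_RDONLY)) < 0)
            return (NULL);
        if (lseek(fd, c->offset, SEEK_SET) != (off_t)-1)
            c->nread = read(fd, c->buf, sizeof(c->buf));
        (void) close(fd);
        return (NULL);
    }

    int
    main(void)
    {
        static const char *devs[NDISK] =
            { "/dev/rsd0c", "/dev/rsd1c", "/dev/rsd2c", "/dev/rsd3c" };
        struct chunk chunks[NDISK];
        pthread_t tids[NDISK];
        int i;

        /* Fire off one reader thread per disk. */
        for (i = 0; i < NDISK; i++) {
            chunks[i].dev = devs[i];
            chunks[i].offset = 0;
            if (pthread_create(&tids[i], NULL, read_chunk, &chunks[i]) != 0) {
                perror("pthread_create");
                exit(1);
            }
        }

        /* Wait for all of them; the buffers are then ready to assemble. */
        for (i = 0; i < NDISK; i++) {
            pthread_join(tids[i], NULL);
            printf("%s: %ld bytes\n", chunks[i].dev, (long)chunks[i].nread);
        }
        return (0);
    }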