Date:      Sat, 29 Nov 1997 21:38:36 -0800 (PST)
From:      Julian Elischer <julian@whistle.com>
To:        hackers@freebsd.org
Cc:        Julian Elischer <julian@whistle.com>
Subject:   Stackable storage Alpha release
Message-ID:  <Pine.BSF.3.95.971129200733.9210A-100000@current1.whistle.com>


Over the last couple of years I have been slowly
trying to increase the modularity of FreeBSD.

One of the things that I really didn't like about UNIX when I first
started using it was the 'disconnection' between the contents of /dev and
reality. To this end I have been working in the background on DEVFS, a
device filesystem which allows (in fact requires) the device drivers to
keep the exported picture of available devices in sync with what is
actually attached. DEVFS has had its ups and downs, but one of the
difficulties it has had is in dealing with the current idea of slices and
partitions, particularly the way in which slices and devices are all
mixed together.  I finally gave up, and have spent the last 4 weeks
or so rewriting the disk storage system. This comes from discussions I
have had with PHK at TFS, and others (e.g. Peter Wemm in Perth) over the
last few years.

Redoing this, which is so basic to the system, of course requires that
many things be changed.  A lot of the changes turn out to be clean-ups.
For example, the code for interpreting the boot device as handed in by
the bootblocks could be a lot cleaner. Mounting the root filesystem is in
general a messy business in FreeBSD, and a general cleanup there might
make things easier to fix in the future.

I now have a set of sample code and patches for FreeBSD-current
which allow the system to run on a DEVFS, using a primitive version
of the rewritten storage code. Anyone interested can get a copy of the
changes from hub.freebsd.org in:
ftp://hub.freebsd.org/pub/scsi/slice.tar.gz
Unpack the tar file in /sys to get all the new files, then
apply the patch slicediff that it leaves in /sys to get the changes to
existing files.


This is very early code. It can, however, run on most systems that have SCSI
or IDE drives (as long as bad144 is not used).
There are the following points to be made:

1/ I have yet to integrate a whole bunch of work that phk has done on
this, as I elected to get to a booting and running stage, before I did
that.

2/ This code will not support old ESDI drives that cannot report
their geometry. (There is support for it, but it is unfinished, and
a change in direction is under way after a discussion with Mike Smith.)

3/ You need to change all the entries in /etc/fstab to use their CANONICAL
names, e.g. sd1s1a rather than sd1a. There should be an entry of the form:

devfs	/dev	 devfs	rw 1 1

possibly BEFORE root.

4/ As I write this you need to boot single user, and manually do a
'mount /dev'
fsck -p
^D
I'm not yet sure why.
Rather than just proceeding into multi-user mode,
I would suggest trying out your devices in single-user mode anyway.

5/ There is a file i386/isa/ide.c. This is wd.c with all the old code
removed, and some cleanups.  I did this just to see how much difference
removing all the old stuff made.

6/ The SCSI disk can still be accessed through the old interface
in parallel with the new interface. The IDE disk cannot. If you boot with
the root fs mounted from a DEVFS device, you will not be able to
do the "mount -u / " from the normal /dev, so root has to be either devfs
or not. If you use the "options SLICE", then you will get your
root device from an internal kernel-only instance of devfs.

7/ I have no support for reading or writing 'in-core disklabels' yet. fdisk
works on the raw device (warning, ANY raw device) and so does disklabel
using the -r flag. There is no core-dump support yet in the new stuff.

Storage Layering:
Here is a brief description of storage layering:

Every device exports a single storage interface. This is called a 'slice'.
Each slice is represented by a "struct slice". The struct has one and only
one handler below it (in this case the driver) and one or zero handlers
above it. The slice itself exports a device to the devfs, so even if a raw
disk had no handler above it, it would still have one raw device
available for use. (e.g. rdsd0).
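
To make the "one below, zero or one above" rule concrete, here is a rough
sketch in C. It is not taken from the patch: struct slice_handler, the field
names, slice_init() and slice_claim() are all made up for illustration, and
the real "struct slice" certainly carries more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler record; the real code certainly carries more. */
struct slice_handler {
    const char *name;               /* e.g. "sd.c", "MBR", "disklabel" */
};

/* Hypothetical slice: exactly one handler below, zero or one above. */
struct slice {
    const char *devname;            /* name exported to devfs, e.g. "sd0" */
    struct slice_handler *lower;    /* always set (here: the driver) */
    struct slice_handler *upper;    /* NULL while nothing has claimed it */
};

/* Create a slice on top of its lower handler; nothing sits above it yet. */
static void
slice_init(struct slice *s, const char *devname, struct slice_handler *lower)
{
    assert(lower != NULL);          /* a slice always has a handler below */
    s->devname = devname;
    s->lower = lower;
    s->upper = NULL;
}

/* Let a handler claim the slice from above; only one may do so. */
static int
slice_claim(struct slice *s, struct slice_handler *h)
{
    if (s->upper != NULL)
        return -1;                  /* already has a handler above */
    s->upper = h;
    return 0;
}
```

A freshly initialised slice with upper == NULL corresponds to the raw disk
that nothing has claimed: it still has its own device node, it just has no
children.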

If the slice were divided up using fdisk, so that an MBR was installed,
defining some partitions, then the handler above the raw slice would be
the MBR handler. If, however, it were divided up using the "dangerously
dedicated" mode, with a disklabel defining partitions, then the disklabel
handler would be the handler above the slice.

Each partitioning of the slice by the handler produces
more 'slices'. They export the identical interface that the lower slice
does, so that it might be possible to fdisk an fdisk partition,
for example. The only notable difference between two slices at different
layers is their name.

sd0->sd0s1->sd0s1a
if we left out the fdisk stage, it would be:
sd0->sd0a
and if we divided up an fdisk partition, using an MBR, we might see:
sd0->sd0s1->sd0s1s1->sd0s1s1a
(assuming we then disklabeled it)
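
The naming pattern in these examples can be sketched as two small C helpers.
This is only an illustration of the pattern: the function names are invented
here, and the limit of 8 disklabel partitions (a..h) is an assumption, not
something the new code is known to enforce.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Name of the n-th (1-based) slice an MBR handler creates: "sd0" -> "sd0s1". */
static void
mbr_child_name(char *buf, size_t len, const char *parent, int n)
{
    assert(n >= 1);
    snprintf(buf, len, "%ss%d", parent, n);
}

/* Name of partition 0..7 a disklabel handler creates: "sd0s1" -> "sd0s1a". */
static void
label_child_name(char *buf, size_t len, const char *parent, int part)
{
    assert(part >= 0 && part < 8);  /* assumed limit of 8 partitions */
    snprintf(buf, len, "%s%c", parent, 'a' + part);
}
```

Because each helper only appends to whatever name its parent has, stacking
an MBR inside an MBR slice falls out for free: the name just grows by one
more suffix per layer.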

It is up to any handler to define how many slices it multiplexes to above
and below, but the slices themselves cannot multiplex.
Thus what eventually appears is a sandwich of:


slices (sd0s1a) (sd0s1b)
           |      |
handler  (disklabel)
              |
slices      (sd0s1) (sd0s2)    (vn0a)   (vn0b)
               |      |           |       |
handler        ( MBR )           (disklabel)
                  |                   |
slices          (sd0)     (wd0)     (vn0)
                  |         |         |
handler/driver  (sd.c)    (wd.c)    (vn.c)


There would be other layers eventually.
e.g. A layer to do bad-block mapping (cough, I forgot to say I wasn't doing
that yet?) 
A layer to do CCD or RAID type things.

The 'slice' structure is well known, in that all handlers know all the
fields, and can access them. This provides a 'mailbox' (SIC) for handlers
to identify and communicate with each other, without needing too much
knowledge about each other. The handlers supply an array of methods that
they support, either for calls from above or for calls from below.
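
As a rough picture of what such a method array might look like, here is a
C sketch with a single downward method. Everything in it is hypothetical
(struct slice_methods, map_block, the start/size fields); it is not the
interface in the patch, and calls from below would simply occupy further
slots in the same table.

```c
#include <assert.h>
#include <stddef.h>

struct slice;

/* Hypothetical method table a handler supplies for the slices it exports. */
struct slice_methods {
    /* Called from above: map block blkno of slice s to an absolute block
     * on the underlying device, or return -1 if it is out of range. */
    long (*map_block)(struct slice *s, long blkno);
};

struct slice {
    const struct slice_methods *ops;  /* methods of the handler below */
    struct slice *parent;             /* slice that handler sits on, or NULL */
    long start;                       /* offset within parent, in blocks */
    long size;                        /* length in blocks */
};

/* Driver level: nothing below, so the block number is already absolute. */
static long
driver_map_block(struct slice *s, long blkno)
{
    if (blkno < 0 || blkno >= s->size)
        return -1;
    return blkno;
}

/* Partitioning level (MBR or disklabel alike): add our offset, pass down. */
static long
part_map_block(struct slice *s, long blkno)
{
    assert(s->parent != NULL);
    if (blkno < 0 || blkno >= s->size)
        return -1;
    return s->parent->ops->map_block(s->parent, s->start + blkno);
}

static const struct slice_methods driver_ops = { driver_map_block };
static const struct slice_methods part_ops = { part_map_block };
```

The caller only ever goes through s->ops, so it does not need to know
whether the thing underneath is a driver, an MBR handler or a disklabel
handler.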

I am still cleaning up the way that handlers invoke each-other's methods
so be kind.. :) 

Comments are not only welcome, they are sought!

If you repartition a disk, the entries in /dev should
dynamically change. The present version however maintains consistency by
disallowing opens on lower level devices while higher level devices are
open, so you cannot at this time repartition your root disk while
running on it. (I'm not convinced this is a good thing,
but I should support it).
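
The open rule described above could be sketched roughly like this in C.
This is an illustration under assumptions, not the code in the patch: the
counters, slice_open() and slice_close() are invented names, and returning
EBUSY is a guess at a sensible error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct slice {
    struct slice *parent;   /* slice this one was carved from, or NULL */
    int opens;              /* opens of this slice itself */
    int upper_opens;        /* opens of any slice layered above this one */
};

/* Refuse the open if anything layered above this slice is open. */
static int
slice_open(struct slice *s)
{
    struct slice *p;

    if (s->upper_opens > 0)
        return EBUSY;
    s->opens++;
    /* Tell every slice underneath that something above it is now open. */
    for (p = s->parent; p != NULL; p = p->parent)
        p->upper_opens++;
    return 0;
}

/* Undo the bookkeeping done by a successful slice_open(). */
static void
slice_close(struct slice *s)
{
    struct slice *p;

    assert(s->opens > 0);
    s->opens--;
    for (p = s->parent; p != NULL; p = p->parent)
        p->upper_opens--;
}
```

With root mounted from sd0s1a, an open of sd0 (which fdisk would need) is
refused until sd0s1a is closed, which is exactly why the root disk cannot
be repartitioned while running on it.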

julian

I hope I haven't left anything out..

BTW SOS and Luigi.. the patch includes
DEVFS fixes for your device drivers.





