Date:      Wed, 24 Jun 1998 05:13:55 +0800
From:      Peter Wemm <peter@netplex.com.au>
To:        Chuck Robey <chuckr@glue.umd.edu>
Cc:        Poul-Henning Kamp <phk@FreeBSD.ORG>, freebsd-current@FreeBSD.ORG
Subject:   Re: Heads up: block devices to disappear! 
Message-ID:  <199806232113.FAA08470@spinner.netplex.com.au>
In-Reply-To: Your message of "Tue, 23 Jun 1998 09:33:01 -0400." <Pine.BSF.3.96.980623092434.303C-100000@localhost> 

Chuck Robey wrote:
> On Tue, 23 Jun 1998, Poul-Henning Kamp wrote:
> 
> I checked the header, this seemed to be sent to the hackers list.  I am
> moving it to current, cause it seems more reasonable.

Not really, it's a long term design issue, not (yet) an operational issue 
for -current.

> > Unless compelling evidence to the contrary is presented, I will remove
> > blockdevices as a concept from FreeBSD RSN.
> >
> > In the future all devices will be character devices, and mounts will
> > happen using these as well.
> >
> > Adequate compatibility code will be provided.
> 
> 
> Regarding removing the block devices, Poul, how come?  I'm not arguing
> with you, I just don't understand why one of the most basic features of
> all unixes I've ever seen are being removed, and I'd appreciate an
> explanation so that I don't have to be so completely clueless about this
> rather startling change.

Because they are both pretty much the same thing.  All the block drivers 
also have a character interface.

With the bdevsw entry points, the main ones are open, close, strategy and 
ioctl.  You open a device, submit "it'd be nice if you'd do what this buf 
wants" requests via strategy, and then close it.

With the cdevsw entry points, you've got open, close, read, write, ioctl, 
poll, mmap, strategy etc.  Note the common points?  A character device is a 
superset of a block device (at this level anyway).

In kern_conf.c:
/*
 * Since the bdevsw struct for a disk contains all the information
 * needed to create a cdevsw entry, these two routines do that, rather
 * than specifying it by hand.
 */
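To make that comment concrete, here is a minimal sketch in C of deriving a
character-device table from a block-device table.  Everything named toy_* or
fd_* is hypothetical and heavily simplified; the real bdevsw/cdevsw
structures and the routines in kern_conf.c have more fields and different
signatures.  The only point is that open, close, ioctl and strategy carry
over unchanged, and read/write are filled in with generic raw routines.

```c
#include <assert.h>
#include <stddef.h>

struct buf;                               /* opaque I/O request */

typedef int  toy_open_t(int dev);
typedef int  toy_close_t(int dev);
typedef void toy_strategy_t(struct buf *bp);
typedef int  toy_ioctl_t(int dev, unsigned long cmd, void *data);
typedef int  toy_rw_t(int dev, void *data, size_t len);

/* Simplified block-device switch entry: four entry points. */
struct toy_bdevsw {
    toy_open_t     *d_open;
    toy_close_t    *d_close;
    toy_strategy_t *d_strategy;
    toy_ioctl_t    *d_ioctl;
};

/* Simplified character-device switch entry: a superset of the above. */
struct toy_cdevsw {
    toy_open_t     *d_open;
    toy_close_t    *d_close;
    toy_rw_t       *d_read;
    toy_rw_t       *d_write;
    toy_ioctl_t    *d_ioctl;
    toy_strategy_t *d_strategy;
};

/* Stand-ins for generic raw read/write routines that would go via physio. */
static int toy_rawread(int dev, void *data, size_t len)
{
    (void)dev; (void)data; (void)len;
    return 0;
}

static int toy_rawwrite(int dev, void *data, size_t len)
{
    (void)dev; (void)data; (void)len;
    return 0;
}

/* Build the character entry from the block entry instead of by hand. */
static struct toy_cdevsw toy_cdev_from_bdev(const struct toy_bdevsw *b)
{
    struct toy_cdevsw c;

    assert(b != NULL);
    c.d_open     = b->d_open;
    c.d_close    = b->d_close;
    c.d_ioctl    = b->d_ioctl;
    c.d_strategy = b->d_strategy;
    c.d_read     = toy_rawread;
    c.d_write    = toy_rawwrite;
    return c;
}

/* A fake disk driver, only so that there is something to convert. */
static int  fd_open(int dev)  { (void)dev; return 0; }
static int  fd_close(int dev) { (void)dev; return 0; }
static void fd_strategy(struct buf *bp) { (void)bp; }
static int  fd_ioctl(int dev, unsigned long cmd, void *data)
{
    (void)dev; (void)cmd; (void)data;
    return 0;
}

static const struct toy_bdevsw fake_disk_bdevsw = {
    fd_open, fd_close, fd_strategy, fd_ioctl
};
```

Calling toy_cdev_from_bdev(&fake_disk_bdevsw) yields a table whose
d_strategy is the block driver's own strategy routine.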

The main differences are that character device read/write accesses go via
physio which calls the strategy entry points on buffers that are not in 
the buffer cache, while the filesystem block accesses go via bread()/
bwrite() etc which use cached buf's also via the same strategy points.
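The two paths can be modelled in a few lines of user-space C.  This is a toy,
not kernel code: toy_raw_read stands in for the physio path (a private buf
that never enters the cache), toy_bread stands in for the bread() path (a
buf kept in a small cache), and both end up in the same toy_strategy routine.
All names, sizes and the in-memory "disk" are assumptions made for the
illustration.

```c
#include <assert.h>
#include <string.h>

#define TOY_BLKSZ  16
#define TOY_NBLK   8
#define TOY_NCACHE 4

struct toy_buf {
    int  blkno;
    int  valid;                 /* nonzero once data has been read in */
    char data[TOY_BLKSZ];
};

static char toy_disk[TOY_NBLK][TOY_BLKSZ];  /* pretend disk contents */
static int  toy_strategy_calls;             /* counts trips to the "driver" */

/* The single strategy entry point: fill the buf from the pretend disk. */
static void toy_strategy(struct toy_buf *bp)
{
    assert(bp->blkno >= 0 && bp->blkno < TOY_NBLK);
    memcpy(bp->data, toy_disk[bp->blkno], TOY_BLKSZ);
    bp->valid = 1;
    toy_strategy_calls++;
}

/* Raw path: a throwaway buf, so every read goes to the driver. */
static void toy_raw_read(int blkno, char *out)
{
    struct toy_buf b;

    b.blkno = blkno;
    b.valid = 0;
    toy_strategy(&b);
    memcpy(out, b.data, TOY_BLKSZ);
}

/* Cached path: look in the cache first, call strategy only on a miss. */
static struct toy_buf toy_cache[TOY_NCACHE];

static struct toy_buf *toy_bread(int blkno)
{
    struct toy_buf *bp = &toy_cache[blkno % TOY_NCACHE];

    if (!bp->valid || bp->blkno != blkno) {
        bp->blkno = blkno;
        bp->valid = 0;
        toy_strategy(bp);
    }
    return bp;
}
```

Reading the same block twice through toy_raw_read calls the strategy routine
twice; reading it twice through toy_bread calls it once.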

So, if Poul-Henning is talking about what I think he is, the split devices 
will become one and the same thing, and when it's accessed via block/
buffer cache methods, it'll behave like a block device.  When accessed via 
read/write it'll behave like a character dev the same as before.  However, 
if you try and mount() /dev/tty or something else without the strategy 
methods then it can't be done.
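A hedged sketch of that last check: with a single device table, "can this
be mounted?" reduces to "does the driver supply a strategy routine?".  The
struct, the function name and the choice of ENODEV as the error are all
assumptions made for illustration, not what any kernel actually does.

```c
#include <errno.h>
#include <stddef.h>

struct buf;                                /* opaque I/O request */

struct toy_dev {
    const char *name;
    void      (*d_strategy)(struct buf *); /* NULL if the driver has none */
};

static void toy_disk_strategy(struct buf *bp) { (void)bp; }

static const struct toy_dev toy_disk = { "wd0s1a", toy_disk_strategy };
static const struct toy_dev toy_tty  = { "tty",    NULL };

/*
 * Return 0 if the device could back a filesystem, otherwise an
 * errno-style code (ENODEV is an arbitrary pick for this sketch).
 */
static int toy_mount_check(const struct toy_dev *d)
{
    if (d == NULL || d->d_strategy == NULL)
        return ENODEV;
    return 0;
}
```

Here toy_mount_check(&toy_disk) succeeds and toy_mount_check(&toy_tty) is
refused.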

With the present system, specfs looks at the device type.  If the vnode for
the /dev node is a character device, specfs calls the device's read/write
entry points, leading to physio on disk type devices using non-buffer-cache
bufs being passed to the strategy point.  If the vnode is a block device,
it does buffer cache IO directly with the device's strategy point.  I 
think phk is suggesting reducing that to the character case so that all 
user-mode access to the /dev "disk" nodes goes via physio, while kernel 
filesystem IO would go via bread/strategy directly (as it should).  
Incidentally, I have never once in my entire unix background used the 
"block" devices except by mistake when I should have used character mode.  
I've been caught cursing doing a 1.44MB dd write to a bad floppy and spent 
the next umpteen eons waiting for the buffer cache to give up trying to 
write out the write cached blocks.
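The specfs decision described above, and the change I think is being
proposed, can be written down as two tiny functions.  This is only a model of
the dispatch for user-mode access to a /dev disk node; the enum and function
names are made up for the sketch and are not specfs identifiers.

```c
enum toy_vtype { TOY_VCHR, TOY_VBLK };
enum toy_path  { TOY_VIA_PHYSIO, TOY_VIA_BUFFER_CACHE };

/* Present behaviour: the vnode type of the /dev node picks the path. */
static enum toy_path toy_spec_path_today(enum toy_vtype t)
{
    return (t == TOY_VBLK) ? TOY_VIA_BUFFER_CACHE : TOY_VIA_PHYSIO;
}

/*
 * Suggested behaviour as I read it: user-mode access always takes the
 * character-device route, whatever the node type.  Kernel filesystem
 * I/O is not modelled here; it would keep using bread/strategy.
 */
static enum toy_path toy_spec_path_proposed(enum toy_vtype t)
{
    (void)t;
    return TOY_VIA_PHYSIO;
}
```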

Of course, phk might be talking about promoting SLICE style access beyond 
the DEVFS case at the same time.  Now that'd really be something. :-)

Also, thinking about this, I still await the day when the dev_t interface
can go.  Nearly all access to disks and devices starts out at the vnode
layer and has to get translated into dev_t accesses.  This is kinda silly
when you think about it since one major cache in the system (the VM area)
has all of its cache indexed via objects and associated vnodes.  The other
area (the buffer cache) is indexed by blocks and buffers.  John's unified
VM/buffer cache work involves sharing the data pages between the two
different systems to maintain coherency.  The effort required to get this
all to fit together and work is extreme.  Take the example of writing out a
dirty mmap'ed page.  The page starts out in a vm object with an associated
vnode.  In the process of writing it out, this object gets translated into
a devsw write via a buffer header being allocated and attached to the page,
then the buffer goes through the devsw system.  So at this point, we have
the page "appearing" in both the VM/object/vnode cache system with a second
reference appearance in the buffer cache.  Now, if somebody does an open()
and read() of the mmap'ed file, the vnode access goes through the
filesystem, is converted to a devsw/buffer cache lookup and it's "found"
over there and copied back to the user.  It'd make more sense to access
everything via the vnode in the first place, doing away with the specfs
layer entirely and modifying the driver interface accordingly.  The buffer 
cache functionality would need a new form as the VM object system doesn't 
deal with sub-page entities real well for small-block filesystems, 
metadata etc.

Oh the things that could be done with a 200 hour week. :-)

> I'd also appreciate letting me know about the compatibility stuff you
> mentioned ... does that mean I don't have to make any changes to my
> system in order to survive this change?  Else, what do I have to do?

If we're talking about unifying the bdev/cdev interface, there would need
to be a device major number translator or something so that /dev/wd0s1a and
/dev/rwd0s1a which have different block and character major device numbers
would work seamlessly. I'd assume that MAKEDEV would probably be making the
two nodes hard links to each other under the 'c' device mode and using the
cdev majors so this will not be a long-term problem.
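As a rough illustration of what such a translator might look like, here is a
table lookup from block major to character major.  The function name, the
table layout and the "no such device" value are hypothetical, and the major
numbers in the table are placeholders for the example rather than values
checked against a real devsw table.

```c
#include <stddef.h>

#define TOY_NOMAJOR (-1)

struct toy_major_map {
    int bmajor;     /* old block major */
    int cmajor;     /* character major that replaces it */
};

/* Placeholder pairs; a real table would be generated from the drivers. */
static const struct toy_major_map toy_map[] = {
    { 0, 3 },
    { 4, 13 },
};

/* Translate a block major to its character major, or TOY_NOMAJOR. */
static int toy_bmajor_to_cmajor(int bmajor)
{
    size_t i;

    for (i = 0; i < sizeof(toy_map) / sizeof(toy_map[0]); i++) {
        if (toy_map[i].bmajor == bmajor)
            return toy_map[i].cmajor;
    }
    return TOY_NOMAJOR;
}
```

An old block node created with major 0 would then be treated as the
character device with major 3, and an unknown block major is rejected.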

Cheers,
-Peter
--
Peter Wemm <peter@netplex.com.au>   Netplex Consulting


