Date:      Thu, 14 Oct 1999 12:43:40 +0200
From:      Poul-Henning Kamp <phk@critter.freebsd.dk>
To:        freebsd-arch@freebsd.org
Subject:   Re: The eventual fate of BLOCK devices. 
Message-ID:  <447.939897820@critter.freebsd.dk>
In-Reply-To: Your message of "Thu, 14 Oct 1999 13:15:25 +1000." <Pine.BSF.4.10.9910141222290.32868-100000@alphplex.bde.org> 

[I know it was Julian who threw this ball in the air, but I take
the liberty of doing the final round: I have been the primus
motor on this issue from the beginning and it is part of the
dev_t cleanup project.]

SUMMARY:
--------

So far we have identified the following three classes of software
that access disk-like devices through cdevs and bdevs:

  1) Database software

  2) Filesystem maintenance tools

  3) savecore(8)

Database software prefers cdev semantics if at all possible.  When
running on anything but a cdev, database software calls fsync(2) a
lot to make sure its writes have hit the media.
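A minimal sketch of that write path, assuming only POSIX pwrite(2)
and fsync(2): every write is followed by fsync(2), so the call does
not return until the kernel reports the data as being on stable
storage.  The helper name durable_pwrite() is hypothetical and not
taken from any particular database.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes at offset off, then fsync(2)
 * so the buffered data is pushed out to the media before returning.
 * This is the pattern a database falls back to when it is not
 * running on an unbuffered cdev.
 * Returns 0 on success, -1 on error (errno set).
 */
int durable_pwrite(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
        off += n;
    }
    /* Do not report success until the data has left the cache. */
    return fsync(fd);
}
```

On an unbuffered cdev the write itself goes to the device without
passing through the buffer cache, which is why database software
prefers it: the extra fsync(2) per write is the price of buffering.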

Terry argues for retaining the bdev semantics rather than the cdev
semantics, but I think we can dismiss that idea based on the above
observation: it would penalize software which knows better.  Retaining
the bdev would in essence be repeating the mistake Linux made, and
which they are now unmaking.

The filesystem maintenance applications mentioned so far that rely
on bdev semantics, the EXT2FS tools, can be trivially converted to
operate on cdev semantics.  The majority of such tools already
operate correctly on cdevs.
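Much of such a conversion is about honouring the usual raw-device
restriction that transfers cover whole, sector-aligned blocks.  A
minimal sketch, assuming a fixed sector size supplied by the caller
(512 bytes in the test) and a device that rejects unaligned requests:
widen the request to sector boundaries, read into a bounce buffer,
and copy out the bytes that were asked for.  The helper sector_pread()
is hypothetical, not code from the EXT2FS tools.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read len bytes at byte offset off from a
 * device assumed to accept only whole, aligned sectors of secsize
 * bytes.  Returns 0 on success, -1 on error or short device.
 */
int sector_pread(int fd, void *buf, size_t len, off_t off, size_t secsize)
{
    off_t start = off - off % (off_t)secsize;          /* round down */
    off_t end = off + (off_t)len;
    if (end % (off_t)secsize != 0)
        end += (off_t)secsize - end % (off_t)secsize;  /* round up */

    size_t span = (size_t)(end - start);
    unsigned char *bounce = malloc(span);
    if (bounce == NULL)
        return -1;

    size_t got = 0;
    while (got < span) {
        ssize_t n = pread(fd, bounce + got, span - got, start + (off_t)got);
        if (n < 0 && errno == EINTR)
            continue;                  /* interrupted: retry */
        if (n <= 0) {                  /* error or unexpected end */
            free(bounce);
            return -1;
        }
        got += (size_t)n;
    }
    /* Copy only the bytes the caller asked for. */
    memcpy(buf, bounce + (off - start), len);
    free(bounce);
    return 0;
}
```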

Savecore(8) has already been converted to operate on cdevs.

Using mmap(2) to provide a new type of buffered semantics for
disk-like devices is interesting, but its applicability will be
limited by the virtual address space of a process: you can't map
a 20GB database into a 32-bit address space, so many mmap(2)
calls will be needed for data of any serious size.  The need for,
and actual use of, such a facility seems uncertain.
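To make the address-space limit concrete, a minimal sketch of what
"many mmap(2) calls" means in practice: the object is walked through
fixed-size windows, each mapped, consumed, and unmapped in turn.  The
function sum_windowed() and its byte-summing workload are purely
illustrative.  The window must be a multiple of the page size because
mmap(2) offsets must be page aligned.  With a 256 MiB window, for
example, covering a 20 GiB database takes 20480 / 256 = 80 mappings.

```c
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical example: sum every byte in [0, size) of the object
 * behind fd, mapping at most `window` bytes at a time.  `window`
 * must be a multiple of the page size.  On success stores the sum
 * and the number of mmap(2) calls made; returns 0, or -1 on error.
 */
int sum_windowed(int fd, off_t size, size_t window,
                 uint64_t *out, unsigned *maps)
{
    uint64_t sum = 0;
    unsigned calls = 0;

    for (off_t off = 0; off < size; off += (off_t)window) {
        size_t len = window;
        if ((off_t)len > size - off)
            len = (size_t)(size - off);   /* last, partial window */

        const unsigned char *p =
            mmap(NULL, len, PROT_READ, MAP_SHARED, fd, off);
        if (p == MAP_FAILED)
            return -1;
        calls++;

        for (size_t i = 0; i < len; i++)
            sum += p[i];

        /* Release this window before mapping the next one. */
        munmap((void *)p, len);
    }
    *out = sum;
    *maps = calls;
    return 0;
}
```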

There is general disagreement about how much code we will save, but
nobody disputes that we will be able to remove some amount of
complexity from the kernel.  Most people seem to overlook the
needlessly replicated code in a number of xxx(8) tools that exists
only to DTRT with /dev/foo vs. /dev/rfoo.
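For illustration, the kind of name juggling meant here is turning a
block-device path such as /dev/da0s1a into its raw twin /dev/rda0s1a
(or back again).  The helper to_raw_name() is a hypothetical sketch of
one direction only, not code copied from any particular xxx(8) tool;
it simply inserts an "r" after the last slash.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: map "/dev/foo" to "/dev/rfoo" by inserting
 * 'r' after the last '/'.  Returns 0 on success, -1 if the name has
 * no '/' or the result does not fit in `out`.
 */
int to_raw_name(const char *blk, char *out, size_t outlen)
{
    const char *slash = strrchr(blk, '/');
    if (slash == NULL)
        return -1;

    size_t dirlen = (size_t)(slash - blk) + 1;   /* includes the '/' */
    int n = snprintf(out, outlen, "%.*sr%s", (int)dirlen, blk, slash + 1);
    if (n < 0 || (size_t)n >= outlen)
        return -1;                               /* truncated */
    return 0;
}
```

Once bdevs are gone there is only one node per device, and helpers
like this have nothing left to do.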

Implementing an ioctl(2) to switch a disk-like device into bdev
mode is relatively trivial, but there currently seems to be no
point in doing so.

There is a significant majority supporting the removal of bdev
semantics.

CONCLUSION:
-----------

Unless we have significant new information to the contrary, I will
commence the bdev removal after November 1st 1999.

In order to try to surface any such information, I will change
the default value of the vfs.bdev_buffered sysctl to zero this
weekend; this will make bdevs behave like cdevs.

An ioctl(2) based mode-switch will only be implemented if a
very good reason for doing so materializes.

Thanks for participating.

Poul-Henning

--
Poul-Henning Kamp             FreeBSD coreteam member
phk@FreeBSD.ORG               "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!