Date:      Wed, 18 Jun 2003 13:53:29 +0200
From:      "Poul-Henning Kamp" <phk@phk.freebsd.dk>
To:        Dmitry Sivachenko <demon@FreeBSD.org>
Cc:        arch@FreeBSD.org
Subject:   Re: cvs commit: src/sys/fs/nullfs null.h null_subr.c null_vnops.c 
Message-ID:  <39081.1055937209@critter.freebsd.dk>
In-Reply-To: Your message of "Wed, 18 Jun 2003 15:22:26 +0400." <20030618112226.GA42606@fling-wing.demos.su> 

In message <20030618112226.GA42606@fling-wing.demos.su>, Dmitry Sivachenko writes:

[I've moved this to arch@]

>> The main problems with nullfs seem to be locking and trying to create clones
>> of the lower vnode (wrt. the VM system and special files). Once kern/51583
>
>BTW, what is the reason for creating these clone vnodes?
>Why we can't simply return the original vnode?

This is a question of the same caliber as a kid asking mom where
babies come from :-)

Back in history, when vnodes first appeared as part of stacking
filesystems, there was no merged vm/buffer cache.

There were also some suboptimal design "decisions" in the VFS
implementation, made to expedite things but introducing issues
which "could be cleaned up later".

NFS added a few interesting wrinkles to the vnode area, mostly because
it does not follow the model implicitly assumed in the VFS layering.
The buffer cache expects a disk device behind every buffer; that took
some hacking too.

Then we got a semi-merged vm/buffer cache.  Semi, because it was never
finished, so it became some sort of hybrid almost but not quite entirely
unlike either state.  A few filesystems got VOP_GETPAGES; none of them
got VOP_PUTPAGES, as far as I recall.

Then we got softupdates and snapshots, which due to shortcomings in
the vm/buf area could not be implemented in the architecturally
obvious way, but instead had to put fingers into specfs and the 
buffer cache to get the job done.

All of this has tangled the simple component formerly known as the
buffer cache up in so many ways that it is very hard for anybody
to make heads or tails of it any more.

So I am tempted to answer your question with:  "Because it is all a
mess"

A number of us heavy-duty people have started to say rude things
and make menacing gestures with our flow-diagram templates in the
general direction of the buffer cache, but any real solution is
unlikely to happen until we are talking 6-current.

The cleanup would probably be easier to perform if we could first
ditch the stuff and layers which have been glued on and reduce the
code to its core functionality, and this may indeed be what we
have to do; but considering the list of stuff we are talking
about, it is unlikely to be a politically feasible path to take:

	vinum -- abuses getebuf(), should be a GEOM class (see the sketch after this list).
	raidframe -- abuses getebuf(), should be a GEOM class.
	cluster code -- must be rewritten
	snapshots -- must be untangled from the bio path.
	softupdates -- ditto.
	unionfs -- does not correctly layer VOP_STRATEGY
	nullfs -- maybe same problem.
	swap_pager -- abuses bogus vnode
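
For the getebuf() abusers, "should be a GEOM class" means plugging
into the I/O path the sanctioned way.  Very roughly, and with every
"example" name made up on the spot (and version details elided),
the skeleton looks like this:

	#include <sys/param.h>
	#include <sys/kernel.h>
	#include <sys/module.h>
	#include <geom/geom.h>

	/*
	 * Sketch of a minimal GEOM class.  GEOM calls the taste method
	 * for every provider; a real class would create a geom here
	 * (g_new_geomf()), attach a consumer and set the geom's start
	 * method, which then receives I/O as struct bio instead of
	 * grubbing around in the buffer cache with getebuf().
	 */
	static struct g_geom *
	g_example_taste(struct g_class *mp, struct g_provider *pp, int flags)
	{

		return (NULL);	/* this sketch declines every provider */
	}

	static struct g_class g_example_class = {
		.name = "EXAMPLE",
		.taste = g_example_taste,
	};

	DECLARE_GEOM_CLASS(g_example_class, g_example);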

I am hoping that we may be able to carve a path by changing the
bio structure to operate on vm pages rather than on KVM-mapped
byte arrays (most disk device drivers don't care whether things are
mapped; they use bus-master DMA and only need the physical location).
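
Something along these lines; none of the fields below exist today,
the names are pure invention:

	#include <sys/types.h>
	#include <vm/vm.h>

	/*
	 * Sketch only: a bio that describes the pages themselves.  A
	 * bus-master DMA driver can program the physical addresses
	 * straight from the page array and never needs a KVM mapping;
	 * only the few drivers which really touch the bytes would map.
	 */
	struct bio_sketch {
		off_t		bio_offset;	/* byte offset on the device */
		long		bio_length;	/* bytes to transfer */
		vm_page_t	*bio_pages;	/* pages backing the transfer */
		int		bio_npages;	/* entries in bio_pages */
		void		(*bio_done)(struct bio_sketch *); /* completion */
	};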

Next, giving buffers a set of object methods could perhaps avoid
the detour around VOP_BMAP and VOP_STRATEGY, and thereby make it
possible for softupdates and snapshots to be implemented
entirely inside UFS/FFS.
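
By "object methods" I mean roughly a small operations vector hung
off each buffer, so the filesystem which owns the buffer decides
what it means to write it.  Again invented names, just to show the
shape:

	struct buf;		/* the existing buffer structure */

	/*
	 * Sketch only: per-buffer methods supplied by the owning
	 * filesystem.  UFS/FFS could keep the softupdates dependency
	 * tracking and the snapshot copy-on-write behind bm_write(),
	 * instead of poking fingers into specfs and the buffer cache.
	 */
	struct buf_methods {
		int	(*bm_write)(struct buf *bp);	/* to stable storage */
		int	(*bm_strategy)(struct buf *bp);	/* start the I/O */
		void	(*bm_done)(struct buf *bp);	/* completion processing */
	};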

I have a couple of other ideas I want to explore as well, one of
them being not doing I/O via VCHR vnodes, but either at the fdesc
level (when it comes from userland) or via a dedicated API (for
disk I/O from buf/vm).
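
The dedicated API could be as dumb as a pair of entry points like
these (entirely hypothetical, just to show the idea):

	struct bio;
	struct disk;

	/*
	 * Hypothetical: buf/vm hands requests straight to the disk
	 * layer and no VCHR vnode ever enters the picture.
	 */
	int	disk_io_start(struct disk *dp, struct bio *bp);	/* queue request */
	int	disk_io_wait(struct bio *bp);	/* sleep until it completes */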

But I have only just started seriously investigating how all this
can be done, and as I said, it is a royal mess, so it will take
time no matter what I and others find.

With that said, I will also add that I will take an incredibly
dim view of anybody who tries to add more gunk in this area, and
that I am perfectly willing to derail unionfs and nullfs (or pretty
much anything else on the list above) if that is what it takes to
clean up the buffer cache.  Any of those facilities can be
reintroduced later on in a cleaner fashion.

I agree that nullfs and unionfs are useful technologies, but if
they have to be reimplemented to fit our kernel, then so be it.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


