Date:      Fri, 8 Jan 1999 03:18:48 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        dillon@apollo.backplane.com (Matthew Dillon)
Cc:        tlambert@primenet.com, pfgiffun@bachue.usc.unal.edu.co, freebsd-hackers@FreeBSD.ORG
Subject:   Re: questions/problems with vm_fault() in Stable
Message-ID:  <199901080318.UAA06106@usr01.primenet.com>
In-Reply-To: <199901080253.SAA36703@apollo.backplane.com> from "Matthew Dillon" at Jan 7, 99 06:53:36 pm

> :> :There's *no* memory waste, if you don't instance incoherent copies
> :> :of pages in the first place.
> :> 
> :>     You are ignoring the point.  I mmap() a file.  I mmap() the
> :>     block device underlying the filesystem the file resides on.
> :>     I start accessing pages.  Poof... no coherency, plus lots of
> :>     wasted memory.
> :
> :No.  You mmap the file.  This instances a vnode, and a VM object using
> :the vnode as a swap store.
> 
>     What do you mean, no?  I just gave you an example of something someone
>     might want to do.
> 
>     Now you tell me how to ensure that it's coherent.  Disallowing the
>     case is not an option.  You can't just ignore cases that don't fall
>     under your model!

This particular case, the one where you treat a device node as if it
were something other than a vnode (and therefore an FS object, and
therefore subject to VOP_GETPAGES/VOP_PUTPAGES semantic override),
and yet somehow exempt, is bogus.

You can't ensure that accesses to the underlying device don't
break the rules of the FS which is mounted on this device.

In general, this is my objection to Julian's "slice" architecture,
which requires some user space program to do partitioning instead of
an abstract ioctl or fcntl interface to a partition agent.  The
kernel already knows about the frigging partition table (or disklabel
or DOS extended partition table or SVR4 VTOC or *whatever*), and
we insist on not using this fact, and instead we insist that we have
to teach one program for each of these damn things how to do raw
disk access, and further we have to implement some ungodly lock
and notify coherency mechanism so we can engage in this stupidity
in the first place.

I would say that the FS that implements device file semantics is
responsible for dealing with this issue by forcing direct accesses
up through the FS on which it's mounted.

In the "slice" code case, this would be tantamount to not allowing a
partition to be deleted if someone is using it.

For an FS mounted on a raw device simultaneous to raw device access,
I'd argue that it's the responsibility of the device FS to respect
opportunity locks as if they were non-advisory.


If an FS is on a physical device, it owns the damn thing.  If you
want to distribute this among 50 machines, then you implement an
FS that is a variable granularity block store, and has no
semantics other than a distributed cache coherency mechanism, and
you mount your FS *on top of it*, and let *it* worry about the
problem.



> :In each case, however, the requests to obtain the page for reading
> :and writing go down to the cnode object off of which it is hung.
> :...
> :easy to understand.  And in the last case, it doesn't really make
> :sense to expose the FS hierarchy underlying the RESOURCE FS
> :layer, because it exposes you to non-cache based namespace coherency
> :problems (e.g., how do you handle someone CD'ing into a directory
> 
>     That is the whole point of having a cache coherency protocol.
> 
>     Your design is strictly top-down.  You are specifically disallowing the
>     case where someone may tap into a node underneath your 'top' node.  
>     You are saying, basically, that it is illegal to do so because your
>     mechanism can't deal with the cache coherency problems associated with
>     that sort of access.
> 
>     There's just one problem:  I've given you half a dozen cases where
>     someone might want to tap into an intermediate node. 

No, I'm saying a VM cache coherency mechanism isn't going to do snot-all
to ensure that any semantic coherency of any kind other than VM cache
coherency is maintained, and that that's not good enough to allow
non-semantic access to the underlying substrata.

If I have a semantic layer, it doesn't *matter* if I access the
layers underneath it.  All I'm doing is ignoring the semantic
constraints.

This is a very different thing than ignoring the implied contract
between a non-semantic layer and the layer underneath it to not
change data in unexpected ways.

For a semantic layer that imposed quotas, access to anything other
than the quota file would not screw things up (so long as the upper
layer asked for Andrew-style notifications and the lower layer
agreed to supply them).

For the quota file itself, there needs to be the concept of an
exclusivity lock for non-semantic regions/data.  Right now, we
have a mechanism for doing this on a per-file basis.  I'd have
no real objection to requiring support for non-advisory range
locking by VFS consumers of VFS providers (be the consumer the
system call layer or a QUOTAFS intent on not having its quota
file screwed with out from under it, or an ACLFS intent on ensuring
that the ACL semantics are obeyed, with no exceptions, by some schmuck
with a promiscuous view onto the underlying directory space).


> :thats a file, and removing the "filedata" file containing the file
> :data fork, without also removing the associated resource fork or
> :extended attribute?  ... You can't, and the VM cache coherency
> :protocol you propose won't handle this non-VM coherency problem,
> :either).
> 
>     Sure it will.  You are thinking top-down again.  The whole point of
>     the cache coherency protocol is that it is a two-way protocol... it
>     propagates out from the access point and the access point does NOT
>     have to be starting at the top.   The vm_object extension ( vm_alias's
>     or something similar ) is simply an optimization, but one that allows
>     the VM system to do its job almost trivially.   Now I suppose you
>     could explicitly design the ACL and QUOTA layers to ignore accesses 
>     made out from underneath them, but that seems silly to me... I can 
>     think of several situations where the QUOTA layer would definitely want
>     to be updated if someone else makes changes to files that it maps 
>     without going through it directly.

My answer to that is locking and Andrew-style notifications.

My personal preference for this would be to expose the underlying
FS layer only through a notification layer -- disallow promiscuous
exposure of the FS underlying the notification layer, and allow
mounting on top of either a non-exposed FS OR a notification layer
that may or may not itself be exposed.


>     If you would go back and fraggin READ the original proposal, you
>     will note that this case is explicitly covered.
> 
>     Terry, READ THE PROPOSAL.  You aren't reading it.  I have repeated
>     the solution to this case three times so far, I'm not going to do it
>     again.

Give me a URL.  It must have occurred in private mail, not on -hackers.


[ ... ]

>     You then replaced it with an extremely specialized design of
>     your own that covers ONLY those specific cases and nothing
>     else, has all sorts of restrictions, and cannot maintain 
>     cache coherency across a network ( you are missing the point big time
>     if you think that the NFS client-server model is what someone would 
>     want to use in a network cluster! ).


(1)	Not *my* proposal.

(2)	MNFS manages distributed cache coherency across a network
	within the context of the existing framework.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message


