Date:      Tue, 27 Oct 1998 22:15:53 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        otok@students.itb.ac.id (Otok Berliawan)
Cc:        tlambert@primenet.com, lha@e.kth.se, michaelh@cet.co.jp, freebsd-fs@FreeBSD.ORG, kom-arla@stacken.kth.se
Subject:   Re: deadfs in FreeBSD 3.0/current ?
Message-ID:  <199810272215.PAA08872@usr04.primenet.com>
In-Reply-To: <Pine.BSF.4.02A.9810270925280.28066-100000@students.itb.ac.id> from "Otok Berliawan" at Oct 27, 98 09:27:27 am

> I'm the new comer in FreeBSD enviroment,I have been confusing with this
> topic, can u explain me, what for the deadfs file????

The "deadfs" VFS layer file system implementation is a file system
that can be referenced by revoked vnodes without each FS needing
to do revoked node management.

It's a rather unsatisfying kludge to get around the fact that
some file systems would like to manage their own vnode pool,
but the vnode pool has been declared, by fiat, to be a system
wide resource.


Here are the gory details...


The VFS implementation abstracts two interfaces:

	VFSOPS		Operations on VFS objects, which are the
			exposed interfaces to instances of actual
			file systems

	VNOPS		Operations on vnode objects, which are
			the exposed interfaces to instances of
			actual files on a single file system
			instance

It's important to understand what a "file" is in order to understand
the architecture, and thus the need for a deadfs.

For right now, we're going to ignore the mount, unmount, and other
per-fs operations, so we can concentrate on the VNOPS and totally
ignore the VFSOPS.


At the top level, we have a "file".  In user space, this is an
"fd" -- a "file descriptor object".  This object is an index.

Starting with the process, we see that struct proc contains a
pointer called p_fd:

STRUCTURE:	struct proc
DEFINED:	/usr/include/sys/proc.h
MEMBERS:	p_fd
DEFINITION:	The p_fd member is a pointer to a struct filedesc,
		which is the list of files open by the process.

From there, we can see what a filedesc is:

STRUCTURE:	struct filedesc
DEFINED:	/usr/include/sys/filedesc.h
MEMBERS:	fd_ofiles
DEFINITION:	The fd_ofiles member is a pointer to a list of struct
		file pointers that represent the list of files open
		in a process.

So to get to the actual struct file for any user space descriptor,
we dereference the current process (called "curproc" here, for
convenience), get the p_fd member, get the list of open struct file
pointers, and index it by the descriptor number:

	curproc->p_fd->fd_ofiles[ fd];

This gives us a pointer to the struct file associated with the open
file.  We are interested in three fields:

STRUCTURE:	struct file
DEFINED:	/usr/include/sys/file.h
MEMBERS:	f_type
		f_ops
		f_data
DESCRIPTION:	The f_type field specifies the type of the file
		descriptor (in the kernel, we call a "struct file"
		a descriptor; this is somewhat confusing if you
		are used to user space calling an fd a descriptor).
		It can be one of:

		DTYPE_VNODE	A vnode, which is to say, the f_data
				member points to a vnode and the
				f_ops member points to the struct
				fileops vnops, which is defined in
				/sys/kern/vfs_vnops.c
		DTYPE_SOCKET	A socket, which is to say the f_data
				member points to a struct socket and
				the f_ops member points to the struct
				fileops socketops, which is defined
				in /sys/kern/sys_socket.c
		DTYPE_PIPE	A pipe, which is to say, the f_data
				member points to a struct pipe and
				the f_ops member points to the struct
				fileops pipeops, which is defined in
				/sys/kern/sys_pipe.c
		DTYPE_FIFO	A FIFO, which is to say, the f_data
				member points to a struct socket and
				the f_ops member points to the struct
				fileops socketops, which is defined 
				in /sys/kern/sys_socket.c
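
As a rough illustration of the lookup and dispatch described above,
here is a user-space sketch.  The structures are cut-down stand-ins,
not the real kernel definitions: the single-entry struct fileops, the
two stub read functions, the *_sketch vectors, and the use of fd_nfiles
as a bounds check are all assumptions made for the example.

```c
#include <stddef.h>

/* Simplified stand-ins for illustration; NOT the real kernel definitions. */
enum { DTYPE_VNODE = 1, DTYPE_SOCKET = 2 };	/* values are illustrative */

struct file;

/* Trimmed-down fileops: one entry point, argument list reduced to fp. */
struct fileops {
	int	(*fo_read)(struct file *fp);
};

struct file {
	short		 f_type;	/* DTYPE_* */
	struct fileops	*f_ops;		/* per-type entry points */
	void		*f_data;	/* vnode, socket, pipe, ... */
};

struct filedesc {
	struct file	**fd_ofiles;	/* open file table */
	int		  fd_nfiles;	/* table size (assumed bound) */
};

struct proc {
	struct filedesc	*p_fd;
};

/* Hypothetical stubs standing in for the vnode and socket read paths. */
int vn_read_stub(struct file *fp) { (void)fp; return (100); }
int soo_read_stub(struct file *fp) { (void)fp; return (200); }

struct fileops vnops_sketch = { vn_read_stub };
struct fileops socketops_sketch = { soo_read_stub };

/* Map a user-space descriptor number to its struct file, or NULL. */
struct file *
fd_to_file(struct proc *p, int fd)
{
	struct filedesc *fdp = p->p_fd;

	if (fd < 0 || fd >= fdp->fd_nfiles)
		return (NULL);
	return (fdp->fd_ofiles[fd]);
}

/* Dispatch through f_ops without caring what f_data points at. */
int
file_read(struct file *fp)
{
	return (fp->f_ops->fo_read(fp));
}
```

The point of the sketch is that the caller of file_read() never looks
at f_type; the type only decides which struct fileops was installed
when the file was opened.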

We should note at this point that the existence of the struct fileops
is a kludge.  It exists because the VFS layer was not completely and
correctly integrated in the 4.4BSD-Lite/4.4BSD-Lite2 code.  Sockets
and pipes are represented by their own structures, the struct socket
(defined in /usr/include/sys/socketvar.h) and the struct pipe (defined
in /usr/include/sys/pipe.h), rather than by vnodes.  Combined with the
lack of accessor functions in struct fileops, which defines only
read/write/ioctl/poll entry points (a subset of the system call
interface), this makes it impossible to do things like file locking
on sockets or credential management on pipes.


Now the interesting part here is the DTYPE_VNODE value for f_type,
in which case the struct fileops entries map to VOP_ calls...
the VNOPS in which we are interested, and which operate
against the vnode itself.

Note: There are many system calls that are not covered by the struct
fileops functions -- the read/write/ioctl/poll entry points, noted
above.  Such calls *directly* test to see if f_type is DTYPE_VNODE,
and if it is not, fail the call immediately.  Otherwise they make
appropriate VOP_ calls to implement the action requested by the user
process.  See /sys/kern/vfs_syscalls.c for more details.
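
The f_type test described in the note above might look roughly like
this.  It is a sketch only: the helper name file_to_vnode(), the
placeholder struct vnode, and the choice of EBADF/EINVAL as error
codes are assumptions for illustration, not a copy of the code in
vfs_syscalls.c.

```c
#include <errno.h>
#include <stddef.h>

enum { DTYPE_VNODE = 1, DTYPE_SOCKET = 2 };	/* illustrative values */

struct vnode {
	int	v_placeholder;		/* contents irrelevant here */
};

struct file {
	short	 f_type;		/* DTYPE_* */
	void	*f_data;		/* meaning depends on f_type */
};

/*
 * Hand back the vnode behind a struct file, or fail immediately when
 * the file is not vnode-backed.  A vnode-only system call would do
 * this before making any VOP_ call.
 */
int
file_to_vnode(struct file *fp, struct vnode **vpp)
{
	if (fp == NULL)
		return (EBADF);		/* assumed error code */
	if (fp->f_type != DTYPE_VNODE)
		return (EINVAL);	/* assumed error code */
	*vpp = (struct vnode *)fp->f_data;
	return (0);
}
```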

Note: It may seem like the struct file is the only VFS/vnode consumer;
it isn't.  Other consumers are various kernel interfaces, execution
class loaders for loading FreeBSD native and emulated binary types,
and other direct VFS consumers, like the NFS server software.  It
only seems that the struct file is the only consumer because of poor
architectural abstraction (Some Pigs Are More Equal Than Others).


So now on to answering the original question... why is there a deadfs?

If we look at the vnode structure, we see a pointer to the VOP_
calls in the member v_op:

STRUCTURE:	struct vnode
DEFINED:	/usr/include/sys/vnode.h
MEMBERS:	v_op
DEFINITION:	The v_op field is the vnode operations vector; it
		is a pointer to an array that contains the
		addresses of functions that return int and take an
		opaque void * descriptor argument -- in other
		words, a pointer to an array of vop_t pointers.
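
To make the shape of the operations vector concrete, here is a small
sketch of calling through v_op.  The slot numbers, the myfs_* functions,
the argument structure, and the vcall() helper are all hypothetical;
the real kernel locates each slot through operation descriptors rather
than a fixed enum.

```c
#include <stddef.h>

typedef int vop_t(void *);	/* takes opaque args, returns an errno */

struct vnode {
	vop_t	**v_op;		/* vnode operations vector */
};

/* Hypothetical slot numbers for the sketch. */
enum { SLOT_READ = 0, SLOT_WRITE = 1, SLOT_MAX = 2 };

/* Hypothetical argument block; each operation casts the void * back. */
struct read_args_sketch {
	struct vnode	*a_vp;
	int		 a_len;		/* bytes requested */
	int		 a_done;	/* bytes "read", filled in by the op */
};

int
myfs_read(void *v)
{
	struct read_args_sketch *ap = v;

	ap->a_done = ap->a_len;		/* pretend the whole read succeeded */
	return (0);
}

int
myfs_write(void *v)
{
	(void)v;
	return (0);
}

/* One vector per file system type, shared by all of its vnodes. */
vop_t *myfs_vnodeop_sketch[SLOT_MAX] = { myfs_read, myfs_write };

/* Call whichever function currently sits in the given slot. */
int
vcall(struct vnode *vp, int slot, void *args)
{
	return (vp->v_op[slot](args));
}
```

Because every call goes through the pointer stored in the vnode,
changing v_op changes the behavior of every later operation on that
vnode without touching any of its callers.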

Now let's say that we decide a particular vnode is no longer valid;
if that vnode is referenced by an open file in user space, we now
have a problem: we are trying to invalidate the vnode itself, but
we don't know what f_data members of what kernel descriptors are
pointing to it on behalf of files opened by what processes.

We have a conundrum: how do we invalidate a vnode, such that the
next reference to the vnode fails gracefully, and causes the upper
level code to recognize that the underlying object is invalid, and,
hopefully, destroy the reference?  If we could do this, then
we would eventually work our way down to zero references, at which
time the system could reclaim the vnode object back to the system vnode
pool.


The way we do this is we call VOP_REVOKE, which falls all the way
down to the default code in /sys/kern/vfs_default.c (which is a
blatant failure to implement the Heidemann stacking framework
correctly, but one which we will not get into here because we are
interested in the reason for deadfs, not "why FreeBSD VFS stacking
doesn't work right"), and calls the generic vop_revoke() in the
file /sys/kern/vfs_subr.c.  This results in vclean() (in the same
file) setting v_op to dead_vnodeop_p... in other words, marking
the vnode "dead", so that no further operations against it will
call the real filesystem functions, but will instead fail all
the way back to the point that the references can be deleted.
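
The effect of that v_op swap can be sketched in a few lines.  This is
a toy model, not the vclean() code: the slot enum, the realfs_* and
dead_fail() functions, the *_sketch vectors, the single EIO error for
every dead operation, and the use_or_release() helper are assumptions
made for the example.

```c
#include <errno.h>
#include <stddef.h>

typedef int vop_t(void *);

struct vnode {
	vop_t	**v_op;		/* current operations vector */
	int	  v_usecount;	/* outstanding references */
};

enum { SLOT_READ = 0, SLOT_WRITE = 1, SLOT_MAX = 2 };

/* Hypothetical "real" file system operations. */
int realfs_read(void *ap) { (void)ap; return (0); }
int realfs_write(void *ap) { (void)ap; return (0); }

/* Every dead operation fails; EIO is an assumed choice for the sketch. */
int dead_fail(void *ap) { (void)ap; return (EIO); }

vop_t *realfs_vnodeop_sketch[SLOT_MAX] = { realfs_read, realfs_write };
vop_t *dead_vnodeop_sketch[SLOT_MAX] = { dead_fail, dead_fail };

/* Mark the vnode dead: later calls never reach the real file system. */
void
revoke_sketch(struct vnode *vp)
{
	vp->v_op = dead_vnodeop_sketch;
}

/*
 * What a reference holder does: attempt an operation and, if it
 * fails, give up its reference so the count drains toward zero.
 */
int
use_or_release(struct vnode *vp, int slot)
{
	int error = vp->v_op[slot](NULL);

	if (error != 0 && vp->v_usecount > 0)
		vp->v_usecount--;
	return (error);
}
```

Nobody had to find the holders: each one discovers the revocation on
its next call and drops its own reference, which is exactly the
post-invalidation behavior described above.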


And that's what deadfs is for: post invalidation because FreeBSD
doesn't have the integration it should to remove the need for the
struct fileops bypass, and because the open file instances are not
reverse linked so that the objects pointing to vnodes can have
their pointers invalidated, and because the model insists on vnode
ownership by the system instead of by a particular file system,
even when the particular file system architecture makes that a
bad design decision.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
