From: Terry Lambert
Subject: Re: deadfs in FreeBSD 3.0/current ?
To: otok@students.itb.ac.id (Otok Berliawan)
Cc: tlambert@primenet.com, lha@e.kth.se, michaelh@cet.co.jp, freebsd-fs@FreeBSD.ORG, kom-arla@stacken.kth.se
Date: Tue, 27 Oct 1998 22:15:53 +0000 (GMT)

> I'm the new comer in FreeBSD enviroment,I have been confusing with this
> topic, can u explain me, what for the deadfs file????

The "deadfs" VFS layer file system implementation is a file system that can be referenced by revoked vnodes, so that each FS does not need to do its own revoked node management.

It's a rather unsatisfying kludge to get around the fact that some file systems would like to manage their own vnode pool, but the vnode pool has been declared, by fiat, to be a system-wide resource.

Here are the gory details...
The VFS implementation abstracts two interfaces:

    VFSOPS   Operations on VFS objects, which are the exposed
             interfaces to instances of actual file systems

    VNOPS    Operations on vnode objects, which are the exposed
             interfaces to instances of actual files on a single
             file system instance

It's important to understand what a "file" is in order to understand the architecture, and thus the need for a deadfs.

For right now, we're going to ignore the mount, unmount, and other per-fs operations, so we can concentrate on the VNOPS and totally ignore the VFSOPS.

At the top level, we have a "file". In user space, this is an "fd" -- a "file descriptor object". This object is an index.

Starting with the process, we see that there is a pointer in it called p_fd:

    STRUCTURE:   struct proc
    DEFINED:     /usr/include/sys/proc.h
    MEMBERS:     p_fd
    DEFINITION:  The p_fd member is a pointer to a struct filedesc,
                 which is the list of files open by the process.

From there, we can see what a filedesc is:

    STRUCTURE:   struct filedesc
    DEFINED:     /usr/include/sys/filedesc.h
    MEMBERS:     fd_ofiles
    DEFINITION:  The fd_ofiles member is a pointer to an array of
                 struct file pointers that represent the files open
                 in a process.

So to get to the actual struct file for any user space descriptor, we dereference the current process (called "curproc" here, for convenience), get the p_fd member, get the array of open struct file pointers, and index it by the descriptor number:

    curproc->p_fd->fd_ofiles[fd];

This gives us a pointer to the struct file associated with the open file.
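To make that chain of dereferences concrete, here is a minimal user-space sketch in C. The structures are hypothetical, cut-down stand-ins for struct proc, struct filedesc, and struct file; the fd_nfiles count, the placeholder f_flag member, and the fd_to_file() helper are assumptions of this sketch, not the kernel's actual definitions. It performs the curproc->p_fd->fd_ofiles[fd] lookup, plus the range and empty-slot checks that any real lookup has to make.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures
 * described above; the real definitions have many more members. */
struct file {
    int f_flag;                /* placeholder member only */
};

struct filedesc {
    struct file **fd_ofiles;   /* array of open struct file pointers */
    int fd_nfiles;             /* number of slots in fd_ofiles (assumed) */
};

struct proc {
    struct filedesc *p_fd;     /* files open by this process */
};

/* Translate a user-space descriptor number into a struct file pointer,
 * i.e. p->p_fd->fd_ofiles[fd], rejecting out-of-range descriptors.
 * Returns NULL for a bad fd or an unused slot. */
static struct file *
fd_to_file(struct proc *p, int fd)
{
    struct filedesc *fdp;

    assert(p != NULL && p->p_fd != NULL);
    fdp = p->p_fd;
    if (fd < 0 || fd >= fdp->fd_nfiles)
        return NULL;
    return fdp->fd_ofiles[fd];  /* NULL if the slot is unused */
}
```

With fd_ofiles holding { &f0, NULL, &f2 }, descriptors 0 and 2 map to their struct file, while descriptor 1 (an unused slot) and descriptor 3 (out of range) both yield NULL.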
We are interested in three fields:

    STRUCTURE:   struct file
    DEFINED:     /usr/include/sys/file.h
    MEMBERS:     f_type
                 f_ops
                 f_data
    DEFINITION:  The f_type field specifies the type of the
                 descriptor (in the kernel, we call a "struct file" a
                 descriptor; this is somewhat confusing if you are
                 used to user space calling an fd a descriptor). It
                 can be one of:

        DTYPE_VNODE    A vnode, which is to say, the f_data member
                       points to a vnode and the f_ops member points
                       to the struct fileops vnops, which is defined
                       in /sys/kern/vfs_vnops.c

        DTYPE_SOCKET   A socket, which is to say, the f_data member
                       points to a struct socket and the f_ops member
                       points to the struct fileops socketops, which
                       is defined in /sys/kern/sys_socket.c

        DTYPE_PIPE     A pipe, which is to say, the f_data member
                       points to a struct pipe and the f_ops member
                       points to the struct fileops pipeops, which is
                       defined in /sys/kern/sys_pipe.c

        DTYPE_FIFO     A FIFO, which is to say, the f_data member
                       points to a struct socket and the f_ops member
                       points to the struct fileops socketops, which
                       is defined in /sys/kern/sys_socket.c

We should note at this point that the existence of the struct fileops is a kludge. It exists because the VFS layer was not completely and correctly integrated in the 4.4BSD-Lite/4.4BSD-Lite2 code. Because the struct socket (defined in /usr/include/sys/socketvar.h) and the struct pipe (defined in /usr/include/sys/pipe.h) are not vnodes, and because struct fileops has no accessor functions and defines only the read/write/ioctl/poll entry points (a subset of the system call interface), it is impossible to do things like file locking on sockets or credential management on pipes, etc.

Now the interesting part here is the case where f_type is DTYPE_VNODE, in which case the struct fileops entries map to VOP_ calls (the VNOPS in which we are interested), which operate against the vnode itself.
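The point of f_ops is indirection: generic code calls through whatever table the struct file points at, and only the type-specific function interprets f_data. The following is a minimal C sketch of that dispatch, assuming a hypothetical one-entry struct fileops (the real table has more entry points, and the real read functions take a struct uio and credentials rather than a plain buffer). vn_read_sketch(), soo_read_sketch(), file_read(), the stand-in vnode and socket structures, and the DTYPE_* values used here are all illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative type codes only. */
enum { DTYPE_VNODE = 1, DTYPE_SOCKET = 2 };

struct file;

/* A cut-down, hypothetical struct fileops with a single read entry. */
struct fileops {
    int (*fo_read)(struct file *fp, char *buf, size_t len);
};

struct file {
    int f_type;              /* DTYPE_* code */
    struct fileops *f_ops;   /* per-type operations table */
    void *f_data;            /* vnode, socket, ... depending on f_type */
};

struct vnode  { const char *contents; };  /* stand-in "vnode" */
struct socket { int unused; };            /* stand-in socket */

/* Vnode-backed read: treats f_data as a vnode. In the kernel, this is
 * where the call would turn into a VOP_ call against the vnode. */
static int
vn_read_sketch(struct file *fp, char *buf, size_t len)
{
    struct vnode *vp = fp->f_data;
    size_t n = strlen(vp->contents);

    if (n > len)
        n = len;
    memcpy(buf, vp->contents, n);
    return (int)n;           /* number of bytes "read" */
}

/* Socket-backed read: treats f_data as a socket; reads nothing here. */
static int
soo_read_sketch(struct file *fp, char *buf, size_t len)
{
    (void)fp; (void)buf; (void)len;
    return 0;
}

static struct fileops vnops_sketch = { vn_read_sketch };
static struct fileops socketops_sketch = { soo_read_sketch };

/* Generic read path: it never inspects f_type or f_data; it just
 * calls through the table that f_ops points at. */
static int
file_read(struct file *fp, char *buf, size_t len)
{
    assert(fp != NULL && fp->f_ops != NULL);
    return fp->f_ops->fo_read(fp, buf, len);
}
```

The caller of file_read() never looks at f_type. Choosing the table when the struct file is set up is what decides whether f_data gets treated as a vnode or as a socket.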
Note:  There are many system calls that are not covered by the struct fileops functions (the read/write/ioctl/poll entry points noted above). Such calls *directly* test to see if f_type is DTYPE_VNODE and, if it is not, fail the call immediately; otherwise, they make the appropriate VOP_ calls to implement the action requested by the user process. See /sys/kern/vfs_syscalls.c for more details.

Note:  It may seem like the struct file is the only VFS/vnode consumer; it isn't. Other consumers are various kernel interfaces, execution class loaders for loading FreeBSD native and emulated binary types, and other direct VFS consumers, like the NFS server software. It only seems that the struct file is the only consumer because of poor architectural abstraction (Some Pigs Are More Equal Than Others).

So now on to answering the original question... why is there a deadfs?

If we look at the vnode structure, we see a pointer to the VOP_ calls in the member v_op:

    STRUCTURE:   struct vnode
    DEFINED:     /usr/include/sys/vnode.h
    MEMBERS:     v_op
    DEFINITION:  The v_op field is the vnode operations vector. It is
                 a vop_t **: a pointer to an array of pointers to
                 functions that return int and take a single opaque
                 void * argument descriptor.

Now let's say that we decide a particular vnode is no longer valid. If that vnode is referenced by an open file in user space, we now have a problem: we are trying to invalidate the vnode itself, but we don't know which f_data members of which kernel descriptors are pointing to it on behalf of files opened by which processes.

We have a conundrum: how do we invalidate a vnode such that the next reference to it fails gracefully and causes the upper level code to recognize that the underlying object is invalid and, hopefully, destroy the reference?
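One property of v_op bears directly on that conundrum: every caller, whichever struct file or other consumer it came through, reaches the file system only via the vnode's single v_op pointer. The minimal user-space sketch below illustrates this. All names are hypothetical: dead_vector, revoke_sketch(), vop_call(), and vrele_sketch() stand in for the idea and are not FreeBSD's code, and the EBADF return is an illustrative choice rather than what deadfs actually returns for each operation. It shows that repointing v_op changes the behavior seen by every holder of the vnode at once, without knowing who those holders are.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Modeled on the description above: an operation takes one opaque
 * argument-descriptor pointer and returns an int error code. */
typedef int vop_t(void *);

/* Hypothetical operation slots within an operations vector. */
enum { VOFFSET_READ = 0, VOFFSET_WRITE = 1, VOFFSET_COUNT = 2 };

struct vnode {
    vop_t **v_op;        /* operations vector */
    int v_usecount;      /* outstanding references */
};

/* Hypothetical argument descriptor shared by both operations. */
struct vop_rw_args {
    struct vnode *a_vp;
};

/* "Real" file system operations: always succeed in this sketch. */
static int myfs_read(void *ap)  { (void)ap; return 0; }
static int myfs_write(void *ap) { (void)ap; return 0; }

/* "Dead" operation: fails every call (EBADF chosen for illustration). */
static int dead_fail(void *ap)  { (void)ap; return EBADF; }

static vop_t *myfs_vector[VOFFSET_COUNT] = { myfs_read, myfs_write };
static vop_t *dead_vector[VOFFSET_COUNT] = { dead_fail, dead_fail };

/* Dispatch through whatever vector the vnode currently points at. */
static int
vop_call(struct vnode *vp, int offset)
{
    struct vop_rw_args a = { vp };

    assert(offset >= 0 && offset < VOFFSET_COUNT);
    return vp->v_op[offset](&a);
}

/* Invalidate without knowing who holds the vnode: all holders dispatch
 * through v_op, so swapping this one pointer redirects all of them. */
static void
revoke_sketch(struct vnode *vp)
{
    vp->v_op = dead_vector;
}

/* Drop one reference; returns 1 once no references remain, i.e. when
 * the vnode could be reclaimed back to the pool. */
static int
vrele_sketch(struct vnode *vp)
{
    assert(vp->v_usecount > 0);
    return --vp->v_usecount == 0;
}
```

The holders still have their pointers after revoke_sketch(); they simply get failures back. Once each of them releases its reference, the count reaches zero and the vnode can be reclaimed.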
If we can do this, then we will eventually work our way down to zero references, at which time the system can reclaim the vnode object back to the system vnode pool.

The way we do this is to call VOP_REVOKE, which falls all the way down to the default code in /sys/kern/vfs_default.c (which is a blatant failure to implement the Heidemann stacking framework correctly, but one we will not get into here, because we are interested in the reason for deadfs, not in "why FreeBSD VFS stacking doesn't work right"), and ends up calling the generic vop_revoke() in the file /sys/kern/vfs_subr.c.

This results in vclean() (in the same file) setting v_op to dead_vnodeop_p. In other words, it marks the vnode "dead", so that further operations against it no longer call the real file system functions, but instead fail all the way back up to the point where the references can be deleted.

And that's what deadfs is for: handling references after invalidation. It is needed because FreeBSD doesn't have the integration it should to remove the need for the struct fileops bypass, because the open file instances are not reverse-linked so that the objects pointing to vnodes can have their pointers invalidated, and because the model insists on vnode ownership by the system instead of by a particular file system, even when the particular file system architecture makes that a bad design decision.

                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.