From: Matthew Dillon
Date: Tue, 6 Sep 2005 19:15:40 -0700 (PDT)
To: Robert Watson
Cc: Dag-Erling Smørgrav, kamalp@acm.org, freebsd-hackers@freebsd.org, Sergey Uvarov
Subject: Re: vn_fullpath() again
Message-Id: <200509070215.j872FeQE040259@apollo.backplane.com>
References: <868xyack37.fsf@xps.des.no> <20050906191929.E78038@fledge.watson.org>

    At the risk of drawing ire from FreeBSD core developers, I will point
    out that reverse-resolution is hardly a black-and-white issue. There
    are many shades of gray, and there is a huge problem set that can be
    either solved outright or solved 99.99% of the way (greatly reducing
    the time required to solve the remainder) with a more robust namecache
    implementation.
    FreeBSD's implementation is basically at the lowest rung on the ladder.
    DragonFly's is a couple of rungs up. DragonFly can return a guaranteed
    consistent path (even through renames of any component) to any open
    vnode, and can also tell you whether the namespace used to access the
    vnode was remove()'d. For an auditing program, or for generating a
    high-level journaling stream to drive a mirror on a remote host, that
    covers 99.99% of the filesystem.

    NFS views from the client are one of those shades of gray, since files
    and directories can be ripped up by other clients or by the server. But
    since clients have to assume a certain level of consistency anyway, it
    is hardly a show stopper from the point of view of any real-life use
    or need.

    Hardlinks are another of those shades of gray, but they hardly
    invalidate the many uses that namepath resolution can be put to.
    99.99% of the files on most filesystems are either not hardlinked or
    not removed once accessed, after all, and at least with UFS a directory
    CAN'T be hardlinked.

    The important thing is to reduce the problem set to something
    manageable. For a mirroring program or an auditing program, being able
    to get valid paths in real time for nearly all the changes made to a
    filesystem is no small thing. You at least get a definitive red flag
    for any hardlinks (simply by the fact that st_nlink is greater than
    one), you can do it the slow way (aka scan/index all files with
    st_nlink > 1) for the remaining few, and you can track realtime
    namespace operations on hardlinked files after that (which is very
    easy to do).

    As to how to solve the basic problem in FreeBSD... well, it basically
    isn't solvable to the degree that DFly has solved it unless someone
    good spends a lot of time rewriting the namecache code and the VFS API.
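    As an aside on the st_nlink red flag mentioned above: from userland it
    is just an lstat(2) call away. What follows is a minimal sketch, not
    code from any tree; is_hardlinked() is a hypothetical helper name, and
    skipping directories is an assumption made here because their link
    count typically exceeds one for reasons unrelated to hardlinks.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not from FreeBSD or DragonFly): returns 1 if
 * 'path' names a non-directory with more than one link, 0 if it does
 * not, and -1 if lstat(2) fails.
 */
int
is_hardlinked(const char *path)
{
	struct stat sb;

	if (lstat(path, &sb) == -1)
		return (-1);

	/*
	 * Assumption: directories are ignored, since their st_nlink
	 * normally counts "." and subdirectory ".." entries rather
	 * than real hardlinks.
	 */
	if (S_ISDIR(sb.st_mode))
		return (0);

	return (sb.st_nlink > 1 ? 1 : 0);
}
```

    A mirroring or auditing tool could call this on each path it reports
    and fall back to the slow scan/index approach only for the files where
    it returns 1.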
    BUT, short of a full rewrite of the namecache and the VFS API, I think
    it *IS* possible to rewrite enough of the namecache to at least make
    the namecache records consistent against the active vnodes and to not
    throw away namecache records for the directory chain leading up to any
    vnode. It's even possible to generate the chain for vnodes generated
    from file handles (inode numbers), which an NFS server op has to do
    quite often, because the directory is available in those cases
    (DragonFly does this for NFS server operations, so I know it's
    possible).

    It is possible to do even less work to maintain the associations...
    you don't even NEED to have a working namecache, in fact. All you need
    are ref'd directory vnodes in a chain from any leaf leading to the
    mount point... basically taking the vnode->v_dd field and changing it
    from a verifier heuristic to a real, ref'd directory vnode, with
    appropriate feedback from the filesystem to fix things up for rename()
    and to mark the namecache entry as invalid for remove().

    Given a valid directory vnode chain, you can ALWAYS regenerate a valid
    path (maybe not the only path, but a *VALID* path) to any vnode for all
    cases except the case where you have a hardlinked file that you have
    open()'d and remove()'d. Very few programs care about open but
    completely unlinked files. DragonFly can provide the original path to
    such a file, but it flags it as having been removed, and one can almost
    certainly ignore such files for, e.g., filesystem mirroring, and even
    for auditing if the file is not otherwise important. Considering the
    rarity of the case, it would be sufficient to simply red-flag the
    condition (which you can reliably do with a v_dd directory chain
    implementation).

    In summary, the implementation would be:

    * Maintain vref'd v_dd pointers in leaf vnodes representing the
      directory tree to a leaf, so they can't go away until the leaf vnode
      goes away.
    * Handle NFS server based file handle -> vnode translation by resolving
      the chain to root (doable because the NFS server has access to the
      related directory vnode for all such translations).

    * Use the namecache when it exists, and

    * When asked to resolve a full path for which no namecache record
      exists, create the related namecache records by recursing upwards
      through the directory chain and scanning each directory to locate
      the name translation for the underlying vnode. DragonFly does this
      for the NFS server (see the cache_inefficient_scan() procedure in
      kern/vfs_cache.c in the DFly source for an example).

    That is more achievable in FreeBSD. In fact, I would say that it is
    VERY achievable in FreeBSD, because you are not trying to maintain a
    fully coherent namecache like DragonFly does; you are simply
    maintaining enough information to be able to regenerate the path when
    the namecache record happens not to exist. This means you don't have to
    fix all the places where FreeBSD unconditionally invalidates large
    chunks of the namecache out of laziness in the original implementation
    (that alone took several months for me to fix in DragonFly).

    If nobody wants to do it, well, that's one thing, but it's different
    from saying that it's impossible when it clearly is not impossible.

                                                -Matt
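
    The v_dd directory chain summarized in the list above can be modeled in
    a few lines of userland C. This is only a toy sketch under stated
    assumptions, not kernel code: struct toy_vnode, its name and removed
    fields, and the toy_fullpath(), toy_rename() and toy_remove() functions
    are all hypothetical names invented for illustration; only the v_dd
    field name comes from the discussion. The stored name stands in for
    what a real implementation would get from the namecache or from a
    directory scan, and reference counting and locking are left out.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a vnode (hypothetical; only v_dd is a real field name). */
struct toy_vnode {
	struct toy_vnode *v_dd;	/* parent directory, NULL at the mount root */
	const char *name;	/* name of this vnode within v_dd */
	int removed;		/* red flag: set once the name was remove()'d */
};

/*
 * Regenerate a path by walking v_dd from the leaf up to the root,
 * filling 'buf' from the end backwards. Returns a pointer into 'buf',
 * or NULL if the path does not fit in 'len' bytes.
 */
char *
toy_fullpath(const struct toy_vnode *vp, char *buf, size_t len)
{
	char *p;
	size_t n;

	if (len < 2)
		return (NULL);
	p = buf + len - 1;
	*p = '\0';
	if (vp->v_dd == NULL) {		/* the root itself */
		*--p = '/';
		return (p);
	}
	for (; vp->v_dd != NULL; vp = vp->v_dd) {
		n = strlen(vp->name);
		if ((size_t)(p - buf) < n + 1)
			return (NULL);	/* no room for "/name" */
		p -= n;
		memcpy(p, vp->name, n);
		*--p = '/';
	}
	return (p);
}

/* Filesystem feedback for rename(): repoint the chain and the name. */
void
toy_rename(struct toy_vnode *vp, struct toy_vnode *new_dd, const char *new_name)
{
	/* A real kernel would move the directory reference here. */
	vp->v_dd = new_dd;
	vp->name = new_name;
}

/* Filesystem feedback for remove(): keep the chain, raise the red flag. */
void
toy_remove(struct toy_vnode *vp)
{
	vp->removed = 1;
}
```

    In this model, renaming a directory with toy_rename() changes the path
    regenerated for every leaf beneath it without touching the leaves, and
    toy_remove() leaves the original path recoverable while marking it as
    removed, which is the red-flag behavior described above.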