Date: Tue, 2 Sep 2003 22:08:10 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Kirk McKusick <mckusick@beastie.mckusick.com>, Paul Saab <ps@yahoo-inc.com>
Cc: hackers@freebsd.org
Subject: Re: mksnap_ffs, snapshot issues, again
Message-ID: <200309030508.h8358Aj6020352@apollo.backplane.com>
References: <200308232123.h7NLNwol054243@beastie.mckusick.com>

:
:Thanks for your thoughts. Let me know how your idea progresses
:(e.g., whether you get to work :-)
:
:	~Kirk

I am starting work on cache_lookup() and related functions now in DragonFly. I expect it will take at least the week to get a prototype working and I believe the resulting patch set will be fairly easy to port to FreeBSD.

The first step I am taking is to make the namecache more persistent. *ALL* active vnodes accessed via lookup() will be guaranteed to have a namecache entry. I have successfully booted a test system with this change, though I am sure there are bugs :-). I am not entirely sure what to do about the filehandle functions but I am not going to worry about it for the moment.

What this basically means is that vnodes cannot be recycled in the 'middle' of the topology, only at the leaves. This does not present a big problem since files are always leaves. The namecache topology will be guaranteed to remain unbroken except in cases where the namespace is deleted (e.g. removing a file with active descriptors).

The second step will be to use a namecache structural pointer for our 'directory handle' in all places in the system where a directory handle is expected (e.g. 'dvp'). This will also involve getting rid of the vnode-based parent directory support (v_dd). Since the namecache structure has a pointer to the vnode this is a pretty easy step. A little harder will be to fix all the directory scanning functions to use the namecache topology instead of the vnode topology.

The third step will be to use the namecache for all name-based locking operations instead of the underlying vnodes. For example, if you are renaming a/b to c/d you only need to hold locks on the namecache entry representing "b" and the one representing "d" prior to executing the rename operation. The one representing "d" will be a negative cache entry, placemarking the operation.
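
To make the rename example concrete, here is a minimal user-space sketch of name-based locking. It is an illustration only, not DragonFly or FreeBSD code: the struct ncentry layout, the toy try-lock, and the names nc_trylock(), nc_unlock() and nc_rename() are all assumptions made up for this sketch. The point it shows is that renaming a/b to c/d takes locks on the entries for "b" and "d" only, and that "d" starts out as a negative entry (no vnode) acting as a placemarker.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified namecache entry (not the real kernel struct). */
struct ncentry {
    struct ncentry *parent; /* entry for the containing directory */
    const char     *name;   /* path component this entry represents */
    void           *vp;     /* backing vnode; NULL means a negative entry */
    int             locked; /* toy name lock; a kernel would sleep instead */
};

/* Toy try-lock: returns 1 on success, 0 if the name is already locked. */
static int nc_trylock(struct ncentry *nc)
{
    if (nc->locked)
        return 0;
    nc->locked = 1;
    return 1;
}

static void nc_unlock(struct ncentry *nc)
{
    nc->locked = 0;
}

/*
 * Rename src to dst while holding locks on those two names only.
 * The parent directory entries are never locked here.
 * Returns 0 on success, -1 on failure. This sketch only handles the
 * case where dst is a negative entry (the target name does not exist).
 */
static int nc_rename(struct ncentry *src, struct ncentry *dst)
{
    assert(src != NULL && dst != NULL);
    if (src == dst)
        return -1;
    if (!nc_trylock(src))
        return -1;
    if (!nc_trylock(dst)) {
        /* Back off rather than hold one lock while waiting on the other. */
        nc_unlock(src);
        return -1;
    }
    if (src->vp == NULL || dst->vp != NULL) {
        nc_unlock(dst);
        nc_unlock(src);
        return -1;
    }
    dst->vp = src->vp; /* "d" now resolves to the file */
    src->vp = NULL;    /* "b" becomes a negative entry */
    nc_unlock(dst);
    nc_unlock(src);
    return 0;
}
```

A real implementation would block on a contended name and impose a lock order rather than backing off, and would also have to call into the filesystem to update the on-disk directories; the sketch leaves both out.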

Locking on names this way will not only completely solve the locking issues with rename(), remove(), and create(), it also completely solves directory recursion stalls in both directions, completely solves the race-to-root issue, solves most of the directory lookup stalls (cache-case lookups will not need a vnode lock and can run in parallel with directory operations that do need the vnode lock), gets rid of all name-related deadlock situations, and potentially allows modifying directory operations to become reentrant down the line.

The fourth step will be a *BIG* Carrot... the namecache topology does not have a problem with vnodes appearing in multiple places in the filesystem. This means that (A) it will be possible to hardlink directories, (B) it will be possible to implement semi-hard links, basically softlinks that *look* like hardlinks, and (C) it will be possible to CD forwards and backwards without the system getting confused about paths. In other words, some way cool shit.

Additional optimizations are possible for the future. For example, it will be possible to cache ucred pointers in the namecache structure and thus allow namei() to *COMPLETELY* resolve a path without making any VOP calls at all, which will at least double and probably quadruple best case path lookup performance.

I'll post an update after step 3, probably near the end of the week or on the weekend. I expect people will start screaming for Step 4 now that they know it is possible :-)

					-Matt
					Matthew Dillon
					<dillon@backplane.com>
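
As an illustration of point (C) in step four, the following user-space sketch shows why a per-name topology keeps paths unambiguous even when one vnode appears in two places. Nothing here is taken from DragonFly or FreeBSD: struct ncnode and nc_fullpath() are hypothetical names invented for the sketch. Each entry carries its own parent pointer, so walking upward from an entry always yields the path that was actually used to reach it, regardless of how many other names point at the same vnode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical namecache node: one per name, not one per vnode. */
struct ncnode {
    struct ncnode *parent; /* NULL for the root entry */
    const char    *name;   /* path component; unused for the root */
    void          *vp;     /* backing vnode, may be shared by several names */
};

/*
 * Build the absolute path of nc by following parent pointers.
 * The path is assembled backwards from the end of buf.
 * Returns a pointer into buf, or NULL if buf is too small.
 */
static char *nc_fullpath(const struct ncnode *nc, char *buf, size_t size)
{
    char *p;

    assert(nc != NULL && buf != NULL);
    if (size < 2)
        return NULL;
    p = buf + size;
    *--p = '\0';
    if (nc->parent == NULL) { /* the root itself */
        *--p = '/';
        return p;
    }
    for (; nc->parent != NULL; nc = nc->parent) {
        size_t len = strlen(nc->name);

        if ((size_t)(p - buf) < len + 1)
            return NULL;      /* not enough room for "/name" */
        p -= len;
        memcpy(p, nc->name, len);
        *--p = '/';
    }
    return p;
}
```

With two entries, say "src" under /usr and "work" under /home, both pointing at the same directory vnode, nc_fullpath() returns "/usr/src" for one and "/home/work" for the other, and ".." from each is simply its own parent entry.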