From: Robert Watson
Date: Wed, 14 Jan 2004 16:49:14 -0500 (EST)
To: Don Lewis
Cc: current@FreeBSD.org
Subject: Re: simplifying linux_emul_convpath()
In-Reply-To: <200401142101.i0EL1t7E040382@gw.catspoiler.org>

On Wed, 14 Jan 2004, Don Lewis wrote:

> The typical user of something like this would be tar when it is deciding
> what to hardlink together. One could make a case for making a nullfs
> mounted copy match the original (or two separately mounted nullfs copies
> match each other). That would do the "right" thing when archiving a
> file tree containing nullfs mount points and untarring into a single
> file system, except that it would confuse the heck out of tar because
> the link counts would be wrong. The VOP would be cheap, too. But what
> about a crypto or compression layer?
>
> The problem for something like tar is that this mechanism doesn't scale
> well. When creating an archive, tar keeps a database of pathnames of
> files that have more than one link, with the inode number as the key.
> Each time it encounters a file with multiple links, it does a lookup in
> the database. If it finds a match, it outputs a record with the pathname
> it found in the database, and if it doesn't find a match it adds a new
> record to the database. This can be done with reasonable efficiency in
> userland. If the only way of comparing whether two files were the same
> were to use syscalls, it would be terribly slow. Tar would only be able
> to keep a list of the pathnames and would have to iterate through the
> list doing the syscall for each entry in search of a match to the current
> file it was processing. This is an O(n^2) problem with a syscall in the
> loop. Tar might be able to narrow the search by matching file
> attributes, but it would still be possible to have degenerate cases
> unless the inode number were used as an attribute (which would not work
> if you wanted nullfs copies to match).

My thought was that device id and inode number could act as a hint -- only
instead of assuming (id == id && fsid == fsid) meant a match, you'd then
resolve the match using samefile().

> There are programs that could make use of samefile(), such as cp. It
> would probably want a nullfs copy to match the original.

The stacked file system issue is an interesting one -- I think only the
file system that owns an object can decide if the two are the same. So in
the stacked case, it might be possible to have a model where you actually
call VOP_SAMEFILE() on both vnodes, and if either matches, it's considered
OK. The stacked file system would determine if a vnode was "local" or
"stacked", and then pass the VOP_SAMEFILE() down the stack if it was a
stacked vnode. Most stacked file systems would probably simply say it was
different, but nullfs might say it was the same...
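A rough userland sketch of the hint-then-confirm idea, in Python rather than anything tar actually does. The function name find_links and its samefile parameter are made up for illustration. os.path.samefile is only a stand-in for the proposed samefile() call: it compares device and inode numbers itself, so a real implementation would substitute whatever the kernel ended up providing.

```python
import os
from collections import defaultdict


def find_links(paths, samefile=os.path.samefile):
    """Map each path that duplicates an earlier path to that earlier path.

    (st_dev, st_ino) is used only as a hint to select candidates; the
    samefile predicate makes the final decision. os.path.samefile is an
    assumed stand-in for the samefile() syscall discussed in the thread.
    Callers are expected to pass regular files only.
    """
    candidates = defaultdict(list)  # (st_dev, st_ino) -> paths seen so far
    links = {}                      # later path -> first path for that file
    for path in paths:
        st = os.stat(path)
        if st.st_nlink < 2:
            continue                # single link: nothing to match against
        key = (st.st_dev, st.st_ino)
        for earlier in candidates[key]:
            if samefile(earlier, path):  # confirm the hint
                links[path] = earlier
                break
        else:
            candidates[key].append(path)  # first sighting of this file
    return links
```

The dictionary lookup keeps the common case close to constant time, and the confirming call runs only against paths that already share a device id and inode number, which avoids walking the whole list with a syscall per entry.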
Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research