From owner-freebsd-current@FreeBSD.ORG  Wed Jan 14 13:50:59 2004
Date: Wed, 14 Jan 2004 16:49:14 -0500 (EST)
From: Robert Watson <rwatson@FreeBSD.org>
X-Sender: robert@fledge.watson.org
To: Don Lewis <truckman@FreeBSD.org>
In-Reply-To: <200401142101.i0EL1t7E040382@gw.catspoiler.org>
Message-ID: <Pine.NEB.3.96L.1040114164711.49872H-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: current@FreeBSD.org
Subject: Re: simplifying linux_emul_convpath()


On Wed, 14 Jan 2004, Don Lewis wrote:

> The typical user of something like this would be tar when it is deciding
> what to hardlink together.  One could make a case for making a nullfs
> mounted copy match the original (or two separately mounted nullfs copies
> match each other).  That would do the "right" thing when archiving a
> file tree containing nullfs mount points and untarring into a single
> file system, except that it would confuse the heck out of tar because
> the link counts would be wrong.  The VOP would be cheap, too. But what
> about a crypto or compression layer? 
> 
> The problem for something like tar is that this mechanism doesn't scale
> well. When creating an archive, tar keeps a database of pathnames of
> files that have more than one link, with the inode number as the key. 
> Each time it encounters a file with multiple links, it does a lookup in the
> database.  If it finds a match, it outputs a record with the pathname it
> found in the database, and if it didn't find a match it adds a new
> record to the database.  This can be done with reasonable efficiency in
> userland.  If the only way of comparing if two files were the same were
> to use syscalls, it would be terribly slow.  Tar would only be able to
> keep a list of the pathnames and would have to iterate through the list
> doing the syscall for each entry in search of a match to the current
> file it was processing.  This is an O(n^2) problem with a syscall in the
> loop.  Tar might be able to narrow the search by matching file
> attributes, but it would still be possible to have degenerate cases
> unless the inode number were used as an attribute (which would not work
> if you wanted nullfs copies to match). 

My thought was that device id and inode number could act as a hint -- only
instead of assuming (ino == ino && fsid == fsid) meant a match, you'd then
resolve the candidate match using samefile().
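
A minimal sketch of that hint-then-confirm lookup, written in Python only
for brevity.  The samefile argument is a caller-supplied predicate standing
in for the proposed samefile() call, which is an assumption here and not an
existing syscall; LinkTable, lookup_or_add and confirm_calls are names made
up for illustration.

```python
import os
from collections import defaultdict


class LinkTable:
    """Hard-link table keyed by (st_dev, st_ino), used only as a hint."""

    def __init__(self, samefile, stat=os.lstat):
        # samefile(path_a, path_b) -> bool stands in for the proposed
        # samefile() call; it is hypothetical, not a real syscall.
        self._samefile = samefile
        self._stat = stat
        self._candidates = defaultdict(list)
        self.confirm_calls = 0  # how many confirming calls were made

    def lookup_or_add(self, path):
        """Return an earlier path naming the same file, or None if new."""
        st = self._stat(path)
        key = (st.st_dev, st.st_ino)
        for earlier in self._candidates[key]:
            # Only paths that share the hint are confirmed, so the table
            # is never scanned linearly with a call per entry.
            self.confirm_calls += 1
            if self._samefile(earlier, path):
                return earlier
        # Hint missed, or every candidate was rejected: remember the path.
        self._candidates[key].append(path)
        return None
```

The table lookup stays in userland, and the confirming call runs only for
entries whose device id and inode number already agree.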

> There are programs that could make use of samefile(), such as cp.  It
> would probably want a nullfs copy to match the original.

The stacked file system issue is an interesting one -- I think only the
file system that owns an object can decide if the two are the same.  So in
the stacked case, it might be possible to have a model where you actually
call VOP_SAMEFILE() on both vnodes, and if either matches, it's considered
OK.  The stacked file system would determine if a vnode was "local" or
"stacked", and then pass the VOP_SAMEFILE() down the stack if it was a
stacked vnode.  Most stacked file systems would probably simply say it
was different, but nullfs might say it was the same...
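
A toy model of that dispatch, in Python and purely illustrative: nothing
here is kernel code, VOP_SAMEFILE() is only the VOP proposed above, and the
class names Vnode, NullfsVnode and OpaqueLayerVnode are hypothetical.

```python
class Vnode:
    """Toy vnode; `lower` is the vnode it is stacked on, or None."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower

    def vop_samefile(self, other):
        # Local vnode: the owning file system compares identity itself.
        return self is other


class NullfsVnode(Vnode):
    """Pass-through layer: hands the question down the stack."""

    def vop_samefile(self, other):
        if self is other:
            return True
        return self.lower.vop_samefile(other)


class OpaqueLayerVnode(Vnode):
    """Stand-in for a crypto or compression layer: never matches below."""

    def vop_samefile(self, other):
        return self is other


def samefile(a, b):
    # Ask both vnodes; a match reported by either one is accepted.
    return a.vop_samefile(b) or b.vop_samefile(a)
```

With base = Vnode("base"), samefile(NullfsVnode("n", lower=base), base)
is true while samefile(OpaqueLayerVnode("c", lower=base), base) is false.
In this simple form two separate nullfs copies of one file do not match
each other, because neither side unwraps the other vnode.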

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research