Date:      Sun, 15 Feb 2004 12:01:56 -0500 (EST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Pawel Jakub Dawidek <pjd@FreeBSD.org>
Cc:        freebsd-arch@FreeBSD.org
Subject:   Re: cvs commit: src/sys/sys jail.h src/sys/kern kern_jail.c vfs_syscalls.c
Message-ID:  <Pine.NEB.3.96L.1040215114204.56481B-100000@fledge.watson.org>
In-Reply-To: <20040215161704.GY14639@garage.freebsd.pl>

On Sun, 15 Feb 2004, Pawel Jakub Dawidek wrote:

> On Sat, Feb 14, 2004 at 10:31:12AM -0800, Robert Watson wrote:
> +>   Committer:	Robert Watson <rwatson@FreeBSD.org>
> +>   Branch:	HEAD
> +> 
> +>   Files:
> +> 	1.36    src/sys/kern/kern_jail.c     
> +> 	1.337   src/sys/kern/vfs_syscalls.c  
> +> 	1.20    src/sys/sys/jail.h           
> +> 
> +>   Log:
> +>   By default, when a process in jail calls getfsstat(), only return the
> +>   data for the file system on which the jail's root vnode is located.
> +>   Previous behavior (show data for all mountpoints) can be restored
> +>   by setting security.jail.getfsstatroot_only to 0.  Note: this also
> +>   has the effect of hiding other mounts inside a jail, such as /dev,
> +>   /tmp, and /proc, but errs on the side of leaking less information.
> 
> I don't like this fix... 
> 
> There are many problems related to the fact that we store the path where
> a file system is mounted as a string.
> 
> This fix is one of them. I wrote a kld module some time ago that shows
> file systems with the leading part of the path stripped (the jail chroot
> directory was removed).  It wasn't a nice, clean way, but... 
> 
> With your fix we still leak where-the-real-root-is information; of
> course it is much better than what we had before, but still not complete. 
> 
> Another problem (filed as a PR somewhere) is that when you mount a file
> system in a chroot environment, the wrong path is stored (a path relative
> to the chroot).  This problem was really important in the past, because
> such a file system could not be unmounted at all; with FSID it can be,
> but the wrong path is still stored. 
> 
> I think the complete way is to store the vnode of the directory where
> the file system is mounted, instead of the directory as a string.  We
> have some ideas to explore in the future, for example allowing file
> system mounts inside a jail if vfs.usermount is 1, and then your fix
> will not be enough.  With such a fix (vnode instead of string), we will
> be able to always return file system names relative to the chroot
> directory.  I'm still not sure if we're able to implement this with our
> current vn_fullpath() implementation, but we can try, or even add a flag
> to this function, DONT_USE_CACHE_JUST_ASK_FILE_SYSTEM_DIRECTLY (as was
> discussed on #thatchannel). Sooner or later we must do this (before
> AUDIT is merged?). 
> 
> I can prepare a patch to change this string to a vnode and we'll see. 
> What do you say? 

Everything involving pathnames and VFS is evil and/or difficult.  This
problem smacks every UNIX system I've seen with regular frequency, and
it's complicated by the following:

- Vnodes may have no name (deleted but referenced files).

- Vnodes may have more than one name (hard links) -- not only that, but
  new names can be created for most objects by unprivileged users.

- Names may have more than one vnode (mountpoint covering, synthetic file 
  systems).

- Cached names become stale easily and cannot be easily updated.

- Names are relative to a process context due to notions of current
  process root and current working directory.

This is further complicated by the fact that UFS and NFS both encourage a
philosophy of names simply being a "path" to reach an object, not a
property of the object.  Trying to change these assumptions will be
extremely difficult, and may also be un-UNIXy.  However, there are some
very strong motivations to find at least a partial solution:

(1) Make mount strings returned by statfs() and getfsstat() make sense
    regardless of context.

(2) Make name pointers in procfs reliable and safe.

(3) Provide accurate path information for security audit logs.

Complications in solving this problem also include locking issues: if you
hold a strong vnode reference, it's generally safe to consult the name
cache to look up "possible" names for an object.  However, reliably asking
the file system for the name of an object in UFS is probably both a
disk-intensive and locking-complex operation (even pre-SMPng).  The cache
is, of course, unreliable for the reasons identified above, and also
because we can push intermediate vnodes in the path out of memory, which
makes it very expensive to pull them back in.

If we lived in a world of HFS+ and volfs as on Darwin, we could cheat by
returning the volfs path to the object, but that's not very useful from a
user perspective, and so is basically useless despite being functionally
correct (mostly).

Finally, you might want to take a look at the implementation of
vn_getpath() on Darwin, which relies on the stronger namespace semantics
of HFS+: all objects really do have parents, vnodes maintain
back-pointers to their parents, and the code can rely on the catalog
entries for the directory tree being in memory (something we sacrifice
for UFS directories for scalability reasons).

So, I guess to conclude after railing: I went with the change I committed
for the reason that it was the simplest change to give the desired result
without increasing the strength of assumptions regarding the existence,
correctness, and usefulness of pathnames.  I agree we need a better
solution, but juggling the traditional UNIX conventions for names and
objects with the requirements of usability and security is hard.  In
earlier revisions of the patch, I did actually update the string for the
root directory before exporting to userspace when masking other file
system entries so that if you typed "df" in the jail, you saw the right
"/" entry.  However, I omitted this in the committed version because it
required the getfsstat() code to know more about how Jails work, whereas
currently there's a simple jail decision function that is invoked by
getfsstat().  I'm willing to explore many different alternative
approaches, but I think we should avoid complexity, and also try to avoid
hurting ourselves too badly on the sharp edges of UNIX namespaces. 
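
To make the trade-off concrete, here is a rough userland model of the two
variants.  It is only a sketch and not the committed kernel code: struct
fs_entry, struct jail_ctx, jail_can_see_fs() and export_fs_list() are all
hypothetical names, and the root_only field merely stands in for
security.jail.getfsstatroot_only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct fs_entry {
	int	fsid;			/* identifies the file system */
	char	mntonname[64];		/* mount point as stored by the host */
};

struct jail_ctx {
	bool	jailed;			/* caller is inside a jail */
	bool	root_only;		/* models getfsstatroot_only */
	int	root_fsid;		/* fs holding the jail's root vnode */
};

/* The simple decision function: may this caller see this file system? */
static bool
jail_can_see_fs(const struct jail_ctx *j, const struct fs_entry *fs)
{
	if (!j->jailed || !j->root_only)
		return (true);
	return (fs->fsid == j->root_fsid);
}

/*
 * Copy the visible entries from in[] to out[] and return how many were
 * copied.  With rewrite_root set, the jail's root file system is reported
 * as "/" (the earlier-revision behaviour); this is the part that forces
 * the export loop to know more about jails than the predicate alone.
 */
static size_t
export_fs_list(const struct jail_ctx *j, const struct fs_entry *in,
    size_t n, struct fs_entry *out, bool rewrite_root)
{
	size_t i, k;

	k = 0;
	for (i = 0; i < n; i++) {
		if (!jail_can_see_fs(j, &in[i]))
			continue;
		out[k] = in[i];
		/* When filtering is active, any visible entry is the root fs. */
		if (rewrite_root && j->jailed && j->root_only)
			strcpy(out[k].mntonname, "/");
		k++;
	}
	return (k);
}
```

With rewrite_root false, the loop only asks a yes/no question and the
host's mount string leaks through unchanged; with it true, "df" in the
jail would show "/", at the cost of jail knowledge in the export path.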

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research
