Date: Sun, 15 Feb 2004 12:01:56 -0500 (EST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Cc: freebsd-arch@FreeBSD.org
Subject: Re: cvs commit: src/sys/sys jail.h src/sys/kern kern_jail.c vfs_syscalls.c
Message-ID: <Pine.NEB.3.96L.1040215114204.56481B-100000@fledge.watson.org>
In-Reply-To: <20040215161704.GY14639@garage.freebsd.pl>

On Sun, 15 Feb 2004, Pawel Jakub Dawidek wrote:

> On Sat, Feb 14, 2004 at 10:31:12AM -0800, Robert Watson wrote:
> +> Commiter: Robert Watson <rwatson@FreeBSD.org>
> +> Branch: HEAD
> +>
> +> Files:
> +> 1.36 src/sys/kern/kern_jail.c
> +> 1.337 src/sys/kern/vfs_syscalls.c
> +> 1.20 src/sys/sys/jail.h
> +>
> +> Log:
> +> By default, when a process in jail calls getfsstat(), only return the
> +> data for the file system on which the jail's root vnode is located.
> +> Previous behavior (show data for all mountpoints) can be restored
> +> by setting security.jail.getfsstatroot_only to 0. Note: this also
> +> has the effect of hiding other mounts inside a jail, such as /dev,
> +> /tmp, and /proc, but errs on the side of leaking less information.
>
> I don't like this fix...
>
> There are many problems related to the fact, that we store path where
> file system is mounted as a string.
>
> This fix is one of them. I've wrote kld module some time ago that shows
> file systems with cutted path in front (jail chroot directory was
> removed). This wasn't a nice, clean way, but...
>
> In your fix we still leak of where-the-real-root-is information, of
> course it is much better than we had before, but still not complete.
>
> Another problem (changing as PR somewhere) is that when you mount file
> system in chroot environment, wrong path is stored (path releated to
> chroot). This problem was really important in the past, because such
> file system was totally unmountable, with FSID it is, but wrong path
> still exists.
>
> I think the complete way is to store vnode related to the directory
> where file system is mounted, instead of directory as a string. We have
> some ideas to explore in future, for example allowing file systems
> mounts inside of jail if vfs.usermount is 1 and then your fix will not
> be enough. With such fix (vnode instead of string), we will be able to
> always return file system names related to chroot directory. I'm still
> not sure if we're able to implement this with our current vn_fullpath()
> implementation, but we can try, or more - we can try to add a flag to
> this function DONT_USE_CACHE_JUST_ASK_FILE_SYSTEM_DIRECTLY (as was
> discussed on #thatchannel). Sooner or later we must do this (before
> AUDIT will be merged?).
>
> I can prepare a patch to change this string to a vnode and we'll see.
> What you say?

Everything involving pathnames and VFS is evil and/or difficult. This
problem smacks every UNIX system I've seen with regular frequency, and
it's complicated by the following:

- Vnodes may have no name (deleted but referenced files).

- Vnodes may have more than one name (hard links) -- not only that, but
  new names can be created for most objects by unprivileged users.

- Names may have more than one vnode (mountpoint covering, synthetic file
  systems).

- Cached names become stale easily and cannot be easily updated.

- Names are relative to a process context due to notions of current
  process root and current working directory.

This is further complicated by the fact that UFS and NFS both encourage a
philosophy of names simply being a "path" to reach an object, not a
property of the object. Trying to change these assumptions will both be
extremely difficult, and may also be un-UNIXy. However, there are some
very strong motivations to find at least a partial solution:

(1) Make mount strings returned by fsstat() and getfsstat() make sense
    regardless of context.

(2) Make name pointers in procfs reliable and safe.

(3) Provide accurate path information for security audit logs.

Complications in solving this problem also include locking issues: it's
generally safe to access the name cache if you have a strong vnode
reference to look up "possible" names for an object. However, asking the
file system for the name of an object reliably in UFS is probably both a
disk-intensive and locking-complex operation (even pre-SMPng).
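
To make the first two items in the list above concrete, here is a minimal
userland sketch (Python, standard library only). It assumes a POSIX file
system that supports hard links, and the function name name_vnode_demo is
made up for illustration. It says nothing about kernel internals; it only
shows one object acquiring two names and then having none while still in
use.

```python
import os
import tempfile


def name_vnode_demo():
    """Return (links_with_two_names, same_inode, links_after_unlink, data)."""
    workdir = tempfile.mkdtemp()
    first = os.path.join(workdir, "a")
    second = os.path.join(workdir, "b")

    # Create a file and keep a descriptor open on it.
    fd = os.open(first, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, b"payload")

        # Hard link: the same object now has two equally valid names.
        os.link(first, second)
        links_with_two_names = os.fstat(fd).st_nlink
        same_inode = os.stat(first).st_ino == os.stat(second).st_ino

        # Remove both names: the object survives, nameless, while fd is open.
        os.unlink(first)
        os.unlink(second)
        links_after_unlink = os.fstat(fd).st_nlink
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 16)
    finally:
        os.close(fd)
        os.rmdir(workdir)
    return links_with_two_names, same_inode, links_after_unlink, data
```

Any scheme that maps an object back to "its" path has to decide what to
report in both situations: which of several names to pick, and what to say
when no name is left.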

The cache is, of course, unreliable for the above-identified reasons, and
also because we can push intermediate vnodes in the path out of memory,
meaning that it's a very expensive operation to pull them back in. If we
lived in a world of HFS+ and volfs as on Darwin, we could cheat by
returning the volfs path to the object, but that's not very useful from a
user perspective, and so is basically useless despite being functionally
correct (mostly).

Finally, you might want to take a look at the implementation of
vn_getpath() on Darwin, which relies on the stronger namespace semantics
of HFS+, where all objects really do have parents, they maintain vnode
back-pointers to parents, and can rely on the catalog entries for the
directory tree being in memory (something we sacrifice for UFS directories
for scalability reasons).

So, I guess to conclude after railing: I went with the change I committed
for the reason that it was the simplest change to give the desired result
without increasing the strength of assumptions regarding the existence,
correctness, and usefulness of pathnames. I agree we need a better
solution, but juggling the traditional UNIX conventions for names and
objects with the requirements of usability and security is hard.

In earlier revisions of the patch, I did actually update the string for
the root directory before exporting to userspace when masking other file
system entries so that if you typed "df" in the jail, you saw the right
"/" entry. However, I omitted this in the committed version because it
required the getfsstat() code to know more about how Jails work, whereas
currently there's a simple jail decision function that is invoked by
getfsstat().

I'm willing to explore many different alternative approaches, but I think
we should avoid complexity, and also try to avoid hurting ourselves too
badly on the sharp edges of UNIX namespaces.
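
As a rough illustration of the string rewriting discussed in this thread
(reporting mount points relative to the jail root), here is a minimal
userland sketch in Python. It is neither the committed kernel code nor the
earlier revision of the patch; jail_relative and the example paths are
hypothetical. Because it works purely on stored strings, it inherits the
weaknesses already described, such as mount strings recorded relative to
some other chroot.

```python
import posixpath


def jail_relative(mount_path, jail_root):
    """Return mount_path as seen from inside jail_root.

    Returns None when the mount point lies outside the jail root, which a
    caller could treat as "hide this entry".
    """
    mount_path = posixpath.normpath(mount_path)
    jail_root = posixpath.normpath(jail_root)

    # An unjailed process (root "/") sees every path unchanged.
    if jail_root == "/":
        return mount_path

    # The file system holding the jail root is reported as "/".
    if mount_path == jail_root:
        return "/"

    # Compare on a component boundary so "/usr/jails/www2" is not
    # mistaken for something below "/usr/jails/www".
    if mount_path.startswith(jail_root + "/"):
        return mount_path[len(jail_root):]

    return None
```

For example, jail_relative("/usr/jails/www/dev", "/usr/jails/www") yields
"/dev", while a mount outside the jail yields None and would be hidden.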

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research