Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 15 Apr 2014 16:31:33 -0700
From:      Doug Ambrisko <ambrisko@ambrisko.com>
To:        Jilles Tjoelker <jilles@stack.nl>
Cc:        FreeBSD Hackers <freebsd-hackers@FreeBSD.org>, kib@FreeBSD.org, Bryan Drewery <bdrewery@FreeBSD.org>
Subject:   Re: Fix MNAMELEN or reimplement struct statfs
Message-ID:  <20140415233133.GA14686@ambrisko.com>
In-Reply-To: <20140415204348.GA89088@stack.nl>
References:  <5348952A.3080304@FreeBSD.org> <20140415190302.GA21076@ambrisko.com> <20140415204348.GA89088@stack.nl>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Apr 15, 2014 at 10:43:48PM +0200, Jilles Tjoelker wrote:
| On Tue, Apr 15, 2014 at 12:03:02PM -0700, Doug Ambrisko wrote:
| > On Fri, Apr 11, 2014 at 08:21:46PM -0500, Bryan Drewery wrote:
| > | On Tue, Nov 19 21:53:38 UTC 2013 Jilles Tjoelker wrote:
| > | > On Mon, Nov 18, 2013 at 11:01:42AM -0800, Doug Ambrisko wrote:
| > | >> On Sat, Nov 16, 2013 at 08:31:29PM +0200, Konstantin Belousov wrote:
| > | >> | I think that struct mount should have a const char * field where the
| > | >> | non-trimmed path is stored and used for match at unmount. f_mntonname
| > | >> | truncation would be only unfortunate user interface glitch.
| 
| > | >> Note that we are not storing the path in mount structure so no structures
| > | >> have changed which is nice since then we haven't introduced any real
| > | >> ABI breakage.  So we could MFC this.  The match isn't critical since
| > | >> umount will fall back to fsid and work.  One thing that might be good to
| > | >> do is change umount to try to umount via fsid first and then do the
| > | >> match if the fsid failed versus the other way round that it does now.
| 
| > | >> The problem I see is if someone tries to do things based on the parsed
| > | >> output of mount/df then that will fail since the output is truncated.
| 
| > | > As noted in comments in sbin/umount/umount.c, the statfs() call is
| > | > deliberately after the mount list checks because it may block forever
| > | > for unresponsive NFS servers. It would be unfortunate if hung NFS
| > | > filesystems would have to be forcibly unmounted by copy/pasting the fsid
| > | > from 'mount -v'.
| 
| > | From a user perspective, I'd love to see this get committed and MFC'd.
| > | It's very odd to have ENAMETOOLONG errors while traversing .zfs/snapshot.
| 
| > I have a new patch at:
| > 	http://people.freebsd.org/~ambrisko/mount_bigger_2.patch
| > that I tested against head.  This should be pretty close to commiting
| > unless people find some issues with it.
| 
| In sys/kern/vfs_mount.c:
| +		mp->mnt_path = malloc(strlen(fspath), M_MOUNT, M_WAITOK);
| +		strlcpy((char *)mp->mnt_path, fspath, strlen(fspath));
| 
| This always strips the last byte off the fspath.
| 
| I like that this only touches the kernel, so it does not break anything
| regarding mount/umount of filesystems with short paths, including
| (NFS) filesystems that do not respond.
| 
| The patch does not enlarge f_mntfromname which may be a problem for
| nullfs. It is certainly a step forwards for poudriere but [ENAMETOOLONG]
| errors could still occur in more extreme situations.

Good point on nullfs.  I'll look at fixing that.  To do that I'm
changing mnt_path to mnt_topath so then I can have a mnt_frompath.
I'll add nullfs to my test cases.  I'll need to run through the uses
of f_mntfromname.  It was pretty easy with f_mntonname since it was
only allocated in one place just used a bunch of other place.  I assume
that mount root would be short.
 
| > | However, this would make the situation worse for poudriere, which is
| > | what this particular thread was started on. It does exactly what you
| > | worry about, it parses mount(1) output and umounts all descendants for a
| > | given path. We do the same thing at my work for our base build system,
| > | and I believe FreeNAS is doing something like this. So it's not uncommon.
| 
| > | Or did the situation improve with the latest patch to show the full
| > | mount path in mount(1)?
| 
| > Not yet but could be address in a subsequent enhancement.  The framework
| > in the kernel to handle longer paths and track the full path name is
| > there now.  Changes will need to be done in the structure passed to mount.
| > This can be done if we implement a statvfs structure that can pass this
| > data back.  Since we don't have statvfs really defined we can implement
| > it to deal with longer paths.  Then mount can be updated to show the full
| > path.
| 
| > | We could implement umount -r, but I'm not convinced that is adequate due
| > | to unknown use of mount(1) output. We really need the real path exported
| > | to userland somehow, and preferably to mount(1) by default to not break
| > | scripts.
| 
| > | This may not be a clean solution, but couldn't we add another syscall,
| > | say getfsmntpath(2), and have mount(1) use both statfs(2) and
| > | getfsmntpath(2) to show a proper output?
| 
| > I think we can do this with statvfs to give full path names.
| 
| I agree that a new syscall or similar is needed in a later patch but
| disagree that it should be statvfs(). To call statvfs(), the
| filesystem's path must already be known and the filesystem (and all
| parent filesystems) must respond. In fact, struct statvfs does not
| include any pathnames at all.
| 
| If it is acceptable that only root can query the pathnames, a simple
| sysctl can map f_fsid to mnt_path (and the extended f_mntfromname). This
| can then be used together with statfs(2), fstatfs(2), getfsstat(2) or
| getmntinfo(3) (but why would one use the latter?).
| 
| It is possible to extend getfsstat(2) (but not statfs(2)/fstatfs(2)) to
| use an f_spare field to point to long pathnames stored elsewhere in the
| buffer (using the flags argument to enable the new behaviour).
| 
| Simply increasing MNAMELEN would make struct statfs over 2 kilobytes
| large, which would lead to a rather large result from getfsstat(2) on a
| machine with many filesystems mounted.

We don't want to change statfs since that causes other ripple effects
due to binary layout changes.  From what I remember when I last looked
(which was quite a while ago) is that our statvfs is a stub and not a full
structure like NetBSD has.  NetBSD switched from statfs to statvfs and
made things bigger.  Atleast that is what I remember when I looked.

It's good to start thinking about for phase 2.

Thanks,

Doug A.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140415233133.GA14686>