Date:      Thu, 10 Nov 2011 08:56:03 -0800
From:      Peter Wemm <peter@wemm.org>
To:        Ed Schouten <ed@80386.nl>
Cc:        arch@freebsd.org
Subject:   Re: The strangeness called `sbin'
Message-ID:  <CAGE5yCr3BzWzwOAqo7wifgUTRC%2BG=2o4bDmk9H-%2BCxr=zJqYmw@mail.gmail.com>
In-Reply-To: <20111110123919.GF2164@hoeg.nl>
References:  <20111110123919.GF2164@hoeg.nl>

On Thu, Nov 10, 2011 at 4:39 AM, Ed Schouten <ed@80386.nl> wrote:
> Hi all,
>
> I suspect this email could be one of the last emails I'm sending before
> one of you hire an assassin to get rid of me, but here it goes.
>
> A couple of days ago someone on IRC pointed me to the following
> discussion that is taking place at Fedora right now:
>
>         http://thread.gmane.org/gmane.linux.redhat.fedora.devel/155511/focus%3D155792
>
> Even though I tend to disagree with Lennart's opinions here and there,
> especially on point (h), where he explains there's no advantage of
> decomposing the system into a separate / and /usr, I do agree with the
> fact that `sbin' is a pretty weird thing.
>
> Nowadays the rule of thumb behind `sbin' is that it contains
> applications that are normally only needed by system administrators, but
> there are many tools in FreeBSD that contradict this rule:
>
> - md5(1) should be placed in /bin or /usr/bin, while it is stored in
>   /sbin. It even has a man page in category 1. Very odd.
> - last(1) and w(1) are placed in /usr/bin, while lastlogin(8) and ac(8)
>   are placed in /usr/sbin.
> - Tools like sysctl(8) and ifconfig(8) are usable by non-root users.
> - ...
>
> Now that we're (hopefully) heading into an era where permissions in the
> operating systems become more fine-grained, the distinction between bin
> and sbin will become even more vague.
>
> Similar to the entire bin <-> sbin thing, I think /usr/games is also a
> bit nonsensical, because the games -- including the fortune(6) database
> -- only account for about 3.4 MB and FreeBSD 9 will ship with a clang(1)
> binary that is a factor 8 or so larger. If people really want to get rid
> of the games, they'd be better off running `make delete-old
> WITHOUT_GAMES=' in /usr/src after the installation.
>
> My proposal is as follows:
>
> - Move everything in /sbin to /bin and turn it into a symbolic link
>   pointing to /bin.
> - Move everything in /usr/sbin to /usr/bin and turn it into a symbolic
>   link pointing to /usr/bin.
> - Move everything in /usr/games to /usr/bin and turn it into a symbolic
>   link pointing to /usr/bin.

[Argh, damn it!  I had a huge reply here based on a misunderstanding of
what you were proposing.  I've collected some of the comments that are
still even remotely relevant and tried to correct them; please forgive
any I missed.  I thought you were proposing a symlink farm rather than
linking the dirs.]

There are multiple factors at work, some still relevant, some no longer so.

Once upon a time, there was this thing called $PATH.

For each command you typed at a shell prompt, shells used to do this:

foreach $dir (split(":", $PATH)) {
  execve($dir + "/" + $cmd, $args);
}

Indeed, execvp(), execlp() etc. in libc still do precisely this.  The
more pathname components you jammed into $PATH, the more system calls
and file system operations were involved in every single system() call,
shell script, etc.
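To make the cost concrete, here is a minimal Python sketch of that search loop. It is an illustration, not libc's code: it probes candidates with os.access() instead of calling execve(), so it can count attempts without replacing the running process. The helper names path_candidates() and find_command() are made up for this sketch.

```python
import os


def path_candidates(cmd, path):
    """Return the pathnames an execvp()-style search tries, in PATH order.

    An empty PATH element is treated as the current directory (a legacy
    POSIX convention).
    """
    return [os.path.join(d or ".", cmd) for d in path.split(":")]


def find_command(cmd, path):
    """Return (pathname, attempts) for the first executable match.

    Uses os.access() rather than execve() so the search can be counted;
    returns (None, attempts) when nothing matches.
    """
    attempts = 0
    for candidate in path_candidates(cmd, path):
        attempts += 1  # in the kernel, every attempt is at least one namei() walk
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate, attempts
    return None, attempts


if __name__ == "__main__":
    print(find_command("sh", "/usr/bin:/bin"))
```

On a stock FreeBSD system, where sh lives only in /bin, this should report /bin/sh on the second attempt. Every extra $PATH element ahead of the right one adds another failed lookup.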

There was also not much in the way of a vfs pathname component cache
back then.  A lookup of a component required scanning directories in
the usual case.  They were usually in the buffer cache, but it still
required scans.  Scanning large directory files took longer than
shorter ones, naturally enough, because it had to iterate across every
entry and do a strcmp().  So, keeping directories smaller sped up
shell scripts.
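The linear scan described above can be modelled roughly like this. It is a simplification that counts one comparison per directory entry and ignores on-disk dirent details such as name-length checks. linear_lookup() and path_scan_cost() are hypothetical names.

```python
def linear_lookup(entries, name):
    """Scan directory entries in order, as a lookup without any hashing would.

    Returns (index or None, number of name comparisons performed).
    """
    comparisons = 0
    for index, entry in enumerate(entries):
        comparisons += 1  # one strcmp()-style comparison per directory entry
        if entry == name:
            return index, comparisons
    return None, comparisons


def path_scan_cost(directories, name):
    """Total comparisons to find name when directories are searched in order."""
    total = 0
    for entries in directories:
        index, comparisons = linear_lookup(entries, name)
        total += comparisons
        if index is not None:
            break  # found it; later directories are never scanned
    return total


if __name__ == "__main__":
    bin_dir = ["cat", "ls", "sh"]
    usr_bin = ["awk", "grep", "vi"]
    # Worst case: miss all of bin_dir, then find the last entry of usr_bin.
    print(path_scan_cost([bin_dir, usr_bin], "vi"))  # 6
```

The cost grows with the total size of every directory scanned before the hit, which is why smaller directories made shell scripts faster.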

The bin vs sbin split was done for multiple reasons.  The general rule
was that things that were only interesting or useful to admins, or
required privileges, or were related to system operation, generally
went into sbin.  There was also this other evil thing of putting
binaries in /etc, e.g. /etc/ifconfig, /etc/fsck and horrors like that.
All of that crap was rounded up into sbin.

The goal for bin was to hold the stuff that was relevant to end users
and shell scripts, and to keep it small so their lookups were faster.

Take a random machine and count the directory entries that VOP_LOOKUP()
has to process on a name cache miss:
bin: 47
sbin: 122
usr/bin: 457
usr/sbin: 266

bin + usr/bin: 504
bin + sbin + usr/bin + usr/sbin: 892


In the worst case, a name cache miss for something at the end of the
/usr/bin directory file means scanning all 504 of those dirents.  If
sbin is merged in, the worst case grows to 892 dirents scanned.
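The worst-case arithmetic above, written out. The counts are the ones quoted for that one machine and are not general figures.

```python
# Directory sizes from the machine above (entries per directory).
counts = {"bin": 47, "sbin": 122, "usr/bin": 457, "usr/sbin": 266}

# Worst case today: every entry of bin and usr/bin is compared once.
without_sbin = counts["bin"] + counts["usr/bin"]

# Worst case after the merge: sbin's entries live in bin, usr/sbin's in usr/bin.
with_sbin = sum(counts.values())

print(without_sbin)                        # 504
print(with_sbin)                           # 892
print(round(with_sbin / without_sbin, 2))  # 1.77
```

So on that machine, a worst-case miss after the merge scans roughly 1.77 times as many entries.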

We do have a decent name cache, but my recent time with it makes me
wonder about a few things.  We do a cache_enter() once we find a vnode
to correspond with a name, and vnodes have to be resident in order to
have a cached name.  On machines with working sets on the order of
millions of files, I don't imagine the vnodes for /bin and /usr/bin
hang around for long, so their cached names would be purged quickly.
For the UFS case, there is also UFS_DIRHASH, a band-aid that would
minimize much of this cost.
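The purge effect described above can be illustrated with a toy model. It is a hypothetical simplification, not FreeBSD's vfs_cache.c: vnodes live in a fixed-size LRU pool, and recycling a vnode drops every cached name that refers to it. ToyNameCache and its methods are invented for this sketch. Only the name cache_enter mirrors the kernel function mentioned above, and the signature here is made up.

```python
from collections import OrderedDict


class ToyNameCache:
    """Hypothetical model: a cached (directory, name) pair lives only while the
    vnodes it refers to stay resident in a fixed-size LRU pool.

    A simplification for illustration, not FreeBSD's vfs_cache.c.
    """

    def __init__(self, max_vnodes):
        if max_vnodes < 2:
            raise ValueError("need room for at least a directory and one entry")
        self.max_vnodes = max_vnodes
        self.vnodes = OrderedDict()  # resident vnodes, least recently used first
        self.names = {}              # (directory, name) -> target vnode

    def _touch(self, vnode):
        """Mark vnode as recently used, recycling the oldest vnodes if over budget."""
        self.vnodes[vnode] = True
        self.vnodes.move_to_end(vnode)
        while len(self.vnodes) > self.max_vnodes:
            victim, _ = self.vnodes.popitem(last=False)
            # Recycling a vnode purges every cached name that refers to it.
            self.names = {key: target for key, target in self.names.items()
                          if key[0] != victim and target != victim}

    def cache_enter(self, directory, name, target):
        """Record name -> target once a lookup has found the target vnode."""
        self._touch(directory)
        self._touch(target)
        self.names[(directory, name)] = target

    def lookup(self, directory, name):
        """Return the cached target, or None on a miss (a real miss means a scan)."""
        target = self.names.get((directory, name))
        if target is not None:
            self._touch(directory)
            self._touch(target)
        return target


if __name__ == "__main__":
    cache = ToyNameCache(max_vnodes=3)
    cache.cache_enter("/usr/bin", "md5", "vn:md5")
    print(cache.lookup("/usr/bin", "md5"))             # vn:md5
    cache.cache_enter("/home/u", "notes", "vn:notes")  # pushes /usr/bin out
    print(cache.lookup("/usr/bin", "md5"))             # None
```

With a large working set of other files, the /usr/bin entries get pushed out the same way, and the next lookup falls back to a directory scan.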

There is also another user-visible effect: filename completion in
shells is affected by this.

If I use a shell with a basic user path (_PATH_DEFPATH,
"/usr/bin:/bin", which login.conf overrides) and hit 'm<tab>', it
offers me 17 possible completions.  If I add both sbin dirs to it,
'm<tab>' turns into 61 options.
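A rough way to reproduce that comparison on your own machine is sketched below. It assumes a shell that offers every executable file in $PATH whose name starts with the typed prefix. Real shells differ in details such as builtins, aliases and hashing. completions() is a hypothetical helper.

```python
import os


def completions(prefix, path):
    """Return the distinct executable names in PATH that start with prefix."""
    found = set()
    for d in path.split(":"):
        try:
            with os.scandir(d or ".") as entries:
                for entry in entries:
                    if (entry.name.startswith(prefix)
                            and entry.is_file()
                            and os.access(entry.path, os.X_OK)):
                        found.add(entry.name)
        except OSError:
            continue  # missing or unreadable PATH entries are skipped
    return sorted(found)


if __name__ == "__main__":
    print(len(completions("m", "/usr/bin:/bin")))
    print(len(completions("m", "/usr/bin:/bin:/usr/sbin:/sbin")))
```

The two counts depend entirely on what is installed, so expect different numbers from the 17 and 61 quoted above.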

Of course, that pales in comparison to the impact of adding
/usr/local/bin to the path, but it does show that this has potential
user visibility.  There's also the issue that most users add every
possible directory to their $PATH anyway.

Also, there is still the extra cost of hitting symlinks.  Suppose a
$PATH has sbin earlier than bin.  execlp("md5", "foo", (char *)0) will
find /sbin/md5 via the symlink before the real binary that moved to
/bin/md5.  Path evaluation would look a little like this (read the
code in vfs_lookup.c:namei()):
VOP_LOOKUP("/sbin/md5")
-> namei() discovers sbin is a link and drops out the end of the loop.
-> namei() does a VOP_READLINK() on sbin in "/", which ufs implements
as an inode open, read, close
-> namei() resets the path to /bin/md5 and jumps back to the top of
the loop and starts again with locking the root vnode and iterating
again to find "bin" in "/"
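The restart-on-symlink behaviour in those steps can be mimicked with a toy resolver over an in-memory tree. This is an illustrative model only: the real namei() also locks vnodes, consults the name cache and handles many more cases. resolve() and the tree layout are hypothetical. The sketch counts per-component lookups and readlinks so the extra work for /sbin/md5 shows up.

```python
def resolve(tree, path, max_links=32):
    """Resolve an absolute path over a toy tree, namei()-style.

    tree maps a directory path to {name: ("dir",) | ("file",) | ("link", target)}.
    Returns (resolved_path, component_lookups, readlinks); raises KeyError for
    a missing component.
    """
    lookups = readlinks = 0
    pending = [c for c in path.split("/") if c]
    current = "/"
    while pending:
        name = pending.pop(0)
        lookups += 1  # one directory lookup per path component
        kind = tree[current][name]
        if kind[0] == "link":
            readlinks += 1  # the VOP_READLINK() step
            if readlinks > max_links:
                raise OSError("too many levels of symbolic links")
            target = kind[1]
            if target.startswith("/"):
                current = "/"  # absolute target: start over at the root
            # Splice the link target in front of the remaining components.
            pending = [c for c in target.split("/") if c] + pending
            continue
        current = current.rstrip("/") + "/" + name
    return current, lookups, readlinks


tree = {
    "/": {"bin": ("dir",), "sbin": ("link", "/bin")},
    "/bin": {"md5": ("file",)},
}

if __name__ == "__main__":
    print(resolve(tree, "/bin/md5"))   # ('/bin/md5', 2, 0)
    print(resolve(tree, "/sbin/md5"))  # ('/bin/md5', 3, 1)
```

Going through the /sbin link costs one extra component lookup plus a readlink compared with naming /bin/md5 directly.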

Having said all that: there are reasons why it was done that way, but I
suspect the costs of the change are something we can eat, and they will
be lost in the other noise in the system.

It's worth keeping in mind though that lots of incremental "small
costs" add up over time when they're done in many many places.

Is it really worth it, though?  Perhaps fix the couple of oddball cases
instead (e.g. md5, lastlogin and friends)?  ac used to require access
to privileged files because of privacy concerns on shared user systems.
-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell


