From: Peter Wemm <peter@wemm.org>
To: Ed Schouten
Cc: arch@freebsd.org
Date: Thu, 10 Nov 2011 08:56:03 -0800
Subject: Re: The strangeness called `sbin'

On Thu, Nov 10, 2011 at 4:39 AM, Ed Schouten wrote:
> Hi all,
>
> I suspect this email could be one of the last emails I'm sending before
> one of you hire an assassin to get rid of me, but here it goes.
>
> A couple of days ago someone on IRC pointed me to the following
> discussion that is taking place at Fedora right now:
>
>         http://thread.gmane.org/gmane.linux.redhat.fedora.devel/155511/focus%3D155792
>
> Even though I tend to disagree with Lennart's opinions here and there,
> especially on point (h), where he explains there's no advantage of
> decomposing the system into a separate / and /usr, I do agree with the
> fact that `sbin' is a pretty weird thing.
>
> Nowadays the rule of thumb behind `sbin' is that it contains
> applications that are normally only needed by system administrators, but
> there are many tools in FreeBSD that contradict this rule:
>
> - md5(1) should be placed in /bin or /usr/bin, while it is stored in
>   /sbin. It even has a man page in category 1. Very odd.
> - last(1) and w(1) are placed in /usr/bin, while lastlogin(8) and ac(8)
>   are placed in /usr/sbin.
> - Tools like sysctl(8) and ifconfig(8) are usable by non-root users.
> - ...
>
> Now that we're (hopefully) heading into an era where permissions in the
> operating systems become more fine-grained, the distinction between bin
> and sbin will become even more vague.
>
> Similar to the entire bin <-> sbin thing, I think /usr/games is also a
> bit nonsensical, because the games -- including the fortune(6) database
> -- only account for about 3.4 MB and FreeBSD 9 will ship with a clang(1)
> binary that is a factor 8 or so larger. If people really want to get rid
> of the games, they'd be better off running `make delete-old
> WITHOUT_GAMES=' in /usr/src after the installation.
>
> My proposal is as follows:
>
> - Move everything in /sbin to /bin and turn it into a symbolic link
>   pointing to /bin.
> - Move everything in /usr/sbin to /usr/bin and turn it into a symbolic
>   link pointing to /usr/bin.
> - Move everything in /usr/games to /usr/bin and turn it into a symbolic
>   link pointing to /usr/bin.

[Argh, damn it!
I had a huge reply here based on a misunderstanding of what you were
proposing. I've collected some of the comments that are still even
remotely relevant and tried to correct them; please forgive any I
missed. I thought you were proposing a symlink farm rather than linking
the dirs.]

There are multiple factors at work, some still relevant, some no longer
so.

Once upon a time, there was this thing called $PATH. For each command
you typed at a shell prompt, shells used to do this:

    foreach $dir (split(":", $PATH)) {
        execve($dir + "/" + $cmd, $args);
    }

Indeed, execvp, execlp, etc. in libc still do precisely this. The more
directories you jammed into $PATH, the more system calls and file system
operations were involved in every single system(), shell script, etc.

There was also not much in the way of a vfs pathname component cache
back then. A lookup of a component required scanning directories in the
usual case. They were usually in the buffer cache, but it still required
scans. Scanning large directory files took longer than scanning small
ones, naturally enough, because it had to iterate across every entry and
do a strcmp(). So, keeping directories smaller sped up shell scripts.

The bin vs sbin split was done for multiple reasons. The general rule was
that things that were only interesting or useful to admins, or required
privileges, or were related to system operation, generally went into
sbin. There was this other evil thing of putting binaries in /etc, e.g.
/etc/ifconfig, /etc/fsck and horrors like that. All of that crap was
rounded up into sbin. The goal for bin was to hold the stuff that was
relevant to end users and shell scripts, and to keep it small so they
ran faster.

Take a random machine and count the directory entries that VOP_LOOKUP()
has to process on a name cache miss:

    /bin                                   47
    /sbin                                 122
    /usr/bin                              457
    /usr/sbin                             266
    /bin + /usr/bin                       504
    /bin + /sbin + /usr/bin + /usr/sbin   892

A name cache miss for something at the end of the /usr/bin directory
file causes a 504-entry scan: all of /bin, then all of /usr/bin.
If sbin is merged in, that's 892 dirents scanned in the worst case.

We do have a decent name cache, but my recent time with it makes me
wonder about a few things. We do a cache_enter() once we find a vnode
that corresponds to a name. Vnodes have to be resident in order to have
a cached name. On machines with working sets on the order of millions of
files, I don't imagine the vnodes for /bin and /usr/bin hang around very
long, so their cache entries would be purged quickly. There is also
UFS_DIRHASH as a band-aid that would minimize the cost of a bunch of
this, for the UFS case.

There is also another user-visible effect: filename completion in
shells. If I use a shell with a basic user path (_PATH_DEFPATH,
"/usr/bin:/bin", which is overridden by login.conf) and complete on 'm',
it offers me 17 possible completions. If I add both sbin dirs to it, 'm'
turns into 61 options. Of course, that pales in comparison to the impact
of adding /usr/local/bin to the path, but it does show the change is
potentially user-visible. And there's also the issue that most users add
every possible directory to their $PATH anyway.

Also... there is still the extra cost of hitting symlinks. Suppose a
$PATH has sbin earlier than bin. execlp("md5", "foo", 0) will find
/sbin/md5 via the symlink before the real binary that moved to /bin/md5.
Path evaluation would look a little like this (read the code in
vfs_lookup.c:namei()):

    VOP_LOOKUP("/sbin/md5")
    -> namei() discovers sbin is a link and drops out the end of the loop.
    -> namei() does a VOP_READLINK() on sbin in "/", which ufs implements
       as an inode open, read, close.
    -> namei() resets the path to /bin/md5, jumps back to the top of the
       loop and starts again, locking the root vnode and iterating again
       to find "bin" in "/".

Having said all that... there are reasons why it was done that way, but
I suspect the costs of the change are something we can eat, and they
will be lost in the other noise in the system.
It's worth keeping in mind, though, that lots of incremental "small
costs" add up over time when they're done in many, many places.

Is it really worth it, though? Perhaps fix the couple of oddball cases
instead (e.g. md5, lastlogin and friends)? ac used to require access to
privileged files due to privacy concerns on shared user systems.

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete themselves
upon execution." -- Robert Sewell