From: Garance A Drosihn
To: Richard Caley, freebsd-stable@freebsd.org
Date: Sun, 6 Jun 2004 19:32:01 -0400
Subject: Re: Find / cd mount bug?
In-Reply-To: <200406040806.i5486UYG081359@pele.r.caley.org.uk>

At 9:06 AM +0100 6/4/04, Richard Caley wrote:
>I thought this weird behaviour might be of interest to anyone
>working on the relevent bits of code.
...
>
>  $ find /cdrom -name \*.mp3 | wc
>       49      49    3387
>
>  $ find /cdrom -name \*.mp3 -type f | wc
>       77      77    5214
>
>Ie adding an extra restriction increases the number of results.
>The first one isn't giving an error at the point where it stops.

This is almost certainly an optimization in `find', where it
expects that the link-count for a directory is equal to the
number of sub-directories + 2.  The +2 is assumed to be '.'
and '..'.  But some file systems do not really have links for
'.' and '..'.

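To make the assumption above concrete, here is a minimal sketch in
Python.  It is not the code in `find' (which is not examined here);
the function name, the (name, is_dir) entry list, and the link counts
used with it are all hypothetical, chosen only to illustrate a walker
that trusts "link-count = sub-directories + 2".

```python
def subdirs_found_with_shortcut(nlink, entries):
    """Simulate a walker that trusts nlink == sub-directories + 2.

    nlink   -- link count reported for the directory (st_nlink)
    entries -- list of (name, is_dir) pairs, in directory order
    Returns (sub-directory names found, number of simulated stat() calls).
    """
    remaining = nlink - 2      # assumption: '.' and '..' account for two links
    found = []
    stat_calls = 0
    for name, is_dir in entries:
        if remaining <= 0:
            break              # assumption: everything left is a non-directory
        stat_calls += 1        # stands in for a real stat() on this entry
        if is_dir:
            found.append(name)
            remaining -= 1
    return found, stat_calls


# Hypothetical directory: three sub-directories interleaved with three files.
entries = [("a", True), ("f1", False), ("b", True),
           ("f2", False), ("c", True), ("f3", False)]

# Link count that includes '.' and '..' (3 sub-directories + 2).
print(subdirs_found_with_shortcut(5, entries))

# Link count from a hypothetical file system that does not count them.
print(subdirs_found_with_shortcut(3, entries))
```

With a link count of 5 the sketch finds all three sub-directories and
stops before the last file; with a link count of 3 it stops after the
first sub-directory and never looks at the other two.
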
What happens is that something in `find' believes that once it
has found "link-count - 2" directories, then it must have found
all the real sub-directories in that directory.  By making this
assumption, it can avoid doing a lot of unnecessary stat() calls.
But on file systems which do *not* have links for '.' and '..',
this will result in `find' skipping the last two real
sub-directories -- without thinking that any error has occurred.

By adding the extra restriction, `find' *must* do the extra
stat() calls anyway, so it cannot perform this optimization.

Note that in some situations, this optimization can result in
a pretty significant performance boost, so it is worth doing
(when it works correctly... :-).

I know this problem shows up on some CD-ROM file systems, but I
don't know if it happens on all of them.  I also haven't looked
into `find' specifically, but I know I ran into this on some
other programs which perform this optimization.

-- 
Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu