Date:      Fri, 03 Oct 2014 22:08:55 -0600
From:      Ian Lepore <ian@FreeBSD.org>
To:        NGie Cooper <yaneurabeya@gmail.com>
Cc:        "src-committers@freebsd.org" <src-committers@freebsd.org>, svn-src-stable-10@freebsd.org, svn-src-stable@freebsd.org, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, Glen Barber <gjb@freebsd.org>, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: svn commit: r272372 - stable/10/bin/rm
Message-ID:  <1412395735.12052.104.camel@revolution.hippie.lan>
In-Reply-To: <CAGHfRMCLO-i%2BFpmrTF6RzAPb1owECKG9Joh6cRZd9c8Cc0%2BJRw@mail.gmail.com>
References:  <201410011618.s91GIfR5071251@svn.freebsd.org> <20141002141656.Y1807@besplex.bde.org> <20141002061628.GO1275@hub.FreeBSD.org> <CAGHfRMCLO-i%2BFpmrTF6RzAPb1owECKG9Joh6cRZd9c8Cc0%2BJRw@mail.gmail.com>

On Wed, 2014-10-01 at 23:25 -0700, NGie Cooper wrote:
> On Wed, Oct 1, 2014 at 11:16 PM, Glen Barber <gjb@freebsd.org> wrote:
> > On Thu, Oct 02, 2014 at 02:56:05PM +1000, Bruce Evans wrote:
> >> On Wed, 1 Oct 2014, Glen Barber wrote:
> >>
> >> >Log:
> >> > MFC r268376 (imp):
> >> >
> >> >   rm -rf can fail sometimes with an error from fts_read. Make it
> >> >   honor fflag to ignore fts_read errors, but stop deleting from
> >> >   that directory because no further progress can be made.
> >>
> >> I asked for this to be backed out in -current.  It is not suitable for MFC.
> >>
> >
> > It fixes an immediate issue that prevents high-concurrent make(1) jobs
> > from stomping over each other.
> 
> The real problem is noted in this bug:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=192490 ; the
> SUBDIR_PARALLEL logic is executing multiple instances of the
> clean/cleandir .PHONY targets.
> 
> I agree with bde@ that this commit is papering over a bigger problem,
> but it's annoying enough and causes enough false positives that I
> understand why gjb@ MFCed the commit imp@ did in head.
> 
> Thank you..
> 

I agree that the change to rm only papers over the real problem.  I
think bug 192490 should be re-opened so that we can address the actual
build system problem, but I don't know if that's allowed for in the new
bugzilla workflows, or if we need to open a new report now.

I've been digging into the actual build system problem this evening, and
I'm starting to think that all the reported failures that contain enough
of the log to be useful show that the build failed in a directory that
has subdirectories.  That is, one of the failures appeared to be caused
by rm -rf running concurrently in usr.bin/lex and usr.bin/lex/lib.
Another failure involves modules/aic7xxx and modules/aic7xxx/ahc.  In
another log it appeared that ata/atapci/chipsets was being deleted
simultaneously with ata/atapci/chipsets/ataacard and several other
subdirs under chipsets/.
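
To make the pattern above concrete, here is a minimal sketch (not part of
the build system) of a check for it: given a list of directories that were
being cleaned concurrently, report every pair where one directory lies
inside another. The function name nested_pairs and the sample list are
hypothetical; the sample paths are taken from the failures described above,
plus one unrelated directory (usr.bin/make) for contrast.

```python
from pathlib import PurePosixPath


def nested_pairs(paths):
    """Return (parent, child) pairs where child lies strictly below parent.

    Duplicate entries are collapsed first, so a path is never paired
    with itself.
    """
    dirs = sorted(set(paths))
    pairs = []
    for parent in dirs:
        for child in dirs:
            # PurePosixPath.parents lists every ancestor of a path,
            # so this matches grandchildren as well as direct children.
            if PurePosixPath(parent) in PurePosixPath(child).parents:
                pairs.append((parent, child))
    return pairs


# Hypothetical sample: directories assumed to be cleaned at the same time.
cleaned = [
    "usr.bin/lex",
    "usr.bin/lex/lib",
    "modules/aic7xxx",
    "modules/aic7xxx/ahc",
    "usr.bin/make",
]

for parent, child in nested_pairs(cleaned):
    print(f"{parent} contains {child}")
```

Any pair it reports is a place where an rm -rf in the parent could race
with an rm -rf in the child. The comparison is purely lexical, so it
assumes all paths are written relative to the same root.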

If this is indeed the cause of the problem I'm too tired right now to
think of a fix, but I at least wanted to get what I found in writing
before I sleep and completely forget it. :)

BTW, I didn't see any evidence that the exact same path was being
multiply deleted at the same time.  That is, no duplicated entries in
SUBDIR lists or accidentally processing the entire sys/modules hierarchy
twice in parallel somehow through two different parent paths or anything
like that.

-- Ian
