Date:      Fri, 23 Feb 2007 13:16:54 -0800
From:      Brian Somers <brian@Awfulhak.org>
To:        Kris Kennaway <kris@obsecurity.org>
Cc:        cvs-src@FreeBSD.org, src-committers@FreeBSD.org, Brian Somers <brian@FreeBSD.org>, cvs-all@FreeBSD.org
Subject:   Re: cvs commit: src/sys/ufs/ffs ffs_alloc.c ffs_softdep.c
Message-ID:  <20070223131654.489d5ae6@conflict.ca.sophos.com>
In-Reply-To: <20070223204112.GA88584@xor.obsecurity.org>
References:  <200702232023.l1NKNaeZ086158@repoman.freebsd.org> <20070223204112.GA88584@xor.obsecurity.org>

On Fri, 23 Feb 2007 15:41:12 -0500
Kris Kennaway <kris@obsecurity.org> wrote:

> On Fri, Feb 23, 2007 at 08:23:36PM +0000, Brian Somers wrote:
> > brian       2007-02-23 20:23:36 UTC
> > 
> >   FreeBSD src repository
> > 
> >   Modified files:
> >     sys/ufs/ffs          ffs_alloc.c ffs_softdep.c 
> >   Log:
> >   Account for di_blocks allocations when IN_SPACECOUNTED is set in an
> >   inode's i_flag.
> >   
> >   It's possible that after ufs_inactive() calls softdep_releasefile(),
> >   i_nlink stays >0 for a considerable amount of time (> 60 seconds here).
> >   During this period, any ffs allocation routines that alter di_blocks
> >   must also account for the blocks in the filesystem's fs_pendingblocks
> >   value.
> >   
> >   This change fixes an eventual df/du discrepancy that will happen as
> >   the result of fs_pendingblocks being reduced to <0.
> >   
> >   The only manifestation of this that people may recognise is the
> >   following message on boot:
> >   
> >       /somefs: update error: blocks -N files M
> >   
> >   at which point the negative pending block count is adjusted to zero.
> 
> \o/ I hate that bug!

As do I!  As a result of the bug, all Sophos mail appliance
customers had to suffer bi-weekly reboots for the past year
(well, it was hardly the fault of this bug initially!).

It took weeks to fix -- I have never been able to reproduce
the problem on demand and had to resort to inserting copious
amounts of diagnostics on several machines then sitting around
until ffsinfo said one of the machines had "bitten".

Until recently, even that strategy didn't work as our test
machines just wouldn't see the problem.  We recently released
a less powerful version of the appliance, and only then did
we see the problem reasonably frequently (between 4 and 24
hours usually).
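
For anyone who wants the accounting in the commit log spelled out,
here is a toy sketch in C.  It is not the FreeBSD code: toy_fs,
toy_inode and every toy_* function are hypothetical stand-ins for
struct fs, struct inode and the real routines, and the behaviour is
only what the log message describes (blocks counted as pending at
release time, an allocation growing the block count afterwards, and
the later free subtracting everything).  The "fixed" flag is likewise
an invention of the sketch, used to switch the extra accounting on.

```c
#include <assert.h>

/* Hypothetical stand-ins for the fs_pendingblocks and di_blocks fields. */
struct toy_fs    { long pendingblocks; };
struct toy_inode { long blocks; int spacecounted; };

/* Assumed model of the release step: count the inode's blocks as pending. */
static void toy_releasefile(struct toy_fs *fs, struct toy_inode *ip)
{
    fs->pendingblocks += ip->blocks;
    ip->spacecounted = 1;               /* plays the role of IN_SPACECOUNTED */
}

/* An allocation that grows the inode after it has been released. */
static void toy_alloc(struct toy_fs *fs, struct toy_inode *ip, long n, int fixed)
{
    ip->blocks += n;
    if (fixed && ip->spacecounted)      /* the extra accounting the log describes */
        fs->pendingblocks += n;
}

/* The eventual free: every block the inode holds leaves the pending count. */
static void toy_free_all(struct toy_fs *fs, struct toy_inode *ip)
{
    assert(ip->blocks >= 0);
    fs->pendingblocks -= ip->blocks;
    ip->blocks = 0;
    ip->spacecounted = 0;
}

/* Release a 16-block inode, allocate 8 more blocks, then free it all. */
long toy_run(int fixed)
{
    struct toy_fs fs = { 0 };
    struct toy_inode ino = { 16, 0 };

    toy_releasefile(&fs, &ino);
    toy_alloc(&fs, &ino, 8, fixed);
    toy_free_all(&fs, &ino);
    return fs.pendingblocks;
}
```

In this model toy_run(0) returns -8, the negative pending count from
the log message, and toy_run(1) returns 0.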

-- 
Brian Somers                                       <brian@Awfulhak.org>
Don't _EVER_ lose your sense of humour !            <brian@FreeBSD.org>


