Date:      Tue, 16 Jan 2007 16:20:48 -0500
From:      Kris Kennaway <kris@obsecurity.org>
To:        Doug Ambrisko <ambrisko@ambrisko.com>
Cc:        Scott Oertel <freebsd@scottevil.com>, Willem Jan Withagen <wjw@digiware.nl>, freebsd-stable@freebsd.org, Kris Kennaway <kris@obsecurity.org>
Subject:   Re: running mksnap_ffs
Message-ID:  <20070116212048.GA1041@xor.obsecurity.org>
In-Reply-To: <200701162117.l0GLHXOS062816@ambrisko.com>
References:  <20070116203739.GA343@xor.obsecurity.org> <200701162117.l0GLHXOS062816@ambrisko.com>


On Tue, Jan 16, 2007 at 01:17:33PM -0800, Doug Ambrisko wrote:
> Kris Kennaway writes:
> | On Tue, Jan 16, 2007 at 09:26:47PM +0100, Willem Jan Withagen wrote:
> | > Doug Ambrisko wrote:
> | > >| > or things can get wedged.  We have some other patches as well that
> | > >| > might be required.  As a hack on a local server we have been using
> | > >| > snap shots to do a "hot" back-up of a data base each morning.  This
> | > >| > is based on 6.x.
> | > >|
> | > >| What do you mean by "get wedged"?  Are you seeing a deadlock, and if
> | > >| so then what are the details?  When you say 6.x, do you mean
> | > >| up-to-date RELENG_6?  There were various snapshot deadlock fixes
> | > >| committed over the past year including some in the past few months.
> | > >
> | > >The file-system would come to a stop, processes stuck on bio, snap-shots
> | > >not finishing etc.  This was caused by the system running out of usable
> | > >buffers.  The change forces them to be flushed every so often.  This is
> | > >independent of locking.  10 might be too aggressive.  Some scaling of
> | > >nbuf would probably be better.
> | >
> | > When I run mksnap_ffs it runs to the point where ANY access to the
> | > filesystem gives that process a lockup.
> |
> | Yes, that is expected.  Actually it begins when something accesses the
> | directory in which the snapshot is being made, since that causes the
> | parent directory to be locked...then something tries to access the
> | parent directory, which eventually cascades back to the root.
> |
> | > Getting the file system back is only thru "hard reboot". Trying to do it
> | > the gentle way locks the whole system.
> |
> | Or waiting until the snapshot operation finishes.  You (still) haven't
> | determined that it's actually hanging as opposed to just waiting for
> | the snapshot operation to finish.
>
> In my case it was easy to see that all the buffers were exhausted and
> the system was churning waiting for some to become available.  Since they
> were all used up it never recovered.  By sync'ing the buffers they got
> cleaned up and then the system never ran out.  The snap shot was then
> able to finish.  Via the debugger you can see this happen.  I traced
> this problem in the debugger.  There are other issues with the buffer
> daemon as well.  We hit these since we run with a relatively low
> nbuf.  The buffers can get frag'ed so badly that it can't flush
> things since it can't get a full-size buffer.  Another problem is that
> it can end up waiting on itself since the current code can't use
> its emergency space to flush stuff.  You can see this via ps etc.
> It's not a good thing if the buffer daemon is waiting on itself :-(
>
> We have patches for this as well but they need some more work.  I was
> working with Tor on this but then I got swamped at work with our 4.X -> 6.X
> and platform transition.  All I can say is that we don't suffer from
> these problems now :-)  I have printf's that log this stuff when some of
> these bugs are hit.  Now the system survives those lock-up points.

Thanks for clarifying.  Hopefully you and Tor can get something
committed soon!

Kris
