Date:      Wed, 4 Oct 2006 15:57:49 -0400
From:      Kris Kennaway <kris@obsecurity.org>
To:        Vivek Khera <vivek@khera.org>
Cc:        stable@freebsd.org
Subject:   Re: ffs snapshot lockup
Message-ID:  <20061004195748.GA37978@xor.obsecurity.org>
In-Reply-To: <488B45BD-BBAA-4E5F-B94E-9DF27BFA1F3A@khera.org>
References:  <555B84D2-520F-44D6-84D6-CF9CE7EE47C7@khera.org> <20060922203654.GA65693@xor.obsecurity.org> <847DD3A5-D5DD-4D3E-B755-64B13D1DA506@khera.org> <20061003084315.GA89654@deviant.kiev.zoral.com.ua> <DFEA4E5F-2337-4383-8765-F5901BDA49E9@khera.org> <20061004140808.GD89654@deviant.kiev.zoral.com.ua> <20061004163944.GA35412@xor.obsecurity.org> <BB1FAD7A-1114-49D6-BC2E-C1B4B9D0C807@khera.org> <20061004194148.GA37672@xor.obsecurity.org> <488B45BD-BBAA-4E5F-B94E-9DF27BFA1F3A@khera.org>



On Wed, Oct 04, 2006 at 03:53:54PM -0400, Vivek Khera wrote:
>
> On Oct 4, 2006, at 3:41 PM, Kris Kennaway wrote:
>
> >>from what i read in the output from kgdb, it seems that something
> >>locked the kernel and we broke to debugger from the watchdog timeout
> >>(I enable software watchdog).
> >
> >Hmm, be careful with that - if you set the timeout too low (and note
> >that for some workloads O(minutes) may even be too low) then you'll
> >get a lot of false positives.
>
> hmmm... the man page for watchdogd doesn't specify what the default
> timeout is, but that's what we've got running.  [tappity-tappity-tap...]
> source seems to indicate a 16 second timeout.  interesting.

Yes, that's probably way too low.  e.g. when creating a snapshot (as
in your workload) your machine may be unresponsive for up to a few
minutes depending on your filesystem size and I/O load.

> so we could be getting hit with a bge interrupt storm and timing
> out.  i'll turn off fido and see what happens.
>
> at this point, though, i think i have two separate issues.  one with
> bge and watchdog timeout, and one with locking of the filesystem with
> mksnap_ffs, as the symptoms are different.

That sounds plausible.  Many people are reporting issues involving NIC
interrupts, but they're proving elusive to characterize so far (there
may be multiple problems).

kris






