Date: Wed, 4 Oct 2006 15:57:49 -0400
From: Kris Kennaway <kris@obsecurity.org>
To: Vivek Khera <vivek@khera.org>
Cc: stable@freebsd.org
Subject: Re: ffs snapshot lockup
Message-ID: <20061004195748.GA37978@xor.obsecurity.org>
In-Reply-To: <488B45BD-BBAA-4E5F-B94E-9DF27BFA1F3A@khera.org>
References: <555B84D2-520F-44D6-84D6-CF9CE7EE47C7@khera.org>
 <20060922203654.GA65693@xor.obsecurity.org>
 <847DD3A5-D5DD-4D3E-B755-64B13D1DA506@khera.org>
 <20061003084315.GA89654@deviant.kiev.zoral.com.ua>
 <DFEA4E5F-2337-4383-8765-F5901BDA49E9@khera.org>
 <20061004140808.GD89654@deviant.kiev.zoral.com.ua>
 <20061004163944.GA35412@xor.obsecurity.org>
 <BB1FAD7A-1114-49D6-BC2E-C1B4B9D0C807@khera.org>
 <20061004194148.GA37672@xor.obsecurity.org>
 <488B45BD-BBAA-4E5F-B94E-9DF27BFA1F3A@khera.org>

On Wed, Oct 04, 2006 at 03:53:54PM -0400, Vivek Khera wrote:
>
> On Oct 4, 2006, at 3:41 PM, Kris Kennaway wrote:
>
> >>from what i read in the output from kgdb, it seems that something
> >>locked the kernel and we broke to debugger from the watchdog timeout
> >>(I enable software watchdog).
> >
> >Hmm, be careful with that - if you set the timeout too low (and note
> >that for some workloads O(minutes) may even be too low) then you'll
> >get a lot of false positives.
>
> hmmm... the man page for watchdogd doesn't specify what the default
> timeout is, but that's what we've got running.  [tappity-tapptity-
> tap...]  source seems to indicate 16seconds timeout.  interesting.

Yes, that's probably way too low.  e.g. when creating a snapshot (as in
your workload) your machine may be unresponsive for up to a few minutes
depending on your filesystem size and I/O load.

> so we could be getting hit with a bge interrupt storm and timing
> out.  i'll turn off fido and see what happens.
>
> at this point, though, i think i have two separate issues.  one with
> bge and watchdog timeout, and one with locking of the filesystem with
> mksnap_ffs, as the symptoms are different.

That sounds plausible.  Many people are reporting issues involving NIC
interrupts, but they're proving elusive to characterize so far (there
may be multiple problems).

kris
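
The point about false positives can be illustrated with a small sketch. It is not taken from watchdogd or the kernel; the function name and the stall durations are hypothetical, chosen only to show why a 16-second timeout trips during a multi-minute snapshot stall while a timeout of several minutes does not.

```python
def first_false_positive(timeout_s, stall_durations_s):
    """Return the index of the first stall that outlasts the timeout, or None.

    A stall here means a period during which the machine is busy but
    healthy and cannot pat the watchdog (an assumption of this sketch).
    """
    for i, stall in enumerate(stall_durations_s):
        if stall > timeout_s:
            return i
    return None


# Hypothetical stall lengths in seconds; the last one stands in for a
# snapshot on a large, busy filesystem ("up to a few minutes").
stalls = [2.0, 9.5, 120.0]

print(first_false_positive(16, stalls))   # index of the stall that trips a 16 s watchdog
print(first_false_positive(300, stalls))  # None: no stall exceeds 300 s
```

The practical reading is that the timeout should sit comfortably above the longest stall the workload can legitimately produce, not merely above the typical one.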