From owner-freebsd-stable@FreeBSD.ORG Wed Oct 4 19:57:58 2006
Return-Path:
X-Original-To: stable@freebsd.org
Delivered-To: freebsd-stable@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 4445F16A40F
    for ; Wed, 4 Oct 2006 19:57:58 +0000 (UTC)
    (envelope-from kris@obsecurity.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
    by mx1.FreeBSD.org (Postfix) with ESMTP id E271843D91
    for ; Wed, 4 Oct 2006 19:57:49 +0000 (GMT)
    (envelope-from kris@obsecurity.org)
Received: from obsecurity.dyndns.org (elvis.mu.org [192.203.228.196])
    by elvis.mu.org (Postfix) with ESMTP id C8DE01A4D82;
    Wed, 4 Oct 2006 12:57:49 -0700 (PDT)
Received: by obsecurity.dyndns.org (Postfix, from userid 1000)
    id 3743E511E6; Wed, 4 Oct 2006 15:57:49 -0400 (EDT)
Date: Wed, 4 Oct 2006 15:57:49 -0400
From: Kris Kennaway
To: Vivek Khera
Message-ID: <20061004195748.GA37978@xor.obsecurity.org>
References: <555B84D2-520F-44D6-84D6-CF9CE7EE47C7@khera.org>
    <20060922203654.GA65693@xor.obsecurity.org>
    <847DD3A5-D5DD-4D3E-B755-64B13D1DA506@khera.org>
    <20061003084315.GA89654@deviant.kiev.zoral.com.ua>
    <20061004140808.GD89654@deviant.kiev.zoral.com.ua>
    <20061004163944.GA35412@xor.obsecurity.org>
    <20061004194148.GA37672@xor.obsecurity.org>
    <488B45BD-BBAA-4E5F-B94E-9DF27BFA1F3A@khera.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
    protocol="application/pgp-signature"; boundary="AhhlLboLdkugWU4S"
Content-Disposition: inline
In-Reply-To: <488B45BD-BBAA-4E5F-B94E-9DF27BFA1F3A@khera.org>
User-Agent: Mutt/1.4.2.2i
Cc: stable@freebsd.org
Subject: Re: ffs snapshot lockup
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 04 Oct 2006 19:57:58 -0000

--AhhlLboLdkugWU4S
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Oct 04, 2006 at 03:53:54PM -0400, Vivek Khera wrote:
>
> On Oct 4, 2006, at 3:41 PM, Kris Kennaway wrote:
>
> >>from what i read in the output from kgdb, it seems that something
> >>locked the kernel and we broke to debugger from the watchdog timeout
> >>(I enable software watchdog).
> >
> >Hmm, be careful with that - if you set the timeout too low (and note
> >that for some workloads O(minutes) may even be too low) then you'll
> >get a lot of false positives.
>
> hmmm... the man page for watchdogd doesn't specify what the default
> timeout is, but that's what we've got running. [tappity-tapptity-
> tap...] source seems to indicate 16seconds timeout. interesting.

Yes, that's probably way too low. e.g. when creating a snapshot (as
in your workload) your machine may be unresponsive for up to a few
minutes depending on your filesystem size and I/O load.

> so we could be getting hit with a bge interrupt storm and timing
> out. i'll turn off fido and see what happens.
>
> at this point, though, i think i have two separate issues. one with
> bge and watchdog timeout, and one with locking of the filesystem with
> mksnap_ffs, as the symptoms are different.

That sounds plausible. Many people are reporting issues involving NIC
interrupts, but they're proving elusive to characterize so far (there
may be multiple problems).

kris

--AhhlLboLdkugWU4S
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFFJBI8Wry0BWjoQKURAmO7AJ464zbu+sYaMLDI+hZy8EPL5lkNggCgnkuT
reC626GaJOnFtV/BrkV39HA=
=tI2v
-----END PGP SIGNATURE-----

--AhhlLboLdkugWU4S--