Date:      Sun, 31 Jul 2011 13:06:18 -0700
From:      David P Discher <dpd@bitgravity.com>
To:        "Steven Hartland" <killing@multiplay.co.uk>
Cc:        freebsd-fs@FreeBSD.org, Andriy Gapon <avg@freebsd.org>
Subject:   Re: zfs process hang on pool access
Message-ID:  <3D893A9B-2CD9-40EB-B4A2-5DBCBB72C62E@bitgravity.com>
In-Reply-To: <04C305AE5F184C6AAC2A67CE23184013@multiplay.co.uk>
References:  <A14F1C768A41483C876AD77502A864D6@multiplay.co.uk> <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> <4E301C55.7090105@FreeBSD.org> <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk> <4E301F10.6060708@FreeBSD.org> <63705B5AEEAD4BB88ADB9EF770AB6C76@multiplay.co.uk> <4E302204.2030009@FreeBSD.org> <6703F0BB-D4FC-4417-B519-CAFC62E5BC39@bitgravity.com> <04C305AE5F184C6AAC2A67CE23184013@multiplay.co.uk>

I've actually found a second issue, and my working theory is that it is
related to the *fix* of LBOLT, in zio_wait()/txg_delay() when calling
_cv_wait()/_cv_timedwait().  This may be aggravated by setting
vfs.zfs.txg.timeout=1.  And in fact these functions are using LBOLT
with signed 32-bit ints.
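
To make the signed 32-bit concern concrete, here is a minimal sketch of the arithmetic.  It is not the ZFS code.  The helper names are hypothetical, hz=1000 is an assumed tick rate, and the "64-bit nanoseconds multiplied by hz" case is only one guess at how a multi-month limit could arise.

```c
#include <stdint.h>

/*
 * Days until a signed 32-bit tick counter reaches INT32_MAX.
 * hz is an assumed tick rate, e.g. 1000.
 */
static double
days_until_tick_wrap(int hz)
{
	return ((double)INT32_MAX / hz / 86400.0);
}

/*
 * Days until (uptime_ns * hz) overflows a signed 64-bit integer.
 * Assumption: the tick value is derived by multiplying a nanosecond
 * uptime by hz before dividing; this is a guess, not confirmed here.
 */
static double
days_until_mul_overflow(int hz)
{
	return ((double)INT64_MAX / 1e9 / hz / 86400.0);
}

/* Naive timeout check: gives the wrong answer once deadline has wrapped. */
static int
expired_naive(int32_t now, int32_t deadline)
{
	return (now >= deadline);
}

/*
 * Wrap-tolerant check: subtract in unsigned arithmetic (well defined),
 * then interpret the difference as signed.
 */
static int
expired_safe(int32_t now, int32_t deadline)
{
	return ((int32_t)((uint32_t)now - (uint32_t)deadline) >= 0);
}

/* Add a tick delta using unsigned wraparound, avoiding signed overflow. */
static int32_t
ticks_add(int32_t now, uint32_t delta)
{
	return ((int32_t)((uint32_t)now + delta));
}
```

With hz=1000 the first helper gives roughly 25 days and the second a little under 107 days, which is the same order as the day counts mentioned in this thread.  With now = INT32_MAX - 500 and deadline = ticks_add(now, 1000), expired_naive() reports the timeout as already expired even though it is 1000 ticks away, while expired_safe() does not.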

I got some cores, and ideas, and will dig into the debugging this week.
And of course I will post my findings (and pleas for help) here on
freebsd-fs@.

Rolling back the two patches I posted earlier for the 26+ day and 106+
day bugs seemed to avoid the new issue.

---
David P. Discher
dpd@bitgravity.com * AIM: bgDavidDPD
BITGRAVITY * http://www.bitgravity.com

On Jul 31, 2011, at 12:50 PM, Steven Hartland wrote:

> Is there a PR related to this so we can track progress? Having to reboot machines
> every 100+ days to ensure they don't break is a bit of a PITA when you've got hundreds
> of machines :(
>
> ----- Original Message ----- From: "David P Discher" <dpd@bitgravity.com>
> To: "Steven Hartland" <killing@multiplay.co.uk>
> Cc: <freebsd-fs@FreeBSD.org>; "Andriy Gapon" <avg@freebsd.org>
> Sent: Wednesday, July 27, 2011 9:41 PM
> Subject: Re: zfs process hang on pool access
>
>
> The way I found this was by breaking into the debugger, doing some
> backtraces, continuing, breaking in again, doing some more backtraces
> on the hung processes ... seeing what was going on, then walking
> through the code.
>
> Then, once I had specific loops and code locations, I asked the
> higher powers of the FreeBSD kernel world.
>=20



