Date: Sun, 31 Jul 2011 13:06:18 -0700
From: David P Discher <dpd@bitgravity.com>
To: "Steven Hartland" <killing@multiplay.co.uk>
Cc: freebsd-fs@FreeBSD.org, Andriy Gapon <avg@freebsd.org>
Subject: Re: zfs process hang on pool access
Message-ID: <3D893A9B-2CD9-40EB-B4A2-5DBCBB72C62E@bitgravity.com>
In-Reply-To: <04C305AE5F184C6AAC2A67CE23184013@multiplay.co.uk>
References: <A14F1C768A41483C876AD77502A864D6@multiplay.co.uk> <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> <4E301C55.7090105@FreeBSD.org> <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk> <4E301F10.6060708@FreeBSD.org> <63705B5AEEAD4BB88ADB9EF770AB6C76@multiplay.co.uk> <4E302204.2030009@FreeBSD.org> <6703F0BB-D4FC-4417-B519-CAFC62E5BC39@bitgravity.com> <04C305AE5F184C6AAC2A67CE23184013@multiplay.co.uk>
I've actually found a second issue that, by my working theory, is related to the *fix* for LBOLT, in zio_wait()/txg_delay() when they call _cv_wait()/_cv_timedwait(). This may be aggravated by setting vfs.zfs.txg.timeout=1. And in fact these functions use LBOLT with signed 32-bit ints.

I got some cores and some ideas, and will dig into the debugging this week. And of course I will post my findings (and pleas for help) here on freebsd-fs@.

Rolling back the two patches I posted earlier for the 26+ day and 106+ day bugs seemed to avoid the new issue.

---
David P. Discher
dpd@bitgravity.com * AIM: bgDavidDPD
BITGRAVITY * http://www.bitgravity.com

On Jul 31, 2011, at 12:50 PM, Steven Hartland wrote:

> Is there a PR related to this so we can track progress? Having to reboot machines
> every 100+ days to ensure they don't break is a bit of a PITA when you've got hundreds
> of machines :(
>
> ----- Original Message -----
> From: "David P Discher" <dpd@bitgravity.com>
> To: "Steven Hartland" <killing@multiplay.co.uk>
> Cc: <freebsd-fs@FreeBSD.org>; "Andriy Gapon" <avg@freebsd.org>
> Sent: Wednesday, July 27, 2011 9:41 PM
> Subject: Re: zfs process hang on pool access
>
>
> The way I found this was breaking into the debugger, doing some backtraces, continuing,
> breaking in again, doing some more backtraces on the hung processes ... seeing what is
> going on, then walking through the code.
>
> Then, when I had specific loops and code locations, I asked the higher powers of the
> FreeBSD kernel world.
>