Date: Wed, 7 Nov 2007 12:57:11 +0100
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Peter Schuller <peter.schuller@infidyne.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: reproducable inability to accesss a pool (process hangs; other pools fine)
Message-ID: <20071107115711.GM15618@garage.freebsd.pl>
In-Reply-To: <20071022153521.GB27594@hyperion.scode.org>
References: <20071022153521.GB27594@hyperion.scode.org>

On Mon, Oct 22, 2007 at 05:35:21PM +0200, Peter Schuller wrote:
> Hello,
>
> On the same system I recently posted about on -stable, with RELENG_7
> from a few days ago, I am now running a SiL 3114 on a raidz2 in
> degraded mode with one disk missing (it is degraded by design because
> I wanted to create a 5 disk array but only had 4).
>
> For the purpose of discovering any stability issues with the 3114
> controller I did some stress tests that have yet to reveal controller
> problems, but have triggered what appears to be a ZFS problem.
>
> Test case:
>
> /promraid       - root of the pool in question
> /promraid/ports - copy of /usr/ports tree from my machine
> /promraid/1     - empty directory
> /promraid/2     - empty directory
>
> I now run concurrently in two shells:
>
> while [ 1 ] ; do rsync -a /promraid/ports /promraid/1/pp ; rm -rf /promraid/1/pp ; done
>
> and:
>
> while [ 1 ] ; do rsync -a /promraid/ports /promraid/2/pp ; rm -rf /promraid/2/pp ; done
>
> This runs fine for some hours, but eventually I end up with hung
> rsyncs in "zfs" state according to top. Attempting to e.g. ls /promraid
> hangs as well. Yet ZFS continues working (another pool is entirely
> fine), and there are no errors in dmesg.
>
> iostat -x does NOT indicate that it is perpetually waiting on I/O from
> a disk or something like that (0% utilization). The processes are
> unkillable, even by SIGKILL.
>
> I should have this environment for a few more days, so can hopefully
> reproduce this again. It has happened at least twice already (the
> first time I was in X and X hung; I thought I had a panic so re-ran
> the tests in the console; these two times I didn't get a panic but I
> am unsure whether the failure case is different).
>
> Does anyone have suggestions for what to do to produce the best
> information possible? Given that there are no errors, no panic, etc.
>
> One obvious bit is to ktrace them I realize, if that gives me anything
> (the size of the trace if I were to trace it from the beginning would,
> I suspect, be prohibitive). Will do that next time.

I've found a deadlock recently. Can you enter DDB, find the spa_zio_intr_X
threads, run 'tr <pid>' on their PIDs and send me the output?

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
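
For anyone trying to reproduce the test case quoted above, the two loops could be wrapped in one bounded helper, roughly as sketched below. This is only an illustration, not the script used in the original report: the names stress_loop and COPY_CMD and the cycle-count argument are assumptions added here so the loop terminates and can be pointed at any directory.

```shell
#!/bin/sh
# Sketch only: bounded version of the copy/delete loop from the report.
# COPY_CMD and stress_loop are hypothetical names, not part of the original.
COPY_CMD=${COPY_CMD:-rsync -a}

# Run COUNT copy/delete cycles of SRC into DEST/pp and print the number
# of cycles completed. The body runs in a subshell so the variables stay local.
stress_loop() (
    src=$1 dest=$2 count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # $COPY_CMD is left unquoted on purpose so "rsync -a" splits into words.
        $COPY_CMD "$src" "$dest/pp" || exit 1
        rm -rf "$dest/pp"
        i=$((i + 1))
    done
    echo "$i"
)
```

Under these assumptions, the concurrent test would be started as `stress_loop /promraid/ports /promraid/1 1000 & stress_loop /promraid/ports /promraid/2 1000 & wait`. If the hang occurs, the loops simply stop making progress, as described in the report.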