Date:      Wed, 7 Nov 2007 12:57:11 +0100
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Peter Schuller <peter.schuller@infidyne.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS: reproducable inability to accesss a pool (process hangs; other pools fine)
Message-ID:  <20071107115711.GM15618@garage.freebsd.pl>
In-Reply-To: <20071022153521.GB27594@hyperion.scode.org>
References:  <20071022153521.GB27594@hyperion.scode.org>

On Mon, Oct 22, 2007 at 05:35:21PM +0200, Peter Schuller wrote:
> Hello,
>
> On the same system I recently posted about on -stable, with RELENG_7
> from a few days ago, I am now running a SiL 3114 on a raidz2 in
> degraded mode with one disk missing (it is degraded by design because
> I wanted to create a 5 disk array but only had 4).
>
> For the purpose of discovering any stability issues with the 3114
> controller I did some stress tests that have yet to reveal controller
> problems, but have triggered what appears to be a ZFS problem.
>
> Test case:
>
> /promraid       - root of the pool in question
> /promraid/ports - copy of /usr/ports tree from my machine
> /promraid/1     - empty directory
> /promraid/2     - empty directory
>
> I now run concurrently in two shells:
>
> while [ 1 ] ; do rsync -a /promraid/ports /promraid/1/pp ; rm -rf /promraid/1/pp ; done
>
> and:
>
> while [ 1 ] ; do rsync -a /promraid/ports /promraid/2/pp ; rm -rf /promraid/2/pp ; done
>
> This runs fine for some hours, but eventually I end up with hung
> rsyncs in "zfs" state according to top. Attempting to e.g. ls /promraid
> hangs as well. Yet ZFS continues working (another pool is entirely
> fine), and there are no errors in dmesg.
>
> iostat -x does NOT indicate that it is perpetually waiting on I/O from
> a disk or something like that (0% utilization). The processes are
> unkillable, even by SIGKILL.
>
> I should have this environment for a few more days, so can hopefully
> reproduce this again. It has happened at least twice already (the
> first time I was in X and X hung; I thought I had a panic so re-ran
> the tests in the console; these two times I didn't get a panic but I
> am unsure whether the failure case is different).
>
> Does anyone have suggestions for what to do to produce the best
> information possible? Given that there are no errors, no panic, etc.
>
> One obvious bit is to ktrace them I realize, if that gives me anything
> (the size of the trace if I were to trace it from the beginning would,
> I suspect, be prohibitive). Will do that next time.
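
The quoted test case could be wrapped in a single script along these lines. This is only a sketch: the function names stress_loop and run_pair, the COPY_CMD variable, and the iteration count are hypothetical and not part of the original report, which used two unbounded loops in separate shells.

```shell
#!/bin/sh
# Copy command; defaults to the rsync invocation used in the report.
# COPY_CMD is a hypothetical knob added for this sketch.
COPY_CMD="${COPY_CMD:-rsync -a}"

# Copy SRC to DST/pp and remove it again, COUNT times.
# The original loops ran forever; the bound is an assumption.
stress_loop() {
    src=$1; dst=$2; count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        $COPY_CMD "$src" "$dst/pp" || return 1
        rm -rf "$dst/pp"
        i=$((i + 1))
    done
}

# Run the two loops concurrently (one per target directory) and
# succeed only if both finish without error.
run_pair() {
    root=$1; count=$2
    stress_loop "$root/ports" "$root/1" "$count" &
    pid1=$!
    stress_loop "$root/ports" "$root/2" "$count" &
    pid2=$!
    wait "$pid1"; rc1=$?
    wait "$pid2"; rc2=$?
    [ "$rc1" -eq 0 ] && [ "$rc2" -eq 0 ]
}
```

With the layout from the report, this would be invoked as something like run_pair /promraid 1000. If the hang occurs, the script will simply never return, which matches the reported symptom.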

I've found a deadlock recently. Can you enter DDB, find spa_zio_intr_X
threads, run 'tr <pid>' on their PIDs and send me the output?
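
A rough sketch of what such a session might look like, assuming a kernel built with "options KDB" and "options DDB" and access to the system console. The sysctl used to enter the debugger and the parenthesised annotations are assumptions for illustration; only the spa_zio_intr_X thread names and the 'tr <pid>' command come from the request above.

```
# sysctl debug.kdb.enter=1    (drop into the debugger from a root shell)
db> ps                         (list processes; note the PIDs of spa_zio_intr_X)
db> tr <pid>                   (stack trace; repeat for each PID noted)
db> c                          (continue, resuming the system)
```

Capturing the output is easiest over a serial console, since the machine is stopped while DDB is active.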

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
