Date:      Mon, 22 Oct 2007 17:35:21 +0200
From:      Peter Schuller <peter.schuller@infidyne.com>
To:        freebsd-fs@freebsd.org
Subject:   ZFS: reproducable inability to accesss a pool (process hangs; other pools fine)
Message-ID:  <20071022153521.GB27594@hyperion.scode.org>



Hello,

On the same system I recently posted about on -stable, with RELENG_7
from a few days ago, I am now running a SiL 3114 on a raidz2 in
degraded mode with one disk missing (it is degraded by design because
I wanted to create a 5 disk array but only had 4).

For the purpose of discovering any stability issues with the 3114
controller I did some stress tests that have yet to reveal controller
problems, but have triggered what appears to be a ZFS problem.

Test case:

/promraid       - root of the pool in question
/promraid/ports - copy of /usr/ports tree from my machine
/promraid/1     - empty directory
/promraid/2     - empty directory

I now run concurrently in two shells:

while [ 1 ] ; do rsync -a /promraid/ports /promraid/1/pp ; rm -rf /promraid/1/pp ; done

and:

while [ 1 ] ; do rsync -a /promraid/ports /promraid/2/pp ; rm -rf /promraid/2/pp ; done
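
For anyone wanting to reproduce this from a single shell, the two loops
can be wrapped in a small function. This is only a sketch: the function
name stress_loop and its optional COUNT argument are made up for
illustration and are not part of the original test; the paths are the
ones listed above.

```sh
#!/bin/sh
# stress_loop SRC DST [COUNT]
# Repeatedly copy SRC to DST with rsync, then remove the copy again.
# COUNT is a hypothetical convenience: omit it (or pass 0) to loop
# forever, as the original one-liners do.
stress_loop() {
    src=$1
    dst=$2
    count=${3:-0}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        rsync -a "$src" "$dst"
        rm -rf "$dst"
        i=$((i + 1))
    done
}

# Example: run both copies concurrently as background jobs.
#   stress_loop /promraid/ports /promraid/1/pp &
#   stress_loop /promraid/ports /promraid/2/pp &
#   wait
```

Each background job runs in its own subshell, so the two loops do not
share the src/dst/i variables.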

This runs fine for some hours, but eventually I end up with hung
rsyncs in "zfs" state according to top. Attempting to e.g. ls /promraid
hangs as well. Yet ZFS otherwise continues working (another pool is
entirely fine), and there are no errors in dmesg.
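
The stuck processes can also be listed by wait channel without top. A
minimal sketch, assuming ps output with the columns pid, state, wchan
and comm in that order; the helper name filter_wchan is hypothetical:

```sh
#!/bin/sh
# filter_wchan WCHAN
# Reads `ps -axo pid,state,wchan,comm` style output on stdin and prints
# the PID and command name of every process whose wait channel (third
# column) equals WCHAN. The header line is skipped.
filter_wchan() {
    awk -v w="$1" 'NR > 1 && $3 == w { print $1, $4 }'
}

# Example:
#   ps -axo pid,state,wchan,comm | filter_wchan zfs
```

The match is exact, so a wait channel with a different name will not be
picked up; drop the filter and read the raw ps output if in doubt.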

iostat -x does NOT indicate that it is perpetually waiting on I/O from
a disk or something like that (0% utilization). The processes are
unkillable, even by SIGKILL.

I should have this environment for a few more days, so can hopefully
reproduce this again. It has happened at least twice already (the
first time I was in X and X hung; I thought I had a panic so re-ran
the tests in the console; these two times I didn't get a panic but I
am unsure whether the failure case is different).

Given that there are no errors, no panic, etc., does anyone have
suggestions for what to do to produce the best information possible?

I realize one obvious step is to ktrace the hung processes, if that
gives me anything (the size of the trace if I were to trace them from
the beginning would, I suspect, be prohibitive). Will do that next time.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org




