Date:      Fri, 10 Oct 2014 04:02:29 +0000
From:      Steve Wills <swills@freebsd.org>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        fs@freebsd.org, current@freebsd.org, Andriy Gapon <avg@freebsd.org>
Subject:   Re: zfs hang
Message-ID:  <20141010040228.GI79158@mouf.net>
In-Reply-To: <F93FC06BE5854556BF1F4318690C728C@multiplay.co.uk>
References:  <20141008004045.GA24762__48659.9047123038$1412728878$gmane$org@mouf.net> <5434D1CE.8010801@FreeBSD.org> <20141010012724.GD79158@mouf.net> <F93FC06BE5854556BF1F4318690C728C@multiplay.co.uk>

On Fri, Oct 10, 2014 at 02:35:14AM +0100, Steven Hartland wrote:
> 
> ----- Original Message ----- 
> From: "Steve Wills" <swills@freebsd.org>
> To: "Andriy Gapon" <avg@freebsd.org>
> Cc: <current@freebsd.org>; <fs@freebsd.org>
> Sent: Friday, October 10, 2014 2:27 AM
> Subject: Re: zfs hang
> 
> 
> > On Wed, Oct 08, 2014 at 08:55:26AM +0300, Andriy Gapon wrote:
> >> On 08/10/2014 03:40, Steve Wills wrote:
> >> > Hi,
> >> > 
> >> > Not sure which thread this belongs to, but I have a zfs hang on one of my boxes
> >> > running r272152. Running procstat -kka looks like:
> >> > 
> >> > http://pastebin.com/szZZP8Tf
> >> > 
> >> > My zpool commands seem to be hung in spa_errlog_lock while others are hung in
> >> > zfs_lookup. Suggestions?
> >> 
> >> There are several threads in zio_wait.  If this is their permanent state then
> >> there is some problem with I/O somewhere below ZFS.
> > 
> > Thanks for the feedback. It seems one of my disks is dying, I rebooted and it
> > came up OK, but today I got:
> > 
> >  panic: I/O to pool 'rpool' appears to be hung on vdev guid ..... at '/dev/ada0p3'
> > 
> > I have screenshots and backtrace if anyone is interested. Dying drives
> > shouldn't cause panic, right?
> 
> It's the deadman timer kicking in, so yes, that's expected.
> 
> The following sysctls control this behaviour if you want to try and recover:
> vfs.zfs.deadman_synctime_ms: 1000000
> vfs.zfs.deadman_checktime_ms: 5000
> vfs.zfs.deadman_enabled: 1
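
To put the values quoted above into more familiar units, here is a minimal
Python sketch. It only parses the three lines as they appear in this thread
and does not query a live system. The helper names (parse_sysctls, summarize)
are hypothetical, and the reading of synctime as "how long an I/O may stay
outstanding before it is treated as hung" and checktime as "how often the
check runs" is an assumption based on the parameter names.

```python
# Values copied verbatim from the sysctl listing quoted in this thread.
SYSCTL_TEXT = """\
vfs.zfs.deadman_synctime_ms: 1000000
vfs.zfs.deadman_checktime_ms: 5000
vfs.zfs.deadman_enabled: 1
"""


def parse_sysctls(text):
    """Parse 'name: value' lines into a dict mapping name -> int."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that have no 'name: value' shape
            settings[name.strip()] = int(value.strip())
    return settings


def summarize(settings):
    """Convert the millisecond settings to seconds and minutes."""
    sync_s = settings["vfs.zfs.deadman_synctime_ms"] / 1000
    check_s = settings["vfs.zfs.deadman_checktime_ms"] / 1000
    return {
        "enabled": settings["vfs.zfs.deadman_enabled"] != 0,
        "sync_seconds": sync_s,
        "sync_minutes": sync_s / 60,
        "check_seconds": check_s,
    }


if __name__ == "__main__":
    print(summarize(parse_sysctls(SYSCTL_TEXT)))
```

Under those assumptions, the quoted defaults work out to a threshold of 1000
seconds (a little under 17 minutes) checked every 5 seconds, with the deadman
enabled.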

Ah, OK. This pool has two disks, mirrored. I think one of them is dying: the
BIOS reports a SMART error on startup, but the disk still seems to work fine.
From what I've read of the ZFS deadman design, it's meant for cases where the
controller is acting up, so I'm confused. Maybe this means both disks are dying?

Steve