Date: Thu, 7 May 2015 15:05:08 +0300
From: Slawa Olhovchenkov
To: Steven Hartland
Cc: freebsd-stable@freebsd.org
Subject: Re: zfs, cam sticking on failed disk
Message-ID: <20150507120508.GX62239@zxy.spb.ru>
References: <20150507080749.GB1394@zxy.spb.ru>
 <554B2547.1090307@multiplay.co.uk> <20150507095048.GC1394@zxy.spb.ru>
 <554B40B6.6060902@multiplay.co.uk> <20150507104655.GT62239@zxy.spb.ru>
 <554B53E8.4000508@multiplay.co.uk>
In-Reply-To: <554B53E8.4000508@multiplay.co.uk>

On Thu, May 07, 2015 at 01:00:40PM +0100, Steven Hartland wrote:

>
>
> On 07/05/2015 11:46, Slawa Olhovchenkov wrote:
> > On Thu, May 07, 2015 at 11:38:46AM +0100, Steven Hartland wrote:
> >
> >>>>> How can I cancel these 24 requests?
> >>>>> Why don't these requests time out (3 hours already)?
> >>>>> How can I force-detach this disk? (I have already tried
> >>>>> `camcontrol reset` and `camcontrol rescan`.)
> >>>>> Why doesn't ZFS (or GEOM) time out the request and reroute it
> >>>>> to da18?
> >>>>>
> >>>> If they are in mirrors, in theory you can just pull the disk, isci will
> >>>> report to cam and cam will report to ZFS which should all recover.
> >>> Yes, it is mirrored with da18.
> >>> I am surprised that ZFS doesn't use da18. The whole zpool is stuck.
> >> A single low-level request can only be handled by one device; if that
> >> device returns an error then ZFS will use the other device, but not
> >> until then.
> > Why aren't subsequent requests routed to da18?
> > The current request is stuck on da19 (unlikely, but understandable),
> > but why is the whole pool stuck?
>
> It's still waiting for the request from the failed device to complete. As
> far as ZFS currently knows there is nothing wrong with the device, as it's
> had no failures.

Can you explain some more?
One request is waiting; that I understand.
Then I issue the next request. It needs some information from the vdev
with the failed disk. The failed disk is busier (its queue is long), so
why isn't the request routed to the mirror disk? Or, for metadata, to a
less busy vdev?

> You didn't say which FreeBSD version you were running?

10-STABLE, r281264.
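
To make the behaviour Steven describes concrete, here is a toy model of
it as I understand it from this thread. It is only a sketch, not ZFS or
CAM code: the `Disk` class, the `mirror_read` function and the
`max_polls` bound are hypothetical names invented for illustration (a
real stack would simply keep waiting). It shows failover happening only
on an explicit error, so a disk that hangs without reporting anything
never triggers a retry on the other mirror member.

```python
from collections import deque


class Disk:
    """Toy mirror member (hypothetical, not a real ZFS/CAM structure)."""

    def __init__(self, name, behaviour="ok"):
        self.name = name
        self.behaviour = behaviour  # "ok", "error" or "hang"
        self.queue = deque()        # requests issued but not yet completed

    def submit(self, request):
        self.queue.append(request)

    def poll(self):
        """Return (request, status) for one completed request, else None."""
        if not self.queue or self.behaviour == "hang":
            return None             # a hung disk never completes anything
        request = self.queue.popleft()
        return request, ("ok" if self.behaviour == "ok" else "error")


def mirror_read(request, children, max_polls=3):
    """Issue a read to one child; move to the next only on an explicit error.

    max_polls exists only so this example terminates.
    """
    for child in children:
        child.submit(request)
        done = None
        for _ in range(max_polls):
            done = child.poll()
            if done is not None:
                break
        if done is None:
            # No completion and no error: nothing tells us to fail over.
            return f"{request}: stuck on {child.name}"
        if done[1] == "ok":
            return f"{request}: served by {child.name}"
        # Explicit error: fall through and try the next mirror member.
    return f"{request}: all members failed"


# da19 reports an error, so the read is retried on da18.
print(mirror_read("read-1", [Disk("da19", "error"), Disk("da18")]))
# da19 hangs silently, so da18 is never asked.
print(mirror_read("read-2", [Disk("da19", "hang"), Disk("da18")]))
```

In this model the second call never touches da18, which matches what I
see on the pool: without an error or a timeout from the lower layers,
nothing causes the request to be reissued to the healthy disk.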