From: Slawa Olhovchenkov
To: Steven Hartland
Cc: freebsd-stable@freebsd.org
Date: Thu, 7 May 2015 13:46:55 +0300
Subject: Re: zfs, cam sticking on failed disk
Message-ID: <20150507104655.GT62239@zxy.spb.ru>
In-Reply-To: <554B40B6.6060902@multiplay.co.uk>

On Thu, May 07, 2015 at 11:38:46AM +0100, Steven Hartland wrote:

> >>> How can I cancel these 24 requests?
> >>> Why don't these requests time out (3 hours already)?
> >>> How can I force-detach this disk? (I have already tried `camcontrol reset` and `camcontrol rescan`.)
> >>> Why don't ZFS (or GEOM) time out the requests and reroute them to da18?
> >>>
> >> If they are in mirrors, in theory you can just pull the disk, isci will
> >> report to cam and cam will report to ZFS which should all recover.
> > Yes, it is a ZFS mirror with da18.
> > I am surprised that ZFS doesn't use da18. The whole zpool is fully stuck.
> A single low level request can only be handled by one device, if that
> device returns an error then ZFS will use the other device, but not until.

Why aren't the next requests routed to da18? The current request is stuck on da19 (unlikely, but understandable), but why is the whole pool stuck?
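
To make the quoted explanation concrete, here is a toy model of the behavior as Steven describes it: a read sent to one side of a mirror is retried on the other side only after the first device reports an error. This is only a sketch, not ZFS or CAM code; the names `Status` and `mirror_read` are invented for illustration, and a request that never completes is modeled as `PENDING`.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"            # device completed the request successfully
    ERROR = "error"      # device completed the request with an error
    PENDING = "pending"  # device never answered (hypothetical hung disk)


def mirror_read(first: Status, other: Status):
    """Toy model: return which side served the read, or None if nothing did.

    The request goes to one device first. The other device is tried only
    when the first one explicitly returns an error.
    """
    if first is Status.OK:
        return "first"
    if first is Status.ERROR:
        # An explicit error is the only thing that triggers the retry.
        return "other" if other is Status.OK else None
    # PENDING: no completion was reported, so no retry is issued.
    return None


# A failing disk that reports errors is survivable:
print(mirror_read(Status.ERROR, Status.OK))
# A disk that never answers leaves the request waiting:
print(mirror_read(Status.PENDING, Status.OK))
```

In this model the second case never reaches the healthy device, which matches the "but not until" in the quoted reply. It says nothing about why requests issued later also stall, which is the open question here.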