From owner-freebsd-stable@FreeBSD.ORG Thu May  7 10:39:00 2015
Message-ID: <554B40B6.6060902@multiplay.co.uk>
Date: Thu, 07 May 2015 11:38:46 +0100
From: Steven Hartland
To: Slawa Olhovchenkov
CC: freebsd-stable@freebsd.org
Subject: Re: zfs, cam sticking on failed disk
References: <20150507080749.GB1394@zxy.spb.ru> <554B2547.1090307@multiplay.co.uk> <20150507095048.GC1394@zxy.spb.ru>
In-Reply-To: <20150507095048.GC1394@zxy.spb.ru>
List-Id: Production branch of FreeBSD source code

On 07/05/2015 10:50, Slawa Olhovchenkov wrote:
> On Thu, May 07, 2015 at 09:41:43AM +0100, Steven Hartland wrote:
>
>> On 07/05/2015 09:07, Slawa Olhovchenkov wrote:
>>> I have a zpool of 12 vdevs (zmirrors).
>>> One disk in one vdev went out of service and stopped serving requests:
>>>
>>> dT: 1.036s  w: 1.000s
>>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>>     0      0      0      0    0.0      0      0    0.0    0.0| ada0
>>>     0      0      0      0    0.0      0      0    0.0    0.0| ada1
>>>     1      0      0      0    0.0      0      0    0.0    0.0| ada2
>>>     0      0      0      0    0.0      0      0    0.0    0.0| ada3
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da0
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da1
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da2
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da3
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da4
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da5
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da6
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da7
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da8
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da9
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da10
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da11
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da12
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da13
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da14
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da15
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da16
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da17
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da18
>>>    24      0      0      0    0.0      0      0    0.0    0.0| da19
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da20
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da21
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da22
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da23
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da24
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da25
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da26
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da27
>>>
>>> As a result, ZFS operations on this pool have stopped too.
>>> `zpool list -v` doesn't work.
>>> `zpool detach tank da19` doesn't work.
>>> Applications working with this pool are stuck in the `zfs` wchan and
>>> cannot be killed.
>>>
>>> # camcontrol tags da19 -v
>>> (pass19:isci0:0:3:0): dev_openings  7
>>> (pass19:isci0:0:3:0): dev_active    25
>>> (pass19:isci0:0:3:0): allocated     25
>>> (pass19:isci0:0:3:0): queued        0
>>> (pass19:isci0:0:3:0): held          0
>>> (pass19:isci0:0:3:0): mintags       2
>>> (pass19:isci0:0:3:0): maxtags       255
>>>
>>> How can I cancel these 24 requests?
>>> Why have these requests not timed out (it has been 3 hours already)?
>>> How can I force a detach of this disk?
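As a sketch of how one might reproduce the diagnosis above, the following commands cover the same ground (device `da19` and pool name `tank` are taken from the report; adjust for your own system):

```shell
# Per-device I/O statistics; a non-zero L(q) with zero ops/s that
# persists for minutes indicates requests stuck in the device queue.
# -f takes a regex filter, -I the refresh interval.
gstat -f 'da19$' -I 5s

# Show outstanding tagged commands on the suspect device; dev_active
# close to dev_openings + allocated with nothing completing matches
# the stuck state described above.
camcontrol tags da19 -v

# Pool-level view: whether ZFS has noticed the member as FAULTED or
# still considers it ONLINE.
zpool status -v tank
```

Note that `zpool status` itself may hang in this situation, since the reporter says even `zpool list -v` was blocked.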
>>> (I have already tried `camcontrol reset` and `camcontrol rescan`.)
>>> Why doesn't ZFS (or GEOM) time out the requests and reroute them to
>>> da18?
>>>
>> If they are in mirrors, in theory you can just pull the disk: isci will
>> report to CAM, and CAM will report to ZFS, which should all recover.
> Yes, it is a zmirror with da18.
> I am surprised that ZFS doesn't use da18. The whole zpool is stuck.
A single low-level request can only be handled by one device. If that
device returns an error then ZFS will use the other device, but not
until then.
>
>> With regards to not timing out, this could be a default issue, but having
> I understand: there is no universally acceptable timeout for all cases:
> good disk, good saturated disk, tape, tape library, failed disk, etc.
> In my case it is a failed disk. This model has already failed (a
> different specimen) with the same symptoms.
>
> Maybe there is some trick for cancelling/aborting all requests in the
> queue and removing the disk from the system?
Unlikely tbh; pulling the disk, however, should work.
>
>> a very quick look that's not obvious in the code, as
>> isci_io_request_construct etc. do indeed set a timeout when
>> CAM_TIME_INFINITY hasn't been requested.
>>
>> The sysctl hw.isci.debug_level may be able to provide more information,
>> but be aware this can be spammy.
> I have already hit this situation; which commands are interesting after
> setting hw.isci.debug_level?
I'm afraid I'm not familiar with isci; possibly someone else who is can
chime in.

	Regards
	Steve
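For the debug suggestion above, a minimal sketch of enabling the driver logging might look like this. The sysctl name is from the message; the specific numeric levels are an assumption (they are driver-specific, so check isci(4) before relying on them):

```shell
# Raise the isci(4) debug level (level value assumed; see isci(4)).
sysctl hw.isci.debug_level=3

# Follow new kernel messages as the driver logs request activity.
tail -f /var/log/messages

# Restore the default afterwards, since the output can be spammy.
sysctl hw.isci.debug_level=0
```

The interesting window is whatever the driver logs while a command to the failed disk is outstanding, e.g. during another `camcontrol tags da19 -v` or `camcontrol reset`.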