From owner-freebsd-stable@FreeBSD.ORG Thu May  7 10:39:00 2015
Message-ID: <554B40B6.6060902@multiplay.co.uk>
Date: Thu, 07 May 2015 11:38:46 +0100
From: Steven Hartland
To: Slawa Olhovchenkov
CC: freebsd-stable@freebsd.org
Subject: Re: zfs, cam sticking on failed disk
References: <20150507080749.GB1394@zxy.spb.ru> <554B2547.1090307@multiplay.co.uk> <20150507095048.GC1394@zxy.spb.ru>
In-Reply-To: <20150507095048.GC1394@zxy.spb.ru>
List-Id: Production branch of FreeBSD source code

On 07/05/2015 10:50, Slawa Olhovchenkov wrote:
> On Thu, May 07, 2015 at 09:41:43AM +0100, Steven Hartland wrote:
>
>> On 07/05/2015 09:07, Slawa Olhovchenkov wrote:
>>> I have a zpool of 12 vdevs (zmirrors).
>>> One disk in one vdev went out of service and stopped serving requests:
>>>
>>> dT: 1.036s  w: 1.000s
>>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>>     0      0      0      0    0.0      0      0    0.0    0.0| ada0
>>>     0      0      0      0    0.0      0      0    0.0    0.0| ada1
>>>     1      0      0      0    0.0      0      0    0.0    0.0| ada2
>>>     0      0      0      0    0.0      0      0    0.0    0.0| ada3
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da0
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da1
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da2
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da3
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da4
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da5
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da6
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da7
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da8
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da9
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da10
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da11
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da12
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da13
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da14
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da15
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da16
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da17
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da18
>>>    24      0      0      0    0.0      0      0    0.0    0.0| da19
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da20
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da21
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da22
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da23
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da24
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da25
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da26
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da27
>>>
>>> As a result, ZFS operations on this pool have stopped too.
>>> `zpool list -v` doesn't work.
>>> `zpool detach tank da19` doesn't work.
>>> Applications working with this pool are stuck in the `zfs` wchan and
>>> cannot be killed.
>>>
>>> # camcontrol tags da19 -v
>>> (pass19:isci0:0:3:0): dev_openings  7
>>> (pass19:isci0:0:3:0): dev_active    25
>>> (pass19:isci0:0:3:0): allocated     25
>>> (pass19:isci0:0:3:0): queued        0
>>> (pass19:isci0:0:3:0): held          0
>>> (pass19:isci0:0:3:0): mintags       2
>>> (pass19:isci0:0:3:0): maxtags       255
>>>
>>> How can I cancel these 24 requests?
>>> Why have these requests not timed out (it has been 3 hours already)?
>>> How can I force a detach of this disk?
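As a sketch of how one might reproduce the diagnosis above, the following commands cover the same ground (device `da19` and pool name `tank` are taken from the report; adjust for your own system):

```shell
# Per-device I/O statistics; a non-zero L(q) with zero ops/s that
# persists for minutes indicates requests stuck in the device queue.
# -f takes a regex filter, -I the refresh interval.
gstat -f 'da19$' -I 5s

# Show outstanding tagged commands on the suspect device; dev_active
# close to dev_openings + allocated with nothing completing matches
# the stuck state described above.
camcontrol tags da19 -v

# Pool-level view: whether ZFS has noticed the member as FAULTED or
# still considers it ONLINE.
zpool status -v tank
```

Note that `zpool status` itself may hang in this situation, since the reporter says even `zpool list -v` was blocked.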
>>> (I have already tried `camcontrol reset` and `camcontrol rescan`.)
>>> Why doesn't ZFS (or GEOM) time out the requests and reroute them to
>>> da18?
>>>
>> If they are in mirrors, in theory you can just pull the disk: isci will
>> report to CAM, and CAM will report to ZFS, which should all recover.
> Yes, it is a zmirror with da18.
> I am surprised that ZFS doesn't use da18. The whole zpool is stuck.
A single low-level request can only be handled by one device. If that
device returns an error then ZFS will use the other device, but not
until then.
>
>> With regards to not timing out, this could be a default issue, but having
> I understand: there is no universally acceptable timeout for all cases:
> good disk, good saturated disk, tape, tape library, failed disk, etc.
> In my case it is a failed disk. This model has already failed (a
> different specimen) with the same symptoms.
>
> Maybe there is some trick for cancelling/aborting all requests in the
> queue and removing the disk from the system?
Unlikely tbh; pulling the disk, however, should work.
>
>> a very quick look that's not obvious in the code, as
>> isci_io_request_construct etc. do indeed set a timeout when
>> CAM_TIME_INFINITY hasn't been requested.
>>
>> The sysctl hw.isci.debug_level may be able to provide more information,
>> but be aware this can be spammy.
> I have already hit this situation; which commands are interesting after
> setting hw.isci.debug_level?
I'm afraid I'm not familiar with isci; possibly someone else who is can
chime in.

	Regards
	Steve
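For the debug suggestion above, a minimal sketch of enabling the driver logging might look like this. The sysctl name is from the message; the specific numeric levels are an assumption (they are driver-specific, so check isci(4) before relying on them):

```shell
# Raise the isci(4) debug level (level value assumed; see isci(4)).
sysctl hw.isci.debug_level=3

# Follow new kernel messages as the driver logs request activity.
tail -f /var/log/messages

# Restore the default afterwards, since the output can be spammy.
sysctl hw.isci.debug_level=0
```

The interesting window is whatever the driver logs while a command to the failed disk is outstanding, e.g. during another `camcontrol tags da19 -v` or `camcontrol reset`.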