From owner-freebsd-scsi@freebsd.org  Wed Sep  2 17:23:48 2015
Return-Path: <owner-freebsd-scsi@freebsd.org>
Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3F4E59C880F
 for <freebsd-scsi@mailman.ysv.freebsd.org>;
 Wed,  2 Sep 2015 17:23:48 +0000 (UTC)
 (envelope-from sbruno@freebsd.org)
Received: from mail.ignoranthack.me (ignoranthack.me [199.102.79.106])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 0ED7094A
 for <freebsd-scsi@freebsd.org>; Wed,  2 Sep 2015 17:23:47 +0000 (UTC)
 (envelope-from sbruno@freebsd.org)
Received: from [192.168.200.200] (unknown [50.136.155.142])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 (Authenticated sender: sbruno@ignoranthack.me)
 by mail.ignoranthack.me (Postfix) with ESMTPSA id 3EE8A1939FD
 for <freebsd-scsi@freebsd.org>; Wed,  2 Sep 2015 17:23:41 +0000 (UTC)
Subject: Re: da2:ciss1:0:0:0): Periph destroyed
To: freebsd-scsi@freebsd.org
References: <55E72440.8070507@intersonic.se>
From: Sean Bruno <sbruno@freebsd.org>
Message-ID: <55E7309C.8010406@freebsd.org>
Date: Wed, 2 Sep 2015 10:23:40 -0700
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101
 Thunderbird/38.2.0
MIME-Version: 1.0
In-Reply-To: <55E72440.8070507@intersonic.se>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Sep 2015 17:23:48 -0000



On 09/02/15 09:30, Per olof Ljungmark wrote:
> Hi,
> 
> Recent 10-STABLE, HP D2600 with 12 SATA drives in RAID10 via a P812
> controller, 7TB capacity as one volume, ZFS.
> 
> If I pull a drive from the array, the following occurs and I am not sure
> about the logic here because the array is still intact and no data loss
> occurs.
> 
> Despite that the volume is gone.
> 
> # zpool clear imap
> cannot clear errors for imap: I/O error
> 
> # zpool online imap da2
> cannot online da2: pool I/O is currently suspended
> 
> Only a reboot helped and then the pool came up just fine, no errors, but
> that is not exactly what you want on a production box.
> 
> Did I miss something?
> 
> Would
> geli_autodetach="NO"
> help?
> 
> syslog output:
> 
> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** Hot-plug drive
> removed, Port=1E Box=1 Bay=2 SN=            Z4Z2S9SD
> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** Physical drive
> failure, Port=1E Box=1 Bay=2
> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** State change, logical
> drive 0, new state=REGENING
> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: logical drive 0 (da2)
> changed status OK->interim recovery, spare status 0x21<configured>
> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: *** State change, logical
> drive 0, new state=NEEDS_REBUILD
> Sep  2 17:55:19 <kern.crit> str kernel: ciss1: logical drive 0 (da2)
> changed status interim recovery->ready for recovery, spare status
> 0x11<configured,available>
> Sep  2 17:55:19 <kern.crit> str kernel: da2 at ciss1 bus 0 scbus2 target
> 0 lun 0
> Sep  2 17:55:19 <kern.crit> str kernel: da2: <HP RAID 1(1+0) read> s/n
> PAGXQ0BRH1W0WA detached
> Sep  2 17:55:19 <kern.crit> str kernel: (da2:ciss1:0:0:0): Periph destroyed
> Sep  2 17:55:19 <user.notice> str devd: Executing 'logger -p kern.notice
> -t ZFS 'vdev is removed, pool_guid=13539160044045520113
> vdev_guid=1325849881310347579''
> Sep  2 17:55:19 <user.notice> str ZFS: vdev is removed,
> pool_guid=13539160044045520113 vdev_guid=1325849881310347579
> Sep  2 17:55:19 <kern.crit> str kernel: (da2:ciss1:0:0:0): fatal error,
> could not acquire reference count
> Sep  2 17:55:23 <kern.crit> str kernel: ciss1: *** State change, logical
> drive 0, new state=REBUILDING
> Sep  2 17:55:23 <kern.crit> str kernel: ciss1: logical drive 0 (da2)
> changed status ready for recovery->recovering, spare status
> 0x13<configured,rebuilding,available>
> Sep  2 17:55:23 <kern.crit> str kernel: cam_periph_alloc: attempt to
> re-allocate valid device da2 rejected flags 0x18 refcount 1
> Sep  2 17:55:23 <kern.crit> str kernel: daasync: Unable to attach to new
> device due to status 0x6


This looks like a bug I introduced at r249170.  Now that I stare deeply
into the abyss of ciss(4), I think the entire change is wrong.

Would you be willing to revert that change in your kernel source and
rebuild for a test?  I no longer have access to ciss(4) hardware and
cannot verify this myself.

sean

ref
https://svnweb.freebsd.org/base/head/sys/dev/ciss/ciss.c?r1=249170&r2=249169&pathrev=249170
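
For anyone wanting to try the revert suggested above, the steps can be
sketched roughly as follows.  This is a minimal sketch and not something
taken from the thread: it assumes /usr/src is a Subversion checkout of
stable/10, that the kernel config is GENERIC, and that r249170
reverse-merges cleanly onto that branch, none of which is confirmed
here.  SRC, KERNCONF and print_steps are illustrative names.  The script
only prints the commands so they can be reviewed before being run by
hand as root.

```shell
#!/bin/sh
# Sketch only: prints the commands for backing out r249170 and
# rebuilding the kernel. Nothing is executed by this script.
# Assumptions (not from the original mail): svn checkout in /usr/src,
# kernel config GENERIC.
SRC=${SRC:-/usr/src}
KERNCONF=${KERNCONF:-GENERIC}
REV=249170

print_steps() {
    cat <<EOF
cd ${SRC}
svn merge -c -${REV} ^/head/sys/dev/ciss/ciss.c sys/dev/ciss/ciss.c
make buildkernel KERNCONF=${KERNCONF}
make installkernel KERNCONF=${KERNCONF}
shutdown -r now
EOF
}

print_steps
```

If the reverse merge reports conflicts, the change has likely drifted on
the branch and would have to be backed out by hand using the diff linked
above.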