Subject: Re: da2:ciss1:0:0:0): Periph destroyed
From: Sean Bruno
To: freebsd-scsi@freebsd.org
Date: Wed, 2 Sep 2015 10:23:40 -0700

On 09/02/15 09:30, Per olof Ljungmark wrote:
> Hi,
>
> Recent 10-STABLE, HP D2600 with 12 SATA drives in RAID10 via a P812
> controller, 7TB capacity as one volume, ZFS.
>
> If I pull a drive from the array, the following occurs and I am not sure
> about the logic here because the array is still intact and no data loss
> occurs.
>
> Despite that, the volume is gone.
>
> # zpool clear imap
> cannot clear errors for imap: I/O error
>
> # zpool online imap da2
> cannot online da2: pool I/O is currently suspended
>
> Only a reboot helped, and then the pool came up just fine with no errors,
> but that is not exactly what you want on a production box.
>
> Did I miss something?
>
> Would
> geli_autodetach="NO"
> help?
>
> syslog output:
>
> Sep 2 17:55:19 str kernel: ciss1: *** Hot-plug drive removed, Port=1E Box=1 Bay=2 SN= Z4Z2S9SD
> Sep 2 17:55:19 str kernel: ciss1: *** Physical drive failure, Port=1E Box=1 Bay=2
> Sep 2 17:55:19 str kernel: ciss1: *** State change, logical drive 0, new state=REGENING
> Sep 2 17:55:19 str kernel: ciss1: logical drive 0 (da2) changed status OK->interim recovery, spare status 0x21
> Sep 2 17:55:19 str kernel: ciss1: *** State change, logical drive 0, new state=NEEDS_REBUILD
> Sep 2 17:55:19 str kernel: ciss1: logical drive 0 (da2) changed status interim recovery->ready for recovery, spare status 0x11
> Sep 2 17:55:19 str kernel: da2 at ciss1 bus 0 scbus2 target 0 lun 0
> Sep 2 17:55:19 str kernel: da2: s/n PAGXQ0BRH1W0WA detached
> Sep 2 17:55:19 str kernel: (da2:ciss1:0:0:0): Periph destroyed
> Sep 2 17:55:19 str devd: Executing 'logger -p kern.notice -t ZFS 'vdev is removed, pool_guid=13539160044045520113 vdev_guid=1325849881310347579''
> Sep 2 17:55:19 str ZFS: vdev is removed, pool_guid=13539160044045520113 vdev_guid=1325849881310347579
> Sep 2 17:55:19 str kernel: (da2:ciss1:0:0:0): fatal error, could not acquire reference count
> Sep 2 17:55:23 str kernel: ciss1: *** State change, logical drive 0, new state=REBUILDING
> Sep 2 17:55:23 str kernel: ciss1: logical drive 0 (da2) changed status ready for recovery->recovering, spare status 0x13
> Sep 2 17:55:23 str kernel: cam_periph_alloc: attempt to re-allocate valid device da2 rejected flags 0x18 refcount 1
> Sep 2 17:55:23 str kernel: daasync: Unable to attach to new device due to status 0x6

This looks like a bug I introduced in r249170. Now that I stare deeply
into the abyss of ciss(4), I think the entire change is wrong.

Do you want to try reverting that change in your kernel and rebuilding
for a test? I no longer have access to ciss(4) hardware and cannot
verify it myself.

sean

ref
https://svnweb.freebsd.org/base/head/sys/dev/ciss/ciss.c?r1=249170&r2=249169&pathrev=249170
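For anyone trying the suggested test, the revert-and-rebuild step might look roughly like the sketch below. It is a hypothetical helper, not something posted in the thread, and it assumes that /usr/src is a Subversion working copy of stable/10, that r249170 (committed to head, as the svnweb link shows) reverse-merges cleanly into it, and that the GENERIC kernel config is in use. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: back out r249170 in the ciss(4) driver and rebuild
# the kernel for a test. ASSUMPTIONS: $SRC is a Subversion working copy of
# stable/10 and the kernel config is GENERIC; adjust both for your system.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
set -eu

SRC="${SRC:-/usr/src}"
KERNCONF="${KERNCONF:-GENERIC}"
DRY_RUN="${DRY_RUN:-1}"
HEAD_URL="https://svn.freebsd.org/base/head/sys/dev/ciss"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Reverse-merge (undo) r249170 into the local ciss(4) directory.
run svn merge -c -249170 "$HEAD_URL" "$SRC/sys/dev/ciss"

# Build and install the patched kernel (installkernel needs root).
run make -C "$SRC" buildkernel KERNCONF="$KERNCONF"
run make -C "$SRC" installkernel KERNCONF="$KERNCONF"
```

After a real run (DRY_RUN=0), reboot into the new kernel before pulling a drive again; `svn revert -R` on the ciss directory undoes the local merge if the test does not help.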