From owner-freebsd-scsi@freebsd.org Wed Sep 2 19:05:50 2015
Subject: Re: da2:ciss1:0:0:0): Periph destroyed
To: freebsd-scsi@freebsd.org
References: <55E72440.8070507@intersonic.se>
 <55E7309C.8010406@freebsd.org>
 <55E73900.5080302@intersonic.se>
 <55E742B9.1060002@freebsd.org>
 <55E747AC.6020302@intersonic.se>
From: Sean Bruno
Message-ID: <55E7488C.90405@freebsd.org>
Date: Wed, 2 Sep 2015 12:05:48 -0700
In-Reply-To: <55E747AC.6020302@intersonic.se>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 09/02/15 12:02, Per olof Ljungmark wrote:
> On 2015-09-02 20:40, Sean Bruno wrote:
>>
>>
>> On 09/02/15 10:59, Per olof Ljungmark wrote:
>>> On 2015-09-02
19:23, Sean Bruno wrote:
>>>>
>>>>
>>>> On 09/02/15 09:30, Per olof Ljungmark wrote:
>>>>> Hi,
>>>>>
>>>>> Recent 10-STABLE, HP D2600 with 12 SATA drives in RAID10
>>>>> via a P812 controller, 7TB capacity as one volume, ZFS.
>>>>>
>>>>> If I pull a drive from the array, the following occurs and
>>>>> I am not sure about the logic here because the array is
>>>>> still intact and no data loss occurs.
>>>>>
>>>>> Despite that the volume is gone.
>>>>>
>>>>> # zpool clear imap
>>>>> cannot clear errors for imap: I/O error
>>>>>
>>>>> # zpool online imap da2
>>>>> cannot online da2: pool I/O is currently suspended
>>>>>
>>>>> Only a reboot helped and then the pool came up just fine,
>>>>> no errors, but that is not exactly what you want on a
>>>>> production box.
>>>>>
>>>>> Did I miss something?
>>>>>
>>>>> Would geli_autodetach="NO" help?
>>>>>
>>>>> syslog output:
>>>>>
>>>>> Sep 2 17:55:19 str kernel: ciss1: *** Hot-plug drive removed, Port=1E Box=1 Bay=2 SN= Z4Z2S9SD
>>>>> Sep 2 17:55:19 str kernel: ciss1: *** Physical drive failure, Port=1E Box=1 Bay=2
>>>>> Sep 2 17:55:19 str kernel: ciss1: *** State change, logical drive 0, new state=REGENING
>>>>> Sep 2 17:55:19 str kernel: ciss1: logical drive 0 (da2) changed status OK->interim recovery, spare status 0x21
>>>>> Sep 2 17:55:19 str kernel: ciss1: *** State change, logical drive 0, new state=NEEDS_REBUILD
>>>>> Sep 2 17:55:19 str kernel: ciss1: logical drive 0 (da2) changed status interim recovery->ready for recovery, spare status 0x11
>>>>> Sep 2 17:55:19 str kernel: da2 at ciss1 bus 0 scbus2 target 0 lun 0
>>>>> Sep 2 17:55:19 str kernel: da2: RAID 1(1+0) read> s/n PAGXQ0BRH1W0WA detached
>>>>> Sep 2 17:55:19 str kernel: (da2:ciss1:0:0:0): Periph destroyed
>>>>> Sep 2 17:55:19 str devd: Executing 'logger -p kern.notice -t ZFS 'vdev is removed, pool_guid=13539160044045520113 vdev_guid=1325849881310347579''
>>>>> Sep 2 17:55:19 str ZFS: vdev is removed,
>>>>> pool_guid=13539160044045520113 vdev_guid=1325849881310347579
>>>>> Sep 2 17:55:19 str kernel: (da2:ciss1:0:0:0): fatal error, could not acquire reference count
>>>>> Sep 2 17:55:23 str kernel: ciss1: *** State change, logical drive 0, new state=REBUILDING
>>>>> Sep 2 17:55:23 str kernel: ciss1: logical drive 0 (da2) changed status ready for recovery->recovering, spare status 0x13
>>>>> Sep 2 17:55:23 str kernel: cam_periph_alloc: attempt to re-allocate valid device da2 rejected flags 0x18 refcount 1
>>>>> Sep 2 17:55:23 str kernel: daasync: Unable to attach to new device due to status 0x6
>>>>> _______________________________________________
>>>>> freebsd-scsi@freebsd.org mailing list
>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>>>>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"
>>>>>
>>>>
>>>>
>>>> This looks like a bug I introduced at r249170. Now that I
>>>> stare deeply into the abyss of ciss(4), I think the entire
>>>> change is wrong.
>>>>
>>>> Do you want to try and revert that change from your kernel
>>>> and rebuild for a test? I don't have access to ciss(4)
>>>> hardware any longer and cannot verify.
>>>>
>>
>>> Yes, I can try. The installed rev is 281826 but I assume the
>>> change can apply here too?
>>
>>
>>
>> yeah, I think a "svn merge -c -249170" from /usr/src should do
>> it if you are managing your system from svn
>>
>
> Sep 2 20:54:05 str kernel: ciss1: *** Hot-plug drive removed, Port=1E Box=1 Bay=3 SN= W4Z1G4BD
> Sep 2 20:54:05 str kernel: ciss1: *** Physical drive failure, Port=1E Box=1 Bay=3
> Sep 2 20:54:50 str kernel: ciss1: *** Hot-plug drive inserted, Port=1E Box=1 Bay=3 SN= WD-WMC1P0F66XVC
> Sep 2 20:54:50 str kernel: ciss1: *** HP Array Controller Firmware Ver = 6.64, Build Num = 0
>
>
> Right, this time it survived, the volume did not detach after
> reverting.
>
> If this change does not cause any other problems do you think it
> can go into -STABLE?
>
> Thanks!
>
> //per
>

Definitely. I'll yank it out today and set up a 3-day MFC.

sean
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQF8BAEBCgBmBQJV50iKXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w
ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRCQUFENDYzMkU3MTIxREU4RDIwOTk3REQx
MjAxRUZDQTFFNzI3RTY0AAoJEBIB78oecn5k/cUIAKwf3SitFqiXrW8ophSd8D2F
PHMAIUbRnH6vzAK6yGFmly/4oCaTfCj966hrFRFcCdzKbUAUge89O1ewdbuiSgY+
oF0Wkb6175ucZSYaiEzayp0N1dgewxVZGAFjhO+OXGMXftgR6yYmQDCuE3eFdaRE
zA4A+VwE0gKnQxOVBbrhzf8ezEfml+iDvYd/NxCciDhlNMrWhXUCgq9B4RBM6aU2
oYt1qNxrqkVvL9hV8u2/WAJd8Q6sDcaJnv2IcKoU8i/XzhQtsMtCk9juFAvGHQQb
HRI4iJpqtBwlhBLSzesIYKzMtfd1RRRLLOG8PHZZFl3RrinOSS02SbbxCa8lFrM=
=/2d0
-----END PGP SIGNATURE-----