From: Matt Burke
Date: Fri, 7 Dec 2012 12:15:36 +0000
Subject: Re: ZFS hang
Message-ID: <50C1DDE8.9030503@icritical.com>
In-Reply-To: <50C1CB34.3000308@icritical.com>

After rebooting the box, I've just seen this on the console (after
'Setting hostid'):

(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 21 53 8 0 1 0 0
(da8:isci0:0:0:0): CAM status: SCSI Status Error
(da8:isci0:0:0:0): SCSI status: Check Condition
(da8:isci0:0:0:0): SCSI sense: MEDIUM ERROR asc:ffffffff,ffffffff (Reserved ASC/ASCQ pair)
(da8:isci0:0:0:0): Info: 0x4215378
(da8:isci0:0:0:0): Retrying command (per sense data)
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 21 53 8 0 1 0 0
(da8:isci0:0:0:0): CAM status: SCSI Status Error
(da8:isci0:0:0:0): SCSI status: Check Condition
(da8:isci0:0:0:0): SCSI sense: MEDIUM ERROR asc:ffffffff,ffffffff (Reserved ASC/ASCQ pair)
(da8:isci0:0:0:0): Info: 0x4215378
(da8:isci0:0:0:0): Retrying command (per sense data)
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 21 53 8 0 1 0 0
(da8:isci0:0:0:0): CAM status: SCSI Status Error
(da8:isci0:0:0:0): SCSI status: Check Condition
(da8:isci0:0:0:0): SCSI sense: MEDIUM ERROR asc:ffffffff,ffffffff (Reserved ASC/ASCQ pair)
(da8:isci0:0:0:0): Info: 0x4215378
(da8:isci0:0:0:0): Retrying command (per sense data)
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 21 53 8 0 1 0 0
(da8:isci0:0:0:0): CAM status: SCSI Status Error
(da8:isci0:0:0:0): SCSI status: Check Condition
(da8:isci0:0:0:0): SCSI sense: MEDIUM ERROR asc:ffffffff,ffffffff (Reserved ASC/ASCQ pair)
(da8:isci0:0:0:0): Info: 0x4215378
(da8:isci0:0:0:0): Retrying command (per sense data)
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 21 53 8 0 1 0 0
(da8:isci0:0:0:0): CAM status: SCSI Status Error
(da8:isci0:0:0:0): SCSI status: Check Condition
(da8:isci0:0:0:0): SCSI sense: MEDIUM ERROR asc:ffffffff,ffffffff (Reserved ASC/ASCQ pair)
(da8:isci0:0:0:0): Info: 0x4215378
(da8:isci0:0:0:0): Error 5, Retries exhausted

and then again for the following:

(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 21 65 8 0 1 0 0
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 21 75 8 0 1 0 0
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 21 76 8 0 1 0 0
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 82 58 0 1 0 0
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 83 58 0 1 0 0
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 84 58 0 1 0 0
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 94 58 0 1 0 0
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 95 58 0 1 0 0 (only 2 retries)
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 9b 58 0 1 0 0
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 a4 58 0 1 0 0 (only 1 retry)
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 a5 58 0 1 0 0
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 a6 58 0 1 0 0
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 b4 58 0 1 0 0
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 b5 58 0 1 0 0 (2 retries)
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 b6 58 0 1 0 0
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 bc 58 0 1 0 0
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 c7 58 0 1 0 0
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 d7 58 0 1 0 0 (1 retry)
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 d8 58 0 1 0 0
(da8:isci0:0:0:0): READ(10). CDB: 28 0 4 1 e8 58 0 1 0 0

Obviously, the cause of my problems would seem to be a hosed disk.
However, the kernel msgbuf shows no complaints from the drive before the
reboot.

da8 is a 60GB OCZ Agility 3 SSD (purchased prior to realising just how
unreliable they are). According to the SMART data, it's had just 146GB of
reads and 278GB of writes over 3 power cycles, with only 3 months of
power-on time, similar to the others that have failed (~60% failure rate
for ours).

I can understand the drive failing, I just can't understand how it hung
the system.

I have had a similar thing happen on one of these machines before (with
GENERIC and no dumpdev, so no debugging) with one of these disks on an
Areca HBA. I've also had these drives fail on the onboard SATA controller,
along with SAS drives on the SAS controllers, with no undesirable effects
(other than having to swap them out).

Could there be a problem with ATA devices on SCSI controllers which is
causing failures to be silently dropped? Is ZFS lacking a timeout on IO
calls?

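For anyone wanting to map the failed reads in the console output above onto disk offsets, here is a minimal sketch of decoding the CDBs. It assumes the standard READ(10) layout (opcode 0x28, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8) and that the "Info:" value is the failing LBA, which is usual for a medium error on a block device but is not confirmed by the log itself. The function name decode_read10 is just an illustrative helper, not part of any FreeBSD tool.

```python
def decode_read10(cdb_text):
    """Decode a READ(10) CDB printed as space-separated hex bytes.

    Returns (lba, block_count). Assumes the standard 10-byte layout:
    byte 0 is the opcode (0x28), bytes 2-5 hold the LBA and bytes 7-8
    hold the transfer length, both big-endian.
    """
    # CAM prints each byte in hex without leading zeros, e.g. "8" for 0x08.
    cdb = bytes(int(tok, 16) for tok in cdb_text.split())
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB: %r" % cdb_text)
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return lba, blocks


if __name__ == "__main__":
    lba, blocks = decode_read10("28 0 4 21 53 8 0 1 0 0")
    info = 0x4215378  # "Info:" value reported with the same sense data
    print("LBA 0x%x, %d blocks, Info at offset %d" % (lba, blocks, info - lba))
```

For the first failing command this gives LBA 0x4215308 with a transfer length of 256 blocks, and the Info value falls 112 blocks into that request, so it is consistent with a single bad block inside a larger read.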
I'm going to move all these SSDs onto the SATA controller, and see if I can replicate the problem, but I'm not holding my breath over a conclusive result.