From: Josh Endries
To: freebsd-questions@freebsd.org
Date: Mon, 10 Mar 2008 19:40:53 -0400
Subject: Questions about camcontrol, hot-swapping, ciss and Compaq SmartArray

Hello,

Today I noticed that one of the disks in a RAID 5 array I have seems to be dead or dying (full log: http://pastebin.ca/937249):

  loki.domain.int ciss0: *** Fatal drive error, SCSI port 1 ID 0
  loki.domain.int (da1:ciss0:0:1:0): WRITE(10). CDB: 2a 0 c ae 3f d0 0 0 20 0
  loki.domain.int (da1:ciss0:0:1:0): CAM Status: SCSI Status Error
  loki.domain.int (da1:ciss0:0:1:0): SCSI Status: Check Condition
  loki.domain.int (da1:ciss0:0:1:0): MEDIUM ERROR asc:11,0
  loki.domain.int (da1:ciss0:0:1:0): Unrecovered read error
  loki.domain.int (da1:ciss0:0:1:0): Retrying Command (per Sense Data)

I see these messages for port 0 only, but with the ID varying from 0 to 3, and I'm not sure what that means (partition?). After a while the error messages "went away", even though the disks were, and still are, in use.

I found cciss_vol_status online, but it reports the volume as OK (not degraded), which doesn't make sense to me:

  # cciss_vol_status /dev/ciss0
  /dev/ciss0: (Smart Array 642) RAID 0 Volume 0(?) status: OK.
  /dev/ciss0: (Smart Array 642) RAID 5 Volume 1(?) status: OK.

Is there a way to tell from these messages which port/disk is bad?

Assuming I can determine which disk it is, do I need to do anything in the OS before or after I swap out the drive? I've seen people mention rescanning and running other camcontrol commands before...

Any other tips?

Thanks,
Josh
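For anyone trying to answer the "which port/disk" question from a longer log than the excerpt above, it can help to tally the controller's "Fatal drive error, SCSI port N ID M" lines, which appear to name a physical drive position, separately from the "(da1:ciss0:0:1:0)" prefixes, which are CAM paths (peripheral da1 on controller ciss0, bus 0, target 1, LUN 0) and so presumably point at the logical volume rather than a physical disk. The Python below is a minimal sketch under those assumptions: the regular expressions are written only against the lines quoted in this message, the log path is an assumption, and the names `tally_fatal_errors`, `parse_cam_path` and `CamPath` are hypothetical helpers, not part of any FreeBSD tool.

```python
import re
import sys
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Assumed format, based only on the ciss message quoted above, e.g.
#   "ciss0: *** Fatal drive error, SCSI port 1 ID 0"
FATAL_RE = re.compile(r"\b(ciss\d+): \*\*\* Fatal drive error, SCSI port (\d+) ID (\d+)")

# CAM path prefix such as "(da1:ciss0:0:1:0)":
# peripheral name+unit : controller (SIM) name+unit : bus : target : LUN
CAM_RE = re.compile(r"\(([a-z]+)(\d+):([a-z]+)(\d+):(\d+):(\d+):(\d+)\)")


class CamPath(NamedTuple):
    """Hypothetical container for the fields of a CAM path prefix."""
    periph: str
    periph_unit: int
    sim: str
    sim_unit: int
    bus: int
    target: int
    lun: int


def parse_cam_path(line: str) -> Optional[CamPath]:
    """Return the CAM path found in a kernel message line, or None if absent."""
    m = CAM_RE.search(line)
    if m is None:
        return None
    g = m.groups()
    return CamPath(g[0], int(g[1]), g[2], int(g[3]), int(g[4]), int(g[5]), int(g[6]))


def tally_fatal_errors(lines: Iterable[str]) -> Counter:
    """Count 'Fatal drive error' messages per (controller, port, ID)."""
    counts: Counter = Counter()
    for line in lines:
        m = FATAL_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)), int(m.group(3)))] += 1
    return counts


def main(path: str) -> None:
    # errors="replace" keeps stray non-UTF-8 bytes in the log from aborting the scan.
    with open(path, errors="replace") as log:
        counts = tally_fatal_errors(log)
    for (ctrl, port, drive_id), n in counts.most_common():
        print(f"{ctrl}: port {port} ID {drive_id}: {n} fatal drive error(s)")


# Only scan a file when one is named on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved under a hypothetical name such as tally_ciss.py, it could be run as "python3 tally_ciss.py /var/log/messages" (assuming that is where the kernel messages end up). A port/ID pair that dominates the count is a candidate for the failing drive, but the controller's own utilities remain the authority on which physical bay that corresponds to.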