From: Carl
Date: Wed, 29 Oct 2008 02:00:21 -0700
To: Jeremy Chadwick
Cc: freebsd-questions@freebsd.org
Subject: Re: gmirror slice insertion, "FAILURE - READ_DMA status=51"
Message-ID: <49082625.7080804@telus.net>
In-Reply-To: <20081029043314.GA66773@icarus.home.lan>

Jeremy Chadwick wrote:
> Seagate chooses to encode some raw data for some SMART attributes in a
> custom format. The format is not publicly documented. This is why you
> have to go off of the adjusted values shown in VALUE/WORST/THRESH.
> "How am I supposed to know all of this?!" You aren't -- it comes with
> experience.

And yet my failing drive's VALUE numbers are still all above their THRESH
values, despite it being bad enough to cripple the system. One might argue
those threshold values leave something to be desired.

>> Is there anything I should know about this model of hard disk with
>> regards to being known for problems? Also, is there a good test I can
>> perform to hopefully flush out any problems before I put this thing into
>> service?
>
> I'm confused: what gives you the impression there's a problem with
> *this model* of hard disk? I've seen no evidence presented that
> indicates such. What makes you ask that question?

I don't have such an impression, thus far. In fact, Seagate drives have
always been good to me prior to this. It's only a precautionary question
because it's better to ask now than after I've committed a lot of real data
and time to it and put it all into service.

> Let's take a look at the SMART data.
>
>> # smartctl -a /dev/ad4
>>
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
...
>> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
...
>
> To get an update on Attribute 198, you'd need to run a short offline
> test ("smartctl -t short /dev/ad4"). You can safely do this while
> the disk is in use; don't let the word "offline" make you think the
> disk disappears. You can watch the status using smartctl -a, and
> once its finished, you can compare the old value to the new. I'm
> willing to bet it remains zero.

I ran that test on both drives. ad6 failed immediately at 90% with a "read
failure" - not surprising.
ad4 completed without error and no change in its values, just as you
predicted.

>> # smartctl -a /dev/ad6
>>
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
...
>>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       2
...
>>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       1
...
>> 187 Reported_Uncorrect      0x0032   098   098   000    Old_age   Always       -       2
...
>> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
>> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2
...
>
> And here we see the core of the problem. :-)
> Advice is simple: replace this hard disk.
> Hope this helps.

It definitely did, Jeremy. Your explanations were most helpful. Thanks!

Carl / K0802647
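
P.S. The gap between "VALUE is still above THRESH" and "the raw counts are
nonzero" can be shown mechanically. What follows is only a minimal Python
sketch, not something from the thread. It parses the ad6 attribute rows quoted
in this message and applies two checks: the threshold rule (an attribute counts
as failed when VALUE <= THRESH) and a plain "raw count above zero" test. The
names WATCHED_IDS, parse_rows, below_threshold and nonzero_raw are made up for
illustration, and the choice of which attribute IDs to watch is an assumption.

```python
import re

# Attribute rows for ad6, copied from the smartctl output quoted in this message.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       2
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       1
187 Reported_Uncorrect      0x0032   098   098   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2
"""

# Assumption for this sketch: these IDs carry plain sector/error counts
# in their raw value, so any nonzero raw value is worth a look.
WATCHED_IDS = {5, 187, 197, 198}

# ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_rows(text):
    """Return one dict per attribute row found in text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            ident, name, value, worst, thresh, raw = match.groups()
            rows.append({
                "id": int(ident),
                "name": name,
                "value": int(value),
                "worst": int(worst),
                "thresh": int(thresh),
                "raw": int(raw),
            })
    return rows


def below_threshold(rows):
    """Names of attributes whose normalized VALUE is at or below THRESH."""
    return [r["name"] for r in rows if r["value"] <= r["thresh"]]


def nonzero_raw(rows):
    """Names of watched attributes whose raw count is above zero."""
    return [r["name"] for r in rows if r["id"] in WATCHED_IDS and r["raw"] > 0]


if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    print("at or below threshold:", below_threshold(rows))
    print("nonzero raw counts:  ", nonzero_raw(rows))
```

On the ad6 rows the threshold check reports nothing, while the raw-count check
lists attributes 5, 187, 197 and 198, which matches what happened here: the
drive never tripped a threshold, yet the raw counts already pointed at trouble.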