From: Tom Worster
To: Kirk Strauser, FreeBSD Questions ML
Date: Fri, 19 Dec 2008 14:43:52 -0500
Subject: Re: Clearing SMART errors I don't care about?

On 12/19/08 12:38 PM, "Kirk Strauser" wrote:

> I beg to differ. "smartctl -H /dev/ad8" says that it passes its
> self-assessment and doesn't expect the drive to flat-out die any day
> soon. I'd still like to know if the error count increased, or if it
> started to detect imminent failure.

There are plenty of HDD failure modes that SMART can't predict.
Google's monitoring of over 100,000 HDDs over nine months showed that more than a third of the drives that failed gave no warning from SMART:
http://storagemojo.com/2007/02/19/googles-disk-failure-experience/

So if my data matters, I need a robust, HDD-failure-tolerant system anyway, i.e. RAID (even if it's just gmirror, which I use for non-critical servers) plus data snapshots to a remote site.

Now, with that in place, what do I do with a SMART warning? Given that SMART algorithms are also prone to false positives, is there any benefit in replacing the HDD now rather than waiting for it to fail and replacing it then? Not a great deal, in my view.

But perhaps my RAID array can't tolerate more than one HDD failure. Then I'd be exposed to a second disk failure during the time it takes to repair the array. If HDD failures are independent (which I guess might not always be true), this isn't a big concern. It's less of a concern than, for example, the chance of a RAID controller failure, which I've seen happen. One time when that happened, the controller corrupted all the disks in the array, and after it was replaced, a rebuild was impossible.
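To put a rough number on the independence argument, here is a small Python sketch of the chance that a surviving disk also fails during the repair window. It assumes independent failures at a constant rate (an exponential model) derived from an annualized failure rate (AFR). The 4% AFR and the repair windows are made-up illustrative figures, not numbers from the Google study, and `p_second_failure` is a hypothetical helper name.

```python
import math

HOURS_PER_YEAR = 24 * 365


def p_second_failure(afr: float, surviving_disks: int, window_hours: float) -> float:
    """Probability that at least one surviving disk fails within the repair
    window, ASSUMING independent failures at a constant hourly rate derived
    from an annualized failure rate (AFR)."""
    # Convert the AFR into a constant hourly hazard rate so that one disk
    # running for a full year fails with probability exactly `afr`.
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    # Each disk survives the window with probability exp(-rate * t);
    # independence lets us multiply those survival probabilities together.
    return 1.0 - math.exp(-rate_per_hour * surviving_disks * window_hours)


if __name__ == "__main__":
    # Hypothetical scenarios: a 2-disk gmirror leaves 1 survivor; a 4-disk
    # array that tolerates only one failure leaves 3. The AFR is assumed.
    assumed_afr = 0.04
    for survivors, hours in [(1, 72), (1, 24 * 7), (3, 72)]:
        p = p_second_failure(assumed_afr, survivors, hours)
        print(f"{survivors} surviving disk(s), {hours} h window: {p:.4%}")
```

With these made-up inputs every case comes out well under 1%. If failures are correlated, the multiplication above no longer holds and the real figure could be higher.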