From: "Richard Lynch"
To: "Chuck Swiger"
Cc: freebsd-questions@freebsd.org
Date: Mon, 5 Feb 2007 01:13:31 -0600 (CST)
Subject: Re: READ_DMA48 error interpretation
Message-ID: <2195.67.184.122.32.1170659611.squirrel@www.l-i-e.com>
In-Reply-To: <3E64E786-E7A9-4914-BF29-DE89F25597E3@mac.com>
References: <1398.216.230.84.67.1168982036.squirrel@www.l-i-e.com> <3E64E786-E7A9-4914-BF29-DE89F25597E3@mac.com>
Reply-To: ceo@l-i-e.com

On Tue, January 16, 2007 3:21 pm, Chuck Swiger wrote:
> On Jan 16, 2007, at 1:13 PM, Richard Lynch wrote:
>> I know the messages below mean the hard drive or IDE cards are having
>> problems. But is this like RED ALERT or more like YELLOW or what? ...
>> +ad1: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=404955007
>> +ad1: FAILURE - READ_DMA48 status=51 error=10 LBA=404955007
>> +g_vfs_done():ad1s1[READ(offset=207336931328, length=16384)]error = 5
>
> If you have current backups, it's a yellow alert. Otherwise...
>
>> And what do I do about it?
>>
>> umount and fsck everything a lot?
>> swap cards/drives around until it stops?
>> Ignore it and pray?
>
> Try installing the sysutils/smartmontools port and run a drive self-test.
> That will give you a much better assessment of the state of the drive and
> whether it is likely to completely fail in the next 24 hours...

I ran the short test on the problem drives, and it said everything was fine. I'll try the long test at a later date.

Meanwhile, I turned on the smartd daemon, and am seeing two issues in the logs...

#1. The drive temperatures seem ridiculously high to this naive reader, but what do I know?... 110 to 190 Celsius? Yikes... Or maybe that's normal? How hot is too hot?

#2. Sequences like this show up a fair amount:

Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 152 to 153
Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 153 to 152
Device: /dev/ad0, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 251 to 250

So is the real "problem" just that the drives are spun down and can't spin up fast enough?

I can probably live with the consequences of that, and just go on with life -- the occasional HTTP request for an audio file will fail the first time, and the user has to hit reload.
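
In case it helps anyone reading along, here is a rough Python sketch for tallying the attribute-change lines quoted above, to see at a glance whether a value is just wobbling by one or actually drifting. The names `summarize` and `SAMPLE` are my own, purely illustrative, and not part of smartmontools. The pattern is based only on the three lines I pasted; accepting "Usage" as well as "Prefailure" is an assumption about how the other attributes are labelled.

```python
import re
from collections import defaultdict

# Matches lines shaped like the three log lines quoted above.
# "Usage" is accepted as an assumed label for non-prefailure attributes.
LINE_RE = re.compile(
    r"Device: (?P<dev>\S+), SMART (?P<kind>Prefailure|Usage) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


def summarize(lines):
    """Return {(device, attribute name): (lowest, highest, change count)}."""
    seen = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not an attribute-change line, skip it
        key = (m["dev"], m["name"])
        seen[key].extend([int(m["old"]), int(m["new"])])
    # Each matching line contributed two values (old and new).
    return {k: (min(v), max(v), len(v) // 2) for k, v in seen.items()}


# The three lines from the log excerpt in this message.
SAMPLE = [
    "Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 152 to 153",
    "Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 153 to 152",
    "Device: /dev/ad0, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 251 to 250",
]

if __name__ == "__main__":
    for (dev, name), (lo, hi, n) in sorted(summarize(SAMPLE).items()):
        print(f"{dev} {name}: {n} change(s), range {lo}..{hi}")
```

For the three lines above it reports two changes for Spin_Up_Time on /dev/ad2, staying within 152..153, and one change for Seek_Time_Performance on /dev/ad0. To run it over a whole log, pass an open file object to `summarize` instead of `SAMPLE`.
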

This box is the fail-safe roll-over server for audio files that are all up online somewhere else, managed by a professional (not me), so it's no surprise that the rare time-out on the real server also ends up with a drive spin-up and a failed request on the "backup".

Kind of annoying, I guess, to an end user, but forcing the drives to always be spinning is probably not a Good Idea.

Oh, here's a rather long excerpt of the log in case there are minutiae in it that I've failed to include:
http://l-i-e.com/smartd.log

Any help in interpreting these results is most appreciated!

THANKS!!!

-- 
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some starving artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?