Date: Sat, 20 Aug 2011 12:57:02 -0700
From: Jeremy Chadwick
To: Dan Langille
Cc: freebsd-stable@freebsd.org
Subject: Re: bad sector in gmirror HDD
Message-ID: <20110820195702.GA39109@icarus.home.lan>
In-Reply-To: <4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>

Dan, sorry for the previous mail.
Seems my schedule today has just unexpectedly changed; I had social
events to deal with, but as I found out a few minutes ago those events
are cancelled, which means I have time today to look at your mail.

On Sat, Aug 20, 2011 at 01:34:41PM -0400, Dan Langille wrote:
> On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote:
> > The SMART error log also indicates an LBA failure at the 26000 hour mark
> > (which is 16 hours prior to when you did smartctl -a /dev/ad2). Whether
> > that LBA is the remapped one or the suspect one is unknown. The LBA was
> > 5566440.
> >
> > The SMART tests you did didn't really amount to anything; no surprise.
> > short and long tests usually do not test the surface of the disk. There
> > are some drives which do it on a long test, but as I said before,
> > everything varies from drive to drive.
> >
> > Furthermore, on this model of drive, you cannot do a surface scans via
> > SMART. Bummer. That's indicated in the "Offline data collection
> > capabilities" section at the top, where it reads:
> >
> >     No Selective Self-test supported.
> >
> > So you'll have to use the dd method. This takes longer than if surface
> > scanning was supported by the drive, but is acceptable. I'll get to how
> > to go about that in a moment.
>
> FWIW, I've done a dd read of the entire suspect disk already. Just two errors.

Actually one error -- keep reading.

> From the URL mentioned above:
>
> [root@bast:~] # dd of=/dev/null if=/dev/ad2 bs=1m conv=noerror
> dd: /dev/ad2: Input/output error
> 2717+0 records in
> 2717+0 records out
> 2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)
> dd: /dev/ad2: Input/output error
> 38170+1 records in
> 38170+1 records out
> 40025063424 bytes transferred in 1544.671423 secs (25911701 bytes/sec)
> [root@bast:~] #
>
> That seems to indicate two problems. Are those the values I should be using
> with dd?

The "values" you refer to are byte offsets, not LBAs.
Furthermore, you used a block size of 1 megabyte (not sure why people
keep doing this). LBA size on your drive is 512 bytes; asking for 1
megabyte in dd makes each read() cover 1MByte, and an I/O error could
happen anywhere within that 1MByte range. (1024*1024) / 512 == 2048
LBAs make up 1MByte.

Next, remember that conv=noerror has some quirks associated with it
that need to be kept in mind (for example, without "sync" a failed
input block is simply omitted from the output rather than padded). The
man page discusses these.

Finally, I believe the last I/O error you see (at byte 40025063424) is
normal given what you told dd to do. It was trying to use bs=1m, and
your drive has a capacity limit of 40027029504 bytes. I'm left to
believe you had a "short read" (less than 1MByte), so this is normal.
40027029504 / (1024*1024) == 38172.75, which is not a whole number,
hence the error.

> I did some more precise testing:
>
> # time dd of=/dev/null if=/dev/ad2 bs=512 iseek=5566440
> dd: /dev/ad2: Input/output error
> 9+0 records in
> 9+0 records out
> 4608 bytes transferred in 5.368668 secs (858 bytes/sec)
>
> real 0m5.429s
> user 0m0.000s
> sys 0m0.010s
>
> NOTE: that's 9 blocks later than mentioned in smarctl
>
> The above generated this in /var/log/messages:
>
> Aug 20 17:29:25 bast kernel: ad2: FAILURE - READ_DMA status=51 error=40 LBA=5566449

Your dd command above is saying "use a block size of 512 bytes, and
read indefinitely from /dev/ad2, starting after an lseek() on /dev/ad2
to block 5566440". You then get an I/O error "somewhere" between where
you start and where the device ends. You're assuming that the "number
of bytes transferred" indicates where the actual error happened, which
in my experience is not always true.

What really needs to happen here is to use count=1 and adjust iseek
manually for each LBA. Or you could use the script I wrote and let the
computer do it for you. :-)

I understand what you're getting at, re: "that's 9 blocks later".
But the OS sometimes caches I/O, or aggregates block reads into
requests larger than the physical LBA size, so that may be what's going
on here. However, if you keep reading, you might find your answer is
that you may (still unsure) have other LBAs which are now marked
suspect.

> > That said:
> >
> > http://jdc.parodius.com/freebsd/bad_block_scan
> >
> > If you run this on your ad2 drive, I'm hoping what you'll find are two
> > LBAs which can't be read -- one will be the remapped LBA and one will be
> > the "suspect" LBA. If you only get one LBA error then that's fine too,
> > and will be the "suspect" LBA.
> >
> > Once you have the LBA(s), you can submit writes to them to get the drive
> > to re-analyse them (assuming they're "suspect"):
> >
> >     dd if=/dev/zero of=/dev/XXX bs=512 count=1 seek=NNNNN
> >
> > Where XXX is the device and NNNNN is the LBA number.
> >
> > If this works properly, the dd command should sit there for a little bit
> > (as the drive does its re-analysis magic) and then should complete.
>
> ad2 is part of a gmirror with ad0. Does this change things?
>
> I haven't tried the dd yet.

It does not change things, but I don't know what's going to happen if
you issue writes to the device directly while the drive is still
attached to the gmirror. When I encounter a disk that's behaving like
this, I immediately remove it from the pool/mirror so I can work on it.
I do not trust the OS not to panic/crash/behave weirdly when doing
these things.

> > You'll want to check SMART stats after that; you should see
> > Current_Pending_Sector drop to 0. If Offline_Uncorrectable incremented
> > then the LBA could not be re-read/remapped.
>
> It did increment:
>
> 197 Current_Pending_Sector 0x0032 100 100 020 Old_age Always - 2
>
> [was 1]

What this means is that you have *another* LBA the drive found and
marked suspect.
This could have happened any time; possibly during the above dd you
did, possibly during normal read operation (assuming the drive is still
handling I/O as part of your mirror).

> > If Reallocated_Sector_Ct
> > incremented then you now have a total of 2 LBAs which are remapped.
>
> It did increment:
>
> $ diff smarctl.1 smarctl.3 | grep Reallocated_Sector_Ct
> < 5 Reallocated_Sector_Ct 0x0033 100 100 020 Pre-fail Always - 1
> > 5 Reallocated_Sector_Ct 0x0033 100 100 020 Pre-fail Always - 2
>
> Full output of smartctl has been appended to http://beta.freebsddiary.org/smart-fixing-bad-sector.php

But you didn't issue any writes to the drive (quote: "I haven't tried
the dd yet"), so I cannot explain why this attribute would increment.
Unless you *did* try the dd? I don't know; there's not enough
information here for me to ascertain what may have happened between
this paragraph and a couple paragraphs up.

To me, this looks like a write to the drive was issued, either manually
(with the dd) or by gmirror (if the drive is still in use for I/O), and
happened to hit an LBA which was previously marked suspect -- and
induced a remap.

Alternately -- and this is just as plausible as what I just described --
the drive may have a firmware quirk/bug/behavioural difference from
what I'm used to, where Current_Pending_Sector acts as a counter (e.g.
it will never reset to zero). Maxtor "should" be using
Reallocated_Event_Count for this (since that's what it's for; it
indicates failures OR successes), but as I've said time and time again,
the behaviour varies from drive to drive, model to model, and firmware
to firmware.

Another alternative is the whole "smartctl -t offline" ordeal, which
might update the attribute data, but it's labelled Old_age not Offline,
so I don't think this would be the case (unless there's a bug in the
firmware or mislabeling of the attribute in the firmware for this
drive).
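
To make the byte-offset vs. LBA arithmetic from earlier in this mail
concrete, here is a rough sh sketch. It is not the bad_block_scan
script linked above; the function names lba_window and probe_lbas are
made up for illustration, and it assumes 512-byte LBAs and the bs=1m
run quoted earlier. It only reads from the device. skip= is the
portable spelling of FreeBSD's iseek=.

```sh
#!/bin/sh
# Sketch only: map a failing 1MByte dd record to the LBAs it covers,
# then read that window one LBA at a time. Names are hypothetical.

SECTOR=512                  # LBA size in bytes (assumed, per the mail)
BS=$((1024 * 1024))         # bs=1m, as used in the whole-disk read

# lba_window RECORD
# Print the first and last LBA covered by the 0-based dd record RECORD.
lba_window() {
    per=$((BS / SECTOR))            # 2048 LBAs per 1MByte record
    first=$(($1 * per))
    echo "$first $((first + per - 1))"
}

# probe_lbas DEVICE FIRST LAST
# Read each LBA on its own (count=1) and print the ones dd fails on.
probe_lbas() {
    lba=$2
    while [ "$lba" -le "$3" ]; do
        dd if="$1" of=/dev/null bs="$SECTOR" count=1 skip="$lba" \
            2>/dev/null || echo "$lba"
        lba=$((lba + 1))
    done
}

# dd printed "2717+0 records in" at the first error, i.e. 2717 full
# records were read and record 2717 (0-based) is the one that failed.
lba_window 2717             # prints: 5564416 5566463
```

Both 5566440 (from the SMART error log) and 5566449 (from the kernel
message) fall inside that window. Something like
"probe_lbas /dev/ad2 5564416 5566463" would then narrow it down, with
the same caveat as above: caching or read aggregation may still blur
which LBA is reported.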
The thing about bad LBAs is that they often come in groups/bunches;
dust on the drive, some region loses its magnetic integrity, etc...
Your drive is "old" (27416 hours = 1142 days = 3.1 years) so it's
understandable IMO.

The only way to know for sure would be to do a surface scan on the
drive and see if any more I/O errors show up. If they do, I would
recommend just writing zeros from LBA 0 all the way to the end of the
drive, then afterward see what the SMART attributes look like.
"dd if=/dev/zero of=/dev/ad2 bs=64k" would do the trick (in this case
'bs' doesn't matter since all you're trying to do is zero the drive;
doesn't matter if writes get aggregated or not).

> > In
> > the case of remapping, you get to deal with the UFS/FFS thing above.
> > To get the stats to update in this situation you *might* (but probably
> > not) have to run "smartctl -t offline /dev/XXX".
>
> I didn't try that...
>
> > You might also be wondering "that dd command writes 512 bytes of zero to
> > that LBA; what about the old data that was there, in the case that the
> > drive remaps the LBA?" This is a great question, and one I've never
> > actually taken the time to answer because at this present time I have
> > absolutely *no* bad disks in my possession. I'm under the impression
> > that the write does in fact write zeros if the LBA is remapped, but that
> > might not be true at all. I've been waiting to test this for quite some
> > time and document it/write about it.
> >
> > I still suggest you replace the drive, although given its age I doubt
> > you'll be able to find a suitable replacement. I tend to keep disks
> > like this around for testing/experimental purposes and not for actual
> > use.
>
> I have several unused 80GB HDD I can place into this system. I think that's
> what I'll wind up doing. But I'd like to follow this process through and get it documented
> for future reference.

Yes, given the behaviour of the drive I would recommend you simply
replace it at this point in time. What concerns me the most is
Current_Pending_Sector incrementing, but it's impossible for me to
determine if that incrementing means there are other LBAs which are
bad, or if the drive is behaving how its firmware is designed.

Keep the drive around for further experiments/tinkering if you're
interested. Stuff like this is always interesting/fun as long as your
data isn't at risk, so doing the replacement first would be best
(especially if both drives in your mirror were bought at the same time
from the same place and have similar manufacturing plants/dates on
them).

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |