Date:      Fri, 18 Aug 2006 21:52:02 -0500
From:      Brooks Davis <brooks@one-eyed-alien.net>
To:        Antony Mawer <fbsd-stable@mawer.org>
Cc:        Kirk Strauser <kirk@daycos.com>, freebsd-stable@freebsd.org
Subject:   Re: The need for initialising disks before use?
Message-ID:  <20060819025202.GA11181@lor.one-eyed-alien.net>
In-Reply-To: <44E65027.6060605@mawer.org>
References:  <44E47092.7050104@mawer.org> <200608180919.04651.kirk@daycos.com> <20060818142925.GA2463@lor.one-eyed-alien.net> <44E65027.6060605@mawer.org>

On Fri, Aug 18, 2006 at 01:41:27PM -1000, Antony Mawer wrote:
> On 18/08/2006 4:29 AM, Brooks Davis wrote:
> >On Fri, Aug 18, 2006 at 09:19:04AM -0500, Kirk Strauser wrote:
> >>On Thursday 17 August 2006 8:35 am, Antony Mawer wrote:
> >>
> >>>A quick question - is it recommended to initialise disks before using
> >>>them to allow the disks to map out any "bad spots" early on?
> >>Note: once you actually start seeing bad sectors, the drive is
> >>almost dead.  A drive can remap a pretty large number internally,
> >>but once that pool is exhausted (and the number of errors is still
> >>growing exponentially), there's not a lot of life left.
> >
> >There are some exceptions to this.  The drive cannot remap a sector
> >that fails to read.  You must perform a write to cause the remap to
> >occur.  If you get a hard write failure it's game over, but read
> >failures aren't necessarily a sign the disk is hopeless.  For
> >example, the drive I've had in my laptop for most of the last year
> >developed a three-sector error[0] within a week or so of arrival.
> >After dd'ing zeros over the problem sectors I've had no problems.
>
> This is what prompted it -- I've been seeing lots of drives that are
> showing up with huge numbers of read errors - for instance:
>
> >Aug 19 04:02:27 server kernel: ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=66293984
> >Aug 19 04:02:27 server kernel: g_vfs_done():ad0s1f[READ(offset=30796791808, length=16384)]error = 5
> >Aug 19 04:02:31 server kernel: ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=47702304
> >Aug 19 04:02:31 server kernel: g_vfs_done():ad0s1f[READ(offset=21277851648, length=16384)]error = 5
> >Aug 19 04:02:36 server kernel: ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=34943296
> >Aug 19 04:02:36 server kernel: g_vfs_done():ad0s1f[READ(offset=14745239552, length=16384)]error = 5
> >Aug 19 04:03:08 server kernel: ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=45514848
> >Aug 19 04:03:08 server kernel: g_vfs_done():ad0s1f[READ(offset=20157874176, length=16384)]error = 5
>
> I have /var/log/messages flooded with incidents of these "FAILURE -
> READ_DMA" messages. I've seen it on more than one machine with
> relatively "young" drives.
>
> I'm trying to determine if running a dd if=/dev/zero over the whole
> drive prior to use will help reduce the incidence of this, or if it
> is likely that these are developing after the initial install, in
> which case this will make negligible difference...

I really don't know.  The only way I can think of to find out is to own
a large number of machines and perform an experiment.  We (the general
computing public) don't have the kind of models needed to really say
anything definitive.  Drives are too darn opaque.
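
If you do want to zero a disk before putting it into service, a
minimal sketch (untested; assumes the new disk is ad0 -- adjust the
device name for your system):

    # WARNING: destroys all data on ad0; double-check the device name
    dd if=/dev/zero of=/dev/ad0 bs=1m
    # watch the console/dmesg for write errors afterwards; a hard
    # write failure here means the drive shouldn't go into service

Any sector the drive can still remap should get remapped by those
writes; whether that actually reduces later read errors is the open
question above.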

> Once I do start seeing these, is there an easy way to:
>
>     a) determine what file/directory entry might be affected?

Not easily, but this question has been asked and answered on the mailing
lists recently (I don't remember the answer, but I think there were some
ports that can help).
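
One possibility on UFS (a sketch only -- I haven't verified this, and
the sector arithmetic is easy to get wrong): fsdb(8) has a findblk
command that reports the inode owning a given disk block.  The LBA in
the kernel message is relative to the whole disk, so you'd first
subtract the offset of the slice and partition (from fdisk and
bsdlabel) to get a partition-relative sector:

    # hypothetical numbers: say ad0s1f starts at absolute sector
    # 1000000; the kernel reported LBA=66293984, so try 65293984
    fsdb -r /dev/ad0s1f
    fsdb (inum: 2)> findblk 65293984

From the inode number you can then recover the path with find(1)'s
-inum option.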

>     b) dd if=/dev/zero over the affected sectors only, in order to
>          trigger a sector remapping without nuking the whole drive

You can use src/tools/tools/recoverdisk to refresh all of the disk
except the parts that don't work, and then use dd and the console
error output to do the rest.
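
Roughly like this (a sketch, untested; build recoverdisk with make in
that directory first, and note the LBA here is taken from your log
above and is relative to the whole disk, assuming 512-byte sectors):

    # rewrite everything that is still readable, in place
    recoverdisk /dev/ad0 /dev/ad0
    # then force a remap of a specific bad sector by writing to it
    dd if=/dev/zero of=/dev/ad0 bs=512 seek=66293984 count=1

Writing through the raw disk device rather than the partition saves
you translating the LBA into a partition-relative offset, but it
obviously clobbers whatever the filesystem had in that sector.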

>     c) depending on where that sector is allocated, I presume I'm
>          either going to end up with:
>         i) zero'd bytes within a file (how can I tell which?!)
>        ii) a destroyed inode
>       iii) ???

Presumably it will be one of i, ii, or a mangled superblock.  I don't
know how you'd tell which off the top of my head.  This is one of the
reasons I think Sun is on the right track with ZFS's
checksum-everything approach.  At least that way you actually know
when something goes wrong.
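
For example (on Solaris, where ZFS lives today -- a sketch from
memory, with a hypothetical pool name):

    zpool scrub tank      # re-read every block and verify checksums
    zpool status -v tank  # lists files with unrecoverable errors

Because every block is checksummed, the filesystem can name the exact
files that were damaged instead of leaving you to guess.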

> Any thoughts/comments/etc appreciated...
>
> How do other operating systems handle this - Windows, Linux, Solaris,
> MacOSX ...? I would have hoped this would be a condition the OS would
> make some attempt to trigger a sector remap... or are OSes typically
> ignorant of such things?

The OS is generally unaware of such events except to the extent that
it sees a fatal read error occur, or reads the SMART data from the
drive in the case of write failures.
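
You can at least watch the drive's own remap counters with
sysutils/smartmontools (a sketch; device name assumed):

    smartctl -a /dev/ad0
    # look at Reallocated_Sector_Ct (sectors already remapped) and
    # Current_Pending_Sector (unreadable sectors awaiting a write)

A steadily climbing reallocated count is usually the early warning
that the remap pool is being eaten.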

-- Brooks
