Date:      Thu, 5 Jul 2018 11:15:52 -0700 (PDT)
From:      "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net>
To:        Alan Somers <asomers@freebsd.org>
Cc:        Wojciech Puchar <wojtek@puchar.net>, FreeBSD Hackers <freebsd-hackers@freebsd.org>, Stefan Blachmann <sblachmann@gmail.com>, Lev Serebryakov <lev@freebsd.org>, George Mitchell <george+freebsd@m5p.com>
Subject:   Re: Confusing smartd messages
Message-ID:  <201807051815.w65IFqsB048887@pdx.rh.CN85.dnsmgr.net>
In-Reply-To: <CAOtMX2goxJkv1CFAcoFsw0NxaYvmLDXV8CxWr2DuQ%2BD56w2vuw@mail.gmail.com>

> On Thu, Jul 5, 2018 at 11:03 AM, Wojciech Puchar <wojtek@puchar.net> wrote:
> 
> >
> >> Rewriting suspicious sectors is useless in this day and age.  HDDs and
> >> SSDs
> >> already do it internally and have for years.  Even healthy sectors get
> >>
> >
> > Unreadable sectors cannot be rewritten by the drive electronics, as the
> > drive doesn't know what to rewrite. It may remap the sector, but it will
> > still report a read error until some data is written, unless giving no
> > error and returning meaningless data is acceptable behaviour.
> >
> 
> But if that disk is already managed by ZFS, the pool is redundant, and the
> bad sector is allocated by ZFS, then ZFS will immediately rewrite the
> unreadable sector.

ZFS, if it gets a read error, will rewrite the data from the unreadable
sector to a DIFFERENT block, not over the top of the bad spot.
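
For anyone wanting to see whether ZFS has actually hit read errors on a
member disk, here is a minimal Python sketch (not from the thread; the
function name is hypothetical). It pulls the per-device READ/WRITE/CKSUM
counters out of `zpool status` text, assuming the usual
NAME STATE READ WRITE CKSUM table and plain integer counters; abbreviated
counts such as "1.2K" are simply skipped.

```python
from __future__ import annotations


def zpool_error_counts(status_text: str) -> dict[str, tuple[int, int, int]]:
    """Map vdev name -> (read, write, cksum) errors from `zpool status` text.

    Sketch only: assumes the standard config table layout and integer
    counters; rows with abbreviated counts (e.g. "1.2K") are ignored.
    """
    counts: dict[str, tuple[int, int, int]] = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        # The table header marks the start of the per-device rows.
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True
            continue
        if not in_config:
            continue
        if not fields:
            # A blank line after the rows ends the config table.
            if counts:
                break
            continue
        if len(fields) >= 5 and all(f.isdigit() for f in fields[2:5]):
            counts[fields[0]] = tuple(int(f) for f in fields[2:5])
    return counts
```

A non-zero READ count on one side of a mirror or raidz, with the pool still
ONLINE, is ZFS telling you it saw the error and repaired from redundancy.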

> > Only on a write can it be done properly.
> >
> > that the HDD/SSD won't fix itself would be a checksum error.  Those are
> >>
> >
> > Yes, and this will happen if you power down your disk during a write, or
> > get a power spike or some other source of noise that affects the
> > electronic components.
> >
> 
> It happens surprisingly rarely.  Even on a sudden power loss, the drive is
> usually able to finish its current write operation.  Where you run into
> problems is when the power loss coincides with a mechanical shock
> that knocks the head off-track, or something like that.

I agree that power failures are a rare cause of write errors.  To get an
idea of how often this might have happened, look at the emergency
retract counter; if you're getting lots of those, you should try to find
out why and stop it.  Vibration has become a serious problem, though:
at today's head flying heights, drives are sensitive to it, and you can
even cause a drive to do retries by yelling at it in a loud
voice :-)   Look at the "high fly" counter to see if you're hitting
this issue.
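
A minimal Python sketch for pulling those two counters out of
`smartctl -A /dev/ada0` output (the device name is only an example). It
assumes the common attribute IDs, 189 (High_Fly_Writes) and 192
(Power-Off_Retract_Count, the emergency retract count on many drives);
vendors are free to number or name these differently and raw values are
vendor-defined, so treat it as a starting point rather than a diagnosis.
The function names are hypothetical.

```python
from __future__ import annotations

import re

# SMART attribute IDs commonly used for the counters mentioned above.
# Assumption: vendors may use other IDs/names, so check your drive's output.
WATCHED = {
    189: "High_Fly_Writes",          # head flew too high during a write
    192: "Power-Off_Retract_Count",  # emergency (power-off) retract cycles
}


def smart_raw_values(attr_text: str) -> dict[int, tuple[str, int]]:
    """Map attribute ID -> (name, leading integer of RAW_VALUE) from `smartctl -A` text."""
    values: dict[int, tuple[str, int]] = {}
    for line in attr_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # RAW_VALUE may contain spaces, e.g. "37 (Min/Max 20/45)".
        raw = " ".join(fields[9:])
        match = re.match(r"\d+", raw)
        if match:
            values[int(fields[0])] = (fields[1], int(match.group()))
    return values


def watched_counters(attr_text: str) -> dict[str, int]:
    """Return raw values of the WATCHED attributes the drive actually reports."""
    values = smart_raw_values(attr_text)
    return {name: values[attr_id][1]
            for attr_id, name in WATCHED.items() if attr_id in values}
```

The absolute numbers matter less than the trend: sample them periodically
and look for counts that keep climbing.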

> > Performing a full disk rewrite (not a ZFS rebuild), THEN looking at the
> > SMART stats, and THEN performing a regular smartctl -t long will tell the
> > truth.
> >
> > In my practice, that usually is "drive is fine". A really faulty drive
> > will QUICKLY develop new problems.
> >
> 
> Yeah, that should make the error go away.  It takes a long time, though.
> With a SCSI drive, you can get the exact LBAs affected with a "READ
> DEFECTS" command.  But there isn't a vendor-independent equivalent for
> SATA, unfortunately.

That's exactly my bitch about ATA missing this, though there are
vendor-specific commands to get it.
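
On the SCSI side, the command Alan refers to is READ DEFECT DATA. As a
hedged sketch, assuming the READ DEFECT DATA (10) parameter-data layout
from SBC (a 4-byte header with the PLISTV/GLISTV bits and the list format
in byte 1 and a big-endian list length in bytes 2-3, followed by the
descriptors), the Python below decodes short-block (000b, 4-byte LBA) and
long-block (011b, 8-byte LBA) lists into LBAs. Issuing the command is left
to whatever pass-through you use; on FreeBSD, `camcontrol defects` already
does the whole job. The function name is hypothetical.

```python
from __future__ import annotations

import struct

# Defect list format codes (low 3 bits of header byte 1) handled here.
SHORT_BLOCK = 0b000  # descriptors are 4-byte big-endian LBAs
LONG_BLOCK = 0b011   # descriptors are 8-byte big-endian LBAs
PLISTV = 0x10        # header byte 1: primary (factory) list is valid
GLISTV = 0x08        # header byte 1: grown list is valid


def parse_defect_list10(data: bytes) -> tuple[int, list[int]]:
    """Return (PLISTV/GLISTV flags, defective LBAs) from READ DEFECT DATA (10) data."""
    if len(data) < 4:
        raise ValueError("parameter data is shorter than the 4-byte header")
    flags = data[1] & (PLISTV | GLISTV)
    fmt = data[1] & 0x07
    # Defect list length in bytes, not counting the 4-byte header.
    (list_len,) = struct.unpack_from(">H", data, 2)
    # May be shorter than list_len if the allocation length was too small.
    body = data[4:4 + list_len]
    if fmt == SHORT_BLOCK:
        size, code = 4, ">I"
    elif fmt == LONG_BLOCK:
        size, code = 8, ">Q"
    else:
        raise ValueError(f"unsupported defect list format {fmt:#05b}")
    # Decode only whole descriptors; a truncated trailing one is ignored.
    lbas = [struct.unpack_from(code, body, off)[0]
            for off in range(0, len(body) - size + 1, size)]
    return flags, lbas
```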

-- 
Rod Grimes                                                 rgrimes@freebsd.org
