Date:      Sat, 13 Jun 2009 17:06:27 +0200
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Kip Macy <kip.macy@gmail.com>
Cc:        freebsd-fs@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, Thomas Backman <serenity@exscape.org>
Subject:   Re: ZFS: Silent/hidden errors, nothing logged anywhere
Message-ID:  <20090613150627.GB1848@garage.freebsd.pl>
In-Reply-To: <3c1674c90906121401s19105167vf4535566321b45de@mail.gmail.com>
References:  <920A69B1-4F06-477E-A13B-63CC22A13120@exscape.org> <3c1674c90906121401s19105167vf4535566321b45de@mail.gmail.com>

On Fri, Jun 12, 2009 at 02:01:57PM -0700, Kip Macy wrote:
> On Fri, Jun 12, 2009 at 10:32 AM, Thomas Backman <serenity@exscape.org> wrote:
> > OK, so I filed a PR late May (kern/135050):
> > http://www.freebsd.org/cgi/query-pr.cgi?pr=135050
> > I don't know if this is a "feature" or a bug, but it really should be
> > considered the latter. The data could be repaired in the background without
> > the user ever knowing - until the disk dies completely. I'd prefer to have
> > warning signs (i.e. checksum errors) so that I can buy a replacement drive
> > *before* that.
> >
> > Not only does this mean that errors can go unnoticed, but also that it's
> > impossible to figure out which disk is broken, if ZFS has *temporarily*
> > repaired the broken data! THAT is REALLY bad!
> > Is this something that we can expect to see changed before 8.0-RELEASE?
>
>
> I'm fairly certain that we've discussed this already. Solaris uses FMA
> - I don't think that I'll get to a "real fix" any time soon. The time
> that I do have will go to addressing stability problems (memory
> over-allocation, NFS interaction, control directory mounts) all of
> which cause panics. Maintaining them persistently in the label doesn't
> make sense - when do you drop them? Would a simple log message about
> the number of checksum errors suffice?

We do log such errors. Solaris uses FMA; on FreeBSD I use devd instead.
You can find the following entry in /etc/devd.conf:

notify 10 {
        match "system"          "ZFS";
        match "type"            "checksum";
        action "logger -p kern.warn 'ZFS: checksum mismatch, zpool=$pool path=$vdev_path offset=$zio_offset size=$zio_size'";
};
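Roughly, the rule says: when devd receives an event whose "system" is ZFS and whose "type" is checksum, substitute the event's variables into the action string and run the resulting logger command. The Python sketch below models that under simplifying assumptions. RULE_MATCHES, RULE_ACTION, rule_matches, expand_action and handle are hypothetical names, each match is treated as an anchored regular-expression match, and only the plain $name form is expanded; devd.conf(5) is the authority on the real semantics. The event values are made up.

```python
import re
from typing import Dict, Optional

# Hypothetical in-memory model of the devd "notify" rule quoted above.
# Simplifications (assumptions, not devd's actual implementation):
#   * each "match" is treated as an anchored regular-expression match;
#   * only the plain "$name" variable form is expanded.
RULE_MATCHES = {"system": "ZFS", "type": "checksum"}
RULE_ACTION = ("logger -p kern.warn 'ZFS: checksum mismatch, zpool=$pool "
               "path=$vdev_path offset=$zio_offset size=$zio_size'")


def rule_matches(event: Dict[str, str]) -> bool:
    """True if every matched key is present and its pattern fully matches."""
    return all(
        key in event and re.fullmatch(pattern, event[key]) is not None
        for key, pattern in RULE_MATCHES.items()
    )


def expand_action(action: str, event: Dict[str, str]) -> str:
    """Replace each $name with the event's value (empty string if absent)."""
    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)",
                  lambda m: event.get(m.group(1), ""), action)


def handle(event: Dict[str, str]) -> Optional[str]:
    """Return the command this model would run for the event, or None."""
    if rule_matches(event):
        return expand_action(RULE_ACTION, event)
    return None


if __name__ == "__main__":
    # Made-up event; the values are illustrative, not real devd output.
    event = {"system": "ZFS", "type": "checksum", "pool": "tank",
             "vdev_path": "/dev/ad4", "zio_offset": "1048576",
             "zio_size": "131072"}
    print(handle(event))
```

The point of the model is that nothing is logged unless both match lines succeed, so an event with a different type (or no devd process at all) produces no message.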

If you see nothing in your logs, either there is a bug somewhere in the
reporting path or devd is not running (it should be enabled by
default).
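Because each message carries the pool and the vdev path, the log is also enough to tell which disk is producing checksum errors, which was the concern raised in the PR. The sketch below tallies such messages per (pool, path). It assumes the messages land in a syslog file (typically /var/log/messages on FreeBSD) with whatever timestamp/host prefix syslog adds, and treats offset and size as opaque tokens; CHECKSUM_RE and count_checksum_errors are hypothetical names, and the sample line is made up.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Matches the message text produced by the devd "logger" action quoted above.
# Assumption: syslog prepends a prefix, so we search anywhere in the line.
CHECKSUM_RE = re.compile(
    r"ZFS: checksum mismatch, zpool=(?P<pool>\S+) path=(?P<path>\S*) "
    r"offset=(?P<offset>\S+) size=(?P<size>\S+)"
)


def count_checksum_errors(lines: Iterable[str]) -> Counter:
    """Count checksum-mismatch log entries per (pool, vdev path)."""
    counts: Counter = Counter()
    for line in lines:
        m = CHECKSUM_RE.search(line)
        if m:
            counts[(m.group("pool"), m.group("path"))] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Illustrative: pass a log file such as /var/log/messages.
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            result = count_checksum_errors(f)
    else:
        # Made-up sample line so the sketch runs without a real log file.
        result = count_checksum_errors([
            "Jun 13 17:00:00 host root: ZFS: checksum mismatch, "
            "zpool=tank path=/dev/ad4 offset=1048576 size=131072",
        ])
    for (pool, path), n in result.most_common():
        print(f"{pool} {path or '(unknown)'}: {n} checksum error(s)")
```

If this prints nothing for a pool that `zpool status` reports checksum errors on, that points at the reporting path or at devd not running, as described above.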

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!