Date:      Sat, 23 Jan 2010 18:40:13 -0600 (CST)
From:      Wes Morgan <morganw@chemikals.org>
To:        Rich <rincebrain@gmail.com>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: Errors on a file on a zpool: How to remove?
Message-ID:  <alpine.BSF.2.00.1001231814210.2160@ibyngvyr>
In-Reply-To: <5da0588e1001231541l246769eao410c5ea6ccca0de4@mail.gmail.com>
References:  <5da0588e1001222223m773648am907267235bdcf882@mail.gmail.com> <ed91d4a81001230011t7aef2da8h3be13d2494c06550@mail.gmail.com> <5da0588e1001230014k1b8a32f8v42046497265429ed@mail.gmail.com> <alpine.BSF.2.00.1001231519110.91898@ibyngvyr>  <5da0588e1001231415t403f29ceq6e8dcd16edb4a28@mail.gmail.com> <alpine.BSF.2.00.1001231733570.2160@ibyngvyr> <5da0588e1001231541l246769eao410c5ea6ccca0de4@mail.gmail.com>

On Sat, 23 Jan 2010, Rich wrote:

> I have no files named 0x0.
>
> I have a number of files that I cannot touch at all. Any attempt
> (stat, mv, rm) fails with EIO, the checksum error count on three of the
> disks in that pool ticks up, and /var/log/messages reports what I
> reported in my initial post. (I discovered this because FreeBSD's daily
> check-for-setuid-bits-in-strange-places find command reported EIO on
> some files.)
>
> My original post in this thread is about how to resolve this.

Do these bad files show up on "zpool status -v" after a scrub?
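A minimal sketch of that check. `zpool scrub` and `zpool status -v` are the standard commands, and the pool name `rigatoni` comes from the status output quoted further down. `list_permanent_errors` is a hypothetical helper, not part of ZFS, that prints only the entries under the "Permanent errors" heading. An entry of the form `dataset:<0x...>`, like `rigatoni/mirrors:<0x0>`, is an object number that ZFS could not map back to a path name.

```sh
#!/bin/sh
# Hypothetical helper (not part of ZFS): read `zpool status -v` output on
# stdin and print only the entries listed under "Permanent errors".
# Assumes the output layout shown in this thread.
list_permanent_errors() {
    awk '
        /^[ \t]*errors: Permanent errors/ { found = 1; next }
        found && NF > 0 { sub(/^[ \t]+/, ""); print }
    '
}

# Usage (as root). The scrub runs in the background and can take hours
# (15h28m on this pool last time):
#   zpool scrub rigatoni
#   zpool status rigatoni      # repeat until it reports "scrub completed"
#   zpool status -v rigatoni | list_permanent_errors
```

If the stat-failing files do not appear in that list, that points further toward the metadata explanation below rather than damaged file data.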

This really sounds more like an issue of corrupt metadata. ZFS keeps
multiple copies of filesystem metadata even on non-redundant pools (ditto
blocks). You said there was bad RAM in this machine at one point, which
may mean that *all* copies of the affected metadata were corrupted: every
ditto copy is written from the same in-memory buffer, so damage done in
RAM ends up in each of them.

In my encounter with a bad stick of RAM, the data was correct but the
stored checksums were wrong. I was able to "recover" the data by simply
changing zfs_read() not to report EIO when it receives an ECKSUM error
from the layers below it, essentially ignoring the checksum error. I have
no idea what this might do if the metadata itself is corrupt, so that
could be risky.

Another option is the zdb solution mentioned earlier.
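To enumerate the affected files directly, the symptom described above (stat failing with EIO, which is how the nightly setuid check noticed them) can be collected with find(1). This is a sketch under stated assumptions: both function names are hypothetical, and the filter assumes FreeBSD's message format `find: <path>: Input/output error`. GNU find quotes the path, so the pattern would need adjusting there.

```sh
#!/bin/sh
# Hypothetical helpers, sketched for FreeBSD's find(1).

# Filter find(1) error messages on stdin, printing only the paths whose
# error was EIO ("Input/output error" is the strerror(EIO) text).
eio_paths_from_find_errors() {
    sed -n 's/^find: \(.*\): Input\/output error$/\1/p'
}

# Walk one filesystem (-xdev) and report the paths that fail with EIO.
# "-links +0" is always true; it is only there so that find has to stat()
# every entry instead of relying on directory-entry types.
# Normal output is discarded; only stderr is passed to the filter.
find_eio_paths() {
    find "$1" -xdev -links +0 2>&1 >/dev/null | eio_paths_from_find_errors
}
```

For example, `find_eio_paths /mirrors` would scan the rigatoni/mirrors dataset, whose mountpoint appears in the `zfs list` output quoted below.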

>
> On Sat, Jan 23, 2010 at 6:34 PM, Wes Morgan <morganw@chemikals.org> wrote:
> > On Sat, 23 Jan 2010, Rich wrote:
> >
> >> On Sat, Jan 23, 2010 at 4:21 PM, Wes Morgan <morganw@chemikals.org> wrote:
> >> > On Sat, 23 Jan 2010, Rich wrote:
> >> >
> >> >> I already diagnosed the bad hardware - one of the two sticks of RAM
> >> >> had gone bad, and fails memtest in the other machine.
> >> >>
> >> >>   pool: rigatoni
> >> >>  state: ONLINE
> >> >> status: One or more devices has experienced an error resulting in data
> >> >>       corruption.  Applications may be affected.
> >> >> action: Restore the file in question if possible.  Otherwise restore the
> >> >>       entire pool from backup.
> >> >>    see: http://www.sun.com/msg/ZFS-8000-8A
> >> >>  scrub: scrub completed after 15h28m with 1 errors on Thu Jan 21 18:09:25 2010
> >> >> config:
> >> >>
> >> >>       NAME        STATE     READ WRITE CKSUM
> >> >>       rigatoni    ONLINE       0     0     1
> >> >>         da4       ONLINE       0     0     2
> >> >>         da5       ONLINE       0     0     2
> >> >>         da7       ONLINE       0     0     0
> >> >>         da6       ONLINE       0     0     0
> >> >>         da2       ONLINE       0     0     2
> >> >>
> >> >> errors: Permanent errors have been detected in the following files:
> >> >>
> >> >>         rigatoni/mirrors:<0x0>
> >> >
> >> > Can you post your entire pool filesystem structure? That message above
> >> > looks like an unreferenced block or corrupted metadata rather than an
> >> > actual file. Also, if it's part of a snapshot, you simply have to destroy
> >> > the snapshot.
> >> >
> >> > I had a pool become corrupted due to bad memory, and all of the files were
> >> > still able to be manipulated. The only time EIO popped up was on the
> >> > specific block that had a checksum error.
> >>
> >> # zfs list -r -t all rigatoni
> >> NAME                  USED  AVAIL  REFER  MOUNTPOINT
> >> rigatoni             5.73T   984G    19K  /rigatoni
> >> rigatoni/logs_bitch   269M   984G   269M  /rigatoni/logs_bitch
> >> rigatoni/mirrors     5.73T   984G  5.73T  /mirrors
> >>
> >> No snapshots here. :/
> >>
> >> EIO only pops up on the files I mentioned above - everything else in
> >> those directories, including renaming that directory, is fine.
> >
> > I must have missed it: what files is it showing besides the <0x0> address?
> > Or do you have a file named "<0x0>"?