Date:      Sun, 06 May 2012 08:38:18 -0400
From:      "Simon" <simon@optinet.com>
To:        "Artem Belevich" <art@freebsd.org>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: ZFS Kernel Panics with 32 and 64 bit versions of 8.3 and 9.0
Message-ID:  <20120506123826.412881065672@hub.freebsd.org>
In-Reply-To: <CAFqOu6gz+Fd-NvPivMz3nfeGCYz0a563yNBOpmsAyHZS_TQybQ@mail.gmail.com>


Are you suggesting that if a disk sector goes bad or memory corrupts a few blocks
of data, the entire zpool can go bust? Can the same occur with a RAID-Z pool?
I thought ZFS was designed to overcome exactly these issues. Is
this not the case?

-Simon

On Sat, 5 May 2012 23:11:01 -0700, Artem Belevich wrote:

>I believe I've run into this issue two or three times. In all cases
>the culprit was memory corruption. If I were to guess, the corruption
>damaged critical data *before* ZFS calculated the checksum and
>wrote it to disk. Once that happened, the kernel would panic every time
>the pool was used. Crashes could happen as early as at zpool import
>or as late as after a few days of uptime or at the next scheduled scrub. I even
>tried importing/scrubbing the pool on OpenSolaris without much success
>-- while Solaris didn't crash outright, it failed to import the pool
>with an internal assertion failure.

>On Sat, May 5, 2012 at 7:13 PM, Michael Richards <hackish@gmail.com> wrote:
>> Originally I had an 8.1 server set up with a 32-bit kernel. The OS is on a
>> UFS filesystem, and (it's a mail server) the business part of the
>> operation is on ZFS.
>>
>> One day it crashed with an odd kernel panic. I assumed it was a memory
>> issue, so I had more RAM installed. I tried to get a PAE kernel working
>> to use the extra RAM, but it was crashing every few hours.
>>
>> Suspecting a hardware issue, I had all the hardware replaced.

>Bad memory could indeed do that.

>> I had some difficulty figuring out how to mount my old ZFS
>> partition, but eventually did so.
>...
>> zpool import -f -R /altroot 10433152746165646153 olddata
>> panics the kernel. The panic is similar to the one seen on all the other kernel versions.


>> Gives a bit more info about things I've tried. Whatever it is seems to
>> affect a wide variety of kernels.

>The kernel is just the messenger here. The root cause is that while ZFS
>does go an extra mile or two to ensure data consistency, there's
>only so much it can do if RAM is bad. Once that kind of problem
>has happened, it may leave the pool in a state that ZFS cannot
>deal with out of the box.

>Not everything may be lost, though.

>First of all -- make a copy of your pool, if that's feasible. The
>probability of damaging it even further is rather high.
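
For anyone attempting this, a minimal sketch of such a copy, assuming the
pool sits on a single disk /dev/ada1 and /backup is a separate filesystem
with enough free space (both names are placeholders):

  # raw block-level copy of the vdev, taken while the pool is not imported;
  # conv=noerror,sync keeps going past unreadable sectors, padding with zeros
  dd if=/dev/ada1 of=/backup/olddata-ada1.img bs=1m conv=noerror,sync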

>ZFS internally keeps a large number of uberblocks. Each uberblock is
>a sort of periodic checkpoint of the pool state, written after ZFS
>commits the next transaction group (every 10-40 seconds, depending on the
>vfs.zfs.txg.timeout sysctl, or more often if there is a lot of ongoing
>write activity). Basically, you need to destroy the most recent uberblock
>to manually roll back your ZFS pool. Hopefully you'll only need to nuke a
>few of the most recent ones to restore the pool to a point before the
>corruption ruined it.
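
As a rough illustration (the pool name is an example), the txg interval can
be read with sysctl, and zdb can display the active uberblock; zdb's -e
flag is needed when the pool is exported rather than imported:

  sysctl vfs.zfs.txg.timeout   # seconds between transaction group commits
  zdb -eu olddata              # show the active uberblock (txg and timestamp)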

>Now, ZFS keeps multiple copies of each uberblock. You will need to nuke
>*all* instances of the most recent uberblock in order to roll the pool
>state backwards.
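
For reference, each vdev carries four copies of its label (two at the start
of the device, two at the end), and each label embeds an array of past
uberblocks; the labels can be dumped with zdb (device name is an example):

  zdb -l /dev/ada1   # print all four vdev labels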

>The Solaris Internals site seems to have a script to do that now (I wish
>I'd known about it back when I needed it):
>http://www.solarisinternals.com/wiki/index.php/ZFS_forensics_scrollback_script

>Good luck!

>--Artem
>_______________________________________________
>freebsd-fs@freebsd.org mailing list
>http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"





