Date:      Sun, 10 Nov 2013 11:15:31 +0100
From:      Benjamin Lutz <benjamin.lutz@biolab.ch>
To:        Artem Belevich <art@freebsd.org>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>, Andre Seidelt <andre.seidelt@biolab-technology.de>, Dirk Hoefle <dirk.hoefle@biolab.ch>
Subject:   Re: Ghost ZFS pool prevents mounting root fs
Message-ID:  <OFA0E2BD37.F9A7CBF8-ONC1257C1F.00385A3A-C1257C1F.00385A3D@biotronik.com>
In-Reply-To: <CAFqOu6gCuL6Hr86ceKkJTMhznCNtMStGNUWyz--U6TURUc1OCw@mail.gmail.com>
References:  <CAFqOu6gCuL6Hr86ceKkJTMhznCNtMStGNUWyz--U6TURUc1OCw@mail.gmail.com>,  <OF58789A8B.60B611D5-ONC1257C1D.0039CC23-C1257C1D.003BDD97@biotronik.com>

Hello Artem,

Thanks for the detailed explanation, I think I understand what I need to
do. Since I left the disks with a couple GB of empty space at the end (in
case I need to replace them with another model which happens to be a few
MB smaller), I'll try allocating that space in a partition and then
zeroing it, which should prevent any mistakes or calculation errors. If
GPT makes the partition reach close enough to the end of the disk, that
is. And if it doesn't, we'll see how dd(1) works as a scalpel.
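
Roughly what I have in mind, assuming the disks already carry a GPT and
the trailing gigabytes show up as free space (device name, partition
label and index below are made up):

# gpart show da7                              # confirm the free space really sits at the end
# gpart add -t freebsd-zfs -l scratch7 da7    # turn that free space into a throwaway partition
# dd if=/dev/zero of=/dev/gpt/scratch7 bs=1m  # zero it, old labels included
# gpart delete -i 3 da7                       # index 3 is made up; use whatever gpart show reports

If gpart won't let the partition reach far enough towards the end of the
disk, I'll fall back to dd with explicit offsets as you describe.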

Annoyingly, I can't rely on ZFS here to repair any damage. Since the data
set is large, scrubs now take about 36 hours. Even if I modified the 8
affected disks in pairs, it'd take too long. Unless renaming the active
tank works and resolves the ambiguity; then I could maybe do it while the
pool is running normally.

Anyway, thanks again. And sorry about the email formatting, I can't figure
out how to make Lotus Notes do mailing-list-appropriate replies :/

Cheers,
Benjamin
--
 Benjamin Lutz | Software Engineer | BIOLAB Technology AG
 Dufourstr. 80 | CH-8008 Zurich | www.biolab.ch | benjamin.lutz@biolab.ch
 PHONE +41 44 295 97 13 | MOBILE +41 79 558 57 13 | FAX +41 44 295 97 19

-----artemb@gmail.com wrote: -----
To: Benjamin Lutz <benjamin.lutz@biolab.ch>
From: Artem Belevich
Sent by: artemb@gmail.com
Date: 2013-11-08 20:07
Cc: freebsd-fs <freebsd-fs@freebsd.org>, Andre Seidelt <andre.seidelt@biolab-technology.de>, Dirk Hoefle <dirk.hoefle@biolab.ch>
Subject: Re: Ghost ZFS pool prevents mounting root fs

On Fri, Nov 8, 2013 at 2:53 AM, Benjamin Lutz <benjamin.lutz@biolab.ch> wrote:
> Hello,
>
> I have a server here that after trying to reboot during the 9.2 update
> process refuses to mount the root file system, which is a ZFS (tank).
>
> The error message given is:
>   Trying to mount root from zfs:tank []...
>   Mounting from zfs:tank failed with error 5.
>
> Adding a bit more verbosity by setting vfs.zfs.debug=1 gives one
> additional crucial bit of information that probably explains why: it tries
> to find the disk /dev/label/disk7, but no such disk exists.

I ran into the same issue recently.
http://lists.freebsd.org/pipermail/freebsd-fs/2013-November/018496.html

> Can you tell me how to resolve the situation, i.e. how to make the ghost
> pool go away? I'd rather not recreate the pool or move the data to another
> system, since it's around 16TB and would take forever.

It should be doable, but usual "YMMV", "proceed at your own risk",
"here, there be dragons" warnings apply.

[snip]

> root@:~ # zdb -l /dev/da1
> --------------------------------------------
> LABEL 0
> --------------------------------------------
> failed to unpack label 0
> --------------------------------------------
> LABEL 1
> --------------------------------------------
> failed to unpack label 1
> --------------------------------------------
> LABEL 2
> --------------------------------------------
>     version: 28
>     name: 'tank'
>     state: 2
>     txg: 61
>     pool_guid: 4570073208211798611
>     hostid: 1638041647
>     hostname: 'blackhole'
>     top_guid: 5554077360160676751
>     guid: 11488943812765429059
>     vdev_children: 1
>     vdev_tree:
>         type: 'raidz'
>         id: 0
>         guid: 5554077360160676751
>         nparity: 3
>         metaslab_array: 30
>         metaslab_shift: 37
>         ashift: 12
>         asize: 16003153002496
>         is_log: 0
>         create_txg: 4
>         children[0]:
>             type: 'disk'
>             id: 0
>             guid: 7103686668495146668
>             path: '/dev/label/disk0'
>             phys_path: '/dev/label/disk0'
>             whole_disk: 1
>             create_txg: 4

The ghost labels are at the end of /dev/da1 (and probably all the other
drives that used to be part of that pool).
In my case I ended up manually zeroing out the first sector of the
offending labels.

ZFS places two copies of the labels at 512K and 256K from the end of
the pool slice.
See ZFS on-disk specification here:
http://maczfs.googlecode.com/files/ZFSOnDiskFormat.pdf

It's fairly easy to find with:

# dd if=/dev/da1 bs=1m iseek={disk size in MB - 1} count=1 | hexdump -C | grep version
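
To turn that into actual seek values, something along these lines should
do (plain sh; I'm reading the media size in bytes from diskinfo's third
field, and the computed sectors are only starting points, since glabel
metadata or partition offsets can shift the ghost labels by a sector or
so, which is why the grep above is useful):

# bytes=$(diskinfo /dev/da1 | awk '{print $3}')    # mediasize in bytes
# echo "MB seek for the search: $((bytes / 1048576 - 1))"
# echo "candidate label sectors: $((bytes / 512 - 1024)) and $((bytes / 512 - 512))"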

Once you know where exactly it is, deleting it is simple. Watch out
for dd typos, or perhaps use some sort of disk editor, to make sure
you're not overwriting the wrong data.
It's a fairly risky operation as you have to make sure you don't nuke
anything else by accident.
If the disk portion with the labels is currently unallocated, then
things are relatively safe.
If it's currently used, then you'll need to figure out whether it's
safe to overwrite those labels directly or find another way to do it.
E.g. if the area with the labels is currently used by some other
filesystem, you may be able to get rid of the label by filling up that
filesystem with data, which would hopefully overwrite the labels with
something else. If the labels are within the area that is part of the
current pool, you are probably safe, as it's either in an unused area
or it has not been used by ZFS yet. In my case the ghost labels were in
the neighbourhood of the labels of the current pool and nuking them
produced zero errors on scrub.
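
A quick way to see what, if anything, sits in that region (partitions
show up in gpart, glabel metadata in glabel status; both are read-only
commands):

# gpart show -p da1
# glabel status

If the tail of the disk is plain free space in that output, a
single-sector overwrite there only touches the stale label.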

Once you've established that manual label nuking is what you want,
here's the recipe:

* Make sure risking your data is *really* worth it. Consider erasing
drives one-by-one and letting raidz repair the pool if you have any
doubts.
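
Very roughly, that safer-but-slower route would look like this per drive
(names are made up, and I'm assuming the label/diskN names come from
glabel, so the label has to be re-created after wiping):

# zpool offline tank label/disk3
# dd if=/dev/zero of=/dev/da3 bs=1m   # wipe the whole drive, ghost labels included
# glabel label disk3 da3              # re-create the glabel
# zpool replace tank label/disk3      # resilver it back into the raidz
# zpool status tank                   # wait for the resilver before touching the next drive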

Now that that's out of the way, let's nuke them.

* offline one of the drives with the ghost labels or do the operation
on an unmounted pool (I've booted from MFSBSD CD).
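
If you go the offline route, for example (pool and device names are
placeholders for whichever member carries the ghost label):

# zpool offline tank label/disk3
# zpool status tank                   # vdev should show OFFLINE, pool DEGRADED but otherwise healthy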

Make sure that it is the right sector you're writing to (i.e. it's the
label with the wrong disk paths):
* dd if=/dev/daN bs=512 iseek=<sector that has 'version' word in the
label> count=10 | hexdump -C

Nuke the ghost! Note: you only want to write *one* sector. Make sure
you don't forget to edit count if you use shell history and reuse the
command above.

* dd if=/dev/zero of=/dev/daN bs=512 oseek={sector that has 'version'
word in the label} count=1

* make sure "zdb -l /dev/daN" no longer shows the ghost label.

* online the disk

* scrub the pool. In case you made a mistake and wrote to the wrong
place, that may save your pool.
I did the scrub right after erasing the label on the first drive, to
make sure it hadn't damaged anything vital.
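
Concretely, with the same placeholder names as above, the per-drive tail
end is just:

# zpool online tank label/disk3
# zpool scrub tank
# zpool status -v tank                # watch the scan line and the error counters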

* repeat for all other disks with ghost labels.

* run the scrub after all ghost labels have been erased. Just in case.

Good luck.

--Artem






