Date:      Wed, 13 Mar 2013 23:12:16 +0900
From:      DarkSoul <darksoul@darkbsd.org>
To:        freebsd-fs@freebsd.org
Subject:   Re: Panic loop on ZFS with 9.1-RELEASE
Message-ID:  <51408940.9000609@darkbsd.org>
In-Reply-To: <513B7555.1010701@darkbsd.org>
References:  <513B58B6.2090903@darkbsd.org> <513B6E1E.6080805@darkbsd.org> <513B7555.1010701@darkbsd.org>

Just a very quick heads-up.

I still haven't succeeded in importing the pool read-write, but I have
succeeded in importing it read-only.

This has been confirmed as a bug by the people on the illumos ZFS
mailing list.

Description:
You can't import a pool read-only if it has cache devices, because the
import will try to send write I/Os to the auxiliary vdevs and hit an
assert() call, thus provoking a panic.

Workaround:
Destroy the cache devices before running
zpool import -o readonly=on -f <pool>.
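
For reference, here is a minimal sketch of that workaround. The da15
and da16 device names are hypothetical; check which of your devices
carry cache labels with zdb -l <device>:

  # Clear the ZFS labels on the cache devices so the import no
  # longer sees them, then import the pool read-only.
  zpool labelclear -f /dev/da15
  zpool labelclear -f /dev/da16
  zpool import -o readonly=on -f <pool>

If your zpool(8) lacks the labelclear subcommand, physically detaching
the cache devices before the import achieves the same thing.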

Cheers,

On 03/10/2013 02:45 AM, Stephane LAPIE wrote:
> Pinpoint analysis of the zpool, focused on the broken vdev, gives the
> following information:
>
> # zdb -AAA -e -mm prana 1 33
>
> Metaslabs:
>     vdev          1
>     metaslabs   145   offset                spacemap          free     
>     ---------------   -------------------   ---------------   -------------
>     metaslab     33   offset  21000000000   spacemap    303   free    11.9G
> WARNING: zfs: allocating allocated segment(offset=2335563722752 size=1024)
>
> Assertion failed: sm->sm_space == space (0x2f927f400 == 0x2f927f800),
> file
> /usr/storage/tech/eirei-no-za.yomi.darkbsd.org/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c,
> line 353.
> pid 51 (zdb), uid 0: exited on signal 6 (core dumped)
> Abort trap (core dumped)
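>
> (Worth noting: the two values in that assertion differ by exactly
> 0x2f927f800 - 0x2f927f400 = 0x400 = 1024 bytes, which matches the
> size of the double-allocated segment reported just above.)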
>
> Just in case, root vdev 1 is made of the following devices:
> children[1]:
>     type: 'raidz'
>     id: 1
>     guid: 1078755695237588414
>     nparity: 1
>     metaslab_array: 175
>     metaslab_shift: 36
>     ashift: 9
>     asize: 10001970626560
>     is_log: 0
>     children[0]:
>         type: 'disk'
>         id: 0
>         guid: 12900041001921590764
>         path: '/dev/da10'
>         phys_path: '/dev/da10'
>         whole_disk: 0
>         DTL: 4127
>     children[1]:
>         type: 'disk'
>         id: 1
>         guid: 7211789756938666186
>         path: '/dev/da3'
>         phys_path: '/dev/da3'
>         whole_disk: 1
>         DTL: 4119
>     children[2]:
>         type: 'disk'
>         id: 2
>         guid: 12094368820342087236
>         path: '/dev/da5'
>         phys_path: '/dev/da5'
>         whole_disk: 1
>         DTL: 212
>     children[3]:
>         type: 'disk'
>         id: 3
>         guid: 6868867539761908697
>         path: '/dev/da4'
>         phys_path: '/dev/da4'
>         whole_disk: 0
>         DTL: 4173
>     children[4]:
>         type: 'disk'
>         id: 4
>         guid: 3091570768700552191
>         path: '/dev/da6'
>         phys_path: '/dev/da6'
>         whole_disk: 0
>         DTL: 4182
>
> At this point I am seriously considering ripping these devices out and
> importing the pool while ignoring the missing devices... :/
>
> On 03/10/2013 02:15 AM, Stephane LAPIE wrote:
>> Posting a quick update.
>>
>> I ran a "zdb -emm" command to figure out what was going on, and it blew
>> up in my face with an abort trap here :
>> - vdev 0 has 145 metaslabs, which are cleared without any problems.
>> - vdev 1 has 145 metaslabs, but fails in the middle :
>> metaslab     32   offset  20000000000   spacemap    289   free    1.64G
>>                   segments      19509   maxsize   41.7M   freepct    2%
>> metaslab     33   offset  21000000000   spacemap    303   free    11.9G
>> error: zfs: allocating allocated segment(offset=2335563722752 size=1024)
>> Abort trap (core dumped)
>>
>> Converting offset 2335563722752 from the earlier kernel panic messages
>> to hex gives 0x21fca723000, which falls inside the broken metaslab
>> found by zdb.
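>>
>> As a quick sanity check (metaslab_shift is 36 for this vdev, so the
>> metaslab index is just the offset shifted right by 36 bits):
>>
>> # printf '%x\n' 2335563722752
>> 21fca723000
>> # echo $((2335563722752 >> 36))
>> 33
>>
>> ...which is exactly the metaslab zdb chokes on.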
>>
>> Is there anything I can do at this point, using zdb?
>> It just seems surreal that I have ONE (seemingly?) broken metaslab and
>> can't recover anything...
>>
>> Cheers,
>>
>> On 03/10/2013 12:43 AM, Stephane LAPIE wrote:
>>> Hello list,
>>>
>>> I am currently faced with a sudden-death case I can't understand at all,
>>> and I would very much appreciate any explanation or assistance :(
>>>
>>> Here is my current kernel version:
>>> FreeBSD  9.1-STABLE FreeBSD 9.1-STABLE #5 r245055: Thu Jan 17 13:12:59
>>> JST 2013
>>> darksoul@eirei-no-za.yomi.darkbsd.org:/usr/obj/usr/storage/tech/eirei-no-za.yomi.darkbsd.org/usr/src/sys/DARK-2012KERN 
>>> amd64
>>> (The kernel is basically a lightened GENERIC, with VESA options and
>>> unneeded controllers removed.)
>>>
>>> The pool is a set of three 5-drive raidz1 vdevs, plus two cache devices
>>> and a mirrored transaction log.
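>>>
>>> Schematically, that corresponds to a pool created along these lines
>>> (the device names here are made up for illustration):
>>>
>>>   zpool create <pool> \
>>>       raidz1 da0 da1 da2 da3 da4 \
>>>       raidz1 da5 da6 da7 da8 da9 \
>>>       raidz1 da10 da11 da12 da13 da14 \
>>>       cache da15 da16 \
>>>       log mirror da17 da18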
>>>
>>> Booting and trying to import the pool is met with:
>>> Solaris(panic): zfs: panic: allocating allocated
>>> segment(offset=2335563722752 size=1024)
>>>
>>> Booting single-user mode from my emergency flash card with a base OS
>>> and running zpool import -o readonly=on is met with:
>>> panic: solaris assert: zio->io_type != ZIO_TYPE_WRITE ||
>>> spa_writeable(spa), file:
>>> /usr/storage/tech/eirei-no-za.yomi.darkbsd.org/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c,
>>> line: 2461
>>>
>>> I tried zpool import -F -f, then zpool import -F -f -m after removing
>>> the mirrored transaction log devices, but after about 40 seconds of
>>> trying to import, it just blows up.
>>>
>>> I am currently running "zdb -emm" as per the procedure suggested at
>>> http://simplex.swordsaint.net/?p=199, if only to get some debug
>>> information.
>>>
>>> Thanks in advance for your time.
>>>
>>> Cheers,
>>>
>>>

-- 
Stephane LAPIE, EPITA SRS, Promo 2005
"Even when they have digital readouts, I can't understand them."
--MegaTokyo

