Date:      Thu, 29 Aug 2019 15:15:58 +1000
From:      MJ <mafsys1234@gmail.com>
To:        Victor Sudakov <vas@mpeks.tomsk.su>, freebsd-questions@freebsd.org
Subject:   Re: Kernel panic and ZFS corruption on 11.3-RELEASE
Message-ID:  <2964dd94-ad99-d0b8-c5d8-5d276cf02d06@gmail.com>
In-Reply-To: <20190828025728.GA1441@admin.sibptus.ru>
References:  <20190828025728.GA1441@admin.sibptus.ru>



On 28/08/2019 12:57 pm, Victor Sudakov wrote:
> Dear Colleagues,
> 
> Shortly after upgrading to 11.3-RELEASE I had a kernel panic:
> 
> Aug 28 00:01:40 vas kernel: panic: solaris assert: dmu_buf_hold_array(os, object, offset, size, 0, ((char *)(uintptr_t)__func__), &numbufs, &dbp) == 0 (0x5 == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c, line: 1022
> Aug 28 00:01:40 vas kernel: cpuid = 0
> Aug 28 00:01:40 vas kernel: KDB: stack backtrace:
> Aug 28 00:01:40 vas kernel: #0 0xffffffff80b4c4d7 at kdb_backtrace+0x67
> Aug 28 00:01:40 vas kernel: #1 0xffffffff80b054ee at vpanic+0x17e
> Aug 28 00:01:40 vas kernel: #2 0xffffffff80b05363 at panic+0x43
> Aug 28 00:01:40 vas kernel: #3 0xffffffff8260322c at assfail3+0x2c
> Aug 28 00:01:40 vas kernel: #4 0xffffffff822a9585 at dmu_write+0xa5
> Aug 28 00:01:40 vas kernel: #5 0xffffffff82302b38 at space_map_write+0x188
> Aug 28 00:01:40 vas kernel: #6 0xffffffff822e31fd at metaslab_sync+0x41d
> Aug 28 00:01:40 vas kernel: #7 0xffffffff8230b63b at vdev_sync+0xab
> Aug 28 00:01:40 vas kernel: #8 0xffffffff822f776b at spa_sync+0xb5b
> Aug 28 00:01:40 vas kernel: #9 0xffffffff82304420 at txg_sync_thread+0x280
> Aug 28 00:01:40 vas kernel: #10 0xffffffff80ac8ac3 at fork_exit+0x83
> Aug 28 00:01:40 vas kernel: #11 0xffffffff80f69d6e at fork_trampoline+0xe
> Aug 28 00:01:40 vas kernel: Uptime: 14d3h42m57s
> 
> after which the ZFS pool became corrupt:
> 
>    pool: d02
>   state: FAULTED
> status: The pool metadata is corrupted and the pool cannot be opened.
> action: Recovery is possible, but will result in some data loss.
> 	Returning the pool to its state as of Tuesday, 27 August 2019 23:51:20
> 	should correct the problem.  Approximately 9 minutes of data
> 	must be discarded, irreversibly.  Recovery can be attempted
> 	by executing 'zpool clear -F d02'.  A scrub of the pool
> 	is strongly recommended after recovery.
>     see: http://illumos.org/msg/ZFS-8000-72
>    scan: resilvered 423K in 0 days 00:00:05 with 0 errors on Sat Sep 30 04:12:20 2017
> config:
> 
> 	NAME	    STATE     READ WRITE CKSUM
> 	d02	    FAULTED	 0     0     2
> 	  ada2.eli  ONLINE	 0     0    12
> 
> However, "zpool clear -F d02" results in error:
> cannot clear errors for d02: I/O error
> 
> Do you know if there is a way to recover the data, or should I say farewell to several hundred GB of anime?
> 
> PS I think I do have the vmcore file if someone is interested in debugging the panic.

Do you have a backup? If so, restore it.

If you don't, have you tried "zpool import -F d02"?
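
If the pool still shows up as imported but FAULTED, the rough sequence I would try is below (a sketch only, from memory of the man pages, so double-check the options on 11.3 before running anything destructive):

  # export the pool first, if the system will let you
  zpool export d02
  # dry run: ask whether a rewind recovery would succeed, without changing anything on disk
  zpool import -Fn d02
  # if the dry run reports a usable transaction group, do the actual rewind import
  zpool import -F d02
  # then verify what survived
  zpool scrub d02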

Some references you might like to read:
https://docs.oracle.com/cd/E19253-01/819-5461/gbctt/index.html
Take note of this section:
"If the damaged pool is in the zpool.cache file, the problem is discovered when the system is booted, and the damaged pool is reported in the zpool status command. If the pool isn't in the zpool.cache file, it won't successfully import or open and you'll see the damaged pool messages when you attempt to import the pool."

I've not hit your exact error, but in cases of disk corruption/failure I've used "zpool import" as the sledgehammer approach.
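
One more thing worth trying if the rewind import works but you don't fully trust the pool afterwards: import it read-only and copy the data off before doing anything else. Something like the following (readonly is an import-time property; check zpool(8) on your release before relying on it):

  zpool import -o readonly=on -F d02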

Regards,
Mark.


