Date:      Sat, 30 Sep 2017 19:25:14 +0200
From:      Harry Schmalzbauer <freebsd@omnilan.de>
To:        freebsd-stable@freebsd.org
Subject:   Re: panic: Solaris(panic): blkptr invalid CHECKSUM1
Message-ID:  <59CFD37A.8080009@omnilan.de>
In-Reply-To: <59CFC6A6.6030600@omnilan.de>
References:  <59CFC6A6.6030600@omnilan.de>

Regarding Harry Schmalzbauer's message from 30.09.2017 18:30 (localtime):
>  Bad surprise.
> Most likely I forgot to stop a PCIe passthrough NIC before shutting down
> that (bhyve(8)) guest – jhb@ helped me identify this as the root
> cause of the severe memory corruption I regularly had (on stable-11).
>
> Now this time, corruption affected ZFS's RAM area, obviously.
>
> What I haven't expected is the panic.
> The machine has a memory disk as root, so luckily I can still boot (from
> ZFS -> md-preloaded rootfs) into single user mode, but an early rc stage
> (most likely mounting the ZFS datasets) leads to the following panic:
>
> Trying to mount root from ufs:/dev/ufs/cetusROOT []...
> panic: Solaris(panic): blkptr at 0xfffffe0005b6b000 has invalid CHECKSUM 1
> cpuid = 1
> KDB: stack backtrace:
> #0 0xffffffff805e3837 at kdb_backtrace+0x67
> #1 0xffffffff805a2286 at vpanic+0x186
> #2 0xffffffff805a20f3 at panic+0x43
> #3 0xffffffff81570192 at vcmn_err+0xc2
> #4 0xffffffff812d7dda at zfs_panic_recover+0x5a
> #5 0xffffffff812ff49b at zfs_blkptr_verify+0x8b
> #6 0xffffffff812ff72c at zio_read+0x2c
> #7 0xffffffff812761de at arc_read+0x6de
> #8 0xffffffff81298b4d at traverse_prefetch_metadata+0xbd
> #9 0xffffffff812980ed at traverse_visitbp+0x39d
> #10 0xffffffff81298c27 at traverse_dnode+0xc7
> #11 0xffffffff812984a3 at traverse_visitbp+0x753
> #12 0xffffffff8129788b at traverse_impl+0x22b
> #13 0xffffffff81297afc at traverse_pool+0x5c
> #14 0xffffffff812cce06 at spa_load+0x1c06
> #15 0xffffffff812cc302 at spa_load+0x1102
> #16 0xffffffff812cac6e at spa_load_best+0x6e
> #17 0xffffffff812c73a1 at spa_open_common+0x101
> Uptime: 37s
> Dumping 1082 out of 15733 MB:..2%..…
> Dump complete
> mps0: Sending StopUnit: path (xpt0:mps0:0:2:ffffffff): handle 12
> mps0: Incrementing SSU count
> …
>
> Haven't done any scrub attempts yet – the expectation is to get all
> datasets of the striped mirror pool back...
>
> Any hints highly appreciated.

Now it seems I'm in really big trouble.
A regular import doesn't work (not even when booted from cd9660).
I get all pools listed, but trying to import the (unmounted) pool leads
to the same panic as initially reported – because rc was just doing the
same thing.
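
If the plain import keeps panicking, my next idea is a read-only,
no-mount import with rewind. Untested sketch; 'tank' stands for the
real pool name and /mnt is an arbitrary altroot:

    # import without mounting any datasets (-N), read-only, rewinding
    # to an earlier txg if the current one is damaged (-F)
    zpool import -N -F -o readonly=on -R /mnt tank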

I booted into single user mode (which works, since the boot pool isn't
affected and root is a memory disk from the boot pool)
and set vfs.zfs.recover=1.
But this time I don't even get the list of pools to import; 'zpool
import' instantaneously leads to that panic:

Solaris: WARNING: blkptr at 0xfffffe0005a8e000 has invalid CHECKSUM 1
Solaris: WARNING: blkptr at 0xfffffe0005a8e000 has invalid COMPRESS 0
Solaris: WARNING: blkptr at 0xfffffe0005a8e000 DVA 0 has invalid VDEV
2337865727
Solaris: WARNING: blkptr at 0xfffffe0005a8e000 DVA 1 has invalid VDEV
289407040
Solaris: WARNING: blkptr at 0xfffffe0005a8e000 DVA 2 has invalid VDEV
3959586324


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x50
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff812de904
stack pointer           = 0x28:0xfffffe043f6bcbc0
frame pointer           = 0x28:0xfffffe043f6bcbc0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 44 (zpool)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff805e3837 at kdb_backtrace+0x67
#1 0xffffffff805a2286 at vpanic+0x186
#2 0xffffffff805a20f3 at panic+0x43
#3 0xffffffff808a4922 at trap_fatal+0x322
#4 0xffffffff808a4979 at trap_pfault+0x49
#5 0xffffffff808a41f8 at trap+0x298
#6 0xffffffff80889fb1 at calltrap+0x8
#7 0xffffffff812e58a3 at vdev_mirror_child_select+0x53
#8 0xffffffff812e535e at vdev_mirror_io_start+0x2ee
#9 0xffffffff81303aa1 at zio_vdev_io_start+0x161
#10 0xffffffff8130054c at zio_execute+0xac
#11 0xffffffff812ffe7b at zio_nowait+0xcb
#12 0xffffffff812761f3 at arc_read+0x6f3
#13 0xffffffff81298b4d at traverse_prefetch_metadata+0xbd
#14 0xffffffff812980ed at traverse_visitbp+0x39d
#15 0xffffffff81298c27 at traverse_dnode+0xc7
#16 0xffffffff812984a3 at traverse_visitbp+0x753
#17 0xffffffff8129788b at traverse_impl+0x22b
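
For completeness, the exact sequence from single user mode that
produces the trace above is roughly:

    sysctl vfs.zfs.recover=1   # ask ZFS to tolerate some on-disk damage
    zpool import               # merely listing importable pools panics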

Now I hope some ZFS guru can help me out. Needless to say, the bits on
this mirrored pool are important to me – no production data, but lots
of intermediate work...
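
In the meantime I'll try to inspect the pool from userland with
zdb(8), which doesn't go through the kernel import code path.
Something like the following; device and pool names are placeholders:

    zdb -l /dev/gpt/disk0   # print the vdev labels of one mirror leg
    zdb -e -C tank          # dump the configuration of the exported pool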

Thanks,

-harry



