Date:      Thu, 6 May 2010 03:22:17 +0200
From:      Ståle Kristoffersen <staale@kristoffersen.ws>
To:        freebsd-fs@freebsd.org
Subject:   Bad hardware + zfs = panic
Message-ID:  <20100506012217.GA41806@putsch.kolbu.ws>

I've been debugging a hardware error for the past few days. I think the
CPU was at fault and that it is now fixed. However, reading a file that was
written to a ZFS pool while the hardware was corrupting data still triggers
a panic in the ZFS code:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8106f2d3
stack pointer           = 0x28:0xffffff80774914e0
frame pointer           = 0x28:0xffffff8077491510
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1350 (smbd)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 2m53s

The lines in the backtrace that got my attention were:
#6  0xffffffff80847c73 in calltrap () at
/usr/src/sys/amd64/amd64/exception.S:224
#7  0xffffffff8106f2d3 in vdev_is_dead (vd=0x0) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1847
#8  0xffffffff8106f2ed in vdev_readable (vd=0x0) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1854

The complete bt is available here:
http://heim.ifi.uio.no/staalebk/zfs-panic.txt

As you can see, vd=0x0, and I think that caused the panic, since the code
tried to dereference that NULL pointer:
 return (vd->vdev_state < VDEV_STATE_DEGRADED);

I then tried to remove the file and I got this:
Solaris: WARNING: metaslab_free_dva(): bad DVA 199476166:1296607792756162560
Solaris: WARNING: metaslab_free_dva(): bad DVA 4236221:7256850009726709760
Solaris: WARNING: metaslab_free_dva(): bad DVA 935912721:16480078061480073216

Maybe there should be a check for vd being NULL that returns an I/O error
or something similar, instead of panicking?
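
A check along those lines might look like the minimal C sketch below. The
sk_-prefixed types and functions are hypothetical, simplified stand-ins and
not the real ZFS definitions; only the comparison against a DEGRADED state
mirrors the line quoted above. Returning EIO from the caller is an
assumption about what a sensible error would be, and whether the NULL case
belongs here or earlier in the I/O path is an open question.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real ZFS definitions. */
typedef enum {
	SK_VDEV_STATE_FAULTED,
	SK_VDEV_STATE_DEGRADED,
	SK_VDEV_STATE_HEALTHY
} sk_vdev_state_t;

typedef struct sk_vdev {
	sk_vdev_state_t	vdev_state;
	bool		vdev_cant_read;
} sk_vdev_t;

/* Treat a missing vdev as dead instead of dereferencing NULL. */
static bool
sk_vdev_is_dead(const sk_vdev_t *vd)
{
	if (vd == NULL)
		return (true);
	return (vd->vdev_state < SK_VDEV_STATE_DEGRADED);
}

/* The short-circuit && means vd is never dereferenced when it is NULL. */
static bool
sk_vdev_readable(const sk_vdev_t *vd)
{
	return (!sk_vdev_is_dead(vd) && !vd->vdev_cant_read);
}

/* Hypothetical caller: fail the read with EIO instead of panicking. */
static int
sk_read_from_vdev(const sk_vdev_t *vd)
{
	if (!sk_vdev_readable(vd))
		return (EIO);
	return (0);
}
```

With this shape, a block pointer that resolves to no vdev would surface as
a failed read to the caller rather than a page fault in kernel mode.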

I'm new to debugging kernels, so if what I'm typing makes no sense, just
tell me.

Kernel version is:
FreeBSD fs2 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Tue Jan
5 21:11:58 UTC 2010
root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

-- 
Ståle Kristoffersen


