Date:      Tue, 04 Aug 2009 17:38:57 -0400
From:      Boris Kochergin <spawk@acm.poly.edu>
To:        freebsd-fs@freebsd.org
Subject:   ZFS RAID-Z panic on vdev failure + subsequent panics and hangs
Message-ID:  <4A78AA71.9050107@acm.poly.edu>

Ahoy. I have a seven-disk RAID-Z pool in an 8-BETA2/amd64 machine. One of 
the disks (ad13) failed a write today, and the system proceeded to 
panic. I couldn't get a dump or any other useful information from that 
first panic, but the panic message referred to "vdev_is_dead". The 
system now panics again on reboot, probably when "zfs mount" is called 
by its rc.d script:

Fatal trap 9: general protection fault while in kernel mode
instruction pointer     = 0x20:0xffffffff807cbdbb
stack pointer           = 0x28:0xffffff8077bf54c0
frame pointer           = 0x28:0xffffff8077bf54d0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 82 (zfs)
panic: from debugger
Uptime: 13s
Physical memory: 4081 MB
Dumping 1245 MB: 1230 1214 1198 1182 1166 1150 1134 1118 1102 1086 1070 
1054 1038 1022 1006 990 974 958 942 926 910 894 878 862 846 830 814 798 
782 766 750 734 718 702 686 670 654 638 622 606 590 574 558 542 526 510 
494 478 462 446 430 414 398 382 366 350 334 318 302 286 270 254 238 222 
206 190 174 158 142 126 110 94 78 62 46 30 14

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from 
/boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from 
/boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
#0  doadump () at pcpu.h:223
223     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:223
#1  0xffffffff8058ff11 in boot (howto=260) at 
/usr/src/sys/kern/kern_shutdown.c:419
#2  0xffffffff805902eb in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:575
#3  0xffffffff801d9997 in db_panic (addr=Variable "addr" is not available.
) at /usr/src/sys/ddb/db_command.c:478
#4  0xffffffff801d9da1 in db_command (last_cmdp=0xffffffff80bd5120, 
cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
#5  0xffffffff801d9ff0 in db_command_loop () at 
/usr/src/sys/ddb/db_command.c:498
#6  0xffffffff801dbf79 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#7  0xffffffff805bbd94 in kdb_trap (type=9, code=0, tf=Variable "tf" is 
not available.
) at /usr/src/sys/kern/subr_kdb.c:534
#8  0xffffffff8086dc5d in trap_fatal (frame=0xffffff8077bf5410, eva=0) 
at /usr/src/sys/amd64/amd64/trap.c:847
#9  0xffffffff8086e74d in trap (frame=0xffffff8077bf5410) at 
/usr/src/sys/amd64/amd64/trap.c:639
#10 0xffffffff80857403 in calltrap () at 
/usr/src/sys/amd64/amd64/exception.S:224
#11 0xffffffff807cbdbb in slab_alloc_item (zone=Variable "zone" is not 
available.
) at /usr/src/sys/vm/uma_core.c:2300
#12 0xffffffff807ce80e in zone_alloc_item (zone=0xffffff00dffae000, 
udata=0x0, flags=259) at /usr/src/sys/vm/uma_core.c:2475
#13 0xffffffff807cee03 in keg_alloc_slab (keg=0xffffff00dffad460, 
zone=0xffffff00dffac380, wait=259) at /usr/src/sys/vm/uma_core.c:826
#14 0xffffffff807cf177 in keg_fetch_slab (keg=0xffffff00dffad460, 
zone=0xffffff00dffac380, flags=259) at /usr/src/sys/vm/uma_core.c:2152
#15 0xffffffff807cf21e in zone_fetch_slab (zone=0xffffff00dffac380, 
keg=0xffffff00dffad460, flags=259) at /usr/src/sys/vm/uma_core.c:2212
#16 0xffffffff807d05eb in uma_zalloc_arg (zone=0xffffff00dffac380, 
udata=0x0, flags=259) at /usr/src/sys/vm/uma_core.c:2381
#17 0xffffffff8057e727 in malloc (size=Variable "size" is not available.
) at uma.h:305
#18 0xffffffff81060365 in metaslab_init (mg=0xffffff0004472980, 
smo=0xffffff8077bf5730, start=530428461056, size=2147483648, txg=0) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:294
#19 0xffffffff81071b3e in vdev_metaslab_init (vd=0xffffff0001ecf800, 
txg=0) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:796
#20 0xffffffff81071da5 in vdev_load (vd=0xffffff0001ecf800) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1531
#21 0xffffffff81071c75 in vdev_load (vd=0xffffff0001ed1800) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1526
#22 0xffffffff8106539c in spa_load (spa=0xffffff0001ff0000, 
config=Variable "config" is not available.
) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1361
#23 0xffffffff81064ee1 in spa_load (spa=0xffffff0001ff0000, 
config=Variable "config" is not available.
) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1189
#24 0xffffffff810658fd in spa_open_common (pool=Variable "pool" is not 
available.
) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1474
#25 0xffffffff81065a52 in spa_get_stats (name=0xffffff0001ff5000 "home", 
config=0xffffff8077bf59e0, altroot=0xffffff0001ff5400 "", buflen=1024)
    at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1671
#26 0xffffffff81093e7c in zfs_ioc_pool_stats (zc=0xffffff0001ff5000) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:914
#27 0xffffffff810941c4 in zfsdev_ioctl (dev=Variable "dev" is not available.
) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:3022
#28 0xffffffff80511c76 in devfs_ioctl_f (fp=0xffffff0001f4bc80, 
com=3425196549, data=0xffffff0001ff5000, cred=Variable "cred" is not 
available.
) at /usr/src/sys/fs/devfs/devfs_vnops.c:659
#29 0xffffffff805cb166 in kern_ioctl (td=0xffffff0001f0c390, fd=3, 
com=3425196549, data=0xffffff0001ff5000 "home") at file.h:262
#30 0xffffffff805cb38e in ioctl (td=0xffffff0001f0c390, 
uap=0xffffff8077bf5bf0) at /usr/src/sys/kern/sys_generic.c:678
#31 0xffffffff8086e28f in syscall (frame=0xffffff8077bf5c80) at 
/usr/src/sys/amd64/amd64/trap.c:984
#32 0xffffffff808576e1 in Xfast_syscall () at 
/usr/src/sys/amd64/amd64/exception.S:373
#33 0x0000000800fe1d0c in ?? ()
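
For anyone reading the trace: the "com=3425196549" argument in frames #28 
and #29 can be split into its fields to see which ioctl the zfs process 
issued. The sketch below is only illustrative. The field layout (13-bit 
length, 8-bit group, 8-bit number, direction in the top bits) is an 
assumption taken from FreeBSD's sys/ioccom.h and is not stated in this 
message, and decode_ioctl is a made-up helper name.

```python
# Assumed field layout, following FreeBSD's sys/ioccom.h (not from the post).
IOCPARM_SHIFT = 13
IOCPARM_MASK = (1 << IOCPARM_SHIFT) - 1
IOC_VOID = 0x20000000   # no parameters
IOC_OUT = 0x40000000    # kernel copies parameters out
IOC_IN = 0x80000000     # kernel copies parameters in


def decode_ioctl(com):
    """Split an ioctl command word into its fields (hypothetical helper)."""
    direction = []
    if com & IOC_VOID:
        direction.append("void")
    if com & IOC_IN:
        direction.append("in")
    if com & IOC_OUT:
        direction.append("out")
    return {
        "hex": hex(com),
        "direction": direction,
        "length": (com >> 16) & IOCPARM_MASK,  # size of the argument struct
        "group": chr((com >> 8) & 0xFF),       # subsystem letter
        "number": com & 0xFF,                  # command index in the group
    }


# Value copied from frames #28 and #29 of the backtrace.
decoded = decode_ioctl(3425196549)
print(decoded)
```

Under those assumptions the word decodes to an in/out command, group 'Z', 
number 5, with a 3112-byte argument. That looks consistent with the 
pool-stats ioctl seen in frame #26, though the mapping from number 5 to 
that ioctl is itself an assumption.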

If I boot the system without the failed disk, any "zfs" or "zpool" 
command hangs the system after a while. Breaking into DDB doesn't work 
from a keyboard and VGA console (I don't have any other kind of gear 
here). In case it is relevant, the pool started life as version 6 and 
was upgraded using 7.2-STABLE shortly after the version 13 MFC. The 
output of "zdb" with all disks connected:

home
    version=13
    name='home'
    state=0
    txg=16061492
    pool_guid=14089219607492705674
    hostid=413956888
    hostname='unset'
    vdev_tree
        type='root'
        id=0
        guid=14089219607492705674
        children[0]
                type='raidz'
                id=0
                guid=17899218839424019335
                nparity=1
                metaslab_array=14
                metaslab_shift=31
                ashift=9
                asize=2800585539584
                is_log=0
                children[0]
                        type='disk'
                        id=0
                        guid=15839907043443901501
                        path='/dev/ad4'
                        devid='ad:3QK08728'
                        whole_disk=0
                        DTL=389
                children[1]
                        type='disk'
                        id=1
                        guid=13623369126078337737
                        path='/dev/ad16'
                        devid='ad:9QH04HJN'
                        whole_disk=0
                        DTL=391
                children[2]
                        type='disk'
                        id=2
                        guid=15619490422714555908
                        path='/dev/ad14'
                        devid='ad:5NF1DDXR'
                        whole_disk=0
                        DTL=390
                children[3]
                        type='disk'
                        id=3
                        guid=6995275135550350664
                        path='/dev/ad15'
                        devid='ad:9QG93JHX'
                        whole_disk=0
                        DTL=386
                children[4]
                        type='disk'
                        id=4
                        guid=10651992494569677081
                        path='/dev/ad13'
                        devid='ad:9QH04GTY'
                        whole_disk=0
                        DTL=388
                children[5]
                        type='disk'
                        id=5
                        guid=10503557489947490214
                        path='/dev/ad18'
                        devid='ad:5NF1DDVB'
                        whole_disk=0
                        DTL=387
                children[6]
                        type='disk'
                        id=6
                        guid=17574056058658811312
                        path='/dev/ad12'
                        devid='ad:9QG90QA2'
                        whole_disk=0
                        DTL=392
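
As a cross-check, the numbers in this zdb output can be compared with the 
metaslab_init() arguments in frame #18 of the backtrace. The sketch below 
assumes that each metaslab covers 2^metaslab_shift bytes and that 
metaslabs are laid out back to back from offset 0. That is how I read the 
values, not something confirmed in this message.

```python
# Values copied from the zdb output for the raidz vdev.
metaslab_shift = 31
asize = 2800585539584

# Values copied from frame #18 (metaslab_init) of the backtrace.
start = 530428461056
size = 2147483648

# Assumption: one metaslab spans 2**metaslab_shift bytes.
metaslab_size = 1 << metaslab_shift

# Number of whole metaslabs that fit in the vdev's allocatable size.
metaslab_count = asize >> metaslab_shift

# Which metaslab was being initialized when the allocation faulted.
index, remainder = divmod(start, metaslab_size)

print("metaslab size matches frame #18:", metaslab_size == size)
print("metaslab count:", metaslab_count)
print("faulting metaslab index:", index, "remainder:", remainder)
```

If that reading is right, the sizes agree, the vdev holds 1304 metaslabs, 
and the fault happened while setting up metaslab 247, so roughly a fifth 
of the way through the list rather than at the first allocation.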

Can anyone help? I would be content to at least have access to the 
filesystem in degraded mode.

-Boris


