Date:      Tue, 04 Aug 2009 18:01:22 -0400
From:      Boris Kochergin <spawk@acm.poly.edu>
To:        freebsd-fs@freebsd.org
Subject:   Re: ZFS RAID-Z panic on vdev failure + subsequent panics and hangs
Message-ID:  <4A78AFB2.10103@acm.poly.edu>
In-Reply-To: <4A78AA71.9050107@acm.poly.edu>
References:  <4A78AA71.9050107@acm.poly.edu>

Boris Kochergin wrote:
> Ahoy. I have a seven-disk RAID-Z pool in an 8-BETA2/amd64 machine. One 
> of the disks (ad13) failed to write something today, and the system 
> proceeded to panic. I couldn't get a dump or any otherwise useful 
> information, but the panic made reference to "vdev_is_dead". Upon 
> reboot, it panics again, probably when "zfs mount" is called by its 
> rc.d script:
>
> Fatal trap 9: general protection fault while in kernel mode
> instruction pointer     = 0x20:0xffffffff807cbdbb
> stack pointer           = 0x28:0xffffff8077bf54c0
> frame pointer           = 0x28:0xffffff8077bf54d0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                        = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 82 (zfs)
> panic: from debugger
> Uptime: 13s
> Physical memory: 4081 MB
> Dumping 1245 MB: 1230 1214 1198 1182 1166 1150 1134 1118 1102 1086 
> 1070 1054 1038 1022 1006 990 974 958 942 926 910 894 878 862 846 830 
> 814 798 782 766 750 734 718 702 686 670 654 638 622 606 590 574 558 
> 542 526 510 494 478 462 446 430 414 398 382 366 350 334 318 302 286 
> 270 254 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14
>
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/opensolaris.ko
> #0  doadump () at pcpu.h:223
> 223     pcpu.h: No such file or directory.
>        in pcpu.h
> (kgdb) where
> #0  doadump () at pcpu.h:223
> #1  0xffffffff8058ff11 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419
> #2  0xffffffff805902eb in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:575
> #3  0xffffffff801d9997 in db_panic (addr=Variable "addr" is not available.
> ) at /usr/src/sys/ddb/db_command.c:478
> #4  0xffffffff801d9da1 in db_command (last_cmdp=0xffffffff80bd5120, cmd_table=Variable "cmd_table" is not available.
> ) at /usr/src/sys/ddb/db_command.c:445
> #5  0xffffffff801d9ff0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
> #6  0xffffffff801dbf79 in db_trap (type=Variable "type" is not available.
> ) at /usr/src/sys/ddb/db_main.c:229
> #7  0xffffffff805bbd94 in kdb_trap (type=9, code=0, tf=Variable "tf" is not available.
> ) at /usr/src/sys/kern/subr_kdb.c:534
> #8  0xffffffff8086dc5d in trap_fatal (frame=0xffffff8077bf5410, eva=0) at /usr/src/sys/amd64/amd64/trap.c:847
> #9  0xffffffff8086e74d in trap (frame=0xffffff8077bf5410) at /usr/src/sys/amd64/amd64/trap.c:639
> #10 0xffffffff80857403 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:224
> #11 0xffffffff807cbdbb in slab_alloc_item (zone=Variable "zone" is not available.
> ) at /usr/src/sys/vm/uma_core.c:2300
> #12 0xffffffff807ce80e in zone_alloc_item (zone=0xffffff00dffae000, udata=0x0, flags=259) at /usr/src/sys/vm/uma_core.c:2475
> #13 0xffffffff807cee03 in keg_alloc_slab (keg=0xffffff00dffad460, zone=0xffffff00dffac380, wait=259) at /usr/src/sys/vm/uma_core.c:826
> #14 0xffffffff807cf177 in keg_fetch_slab (keg=0xffffff00dffad460, zone=0xffffff00dffac380, flags=259) at /usr/src/sys/vm/uma_core.c:2152
> #15 0xffffffff807cf21e in zone_fetch_slab (zone=0xffffff00dffac380, keg=0xffffff00dffad460, flags=259) at /usr/src/sys/vm/uma_core.c:2212
> #16 0xffffffff807d05eb in uma_zalloc_arg (zone=0xffffff00dffac380, udata=0x0, flags=259) at /usr/src/sys/vm/uma_core.c:2381
> #17 0xffffffff8057e727 in malloc (size=Variable "size" is not available.
> ) at uma.h:305
> #18 0xffffffff81060365 in metaslab_init (mg=0xffffff0004472980, smo=0xffffff8077bf5730, start=530428461056, size=2147483648, txg=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:294
> #19 0xffffffff81071b3e in vdev_metaslab_init (vd=0xffffff0001ecf800, txg=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:796
> #20 0xffffffff81071da5 in vdev_load (vd=0xffffff0001ecf800) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1531
> #21 0xffffffff81071c75 in vdev_load (vd=0xffffff0001ed1800) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1526
> #22 0xffffffff8106539c in spa_load (spa=0xffffff0001ff0000, config=Variable "config" is not available.
> ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1361
> #23 0xffffffff81064ee1 in spa_load (spa=0xffffff0001ff0000, config=Variable "config" is not available.
> ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1189
> #24 0xffffffff810658fd in spa_open_common (pool=Variable "pool" is not available.
> ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1474
> #25 0xffffffff81065a52 in spa_get_stats (name=0xffffff0001ff5000 "home", config=0xffffff8077bf59e0, altroot=0xffffff0001ff5400 "", buflen=1024)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1671
> #26 0xffffffff81093e7c in zfs_ioc_pool_stats (zc=0xffffff0001ff5000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:914
> #27 0xffffffff810941c4 in zfsdev_ioctl (dev=Variable "dev" is not available.
> ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:3022
> #28 0xffffffff80511c76 in devfs_ioctl_f (fp=0xffffff0001f4bc80, com=3425196549, data=0xffffff0001ff5000, cred=Variable "cred" is not available.
> ) at /usr/src/sys/fs/devfs/devfs_vnops.c:659
> #29 0xffffffff805cb166 in kern_ioctl (td=0xffffff0001f0c390, fd=3, com=3425196549, data=0xffffff0001ff5000 "home") at file.h:262
> #30 0xffffffff805cb38e in ioctl (td=0xffffff0001f0c390, uap=0xffffff8077bf5bf0) at /usr/src/sys/kern/sys_generic.c:678
> #31 0xffffffff8086e28f in syscall (frame=0xffffff8077bf5c80) at /usr/src/sys/amd64/amd64/trap.c:984
> #32 0xffffffff808576e1 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:373
> #33 0x0000000800fe1d0c in ?? ()
>
> Booting the system without the disk causes any "zfs" or "zpool" 
> commands to hang the system after a while. Breaking to DDB doesn't 
> work using a keyboard and VGA (I don't have any other kind of gear 
> here). In case it is relevant, the pool started life as version 6 and 
> was upgraded using 7.2-STABLE shortly after the version 13 MFC. The 
> output of "zdb" with all disks connected:
>
> home
>    version=13
>    name='home'
>    state=0
>    txg=16061492
>    pool_guid=14089219607492705674
>    hostid=413956888
>    hostname='unset'
>    vdev_tree
>        type='root'
>        id=0
>        guid=14089219607492705674
>        children[0]
>                type='raidz'
>                id=0
>                guid=17899218839424019335
>                nparity=1
>                metaslab_array=14
>                metaslab_shift=31
>                ashift=9
>                asize=2800585539584
>                is_log=0
>                children[0]
>                        type='disk'
>                        id=0
>                        guid=15839907043443901501
>                        path='/dev/ad4'
>                        devid='ad:3QK08728'
>                        whole_disk=0
>                        DTL=389
>                children[1]
>                        type='disk'
>                        id=1
>                        guid=13623369126078337737
>                        path='/dev/ad16'
>                        devid='ad:9QH04HJN'
>                        whole_disk=0
>                        DTL=391
>                children[2]
>                        type='disk'
>                        id=2
>                        guid=15619490422714555908
>                        path='/dev/ad14'
>                        devid='ad:5NF1DDXR'
>                        whole_disk=0
>                        DTL=390
>                children[3]
>                        type='disk'
>                        id=3
>                        guid=6995275135550350664
>                        path='/dev/ad15'
>                        devid='ad:9QG93JHX'
>                        whole_disk=0
>                        DTL=386
>                children[4]
>                        type='disk'
>                        id=4
>                        guid=10651992494569677081
>                        path='/dev/ad13'
>                        devid='ad:9QH04GTY'
>                        whole_disk=0
>                        DTL=388
>                children[5]
>                        type='disk'
>                        id=5
>                        guid=10503557489947490214
>                        path='/dev/ad18'
>                        devid='ad:5NF1DDVB'
>                        whole_disk=0
>                        DTL=387
>                children[6]
>                        type='disk'
>                        id=6
>                        guid=17574056058658811312
>                        path='/dev/ad12'
>                        devid='ad:9QG90QA2'
>                        whole_disk=0
>                        DTL=392
>
> Can anyone help? I would be content to at least have access to the 
> filesystem in degraded mode.
>
> -Boris
On a subsequent attempt at "zfs mount -a", the following panic occurred:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xffffffff813dadb5
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff805951a5
stack pointer           = 0x28:0xffffff8077eb3360
frame pointer           = 0x28:0xffffff8077eb3370
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 832 (zfs)
panic: from debugger
Uptime: 2m32s
Physical memory: 4082 MB
Dumping 1282 MB: 1267 1251 1235 1219 1203 1187 1171 1155 1139 1123 1107 
1091 1075 1059 1043 1027 1011 995 979 963 947 931 915 899 883 867 851 
835 819 803 787 771 755 739 723 707 691 675 659 643 627 611 595 579 563 
547 531 515 499 483 467 451 435 419 403 387 371 355 339 323 307 291 275 
259 243 227 211 195 179 163 147 131 115 99 83 67 51 35 19 3

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
#0  doadump () at pcpu.h:223
223     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:223
#1  0xffffffff8058d881 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419
#2  0xffffffff8058dc5b in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:575
#3  0xffffffff801d9767 in db_panic (addr=Variable "addr" is not available.
) at /usr/src/sys/ddb/db_command.c:478
#4  0xffffffff801d9b71 in db_command (last_cmdp=0xffffffff80bd2120, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
#5  0xffffffff801d9dc0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#6  0xffffffff801dbd49 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#7  0xffffffff805b9704 in kdb_trap (type=12, code=0, tf=Variable "tf" is not available.
) at /usr/src/sys/kern/subr_kdb.c:534
#8  0xffffffff8086b5cd in trap_fatal (frame=0xffffff8077eb32b0, eva=18446744071582887349) at /usr/src/sys/amd64/amd64/trap.c:847
#9  0xffffffff8086b994 in trap_pfault (frame=0xffffff8077eb32b0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:768
#10 0xffffffff8086c16b in trap (frame=0xffffff8077eb32b0) at /usr/src/sys/amd64/amd64/trap.c:494
#11 0xffffffff80854d73 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:224
#12 0xffffffff805951a5 in _sx_xlock (sx=0xffffffff813dad9d, opts=0, file=0xffffffff810f57f0 "/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c", line=967) at atomic.h:147
#13 0xffffffff810392e5 in add_reference (ab=0xffffff002c03b340, hash_lock=Variable "hash_lock" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:967
#14 0xffffffff8103d377 in arc_buf_add_ref (buf=0xffffff0003ee87e0, tag=0xffffff002c046c40) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1208
#15 0xffffffff8103fe0d in dbuf_hold_impl (dn=0xffffff0003eec300, level=Variable "level" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:1633
#16 0xffffffff81040ddb in dbuf_hold (dn=Variable "dn" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:1689
#17 0xffffffff8104d5bc in dnode_hold_impl (os=0xffffff0003a01400, object=754, flag=1, tag=Variable "tag" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:584
#18 0xffffffff81042c5a in dmu_bonus_hold (os=Variable "os" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:147
#19 0xffffffff81071bb7 in vdev_metaslab_init (vd=0xffffff00036dc800, txg=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:787
#20 0xffffffff81071da5 in vdev_load (vd=0xffffff00036dc800) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1531
#21 0xffffffff81071c75 in vdev_load (vd=0xffffff00036db800) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1526
#22 0xffffffff8106539c in spa_load (spa=0xffffff00034f7000, config=Variable "config" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1361
#23 0xffffffff81064ee1 in spa_load (spa=0xffffff00034f7000, config=Variable "config" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1189
#24 0xffffffff810658fd in spa_open_common (pool=Variable "pool" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1474
#25 0xffffffff810512af in dsl_dir_open_spa (spa=0x0, name=Variable "name" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c:314
#26 0xffffffff8105627b in dsl_dataset_hold (name=Variable "name" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:571
#27 0xffffffff8104867f in dmu_objset_open (name=0xffffff0003013000 "home", type=DMU_OST_ANY, mode=9, osp=0xffffff8077eb39e0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:349
#28 0xffffffff810936e2 in zfs_ioc_objset_stats (zc=0xffffff0003013000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:1231
#29 0xffffffff810941c4 in zfsdev_ioctl (dev=Variable "dev" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:3022
#30 0xffffffff80511a46 in devfs_ioctl_f (fp=0xffffff00034eb230, com=3425196561, data=0xffffff0003013000, cred=Variable "cred" is not available.
) at /usr/src/sys/fs/devfs/devfs_vnops.c:659
#31 0xffffffff805c8ad6 in kern_ioctl (td=0xffffff00037ac720, fd=3, com=3425196561, data=0xffffff0003013000 "home") at file.h:262
#32 0xffffffff805c8cfe in ioctl (td=0xffffff00037ac720, uap=0xffffff8077eb3bf0) at /usr/src/sys/kern/sys_generic.c:678
#33 0xffffffff8086bbff in syscall (frame=0xffffff8077eb3c80) at /usr/src/sys/amd64/amd64/trap.c:984
#34 0xffffffff80855051 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:373
#35 0x0000000800fe1d0c in ?? ()

This isn't related to the problems described in the "zfs: Fatal trap 12: 
page fault while in kernel mode" thread, is it? I've looked through that 
thread, and the panics there look different.

-Boris
