From owner-freebsd-fs@FreeBSD.ORG Mon Mar 31 12:06:59 2008 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1DCED106574C for ; Mon, 31 Mar 2008 12:06:57 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id C9C898FC17 for ; Mon, 31 Mar 2008 12:06:56 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from [172.16.129.140] (fw.axelero.hu [195.228.243.120]) by people.fsn.hu (Postfix) with ESMTP id 32DC6AF422 for ; Mon, 31 Mar 2008 13:51:13 +0200 (CEST) Message-ID: <47F0D02B.8060504@fsn.hu> Date: Mon, 31 Mar 2008 13:51:07 +0200 From: Attila Nagy User-Agent: Thunderbird 2.0.0.12 (Windows/20080213) MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: ZFS hangs very often X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 Mar 2008 12:06:59 -0000 Hello, On my desktop machine I use a ZFS pool for everything but the swap and the root fs (so /usr, /tmp, and everything else is on ZFS, swap and / is on gmirror of two-two partitions). The first setup was a FreeBSD/i386 7-STABLE, and the pool consisted of two partitions from two SATA disks, which were encrypted with GELI individually. After using the system for some weeks (without any higher number of IO activity, just working on the machine), the first hang came: I couldn't move the mouse under X, but remote sessions were alive and also the clock app still counted the time fine. I couldn't log into the machine via ssh, the port was open, but I haven't got the banner. I've done a portupgrade at that time. After this (and several other) hangs, I decided to remove GELI from the equation without success. Then one partition (disk) instead of two. And now, I am running amd64 instead of i386 and the problem still persists. I've attached my notebook to the machine and here is what I have during the hang (currently, I am in the process of upgrading some ports and now a configure tries to run, but the machine has stopped): KDB: enter: manual escape to debugger [thread pid 23 tid 100022 ] Stopped at kdb_enter+0x31: leave db> bt Tracing pid 23 tid 100022 td 0xffffff000127c350 kdb_enter() at kdb_enter+0x31 scgetc() at scgetc+0x461 sckbdevent() at sckbdevent+0xa4 kbdmux_intr() at kbdmux_intr+0x43 kbdmux_kbd_intr() at kbdmux_kbd_intr+0x20 taskqueue_run() at taskqueue_run+0x9f ithread_loop() at ithread_loop+0x180 fork_exit() at fork_exit+0x11f fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xffffffffb4f04d30, rbp = 0 --- db> ps pid ppid pgrp uid state wmesg wchan cmd 77873 77871 76757 0 S+ piperd 0xffffff0010036ba0 as 77872 77871 76757 0 S+ zfs:(&zi 0xffffff0020ba0298 cc1plus 77871 77870 76757 0 S+ wait 0xffffff0001446000 c++ 77870 77321 76757 0 S+ wait 0xffffff000eba7468 sh 77321 76882 76757 0 S+ wait 0xffffff00396448d0 sh 76882 76881 76757 0 S+ wait 0xffffff000eba5468 sh 76881 76757 76757 0 S+ wait 0xffffff00014b28d0 sh 76757 76755 76757 0 Ss+ wait 0xffffff001b369000 make 76755 62725 62725 0 S+ select 0xffffffff80a89d50 script 62725 817 62725 0 S+ wait 0xffffff0001f43000 initial thread 86757 86674 86757 1001 SL+ pfault 0xffffffff80a9a79c ssh 86674 86672 86674 1001 Ss+ wait 0xffffff0001f428d0 bash 86672 86670 86670 1001 S select 0xffffffff80a89d50 sshd 86670 721 86670 0 Ss sbwait 0xffffff0001d2015c sshd 62788 802 62788 0 S+ ttyin 0xffffff00013bac10 csh 46310 801 46310 0 ?+ csh 817 800 817 0 S+ pause 0xffffff0001f420c0 csh 807 1 807 0 Ss+ ttyin 0xffffff000139e810 getty 806 1 806 0 Ss+ ttyin 0xffffff00013bb410 getty 805 1 805 0 Ss+ ttyin 0xffffff00013ba810 getty 804 1 804 0 Ss+ ttyin 0xffffff00013ba010 getty 803 1 803 0 Ss+ ttyin 0xffffff00013b9410 getty 802 1 802 0 Ss+ wait 0xffffff00014b08d0 login 801 1 801 0 Ss+ wait 0xffffff0001567468 login 800 1 800 0 Ss+ wait 0xffffff00014b0468 login 737 1 737 0 ?s cron 731 1 731 25 Ss pause 0xffffff00015650c0 sendmail 727 1 727 0 ?s sendmail 721 1 721 0 Ss select 0xffffffff80a89d50 sshd 688 687 687 123 ? ntpd 687 1 687 0 Ss select 0xffffffff80a89d50 ntpd 668 1 668 0 Ss select 0xffffffff80a89d50 powerd 559 1 559 0 ?s syslogd 488 1 488 0 Ss select 0xffffffff80a89d50 devd 440 1 440 0 Ss select 0xffffffff80a89d50 moused 248 1 248 0 Ss pause 0xffffff00015640c0 adjkerntz 175 0 0 0 SL zfs:(&tq 0xffffff0001583080 [zil_clean] 174 0 0 0 SL zfs:(&tq 0xffffff00015831c0 [zil_clean] 173 0 0 0 SL zfs:(&tq 0xffffff0001583300 [zil_clean] 172 0 0 0 SL zfs:(&tq 0xffffff0001583440 [zil_clean] 171 0 0 0 SL zfs:(&tq 0xffffff0001583580 [zil_clean] 170 0 0 0 SL zfs:(&tq 0xffffff00015836c0 [zil_clean] 168 0 0 0 SL zfs:(&tx 0xffffff000146c590 [txg_thread_enter] 167 0 0 0 SL zfs:(&zi 0xffffff000c717d58 [txg_thread_enter] 166 0 0 0 SL zfs:(&tx 0xffffff00014e0a40 [txg_thread_enter] 165 0 0 0 SL vgeom:io 0xffffff000145c410 [vdev:worker ad0s1d] 164 0 0 0 SL zfs:(&tq 0xffffff000158e300 [spa_zio_intr_5] 163 0 0 0 SL zfs:(&tq 0xffffff000158e300 [spa_zio_intr_5] 162 0 0 0 SL zfs:(&tq 0xffffff000158e1c0 [spa_zio_issue_5] 161 0 0 0 SL zfs:(&tq 0xffffff000158e1c0 [spa_zio_issue_5] 160 0 0 0 SL zfs:(&tq 0xffffff0001227d00 [spa_zio_intr_4] 159 0 0 0 SL zfs:(&tq 0xffffff0001227d00 [spa_zio_intr_4] 158 0 0 0 SL zfs:(&tq 0xffffff0001227bc0 [spa_zio_issue_4] 157 0 0 0 SL zfs:(&tq 0xffffff0001227bc0 [spa_zio_issue_4] 156 0 0 0 SL zfs:(&tq 0xffffff0001227a80 [spa_zio_intr_3] 155 0 0 0 SL zfs:(&tq 0xffffff0001227a80 [spa_zio_intr_3] 154 0 0 0 SL zfs:(&tq 0xffffff0001227940 [spa_zio_issue_3] 153 0 0 0 SL zfs:(&tq 0xffffff0001227940 [spa_zio_issue_3] 152 0 0 0 SL zfs:&vq- 0xffffff00015d0c88 [spa_zio_intr_2] 151 0 0 0 SL vmwait 0xffffffff80a9a79c [spa_zio_intr_2] 150 0 0 0 SL vmwait 0xffffffff80a9a79c [spa_zio_issue_2] 149 0 0 0 SL vmwait 0xffffffff80a9a79c [spa_zio_issue_2] 148 0 0 0 SL zfs:(&tq 0xffffff0001227580 [spa_zio_intr_1] 147 0 0 0 SL zfs:(&tq 0xffffff0001227580 [spa_zio_intr_1] 146 0 0 0 SL zfs:(&tq 0xffffff0001227440 [spa_zio_issue_1] 145 0 0 0 SL zfs:(&tq 0xffffff0001227440 [spa_zio_issue_1] 144 0 0 0 SL zfs:(&tq 0xffffff00012271c0 [spa_zio_intr_0] 143 0 0 0 SL zfs:(&tq 0xffffff00012271c0 [spa_zio_intr_0] 142 0 0 0 SL zfs:(&tq 0xffffff0001227300 [spa_zio_issue_0] 141 0 0 0 SL zfs:(&tq 0xffffff0001227300 [spa_zio_issue_0] 87 0 0 0 SL vmwait 0xffffffff80a9a79c [g_eli[1] mirror/swa] 86 0 0 0 SL vmwait 0xffffffff80a9a79c [g_eli[0] mirror/swa] 53 0 0 0 SL sdflush 0xffffffff80a99d88 [softdepflush] 52 0 0 0 SL vlruwt 0xffffff0001448000 [vnlru] 51 0 0 0 SL zfs:&vq- 0xffffff00015d0c88 [syncer] 50 0 0 0 SL psleep 0xffffffff80a8a55c [bufdaemon] 49 0 0 0 SL pgzero 0xffffffff80a9b804 [pagezero] 48 0 0 0 SL psleep 0xffffffff80a9ab48 [vmdaemon] 47 0 0 0 SL wswbuf0 0xffffffff80a9a004 [pagedaemon] 46 0 0 0 SL m:w1 0xffffff0001401200 [g_mirror swap] 45 0 0 0 SL m:w1 0xffffff00013c3800 [g_mirror root] 44 0 0 0 SL zfs:(&ar 0xffffffff80c746b0 [arc_reclaim_thread] 43 0 0 0 SL waiting_ 0xffffffff80a8dc88 [sctp_iterator] 42 0 0 0 WL [swi0: sio] 41 0 0 0 WL [irq1: atkbd0] 40 0 0 0 WL [irq15: ata1] 39 0 0 0 WL [irq14: ata0] 38 0 0 0 SL usbevt 0xffffff000133a420 [usb5] 37 0 0 0 SL usbevt 0xffffffff81065420 [usb4] 36 0 0 0 SL usbevt 0xffffffff81063420 [usb3] 35 0 0 0 SL usbevt 0xffffff000130c420 [usb2] 34 0 0 0 WL [irq22: ehci0] 33 0 0 0 SL usbevt 0xffffffff81061420 [usb1] 32 0 0 0 WL [irq21: pcm0 uhci1+] 31 0 0 0 SL usbtsk 0xffffffff80a71028 [usbtask-dr] 30 0 0 0 SL usbtsk 0xffffffff80a71000 [usbtask-hc] 29 0 0 0 SL usbevt 0xffffffff8105f420 [usb0] 28 0 0 0 WL [irq20: uhci0 uhci+] 27 0 0 0 SL - 0xffffff00012ef880 [em0 taskq] 26 0 0 0 WL [irq9: acpi0] 25 0 0 0 SL - 0xffffff0001294580 [kqueue taskq] 24 0 0 0 WL [swi6: task queue] 23 0 0 0 RL CPU 1 [swi6: Giant taskq] 22 0 0 0 SL - 0xffffff000122c500 [thread taskq] 21 0 0 0 WL [swi5: +] 20 0 0 0 SL - 0xffffff000122ca80 [acpi_task_2] 19 0 0 0 SL - 0xffffff000122ca80 [acpi_task_1] 18 0 0 0 SL - 0xffffff000122ca80 [acpi_task_0] 17 0 0 0 WL [swi2: cambio] 9 0 0 0 SL ccb_scan 0xffffffff80a3fda0 [xpt_thrd] 16 0 0 0 SL - 0xffffffff80a74ea8 [yarrow] 8 0 0 0 SL crypto_r 0xffffffff80d293b0 [crypto returns] 7 0 0 0 SL crypto_w 0xffffffff80d29350 [crypto] 6 0 0 0 SL zfs:(&tq 0xffffff0001227080 [system_taskq] 5 0 0 0 SL zfs:(&tq 0xffffff0001227080 [system_taskq] 4 0 0 0 SL - 0xffffffff80a71838 [g_down] 3 0 0 0 SL - 0xffffffff80a71830 [g_up] 2 0 0 0 SL - 0xffffffff80a71820 [g_event] 15 0 0 0 WL [swi1: net] 14 0 0 0 WL [swi3: vm] 13 0 0 0 LL *Giant 0xffffff00015a2be0 [swi4: clock sio] 12 0 0 0 RL CPU 0 [idle: cpu0] 11 0 0 0 RL [idle: cpu1] 1 0 1 0 SLs wait 0xffffff000112f8d0 [init] 10 0 0 0 SL audit_wo 0xffffffff80a99260 [audit] 0 0 0 0 SLs vmwait 0xffffffff80a9a79c [swapper] db> trace 77872 Tracing pid 77872 tid 100157 td 0xffffff000eb9c350 sched_switch() at sched_switch+0x1fe mi_switch() at mi_switch+0x189 sleepq_wait() at sleepq_wait+0x3b _cv_wait() at _cv_wait+0xfe zio_wait() at zio_wait+0x5f dmu_buf_hold_array_by_dnode() at dmu_buf_hold_array_by_dnode+0x1f6 dmu_buf_hold_array() at dmu_buf_hold_array+0x62 dmu_read_uio() at dmu_read_uio+0x3f zfs_freebsd_read() at zfs_freebsd_read+0x535 vn_read() at vn_read+0x1ca dofileread() at dofileread+0xa1 kern_readv() at kern_readv+0x4c read() at read+0x54 syscall() at syscall+0x254 Xfast_syscall() at Xfast_syscall+0xab --- syscall (3, FreeBSD ELF64, read), rip = 0x8bd74c, rsp = 0x7fffffffe2c8, rbp = 0 --- db> trace 51 Tracing pid 51 tid 100050 td 0xffffff00014246a0 sched_switch() at sched_switch+0x1fe mi_switch() at mi_switch+0x189 sleepq_wait() at sleepq_wait+0x3b _sx_xlock_hard() at _sx_xlock_hard+0x1ee _sx_xlock() at _sx_xlock+0x4e vdev_queue_io() at vdev_queue_io+0x74 vdev_geom_io_start() at vdev_geom_io_start+0x4a vdev_mirror_io_start() at vdev_mirror_io_start+0x1b0 zil_lwb_write_start() at zil_lwb_write_start+0x2f1 zil_commit_writer() at zil_commit_writer+0x1c4 zil_commit() at zil_commit+0xb8 zfs_sync() at zfs_sync+0x9a sync_fsync() at sync_fsync+0x1ac sched_sync() at sched_sync+0x63f fork_exit() at fork_exit+0x11f fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xffffffffb502fd30, rbp = 0 --- Any ideas about this?