Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 31 Mar 2008 13:51:07 +0200
From:      Attila Nagy <bra@fsn.hu>
To:        freebsd-fs@FreeBSD.org
Subject:   ZFS hangs very often
Message-ID:  <47F0D02B.8060504@fsn.hu>

next in thread | raw e-mail | index | archive | help
Hello,

On my desktop machine I use a ZFS pool for everything but the swap and 
the root fs (so /usr, /tmp, and everything else is on ZFS, swap and / is 
on gmirror of two-two partitions).

The first setup was a FreeBSD/i386 7-STABLE, and the pool consisted of 
two partitions from two SATA disks, which were encrypted with GELI 
individually.

After using the system for some weeks (without any higher number of IO 
activity, just working on the machine), the first hang came: I couldn't 
move the mouse under X, but remote sessions were alive and also the 
clock app still counted the time fine. I couldn't log into the machine 
via ssh, the port was open, but I haven't got the banner.
I've done a portupgrade at that time.

After this (and several other) hangs, I decided to remove GELI from the 
equation without success. Then one partition (disk) instead of two. And 
now, I am running amd64 instead of i386 and the problem still persists.

I've attached my notebook to the machine and here is what I have during 
the hang (currently, I am in the process of upgrading some ports and now 
a configure tries to run, but the machine has stopped):
KDB: enter: manual escape to debugger
[thread pid 23 tid 100022 ]
Stopped at      kdb_enter+0x31: leave
db> bt
Tracing pid 23 tid 100022 td 0xffffff000127c350
kdb_enter() at kdb_enter+0x31
scgetc() at scgetc+0x461
sckbdevent() at sckbdevent+0xa4
kbdmux_intr() at kbdmux_intr+0x43
kbdmux_kbd_intr() at kbdmux_kbd_intr+0x20
taskqueue_run() at taskqueue_run+0x9f
ithread_loop() at ithread_loop+0x180
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb4f04d30, rbp = 0 ---
db> ps
  pid  ppid  pgrp   uid   state   wmesg         wchan        cmd
77873 77871 76757     0  S+      piperd   0xffffff0010036ba0 as
77872 77871 76757     0  S+      zfs:(&zi 0xffffff0020ba0298 cc1plus
77871 77870 76757     0  S+      wait     0xffffff0001446000 c++
77870 77321 76757     0  S+      wait     0xffffff000eba7468 sh
77321 76882 76757     0  S+      wait     0xffffff00396448d0 sh
76882 76881 76757     0  S+      wait     0xffffff000eba5468 sh
76881 76757 76757     0  S+      wait     0xffffff00014b28d0 sh
76757 76755 76757     0  Ss+     wait     0xffffff001b369000 make
76755 62725 62725     0  S+      select   0xffffffff80a89d50 script
62725   817 62725     0  S+      wait     0xffffff0001f43000 initial thread
86757 86674 86757  1001  SL+     pfault   0xffffffff80a9a79c ssh
86674 86672 86674  1001  Ss+     wait     0xffffff0001f428d0 bash
86672 86670 86670  1001  S       select   0xffffffff80a89d50 sshd
86670   721 86670     0  Ss      sbwait   0xffffff0001d2015c sshd
62788   802 62788     0  S+      ttyin    0xffffff00013bac10 csh
46310   801 46310     0  ?+                                  csh
  817   800   817     0  S+      pause    0xffffff0001f420c0 csh
  807     1   807     0  Ss+     ttyin    0xffffff000139e810 getty
  806     1   806     0  Ss+     ttyin    0xffffff00013bb410 getty
  805     1   805     0  Ss+     ttyin    0xffffff00013ba810 getty
  804     1   804     0  Ss+     ttyin    0xffffff00013ba010 getty
  803     1   803     0  Ss+     ttyin    0xffffff00013b9410 getty
  802     1   802     0  Ss+     wait     0xffffff00014b08d0 login
  801     1   801     0  Ss+     wait     0xffffff0001567468 login
  800     1   800     0  Ss+     wait     0xffffff00014b0468 login
  737     1   737     0  ?s                                  cron
  731     1   731    25  Ss      pause    0xffffff00015650c0 sendmail
  727     1   727     0  ?s                                  sendmail
  721     1   721     0  Ss      select   0xffffffff80a89d50 sshd
  688   687   687   123  ?                                   ntpd
  687     1   687     0  Ss      select   0xffffffff80a89d50 ntpd
  668     1   668     0  Ss      select   0xffffffff80a89d50 powerd
  559     1   559     0  ?s                                  syslogd
  488     1   488     0  Ss      select   0xffffffff80a89d50 devd
  440     1   440     0  Ss      select   0xffffffff80a89d50 moused
  248     1   248     0  Ss      pause    0xffffff00015640c0 adjkerntz
  175     0     0     0  SL      zfs:(&tq 0xffffff0001583080 [zil_clean]
  174     0     0     0  SL      zfs:(&tq 0xffffff00015831c0 [zil_clean]
  173     0     0     0  SL      zfs:(&tq 0xffffff0001583300 [zil_clean]
  172     0     0     0  SL      zfs:(&tq 0xffffff0001583440 [zil_clean]
  171     0     0     0  SL      zfs:(&tq 0xffffff0001583580 [zil_clean]
  170     0     0     0  SL      zfs:(&tq 0xffffff00015836c0 [zil_clean]
  168     0     0     0  SL      zfs:(&tx 0xffffff000146c590 
[txg_thread_enter]
  167     0     0     0  SL      zfs:(&zi 0xffffff000c717d58 
[txg_thread_enter]
  166     0     0     0  SL      zfs:(&tx 0xffffff00014e0a40 
[txg_thread_enter]
  165     0     0     0  SL      vgeom:io 0xffffff000145c410 
[vdev:worker ad0s1d]
  164     0     0     0  SL      zfs:(&tq 0xffffff000158e300 
[spa_zio_intr_5]
  163     0     0     0  SL      zfs:(&tq 0xffffff000158e300 
[spa_zio_intr_5]
  162     0     0     0  SL      zfs:(&tq 0xffffff000158e1c0 
[spa_zio_issue_5]
  161     0     0     0  SL      zfs:(&tq 0xffffff000158e1c0 
[spa_zio_issue_5]
  160     0     0     0  SL      zfs:(&tq 0xffffff0001227d00 
[spa_zio_intr_4]
  159     0     0     0  SL      zfs:(&tq 0xffffff0001227d00 
[spa_zio_intr_4]
  158     0     0     0  SL      zfs:(&tq 0xffffff0001227bc0 
[spa_zio_issue_4]
  157     0     0     0  SL      zfs:(&tq 0xffffff0001227bc0 
[spa_zio_issue_4]
  156     0     0     0  SL      zfs:(&tq 0xffffff0001227a80 
[spa_zio_intr_3]
  155     0     0     0  SL      zfs:(&tq 0xffffff0001227a80 
[spa_zio_intr_3]
  154     0     0     0  SL      zfs:(&tq 0xffffff0001227940 
[spa_zio_issue_3]
  153     0     0     0  SL      zfs:(&tq 0xffffff0001227940 
[spa_zio_issue_3]
  152     0     0     0  SL      zfs:&vq- 0xffffff00015d0c88 
[spa_zio_intr_2]
  151     0     0     0  SL      vmwait   0xffffffff80a9a79c 
[spa_zio_intr_2]
  150     0     0     0  SL      vmwait   0xffffffff80a9a79c 
[spa_zio_issue_2]
  149     0     0     0  SL      vmwait   0xffffffff80a9a79c 
[spa_zio_issue_2]
  148     0     0     0  SL      zfs:(&tq 0xffffff0001227580 
[spa_zio_intr_1]
  147     0     0     0  SL      zfs:(&tq 0xffffff0001227580 
[spa_zio_intr_1]
  146     0     0     0  SL      zfs:(&tq 0xffffff0001227440 
[spa_zio_issue_1]
  145     0     0     0  SL      zfs:(&tq 0xffffff0001227440 
[spa_zio_issue_1]
  144     0     0     0  SL      zfs:(&tq 0xffffff00012271c0 
[spa_zio_intr_0]
  143     0     0     0  SL      zfs:(&tq 0xffffff00012271c0 
[spa_zio_intr_0]
  142     0     0     0  SL      zfs:(&tq 0xffffff0001227300 
[spa_zio_issue_0]
  141     0     0     0  SL      zfs:(&tq 0xffffff0001227300 
[spa_zio_issue_0]
   87     0     0     0  SL      vmwait   0xffffffff80a9a79c [g_eli[1] 
mirror/swa]
   86     0     0     0  SL      vmwait   0xffffffff80a9a79c [g_eli[0] 
mirror/swa]
   53     0     0     0  SL      sdflush  0xffffffff80a99d88 [softdepflush]
   52     0     0     0  SL      vlruwt   0xffffff0001448000 [vnlru]
   51     0     0     0  SL      zfs:&vq- 0xffffff00015d0c88 [syncer]
   50     0     0     0  SL      psleep   0xffffffff80a8a55c [bufdaemon]
   49     0     0     0  SL      pgzero   0xffffffff80a9b804 [pagezero]
   48     0     0     0  SL      psleep   0xffffffff80a9ab48 [vmdaemon]
   47     0     0     0  SL      wswbuf0  0xffffffff80a9a004 [pagedaemon]
   46     0     0     0  SL      m:w1     0xffffff0001401200 [g_mirror swap]
   45     0     0     0  SL      m:w1     0xffffff00013c3800 [g_mirror root]
   44     0     0     0  SL      zfs:(&ar 0xffffffff80c746b0 
[arc_reclaim_thread]
   43     0     0     0  SL      waiting_ 0xffffffff80a8dc88 [sctp_iterator]
   42     0     0     0  WL                                  [swi0: sio]
   41     0     0     0  WL                                  [irq1: atkbd0]
   40     0     0     0  WL                                  [irq15: ata1]
   39     0     0     0  WL                                  [irq14: ata0]
   38     0     0     0  SL      usbevt   0xffffff000133a420 [usb5]
   37     0     0     0  SL      usbevt   0xffffffff81065420 [usb4]
   36     0     0     0  SL      usbevt   0xffffffff81063420 [usb3]
   35     0     0     0  SL      usbevt   0xffffff000130c420 [usb2]
   34     0     0     0  WL                                  [irq22: ehci0]
   33     0     0     0  SL      usbevt   0xffffffff81061420 [usb1]
   32     0     0     0  WL                                  [irq21: 
pcm0 uhci1+]
   31     0     0     0  SL      usbtsk   0xffffffff80a71028 [usbtask-dr]
   30     0     0     0  SL      usbtsk   0xffffffff80a71000 [usbtask-hc]
   29     0     0     0  SL      usbevt   0xffffffff8105f420 [usb0]
   28     0     0     0  WL                                  [irq20: 
uhci0 uhci+]
   27     0     0     0  SL      -        0xffffff00012ef880 [em0 taskq]
   26     0     0     0  WL                                  [irq9: acpi0]
   25     0     0     0  SL      -        0xffffff0001294580 [kqueue taskq]
   24     0     0     0  WL                                  [swi6: task 
queue]
   23     0     0     0  RL      CPU 1                       [swi6: 
Giant taskq]
   22     0     0     0  SL      -        0xffffff000122c500 [thread taskq]
   21     0     0     0  WL                                  [swi5: +]
   20     0     0     0  SL      -        0xffffff000122ca80 [acpi_task_2]
   19     0     0     0  SL      -        0xffffff000122ca80 [acpi_task_1]
   18     0     0     0  SL      -        0xffffff000122ca80 [acpi_task_0]
   17     0     0     0  WL                                  [swi2: cambio]
    9     0     0     0  SL      ccb_scan 0xffffffff80a3fda0 [xpt_thrd]
   16     0     0     0  SL      -        0xffffffff80a74ea8 [yarrow]
    8     0     0     0  SL      crypto_r 0xffffffff80d293b0 [crypto 
returns]
    7     0     0     0  SL      crypto_w 0xffffffff80d29350 [crypto]
    6     0     0     0  SL      zfs:(&tq 0xffffff0001227080 [system_taskq]
    5     0     0     0  SL      zfs:(&tq 0xffffff0001227080 [system_taskq]
    4     0     0     0  SL      -        0xffffffff80a71838 [g_down]
    3     0     0     0  SL      -        0xffffffff80a71830 [g_up]
    2     0     0     0  SL      -        0xffffffff80a71820 [g_event]
   15     0     0     0  WL                                  [swi1: net]
   14     0     0     0  WL                                  [swi3: vm]
   13     0     0     0  LL     *Giant    0xffffff00015a2be0 [swi4: 
clock sio]
   12     0     0     0  RL      CPU 0                       [idle: cpu0]
   11     0     0     0  RL                                  [idle: cpu1]
    1     0     1     0  SLs     wait     0xffffff000112f8d0 [init]
   10     0     0     0  SL      audit_wo 0xffffffff80a99260 [audit]
    0     0     0     0  SLs     vmwait   0xffffffff80a9a79c [swapper]
db> trace 77872
Tracing pid 77872 tid 100157 td 0xffffff000eb9c350
sched_switch() at sched_switch+0x1fe
mi_switch() at mi_switch+0x189
sleepq_wait() at sleepq_wait+0x3b
_cv_wait() at _cv_wait+0xfe
zio_wait() at zio_wait+0x5f
dmu_buf_hold_array_by_dnode() at dmu_buf_hold_array_by_dnode+0x1f6
dmu_buf_hold_array() at dmu_buf_hold_array+0x62
dmu_read_uio() at dmu_read_uio+0x3f
zfs_freebsd_read() at zfs_freebsd_read+0x535
vn_read() at vn_read+0x1ca
dofileread() at dofileread+0xa1
kern_readv() at kern_readv+0x4c
read() at read+0x54
syscall() at syscall+0x254
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (3, FreeBSD ELF64, read), rip = 0x8bd74c, rsp = 
0x7fffffffe2c8, rbp = 0 ---
db> trace 51
Tracing pid 51 tid 100050 td 0xffffff00014246a0
sched_switch() at sched_switch+0x1fe
mi_switch() at mi_switch+0x189
sleepq_wait() at sleepq_wait+0x3b
_sx_xlock_hard() at _sx_xlock_hard+0x1ee
_sx_xlock() at _sx_xlock+0x4e
vdev_queue_io() at vdev_queue_io+0x74
vdev_geom_io_start() at vdev_geom_io_start+0x4a
vdev_mirror_io_start() at vdev_mirror_io_start+0x1b0
zil_lwb_write_start() at zil_lwb_write_start+0x2f1
zil_commit_writer() at zil_commit_writer+0x1c4
zil_commit() at zil_commit+0xb8
zfs_sync() at zfs_sync+0x9a
sync_fsync() at sync_fsync+0x1ac
sched_sync() at sched_sync+0x63f
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb502fd30, rbp = 0 ---

Any ideas about this?



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?47F0D02B.8060504>