Date:      Wed, 8 Jul 2015 11:30:43 +0200
From:      Gergely Czuczy <gergely.czuczy@harmless.hu>
To:        freebsd-fs@freebsd.org
Subject:   Crashed ZFS pool
Message-ID:  <559CEDC3.2040107@harmless.hu>

Hello,

We have a crashed ZFS pool. Initially the system was running FreeBSD 8,
which we upgraded to 9, then to 10-STABLE yesterday. Upon importing the
pool, the system crashes with a panic.

The pool used to have a file-backed ZIL device under /usr/zfslog.
However, the file size was 0 when this happened, and it used to be
bigger. We've set vfs.zfs.recover=1 in /boot/loader.conf, and are trying
to import the pool with:
#  zpool import -fm tank
but this crashes the system.
We've tried removing /boot/zfs/zpool.cache as well (renamed it 
actually), but it resulted in the same panic.

# uname -a
FreeBSD $x 10.2-PRERELEASE FreeBSD 10.2-PRERELEASE #0: Tue Jul  7 
20:30:27 CEST 2015     toor@$x:/usr/obj/usr/src/sys/REFLECTION amd64

When running zdb -AAAFXve tank it dumps some info, then gets stuck. zdb 
output can be found here:
http://czg.harmless.hu/zfscrash/tank.zdb-AAAFXve.script

The suspicious part is:
Assertion failed: zap_lookup(ddt->ddt_os, ddt->ddt_spa->spa_ddt_stat_object, name, sizeof (uint64_t), sizeof (ddt_histogram_t) / sizeof (uint64_t), &ddt->ddt_histogram[type][class]) == 0 (0x6 == 0x0), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c, line 127.
Assertion failed: (ddt_object_info(ddt, type, class, &doi) == 0), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c, line 132.

zdb seems to be stuck in the following state:
  21697 zdb      RET   read 8
  21697 zdb      CALL  _umtx_op(0x800638608,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffd7fbde70)
  21697 zdb      RET   _umtx_op -1 errno 60 Operation timed out
  21697 zdb      CALL  _umtx_op(0x800638f68,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffc7b3be80)
  21697 zdb      RET   _umtx_op -1 errno 60 Operation timed out
  21697 zdb      CALL  read(0x4,0x7fffd7fbdf50,0x8)
  21697 zdb      GIO   fd 4 read 8 bytes
        0x0000 02be fe70 08a4 2335 |...p..#5|

  21697 zdb      RET   read 8
  21697 zdb      CALL  _umtx_op(0x800638608,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffd7fbde70)
  21697 zdb      RET   _umtx_op -1 errno 60 Operation timed out
  21697 zdb      CALL  _umtx_op(0x800638f68,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffc7b3be80)
  21697 zdb      RET   _umtx_op -1 errno 60 Operation timed out
  21697 zdb      CALL  read(0x4,0x7fffd7fbdf50,0x8)
  21697 zdb      GIO   fd 4 read 8 bytes
        0x0000 459a ca93 c54b 9922 |E....K."|

  21697 zdb      RET   read 8
  21697 zdb      CALL  _umtx_op(0x800638608,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffd7fbde70)
  21697 zdb      RET   _umtx_op -1 errno 60 Operation timed out
  21697 zdb      CALL  _umtx_op(0x800638f68,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffc7b3be80)


However, I wasn't able to find out what FD 4 is.
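
For reference, one way a descriptor like this could be identified on
FreeBSD is procstat -f, which lists the open files of a running process.
The sketch below is only that: it assumes the hung zdb process (PID 21697
in the trace above) is still running, that FD is the third column of the
procstat -f output, and that the helper names filter_fd and show_fd are
hypothetical.

```shell
#!/bin/sh
# Sketch: print the procstat(1) row for one descriptor of a running process.
# Assumption: `procstat -f <pid>` prints a header line followed by one row
# per descriptor, with the descriptor number in the third column (FD).

# Keep the header line plus any row whose FD column equals $1.
filter_fd() {
    awk -v fd="$1" 'NR == 1 || $3 == fd'
}

# Hypothetical helper: show_fd <pid> <fd>
show_fd() {
    procstat -f "$1" | filter_fd "$2"
}

# Usage on the affected host, while zdb is still hung:
#   show_fd 21697 4
```

fstat -p 21697 should report similar information if procstat is not
available.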

There were no disk read errors in dmesg/messages, so I'm not sure what
would be timing out.

And here's a screenshot of the crash:
http://czg.harmless.hu/zfscrash/zfspanic.jpg

So, does anyone have any idea what to do with it? It would be nice to
get it back to a functional state, or at least to a state where the data
can be accessed.

Thanks in advance,
-czg



