Date:      Fri, 10 Oct 2014 01:59:47 +0200
From:      Mark Martinec <Mark.Martinec+freebsd@ijs.si>
To:        freebsd-stable@freebsd.org
Subject:   zfs pool import hangs on [tx->tx_sync_done_cv]
Message-ID:  <54372173.1010100@ijs.si>

In short, after upgrading to 10.1-BETA3 or -RC1 I ended up with several
zfs pools that can no longer be imported. The zpool import command
(with no arguments) does show all available pools, but trying to
import one just hangs and the command cannot be aborted, although
the rest of the system is still alive and fine:

# zpool import -f <mypool>

Typing ^T just shows an idle process, waiting on [tx->tx_sync_done_cv]:

load: 0.20  cmd: zpool 939 [tx->tx_sync_done_cv] 5723.65r 0.01u 0.02s 0% 8220k
load: 0.16  cmd: zpool 939 [tx->tx_sync_done_cv] 5735.73r 0.01u 0.02s 0% 8220k
load: 0.14  cmd: zpool 939 [tx->tx_sync_done_cv] 5741.83r 0.01u 0.02s 0% 8220k
load: 0.13  cmd: zpool 939 [tx->tx_sync_done_cv] 5749.16r 0.01u 0.02s 0% 8220k
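
(^T just sends SIGINFO to the foreground process group, so the same status
line can also be requested from another terminal; 939 is the PID of the
zpool process above:)

# kill -INFO 939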

ps shows (on a system rebooted to a LiveCD running FreeBSD-10.1-RC1):

   PID    TID COMM             TDNAME           CPU  PRI STATE WCHAN
   939 100632 zpool            -                  5  122 sleep tx->tx_s
UID PID PPID CPU PRI NI    VSZ  RSS MWCHAN   STAT TT     TIME COMMAND
   0 939  801   0  22  0 107732 8236 tx->tx_s D+   v0  0:00.04 zpool import -f -o cachefile=/tmp/zpool.cache -R /tmp/sys0boot sys0boot

   NWCHAN
   fffff8007b0f2a20

# procstat -kk 939

   PID    TID COMM             TDNAME           KSTACK
   939 100632 zpool            -                mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x16d txg_wait_synced+0x85 spa_load+0x1cd1 spa_load_best+0x6f spa_import+0x1ff zfs_ioc_pool_import+0x137 zfsdev_ioctl+0x6f0 devfs_ioctl_f+0x114 kern_ioctl+0x255 sys_ioctl+0x13c amd64_syscall+0x351 Xfast_syscall+0xfb
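
The kernel-side ZFS worker threads can be inspected the same way, to see
whether the pool's txg sync thread is itself stuck. A sketch (zfskern is
the standard name of the ZFS kernel process on FreeBSD; the pattern for
the txg threads is a guess):

# procstat -kk -a | egrep 'zfskern|txg'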


Background story: the system where this happened was being kept at
a fairly recent 10-STABLE; the last upgrade was very close to the
BETA3 release. There are a couple of zfs pools there: one on a
mirrored pair of SSDs, mostly holding the OS; one on a mirrored
pair of large spindles; and three more small ones (4 GiB each),
mostly for boot redundancy or testing - these small ones are on
old, smallish disks. The disks are of different types and attached
to different SATA controllers (LSI and onboard Intel). The pools
were mostly kept up to date with the most recent zpool feature set
throughout their lifetime (some starting their life with 9.0, some
with 10.0).

About two weeks ago, after a reboot to a 10-STABLE of the day,
the small pools became unavailable, while the two regular large
pools were still normal. At first I didn't pay much attention
to that, blaming potentially crappy hardware, as these pools were
on oldish disks and nonessential for normal operation.

Today I needed to reboot (for an unrelated reason), and the
machine was no longer able to mount the boot pool.
The first instinct was that the disks were malfunctioning - but ...

Booting it to a FreeBSD-10.1-RC1 LiveCD was successful.
A smartmontools disk self-test shows no problems, and dd is able
to read the whole partition of each problematic pool. Most
importantly, running 'zdb -e -cc' on each (non-imported) pool
churned along normally and steadily, produced a stats report at
the end, and reported no errors.
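
For reference, the checks were along these lines (a sketch; ada3 and
ada3p2 are placeholders for whichever disk and partition back a
problematic pool, and sys0boot stands for the pool name):

# smartctl -t long /dev/ada3      # start the self-test, then check results:
# smartctl -a /dev/ada3
# dd if=/dev/ada3p2 of=/dev/null bs=1m
# zdb -e -cc sys0boot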

To further prove that the disks are fine, I sacrificed one of the
broken 4 GiB GPT partitions holding one of the problematic pools
and did a fresh 10.1-RC1 install on it from a distribution ISO DVD.
The installation went fine, and the system boots and runs fine
from the newly installed OS. Trying to import one of the remaining
old pools hangs the import command as before.

As a final proof, I copied (with dd) one of the broken 4 GiB
partitions to a file on another system (running 10.1-BETA3,
which did not suffer from this problem), made a memory disk
out of this file, then ran zpool import on this pool - and it
hangs there too! So the hardware is not the problem: either these
partitions are truly broken (even though zdb -cc says they are
fine), or the new OS is somehow no longer able to import them.
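
For anyone wanting to reproduce this with such an image, the
memory-disk setup was essentially as follows (a sketch; file and
directory names are placeholders, and the symlink directory merely
restricts the import scan to the md device):

# dd if=/dev/ada3p2 of=/tmp/pool.img bs=1m
# mdconfig -a -t vnode -f /tmp/pool.img
md0
# mkdir /tmp/dev.md && ln -s /dev/md0 /tmp/dev.md/
# zpool import -d /tmp/dev.md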

Please advise.

I have a copy of the 4 GiB partition as a 400 MB compressed
file, available if somebody would be willing to play with it.

I also have a ktrace of the 'zpool import' command. Its last
actions before it hangs are:

    939 zpool    RET   madvise 0
    939 zpool    CALL  madvise(0x80604e000,0x1000,MADV_FREE)
    939 zpool    RET   madvise 0
    939 zpool    CALL  close(0x6)
    939 zpool    RET   close 0
    939 zpool    CALL  ioctl(0x3,0xc0185a05,0x7fffffffbf00)
    939 zpool    RET   ioctl -1 errno 2 No such file or directory
    939 zpool    CALL  madvise(0x802c71000,0x10000,MADV_FREE)
    939 zpool    RET   madvise 0
    939 zpool    CALL  madvise(0x802ca5000,0x1000,MADV_FREE)
    939 zpool    RET   madvise 0
    939 zpool    CALL  ioctl(0x3,0xc0185a06,0x7fffffffbf60)
    939 zpool    RET   ioctl 0
    939 zpool    CALL  ioctl(0x3,0xc0185a06,0x7fffffffbf60)
    939 zpool    RET   ioctl 0
    939 zpool    CALL  stat(0x802c380e0,0x7fffffffbc58)
    939 zpool    NAMI  "/tmp"
    939 zpool    STRU  struct stat {dev=273, ino=2, mode=041777, nlink=8, uid=0, gid=0, rdev=96, atime=1412866648, stime=1412871393, ctime=1412871393, birthtime=1412866648, size=512, blksize=32768, blocks=8, flags=0x0 }
    939 zpool    RET   stat 0
    939 zpool    CALL  ioctl(0x3,0xc0185a02,0x7fffffffbc60)
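
For what it's worth, the ioctl request codes can be decoded by hand;
the low byte is the command index within the 'Z' (0x5a) group:

# echo $(( 0xc0185a02 & 0xff ))                   # command index -> 2
# printf '%x\n' $(( (0xc0185a02 >> 8) & 0xff ))   # group -> 5a, i.e. 'Z'

If I read the zfs_ioc_t enumeration correctly (my reading of the
source, not verified), index 5 is ZFS_IOC_POOL_STATS, 6 is
ZFS_IOC_POOL_TRYIMPORT, and 2 is ZFS_IOC_POOL_IMPORT - so the trace
ends inside the actual import ioctl, consistent with the kernel stack
shown by procstat above.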


Mark



