From owner-freebsd-fs@FreeBSD.ORG  Tue Jan  8 17:51:59 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 24298781
 for <freebsd-fs@FreeBSD.org>; Tue,  8 Jan 2013 17:51:59 +0000 (UTC)
 (envelope-from nicolas@i.0x5.de)
Received: from n.0x5.de (n.0x5.de [217.197.85.144])
 by mx1.freebsd.org (Postfix) with ESMTP id DB510FF1
 for <freebsd-fs@FreeBSD.org>; Tue,  8 Jan 2013 17:51:58 +0000 (UTC)
Received: by pc5.i.0x5.de (Postfix, from userid 1003)
 id 3Ygglj6YxDz7ySH; Tue,  8 Jan 2013 18:42:25 +0100 (CET)
Date: Tue, 8 Jan 2013 18:42:25 +0100
From: Nicolas Rachinsky <fbsd-mas-0@ml.turing-complete.org>
To: freebsd-fs@FreeBSD.org
Subject: slowdown of zfs (tx->tx)
Message-ID: <20130108174225.GA17260@mid.pc5.i.0x5.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Powered-by: FreeBSD
X-Homepage: http://www.rachinsky.de
X-PGP-Keyid: 887BAE72
X-PGP-Fingerprint: 039E 9433 115F BC5F F88D  4524 5092 45C4 887B AE72
X-PGP-Keys: http://www.rachinsky.de/nicolas/gpg/nicolas_rachinsky.asc
User-Agent: Mutt/1.5.21 (2010-09-15)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Jan 2013 17:51:59 -0000

Hello,

we have a problem that recently started on one of our backup servers.

We noticed that backups took an absurd amount of time (we aborted them
after several hours, while the same backup usually takes minutes). We
first suspected a broken disk and removed it, but that didn't change
anything.

About one third of the rsync invocations end up in a state where top
mostly shows tx->tx as the process state. Other rsync instances that
are running at the same time, or that are started while one rsync is
in this state, seem to get into this state as well.

These rsyncs can be killed, but it takes a while (several seconds
or tens of seconds).

Repeating the same rsync invocation afterwards sometimes works.

There is almost no disk activity during this time.

What can I do to debug or avoid this?


Some information:

The backups are made with rsync and --fake-super, using a patched
version that does not read extended attributes, since always writing
them seems to be faster than reading them first.

sync is disabled for the whole pool.

root uses UFS and is on another set of disks (together with swap).


zpool status
  pool: pool1
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: scrub canceled on Fri Jan  4 10:31:35 2013
config:

        NAME                      STATE     READ WRITE CKSUM
        pool1                     DEGRADED     0     0     0
          raidz2-0                DEGRADED     0     0     0
            ada5                  ONLINE       0     0     0
            ada8                  ONLINE       0     0     0
            ada2                  ONLINE       0     0     0
            ada3                  ONLINE       0     0     0
            11846390416703086268  UNAVAIL      0     0     0  was /dev/dsk/ada1
            ada6                  ONLINE       0     0     0
            ada0                  ONLINE       0     0     1
            ada7                  ONLINE       0     0     0
            ada4                  ONLINE       0     0     3

errors: No known data errors
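
For reference, a minimal sketch of how the devices that stand out in
this output could be filtered automatically. The function name
zpool_problem_devs is made up for illustration, and the sketch assumes
the column layout shown above (NAME STATE READ WRITE CKSUM):

```shell
# Hypothetical helper (not part of ZFS): read `zpool status` output on
# stdin and print every config row that is not ONLINE or that has a
# non-zero READ, WRITE, or CKSUM counter.
zpool_problem_devs() {
    awk '
        # The header row marks the start of the device table.
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }
        # The "errors:" line marks the end of the device table.
        in_cfg && /^errors:/          { in_cfg = 0 }
        # Rows have at least five fields: name, state, three counters.
        in_cfg && NF >= 5 {
            if ($2 != "ONLINE" || $3 + $4 + $5 > 0)
                print $1, $2, $3, $4, $5
        }
    '
}
```

It would be used as `zpool status pool1 | zpool_problem_devs`. Rows for
the pool and the raidz2 vdev are included as well, since they are
DEGRADED.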


The system runs FreeBSD 8.3-RELEASE-p5
with
http://svnweb.freebsd.org/base?view=revision&revision=240345
and
http://svnweb.freebsd.org/base?view=revision&revision=240632
applied

The machine is amd64 with 8 GB of RAM.


Thanks in advance

Nicolas