Date: Tue, 8 Jan 2013 18:42:25 +0100
From: Nicolas Rachinsky
To: freebsd-fs@FreeBSD.org
Subject: slowdown of zfs (tx->tx)
Message-ID: <20130108174225.GA17260@mid.pc5.i.0x5.de>

Hello,

we have a problem that recently started on one of our backup servers. We
noticed that backups took an absurd amount of time (we aborted them after
several hours, when the same backup usually takes minutes). We first
suspected a broken disk and removed it, but that didn't change anything.

About one third of the rsync invocations end up in a state where top shows
mostly tx->tx as the state. It seems that other rsync instances that run at
the same time, or are started while one rsync is in this state, also get
into this state. These rsyncs can be killed, but it takes a while (several
seconds or tens of seconds).
Repeating the same rsync invocation afterwards works (sometimes). There is
almost no disk activity during this time.

What can I do to debug or avoid this?

Some information:

The backups are taken with rsync (with --fake-super, but using a patched
version that does not read extended attributes, since writing them every
time seems to be faster than reading them). sync is disabled for the whole
pool.

root uses UFS and is on another set of disks (together with swap).

zpool status
  pool: pool1
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: scrub canceled on Fri Jan 4 10:31:35 2013
config:

        NAME                      STATE     READ WRITE CKSUM
        pool1                     DEGRADED     0     0     0
          raidz2-0                DEGRADED     0     0     0
            ada5                  ONLINE       0     0     0
            ada8                  ONLINE       0     0     0
            ada2                  ONLINE       0     0     0
            ada3                  ONLINE       0     0     0
            11846390416703086268  UNAVAIL      0     0     0  was /dev/dsk/ada1
            ada6                  ONLINE       0     0     0
            ada0                  ONLINE       0     0     1
            ada7                  ONLINE       0     0     0
            ada4                  ONLINE       0     0     3

errors: No known data errors

The system runs 8.3-RELEASE-p5 with
http://svnweb.freebsd.org/base?view=revision&revision=240345
and
http://svnweb.freebsd.org/base?view=revision&revision=240632
applied, on amd64 with 8G of RAM.

Thanks in advance
Nicolas
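
As a rough aid for reading the zpool status listing above, here is a minimal
Python sketch that picks out the rows that are not ONLINE or that carry
non-zero READ/WRITE/CKSUM counters. The names SAMPLE and suspect_devices are
made up for illustration, and the sketch assumes the counters are plain
integers (zpool may abbreviate large values, which this does not handle).

```python
# Hypothetical helper: flag rows of a `zpool status` config table that are
# not ONLINE or have non-zero error counters. SAMPLE is the table from the
# message above; counters are assumed to be plain integers.

SAMPLE = """\
NAME                      STATE     READ WRITE CKSUM
pool1                     DEGRADED     0     0     0
  raidz2-0                DEGRADED     0     0     0
    ada5                  ONLINE       0     0     0
    ada8                  ONLINE       0     0     0
    ada2                  ONLINE       0     0     0
    ada3                  ONLINE       0     0     0
    11846390416703086268  UNAVAIL      0     0     0  was /dev/dsk/ada1
    ada6                  ONLINE       0     0     0
    ada0                  ONLINE       0     0     1
    ada7                  ONLINE       0     0     0
    ada4                  ONLINE       0     0     3
"""


def suspect_devices(config_text):
    """Return (name, state, read, write, cksum) for every suspicious row."""
    suspects = []
    for line in config_text.splitlines():
        fields = line.split()
        # A data row has at least NAME STATE READ WRITE CKSUM.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, state = fields[0], fields[1]
        counters = fields[2:5]
        # Skip rows whose counters are not plain integers (e.g. "1.2K").
        if not all(c.isdigit() for c in counters):
            continue
        read, write, cksum = (int(c) for c in counters)
        if state != "ONLINE" or read or write or cksum:
            suspects.append((name, state, read, write, cksum))
    return suspects


if __name__ == "__main__":
    for row in suspect_devices(SAMPLE):
        print("%-22s %-9s read=%d write=%d cksum=%d" % row)
```

For the listing above this reports the degraded pool and raidz2 vdev, the
UNAVAIL device, and the two disks with checksum errors (ada0 and ada4).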