From owner-freebsd-fs@FreeBSD.ORG Sun Jul 14 07:55:15 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 92F6C112 for ; Sun, 14 Jul 2013 07:55:15 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: from mail-vc0-x231.google.com (mail-vc0-x231.google.com [IPv6:2607:f8b0:400c:c03::231]) by mx1.freebsd.org (Postfix) with ESMTP id 5ACBB8A4 for ; Sun, 14 Jul 2013 07:55:15 +0000 (UTC) Received: by mail-vc0-f177.google.com with SMTP id hv10so8526306vcb.22 for ; Sun, 14 Jul 2013 00:55:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=du1WtNY51RXPEOaCCtF1o5QrMpg3h3hi0MF/r3rLfxY=; b=0YFtDsoq/PJfNm6CwLPoz5QUGH5VdV2c2l6//hFa3vpjxBcQUYT/1IAKbsx8Zq+VpQ m6DOtW8HUhRiaD6ocOkT2lVSbnNCWOweNV8vt+O6UTy9qyYM5QsQZFFTjLvbrlfbdaGx w20kZlmv+epZsiUk62BR73mOXLM9f7eJJMrlf1Wxikg/0psRBnM1ML2gJnNC178SCO3K FWJxh0leYIprnKXZQ0hNNS7XpjTDzcXDiAQNEddG2oO+T8IdjI/iO1LSiBPztXBVRZCE /zmTaLz4oSfthhhie0nY+vnwiQVnraixhpl6DTfVvijG92vxXyjNLx/xa6SfnB9BGTzD 4ZrQ== MIME-Version: 1.0 X-Received: by 10.220.168.141 with SMTP id u13mr26953401vcy.23.1373788514638; Sun, 14 Jul 2013 00:55:14 -0700 (PDT) Received: by 10.221.22.199 with HTTP; Sun, 14 Jul 2013 00:55:14 -0700 (PDT) Date: Sun, 14 Jul 2013 03:55:14 -0400 Message-ID: Subject: Efficiency of ZFS ZVOLs. From: Zaphod Beeblebrox To: freebsd-fs Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Jul 2013 07:55:15 -0000 I have a ZFS pool that consists of 9 1.5T drives (Z1) and 8 2T drives (Z1). I know this is not exactly recommended, but this is more a home machine that provides some backup space rather than a production machine --- and thus it gets what it gets. Anyways... a typical filesystem looks like: [1:7:307]root@virtual:~> zfs list vr2/tmp NAME USED AVAIL REFER MOUNTPOINT vr2/tmp 74.3G 7.31T 74.3G /vr2/tmp ... that is "tmp" uses 74.3G and the whole mess has 7.31T available. If tmp had children, "USED" could be larger than "REFER" because the children account for the rest Now... consider: [1:3:303]root@virtual:~> zfs list -rt all vr2/Steam NAME USED AVAIL REFER MOUNTPOINT vr2/Steam 3.25T 9.27T 1.18T - vr2/Steam@20130528-0029 255M - 1.18T - vr2/Steam@20130529-0221 172M - 1.18T - vr2/Steam is a ZVOL exported by iSCSI to my desktop and it contains an NTFS filesystem which is mounted into C:\Program Files (x86)\Steam. Windows sees this drive as a 1.99T drive of which 1.02T is used. Now... the value of "REFER" seems quite right: 1.18T vs. 1.02T is pretty good... but the value of "USED" seems _way_ out. 3.25T ... even regarding that more of the disk might have been "touched" (ie: used from the ZVOL's impression) than is used, it seems too large. Neither is it 1.18T + 255M + 172M. Now... I understand that the smallest effective "block" is 7x512 or 8x512 (depending on which part of the disk is in play) --- but does that really account for it? A quick google check says that NTFS uses a default cluster of 4096 (or larger). Is there a fundamental inefficiency in the way ZVOLs are stored on wide (or wide-ish) RAID stripes? 
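[A hedged aside, not part of the original message: before blaming RAID-Z padding, ZFS's own space breakdown will show whether the extra USED is the volume itself, the snapshots, or a reservation, and it is the zvol's volblocksize (not the 4096-byte NTFS cluster) that matters for any parity/padding arithmetic. Against the dataset name above:

$ zfs list -o space vr2/Steam
$ zfs get volsize,volblocksize,refreservation vr2/Steam

The USEDREFRESERV column from the first command is space that is merely reserved, not actually written.]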
From owner-freebsd-fs@FreeBSD.ORG Sun Jul 14 08:50:20 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 958D76E3 for ; Sun, 14 Jul 2013 08:50:20 +0000 (UTC) (envelope-from nowakpl@platinum.linux.pl) Received: from platinum.linux.pl (platinum.edu.pl [81.161.192.4]) by mx1.freebsd.org (Postfix) with ESMTP id 5B2799BF for ; Sun, 14 Jul 2013 08:50:18 +0000 (UTC) Received: by platinum.linux.pl (Postfix, from userid 87) id 41DEA2BC19D; Sun, 14 Jul 2013 10:50:11 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on platinum.linux.pl X-Spam-Level: X-Spam-Status: No, score=-1.3 required=3.0 tests=ALL_TRUSTED,AWL autolearn=disabled version=3.3.2 Received: from [10.255.0.2] (unknown [83.151.38.73]) by platinum.linux.pl (Postfix) with ESMTPA id 0F1802BC196 for ; Sun, 14 Jul 2013 10:50:11 +0200 (CEST) Message-ID: <51E26632.8030907@platinum.linux.pl> Date: Sun, 14 Jul 2013 10:49:54 +0200 From: Adam Nowacki User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130620 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Efficiency of ZFS ZVOLs. References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Jul 2013 08:50:20 -0000 On 2013-07-14 09:55, Zaphod Beeblebrox wrote: > [1:3:303]root@virtual:~> zfs list -rt all vr2/Steam > NAME USED AVAIL REFER MOUNTPOINT > vr2/Steam 3.25T 9.27T 1.18T - > vr2/Steam@20130528-0029 255M - 1.18T - > vr2/Steam@20130529-0221 172M - 1.18T - > > vr2/Steam is a ZVOL exported by iSCSI to my desktop and it contains an NTFS > filesystem which is mounted into C:\Program Files (x86)\Steam. Windows > sees this drive as a 1.99T drive of which 1.02T is used. > > Now... the value of "REFER" seems quite right: 1.18T vs. 1.02T is pretty > good... but the value of "USED" seems _way_ out. 3.25T ... even regarding > that more of the disk might have been "touched" (ie: used from the ZVOL's > impression) than is used, it seems too large. Neither is it 1.18T + 255M + > 172M. This is how much space would be required to store the snapshots plus 2TB volume with no shared blocks between any of the snapshots. 1.18T from snapshots + 2T reservation = 3.18T, just about the 3.25T displayed. You can remove the reservation with 'zfs set refreservation=none vr2/Steam'. 
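[A sketch of Adam's suggestion in full; the property check on the first line is my addition, the set command is the one quoted above. Note that without a refreservation the zvol becomes thin-provisioned, so writes to it can fail with "out of space" if the pool ever fills up.

$ zfs get used,usedbydataset,usedbysnapshots,usedbyrefreservation vr2/Steam
$ zfs set refreservation=none vr2/Steam
$ zfs list vr2/Steam      (USED should now drop to roughly REFER plus the snapshot space)]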
From owner-freebsd-fs@FreeBSD.ORG Mon Jul 15 09:51:11 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id EA5EB70; Mon, 15 Jul 2013 09:51:11 +0000 (UTC) (envelope-from godders@gmail.com) Received: from mail-qc0-x22d.google.com (mail-qc0-x22d.google.com [IPv6:2607:f8b0:400d:c01::22d]) by mx1.freebsd.org (Postfix) with ESMTP id A2C1BC24; Mon, 15 Jul 2013 09:51:11 +0000 (UTC) Received: by mail-qc0-f173.google.com with SMTP id l10so6170420qcy.18 for ; Mon, 15 Jul 2013 02:51:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=bqPWOLlgO2DSqvvPF54slTA6ermS9eEi494GtoekWZk=; b=TXKAifsNQUfyW+hlWGlP0+FtnICKhuLSK4B28smJzaQOddkP0gf1Iy3EmBu1qKmXn7 xaKUgG50YbHv8Vx8Fd2c5AlL/srrlVBBGrPyy74NE9UYjI7wo6JIEl9FotSVIKimo96U tbwMXK5OecU8MqxPs+uiZ14J9y21CXUk6vmIZIX/j2ybZi67TSK22r4a50ARQbbPo5uS VNIDaUadTBJgwMbOz4m4obnVuDccs7irErHt3VOp2wmJUbBug8scRm5Qw0r9CdWv1Cvz qHMELBusYb4x/+AC0cF5ZkQ4EGz3YMMT3OUjzhNHT7hOZNferAtOS7S7dpWehROzKWvH 7p3w== MIME-Version: 1.0 X-Received: by 10.49.85.4 with SMTP id d4mr50212452qez.10.1373881870442; Mon, 15 Jul 2013 02:51:10 -0700 (PDT) Received: by 10.49.52.65 with HTTP; Mon, 15 Jul 2013 02:51:10 -0700 (PDT) In-Reply-To: <201306110017.r5B0HFct074482@chez.mckusick.com> References: <51B5A277.2060904@FreeBSD.org> <201306110017.r5B0HFct074482@chez.mckusick.com> Date: Mon, 15 Jul 2013 10:51:10 +0100 Message-ID: Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?) From: Dan Thomas To: Kirk McKusick Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-fs@freebsd.org, Palle Girgensohn , Jeff Roberson , Julian Akehurst X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Jul 2013 09:51:12 -0000 On 11 June 2013 01:17, Kirk McKusick wrote: > OK, good to have it narrowed down. I will look to devise some > additional diagnostics that hopefully will help tease out the > bug. I'll hopefully get back to you soon. Hi, Is there any news on this issue? We're still running several servers that are exhibiting this problem (most recently, one that seems to be leaking around 10gb/hour), and it's getting to the point where we're looking at moving to a different OS until it's resolved. We have access to several production systems with this problem and (at least from time to time) will have systems with a significant leak on them that we can experiment with. Is there any way we can assist with tracking this down? Any diagnostics or testing that would be useful? 
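[An aside, not something asked for in the thread: a rough way to put a number on the leak while waiting for proper diagnostics is the gap between df and du on the affected filesystem, since space held by unreferenced inodes is counted by the former but is unreachable by the latter. The path below is only a placeholder.

$ df -k /path/to/affected/fs
$ du -skx /path/to/affected/fs]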
Thanks, Dan From owner-freebsd-fs@FreeBSD.ORG Mon Jul 15 10:44:33 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 83230AA8 for ; Mon, 15 Jul 2013 10:44:33 +0000 (UTC) (envelope-from thomas@gibfest.dk) Received: from mail.tyknet.dk (mail.tyknet.dk [176.9.9.186]) by mx1.freebsd.org (Postfix) with ESMTP id 45C6EE71 for ; Mon, 15 Jul 2013 10:44:33 +0000 (UTC) Received: from [10.10.1.214] (217.71.4.82.static.router4.bolignet.dk [217.71.4.82]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.tyknet.dk (Postfix) with ESMTPSA id 60B0B15FA0F for ; Mon, 15 Jul 2013 12:36:59 +0200 (CEST) X-DKIM: OpenDKIM Filter v2.5.2 mail.tyknet.dk 60B0B15FA0F DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=gibfest.dk; s=default; t=1373884619; bh=jd3CNUoCqlon3YHs4YejpnVqhZlmmBrQMGF0O04lUUI=; h=Date:From:To:Subject:References:In-Reply-To; b=Hkelfq0gPNbNKFCbY99cOT570mEqswhubpijN8wn2VjFlCnnUp7yaEC9Cp70roQMn zF9Oeid6CKIPa9Orz8AKuVlWywlbdjcyKclUGS1CYAIFhdoXpz9e7kQiqssZRJ8qQU on/0A7rR3buMMayTRTyCiZ81LyWaeEPd1iofyeG8= Message-ID: <51E3D0C3.9020205@gibfest.dk> Date: Mon, 15 Jul 2013 12:36:51 +0200 From: Thomas Steen Rasmussen User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130620 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Reproducible ZFS jailed dataset panic after upgrading to latest 9-stable References: <51C97EAF.3000901@gibfest.dk> In-Reply-To: <51C97EAF.3000901@gibfest.dk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Jul 2013 10:44:33 -0000 On 25-06-2013 13:27, Thomas Steen Rasmussen wrote: > Hello, > > To fix the mmap vulnerability I've upgraded one of my jail hosts from: > "FreeBSD 9.1-STABLE #1: Sun Mar 17 08:48:35 UTC 2013" > to: > "FreeBSD 9.1-STABLE #3: Tue Jun 18 12:49:39 UTC 2013" > > One of the jails on this machine has a jailed zfs dataset: > > $ zfs get jailed gelipool/backups > NAME PROPERTY VALUE SOURCE > gelipool/backups jailed on local > $ > > After the upgrade, when I start the jail, the machine panics. > > This is a remote zfs-only machine with swap on zfs, so far I have > been unable to get a proper coredump. I have access to the > console of the machine, and I have taken a couple of screenshots: > > http://imgur.com/2V0PBlf and http://imgur.com/OopP9Sp > > Any ideas what might have caused this ? It worked great before the > upgrade to latest 9-STABLE. This is a production server, but I am > willing to try any suggestions to get it working again. > Hello all, I just wanted to confirm that since the MFC in r252524 this has been fixed in stable/9: http://svnweb.freebsd.org/base?view=revision&revision=252524 Thanks! 
Thomas Steen Rasmussen From owner-freebsd-fs@FreeBSD.ORG Mon Jul 15 11:06:42 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D7A16F41 for ; Mon, 15 Jul 2013 11:06:42 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id CAD8DFBB for ; Mon, 15 Jul 2013 11:06:42 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r6FB6gkf084410 for ; Mon, 15 Jul 2013 11:06:42 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r6FB6gfT084408 for freebsd-fs@FreeBSD.org; Mon, 15 Jul 2013 11:06:42 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 15 Jul 2013 11:06:42 GMT Message-Id: <201307151106.r6FB6gfT084408@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Jul 2013 11:06:43 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/180438 fs [smbfs] [patch] mount_smbfs fails on arm because of wr p kern/180236 fs [zfs] [nullfs] Leakage free space using ZFS with nullf o kern/178854 fs [ufs] FreeBSD kernel crash in UFS o kern/178713 fs [nfs] [patch] Correct WebNFS support in NFS server and o kern/178412 fs [smbfs] Coredump when smbfs mounted o kern/178388 fs [zfs] [patch] allow up to 8MB recordsize o kern/178349 fs [zfs] zfs scrub on deduped data could be much less see o kern/178329 fs [zfs] extended attributes leak o kern/178238 fs [nullfs] nullfs don't release i-nodes on unlink. 
f kern/178231 fs [nfs] 8.3 nfsv4 client reports "nfsv4 client/server pr o kern/178103 fs [kernel] [nfs] [patch] Correct support of index files o kern/177985 fs [zfs] disk usage problem when copying from one zfs dat o kern/177971 fs [nfs] FreeBSD 9.1 nfs client dirlist problem w/ nfsv3, o kern/177966 fs [zfs] resilver completes but subsequent scrub reports o kern/177658 fs [ufs] FreeBSD panics after get full filesystem with uf o kern/177536 fs [zfs] zfs livelock (deadlock) with high write-to-disk o kern/177445 fs [hast] HAST panic o kern/177240 fs [zfs] zpool import failed with state UNAVAIL but all d o kern/176978 fs [zfs] [panic] zfs send -D causes "panic: System call i o kern/176857 fs [softupdates] [panic] 9.1-RELEASE/amd64/GENERIC panic o bin/176253 fs zpool(8): zfs pool indentation is misleading/wrong o kern/176141 fs [zfs] sharesmb=on makes errors for sharenfs, and still o kern/175950 fs [zfs] Possible deadlock in zfs after long uptime o kern/175897 fs [zfs] operations on readonly zpool hang o kern/175449 fs [unionfs] unionfs and devfs misbehaviour o kern/175179 fs [zfs] ZFS may attach wrong device on move o kern/175071 fs [ufs] [panic] softdep_deallocate_dependencies: unrecov o kern/174372 fs [zfs] Pagefault appears to be related to ZFS o kern/174315 fs [zfs] chflags uchg not supported o kern/174310 fs [zfs] root point mounting broken on CURRENT with multi o kern/174279 fs [ufs] UFS2-SU+J journal and filesystem corruption o kern/173830 fs [zfs] Brain-dead simple change to ZFS error descriptio o kern/173718 fs [zfs] phantom directory in zraid2 pool f kern/173657 fs [nfs] strange UID map with nfsuserd o kern/173363 fs [zfs] [panic] Panic on 'zpool replace' on readonly poo o kern/173136 fs [unionfs] mounting above the NFS read-only share panic o kern/172942 fs [smbfs] Unmounting a smb mount when the server became o kern/172348 fs [unionfs] umount -f of filesystem in use with readonly o kern/172334 fs [unionfs] unionfs permits recursive union mounts; caus o kern/171626 fs [tmpfs] tmpfs should be noisier when the requested siz o kern/171415 fs [zfs] zfs recv fails with "cannot receive incremental o kern/170945 fs [gpt] disk layout not portable between direct connect o bin/170778 fs [zfs] [panic] FreeBSD panics randomly o kern/170680 fs [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA o kern/170497 fs [xfs][panic] kernel will panic whenever I ls a mounted o kern/169945 fs [zfs] [panic] Kernel panic while importing zpool (afte o kern/169480 fs [zfs] ZFS stalls on heavy I/O o kern/169398 fs [zfs] Can't remove file with permanent error o kern/169339 fs panic while " : > /etc/123" o kern/169319 fs [zfs] zfs resilver can't complete o kern/168947 fs [nfs] [zfs] .zfs/snapshot directory is messed up when o kern/168942 fs [nfs] [hang] nfsd hangs after being restarted (not -HU o kern/168158 fs [zfs] incorrect parsing of sharenfs options in zfs (fs o kern/167979 fs [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste o kern/167977 fs [smbfs] mount_smbfs results are differ when utf-8 or U o kern/167688 fs [fusefs] Incorrect signal handling with direct_io o kern/167685 fs [zfs] ZFS on USB drive prevents shutdown / reboot o kern/167612 fs [portalfs] The portal file system gets stuck inside po o kern/167272 fs [zfs] ZFS Disks reordering causes ZFS to pick the wron o kern/167260 fs [msdosfs] msdosfs disk was mounted the second time whe o kern/167109 fs [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene o kern/167105 fs [nfs] mount_nfs can not handle source exports wiht mor o kern/167067 fs 
[zfs] [panic] ZFS panics the server o kern/167065 fs [zfs] boot fails when a spare is the boot disk o kern/167048 fs [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF o kern/166912 fs [ufs] [panic] Panic after converting Softupdates to jo o kern/166851 fs [zfs] [hang] Copying directory from the mounted UFS di o kern/166477 fs [nfs] NFS data corruption. o kern/165950 fs [ffs] SU+J and fsck problem o kern/165521 fs [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31 o kern/165392 fs Multiple mkdir/rmdir fails with errno 31 o kern/165087 fs [unionfs] lock violation in unionfs o kern/164472 fs [ufs] fsck -B panics on particular data inconsistency o kern/164370 fs [zfs] zfs destroy for snapshot fails on i386 and sparc o kern/164261 fs [nullfs] [patch] fix panic with NFS served from NULLFS o kern/164256 fs [zfs] device entry for volume is not created after zfs o kern/164184 fs [ufs] [panic] Kernel panic with ufs_makeinode o kern/163801 fs [md] [request] allow mfsBSD legacy installed in 'swap' o kern/163770 fs [zfs] [hang] LOR between zfs&syncer + vnlru leading to o kern/163501 fs [nfs] NFS exporting a dir and a subdir in that dir to o kern/162944 fs [coda] Coda file system module looks broken in 9.0 o kern/162860 fs [zfs] Cannot share ZFS filesystem to hosts with a hyph o kern/162751 fs [zfs] [panic] kernel panics during file operations o kern/162591 fs [nullfs] cross-filesystem nullfs does not work as expe o kern/162519 fs [zfs] "zpool import" relies on buggy realpath() behavi o kern/161968 fs [zfs] [hang] renaming snapshot with -r including a zvo o kern/161864 fs [ufs] removing journaling from UFS partition fails on o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is o kern/161533 fs [zfs] [panic] zfs receive panic: system ioctl returnin o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_ o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou o kern/161280 fs [zfs] Stack overflow in gptzfsboot o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3 o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic f kern/160860 fs [ufs] Random UFS root filesystem corruption with SU+J o kern/160801 fs [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha o kern/159930 fs [ufs] [panic] kernel core o kern/159402 fs [zfs][loader] symlinks cause I/O errors o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by- o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs() o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option o kern/159077 fs [zfs] Can't cd .. with latest zfs version o kern/159048 fs [smbfs] smb mount corrupts large files o kern/159045 fs [zfs] [hang] ZFS scrub freezes system o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk o kern/158802 fs amd(8) ICMP storm and unkillable process. 
o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o f kern/157929 fs [nfs] NFS slow read o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and o kern/156781 fs [zfs] zfs is losing the snapshot directory, p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs p kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 p kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w o bin/153142 fs [zfs] ls -l outputs `ls: ./.zfs: Operation not support o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris installa o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " o 
kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server o kern/145750 fs [unionfs] [hang] unionfs locks the machine s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an f bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141950 fs [unionfs] [lor] ufs/unionfs/ufs Lock order reversal o kern/141897 fs [msdosfs] [panic] Kernel panic. 
msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/137588 fs [unionfs] [lor] LOR nfs/ufs/nfs o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis p kern/133174 fs [msdosfs] [patch] msdosfs must support multibyte inter o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126973 fs [unionfs] [hang] System hang with unionfs and init chr o kern/126553 fs [unionfs] unionfs move directory problem 2 (files appe o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files o bin/123574 fs [unionfs] df(1) -t option destroys info for unionfs (a o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o 
bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o kern/121385 fs [unionfs] unionfs cross mount -> kernel panic o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o kern/118318 fs [nfs] NFS server hangs under special circumstances o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System 
reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/67326 fs [msdosfs] crash after attempt to mount write protected o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t o kern/9619 fs [nfs] Restarting mountd kills existing mounts 326 problems total. From owner-freebsd-fs@FreeBSD.ORG Mon Jul 15 19:32:34 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 33361F8D; Mon, 15 Jul 2013 19:32:34 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [IPv6:2001:5a8:4:7e72:4a5b:39ff:fe12:452]) by mx1.freebsd.org (Postfix) with ESMTP id 0B13F303; Mon, 15 Jul 2013 19:32:33 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id r6FJWSxM087108; Mon, 15 Jul 2013 12:32:28 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201307151932.r6FJWSxM087108@chez.mckusick.com> To: Dan Thomas Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?) In-reply-to: Date: Mon, 15 Jul 2013 12:32:28 -0700 From: Kirk McKusick Cc: freebsd-fs@freebsd.org, Palle Girgensohn , Jeff Roberson , Julian Akehurst X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Jul 2013 19:32:34 -0000 > Date: Mon, 15 Jul 2013 10:51:10 +0100 > Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?) > From: Dan Thomas > To: Kirk McKusick > Cc: Palle Girgensohn , freebsd-fs@freebsd.org, > Jeff Roberson , > Julian Akehurst > X-ASK-Info: Message Queued (2013/07/15 02:51:22) > X-ASK-Info: Confirmed by User (2013/07/15 02:55:04) > > On 11 June 2013 01:17, Kirk McKusick wrote: > > OK, good to have it narrowed down. I will look to devise some > > additional diagnostics that hopefully will help tease out the > > bug. I'll hopefully get back to you soon. > > Hi, > > Is there any news on this issue? We're still running several servers > that are exhibiting this problem (most recently, one that seems to be > leaking around 10gb/hour), and it's getting to the point where we're > looking at moving to a different OS until it's resolved. 
>
> We have access to several production systems with this problem and (at
> least from time to time) will have systems with a significant leak on
> them that we can experiment with. Is there any way we can assist with
> tracking this down? Any diagnostics or testing that would be useful?
>
> Thanks,
> Dan

Hi Dan (and Palle),

Sorry for the long delay with no help / news. I have gotten side-tracked on several projects and have had little time to try and devise some tests that would help find the cause of the lost space. It almost certainly is a one-line fix (a missing vput or vrele, probably in some error path), but finding where it goes is the hard part :-) I have had little success in inserting code that tracks reference counts (too many false positives). So, I am going to need some help from you to narrow it down.

My belief is that there is some set of filesystem operations (system calls) that is leading to the problem. Notably, a file is being created, data put into it, then the file is deleted (either before or after being closed). Somehow a reference to that file persists even though nothing valid refers to it any longer, so the filesystem thinks it is still live and does not delete it. When you do the forcible unmount, these files get cleared and the space shows back up.

What I need to devise is a small test program doing the set of system calls that cause this to happen. The way that I would like to get it is to have you `ktrace -i' your application and then run your application just long enough to create at least one of these lost files. The goal is to minimize the amount of ktrace data through which we need to sift.

In preparation for doing this test you need to have a kernel compiled with `options DIAGNOSTIC', or, if you prefer, just add `#define DIAGNOSTIC 1' to the top of sys/kern/vfs_subr.c. You will know you have at least one offending file when you try to unmount the affected filesystem and find it busy. Before doing the `umount -f', enable busy printing using `sysctl debug.busyprt=1'. Then capture the console output, which will show the details of all the vnodes that had to be forcibly flushed. Hopefully we will then be able to correlate them back to the files (NAMI in the ktrace output) with which they were associated. We may need to augment the NAMI data with the inode number of the associated file to make the association with the busyprt output.

Anyway, once we have that, we can look at all the system calls done on those files and create a small test program that exhibits the problem. Given a small test program, Jeff or I can track down the offending system call path and nail this pernicious bug once and for all.
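[Condensed into commands, the procedure above runs roughly as follows; this is only a sketch, and the application name, trace file and mount point are placeholders:

add `options DIAGNOSTIC' to the kernel config (or `#define DIAGNOSTIC 1' in sys/kern/vfs_subr.c), rebuild and reboot
$ ktrace -i -f /var/tmp/app.ktrace your_application      (run it until at least one file has leaked)
# sysctl debug.busyprt=1
# umount -f /the/affected/filesystem                     (the console now lists the busy vnodes)
$ kdump -f /var/tmp/app.ktrace | grep NAMI               (pathnames the traced processes touched, to correlate with the busyprt output)]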
Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Mon Jul 15 21:59:53 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 52805F8 for ; Mon, 15 Jul 2013 21:59:53 +0000 (UTC) (envelope-from rmh.aybabtu@gmail.com) Received: from mail-qa0-x22d.google.com (mail-qa0-x22d.google.com [IPv6:2607:f8b0:400d:c00::22d]) by mx1.freebsd.org (Postfix) with ESMTP id 9AB30D5B for ; Mon, 15 Jul 2013 21:59:52 +0000 (UTC) Received: by mail-qa0-f45.google.com with SMTP id ci6so1899976qab.4 for ; Mon, 15 Jul 2013 14:59:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=rQHm/TzTV4kzSn6ocp6i6epGPTb1pK1fOz+JbMzYE2w=; b=C88GYWK6XZaSVk4dvb1vJ/K96A/ThJebD5KQIW02e9jDGSWs5KboC5syor8vPad3EF XTS6KGSgAoWZszca59MpfhQUW46bQvgIFd8L57mpRlg0+F3QMbSrY/BWV3xgA73oNlhr jGZSvffAwTX14i4eILSp10hIlvqRXQ/TlGnc0AiPRH+i2axrWIvInBVvKfyFODwwGPHx E216VbTkrREf3NVtI50PgAJr1HhY41cTxsZ4y/E+IYq4ppzsaagX1QPn+gb/b1aVtFr+ uAm6A2nK4pD2BbPZspRn8xcSjo8gTuogFiWxu+UOKa7uO0XLhDMJx0NDIf2RwNLsszMp 6TVw== MIME-Version: 1.0 X-Received: by 10.49.24.52 with SMTP id r20mr52491305qef.54.1373925592152; Mon, 15 Jul 2013 14:59:52 -0700 (PDT) Sender: rmh.aybabtu@gmail.com Received: by 10.49.26.193 with HTTP; Mon, 15 Jul 2013 14:59:52 -0700 (PDT) In-Reply-To: <201307132225.r6DMPP7p002100@chez.mckusick.com> References: <201307132225.r6DMPP7p002100@chez.mckusick.com> Date: Mon, 15 Jul 2013 23:59:52 +0200 X-Google-Sender-Auth: Ume9qtTXxYp2-xTaegFff6uUa5E Message-ID: Subject: Re: Compatibility options for mount(8) From: Robert Millan To: Kirk McKusick Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Jul 2013 21:59:53 -0000 2013/7/14 Kirk McKusick : > OK to leave it. 
Committed then, thanks everyone :-) -- Robert Millan From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 11:41:44 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 33946236 for ; Tue, 16 Jul 2013 11:41:44 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id BF224F9C for ; Tue, 16 Jul 2013 11:41:43 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [193.68.6.1]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r6GBfVaG010630 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Tue, 16 Jul 2013 14:41:32 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <51E5316B.9070201@digsys.bg> Date: Tue, 16 Jul 2013 14:41:31 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs Subject: ZFS vdev I/O questions Content-Type: text/plain; charset=windows-1251; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2013 11:41:44 -0000 I am observing some "strange" behaviour with I/O spread on ZFS vdevs and thought I might ask if someone has observed it too. The system hardware is an Supermicro X8DTH-6F board with integrated LSI2008 controller, two Xeon E5620 CPUs and 72GB or RAM (6x4 + 6x8 GB modules). Runs 9-stable r252690. It has currently 18 drive zpool, split on three 6 drive raidz2 vdevs, plus ZIL and L2ARC on separate SSDs (240GB Intel 520). The ZIL consists of two partitions of the boot SSDs (Intel 320), not mirrored. 
The zpool layout is pool: storage state: ONLINE scan: scrub canceled on Thu Jul 11 17:14:50 2013 config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz2-0 ONLINE 0 0 0 gpt/disk00 ONLINE 0 0 0 gpt/disk01 ONLINE 0 0 0 gpt/disk02 ONLINE 0 0 0 gpt/disk03 ONLINE 0 0 0 gpt/disk04 ONLINE 0 0 0 gpt/disk05 ONLINE 0 0 0 raidz2-1 ONLINE 0 0 0 gpt/disk06 ONLINE 0 0 0 gpt/disk07 ONLINE 0 0 0 gpt/disk08 ONLINE 0 0 0 gpt/disk09 ONLINE 0 0 0 gpt/disk10 ONLINE 0 0 0 gpt/disk11 ONLINE 0 0 0 raidz2-2 ONLINE 0 0 0 gpt/disk12 ONLINE 0 0 0 gpt/disk13 ONLINE 0 0 0 gpt/disk14 ONLINE 0 0 0 gpt/disk15 ONLINE 0 0 0 gpt/disk16 ONLINE 0 0 0 gpt/disk17 ONLINE 0 0 0 logs ada0p2 ONLINE 0 0 0 ada1p2 ONLINE 0 0 0 cache da20p2 ONLINE 0 0 0 zdb output storage: version: 5000 name: 'storage' state: 0 txg: 5258772 pool_guid: 17094379857311239400 hostid: 3505628652 hostname: 'a1.register.bg' vdev_children: 5 vdev_tree: type: 'root' id: 0 guid: 17094379857311239400 children[0]: type: 'raidz' id: 0 guid: 2748500753748741494 nparity: 2 metaslab_array: 33 metaslab_shift: 37 ashift: 12 asize: 18003521961984 is_log: 0 create_txg: 4 children[0]: type: 'disk' id: 0 guid: 5074824874132816460 path: '/dev/gpt/disk00' phys_path: '/dev/gpt/disk00' whole_disk: 1 DTL: 378 create_txg: 4 children[1]: type: 'disk' id: 1 guid: 14410366944090513563 path: '/dev/gpt/disk01' phys_path: '/dev/gpt/disk01' whole_disk: 1 DTL: 53 create_txg: 4 children[2]: type: 'disk' id: 2 guid: 3526681390841761237 path: '/dev/gpt/disk02' phys_path: '/dev/gpt/disk02' whole_disk: 1 DTL: 52 create_txg: 4 children[3]: type: 'disk' id: 3 guid: 3773850995072323004 path: '/dev/gpt/disk03' phys_path: '/dev/gpt/disk03' whole_disk: 1 DTL: 51 create_txg: 4 children[4]: type: 'disk' id: 4 guid: 16528489666301728411 path: '/dev/gpt/disk04' phys_path: '/dev/gpt/disk04' whole_disk: 1 DTL: 50 create_txg: 4 children[5]: type: 'disk' id: 5 guid: 11222774817699257051 path: '/dev/gpt/disk05' phys_path: '/dev/gpt/disk05' whole_disk: 1 DTL: 44147 create_txg: 4 children[1]: type: 'raidz' id: 1 guid: 614220834244218709 nparity: 2 metaslab_array: 39 metaslab_shift: 37 ashift: 12 asize: 18003521961984 is_log: 0 create_txg: 40 children[0]: type: 'disk' id: 0 guid: 8076478524731550200 path: '/dev/gpt/disk06' phys_path: '/dev/gpt/disk06' whole_disk: 1 DTL: 2914 create_txg: 40 children[1]: type: 'disk' id: 1 guid: 1689851194543981566 path: '/dev/gpt/disk07' phys_path: '/dev/gpt/disk07' whole_disk: 1 DTL: 48 create_txg: 40 children[2]: type: 'disk' id: 2 guid: 9743236178648200269 path: '/dev/gpt/disk08' phys_path: '/dev/gpt/disk08' whole_disk: 1 DTL: 47 create_txg: 40 children[3]: type: 'disk' id: 3 guid: 10157617457760516410 path: '/dev/gpt/disk09' phys_path: '/dev/gpt/disk09' whole_disk: 1 DTL: 46 create_txg: 40 children[4]: type: 'disk' id: 4 guid: 5035981195206926078 path: '/dev/gpt/disk10' phys_path: '/dev/gpt/disk10' whole_disk: 1 DTL: 45 create_txg: 40 children[5]: type: 'disk' id: 5 guid: 4975835521778875251 path: '/dev/gpt/disk11' phys_path: '/dev/gpt/disk11' whole_disk: 1 DTL: 44149 create_txg: 40 children[2]: type: 'raidz' id: 2 guid: 7453512836015019221 nparity: 2 metaslab_array: 38974 metaslab_shift: 37 ashift: 12 asize: 18003521961984 is_log: 0 create_txg: 4455560 children[0]: type: 'disk' id: 0 guid: 11182458869377968267 path: '/dev/gpt/disk12' phys_path: '/dev/gpt/disk12' whole_disk: 1 DTL: 45059 create_txg: 4455560 children[1]: type: 'disk' id: 1 guid: 5844283175515272344 path: '/dev/gpt/disk13' phys_path: '/dev/gpt/disk13' whole_disk: 1 DTL: 44145 create_txg: 4455560 
children[2]: type: 'disk' id: 2 guid: 13095364699938843583 path: '/dev/gpt/disk14' phys_path: '/dev/gpt/disk14' whole_disk: 1 DTL: 44144 create_txg: 4455560 children[3]: type: 'disk' id: 3 guid: 5196507898996589388 path: '/dev/gpt/disk15' phys_path: '/dev/gpt/disk15' whole_disk: 1 DTL: 44143 create_txg: 4455560 children[4]: type: 'disk' id: 4 guid: 12809770017318709512 path: '/dev/gpt/disk16' phys_path: '/dev/gpt/disk16' whole_disk: 1 DTL: 44142 create_txg: 4455560 children[5]: type: 'disk' id: 5 guid: 7339883019925920701 path: '/dev/gpt/disk17' phys_path: '/dev/gpt/disk17' whole_disk: 1 DTL: 44141 create_txg: 4455560 children[3]: type: 'disk' id: 3 guid: 18011869864924559827 path: '/dev/ada0p2' phys_path: '/dev/ada0p2' whole_disk: 1 metaslab_array: 16675 metaslab_shift: 26 ashift: 12 asize: 8585216000 is_log: 1 DTL: 86787 create_txg: 5182360 children[4]: type: 'disk' id: 4 guid: 1338775535758010670 path: '/dev/ada1p2' phys_path: '/dev/ada1p2' whole_disk: 1 metaslab_array: 16693 metaslab_shift: 26 ashift: 12 asize: 8585216000 is_log: 1 DTL: 86788 create_txg: 5182377 features_for_read: Drives da0-da5 were Hitachi Deskstar 7K3000 (Hitachi HDS723030ALA640, firmware MKAOA3B0) -- these are 512 byte sector drives, but da0 has been replaced by Seagate Barracuda 7200.14 (AF) (ST3000DM001-1CH166, firmware CC24) -- this is an 4k sector drive of a new generation (notice the relatively 'old' firmware, that can't be upgraded). Drives da6-da17 are also Seagate Barracuda 7200.14 (AF) but (ST3000DM001-9YN166, firmware CC4H) -- the more "normal" part number. Some have firmware CC4C which I replace drive by drive (but other than the excessive load counts no other issues so far). The only ZFS related tuning is in /etc/sysctl.conf # improve ZFS resilver vfs.zfs.resilver_delay=0 vfs.zfs.scrub_delay=0 vfs.zfs.top_maxinflight=128 vfs.zfs.resilver_min_time_ms=5000 vfs.zfs.vdev.max_pending=24 # L2ARC: vfs.zfs.l2arc_norw=0 vfs.zfs.l2arc_write_max=83886080 vfs.zfs.l2arc_write_boost=83886080 The pool of course had dedup and had serious dedup ratios, like over 10x. In general, with the ZIL and L2ARC, the only trouble I have seen with dedup is when deleting lots of data... which this server has seen plenty of. During this experiment, I have moved most data to other server and un-dedup the last remaining TBs. While doing zfs destroy on an 2-3TB dataset, I observe very annoying behaviour. 
The pool would stay mostly idle, accepting almost no I/O and doing small random reads, like this $ zpool iostat storage 10 storage 45.3T 3.45T 466 0 1.82M 0 storage 45.3T 3.45T 50 0 203K 0 storage 45.3T 3.45T 45 25 183K 1.70M storage 45.3T 3.45T 49 0 199K 0 storage 45.3T 3.45T 50 0 202K 0 storage 45.3T 3.45T 51 0 204K 0 storage 45.3T 3.45T 57 0 230K 0 storage 45.3T 3.45T 65 0 260K 0 storage 45.3T 3.45T 68 25 274K 1.70M storage 45.3T 3.45T 65 0 260K 0 storage 45.3T 3.45T 64 0 260K 0 storage 45.3T 3.45T 67 0 272K 0 storage 45.3T 3.45T 66 0 266K 0 storage 45.3T 3.45T 64 0 258K 0 storage 45.3T 3.45T 62 25 250K 1.70M storage 45.3T 3.45T 57 0 231K 0 storage 45.3T 3.45T 58 0 235K 0 storage 45.3T 3.45T 66 0 267K 0 storage 45.3T 3.45T 64 0 257K 0 storage 45.3T 3.45T 60 0 241K 0 storage 45.3T 3.45T 50 0 203K 0 storage 45.3T 3.45T 52 25 209K 1.70M storage 45.3T 3.45T 54 0 217K 0 storage 45.3T 3.45T 51 0 205K 0 storage 45.3T 3.45T 54 0 216K 0 storage 45.3T 3.45T 55 0 222K 0 storage 45.3T 3.45T 56 0 226K 0 storage 45.3T 3.45T 65 0 264K 0 storage 45.3T 3.45T 71 0 286K 0 The write peaks are from processes syncing data to the pool - in this state it does not do reads (the data the sync process deals with is already in ARC). Then it goes into writing back to the pool (perhaps DDT metadata) storage 45.3T 3.45T 17 24.4K 69.6K 97.5M storage 45.3T 3.45T 0 19.6K 0 78.5M storage 45.3T 3.45T 0 14.2K 0 56.8M storage 45.3T 3.45T 0 7.90K 0 31.6M storage 45.3T 3.45T 0 7.81K 0 32.8M storage 45.3T 3.45T 0 9.54K 0 38.2M storage 45.3T 3.45T 0 7.07K 0 28.3M storage 45.3T 3.45T 0 7.70K 0 30.8M storage 45.3T 3.45T 0 6.19K 0 24.8M storage 45.3T 3.45T 0 5.45K 0 21.8M storage 45.3T 3.45T 0 5.78K 0 24.7M storage 45.3T 3.45T 0 5.29K 0 21.2M storage 45.3T 3.45T 0 5.69K 0 22.8M storage 45.3T 3.45T 0 5.52K 0 22.1M storage 45.3T 3.45T 0 3.26K 0 13.1M storage 45.3T 3.45T 0 1.77K 0 7.10M storage 45.3T 3.45T 0 1.63K 0 8.14M storage 45.3T 3.45T 0 1.41K 0 5.64M storage 45.3T 3.45T 0 1.22K 0 4.88M storage 45.3T 3.45T 0 1.27K 0 5.09M storage 45.3T 3.45T 0 1.06K 0 4.26M storage 45.3T 3.45T 0 1.07K 0 4.30M storage 45.3T 3.45T 0 979 0 3.83M storage 45.3T 3.45T 0 1002 0 3.91M storage 45.3T 3.45T 0 1010 0 3.95M storage 45.3T 3.45T 0 948 2.40K 3.71M storage 45.3T 3.45T 0 939 0 3.67M storage 45.3T 3.45T 0 1023 0 7.10M storage 45.3T 3.45T 0 1.01K 4.80K 4.04M storage 45.3T 3.45T 0 822 0 3.22M storage 45.3T 3.45T 0 434 0 1.70M storage 45.3T 3.45T 0 398 2.40K 1.56M For quite some time, there are no reads from the pool. 
When that happens, gstat (gstat -f 'da[0-9]*$') displays something like this: dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 24 1338 0 0 0.0 1338 12224 17.8 100.9| da0 24 6888 0 0 0.0 6888 60720 3.5 100.0| da1 24 6464 0 0 0.0 6464 71997 3.7 100.0| da2 24 6117 0 0 0.0 6117 82386 3.9 99.9| da3 24 6455 0 0 0.0 6455 66822 3.7 100.0| da4 24 6782 0 0 0.0 6782 69207 3.5 100.0| da5 24 698 0 0 0.0 698 27533 34.1 99.6| da6 24 590 0 0 0.0 590 21627 40.9 99.7| da7 24 561 0 0 0.0 561 21031 42.8 100.2| da8 24 724 0 0 0.0 724 25583 33.1 99.9| da9 24 567 0 0 0.0 567 22965 41.4 98.0| da10 24 566 0 0 0.0 566 21834 42.4 99.9| da11 24 586 0 0 0.0 586 4899 43.5 100.2| da12 24 487 0 0 0.0 487 4008 49.3 100.9| da13 24 628 0 0 0.0 628 5007 38.9 100.2| da14 24 714 0 0 0.0 714 5706 33.8 99.9| da15 24 595 0 0 0.0 595 4831 39.8 99.8| da16 24 485 0 0 0.0 485 3932 49.2 100.1| da17 0 0 0 0 0.0 0 0 0.0 0.0| da18 0 0 0 0 0.0 0 0 0.0 0.0| da19 0 0 0 0 0.0 0 0 0.0 0.0| da20 0 0 0 0 0.0 0 0 0.0 0.0| ada0 0 0 0 0 0.0 0 0 0.0 0.0| ada1 (drives da8 and 19 are spares, da20 is the L2ARC SSD drive, ada0 and ada0 are the boot SSDs in separate zpool) Now, here comes the weird part. the gpart display would show intensive writes to all vdevs (da0-da5, da6-da11,da12-da17) then one of the vdevs would complete writing, and stop writing, while other vdevs continue, at the end only one vdev writes until as it seems, data is completely written to all vdevs (this can be observed in the zfs iostat output above with the decreasing write IOPS each 10 seconds), then there is a few seconds "do nothing" period and then we are back to small reads. The other observation I have is with the first vdev: the 512b drives do a lot of I/O fast, complete first and then sit idle, while da0 continues to write for many more seconds. They consistently show many more IOPS than the other drives for this type of activity -- on streaming writes all drives behave more or less the same. It is only on this un-dedup scenario where the difference is so much pronounced. All the vdevs in the pool are with ashift=12 so the theory that ZFS actually issues 512b writes to these drives can't be true, can it? Another worry is this Seagate Barracuda 7200.14 (AF) (ST3000DM001-1CH166, firmware CC24) drive. It seems constantly under-performing. Does anyone know if it is so different from the ST3000DM001-9YN166 drives? Might be, I should just replace it? My concern is the bursty and irregular nature of writing to vdevs. As it is now, an write operation to the pool needs to wait for all of the vdev writes to complete which is this case takes tens of seconds. A single drive in an vdev that underperforms will slow down the entire pool. Perhaps ZFS could prioritize vdev usage based on the vdev troughput, similar to how it prioritizes writes based on how much it is full. Also, what is ZFS doing during the idle periods? Are there some timeouts involved? It is certainly not using any CPU... The small random I/O is certainly not loading the disks. Then, I have 240GB L2ARC and secondarycache=metadata for the pool. Yet, the DDT apparently does not want to go there... Is there a way to "force" it to be loaded to L2ARC? Before the last big delete, I had zdb -D storage DDT-sha256-zap-duplicate: 19907778 entries, size 1603 on disk, 259 in core DDT-sha256-zap-unique: 30101659 entries, size 1428 on disk, 230 in core dedup = 1.98, compress = 1.00, copies = 1.03, dedup * compress / copies = 1.92 With time, the in core values stay more or less the same. 
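[A back-of-the-envelope aside on those numbers, simple arithmetic rather than anything from the thread: with the per-entry sizes zdb reports, holding the whole DDT in core would take roughly 11 GiB of ARC, and the on-disk form is about 70 GiB, which a 240GB L2ARC could in principle hold comfortably.

$ echo '(19907778*259 + 30101659*230) / 2^30' | bc -l      (about 11.2 GiB in core)
$ echo '(19907778*1603 + 30101659*1428) / 2^30' | bc -l    (about 69.8 GiB on disk)
$ sysctl kstat.zfs.misc.arcstats | grep l2_                (l2_size and l2_hits show how much the L2ARC actually holds and whether it is being hit)]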
I also discovered, that the L2ARC drive apparently is not subject to TRIM for some reason. TRIM works on the boot drives, but these are connected to the motherboard SATA ports). # sysctl kern.cam.da.20 kern.cam.da.20.delete_method: ATA_TRIM kern.cam.da.20.minimum_cmd_size: 6 kern.cam.da.20.sort_io_queue: 0 kern.cam.da.20.error_inject: 0 # sysctl -a | grep trim vfs.zfs.vdev.trim_on_init: 1 vfs.zfs.vdev.trim_max_pending: 64 vfs.zfs.vdev.trim_max_bytes: 2147483648 vfs.zfs.trim.enabled: 1 vfs.zfs.trim.max_interval: 1 vfs.zfs.trim.timeout: 30 vfs.zfs.trim.txg_delay: 32 kstat.zfs.misc.zio_trim.bytes: 139489971200 kstat.zfs.misc.zio_trim.success: 628351 kstat.zfs.misc.zio_trim.unsupported: 622819 kstat.zfs.misc.zio_trim.failed: 0 Yet, I don't observe any BIO_DELETE activity to this drive with gstat -d Wasn't TRIM supposed to work on drives attached to LSI2008 in 9-stable? Daniel From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 11:53:12 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 6778B4AC for ; Tue, 16 Jul 2013 11:53:12 +0000 (UTC) (envelope-from feld@freebsd.org) Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by mx1.freebsd.org (Postfix) with ESMTP id 4266AC0 for ; Tue, 16 Jul 2013 11:53:11 +0000 (UTC) Received: from compute6.internal (compute6.nyi.mail.srv.osa [10.202.2.46]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id 50E0D20F78 for ; Tue, 16 Jul 2013 07:53:10 -0400 (EDT) Received: from frontend2 ([10.202.2.161]) by compute6.internal (MEProxy); Tue, 16 Jul 2013 07:53:10 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=date:from:to:subject:message-id :references:mime-version:content-type:in-reply-to; s=smtpout; bh=qX1EwoWbVQPKDn700pWCAjjGrYI=; b=n4RZQU1oA4t8qeZJs+MjzNdKAT8a NLe/Et3/QfSZ9wzhRxObj9bsIetrhSBiidSc1Mk7M7xUr5NMyux6ikKYGSn90Yv4 YzdNQ+offHF08VrOQL3d7dR9q9LDutAhX2+efWCJALQL05BVqB0C9df6mzbEvEGb rQrzb0vIK7c9IpA= X-Sasl-enc: +Ml65F9anGLoJbKs+tdj46YkjSfG8VO+dH+mPLkaSHS3 1373975590 Received: from mwi1.coffeenet.org (unknown [66.170.3.2]) by mail.messagingengine.com (Postfix) with ESMTPA id 0F0A56800AE for ; Tue, 16 Jul 2013 07:53:10 -0400 (EDT) Date: Tue, 16 Jul 2013 06:53:05 -0500 From: Mark Felder To: freebsd-fs@freebsd.org Subject: Re: ZFS vdev I/O questions Message-ID: <20130716115305.GA40918@mwi1.coffeenet.org> References: <51E5316B.9070201@digsys.bg> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51E5316B.9070201@digsys.bg> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2013 11:53:12 -0000 On Tue, Jul 16, 2013 at 02:41:31PM +0300, Daniel Kalchev wrote: > I am observing some "strange" behaviour with I/O spread on ZFS vdevs and > thought I might ask if someone has observed it too. > --SNIP-- > Drives da0-da5 were Hitachi Deskstar 7K3000 (Hitachi HDS723030ALA640, > firmware MKAOA3B0) -- these are 512 byte sector drives, but da0 has been > replaced by Seagate Barracuda 7200.14 (AF) (ST3000DM001-1CH166, firmware > CC24) -- this is an 4k sector drive of a new generation (notice the > relatively 'old' firmware, that can't be upgraded). 
--SNIP-- > The other observation I have is with the first vdev: the 512b drives do > a lot of I/O fast, complete first and then sit idle, while da0 continues > to write for many more seconds. They consistently show many more IOPS > than the other drives for this type of activity -- on streaming writes > all drives behave more or less the same. It is only on this un-dedup > scenario where the difference is so much pronounced. > > All the vdevs in the pool are with ashift=12 so the theory that ZFS > actually issues 512b writes to these drives can't be true, can it? > > Another worry is this Seagate Barracuda 7200.14 (AF) > (ST3000DM001-1CH166, firmware CC24) drive. It seems constantly > under-performing. Does anyone know if it is so different from the > ST3000DM001-9YN166 drives? Might be, I should just replace it? > A lot of information here. Those Hitachis are great drives. The addition of the Barracuda with different performance characteristics could be part of the problem. I'm glad you pointed out that the pool ashift=12 so we can try to rule that out. I'd be quite interested in knowing if some or perhaps even all of your issues go away simply by replacing that drive with another Hitachi. From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 11:58:22 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 08BDD788 for ; Tue, 16 Jul 2013 11:58:22 +0000 (UTC) (envelope-from prvs=190921e474=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 686A8FB for ; Tue, 16 Jul 2013 11:58:21 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50005008033.msg for ; Tue, 16 Jul 2013 12:58:18 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Tue, 16 Jul 2013 12:58:18 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=190921e474=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <3472068604314C9887FE3BD4CD42B7C8@multiplay.co.uk> From: "Steven Hartland" To: "Daniel Kalchev" , "freebsd-fs" References: <51E5316B.9070201@digsys.bg> Subject: Re: ZFS vdev I/O questions Date: Tue, 16 Jul 2013 12:58:43 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2013 11:58:22 -0000 One thing to check with this is to add -d to your gstat to see if your waiting deletes IO? I doubt it but worth checking. Regards Steve ----- Original Message ----- From: "Daniel Kalchev" To: "freebsd-fs" Sent: Tuesday, July 16, 2013 12:41 PM Subject: ZFS vdev I/O questions >I am observing some "strange" behaviour with I/O spread on ZFS vdevs and > thought I might ask if someone has observed it too. > > The system hardware is an Supermicro X8DTH-6F board with integrated > LSI2008 controller, two Xeon E5620 CPUs and 72GB or RAM (6x4 + 6x8 GB > modules). 
> Runs 9-stable r252690. > > It has currently 18 drive zpool, split on three 6 drive raidz2 vdevs, > plus ZIL and L2ARC on separate SSDs (240GB Intel 520). The ZIL consists > of two partitions of the boot SSDs (Intel 320), not mirrored. The zpool > layout is > > pool: storage > state: ONLINE > scan: scrub canceled on Thu Jul 11 17:14:50 2013 > config: > > NAME STATE READ WRITE CKSUM > storage ONLINE 0 0 0 > raidz2-0 ONLINE 0 0 0 > gpt/disk00 ONLINE 0 0 0 > gpt/disk01 ONLINE 0 0 0 > gpt/disk02 ONLINE 0 0 0 > gpt/disk03 ONLINE 0 0 0 > gpt/disk04 ONLINE 0 0 0 > gpt/disk05 ONLINE 0 0 0 > raidz2-1 ONLINE 0 0 0 > gpt/disk06 ONLINE 0 0 0 > gpt/disk07 ONLINE 0 0 0 > gpt/disk08 ONLINE 0 0 0 > gpt/disk09 ONLINE 0 0 0 > gpt/disk10 ONLINE 0 0 0 > gpt/disk11 ONLINE 0 0 0 > raidz2-2 ONLINE 0 0 0 > gpt/disk12 ONLINE 0 0 0 > gpt/disk13 ONLINE 0 0 0 > gpt/disk14 ONLINE 0 0 0 > gpt/disk15 ONLINE 0 0 0 > gpt/disk16 ONLINE 0 0 0 > gpt/disk17 ONLINE 0 0 0 > logs > ada0p2 ONLINE 0 0 0 > ada1p2 ONLINE 0 0 0 > cache > da20p2 ONLINE 0 0 0 > > > zdb output > > storage: > version: 5000 > name: 'storage' > state: 0 > txg: 5258772 > pool_guid: 17094379857311239400 > hostid: 3505628652 > hostname: 'a1.register.bg' > vdev_children: 5 > vdev_tree: > type: 'root' > id: 0 > guid: 17094379857311239400 > children[0]: > type: 'raidz' > id: 0 > guid: 2748500753748741494 > nparity: 2 > metaslab_array: 33 > metaslab_shift: 37 > ashift: 12 > asize: 18003521961984 > is_log: 0 > create_txg: 4 > children[0]: > type: 'disk' > id: 0 > guid: 5074824874132816460 > path: '/dev/gpt/disk00' > phys_path: '/dev/gpt/disk00' > whole_disk: 1 > DTL: 378 > create_txg: 4 > children[1]: > type: 'disk' > id: 1 > guid: 14410366944090513563 > path: '/dev/gpt/disk01' > phys_path: '/dev/gpt/disk01' > whole_disk: 1 > DTL: 53 > create_txg: 4 > children[2]: > type: 'disk' > id: 2 > guid: 3526681390841761237 > path: '/dev/gpt/disk02' > phys_path: '/dev/gpt/disk02' > whole_disk: 1 > DTL: 52 > create_txg: 4 > children[3]: > type: 'disk' > id: 3 > guid: 3773850995072323004 > path: '/dev/gpt/disk03' > phys_path: '/dev/gpt/disk03' > whole_disk: 1 > DTL: 51 > create_txg: 4 > children[4]: > type: 'disk' > id: 4 > guid: 16528489666301728411 > path: '/dev/gpt/disk04' > phys_path: '/dev/gpt/disk04' > whole_disk: 1 > DTL: 50 > create_txg: 4 > children[5]: > type: 'disk' > id: 5 > guid: 11222774817699257051 > path: '/dev/gpt/disk05' > phys_path: '/dev/gpt/disk05' > whole_disk: 1 > DTL: 44147 > create_txg: 4 > children[1]: > type: 'raidz' > id: 1 > guid: 614220834244218709 > nparity: 2 > metaslab_array: 39 > metaslab_shift: 37 > ashift: 12 > asize: 18003521961984 > is_log: 0 > create_txg: 40 > children[0]: > type: 'disk' > id: 0 > guid: 8076478524731550200 > path: '/dev/gpt/disk06' > phys_path: '/dev/gpt/disk06' > whole_disk: 1 > DTL: 2914 > create_txg: 40 > children[1]: > type: 'disk' > id: 1 > guid: 1689851194543981566 > path: '/dev/gpt/disk07' > phys_path: '/dev/gpt/disk07' > whole_disk: 1 > DTL: 48 > create_txg: 40 > children[2]: > type: 'disk' > id: 2 > guid: 9743236178648200269 > path: '/dev/gpt/disk08' > phys_path: '/dev/gpt/disk08' > whole_disk: 1 > DTL: 47 > create_txg: 40 > children[3]: > type: 'disk' > id: 3 > guid: 10157617457760516410 > path: '/dev/gpt/disk09' > phys_path: '/dev/gpt/disk09' > whole_disk: 1 > DTL: 46 > create_txg: 40 > children[4]: > type: 'disk' > id: 4 > guid: 5035981195206926078 > path: '/dev/gpt/disk10' > phys_path: '/dev/gpt/disk10' > whole_disk: 1 > DTL: 45 > create_txg: 40 > children[5]: > type: 'disk' > id: 5 > guid: 
4975835521778875251 > path: '/dev/gpt/disk11' > phys_path: '/dev/gpt/disk11' > whole_disk: 1 > DTL: 44149 > create_txg: 40 > children[2]: > type: 'raidz' > id: 2 > guid: 7453512836015019221 > nparity: 2 > metaslab_array: 38974 > metaslab_shift: 37 > ashift: 12 > asize: 18003521961984 > is_log: 0 > create_txg: 4455560 > children[0]: > type: 'disk' > id: 0 > guid: 11182458869377968267 > path: '/dev/gpt/disk12' > phys_path: '/dev/gpt/disk12' > whole_disk: 1 > DTL: 45059 > create_txg: 4455560 > children[1]: > type: 'disk' > id: 1 > guid: 5844283175515272344 > path: '/dev/gpt/disk13' > phys_path: '/dev/gpt/disk13' > whole_disk: 1 > DTL: 44145 > create_txg: 4455560 > children[2]: > type: 'disk' > id: 2 > guid: 13095364699938843583 > path: '/dev/gpt/disk14' > phys_path: '/dev/gpt/disk14' > whole_disk: 1 > DTL: 44144 > create_txg: 4455560 > children[3]: > type: 'disk' > id: 3 > guid: 5196507898996589388 > path: '/dev/gpt/disk15' > phys_path: '/dev/gpt/disk15' > whole_disk: 1 > DTL: 44143 > create_txg: 4455560 > children[4]: > type: 'disk' > id: 4 > guid: 12809770017318709512 > path: '/dev/gpt/disk16' > phys_path: '/dev/gpt/disk16' > whole_disk: 1 > DTL: 44142 > create_txg: 4455560 > children[5]: > type: 'disk' > id: 5 > guid: 7339883019925920701 > path: '/dev/gpt/disk17' > phys_path: '/dev/gpt/disk17' > whole_disk: 1 > DTL: 44141 > create_txg: 4455560 > children[3]: > type: 'disk' > id: 3 > guid: 18011869864924559827 > path: '/dev/ada0p2' > phys_path: '/dev/ada0p2' > whole_disk: 1 > metaslab_array: 16675 > metaslab_shift: 26 > ashift: 12 > asize: 8585216000 > is_log: 1 > DTL: 86787 > create_txg: 5182360 > children[4]: > type: 'disk' > id: 4 > guid: 1338775535758010670 > path: '/dev/ada1p2' > phys_path: '/dev/ada1p2' > whole_disk: 1 > metaslab_array: 16693 > metaslab_shift: 26 > ashift: 12 > asize: 8585216000 > is_log: 1 > DTL: 86788 > create_txg: 5182377 > features_for_read: > > Drives da0-da5 were Hitachi Deskstar 7K3000 (Hitachi HDS723030ALA640, > firmware MKAOA3B0) -- these are 512 byte sector drives, but da0 has been > replaced by Seagate Barracuda 7200.14 (AF) (ST3000DM001-1CH166, firmware > CC24) -- this is an 4k sector drive of a new generation (notice the > relatively 'old' firmware, that can't be upgraded). > Drives da6-da17 are also Seagate Barracuda 7200.14 (AF) but > (ST3000DM001-9YN166, firmware CC4H) -- the more "normal" part number. > Some have firmware CC4C which I replace drive by drive (but other than > the excessive load counts no other issues so far). > > The only ZFS related tuning is in /etc/sysctl.conf > # improve ZFS resilver > vfs.zfs.resilver_delay=0 > vfs.zfs.scrub_delay=0 > vfs.zfs.top_maxinflight=128 > vfs.zfs.resilver_min_time_ms=5000 > vfs.zfs.vdev.max_pending=24 > # L2ARC: > vfs.zfs.l2arc_norw=0 > vfs.zfs.l2arc_write_max=83886080 > vfs.zfs.l2arc_write_boost=83886080 > > > The pool of course had dedup and had serious dedup ratios, like over > 10x. In general, with the ZIL and L2ARC, the only trouble I have seen > with dedup is when deleting lots of data... which this server has seen > plenty of. During this experiment, I have moved most data to other > server and un-dedup the last remaining TBs. > > While doing zfs destroy on an 2-3TB dataset, I observe very annoying > behaviour. 
The pool would stay mostly idle, accepting almost no I/O and > doing small random reads, like this > > $ zpool iostat storage 10 > storage 45.3T 3.45T 466 0 1.82M 0 > storage 45.3T 3.45T 50 0 203K 0 > storage 45.3T 3.45T 45 25 183K 1.70M > storage 45.3T 3.45T 49 0 199K 0 > storage 45.3T 3.45T 50 0 202K 0 > storage 45.3T 3.45T 51 0 204K 0 > storage 45.3T 3.45T 57 0 230K 0 > storage 45.3T 3.45T 65 0 260K 0 > storage 45.3T 3.45T 68 25 274K 1.70M > storage 45.3T 3.45T 65 0 260K 0 > storage 45.3T 3.45T 64 0 260K 0 > storage 45.3T 3.45T 67 0 272K 0 > storage 45.3T 3.45T 66 0 266K 0 > storage 45.3T 3.45T 64 0 258K 0 > storage 45.3T 3.45T 62 25 250K 1.70M > storage 45.3T 3.45T 57 0 231K 0 > storage 45.3T 3.45T 58 0 235K 0 > storage 45.3T 3.45T 66 0 267K 0 > storage 45.3T 3.45T 64 0 257K 0 > storage 45.3T 3.45T 60 0 241K 0 > storage 45.3T 3.45T 50 0 203K 0 > storage 45.3T 3.45T 52 25 209K 1.70M > storage 45.3T 3.45T 54 0 217K 0 > storage 45.3T 3.45T 51 0 205K 0 > storage 45.3T 3.45T 54 0 216K 0 > storage 45.3T 3.45T 55 0 222K 0 > storage 45.3T 3.45T 56 0 226K 0 > storage 45.3T 3.45T 65 0 264K 0 > storage 45.3T 3.45T 71 0 286K 0 > > The write peaks are from processes syncing data to the pool - in this > state it does not do reads (the data the sync process deals with is > already in ARC). > Then it goes into writing back to the pool (perhaps DDT metadata) > > storage 45.3T 3.45T 17 24.4K 69.6K 97.5M > storage 45.3T 3.45T 0 19.6K 0 78.5M > storage 45.3T 3.45T 0 14.2K 0 56.8M > storage 45.3T 3.45T 0 7.90K 0 31.6M > storage 45.3T 3.45T 0 7.81K 0 32.8M > storage 45.3T 3.45T 0 9.54K 0 38.2M > storage 45.3T 3.45T 0 7.07K 0 28.3M > storage 45.3T 3.45T 0 7.70K 0 30.8M > storage 45.3T 3.45T 0 6.19K 0 24.8M > storage 45.3T 3.45T 0 5.45K 0 21.8M > storage 45.3T 3.45T 0 5.78K 0 24.7M > storage 45.3T 3.45T 0 5.29K 0 21.2M > storage 45.3T 3.45T 0 5.69K 0 22.8M > storage 45.3T 3.45T 0 5.52K 0 22.1M > storage 45.3T 3.45T 0 3.26K 0 13.1M > storage 45.3T 3.45T 0 1.77K 0 7.10M > storage 45.3T 3.45T 0 1.63K 0 8.14M > storage 45.3T 3.45T 0 1.41K 0 5.64M > storage 45.3T 3.45T 0 1.22K 0 4.88M > storage 45.3T 3.45T 0 1.27K 0 5.09M > storage 45.3T 3.45T 0 1.06K 0 4.26M > storage 45.3T 3.45T 0 1.07K 0 4.30M > storage 45.3T 3.45T 0 979 0 3.83M > storage 45.3T 3.45T 0 1002 0 3.91M > storage 45.3T 3.45T 0 1010 0 3.95M > storage 45.3T 3.45T 0 948 2.40K 3.71M > storage 45.3T 3.45T 0 939 0 3.67M > storage 45.3T 3.45T 0 1023 0 7.10M > storage 45.3T 3.45T 0 1.01K 4.80K 4.04M > storage 45.3T 3.45T 0 822 0 3.22M > storage 45.3T 3.45T 0 434 0 1.70M > storage 45.3T 3.45T 0 398 2.40K 1.56M > > For quite some time, there are no reads from the pool. 
When that > happens, gstat (gstat -f 'da[0-9]*$') displays something like this: > > > dT: 1.001s w: 1.000s filter: da[0-9]*$ > L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name > 24 1338 0 0 0.0 1338 12224 17.8 100.9| da0 > 24 6888 0 0 0.0 6888 60720 3.5 100.0| da1 > 24 6464 0 0 0.0 6464 71997 3.7 100.0| da2 > 24 6117 0 0 0.0 6117 82386 3.9 99.9| da3 > 24 6455 0 0 0.0 6455 66822 3.7 100.0| da4 > 24 6782 0 0 0.0 6782 69207 3.5 100.0| da5 > 24 698 0 0 0.0 698 27533 34.1 99.6| da6 > 24 590 0 0 0.0 590 21627 40.9 99.7| da7 > 24 561 0 0 0.0 561 21031 42.8 100.2| da8 > 24 724 0 0 0.0 724 25583 33.1 99.9| da9 > 24 567 0 0 0.0 567 22965 41.4 98.0| da10 > 24 566 0 0 0.0 566 21834 42.4 99.9| da11 > 24 586 0 0 0.0 586 4899 43.5 100.2| da12 > 24 487 0 0 0.0 487 4008 49.3 100.9| da13 > 24 628 0 0 0.0 628 5007 38.9 100.2| da14 > 24 714 0 0 0.0 714 5706 33.8 99.9| da15 > 24 595 0 0 0.0 595 4831 39.8 99.8| da16 > 24 485 0 0 0.0 485 3932 49.2 100.1| da17 > 0 0 0 0 0.0 0 0 0.0 0.0| da18 > 0 0 0 0 0.0 0 0 0.0 0.0| da19 > 0 0 0 0 0.0 0 0 0.0 0.0| da20 > 0 0 0 0 0.0 0 0 0.0 0.0| ada0 > 0 0 0 0 0.0 0 0 0.0 0.0| ada1 > > > (drives da8 and 19 are spares, da20 is the L2ARC SSD drive, ada0 and > ada0 are the boot SSDs in separate zpool) > Now, here comes the weird part. the gpart display would show intensive > writes to all vdevs (da0-da5, da6-da11,da12-da17) then one of the vdevs > would complete writing, and stop writing, while other vdevs continue, at > the end only one vdev writes until as it seems, data is completely > written to all vdevs (this can be observed in the zfs iostat output > above with the decreasing write IOPS each 10 seconds), then there is a > few seconds "do nothing" period and then we are back to small reads. > > The other observation I have is with the first vdev: the 512b drives do > a lot of I/O fast, complete first and then sit idle, while da0 continues > to write for many more seconds. They consistently show many more IOPS > than the other drives for this type of activity -- on streaming writes > all drives behave more or less the same. It is only on this un-dedup > scenario where the difference is so much pronounced. > > All the vdevs in the pool are with ashift=12 so the theory that ZFS > actually issues 512b writes to these drives can't be true, can it? > > Another worry is this Seagate Barracuda 7200.14 (AF) > (ST3000DM001-1CH166, firmware CC24) drive. It seems constantly > under-performing. Does anyone know if it is so different from the > ST3000DM001-9YN166 drives? Might be, I should just replace it? > > My concern is the bursty and irregular nature of writing to vdevs. As it > is now, an write operation to the pool needs to wait for all of the vdev > writes to complete which is this case takes tens of seconds. A single > drive in an vdev that underperforms will slow down the entire pool. > Perhaps ZFS could prioritize vdev usage based on the vdev troughput, > similar to how it prioritizes writes based on how much it is full. > > Also, what is ZFS doing during the idle periods? Are there some timeouts > involved? It is certainly not using any CPU... The small random I/O is > certainly not loading the disks. > > Then, I have 240GB L2ARC and secondarycache=metadata for the pool. Yet, > the DDT apparently does not want to go there... Is there a way to > "force" it to be loaded to L2ARC? 
Before the last big delete, I had > > zdb -D storage > DDT-sha256-zap-duplicate: 19907778 entries, size 1603 on disk, 259 in core > DDT-sha256-zap-unique: 30101659 entries, size 1428 on disk, 230 in core > > dedup = 1.98, compress = 1.00, copies = 1.03, dedup * compress / copies > = 1.92 > > With time, the in core values stay more or less the same. > > I also discovered, that the L2ARC drive apparently is not subject to > TRIM for some reason. TRIM works on the boot drives, but these are > connected to the motherboard SATA ports). > > # sysctl kern.cam.da.20 > kern.cam.da.20.delete_method: ATA_TRIM > kern.cam.da.20.minimum_cmd_size: 6 > kern.cam.da.20.sort_io_queue: 0 > kern.cam.da.20.error_inject: 0 > > # sysctl -a | grep trim > vfs.zfs.vdev.trim_on_init: 1 > vfs.zfs.vdev.trim_max_pending: 64 > vfs.zfs.vdev.trim_max_bytes: 2147483648 > vfs.zfs.trim.enabled: 1 > vfs.zfs.trim.max_interval: 1 > vfs.zfs.trim.timeout: 30 > vfs.zfs.trim.txg_delay: 32 > kstat.zfs.misc.zio_trim.bytes: 139489971200 > kstat.zfs.misc.zio_trim.success: 628351 > kstat.zfs.misc.zio_trim.unsupported: 622819 > kstat.zfs.misc.zio_trim.failed: 0 > > Yet, I don't observe any BIO_DELETE activity to this drive with gstat -d > > Wasn't TRIM supposed to work on drives attached to LSI2008 in 9-stable? > > Daniel > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
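(On the TRIM question quoted above: one way to narrow it down, sketched here as a guess rather than a verified recipe, is to confirm that the L2ARC device itself advertises TRIM and then watch whether delete requests ever reach it. da20 is the L2ARC device from the listing earlier; adjust the name if yours differs.)

# does the drive report DSM/TRIM support through the SAT layer?
camcontrol identify da20 | grep -i trim
# which delete method did CAM choose for it?
sysctl kern.cam.da.20.delete_method
# are ZFS-issued TRIMs being counted as unsupported? watch the counters over time
sysctl kstat.zfs.misc.zio_trim
# and watch the d/s (BIO_DELETE) column for just this device
gstat -d -f 'da20$'

If kstat.zfs.misc.zio_trim.unsupported keeps growing while gstat -d never shows deletes on da20, the TRIMs are apparently being refused somewhere below ZFS (device or driver path) rather than not being generated at all.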
From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 12:16:44 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 61619E15 for ; Tue, 16 Jul 2013 12:16:44 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id D481E1FA for ; Tue, 16 Jul 2013 12:16:43 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [193.68.6.1]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r6GCGcZ4023385 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Tue, 16 Jul 2013 15:16:39 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <51E539A6.9090109@digsys.bg> Date: Tue, 16 Jul 2013 15:16:38 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs Subject: Re: ZFS vdev I/O questions References: <51E5316B.9070201@digsys.bg> <3472068604314C9887FE3BD4CD42B7C8@multiplay.co.uk> In-Reply-To: <3472068604314C9887FE3BD4CD42B7C8@multiplay.co.uk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2013 12:16:44 -0000 On 16.07.13 14:53, Mark Felder wrote: >> The other observation I have is with the first vdev: the 512b drives do >> a lot of I/O fast, complete first and then sit idle, while da0 continues >> to write for many more seconds. They consistently show many more IOPS >> than the other drives for this type of activity -- on streaming writes >> all drives behave more or less the same. It is only on this un-dedup >> scenario where the difference is so much pronounced. >> >> All the vdevs in the pool are with ashift=12 so the theory that ZFS >> actually issues 512b writes to these drives can't be true, can it? >> >> Another worry is this Seagate Barracuda 7200.14 (AF) >> (ST3000DM001-1CH166, firmware CC24) drive. It seems constantly >> under-performing. Does anyone know if it is so different from the >> ST3000DM001-9YN166 drives? Might be, I should just replace it? >> > A lot of information here. > > Those Hitachis are great drives. The addition of the Barracuda with > different performance characteristics could be part of the problem. I'm > glad you pointed out that the pool ashift=12 so we can try to rule that > out. I'd be quite interested in knowing if some or perhaps even all of > your issues go away simply by replacing that drive with another Hitachi. I don't have any more of these available and the vendor unfortunately could not supply more (but I am considering looking elsewhere) At the moment, I could only replace that Barracuda with one of the spare drives, which are Seagate SV35 (ST3000VX000-9YW166, firmware CV13) I have observed that these drives are not particularly good at random I/O however.... On 16.07.13 14:58, Steven Hartland wrote: > One thing to check with this is to add -d to your gstat to see > if your waiting deletes IO? I doubt it but worth checking. Here is with gstat -d. 
Actually, running this script makes more sense: while true ;do gstat -f 'da[0-9]*$' -d -b sleep 1 done (small random reads, heavy writes, then some reads) dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 11 1 4 20.8 10 44 0.4 0 0 0.0 2.2 da0 0 15 5 20 7.1 10 40 5.8 0 0 0.0 5.1 da1 0 7 0 0 0.0 7 28 0.4 0 0 0.0 0.2 da2 1 7 0 0 0.0 7 28 4.4 0 0 0.0 1.1 da3 1 11 0 0 0.0 11 44 0.5 0 0 0.0 0.2 da4 0 11 0 0 0.0 11 44 0.4 0 0 0.0 0.2 da5 0 25 16 68 7.8 9 36 0.4 0 0 0.0 5.4 da6 0 21 12 52 14.2 9 36 0.3 0 0 0.0 6.0 da7 0 9 0 0 0.0 9 36 0.3 0 0 0.0 0.1 da8 0 35 23 176 12.5 12 48 0.4 0 0 0.0 9.4 da9 0 26 17 140 13.7 9 36 0.3 0 0 0.0 7.3 da10 0 8 0 0 0.0 8 32 0.3 0 0 0.0 0.1 da11 0 158 29 212 17.4 129 4083 81.9 0 0 0.0 57.8 da12 0 162 25 192 16.7 137 4423 84.8 0 0 0.0 61.9 da13 0 118 0 0 0.0 118 3844 95.5 0 0 0.0 45.6 da14 5 152 20 112 10.8 132 4127 66.9 0 0 0.0 45.8 da15 0 171 14 88 17.4 157 4990 89.2 0 0 0.0 69.6 da16 0 124 1 4 6.6 123 3840 71.5 0 0 0.0 37.6 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 4558 4557 6736 0.5 1 1 0.2 0 0 0.0 14.7 da20 0 120 0 0 0.0 119 8083 35.2 0 0 0.0 17.4 ada0 0 120 0 0 0.0 119 8083 34.0 0 0 0.0 16.8 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 1 1 4 14.0 0 0 0.0 0 0 0.0 1.4 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 2 2 8 18.8 0 0 0.0 0 0 0.0 3.7 da2 0 1 1 4 35.1 0 0 0.0 0 0 0.0 3.5 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5 0 3 3 12 12.8 0 0 0.0 0 0 0.0 3.8 da6 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da7 0 3 3 12 14.3 0 0 0.0 0 0 0.0 4.3 da8 0 1 1 4 9.8 0 0 0.0 0 0 0.0 1.0 da9 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da10 0 7 7 28 9.8 0 0 0.0 0 0 0.0 6.8 da11 0 1 1 4 7.4 0 0 0.0 0 0 0.0 0.7 da12 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da13 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da14 0 3 3 12 34.0 0 0 0.0 0 0 0.0 10.2 da15 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da16 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 2656 236 455 0.2 2420 12284 0.3 0 0 0.0 21.3 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 4 4 16 22.8 0 0 0.0 0 0 0.0 9.1 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2 0 1 1 4 17.4 0 0 0.0 0 0 0.0 1.7 da3 0 3 3 12 13.2 0 0 0.0 0 0 0.0 4.0 da4 0 2 2 8 28.2 0 0 0.0 0 0 0.0 5.6 da5 0 1 1 4 9.7 0 0 0.0 0 0 0.0 1.0 da6 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da7 0 3 3 12 8.8 0 0 0.0 0 0 0.0 2.6 da8 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da9 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da10 0 5 5 20 10.1 0 0 0.0 0 0 0.0 5.0 da11 0 4 4 16 10.4 0 0 0.0 0 0 0.0 4.1 da12 0 1 1 4 16.0 0 0 0.0 0 0 0.0 1.6 da13 0 1 1 4 13.8 0 0 0.0 0 0 0.0 1.4 da14 0 4 4 16 7.3 0 0 0.0 0 0 0.0 2.9 da15 0 1 1 4 13.1 0 0 0.0 0 0 0.0 1.3 da16 0 1 1 4 21.3 0 0 0.0 0 0 0.0 2.1 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 515 515 979 0.2 0 0 0.0 0 0 0.0 9.2 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 1 1 4 32.3 0 0 0.0 0 0 0.0 3.2 da0 0 1 1 4 35.0 0 0 0.0 0 0 0.0 3.5 da1 0 4 4 16 4.3 0 0 0.0 0 0 0.0 1.7 da2 0 3 3 12 31.8 0 0 0.0 0 0 0.0 9.5 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 4 4 16 6.9 0 0 0.0 0 0 0.0 2.8 da5 0 2 2 8 13.6 0 0 0.0 0 0 0.0 2.7 da6 0 0 0 0 0.0 0 0 0.0 
0 0 0.0 0.0 da7 0 4 4 16 7.4 0 0 0.0 0 0 0.0 3.0 da8 0 1 1 4 12.6 0 0 0.0 0 0 0.0 1.3 da9 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da10 0 6 6 24 10.0 0 0 0.0 0 0 0.0 6.0 da11 0 1 1 4 21.6 0 0 0.0 0 0 0.0 2.2 da12 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da13 1 1 1 4 10.8 0 0 0.0 0 0 0.0 1.1 da14 0 1 1 4 18.3 0 0 0.0 0 0 0.0 1.8 da15 0 1 1 4 9.9 0 0 0.0 0 0 0.0 1.0 da16 0 1 1 4 21.6 0 0 0.0 0 0 0.0 2.2 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 588 587 1130 0.2 1 2 0.6 0 0 0.0 10.5 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 23 1412 0 0 0.0 1412 24423 15.0 0 0 0.0 99.6 da0 23 4699 0 0 0.0 4699 72429 4.1 0 0 0.0 97.2 da1 24 4294 0 0 0.0 4294 67749 4.6 0 0 0.0 99.0 da2 23 4923 0 0 0.0 4923 75746 3.8 0 0 0.0 96.2 da3 24 4452 0 0 0.0 4452 67729 4.3 0 0 0.0 97.5 da4 23 4558 0 0 0.0 4558 68992 4.2 0 0 0.0 97.3 da5 24 813 0 0 0.0 813 10675 27.2 0 0 0.0 99.6 da6 24 866 0 0 0.0 866 11338 25.5 0 0 0.0 99.5 da7 24 845 0 0 0.0 845 11058 25.8 0 0 0.0 99.5 da8 24 866 0 0 0.0 866 11174 25.3 0 0 0.0 98.8 da9 24 928 0 0 0.0 928 12061 23.6 0 0 0.0 99.2 da10 24 864 0 0 0.0 864 11150 25.6 0 0 0.0 99.2 da11 24 674 0 0 0.0 674 14587 32.4 0 0 0.0 99.9 da12 24 705 0 0 0.0 705 15043 31.7 0 0 0.0 100.1 da13 24 670 0 0 0.0 670 14527 33.0 0 0 0.0 99.5 da14 24 648 0 0 0.0 648 13360 34.3 0 0 0.0 99.5 da15 24 716 0 0 0.0 716 14835 30.9 0 0 0.0 99.5 da16 24 730 0 0 0.0 730 15227 30.0 0 0 0.0 98.7 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 24 1169 0 0 0.0 1169 15954 20.6 0 0 0.0 100.3 da0 0 1811 0 0 0.0 1811 20961 3.9 0 0 0.0 29.6 da1 0 2392 0 0 0.0 2392 27444 3.7 0 0 0.0 37.1 da2 0 1199 0 0 0.0 1199 14571 3.7 0 0 0.0 18.5 da3 0 2305 0 0 0.0 2305 26924 4.0 0 0 0.0 38.1 da4 0 1820 0 0 0.0 1820 21573 3.8 0 0 0.0 28.8 da5 24 744 0 0 0.0 744 9248 32.6 0 0 0.0 99.5 da6 24 771 0 0 0.0 771 10722 31.1 0 0 0.0 100.0 da7 24 716 0 0 0.0 716 9268 33.5 0 0 0.0 100.2 da8 24 754 0 0 0.0 754 9280 31.8 0 0 0.0 99.9 da9 24 769 0 0 0.0 769 9384 31.4 0 0 0.0 99.9 da10 24 742 0 0 0.0 742 9508 32.1 0 0 0.0 98.5 da11 24 665 0 0 0.0 665 9967 36.5 0 0 0.0 99.3 da12 24 740 0 0 0.0 740 10958 32.2 0 0 0.0 98.7 da13 24 639 0 0 0.0 639 9619 37.4 0 0 0.0 99.7 da14 24 695 0 0 0.0 695 10367 35.1 0 0 0.0 99.8 da15 24 583 0 0 0.0 583 8429 41.5 0 0 0.0 100.2 da16 24 697 0 0 0.0 697 10287 34.3 0 0 0.0 100.0 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 24 895 0 0 0.0 895 14773 26.9 0 0 0.0 101.6 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5 24 702 0 0 0.0 702 9327 33.8 0 0 0.0 97.6 da6 24 675 0 0 0.0 675 9191 35.3 0 0 0.0 100.1 da7 24 668 0 0 0.0 668 8575 36.1 0 0 0.0 100.7 da8 24 744 0 0 0.0 744 9179 31.4 0 0 0.0 97.1 da9 24 750 0 0 0.0 750 9003 32.3 0 0 0.0 100.1 da10 24 586 0 0 0.0 586 6909 40.9 0 0 0.0 100.0 da11 24 711 0 0 0.0 711 13990 33.6 0 0 0.0 99.4 da12 24 653 0 0 0.0 
653 10074 36.5 0 0 0.0 100.0 da13 24 645 0 0 0.0 645 13111 37.0 0 0 0.0 100.0 da14 24 687 0 0 0.0 687 12795 34.8 0 0 0.0 99.9 da15 24 678 0 0 0.0 678 12855 35.6 0 0 0.0 100.6 da16 24 696 0 0 0.0 696 13091 34.4 0 0 0.0 99.2 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 24 1177 0 0 0.0 1177 18586 20.6 0 0 0.0 100.0 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5 24 780 0 0 0.0 780 11733 30.6 0 0 0.0 99.3 da6 23 776 0 0 0.0 776 12636 31.1 0 0 0.0 100.2 da7 24 636 0 0 0.0 636 9387 37.8 0 0 0.0 99.7 da8 24 778 0 0 0.0 778 13343 31.2 0 0 0.0 100.6 da9 24 821 0 0 0.0 821 14202 29.3 0 0 0.0 101.0 da10 24 786 0 0 0.0 786 12024 30.6 0 0 0.0 100.0 da11 24 666 0 0 0.0 666 9351 35.9 0 0 0.0 100.0 da12 24 720 0 0 0.0 720 9954 33.1 0 0 0.0 99.7 da13 24 706 0 0 0.0 706 10014 34.0 0 0 0.0 99.8 da14 24 801 0 0 0.0 801 11105 30.3 0 0 0.0 101.1 da15 24 738 0 0 0.0 738 10126 32.5 0 0 0.0 99.9 da16 24 670 0 0 0.0 670 9203 36.5 0 0 0.0 101.4 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 23 1650 0 0 0.0 1650 18674 14.4 0 0 0.0 100.0 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5 24 669 0 0 0.0 669 13234 35.9 0 0 0.0 100.1 da6 24 573 0 0 0.0 573 12223 42.4 0 0 0.0 99.7 da7 23 519 0 0 0.0 519 10584 46.3 0 0 0.0 100.1 da8 24 659 0 0 0.0 659 15552 36.9 0 0 0.0 100.5 da9 24 687 0 0 0.0 687 18410 35.1 0 0 0.0 99.3 da10 23 671 0 0 0.0 671 13190 36.0 0 0 0.0 100.2 da11 24 807 0 0 0.0 807 9425 29.7 0 0 0.0 99.9 da12 24 665 0 0 0.0 665 7902 36.4 0 0 0.0 100.3 da13 24 861 0 0 0.0 861 10140 28.0 0 0 0.0 99.6 da14 24 714 0 0 0.0 714 8797 33.4 0 0 0.0 99.7 da15 24 753 0 0 0.0 753 9225 32.5 0 0 0.0 100.0 da16 24 760 0 0 0.0 760 9337 31.5 0 0 0.0 100.0 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5 24 648 0 0 0.0 648 9954 37.0 0 0 0.0 100.0 da6 24 664 0 0 0.0 664 10194 36.1 0 0 0.0 100.0 da7 24 653 0 0 0.0 653 13163 36.7 0 0 0.0 99.9 da8 24 749 0 0 0.0 749 12716 31.8 0 0 0.0 100.4 da9 24 769 0 0 0.0 769 13239 31.4 0 0 0.0 100.3 da10 24 742 0 0 0.0 742 12017 32.4 0 0 0.0 100.1 da11 24 584 0 0 0.0 584 7393 40.8 0 0 0.0 101.1 da12 24 648 0 0 0.0 648 8336 37.7 0 0 0.0 99.7 da13 24 679 0 0 0.0 679 8756 35.3 0 0 0.0 100.8 da14 24 646 0 0 0.0 646 8160 37.0 0 0 0.0 100.5 da15 24 692 0 0 0.0 692 8900 35.5 0 0 0.0 99.9 da16 24 684 0 0 0.0 684 8692 35.6 0 0 0.0 101.7 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 
0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5 24 662 0 0 0.0 662 14641 35.8 0 0 0.0 100.0 da6 24 705 0 0 0.0 705 16535 34.2 0 0 0.0 99.1 da7 24 644 0 0 0.0 644 10120 37.4 0 0 0.0 101.8 da8 24 560 0 0 0.0 560 14721 43.4 0 0 0.0 99.5 da9 24 677 0 0 0.0 677 20864 35.2 0 0 0.0 100.0 da10 24 685 0 0 0.0 685 19009 35.6 0 0 0.0 100.3 da11 24 714 0 0 0.0 714 10328 34.0 0 0 0.0 101.7 da12 24 738 0 0 0.0 738 10476 32.2 0 0 0.0 100.0 da13 24 587 0 0 0.0 587 8465 40.6 0 0 0.0 100.1 da14 24 720 0 0 0.0 720 9888 33.8 0 0 0.0 100.1 da15 24 603 0 0 0.0 603 8258 39.1 0 0 0.0 100.1 da16 24 623 0 0 0.0 623 9025 38.3 0 0 0.0 99.9 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5 0 76 0 0 0.0 76 1702 39.6 0 0 0.0 10.9 da6 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da7 0 298 0 0 0.0 298 8704 33.8 0 0 0.0 40.5 da8 0 112 0 0 0.0 112 3065 48.0 0 0 0.0 22.1 da9 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da10 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da11 24 645 0 0 0.0 645 9942 37.3 0 0 0.0 100.9 da12 24 581 0 0 0.0 581 8660 41.1 0 0 0.0 99.8 da13 24 717 0 0 0.0 717 10910 33.6 0 0 0.0 101.4 da14 24 737 0 0 0.0 737 11737 32.6 0 0 0.0 99.9 da15 24 556 0 0 0.0 556 8796 42.3 0 0 0.0 99.5 da16 24 538 0 0 0.0 538 8444 44.1 0 0 0.0 100.1 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da6 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da7 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da8 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da9 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da10 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da11 0 642 0 0 0.0 642 8875 37.2 0 0 0.0 100.0 da12 24 672 0 0 0.0 672 9323 36.0 0 0 0.0 100.0 da13 24 765 0 0 0.0 765 10602 31.5 0 0 0.0 101.6 da14 0 684 0 0 0.0 684 9902 31.0 0 0 0.0 87.1 da15 23 683 0 0 0.0 683 10210 35.6 0 0 0.0 100.4 da16 24 654 0 0 0.0 654 9603 37.1 0 0 0.0 99.9 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.000s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 114 109 3811 1.9 4 16 0.5 0 0 0.0 12.6 da0 0 349 343 12129 0.7 4 16 0.5 0 0 0.0 24.5 da1 0 232 226 8322 1.0 4 16 0.5 0 0 0.0 23.6 da2 0 230 224 8338 2.7 4 16 0.5 0 0 0.0 28.0 da3 0 349 343 12129 0.7 4 
16 0.5 0 0 0.0 23.9 da4 0 122 116 3775 1.1 4 16 0.5 0 0 0.0 21.5 da5 0 5 0 0 0.0 4 16 0.4 0 0 0.0 4.8 da6 0 5 0 0 0.0 4 16 0.3 0 0 0.0 4.2 da7 0 356 351 12577 0.8 4 16 0.4 0 0 0.0 14.4 da8 0 349 344 12573 0.7 4 16 0.3 0 0 0.0 15.5 da9 0 359 354 12573 0.6 4 16 0.4 0 0 0.0 9.5 da10 0 360 355 12589 1.0 4 16 0.3 0 0 0.0 14.0 da11 0 6 0 0 0.0 4 16 0.3 0 0 0.0 16.9 da12 0 5 0 0 0.0 4 16 0.3 0 0 0.0 3.6 da13 0 5 0 0 0.0 4 16 0.3 0 0 0.0 5.3 da14 0 5 0 0 0.0 4 16 0.3 0 0 0.0 3.8 da15 0 5 0 0 0.0 4 16 0.3 0 0 0.0 4.9 da16 0 9 4 16 15.7 4 16 0.4 0 0 0.0 5.8 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 1 0 0 0.0 0 0 0.0 1 3839 0.4 0.0 ada0 0 2 0 0 0.0 0 0 0.0 2 3839 0.5 0.0 ada1 ^C There were some heavy writes to da20 (the L2ARC) but no delete requests. While deletes on ada0 and ada1 do happen as expected. The L2ARC should delete, as it is nearly full. zpool iostat -v storage 10 capacity operations bandwidth pool alloc free read write read write -------------- ----- ----- ----- ----- ----- ----- storage 44.8T 3.95T 77 9.34K 5.03M 43.1M raidz2 15.1T 1.13T 27 3.09K 2.37M 14.1M gpt/disk00 - - 8 485 303K 6.66M gpt/disk01 - - 16 477 603K 6.56M gpt/disk02 - - 9 476 320K 6.56M gpt/disk03 - - 8 476 322K 6.56M gpt/disk04 - - 16 477 603K 6.56M gpt/disk05 - - 8 476 301K 6.55M raidz2 14.7T 1.53T 32 3.13K 2.57M 14.5M gpt/disk06 - - 14 444 502K 6.74M gpt/disk07 - - 11 444 385K 6.74M gpt/disk08 - - 14 444 552K 6.74M gpt/disk09 - - 15 445 582K 6.73M gpt/disk10 - - 7 444 280K 6.72M gpt/disk11 - - 9 444 369K 6.72M raidz2 15.0T 1.29T 17 3.10K 87.0K 14.1M gpt/disk12 - - 4 434 40.8K 6.65M gpt/disk13 - - 2 434 16.3K 6.65M gpt/disk14 - - 0 433 8.75K 6.62M gpt/disk15 - - 4 435 40.6K 6.66M gpt/disk16 - - 2 433 15.3K 6.61M gpt/disk17 - - 0 434 8.54K 6.65M logs - - - - - - ada0p2 1.01M 7.94G 0 2 0 185K ada1p2 1.01M 7.94G 0 2 0 185K cache - - - - - - da20p2 215G 80M 439 674 703K 2.84M -------------- ----- ----- ----- ----- ----- ----- Daniel From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 13:16:15 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 0263E989 for ; Tue, 16 Jul 2013 13:16:15 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id 75FF8670 for ; Tue, 16 Jul 2013 13:16:14 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [193.68.6.1]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r6GDG9rV061441 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Tue, 16 Jul 2013 16:16:10 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <51E54799.8070700@digsys.bg> Date: Tue, 16 Jul 2013 16:16:09 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: ZFS vdev I/O questions References: <51E5316B.9070201@digsys.bg> <20130716115305.GA40918@mwi1.coffeenet.org> In-Reply-To: <20130716115305.GA40918@mwi1.coffeenet.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2013 13:16:15 -0000 On 16.07.13 14:53, 
Mark Felder wrote: > On Tue, Jul 16, 2013 at 02:41:31PM +0300, Daniel Kalchev wrote: >> I am observing some "strange" behaviour with I/O spread on ZFS vdevs and >> thought I might ask if someone has observed it too. >> > --SNIP-- > >> Drives da0-da5 were Hitachi Deskstar 7K3000 (Hitachi HDS723030ALA640, >> firmware MKAOA3B0) -- these are 512 byte sector drives, but da0 has been >> replaced by Seagate Barracuda 7200.14 (AF) (ST3000DM001-1CH166, firmware >> CC24) -- this is an 4k sector drive of a new generation (notice the >> relatively 'old' firmware, that can't be upgraded). > --SNIP-- > >> The other observation I have is with the first vdev: the 512b drives do >> a lot of I/O fast, complete first and then sit idle, while da0 continues >> to write for many more seconds. They consistently show many more IOPS >> than the other drives for this type of activity -- on streaming writes >> all drives behave more or less the same. It is only on this un-dedup >> scenario where the difference is so much pronounced. >> >> All the vdevs in the pool are with ashift=12 so the theory that ZFS >> actually issues 512b writes to these drives can't be true, can it? >> >> Another worry is this Seagate Barracuda 7200.14 (AF) >> (ST3000DM001-1CH166, firmware CC24) drive. It seems constantly >> under-performing. Does anyone know if it is so different from the >> ST3000DM001-9YN166 drives? Might be, I should just replace it? >> > A lot of information here. > > Those Hitachis are great drives. The addition of the Barracuda with > different performance characteristics could be part of the problem. I'm > glad you pointed out that the pool ashift=12 so we can try to rule that > out. I'd be quite interested in knowing if some or perhaps even all of > your issues go away simply by replacing that drive with another Hitachi. > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" I wanted to commend further on this. The Hitachi drives are only in the first vdev (da0-da5) together with that new Seagate Barracuda drive. However, I observe very irregular writing to all the vdevs, not just within the same vdev. 
Here is output of gstat -d with interval 1 second: dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 42 42 600 1.1 0 0 0.0 0 0 0.0 2.5 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 33 33 460 4.3 0 0 0.0 0 0 0.0 3.5 da5 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da6 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da7 0 30 30 656 2.1 0 0 0.0 0 0 0.0 3.0 da8 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da9 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da10 0 34 34 748 1.5 0 0 0.0 0 0 0.0 2.4 da11 0 43 43 1299 1.7 0 0 0.0 0 0 0.0 4.2 da12 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da13 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da14 0 41 41 1395 1.5 0 0 0.0 0 0 0.0 3.3 da15 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da16 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 1081 0 0 0.0 1081 14551 0.7 0 0 0.0 10.3 da20 0 124 0 0 0.0 97 286 0.5 25 273 3.7 1.2 ada0 0 119 0 0 0.0 92 286 0.4 25 273 3.5 1.1 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 24 501 0 0 0.0 501 18421 46.8 0 0 0.0 98.8 da0 24 690 0 0 0.0 690 34208 34.6 0 0 0.0 99.8 da1 24 691 0 0 0.0 691 33317 33.6 0 0 0.0 100.2 da2 24 750 0 0 0.0 750 37752 30.9 0 0 0.0 99.9 da3 24 672 0 0 0.0 672 32694 34.9 0 0 0.0 100.1 da4 24 722 0 0 0.0 722 36178 32.5 0 0 0.0 100.0 da5 24 633 0 0 0.0 633 9046 37.6 0 0 0.0 100.1 da6 24 601 0 0 0.0 601 8727 39.2 0 0 0.0 100.0 da7 24 620 0 0 0.0 620 9198 38.1 0 0 0.0 100.0 da8 24 619 0 0 0.0 619 8915 38.3 0 0 0.0 100.3 da9 24 539 0 0 0.0 539 7692 43.3 0 0 0.0 100.0 da10 24 715 0 0 0.0 715 10221 33.0 0 0 0.0 100.5 da11 24 584 0 0 0.0 584 44525 39.8 0 0 0.0 99.4 da12 24 543 0 0 0.0 543 41081 43.2 0 0 0.0 100.6 da13 24 523 0 0 0.0 523 40641 44.2 0 0 0.0 100.0 da14 24 521 0 0 0.0 521 40509 44.9 0 0 0.0 99.9 da15 24 505 0 0 0.0 505 40206 46.1 0 0 0.0 99.8 da16 24 524 0 0 0.0 524 40677 43.9 0 0 0.0 99.9 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 1082 0 0 0.0 1082 2941 0.2 0 0 0.0 6.5 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.000s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 24 507 0 0 0.0 507 30284 47.7 0 0 0.0 99.9 da0 24 625 0 0 0.0 625 36929 38.7 0 0 0.0 100.6 da1 24 724 0 0 0.0 724 44142 33.2 0 0 0.0 100.5 da2 24 775 0 0 0.0 775 53063 30.6 0 0 0.0 98.2 da3 24 630 0 0 0.0 630 40891 37.8 0 0 0.0 100.2 da4 24 698 0 0 0.0 698 38149 35.4 0 0 0.0 102.5 da5 24 784 0 0 0.0 784 11787 30.7 0 0 0.0 99.9 da6 24 707 0 0 0.0 707 10840 34.3 0 0 0.0 99.1 da7 24 689 0 0 0.0 689 10668 34.9 0 0 0.0 99.6 da8 24 635 0 0 0.0 635 9528 37.8 0 0 0.0 100.1 da9 24 669 0 0 0.0 669 10268 35.6 0 0 0.0 99.7 da10 24 675 0 0 0.0 675 10304 35.2 0 0 0.0 100.3 da11 24 507 0 0 0.0 507 23746 47.4 0 0 0.0 100.0 da12 24 476 0 0 0.0 476 24454 48.9 0 0 0.0 100.0 da13 24 495 0 0 0.0 495 31043 48.2 0 0 0.0 100.8 da14 24 582 0 0 0.0 582 34710 41.3 0 0 0.0 100.1 da15 24 592 0 0 0.0 592 34022 41.1 0 0 0.0 100.4 da16 24 559 0 0 0.0 559 34854 42.5 0 0 0.0 99.6 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.000s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 24 719 0 0 0.0 719 23063 33.0 0 0 0.0 99.2 da0 0 94 0 0 0.0 94 8274 43.6 0 0 0.0 16.0 da1 0 46 
0 0 0.0 46 3839 37.0 0 0 0.0 7.0 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 135 0 0 0.0 135 8966 39.3 0 0 0.0 21.9 da4 0 11 0 0 0.0 11 896 38.8 0 0 0.0 1.4 da5 24 648 0 0 0.0 648 9070 36.6 0 0 0.0 99.9 da6 24 679 0 0 0.0 679 9750 35.7 0 0 0.0 100.1 da7 24 686 0 0 0.0 686 9922 35.0 0 0 0.0 99.9 da8 24 666 0 0 0.0 666 9654 35.8 0 0 0.0 100.6 da9 24 682 0 0 0.0 682 9450 35.1 0 0 0.0 100.4 da10 24 700 0 0 0.0 700 9346 34.1 0 0 0.0 100.0 da11 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da12 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da13 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da14 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da15 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da16 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.000s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5 24 428 0 0 0.0 428 4207 55.6 0 0 0.0 100.6 da6 24 447 0 0 0.0 447 4279 52.9 0 0 0.0 100.4 da7 24 432 0 0 0.0 432 4087 55.6 0 0 0.0 100.5 da8 24 524 0 0 0.0 524 6243 45.3 0 0 0.0 99.6 da9 24 554 0 0 0.0 554 6379 43.2 0 0 0.0 100.0 da10 24 439 0 0 0.0 439 4611 54.0 0 0 0.0 97.9 da11 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da12 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da13 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da14 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da15 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da16 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.000s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5 24 350 0 0 0.0 350 3263 69.9 0 0 0.0 102.4 da6 24 326 0 0 0.0 326 3611 74.2 0 0 0.0 100.1 da7 24 335 0 0 0.0 335 3367 72.4 0 0 0.0 100.0 da8 24 329 0 0 0.0 329 2943 73.5 0 0 0.0 100.3 da9 24 326 0 0 0.0 326 2883 75.1 0 0 0.0 99.8 da10 24 369 0 0 0.0 369 2995 65.2 0 0 0.0 100.3 da11 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da12 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da13 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da14 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da15 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da16 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5 24 525 0 0 0.0 525 8507 45.3 0 0 0.0 100.1 da6 24 430 0 0 0.0 430 6761 55.6 0 0 0.0 101.7 da7 24 479 0 0 0.0 479 7548 50.1 0 0 0.0 100.6 da8 24 542 0 0 0.0 542 9463 44.3 0 0 0.0 100.2 da9 23 593 0 0 0.0 593 10386 40.6 0 0 0.0 100.1 da10 24 555 0 0 0.0 555 9678 42.9 0 0 0.0 98.0 da11 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da12 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da13 0 0 0 0 0.0 
0 0 0.0 0 0 0.0 0.0 da14 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da15 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da16 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5 24 566 0 0 0.0 566 9800 41.3 0 0 0.0 98.4 da6 24 526 0 0 0.0 526 12370 46.5 0 0 0.0 99.9 da7 24 577 0 0 0.0 577 13166 41.5 0 0 0.0 99.9 da8 24 538 0 0 0.0 538 11990 44.7 0 0 0.0 99.9 da9 24 631 0 0 0.0 631 12666 37.8 0 0 0.0 99.6 da10 24 650 0 0 0.0 650 12894 36.4 0 0 0.0 101.2 da11 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da12 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da13 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da14 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da15 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da16 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5 24 365 0 0 0.0 365 3604 66.3 0 0 0.0 100.0 da6 24 361 0 0 0.0 361 3724 65.8 0 0 0.0 100.2 da7 24 363 0 0 0.0 363 3680 65.4 0 0 0.0 99.5 da8 24 342 0 0 0.0 342 3500 69.4 0 0 0.0 100.8 da9 24 355 0 0 0.0 355 3460 70.1 0 0 0.0 101.1 da10 24 373 0 0 0.0 373 3616 65.0 0 0 0.0 99.5 da11 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da12 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da13 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da14 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da15 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da16 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.000s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5 24 539 0 0 0.0 539 4947 44.2 0 0 0.0 99.8 da6 24 468 0 0 0.0 468 12565 52.0 0 0 0.0 100.2 da7 24 493 0 0 0.0 493 10950 49.5 0 0 0.0 100.0 da8 24 450 0 0 0.0 450 12665 52.8 0 0 0.0 100.2 da9 24 528 0 0 0.0 528 11070 45.7 0 0 0.0 100.0 da10 24 542 0 0 0.0 542 10750 43.9 0 0 0.0 98.0 da11 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da12 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da13 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da14 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da15 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da16 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 39 0 0 0.0 38 2583 14.6 0 0 0.0 4.9 ada0 0 39 0 0 0.0 38 2583 14.5 0 0 0.0 4.9 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 0 0 0 0.0 0 0 
0.0 0 0 0.0 0.0 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5 24 367 0 0 0.0 367 7972 65.1 0 0 0.0 100.0 da6 24 360 0 0 0.0 360 5055 67.4 0 0 0.0 100.1 da7 24 345 0 0 0.0 345 6233 69.4 0 0 0.0 100.2 da8 24 359 0 0 0.0 359 5191 65.9 0 0 0.0 101.3 da9 24 383 0 0 0.0 383 7452 62.2 0 0 0.0 100.2 da10 24 368 0 0 0.0 368 7528 64.7 0 0 0.0 100.9 da11 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da12 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da13 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da14 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da15 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da16 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5 0 181 0 0 0.0 181 6534 68.1 0 0 0.0 50.3 da6 0 379 0 0 0.0 379 14071 57.0 0 0 0.0 90.5 da7 0 229 0 0 0.0 229 8680 65.1 0 0 0.0 64.1 da8 24 400 0 0 0.0 400 13871 60.5 0 0 0.0 99.8 da9 0 193 0 0 0.0 193 6874 57.7 0 0 0.0 45.4 da10 0 222 0 0 0.0 222 7753 68.2 0 0 0.0 65.0 da11 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da12 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da13 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da14 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da15 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da16 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 dT: 1.001s w: 1.000s filter: da[0-9]*$ L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name 0 332 202 10093 8.8 130 11620 13.3 0 0 0.0 51.2 da0 0 253 121 4381 2.1 132 11620 10.9 0 0 0.0 20.0 da1 0 280 152 4976 4.6 128 11612 11.2 0 0 0.0 28.9 da2 0 376 247 9705 3.9 129 11612 11.3 0 0 0.0 34.6 da3 0 244 115 5540 5.4 129 11628 11.0 0 0 0.0 27.7 da4 0 268 138 5896 7.6 130 11620 10.9 0 0 0.0 38.7 da5 1 467 273 10988 7.9 194 12627 12.2 0 0 0.0 60.4 da6 0 349 147 6583 8.7 202 12659 9.4 0 0 0.0 49.6 da7 5 368 169 6803 7.0 199 12647 10.7 0 0 0.0 48.7 da8 0 451 253 11020 8.4 198 12651 9.7 0 0 0.0 61.1 da9 0 306 104 4493 6.2 202 12647 10.4 0 0 0.0 31.7 da10 0 350 151 4765 5.6 199 12627 10.4 0 0 0.0 40.3 da11 9 366 258 11652 6.9 108 12455 10.3 0 0 0.0 47.1 da12 0 302 194 8126 5.2 108 12455 13.2 0 0 0.0 36.8 da13 0 292 186 8162 3.1 106 12447 12.9 0 0 0.0 30.0 da14 0 370 264 12627 9.7 106 12447 13.3 0 0 0.0 54.1 da15 0 182 72 3110 10.2 110 12459 9.8 0 0 0.0 28.1 da16 0 171 62 3206 8.5 109 12455 9.8 0 0 0.0 25.5 da17 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19 4 744 216 394 0.4 529 1540 0.1 0 0 0.0 4.0 da20 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ada1 As you can see, the initial burst is to all vdevs, saturating drives at 100%. Then vdev 3 completes, then the Hitachi drives of vdev 1 complete with the Seagate drive writing some more and then for few more seconds, only vdev 2 drives are writing. It seems the amount of data is the same, just vdev 2 writes the data slower. However, drives in vdev 2 and vdev 3 are the same. 
They should have the same performance characteristics (and as long as the drives are not 100% saturated, all vdevs complete more or less at the same time). At other times, some other vdev would complete last -- it is never the same vdev that is 'slow'. Could this be DDT/metadata specific issue? Is the DDT/metadata vdev-specific? The pool initially had only two vdevs and after vdev 3 was added, most of the written data had no dedup enabled. Also, the ZIL was added later and initial metadata could be fragmented. But.. why should this affect writing? The zpool is indeed pretty full, but performance should degrade for all vdevs (which are more or less equally full). Daniel From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 13:28:12 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 3CA77AED for ; Tue, 16 Jul 2013 13:28:12 +0000 (UTC) (envelope-from feld@freebsd.org) Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by mx1.freebsd.org (Postfix) with ESMTP id 1593F6E3 for ; Tue, 16 Jul 2013 13:28:11 +0000 (UTC) Received: from compute1.internal (compute1.nyi.mail.srv.osa [10.202.2.41]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id AB0A820DD9 for ; Tue, 16 Jul 2013 09:28:10 -0400 (EDT) Received: from web3 ([10.202.2.213]) by compute1.internal (MEProxy); Tue, 16 Jul 2013 09:28:10 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=message-id:from:to:mime-version :content-transfer-encoding:content-type:in-reply-to:references :subject:date; s=smtpout; bh=mddd0WTAXXbCrTook0FGMu24DSc=; b=n3z nXv31hzgW/bg+2diuacmdfPn4LE+Kw0+2hiqWqlevweTPJN6AQj2Yl/bKz3mHthp 8IiF87O7XpFSqVnf+HSbYl3Fe6wIkKrLl7PRI9bFt9XFTM5D8IumluWZcWO1vb8Z 71s6yVTVCJ7hIZw/d8eJi+isq7XnxoXbTEgXE0Pc= Received: by web3.nyi.mail.srv.osa (Postfix, from userid 99) id 8A324B00003; Tue, 16 Jul 2013 09:28:10 -0400 (EDT) Message-Id: <1373981290.1619.140661256268541.61E5E601@webmail.messagingengine.com> X-Sasl-Enc: JPgyAKDUogswwdNuepDwVrkVBIyjJ64y7196KMhTC34O 1373981290 From: Mark Felder To: freebsd-fs@freebsd.org MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain X-Mailer: MessagingEngine.com Webmail Interface - ajax-bdcdd1cb In-Reply-To: <51E54799.8070700@digsys.bg> References: <51E5316B.9070201@digsys.bg> <20130716115305.GA40918@mwi1.coffeenet.org> <51E54799.8070700@digsys.bg> Subject: Re: ZFS vdev I/O questions Date: Tue, 16 Jul 2013 08:28:10 -0500 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2013 13:28:12 -0000 On Tue, Jul 16, 2013, at 8:16, Daniel Kalchev wrote: > > Could this be DDT/metadata specific issue? Is the DDT/metadata > vdev-specific? The pool initially had only two vdevs and after vdev 3 > was added, most of the written data had no dedup enabled. Also, the ZIL > was added later and initial metadata could be fragmented. But.. why > should this affect writing? The zpool is indeed pretty full, but > performance should degrade for all vdevs (which are more or less equally > full). > I don't want to put you down the wrong path, but you're right -- the zpool is pretty full, and zfs is known to have issues writing when above ~80%. There's another thread where this was discussed briefly. 
However, you have quite a large pool so I find it hard to believe that your 3.45TB free is so fragmented that zfs is having issues choosing where to write. It's certainly possible, though. Hopefully someone will drop in their 2c as well From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 14:09:42 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id A217462D for ; Tue, 16 Jul 2013 14:09:42 +0000 (UTC) (envelope-from Ivailo.Tanusheff@skrill.com) Received: from db9outboundpool.messaging.microsoft.com (mail-db9lp0249.outbound.messaging.microsoft.com [213.199.154.249]) by mx1.freebsd.org (Postfix) with ESMTP id 0743A93E for ; Tue, 16 Jul 2013 14:09:41 +0000 (UTC) Received: from mail214-db9-R.bigfish.com (10.174.16.236) by DB9EHSOBE012.bigfish.com (10.174.14.75) with Microsoft SMTP Server id 14.1.225.22; Tue, 16 Jul 2013 14:09:33 +0000 Received: from mail214-db9 (localhost [127.0.0.1]) by mail214-db9-R.bigfish.com (Postfix) with ESMTP id A6E9F1E01D2; Tue, 16 Jul 2013 14:09:33 +0000 (UTC) X-Forefront-Antispam-Report: CIP:157.56.249.213; KIP:(null); UIP:(null); IPV:NLI; H:AM2PRD0710HT004.eurprd07.prod.outlook.com; RD:none; EFVD:NLI X-SpamScore: -2 X-BigFish: PS-2(zz98dI9371I542I1432Izz1f42h1ee6h1de0h1fdah2073h1202h1e76h1d1ah1d2ah1fc6hzz17326ah8275dhz2fh2a8h668h839h944hd24hf0ah1220h1288h12a5h12a9h12bdh137ah13b6h1441h1504h1537h153bh162dh1631h1758h18e1h1946h19b5h19ceh1ad9h1b0ah1d07h1d0ch1d2eh1d3fh1de9h1dfeh1dffh1e1dh9a9j1155h) Received-SPF: pass (mail214-db9: domain of skrill.com designates 157.56.249.213 as permitted sender) client-ip=157.56.249.213; envelope-from=Ivailo.Tanusheff@skrill.com; helo=AM2PRD0710HT004.eurprd07.prod.outlook.com ; .outlook.com ; X-Forefront-Antispam-Report-Untrusted: SFV:NSPM; SFS:(189002)(199002)(51704005)(24454002)(13464003)(377454003)(49866001)(74706001)(80022001)(54316002)(47736001)(74662001)(81342001)(74366001)(74502001)(76576001)(50986001)(76786001)(56816003)(16406001)(47976001)(74316001)(15202345003)(81542001)(53806001)(77096001)(76796001)(56776001)(51856001)(66066001)(33646001)(63696002)(74876001)(79102001)(54356001)(4396001)(69226001)(83072001)(46102001)(77982001)(31966008)(65816001)(47446002)(76482001)(59766001)(24736002); DIR:OUT; SFP:; SCL:1; SRVR:DB3PR07MB060; H:DB3PR07MB059.eurprd07.prod.outlook.com; CLIP:217.18.249.148; RD:InfoNoRecords; MX:1; A:1; LANG:en; Received: from mail214-db9 (localhost.localdomain [127.0.0.1]) by mail214-db9 (MessageSwitch) id 1373983771484888_2080; Tue, 16 Jul 2013 14:09:31 +0000 (UTC) Received: from DB9EHSMHS025.bigfish.com (unknown [10.174.16.254]) by mail214-db9.bigfish.com (Postfix) with ESMTP id 7190220006E; Tue, 16 Jul 2013 14:09:31 +0000 (UTC) Received: from AM2PRD0710HT004.eurprd07.prod.outlook.com (157.56.249.213) by DB9EHSMHS025.bigfish.com (10.174.14.35) with Microsoft SMTP Server (TLS) id 14.16.227.3; Tue, 16 Jul 2013 14:09:30 +0000 Received: from DB3PR07MB060.eurprd07.prod.outlook.com (10.242.137.151) by AM2PRD0710HT004.eurprd07.prod.outlook.com (10.255.165.39) with Microsoft SMTP Server (TLS) id 14.16.329.3; Tue, 16 Jul 2013 14:09:24 +0000 Received: from DB3PR07MB059.eurprd07.prod.outlook.com (10.242.137.149) by DB3PR07MB060.eurprd07.prod.outlook.com (10.242.137.151) with Microsoft SMTP Server (TLS) id 15.0.731.16; Tue, 16 Jul 2013 14:09:23 +0000 Received: from DB3PR07MB059.eurprd07.prod.outlook.com ([169.254.2.117]) by DB3PR07MB059.eurprd07.prod.outlook.com ([169.254.2.117]) with mapi id 
15.00.0731.000; Tue, 16 Jul 2013 14:09:23 +0000 From: Ivailo Tanusheff To: Daniel Kalchev , "freebsd-fs@freebsd.org" Subject: RE: ZFS vdev I/O questions Thread-Topic: ZFS vdev I/O questions Thread-Index: AQHOghmmLmdfZD/4b0i3hLA3+rJ4o5lnMdyAgAAXNoCAAA4g4A== Date: Tue, 16 Jul 2013 14:09:23 +0000 Message-ID: <9d3cf0be165d4351acc5e757de3868ec@DB3PR07MB059.eurprd07.prod.outlook.com> References: <51E5316B.9070201@digsys.bg> <20130716115305.GA40918@mwi1.coffeenet.org> <51E54799.8070700@digsys.bg> In-Reply-To: <51E54799.8070700@digsys.bg> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [217.18.249.148] x-forefront-prvs: 09090B6B69 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: skrill.com X-FOPE-CONNECTOR: Id%0$Dn%*$RO%0$TLS%0$FQDN%$TlsDn% X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2013 14:09:42 -0000 Hi danbo :) Isn't this some kind of pool fragmentation? Because this is usually the case in such slow parts of the disk systems. I think your pool is getting full and it is heavily fragmented, that's why you have more data for each request on a different vdev. But this has nothing to do with the single, slow device :( Best regards, Ivailo Tanusheff -----Original Message----- From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org] On Behalf Of Daniel Kalchev Sent: Tuesday, July 16, 2013 4:16 PM To: freebsd-fs@freebsd.org Subject: Re: ZFS vdev I/O questions On 16.07.13 14:53, Mark Felder wrote: > On Tue, Jul 16, 2013 at 02:41:31PM +0300, Daniel Kalchev wrote: >> I am observing some "strange" behaviour with I/O spread on ZFS vdevs >> and thought I might ask if someone has observed it too. >> > --SNIP-- > >> Drives da0-da5 were Hitachi Deskstar 7K3000 (Hitachi HDS723030ALA640, >> firmware MKAOA3B0) -- these are 512 byte sector drives, but da0 has >> been replaced by Seagate Barracuda 7200.14 (AF) (ST3000DM001-1CH166, >> firmware >> CC24) -- this is an 4k sector drive of a new generation (notice the >> relatively 'old' firmware, that can't be upgraded). > --SNIP-- > As you can see, the initial burst is to all vdevs, saturating drives at 100%. Then vdev 3 completes, then the Hitachi drives of vdev 1 complete with the Seagate drive writing some more and then for few more seconds, only vdev 2 drives are writing. It seems the amount of data is the same, just vdev 2 writes the data slower. However, drives in vdev 2 and vdev 3 are the same. They should have the same performance characteristics (and as long as the drives are not 100% saturated, all vdevs complete more or less at the same time). At other times, some other vdev would complete last -- it is never the same vdev that is 'slow'. Could this be DDT/metadata specific issue? Is the DDT/metadata vdev-specific? The pool initially had only two vdevs and after vdev 3 was added, most of the written data had no dedup enabled. Also, the ZIL was added later and initial metadata could be fragmented. But.. why should this affect writing? The zpool is indeed pretty full, but performance should degrade for all vdevs (which are more or less equally full). 
Daniel _______________________________________________ freebsd-fs@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 14:23:41 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 4DF269FC for ; Tue, 16 Jul 2013 14:23:41 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id CEE059DC for ; Tue, 16 Jul 2013 14:23:39 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [193.68.6.1]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r6GENbX7075932 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 16 Jul 2013 17:23:38 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <51E55769.4030207@digsys.bg> Date: Tue, 16 Jul 2013 17:23:37 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130627 Thunderbird/17.0.7 MIME-Version: 1.0 To: Ivailo Tanusheff Subject: Re: ZFS vdev I/O questions References: <51E5316B.9070201@digsys.bg> <20130716115305.GA40918@mwi1.coffeenet.org> <51E54799.8070700@digsys.bg> <9d3cf0be165d4351acc5e757de3868ec@DB3PR07MB059.eurprd07.prod.outlook.com> In-Reply-To: <9d3cf0be165d4351acc5e757de3868ec@DB3PR07MB059.eurprd07.prod.outlook.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2013 14:23:41 -0000 On 16.07.13 17:09, Ivailo Tanusheff wrote: > Isn't this some kind of pool fragmentation? Because this is usually the case in such slow parts of the disk systems. I think your pool is getting full and it is heavily fragmented, that's why you have more data for each request on a different vdev. The pool may be fragmented. But not because it is full. It is fragmented because I forgot to add an ZIL when creating the pool, then proceeded to heavily use dedup and even some compression. Now, I am rewriting the pool's data and hopefully metadata, in userland, for the lack of better technology, primarily by doing zfs send/receive of various datasets then removing the originals. That helps me both balance the data across all vdevs as well as get rid of dedup and compression (that go to other pools with less deletes). My guess is this is more specifically metadata fragmentation. But fragmentation does not fully explain why the writes are so irregular -- writes should be grouped easily, especially metadata rewrites... and what is ZFS doing while not reading or writing (many seconds)? Morale: always add an ZIL to an ZFS pool, as this will save you to deal with fragmentation later. Depending on the pool usage, even an normal drive could do. Writes to the ZIL are sequential. > But this has nothing to do with the single, slow device :( > That drive is slow only when doing lots of small I/O. For bulk writes (which ZFS should be doing anyway with the kind of data this pool holds), it is actually faster than the Hitachi's. It will eventually get replaced soon. 
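For the record, retrofitting a dedicated log device is a single operation. A rough sketch, with placeholder pool and device names, using a mirrored pair rather than a single disk:

# zpool add tank log mirror da21 da22
# zpool status tank

zpool status should then show a separate "logs" section for the pool.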
Daniel From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 15:33:36 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 3718CE53; Tue, 16 Jul 2013 15:33:36 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) by mx1.freebsd.org (Postfix) with ESMTP id 0E69FE91; Tue, 16 Jul 2013 15:33:36 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 2082DB917; Tue, 16 Jul 2013 11:33:34 -0400 (EDT) From: John Baldwin To: freebsd-fs@freebsd.org Subject: Re: RAID10 stripe size and PostgreSQL performance Date: Tue, 16 Jul 2013 11:04:53 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p25; KDE/4.5.5; amd64; ; ) References: In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201307161104.54089.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 16 Jul 2013 11:33:34 -0400 (EDT) Cc: freebsd-database@freebsd.org, Ivan Voras X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2013 15:33:36 -0000 On Friday, July 12, 2013 3:15:03 pm Artem Naluzhnyy wrote: > On Fri, Jul 12, 2013 at 4:55 PM, Ivan Voras wrote: > > I just looked at your RAID configuration at http://pastebin.com/F8uZEZdm > > and you have a mirror of stripes (RAID-01) nor a stripe of mirrors > > (RAID-10). And apparently, is I parse your configuration correctly, you > > have a 1M stripe in the MIRROR part of the RAID, and an unknown stripe > > size in the STRIPE part. > > This is probably a bug in mfiutil output. There is no "RAID 01" option > in the controller configuration, and its documentation says > (http://goo.gl/6X5pe): > > "RAID 10, a combination of RAID 0 and RAID 1, consists of striped data > across mirrored spans. A RAID 10 drive group is a spanned drive group > that creates a striped set from a series of mirrored drives. RAID 10 > allows a maximum of eight spans. You must use an even number of > configuration Scenarios 1-7 drives in each RAID virtual drive in the > span. The RAID 1 virtual drives must have the same stripe size." > > There is also no options to configure a different stripe size for the > mirrors, I can only set it globally for the whole RAID 10 volume. It is true that mfi only does stripes across RAID mirrors. mfiutil depends on the secondary raid level being set in the ddf info for detecting a RAID-10 vs a RAID-1, but not all mfi BIOS-configured volumes have that set. It should probably check if a volume spans multiple arrays instead. 
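To see what the firmware actually built, mfiutil can also dump the array/volume layout; roughly:

# mfiutil show config
# mfiutil show volumes

The config listing shows each array with its member drives and each volume with the array(s) it uses, which should make it clear whether a "RAID-10" volume really is a stripe over several mirrored arrays.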
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 19:33:40 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 97BF6C55; Tue, 16 Jul 2013 19:33:40 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 79B0ACF2; Tue, 16 Jul 2013 19:33:39 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id WAA15503; Tue, 16 Jul 2013 22:33:37 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1UzB0D-000Lxa-CR; Tue, 16 Jul 2013 22:33:37 +0300 Message-ID: <51E59FD9.4020103@FreeBSD.org> Date: Tue, 16 Jul 2013 22:32:41 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130708 Thunderbird/17.0.7 MIME-Version: 1.0 To: Adrian Chadd , freebsd-fs@FreeBSD.org Subject: Re: Deadlock in nullfs/zfs somewhere References: <51DCFEDA.1090901@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-current X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2013 19:33:40 -0000 on 10/07/2013 19:50 Adrian Chadd said the following: > On 9 July 2013 23:27, Andriy Gapon wrote: >> on 09/07/2013 16:03 Adrian Chadd said the following: >>> Does anyone have any ideas as to what's going on? >> >> Please provide output of 'thread apply all bt' from kgdb, then perhaps someone >> might be able to tell. > > Done - http://people.freebsd.org/~adrian/ath/20130710-vm0-zfs-hang.txt vmcore.0 was useless for some reason - an interesting address was not accessible. vmcore.1 seems to be very similar and is actually useful. This problem looks like an interesting deadlock involving ZFS and VFS and vnode shortage. The most obvious things are that many threads could not allocate a new vnode and are waiting in getnewvnode_reserve and also many threads are stuck waiting on vnode locks held by the former threads. In effect, they all wait for vnlru, which in turn is stuck in zfs_freebsd_reclaim on z_teardown_lock. That lock is held by a thread doing a rollback ioctl. And that thread waits for zfs sync thread to actually perform the rollback. The sync thread waits on zfs quiesce thread to declare the current transaction group as quiesced. The quiesce thread, obviously, waits for all operations running in the current transaction group to complete. Some of those operations are e.g. VOP_CREATE -> zfs_create. They already started a zfs transaction (as a part of the current transaction group) and they execute zfs_mknode which needs a new vnode. So these threads are waiting for a new vnode and do not let the current transaction group become quiesced. GOTO beginning. Compressing the above description to the extreme, it boils down to: ZFS needs a new vnode from vnlru and is waiting on it, while vnlru has to wait on ZFS. 
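On a live system the same chain can usually be spotted without a crash dump by printing kernel stacks for all threads and searching for the wait points mentioned above; for example (the grep pattern is only illustrative):

# procstat -kk -a | egrep 'getnewvnode|vnlru|zfs'

Many threads parked in getnewvnode_reserve, plus vnlru stuck inside zfs_freebsd_reclaim, would match the scenario described here.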
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 19:40:37 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4F3ADEEC; Tue, 16 Jul 2013 19:40:37 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-wi0-x233.google.com (mail-wi0-x233.google.com [IPv6:2a00:1450:400c:c05::233]) by mx1.freebsd.org (Postfix) with ESMTP id 8E8F2D66; Tue, 16 Jul 2013 19:40:36 +0000 (UTC) Received: by mail-wi0-f179.google.com with SMTP id hj3so1109517wib.12 for ; Tue, 16 Jul 2013 12:40:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=e4g4eB7EyD1Iooh4RReiuCMqz0WX8xTsz/RvmJVnyr4=; b=e9lvg/uR8ueYsWfKdCrYICdjf0cFt5NYEhsrzeH4PnXcbiYDJgFg25ONdoVSIBl+OU +ElnQ6nDnyTTCw6BEo6jyXhhXdmonxZcNh0FZ9C3YkfWaoNmE5aqwY+NiRNWOHFmxnjc x5lf20441lW/dagqWFajN7+lRYP/JBmN3bYA9P6lUEdX+YVe49BgSLPD6uq1nuGKnWRj RXAcgkJAbwZVTV86cW066V+f/A1++abFIvKrcCy9DcXFPfm/zSekwsbRzjmTzM6+vG0v mJ+0Kau6ID9NaeIlMzHubZTUnxGlX1neIrC972dPWlCnFcELVbtQMeiXhJRcLDkNTZpb ZxlA== MIME-Version: 1.0 X-Received: by 10.180.39.212 with SMTP id r20mr2286284wik.30.1374003635341; Tue, 16 Jul 2013 12:40:35 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.217.94.132 with HTTP; Tue, 16 Jul 2013 12:40:35 -0700 (PDT) In-Reply-To: <51E59FD9.4020103@FreeBSD.org> References: <51DCFEDA.1090901@FreeBSD.org> <51E59FD9.4020103@FreeBSD.org> Date: Tue, 16 Jul 2013 12:40:35 -0700 X-Google-Sender-Auth: GCnD91X04MXzJpmn1mauhv_YnNw Message-ID: Subject: Re: Deadlock in nullfs/zfs somewhere From: Adrian Chadd To: Andriy Gapon Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-fs@freebsd.org, freebsd-current X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2013 19:40:37 -0000 On 16 July 2013 12:32, Andriy Gapon wrote: > vmcore.0 was useless for some reason - an interesting address was not accessible. Eek. > vmcore.1 seems to be very similar and is actually useful. Oh good. > This problem looks like an interesting deadlock involving ZFS and VFS and vnode > shortage. > The most obvious things are that many threads could not allocate a new vnode and > are waiting in getnewvnode_reserve and also many threads are stuck waiting on > vnode locks held by the former threads. > In effect, they all wait for vnlru, which in turn is stuck in > zfs_freebsd_reclaim on z_teardown_lock. > That lock is held by a thread doing a rollback ioctl. > And that thread waits for zfs sync thread to actually perform the rollback. > The sync thread waits on zfs quiesce thread to declare the current transaction > group as quiesced. > The quiesce thread, obviously, waits for all operations running in the current > transaction group to complete. > Some of those operations are e.g. VOP_CREATE -> zfs_create. They already > started a zfs transaction (as a part of the current transaction group) and they > execute zfs_mknode which needs a new vnode. So these threads are waiting for a > new vnode and do not let the current transaction group become quiesced. > GOTO beginning. > > Compressing the above description to the extreme, it boils down to: ZFS needs a > new vnode from vnlru and is waiting on it, while vnlru has to wait on ZFS. 
:( So it's a deadlock. Ok, so what's next? -adrian From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 20:26:10 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 44BB7D6B for ; Tue, 16 Jul 2013 20:26:10 +0000 (UTC) (envelope-from javocado@gmail.com) Received: from mail-lb0-x22d.google.com (mail-lb0-x22d.google.com [IPv6:2a00:1450:4010:c04::22d]) by mx1.freebsd.org (Postfix) with ESMTP id C54C1F5C for ; Tue, 16 Jul 2013 20:26:09 +0000 (UTC) Received: by mail-lb0-f173.google.com with SMTP id v1so962541lbd.4 for ; Tue, 16 Jul 2013 13:26:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=STdwzaAPNi9jZB3iLbfPZJK29KnZSWsKkVMqYtxQplw=; b=cpAE+ftGI47DfaibXYo6lcFOsygTzmi+qPhuK67ceyuPiG42vIpIWMWfe5oGLWIugR PAQmoc0eefv6cTPBLEsNNj1uyfWPs+eMMOdyIL7nun4x1wpE+HYkFJ56ceZ8N64pfm2F 8WuSQN5vXST52xx4hwr+wlFfif+m8zy0lATdISnVw48NLxbVDo7lIi1FczbAtZf+AF9g ao1fUsuU32aQeByo7VA5GqnBKe5p3qQPk57eE/RhC4mYIqEfCrUH2nGbTZCOtuHOLacH NlpK1XjzCMK0Z7P0kyP7J4aFrZGvHR5RvEYrTjAXkBpj0rti8spQ29so3709QK1Ry47J 3fWQ== MIME-Version: 1.0 X-Received: by 10.152.27.9 with SMTP id p9mr1550507lag.4.1374006368684; Tue, 16 Jul 2013 13:26:08 -0700 (PDT) Received: by 10.114.98.42 with HTTP; Tue, 16 Jul 2013 13:26:08 -0700 (PDT) Date: Tue, 16 Jul 2013 13:26:08 -0700 Message-ID: Subject: ZFS memory exhaustion? From: javocado To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2013 20:26:10 -0000 I have a couple questions: - what does it look like when zfs needs more physical memory / is running out of memory for its operations? - what diagnostic numbers (vmstat, etc.) should I watch for that? swapinfo shows zero (basically zero) swap usage, so it doesn't look like things get that bad. 
Thanks From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 20:50:35 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id AFC9B3BB for ; Tue, 16 Jul 2013 20:50:35 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-qe0-x232.google.com (mail-qe0-x232.google.com [IPv6:2607:f8b0:400d:c02::232]) by mx1.freebsd.org (Postfix) with ESMTP id 7449BEF for ; Tue, 16 Jul 2013 20:50:35 +0000 (UTC) Received: by mail-qe0-f50.google.com with SMTP id f6so687080qej.37 for ; Tue, 16 Jul 2013 13:50:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=hIZPC6bzLhXC1kMfDLu3ymjyasMXmaZw7TeZgn7+9Yk=; b=j47h+yi5b7SZEMrWKfDOt1ng8DxYSByxfBMh+tKT7GkzbftVYcPMZJ2Ni8bRzlfDzg 5wYYrLgEPnmqYyjC+HnfIBSi2DH2lg31VPMHV8VAm24ASXiU4wdhgzW/gxAkll8LaGeX MKH5HYWdryrnJFk1pmRvV21zGKW72tOBLJThO/JsIER88o/jsV/icVU3yAy8wPYoRZ+O KyFH+sAPeQ6PbeOWIHv4Elpc91MB51Ky1GiE7sswLM7Y7+UvxOzdKQN+pNXTJwcjSAU0 OiC9JN4r4oAgpdwhxHDFLry/9H9hhFkWu9sbJeNELG+vRWP1QGOS/oPEPTvW/g6eTZzW lkWA== MIME-Version: 1.0 X-Received: by 10.224.79.14 with SMTP id n14mr5545663qak.114.1374007834974; Tue, 16 Jul 2013 13:50:34 -0700 (PDT) Received: by 10.49.49.135 with HTTP; Tue, 16 Jul 2013 13:50:34 -0700 (PDT) In-Reply-To: References: Date: Tue, 16 Jul 2013 13:50:34 -0700 Message-ID: Subject: Re: ZFS memory exhaustion? From: Freddie Cash To: javocado Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2013 20:50:35 -0000 On Tue, Jul 16, 2013 at 1:26 PM, javocado wrote: > I have a couple questions: > > - what does it look like when zfs needs more physical memory / is running > out of memory for its operations? > Disk I/O drops to 0, reading/writing any file from the pool appears to "hang" the console, programs already loaded into RAM continue to work so long as they don't touch the pool, etc. > > - what diagnostic numbers (vmstat, etc.) should I watch for that? > > Top output will show Wired at/near 100% of RAM. > swapinfo shows zero (basically zero) swap usage, so it doesn't look like > things get that bad. > > ZFS uses non-swappable kernel memory, so you won't ever see swap used when ZFS runs out of RAM. Those are the symptoms we've noticed when our ZFS systems have run out of RAM. 
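A couple of sysctls put numbers on that; for example:

# sysctl kstat.zfs.misc.arcstats.size vfs.zfs.arc_max
# sysctl vm.stats.vm.v_wire_count hw.physmem

kstat.zfs.misc.arcstats.size pressing against vfs.zfs.arc_max, together with wired memory (v_wire_count is in pages) approaching physical RAM, is the pattern described above.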
-- Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 21:55:05 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 8B745278 for ; Tue, 16 Jul 2013 21:55:05 +0000 (UTC) (envelope-from javocado@gmail.com) Received: from mail-la0-x231.google.com (mail-la0-x231.google.com [IPv6:2a00:1450:4010:c03::231]) by mx1.freebsd.org (Postfix) with ESMTP id 15383385 for ; Tue, 16 Jul 2013 21:55:04 +0000 (UTC) Received: by mail-la0-f49.google.com with SMTP id ea20so941602lab.22 for ; Tue, 16 Jul 2013 14:55:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=RyM3e43SBQoypfvbI0MCPx2WFRYOHVndSDQUETGCKNI=; b=bSvnIHDiuzoz61XNg0tipW+Fded7gVwEt3V/oV37s+VW+vzhIRrPWKsscDJ5n46w8Y 3tYEB8Y1Rnjlmhq8ytUlqaBjf+OKK/uMKQdeHERakuuAL+y04gWdsWME3CwPXP4JaIAJ VU7dLt/TTEu8BFKNnyqOx6sCkVu8keMfWscf+d7gVb+iUEtCE+Ad9V7mN20PXvRsuwd2 syqbaChJaZPbHqaKd6fmigXMufGuzxgTfzGV+PVtgUrft8mk3tAGFn4FfCO/9upqK5g0 jCVZYksSUCLZPGHuoduEnp7bY42BkvoHUx85pUjGE4R6Ce0IzdoMHtZyaLvgqrF3Vkq9 UBlw== MIME-Version: 1.0 X-Received: by 10.152.25.169 with SMTP id d9mr1594497lag.63.1374011704047; Tue, 16 Jul 2013 14:55:04 -0700 (PDT) Received: by 10.114.98.42 with HTTP; Tue, 16 Jul 2013 14:55:04 -0700 (PDT) In-Reply-To: References: Date: Tue, 16 Jul 2013 14:55:04 -0700 Message-ID: Subject: Re: ZFS memory exhaustion? From: javocado To: Freddie Cash Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2013 21:55:05 -0000 Thank you. Is gstat the best way to watch zfs i/o ? On Tue, Jul 16, 2013 at 1:50 PM, Freddie Cash wrote: > On Tue, Jul 16, 2013 at 1:26 PM, javocado wrote: > >> I have a couple questions: >> >> - what does it look like when zfs needs more physical memory / is running >> out of memory for its operations? >> > > Disk I/O drops to 0, reading/writing any file from the pool appears to > "hang" the console, programs already loaded into RAM continue to work so > long as they don't touch the pool, etc. > > >> >> - what diagnostic numbers (vmstat, etc.) should I watch for that? >> >> Top output will show Wired at/near 100% of RAM. > > >> swapinfo shows zero (basically zero) swap usage, so it doesn't look like >> things get that bad. >> >> > ZFS uses non-swappable kernel memory, so you won't ever see swap used when > ZFS runs out of RAM. > > Those are the symptoms we've noticed when our ZFS systems have run out of > RAM. 
> > > -- > Freddie Cash > fjwcash@gmail.com > From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 22:24:03 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id DA46EB49 for ; Tue, 16 Jul 2013 22:24:03 +0000 (UTC) (envelope-from gezeala@gmail.com) Received: from mail-la0-x22a.google.com (mail-la0-x22a.google.com [IPv6:2a00:1450:4010:c03::22a]) by mx1.freebsd.org (Postfix) with ESMTP id 63CFA67E for ; Tue, 16 Jul 2013 22:24:03 +0000 (UTC) Received: by mail-la0-f42.google.com with SMTP id eb20so990523lab.15 for ; Tue, 16 Jul 2013 15:24:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=JtFKpZrQ2tE+KnNo5PbghaEcbeKN152+dypnW3f4xTI=; b=Fy7GV+aHdVY5Jkgm70EQ8FxxBD6DLyahA859LrqjHmDp6o7dlJuypm5Fk4C8EMhc2m NZGF9o5E9be6MNSTIqp69Njbgc2Ct1Qn5Yo3qgFt73fcOc7K3Hp8PEyP7EfNJgKJGgKH 4Kp/SmpCd7dZIK9BdHbEd8zfK3L2aeoTQLyj9bMv4KiGgUlbKKldbUcZ2XkkKd4m5EMg bLG0ldqSs0QDH37Y8yLiHRvKtHdDFwNUtd6whkU2aRapaSfA2Kv18qSPPK+QHxM6PBkF PlCgw2Ykjj56bfziS23i4OwjiP6wz8yilaxFZbffFlInyBR77sZYr56l9IzFXqlrvhVB 9QKg== X-Received: by 10.152.19.70 with SMTP id c6mr1708656lae.13.1374013442273; Tue, 16 Jul 2013 15:24:02 -0700 (PDT) MIME-Version: 1.0 Received: by 10.114.82.72 with HTTP; Tue, 16 Jul 2013 15:23:22 -0700 (PDT) In-Reply-To: References: From: =?ISO-8859-1?Q?Gezeala_M=2E_Bacu=F1o_II?= Date: Tue, 16 Jul 2013 15:23:22 -0700 Message-ID: Subject: Re: ZFS memory exhaustion? To: javocado Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2013 22:24:03 -0000 On Tue, Jul 16, 2013 at 2:55 PM, javocado wrote: > Thank you. Is gstat the best way to watch zfs i/o ? > YMMV and stats you are looking for.. I use these a lot: zpool iostat 1 zpool iostat -v 1 zpool iostat -v _pool_name_ 1 zpool iostat -v _poolA_ _poolB_ 1 iostat -xz -w1 -h not much: systat -iostat ==> catch: there are several options although it still truncates output and stats displayed is limited with your screen size gstat From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 22:47:27 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id BDC6CF65 for ; Tue, 16 Jul 2013 22:47:27 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from melon.pingpong.net (melon.pingpong.net [79.136.116.200]) by mx1.freebsd.org (Postfix) with ESMTP id 5D0AD76A for ; Tue, 16 Jul 2013 22:47:27 +0000 (UTC) Received: from girgbook.lan (c-0f54e155.1525-1-64736c12.cust.bredbandsbolaget.se [85.225.84.15]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by melon.pingpong.net (Postfix) with ESMTPSA id BE2992E651; Wed, 17 Jul 2013 00:47:23 +0200 (CEST) Message-ID: <51E5CD7A.2020109@FreeBSD.org> Date: Wed, 17 Jul 2013 00:47:22 +0200 From: Palle Girgensohn User-Agent: Postbox 3.0.8 (Macintosh/20130427) MIME-Version: 1.0 To: Kirk McKusick Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?) 
References: <201307151932.r6FJWSxM087108@chez.mckusick.com> In-Reply-To: <201307151932.r6FJWSxM087108@chez.mckusick.com> X-Enigmail-Version: 1.2.3 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, Jeff Roberson , Julian Akehurst X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jul 2013 22:47:27 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Kirk McKusick skrev: >> Date: Mon, 15 Jul 2013 10:51:10 +0100 Subject: Re: leaking lots of >> unreferenced inodes (pg_xlog files?) From: Dan Thomas >> To: Kirk McKusick Cc: >> Palle Girgensohn , freebsd-fs@freebsd.org, Jeff >> Roberson , Julian Akehurst >> X-ASK-Info: Message Queued (2013/07/15 >> 02:51:22) X-ASK-Info: Confirmed by User (2013/07/15 02:55:04) >> >> On 11 June 2013 01:17, Kirk McKusick >> wrote: >>> OK, good to have it narrowed down. I will look to devise some >>> additional diagnostics that hopefully will help tease out the >>> bug. I'll hopefully get back to you soon. >> Hi, >> >> Is there any news on this issue? We're still running several >> servers that are exhibiting this problem (most recently, one that >> seems to be leaking around 10gb/hour), and it's getting to the >> point where we're looking at moving to a different OS until it's >> resolved. >> >> We have access to several production systems with this problem and >> (at least from time to time) will have systems with a significant >> leak on them that we can experiment with. Is there any way we can >> assist with tracking this down? Any diagnostics or testing that >> would be useful? >> >> Thanks, Dan > > Hi Dan (and Palle), > > Sorry for the long delay with no help / news. I have gotten > side-tracked on several projects and have had little time to try and > devise some tests that would help find the cause of the lost space. > It almost certainly is a one-line fix (a missing vput or vrele > probably in some error path), but finding where it goes is the hard > part :-) > > I have had little success in inserting code that tracks reference > counts (too many false positives). So, I am going to need some help > from you to narrow it down. My belief is that there is some set of > filesystem operations (system calls) that are leading to the > problem. Notably, a file is being created, data put into it, then the > file is deleted (either before or after being closed). Somehow a > reference to that file is persisting despite there being no valid > reference to it. Hence the filesystem thinks it is still live and is > not deleting it. When you do the forcible unmount, these files get > cleared and the space shows back up. > > What I need to devise is a small test program doing the set of system > calls that cause this to happen. The way that I would like to try and > get it is to have you `ktrace -i' your application and then run your > application just long enough to create at least one of these lost > files. The goal is to minimize the amount of ktrace data through > which we need to sift. > > In preparation for doing this test you need to have a kernel compiled > with `option DIAGNOSTIC' or if you prefer, just add `#define > DIAGNOSTIC 1' to the top of sys/kern/vfs_subr.c. You will know you > have at least one offending file when you try to unmount the affected > filesystem and find it busy. Before doing the `umount -f', enable > busy printing using `sysctl debug.busyprt=1'. 
Then capture the > console output which will show the details of all the vnodes that had > to be forcibly flushed. Hopefully we will then be able to correlate > them back to the files (NAMI in the ktrace output) with which they > were associated. We may need to augment the NAMI data with the inode > number of the associated file to make the association with the > busyprt output. Anyway, once we have that, we can look at all the > system calls done on those files and create a small test program that > exhibits the problem. Given a small test program, Jeff or I can track > down the offending system call path and nail this pernicious bug once > and for all. > > Kirk McKusick Hi, I have run ktrace -i on pg_ctl (which forks off all the postgresql processes) and I got two "busy" files that where "lost" after a few hours. dmesg reveals this: vflush: busy vnode 0xfffffe067cdde960: tag ufs, type VREG usecount 1, writecount 0, refcount 2 mountedhere 0 flags (VI(0x200)) VI_LOCKed v_object 0xfffffe0335922000 ref 0 pages 0 lock type ufs: EXCL by thread 0xfffffe01600eb8e0 (pid 56723) ino 11047146, on dev da2s1d vflush: busy vnode 0xfffffe039f35bb40: tag ufs, type VREG usecount 1, writecount 0, refcount 3 mountedhere 0 flags (VI(0x200)) VI_LOCKed v_object 0xfffffe03352701d0 ref 0 pages 0 lock type ufs: EXCL by thread 0xfffffe01600eb8e0 (pid 56723) ino 11045961, on dev da2s1d I had to umount -f, so they where "lost". So, now I have 55 GB ktrace output... ;) Is there anything I can do to filter it, or shall I compress it and put it on a web server for you to fetch as it is? Palle -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJR5c16AAoJEIhV+7FrxBJDK0AH/RLG1QLdyQhwNC6USlqO2+2B 6HXmYwbmDCMIlUQZAaG4h0x6QPzWjXWYMa1KDdpk/BtRhfL7z8tFPdWjTzqBPuK1 aEEQjv/Cp5IgI6FqVbc2agW3GfUwomtjEL3lUk2zmKdPImEWte6ZkLzOFgQpqQao QAxFnN0I8/g+ynQNQIavGOo0foze89wAuOaNvoy9z1wa7tFbjlH2lsVK1xGU6eNj AQn4RJw+tMPMGkNMy6Xjy7B/WMXfxutz1f4O9B1KBwLRZ/cgKxhmppoZdF3N4JsK GNiQvcRbYR9GhBiK+Er87UXKBcj2NS+QQsdSqIb5Ik1ahp78hjxq3raHuOLCTLw= =8+W4 -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 00:29:36 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B14CC2E0 for ; Wed, 17 Jul 2013 00:29:36 +0000 (UTC) (envelope-from gezeala@gmail.com) Received: from mail-la0-x234.google.com (mail-la0-x234.google.com [IPv6:2a00:1450:4010:c03::234]) by mx1.freebsd.org (Postfix) with ESMTP id 3518BA17 for ; Wed, 17 Jul 2013 00:29:36 +0000 (UTC) Received: by mail-la0-f52.google.com with SMTP id fo12so1044101lab.11 for ; Tue, 16 Jul 2013 17:29:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=WTqss1WK/2Z2odUxS2+sI6OJu5G5J7nMzHOlcf4qoJA=; b=b1ZvuxDNkvIKtIqpnoSoJktpQR+RLzlelVswps8YgdL27f/JquItSP98n9M2ZmJfJY tJMIrJe8Qu+Sp/cAvNqPskCdFocxkK/RDYjegYs9+wVJ2VO6GweSjuCrrJCXNSWm+r6c HSM259UkyhJFtvqjx+AtrDPHIIvAwUEUQ5HXNJU9xiXoij+cb5QM+M4gWex3AbZZQElq meh01OdT0URnqMdPUgaqXwotGc1Nk6+GrfpYE496BdZBWi2ifKU11KFR6iIZnirZZsKt r6Bczcgjq3hUNHBnCLUV+DhHQfF6Vh438uNif5O+49/YuM5DzX4TKYk2+oR+aCNXEYRI +m5A== X-Received: by 10.112.167.100 with SMTP id zn4mr2143351lbb.44.1374020975173; Tue, 16 Jul 2013 17:29:35 -0700 (PDT) MIME-Version: 1.0 Received: by 10.114.82.72 with HTTP; 
Tue, 16 Jul 2013 17:28:55 -0700 (PDT) In-Reply-To: References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> <3CFB4564D8EB4A6A9BCE2AFCC5B6E400@multiplay.co.uk> <51D6A206.2020303@digsys.bg> From: =?ISO-8859-1?Q?Gezeala_M=2E_Bacu=F1o_II?= Date: Tue, 16 Jul 2013 17:28:55 -0700 Message-ID: Subject: Re: Slow resilvering with mirrored ZIL To: Freddie Cash Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 00:29:36 -0000 On Fri, Jul 5, 2013 at 6:08 PM, Freddie Cash wrote: > > ZFS- on-Linux has added this as "-o ashift=" property for zpool create. > > There's a threat on the illumos list about standardising this s across all > ZFS- using OSes. > > > +1 on this. We tested zfs-on-linux last year and it does automatically handle disk partitioning for correct alignment. What we do is just add ashift=12 option during zpool create. No more gpart/gnop/ashift/import steps. http://zfsonlinux.org/faq.html#HowDoesZFSonLinuxHandlesAdvacedFormatDrives Back to FreeBSD ZFS, After reading the thread, I'm still at a loss on this (too much info I guess).. regarding gpart/gnop/ashift tweaks for alignment, do we still need to perform gpart on newly purchased (SSD/SATA/SAS) Advanced Format drives? Or, skip gpart and proceed with gnop/ashift only? 
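For comparison, the manual FreeBSD sequence being discussed usually looks roughly like this (disk, label and pool names are only placeholders):

# gpart create -s gpt da0
# gpart add -t freebsd-zfs -a 1m -l disk0 da0
# gnop create -S 4096 /dev/gpt/disk0
# zpool create tank /dev/gpt/disk0.nop
# zpool export tank
# gnop destroy /dev/gpt/disk0.nop
# zpool import tank
# zdb -C tank | grep ashift

The -a 1m flag aligns the partition, the temporary .nop provider makes zpool create pick ashift=12, and the final zdb line is just to verify the result.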
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 01:47:54 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B25F7CB1 for ; Wed, 17 Jul 2013 01:47:54 +0000 (UTC) (envelope-from wblock@wonkity.com) Received: from wonkity.com (wonkity.com [67.158.26.137]) by mx1.freebsd.org (Postfix) with ESMTP id 532EEBD5 for ; Wed, 17 Jul 2013 01:47:54 +0000 (UTC) Received: from wonkity.com (localhost [127.0.0.1]) by wonkity.com (8.14.7/8.14.7) with ESMTP id r6H1lrRE084549; Tue, 16 Jul 2013 19:47:53 -0600 (MDT) (envelope-from wblock@wonkity.com) Received: from localhost (wblock@localhost) by wonkity.com (8.14.7/8.14.7/Submit) with ESMTP id r6H1lrCT084546; Tue, 16 Jul 2013 19:47:53 -0600 (MDT) (envelope-from wblock@wonkity.com) Date: Tue, 16 Jul 2013 19:47:53 -0600 (MDT) From: Warren Block To: =?ISO-8859-15?Q?Gezeala_M=2E_Bacu=F1o_II?= Subject: Re: Slow resilvering with mirrored ZIL In-Reply-To: Message-ID: References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> <3CFB4564D8EB4A6A9BCE2AFCC5B6E400@multiplay.co.uk> <51D6A206.2020303@digsys.bg> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="3512871622-1111948915-1374025673=:84500" X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (wonkity.com [127.0.0.1]); Tue, 16 Jul 2013 19:47:53 -0600 (MDT) Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 01:47:54 -0000 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --3512871622-1111948915-1374025673=:84500 Content-Type: TEXT/PLAIN; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 8BIT On Tue, 16 Jul 2013, Gezeala M. Bacuño II wrote: > On Fri, Jul 5, 2013 at 6:08 PM, Freddie Cash wrote: > >> >> ZFS- on-Linux has added this as "-o ashift=" property for zpool create. >> >> There's a threat on the illumos list about standardising this s across all >> ZFS- using OSes. >> >> >> > +1 on this. We tested zfs-on-linux last year and it does automatically > handle disk partitioning for correct alignment. What we do is just add > ashift=12 option during zpool create. No more gpart/gnop/ashift/import > steps. > > http://zfsonlinux.org/faq.html#HowDoesZFSonLinuxHandlesAdvacedFormatDrives > > > Back to FreeBSD ZFS, > > After reading the thread, I'm still at a loss on this (too much info I > guess).. regarding gpart/gnop/ashift tweaks for alignment, do we still need > to perform gpart on newly purchased (SSD/SATA/SAS) Advanced Format drives? > Or, skip gpart and proceed with gnop/ashift only? If ZFS goes on a bare drive, it will be aligned by default. If ZFS is going in a partition, yes, align that partition to 4K boundaries or larger multiples of 4K, like 1M. The gnop/ashift workaround is just to get ZFS to use the right block size. 
So if you don't take care to get partition alignment right, you might end up using the right block size but misaligned. And yes, it will be nice to be able to just explicitly tell ZFS the block size to use. --3512871622-1111948915-1374025673=:84500-- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 05:34:42 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id F127B2C4; Wed, 17 Jul 2013 05:34:42 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 222EA6A4; Wed, 17 Jul 2013 05:34:41 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id r6H5YWki097217; Wed, 17 Jul 2013 08:34:32 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua r6H5YWki097217 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id r6H5YWcq097216; Wed, 17 Jul 2013 08:34:32 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 17 Jul 2013 08:34:31 +0300 From: Konstantin Belousov To: Palle Girgensohn Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?) Message-ID: <20130717053431.GN5991@kib.kiev.ua> References: <201307151932.r6FJWSxM087108@chez.mckusick.com> <51E5CD7A.2020109@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="LHvWgpbS7VDUdu2f" Content-Disposition: inline In-Reply-To: <51E5CD7A.2020109@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: Kirk McKusick , freebsd-fs@freebsd.org, Jeff Roberson , Julian Akehurst X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 05:34:43 -0000 --LHvWgpbS7VDUdu2f Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jul 17, 2013 at 12:47:22AM +0200, Palle Girgensohn wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 >=20 > Kirk McKusick skrev: > >> Date: Mon, 15 Jul 2013 10:51:10 +0100 Subject: Re: leaking lots of > >> unreferenced inodes (pg_xlog files?) From: Dan Thomas > >> To: Kirk McKusick Cc: > >> Palle Girgensohn , freebsd-fs@freebsd.org, Jeff > >> Roberson , Julian Akehurst > >> X-ASK-Info: Message Queued (2013/07/15 > >> 02:51:22) X-ASK-Info: Confirmed by User (2013/07/15 02:55:04) > >>=20 > >> On 11 June 2013 01:17, Kirk McKusick > >> wrote: > >>> OK, good to have it narrowed down. I will look to devise some=20 > >>> additional diagnostics that hopefully will help tease out the=20 > >>> bug. I'll hopefully get back to you soon. > >> Hi, > >>=20 > >> Is there any news on this issue? We're still running several > >> servers that are exhibiting this problem (most recently, one that > >> seems to be leaking around 10gb/hour), and it's getting to the > >> point where we're looking at moving to a different OS until it's > >> resolved. 
> >>=20 > >> We have access to several production systems with this problem and > >> (at least from time to time) will have systems with a significant > >> leak on them that we can experiment with. Is there any way we can > >> assist with tracking this down? Any diagnostics or testing that > >> would be useful? > >>=20 > >> Thanks, Dan > >=20 > > Hi Dan (and Palle), > >=20 > > Sorry for the long delay with no help / news. I have gotten=20 > > side-tracked on several projects and have had little time to try and > > devise some tests that would help find the cause of the lost space. > > It almost certainly is a one-line fix (a missing vput or vrele > > probably in some error path), but finding where it goes is the hard > > part :-) > >=20 > > I have had little success in inserting code that tracks reference=20 > > counts (too many false positives). So, I am going to need some help=20 > > from you to narrow it down. My belief is that there is some set of=20 > > filesystem operations (system calls) that are leading to the > > problem. Notably, a file is being created, data put into it, then the > > file is deleted (either before or after being closed). Somehow a > > reference to that file is persisting despite there being no valid > > reference to it. Hence the filesystem thinks it is still live and is > > not deleting it. When you do the forcible unmount, these files get=20 > > cleared and the space shows back up. > >=20 > > What I need to devise is a small test program doing the set of system > > calls that cause this to happen. The way that I would like to try and > > get it is to have you `ktrace -i' your application and then run your > > application just long enough to create at least one of these lost > > files. The goal is to minimize the amount of ktrace data through > > which we need to sift. > >=20 > > In preparation for doing this test you need to have a kernel compiled > > with `option DIAGNOSTIC' or if you prefer, just add `#define > > DIAGNOSTIC 1' to the top of sys/kern/vfs_subr.c. You will know you > > have at least one offending file when you try to unmount the affected > > filesystem and find it busy. Before doing the `umount -f', enable > > busy printing using `sysctl debug.busyprt=3D1'. Then capture the > > console output which will show the details of all the vnodes that had > > to be forcibly flushed. Hopefully we will then be able to correlate > > them back to the files (NAMI in the ktrace output) with which they > > were associated. We may need to augment the NAMI data with the inode > > number of the associated file to make the association with the > > busyprt output. Anyway, once we have that, we can look at all the > > system calls done on those files and create a small test program that > > exhibits the problem. Given a small test program, Jeff or I can track > > down the offending system call path and nail this pernicious bug once > > and for all. > >=20 > > Kirk McKusick >=20 > Hi, >=20 > I have run ktrace -i on pg_ctl (which forks off all the postgresql > processes) and I got two "busy" files that where "lost" after a few > hours. 
dmesg reveals this: >=20 > vflush: busy vnode > 0xfffffe067cdde960: tag ufs, type VREG > usecount 1, writecount 0, refcount 2 mountedhere 0 > flags (VI(0x200)) > VI_LOCKed v_object 0xfffffe0335922000 ref 0 pages 0 > lock type ufs: EXCL by thread 0xfffffe01600eb8e0 (pid 56723) > ino 11047146, on dev da2s1d > vflush: busy vnode > 0xfffffe039f35bb40: tag ufs, type VREG > usecount 1, writecount 0, refcount 3 mountedhere 0 > flags (VI(0x200)) > VI_LOCKed v_object 0xfffffe03352701d0 ref 0 pages 0 > lock type ufs: EXCL by thread 0xfffffe01600eb8e0 (pid 56723) > ino 11045961, on dev da2s1d >=20 >=20 > I had to umount -f, so they where "lost". >=20 > So, now I have 55 GB ktrace output... ;) Is there anything I can do to > filter it, or shall I compress it and put it on a web server for you to > fetch as it is? I think that 55GB of ktrace is obviously useless. The Kirk' idea was to have an isolated test case that would only create the situation triggering the leak, without irrelevant activity. This indeed requires drilling down and isolating the file activities to get to the core of problem. FWIW, I and Peter Holm used the following alternative approach quite successfully when tracking down other vnode reference leaks. The approach still requires some understanding of the specifics of the problematic files to be useful, but not as much as isolated test. Basically, you take the patch below, and set the VV_DEBUGVREF flag for the vnode that has characteristics as much specific for the leaked vnode as possible. The patch has example of setting the flag for all new NFS=20 vnodes. You would probably want to do the same in vfs_vgetf(), checking e.g. for the partition where your leaks happen. The limiting of the vnodes for which the vref traces are accumulated is needed to save the kernel memory. Then after the leak was observed, you just print the vnode with ddb command 'show vnode addr' and send the output to developer. Index: sys/sys/vnode.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/sys/vnode.h (revision 248723) +++ sys/sys/vnode.h (working copy) @@ -94,6 +94,13 @@ struct vpollinfo { =20 #if defined(_KERNEL) || defined(_KVM_VNODE) =20 +struct debug_ref { + TAILQ_ENTRY(debug_ref) link; + int val; + const char *op; + struct stack stack; +}; + struct vnode { /* * Fields which define the identity of the vnode. These fields are @@ -169,6 +176,7 @@ struct vnode { int v_writecount; /* v ref count of writers */ u_int v_hash; enum vtype v_type; /* u vnode type */ + TAILQ_HEAD(, debug_ref) v_debug_ref; }; =20 #endif /* defined(_KERNEL) || defined(_KVM_VNODE) */ @@ -253,6 +261,7 @@ struct xvnode { #define VV_DELETED 0x0400 /* should be removed */ #define VV_MD 0x0800 /* vnode backs the md device */ #define VV_FORCEINSMQ 0x1000 /* force the insmntque to succeed */ +#define VV_DEBUGVREF 0x2000 =20 /* * Vnode attributes. 
A field value of VNOVAL represents a field whose val= ue Index: sys/kern/vfs_subr.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/kern/vfs_subr.c (revision 248723) +++ sys/kern/vfs_subr.c (working copy) @@ -71,6 +71,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #include #include @@ -871,6 +872,23 @@ static struct kproc_desc vnlru_kp =3D { }; SYSINIT(vnlru, SI_SUB_KTHREAD_UPDATE, SI_ORDER_FIRST, kproc_start, &vnlru_kp); + +MALLOC_DEFINE(M_RECORD_REF, "recordref", "recordref"); +static void +v_record_ref(struct vnode *vp, int val, const char *op) +{ + struct debug_ref *r; + + if ((vp->v_type !=3D VREG && vp->v_type !=3D VBAD) || + (vp->v_vflag & VV_DEBUGVREF) =3D=3D 0) + return; + r =3D malloc(sizeof(struct debug_ref), M_RECORD_REF, M_NOWAIT | + M_USE_RESERVE); + r->val =3D val; + r->op =3D op; + stack_save(&r->stack); + TAILQ_INSERT_TAIL(&vp->v_debug_ref, r, link); +} =20 /* * Routines having to do with the management of the vnode table. @@ -1073,6 +1091,7 @@ alloc: vp->v_vflag |=3D VV_NOKNOTE; } rangelock_init(&vp->v_rl); + TAILQ_INIT(&vp->v_debug_ref); =20 /* * For the filesystems which do not use vfs_hash_insert(), @@ -1082,6 +1101,7 @@ alloc: */ vp->v_hash =3D (uintptr_t)vp >> vnsz2log; =20 + TAILQ_INIT(&vp->v_debug_ref); *vpp =3D vp; return (0); } @@ -2197,6 +2217,7 @@ vget(struct vnode *vp, int flags, struct thread *t vinactive(vp, td); vp->v_iflag &=3D ~VI_OWEINACT; } + v_record_ref(vp, 1, "vget"); VI_UNLOCK(vp); return (0); } @@ -2211,6 +2232,7 @@ vref(struct vnode *vp) CTR2(KTR_VFS, "%s: vp %p", __func__, vp); VI_LOCK(vp); v_incr_usecount(vp); + v_record_ref(vp, 1, "vref"); VI_UNLOCK(vp); } =20 @@ -2253,6 +2275,7 @@ vputx(struct vnode *vp, int func) KASSERT(func =3D=3D VPUTX_VRELE, ("vputx: wrong func")); CTR2(KTR_VFS, "%s: vp %p", __func__, vp); VI_LOCK(vp); + v_record_ref(vp, -1, "vputx"); =20 /* Skip this v_writecount check if we're going to panic below. */ VNASSERT(vp->v_writecount < vp->v_usecount || vp->v_usecount < 1, vp, @@ -2409,6 +2432,7 @@ void vdropl(struct vnode *vp) { struct bufobj *bo; + struct debug_ref *r, *r1; struct mount *mp; int active; =20 @@ -2489,6 +2513,9 @@ vdropl(struct vnode *vp) lockdestroy(vp->v_vnlock); mtx_destroy(&vp->v_interlock); mtx_destroy(BO_MTX(bo)); + TAILQ_FOREACH_SAFE(r, &vp->v_debug_ref, link, r1) { + free(r, M_RECORD_REF); + } uma_zfree(vnode_zone, vp); } =20 @@ -2888,6 +2915,8 @@ vn_printf(struct vnode *vp, const char *fmt, ...) va_list ap; char buf[256], buf2[16]; u_long flags; + int ref; + struct debug_ref *r; =20 va_start(ap, fmt); vprintf(fmt, ap); @@ -2960,8 +2989,21 @@ vn_printf(struct vnode *vp, const char *fmt, ...) 
vp->v_object->resident_page_count); printf(" "); lockmgr_printinfo(vp->v_vnlock); - if (vp->v_data !=3D NULL) - VOP_PRINT(vp); +#if DDB + if (kdb_active) { + if (vp->v_data !=3D NULL) + VOP_PRINT(vp); + } +#endif + + /* Getnewvnode() initial reference is not recorded due to VNON */ + ref =3D 1; + TAILQ_FOREACH(r, &vp->v_debug_ref, link) { + ref +=3D r->val; + printf("REF %d %s\n", ref, r->op); + stack_print(&r->stack); + } + } =20 #ifdef DDB Index: sys/fs/nfsclient/nfs_clport.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/fs/nfsclient/nfs_clport.c (revision 248723) +++ sys/fs/nfsclient/nfs_clport.c (working copy) @@ -273,6 +273,7 @@ nfscl_nget(struct mount *mntp, struct vnode *dvp, /* vfs_hash_insert() vput()'s the losing vnode */ return (0); } + vp->v_vflag |=3D VV_DEBUGVREF; *npp =3D np; =20 return (0); Index: sys/fs/nfsclient/nfs_clnode.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/fs/nfsclient/nfs_clnode.c (revision 248723) +++ sys/fs/nfsclient/nfs_clnode.c (working copy) @@ -179,6 +179,7 @@ ncl_nget(struct mount *mntp, u_int8_t *fhp, int fh /* vfs_hash_insert() vput()'s the losing vnode */ return (0); } + vp->v_vflag |=3D VV_DEBUGVREF; *npp =3D np; =20 return (0); --LHvWgpbS7VDUdu2f Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJR5iznAAoJEJDCuSvBvK1BNLMP/1CiRpEvV7G5anywsWJTDMRl ZyrXwXUUSjWnN7gAb3sHz8JpJf86AW57e7z8V6YBlZCZh7D7SeVwh5UaN7onuOzN D1RQPmR2AMcU6eybvJwt2fuMvgaTRoncyPN5YdyQK/jWUpdFCoNFxh5wD5RAyIne wFxQoeF9XFop+kMIBvcT92r5qM6jZIaNwzAChggkBAoh/r9b/DH9WXlUlX/tj+JN ZJ34yHhCvz1hnRD5hJVMvkYGauZSv2J+0TYS8FCvLXCafOKDtwty8OjnfMGsJJOb GQ18mEgjpsxJ6lZCvRTsjgrXbgkhHhrIxITzl914cRFPAFnjyLa5lXVsWwO+tdUZ fQ2PVtFVIX1x5c7sl+UPMEpRWZU9gNs59zBEybDR8vwxyBnbrMyLdOhreZSLaWE/ 6KiGKggAg65XNz6DBrhFlJaxZbXH2zlTBeAs1ZThQtclCL9u6jWrKjT3kM7rWmOy 8MD3SrLmV1nSJMRJ62emMGHlDmtHzFfLhT0/1JlLnHKuacXcJdL9dsLZW3t14Qm+ IvjJOAtId5nt4d+e/7RDstGi5ItjOsGmiM1szp2N5tbTrbB6WNmGbEiIJskxn5g7 Ba+GDD9goYCwsY/2AlL7T33QgCeblya9V9cf9j1TTOTjWubQWbzpNe+iMJJvG/le Lym+qBFczH7WrR0VW1kA =bhM7 -----END PGP SIGNATURE----- --LHvWgpbS7VDUdu2f-- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 06:37:33 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E1865C86 for ; Wed, 17 Jul 2013 06:37:33 +0000 (UTC) (envelope-from hostdl@gmail.com) Received: from mail-ve0-x230.google.com (mail-ve0-x230.google.com [IPv6:2607:f8b0:400c:c01::230]) by mx1.freebsd.org (Postfix) with ESMTP id A9F8E89E for ; Wed, 17 Jul 2013 06:37:33 +0000 (UTC) Received: by mail-ve0-f176.google.com with SMTP id c13so1187005vea.7 for ; Tue, 16 Jul 2013 23:37:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=Ad9RO0udsfuUu6SrcRSdyW2Ij5tx9LUFUQkVQdv6lLQ=; b=o16OpoLlaDUu42ZlspD+XnK+mdoSclLskI60MF5pfw8RQ3CUBfUmzATFcBs/zvLxtU GER93dPHsyyWasTytnkalKW33qO2v3xyJ9EyDV8+YiJfIGEY5XhZKCtasg4yqzXa0dBJ jnv6doMcJ3e6XEX2EImC3NGrQRgM5OEkkjUw8qNVYdPB7VX5SmSkkiXObN8V2ePdRJIE CDn4CySpaZmT3x/5d0A0+/+aMzKqYk6PTl3EUBp+9nHr2FIJTwOoAyKkp3Lq3F+k23ax 
Z8slmwjxUAy8vZDpnLlRGSB+qzHOr5z47F8p2lajkyyDsWrGuJaPTgTL/7dwKgEOooS/ H7UQ== MIME-Version: 1.0 X-Received: by 10.52.237.164 with SMTP id vd4mr1306105vdc.118.1374043053120; Tue, 16 Jul 2013 23:37:33 -0700 (PDT) Received: by 10.59.6.227 with HTTP; Tue, 16 Jul 2013 23:37:33 -0700 (PDT) Date: Wed, 17 Jul 2013 11:07:33 +0430 Message-ID: Subject: XFS write support From: Host DL To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 06:37:33 -0000 Hello, I just want to port my current CentOS boxes to FreeBSD but noticed that XFS write support hasn't been implemented yet after many years or it is still experimental Please let me know what is the current status of this feature and how to enable it Regards From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 08:28:23 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 506183BB for ; Wed, 17 Jul 2013 08:28:23 +0000 (UTC) (envelope-from maurizio.vairani@cloverinformatica.it) Received: from smtpdg2.aruba.it (smtpdg220.aruba.it [62.149.158.220]) by mx1.freebsd.org (Postfix) with ESMTP id 8B63BC96 for ; Wed, 17 Jul 2013 08:28:21 +0000 (UTC) Received: from cloverinformatica.it ([188.10.129.202]) by smtpcmd01.ad.aruba.it with bizsmtp id 1LT91m00h4N8xN401LT9JH; Wed, 17 Jul 2013 10:27:10 +0200 Received: from [192.168.0.81] (ASUS-TERMINATOR [192.168.0.81]) by cloverinformatica.it (Postfix) with ESMTP id B66FBF5E6; Wed, 17 Jul 2013 10:27:09 +0200 (CEST) Message-ID: <51E6555D.2080803@cloverinformatica.it> Date: Wed, 17 Jul 2013 10:27:09 +0200 From: Maurizio Vairani User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1 MIME-Version: 1.0 To: freebsd-stable@FreeBSD.org, freebsd-fs@freebsd.org Subject: Shutdown problem with an USB memory stick as ZFS cache device Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 08:28:23 -0000 Hi all, on a Compaq Presario laptop I have just installed the latest stable #uname -a FreeBSD presario 9.2-PRERELEASE FreeBSD 9.2-PRERELEASE #0: Tue Jul 16 16:32:39 CEST 2013 root@presario:/usr/obj/usr/src/sys/GENERIC amd64 For speed up the compilation I have added to the pool, tank0, a SanDisk memory stick as cache device with the command: # zpool add tank0 cache /dev/da0 But when I shutdown the laptop the process will halt with this screen shot: http://www.dump-it.fr/freebsd-screen-shot/2f9169f18c7c77e52e873580f9c2d4bf.jpg.html and I need to press the power button for more than 4 seconds to switch off the laptop. The problem is always reproducible. 
Regards Maurizio From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 09:40:18 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E8D45449; Wed, 17 Jul 2013 09:40:18 +0000 (UTC) (envelope-from Ivailo.Tanusheff@skrill.com) Received: from db9outboundpool.messaging.microsoft.com (mail-db9lp0253.outbound.messaging.microsoft.com [213.199.154.253]) by mx1.freebsd.org (Postfix) with ESMTP id 4BD67F8C; Wed, 17 Jul 2013 09:40:17 +0000 (UTC) Received: from mail9-db9-R.bigfish.com (10.174.16.230) by DB9EHSOBE017.bigfish.com (10.174.14.80) with Microsoft SMTP Server id 14.1.225.22; Wed, 17 Jul 2013 09:25:05 +0000 Received: from mail9-db9 (localhost [127.0.0.1]) by mail9-db9-R.bigfish.com (Postfix) with ESMTP id 97B8DC80216; Wed, 17 Jul 2013 09:25:05 +0000 (UTC) X-Forefront-Antispam-Report: CIP:157.56.249.213; KIP:(null); UIP:(null); IPV:NLI; H:AM2PRD0710HT004.eurprd07.prod.outlook.com; RD:none; EFVD:NLI X-SpamScore: -1 X-BigFish: PS-1(zz9371I542I14ffIzz1f42h1ee6h1de0h1fdah2073h1202h1e76h1d1ah1d2ah1fc6hzz17326ah8275dhz2fh2a8h668h839h944hd24hf0ah1220h1288h12a5h12a9h12bdh137ah13b6h1441h1504h1537h153bh162dh1631h1758h18e1h1946h19b5h19ceh1ad9h1b0ah1d07h1d0ch1d2eh1d3fh1de9h1dfeh1dffh1e1dh9a9j1155h) Received-SPF: pass (mail9-db9: domain of skrill.com designates 157.56.249.213 as permitted sender) client-ip=157.56.249.213; envelope-from=Ivailo.Tanusheff@skrill.com; helo=AM2PRD0710HT004.eurprd07.prod.outlook.com ; .outlook.com ; X-Forefront-Antispam-Report-Untrusted: SFV:NSPM; SFS:(377454003)(13464003)(199002)(189002)(53754006)(4396001)(69226001)(63696002)(79102001)(54356001)(33646001)(83072001)(74876001)(74366001)(59766001)(77982001)(65816001)(47446002)(46102001)(76786001)(16406001)(77096001)(74502001)(50986001)(76576001)(54316002)(81342001)(31966008)(49866001)(81542001)(47976001)(47736001)(74316001)(56816003)(66066001)(76796001)(56776001)(51856001)(74662001)(15202345003)(80022001)(53806001)(76482001)(74706001)(24736002); DIR:OUT; SFP:; SCL:1; SRVR:DB3PR07MB059; H:DB3PR07MB059.eurprd07.prod.outlook.com; CLIP:217.18.249.148; RD:InfoNoRecords; A:1; MX:1; LANG:en; Received: from mail9-db9 (localhost.localdomain [127.0.0.1]) by mail9-db9 (MessageSwitch) id 1374053102822740_658; Wed, 17 Jul 2013 09:25:02 +0000 (UTC) Received: from DB9EHSMHS009.bigfish.com (unknown [10.174.16.231]) by mail9-db9.bigfish.com (Postfix) with ESMTP id B9673920046; Wed, 17 Jul 2013 09:25:02 +0000 (UTC) Received: from AM2PRD0710HT004.eurprd07.prod.outlook.com (157.56.249.213) by DB9EHSMHS009.bigfish.com (10.174.14.19) with Microsoft SMTP Server (TLS) id 14.16.227.3; Wed, 17 Jul 2013 09:25:02 +0000 Received: from DB3PR07MB059.eurprd07.prod.outlook.com (10.242.137.149) by AM2PRD0710HT004.eurprd07.prod.outlook.com (10.255.165.39) with Microsoft SMTP Server (TLS) id 14.16.329.3; Wed, 17 Jul 2013 09:25:02 +0000 Received: from DB3PR07MB059.eurprd07.prod.outlook.com (10.242.137.149) by DB3PR07MB059.eurprd07.prod.outlook.com (10.242.137.149) with Microsoft SMTP Server (TLS) id 15.0.731.16; Wed, 17 Jul 2013 09:25:00 +0000 Received: from DB3PR07MB059.eurprd07.prod.outlook.com ([169.254.2.117]) by DB3PR07MB059.eurprd07.prod.outlook.com ([169.254.2.117]) with mapi id 15.00.0731.000; Wed, 17 Jul 2013 09:25:00 +0000 From: Ivailo Tanusheff To: Maurizio Vairani , "freebsd-stable@FreeBSD.org" , "freebsd-fs@freebsd.org" Subject: RE: Shutdown problem with an USB memory stick as ZFS cache device Thread-Topic: Shutdown problem with an USB memory stick as ZFS 
cache device Thread-Index: AQHOgseq21n5ubkB006RuCOsHZ/zxplomN5Q Date: Wed, 17 Jul 2013 09:25:00 +0000 Message-ID: <0243b7c6538240c69770fdd0aaa4e8e0@DB3PR07MB059.eurprd07.prod.outlook.com> References: <51E6555D.2080803@cloverinformatica.it> In-Reply-To: <51E6555D.2080803@cloverinformatica.it> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [217.18.249.148] x-forefront-prvs: 0910AAF391 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: skrill.com X-FOPE-CONNECTOR: Id%0$Dn%*$RO%0$TLS%0$FQDN%$TlsDn% X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 09:40:19 -0000

I think this is expected, as your screenshot shows the USB has been disconnected, so you actually lost the cache device on the shutdown. Maybe you should implement a shutdown script that removes the USB cache from the pool before the shutdown command is issued :)

Best regards, Ivailo Tanusheff

-----Original Message----- From: owner-freebsd-stable@freebsd.org [mailto:owner-freebsd-stable@freebsd.org] On Behalf Of Maurizio Vairani Sent: Wednesday, July 17, 2013 11:27 AM To: freebsd-stable@FreeBSD.org; freebsd-fs@freebsd.org Subject: Shutdown problem with an USB memory stick as ZFS cache device

Hi all, on a Compaq Presario laptop I have just installed the latest stable #uname -a FreeBSD presario 9.2-PRERELEASE FreeBSD 9.2-PRERELEASE #0: Tue Jul 16 16:32:39 CEST 2013 root@presario:/usr/obj/usr/src/sys/GENERIC amd64 For speed up the compilation I have added to the pool, tank0, a SanDisk memory stick as cache device with the command: # zpool add tank0 cache /dev/da0 But when I shutdown the laptop the process will halt with this screen shot: http://www.dump-it.fr/freebsd-screen-shot/2f9169f18c7c77e52e873580f9c2d4bf.jpg.html and I need to press the power button for more than 4 seconds to switch off the laptop. The problem is always reproducible.
Regards Maurizio _______________________________________________ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 09:50:32 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id BAD18703; Wed, 17 Jul 2013 09:50:32 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl [195.190.28.81]) by mx1.freebsd.org (Postfix) with ESMTP id 7F0D4A3; Wed, 17 Jul 2013 09:50:32 +0000 (UTC) Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.69) (envelope-from ) id 1UzONM-0002DW-CT; Wed, 17 Jul 2013 11:50:25 +0200 Received: from [81.21.138.17] (helo=ronaldradial.versatec.local) by smtp.greenhost.nl with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.72) (envelope-from ) id 1UzONM-0002DC-8P; Wed, 17 Jul 2013 11:50:24 +0200 Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-stable@freebsd.org, freebsd-fs@freebsd.org, "Maurizio Vairani" Subject: Re: Shutdown problem with an USB memory stick as ZFS cache device References: <51E6555D.2080803@cloverinformatica.it> Date: Wed, 17 Jul 2013 11:50:22 +0200 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: "Ronald Klop" Message-ID: In-Reply-To: <51E6555D.2080803@cloverinformatica.it> User-Agent: Opera Mail/12.16 (Win32) X-Virus-Scanned: by clamav at smarthost1.samage.net X-Spam-Level: - X-Spam-Score: -1.9 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=disabled version=3.3.1 X-Scan-Signature: dfea3049d3b923820beb462d65569822 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 09:50:32 -0000 On Wed, 17 Jul 2013 10:27:09 +0200, Maurizio Vairani wrote: > Hi all, > > > on a Compaq Presario laptop I have just installed the latest stable > > > #uname -a > > FreeBSD presario 9.2-PRERELEASE FreeBSD 9.2-PRERELEASE #0: Tue Jul 16 > 16:32:39 CEST 2013 root@presario:/usr/obj/usr/src/sys/GENERIC amd64 > > > For speed up the compilation I have added to the pool, tank0, a SanDisk > memory stick as cache device with the command: > > > # zpool add tank0 cache /dev/da0 > > > But when I shutdown the laptop the process will halt with this screen > shot: > > > http://www.dump-it.fr/freebsd-screen-shot/2f9169f18c7c77e52e873580f9c2d4bf.jpg.html > > > and I need to press the power button for more than 4 seconds to switch > off the laptop. > > The problem is always reproducible. Does sysctl hw.usb.no_shutdown_wait=1 help? Ronald. 
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 09:58:51 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 8D3F4F20; Wed, 17 Jul 2013 09:58:51 +0000 (UTC) (envelope-from prvs=1910bf16bb=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 0243A286; Wed, 17 Jul 2013 09:58:49 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50005028799.msg; Wed, 17 Jul 2013 10:58:40 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 17 Jul 2013 10:58:40 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1910bf16bb=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <6ACFDC285CB64C148915BC0FD8357B11@multiplay.co.uk> From: "Steven Hartland" To: , , "Maurizio Vairani" , "Ronald Klop" References: <51E6555D.2080803@cloverinformatica.it> Subject: Re: Shutdown problem with an USB memory stick as ZFS cache device Date: Wed, 17 Jul 2013 10:59:04 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 09:58:51 -0000 ----- Original Message ----- From: "Ronald Klop" > > Does sysctl hw.usb.no_shutdown_wait=1 help? That will just prevent the wait it won't stop the shutdown from happening. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
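As Steven notes, the sysctl only shortens the wait; the USB stick is still attached to the pool as a cache device when the USB stack tears it down. A minimal sketch of the shutdown-script idea Ivailo raised might look like the rc.d stub below. The pool and device names (tank0, da0) are taken from the original report; the script name, ordering and everything else here is illustrative and untested, not something posted in this thread:

    #!/bin/sh
    #
    # PROVIDE: zcachedetach
    # REQUIRE: LOGIN
    # KEYWORD: shutdown

    . /etc/rc.subr

    name="zcachedetach"
    rcvar="zcachedetach_enable"
    start_cmd=":"
    stop_cmd="zcachedetach_stop"

    # At shutdown, drop the USB cache device from the pool so nothing
    # still holds it open when the USB stick is detached.
    zcachedetach_stop()
    {
            zpool remove tank0 da0
    }

    load_rc_config $name
    run_rc_command "$1"

Enabled with zcachedetach_enable="YES" in /etc/rc.conf. Removing an L2ARC device does not endanger the pool, so the only cost of doing this at every shutdown is a cold cache on the next boot.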
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 10:08:47 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 649C86FC for ; Wed, 17 Jul 2013 10:08:47 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 20BDD6D1 for ; Wed, 17 Jul 2013 10:08:46 +0000 (UTC) Received: by people.fsn.hu (Postfix, from userid 1001) id 0E8FE1127287; Wed, 17 Jul 2013 12:03:16 +0200 (CEST) X-Bogosity: Ham, tests=bogofilter, spamicity=0.001777, version=1.2.3 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR: 7.5406] X-CRM114-CacheID: sfid-20130717_12031_E9EBBC86 X-CRM114-Status: Good ( pR: 7.5406 ) X-DSPAM-Result: Whitelisted X-DSPAM-Processed: Wed Jul 17 12:03:16 2013 X-DSPAM-Confidence: 0.7005 X-DSPAM-Probability: 0.0000 X-DSPAM-Signature: 51e66be4282261417415160 X-DSPAM-Factors: 27, From*Attila Nagy , 0.00010, From*Attila, 0.00535, Mounted, 0.00594, Mounted+on, 0.00594, mount, 0.00712, Subject*files, 0.00762, USE, 0.00762, the+files, 0.00762, shutdown, 0.00888, fsck, 0.00888, fsck, 0.00888, (at+least, 0.00888, files, 0.00923, files, 0.00923, Received*online.co.hu+[195.228.243.99]), 0.01000, Received*[195.228.243.99]), 0.01000, I'm+waiting, 0.99000, file+system, 0.01000, Received*online.co.hu, 0.01000, From*Attila+Nagy, 0.01000, Date*03+11, 0.99000, Sizes, 0.01000, find+on, 0.99000, Received*(japan.t, 0.01000, From*Nagy+; Wed, 17 Jul 2013 12:03:15 +0200 (CEST) Message-ID: <51E66BDF.4010709@fsn.hu> Date: Wed, 17 Jul 2013 12:03:11 +0200 From: Attila Nagy MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: SU+J all files lost after a reboot? Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 10:08:47 -0000 Hi, SU+J file systems formatted with r248885. The file systems were in active use for some months, they were 70% full. Today, I rebooted the OS (clean shutdown, there were no crashes) with r251643 just to see this: /dev/da0p2 923G 32M 923G 0% /fs All files lost after a reboot??? But a quick find on the file system showed the files are(?) there, I can even read them (at least the ones I've tried so far). Starting an fsck gives: # fsck /fs ** /dev/da0p2 USE JOURNAL? [yn] y ** SU+J Recovering /dev/da0p2 Journal timestamp does not match fs mount time ** Skipping journal, falling through to full fsck ** Last Mounted on ** Phase 1 - Check Blocks and Sizes Now I'm waiting to see what this will do to the data. I'm somewhat inclined to think that SU+J is not production ready yet... 
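For anyone wanting to check what a file system actually has enabled before trusting the journal, the flags can be inspected and the journal dropped (while keeping plain soft updates) with tunefs on the unmounted device, followed by a forced full check. This is a generic illustration using the device name from the report, not advice given in this thread:

    # tunefs -p /dev/da0p2           # print current flags: soft updates, SU+J, ...
    # tunefs -j disable /dev/da0p2   # drop the journal, keep soft updates
    # fsck -f /dev/da0p2             # force a full check instead of the journal fast path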
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 10:28:33 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 6C059B14 for ; Wed, 17 Jul 2013 10:28:33 +0000 (UTC) (envelope-from maurizio.vairani@cloverinformatica.it) Received: from smtpdg9.aruba.it (smtpdg8.aruba.it [62.149.158.238]) by mx1.freebsd.org (Postfix) with ESMTP id C72F37E1 for ; Wed, 17 Jul 2013 10:28:32 +0000 (UTC) Received: from cloverinformatica.it ([188.10.129.202]) by smtpcmd03.ad.aruba.it with bizsmtp id 1NUN1m01L4N8xN401NUPLB; Wed, 17 Jul 2013 12:28:24 +0200 Received: from [192.168.0.100] (MAURIZIO-PC [192.168.0.100]) by cloverinformatica.it (Postfix) with ESMTP id 8571BF651; Wed, 17 Jul 2013 12:28:23 +0200 (CEST) Message-ID: <51E671C7.50409@cloverinformatica.it> Date: Wed, 17 Jul 2013 12:28:23 +0200 From: Maurizio Vairani User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1 MIME-Version: 1.0 To: Ronald Klop Subject: [SOLVED] Re: Shutdown problem with an USB memory stick as ZFS cache device References: <51E6555D.2080803@cloverinformatica.it> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 10:28:33 -0000 On 17/07/2013 11:50, Ronald Klop wrote: > On Wed, 17 Jul 2013 10:27:09 +0200, Maurizio Vairani > wrote: > >> Hi all, >> >> >> on a Compaq Presario laptop I have just installed the latest stable >> >> >> #uname -a >> >> FreeBSD presario 9.2-PRERELEASE FreeBSD 9.2-PRERELEASE #0: Tue Jul 16 >> 16:32:39 CEST 2013 root@presario:/usr/obj/usr/src/sys/GENERIC amd64 >> >> >> For speed up the compilation I have added to the pool, tank0, a >> SanDisk memory stick as cache device with the command: >> >> >> # zpool add tank0 cache /dev/da0 >> >> >> But when I shutdown the laptop the process will halt with this screen >> shot: >> >> >> http://www.dump-it.fr/freebsd-screen-shot/2f9169f18c7c77e52e873580f9c2d4bf.jpg.html >> >> >> >> and I need to press the power button for more than 4 seconds to >> switch off the laptop. >> >> The problem is always reproducible. > > Does sysctl hw.usb.no_shutdown_wait=1 help? > > Ronald. Thank you Ronald it works ! 
In /boot/loader.conf added the line hw.usb.no_shutdown_wait=1 Maurizio From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 10:30:44 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 3AC92CF9 for ; Wed, 17 Jul 2013 10:30:44 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id E8A68819 for ; Wed, 17 Jul 2013 10:30:43 +0000 (UTC) Received: by people.fsn.hu (Postfix, from userid 1001) id 4DED8112759F; Wed, 17 Jul 2013 12:30:42 +0200 (CEST) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000812, version=1.2.3 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR: 13.5615] X-CRM114-CacheID: sfid-20130717_12304_19D79649 X-CRM114-Status: Good ( pR: 13.5615 ) X-DSPAM-Result: Whitelisted X-DSPAM-Processed: Wed Jul 17 12:30:42 2013 X-DSPAM-Confidence: 0.9956 X-DSPAM-Probability: 0.0000 X-DSPAM-Signature: 51e67252416701431120196 X-DSPAM-Factors: 27, From*Attila Nagy , 0.00010, >+Hi, 0.00128, in+>, 0.00199, wrote+>, 0.00209, Hi+>, 0.00223, I+>, 0.00282, this+>, 0.00298, >+>, 0.00357, >+>, 0.00357, References*fsn.hu>, 0.00357, In-Reply-To*fsn.hu>, 0.00383, with+>, 0.00383, on+>, 0.00383, wrote, 0.00514, From*Attila, 0.00535, Capacity, 0.00535, Mounted, 0.00594, Mounted, 0.00594, Avail, 0.00594, Used+Avail, 0.00594, Mounted+on, 0.00594, Mounted+on, 0.00594, Avail+Capacity, 0.00594, Capacity+Mounted, 0.00594, Filesystem, 0.00594, >+But, 0.00594, X-Spambayes-Classification: ham; 0.00 Received: from japan.t-online.private (japan.t-online.co.hu [195.228.243.99]) by people.fsn.hu (Postfix) with ESMTPSA id 29A56112758E for ; Wed, 17 Jul 2013 12:30:41 +0200 (CEST) Message-ID: <51E67250.2020102@fsn.hu> Date: Wed, 17 Jul 2013 12:30:40 +0200 From: Attila Nagy MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: SU+J all files lost after a reboot? References: <51E66BDF.4010709@fsn.hu> In-Reply-To: <51E66BDF.4010709@fsn.hu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 10:30:44 -0000 On 07/17/13 12:03, Attila Nagy wrote: > Hi, > > SU+J file systems formatted with r248885. The file systems were in > active use for some months, they were 70% full. > Today, I rebooted the OS (clean shutdown, there were no crashes) with > r251643 just to see this: > /dev/da0p2 923G 32M 923G 0% /fs > > All files lost after a reboot??? > > But a quick find on the file system showed the files are(?) there, I > can even read them (at least the ones I've tried so far). > Starting an fsck gives: > # fsck /fs > ** /dev/da0p2 > > USE JOURNAL? [yn] y > > ** SU+J Recovering /dev/da0p2 > Journal timestamp does not match fs mount time > ** Skipping journal, falling through to full fsck > > ** Last Mounted on > ** Phase 1 - Check Blocks and Sizes ** Phase 2 - Check Pathnames ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts ** Phase 5 - Check Cyl groups SUMMARY BLK COUNT(S) WRONG IN SUPERBLK SALVAGE? 
[yn] y 6768204 files, 84410191 used, 36633989 free (27013 frags, 4575872 blocks, 0.0% fragmentation) ***** FILE SYSTEM IS CLEAN ***** ***** FILE SYSTEM WAS MODIFIED ***** # mount /fs # df -h /fs Filesystem Size Used Avail Capacity Mounted on /dev/da0p2 923G 644G 279G 70% /fs Scary. From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 10:38:41 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 862C4EF7; Wed, 17 Jul 2013 10:38:41 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 842F6857; Wed, 17 Jul 2013 10:38:40 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA01409; Wed, 17 Jul 2013 13:38:38 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1UzP81-00002p-NS; Wed, 17 Jul 2013 13:38:37 +0300 Message-ID: <51E67409.3010901@FreeBSD.org> Date: Wed, 17 Jul 2013 13:38:01 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130708 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-stable@FreeBSD.org, freebsd-fs@FreeBSD.org Subject: Re: Shutdown problem with an USB memory stick as ZFS cache device References: <51E6555D.2080803@cloverinformatica.it> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 10:38:41 -0000 on 17/07/2013 12:50 Ronald Klop said the following: > Does sysctl hw.usb.no_shutdown_wait=1 help? I believe that the root cause of the issue is that ZFS does not perform full clean up on shutdown and thus does not release its devices. But perhaps I am mistaken. In any case, I think that doing the same kind of clean up as done on zfs module unload would be advantageous. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 11:04:31 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id EE5816A1; Wed, 17 Jul 2013 11:04:30 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 0040196E; Wed, 17 Jul 2013 11:04:29 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA02116; Wed, 17 Jul 2013 14:04:21 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1UzPWu-00005Q-V1; Wed, 17 Jul 2013 14:04:21 +0300 Message-ID: <51E679FD.3040306@FreeBSD.org> Date: Wed, 17 Jul 2013 14:03:25 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130708 Thunderbird/17.0.7 MIME-Version: 1.0 To: zfs-devel@FreeBSD.org, freebsd-fs@FreeBSD.org Subject: zfs_rename: another zfs+vfs deadlock X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=X-VIET-VPS Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 11:04:31 -0000 I received a report about what looked like a deadlock involving ZFS. Interesting bits from the report are: Thread 1156 (Thread 2038380): #0 sched_switch (td=0xfffffe01a9e56460, newtd=0xfffffe001c3ff000, flags=Variable "flags" is not available. ) at /usr/src/sys/kern/sched_ule.c:1860 #1 0xffffffff808ab51a in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:466 #2 0xffffffff808e3e63 in sleepq_switch (wchan=0xfffffe06d9df2310, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:538 #3 0xffffffff808e4a9d in sleepq_wait (wchan=0xfffffe06d9df2310, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:617 #4 0xffffffff8088aebb in __lockmgr_args (lk=0xfffffe06d9df2310, flags=524544, ilk=0xfffffe06d9df23d8, wmesg=Variable "wmesg" is not available. ) at /usr/src/sys/kern/kern_lock.c:214 #5 0xffffffff8092d349 in vop_stdlock (ap=Variable "ap" is not available. ) at lockmgr.h:97 #6 0xffffffff80bd62ab in VOP_LOCK1_APV (vop=0xffffffff8111cc80, a=0xffffff90729ee6e0) at vnode_if.c:1988 #7 0xffffffff8094cfa7 in _vn_lock (vp=0xfffffe06d9df2278, flags=524288, file=Variable "file" is not available. ) at vnode_if.h:859 #8 0xffffffff80942220 in vputx (vp=0xfffffe06d9df2278, func=1) at /usr/src/sys/kern/vfs_subr.c:2279 #9 0xffffffff816e75a4 in zfs_rename_unlock (zlpp=0xffffff90729ee878) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3501 #10 0xffffffff816e8df4 in zfs_freebsd_rename (ap=Variable "ap" is not available. ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3927 #11 0xffffffff80bd67cb in VOP_RENAME_APV (vop=0xffffffff8175b900, a=0xffffff90729eeaa0) at vnode_if.c:1474 #12 0xffffffff80947844 in kern_renameat (td=Variable "td" is not available. 
) at vnode_if.h:636 #13 0xffffffff80b4eff2 in amd64_syscall (td=0xfffffe01a9e56460, traced=0) at subr_syscall.c:135 #14 0xffffffff80b39b97 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:387 fr 11 p *a $1 = {a_gen = {a_desc = 0xffffffff811594c0}, a_fdvp = 0xfffffe05fb094278, a_fvp = 0xfffffe04b7b62278, a_fcnp = 0xffffff90729eea58, a_tdvp = 0xfffffe0514137768, a_tvp = 0x0, a_tcnp = 0xffffff90729ee9a8} Thread 1158 (Thread 4174978): #0 sched_switch (td=0xfffffe088cbef000, newtd=0xfffffe001c40e000, flags=Variable "flags" is not available. ) at /usr/src/sys/kern/sched_ule.c:1860 #1 0xffffffff808ab51a in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:466 #2 0xffffffff808e3e63 in sleepq_switch (wchan=0xfffffe0514137800, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:538 #3 0xffffffff808e4a9d in sleepq_wait (wchan=0xfffffe0514137800, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:617 #4 0xffffffff8088b4e0 in __lockmgr_args (lk=0xfffffe0514137800, flags=2097152, ilk=0xfffffe05141378c8, wmesg=Variable "wmesg" is not available. ) at /usr/src/sys/kern/kern_lock.c:214 #5 0xffffffff8092d349 in vop_stdlock (ap=Variable "ap" is not available. ) at lockmgr.h:97 #6 0xffffffff80bd62ab in VOP_LOCK1_APV (vop=0xffffffff8111cc80, a=0xffffff9072813470) at vnode_if.c:1988 #7 0xffffffff8094cfa7 in _vn_lock (vp=0xfffffe0514137768, flags=2097152, file=Variable "file" is not available. ) at vnode_if.h:859 #8 0xffffffff816e5bdd in zfs_vnode_lock (vp=0xfffffe0514137768, flags=Variable "flags" is not available. ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1704 #9 0xffffffff816e6d70 in zfs_lookup (dvp=0xfffffe06d9df2278, nm=0xffffff90728135b0 "toBeDeleted", vpp=0xffffff9072813930, cnp=0xffffff9072813958, nameiop=0, cr=0xfffffe0ba89b0a00, td=0xfffffe088cbef000, flags=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1433 #10 0xffffffff816e7511 in zfs_freebsd_lookup (ap=0xffffff9072813710) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:5758 #11 0xffffffff80bd5d3f in VOP_CACHEDLOOKUP_APV (vop=0xffffffff8175b900, a=0xffffff9072813710) at vnode_if.c:187 #12 0xffffffff8092b103 in vfs_cache_lookup (ap=Variable "ap" is not available. ) at vnode_if.h:80 #13 0xffffffff80bd7187 in VOP_LOOKUP_APV (vop=0xffffffff8175b900, a=0xffffff90728137d0) at vnode_if.c:123 #14 0xffffffff8093260a in lookup (ndp=0xffffff90728138f0) at vnode_if.h:54 #15 0xffffffff8093354e in namei (ndp=0xffffff90728138f0) at /usr/src/sys/kern/vfs_lookup.c:297 #16 0xffffffff80944213 in kern_statat_vnhook (td=0xfffffe088cbef000, flag=Variable "flag" is not available. ) at /usr/src/sys/kern/vfs_syscalls.c:2432 #17 0xffffffff809443b5 in kern_statat (td=Variable "td" is not available. ) at /usr/src/sys/kern/vfs_syscalls.c:2413 #18 0xffffffff8094455a in sys_stat (td=Variable "td" is not available. ) at /usr/src/sys/kern/vfs_syscalls.c:2374 #19 0xffffffff80b4eff2 in amd64_syscall (td=0xfffffe088cbef000, traced=0) at subr_syscall.c:135 #20 0xffffffff80b39b97 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:387 As far as I understand the code, I think that zfs_rename_lock (called from zfs_rename) iterates up ancestor chain of target directory (tdzp) and obtains a reference on each of the ancestors via zfs_zget. zfs_rename_unlock does the opposite - it iterates in the reverse order and VN_RELE-s the ancestor znodes. 
As you can see above, on FreeBSD VN_RELE translates to vputx, which internally needs to obtain a vnode lock. The problem seems to be is that VOP_RENAME -> zfs_freebsd_rename is called with locked tdvp (and perhaps non-NULL and thus locked tvp). tdvp's vnode lock is released at the very end of zfs_freebsd_rename and so it is held over zfs_rename_unlock. And that means that vnode locks of tvp's ancestors can be acquired while tdvp's vnode lock is held. That violates the VFS lock ordering where a descendant's lock must always be acquired after an ancestor's lock. So that could lead to a deadlock with another VFS operation that acquires locks in the proper order. In the above snippet 0xfffffe06d9df2278 is a directory/ancestor of tdvp and 0xfffffe0514137768 is tdvp. VOP_LOOKUP -> zfs_lookup acquires the locks in the correct order (dvp is the ancestor while vp is the tdvp) while zfs_rename does it in the opposite order. A scenario to reproduce this bug could be like this. mkdir a mkdir a/b mv some-file a/b/ (in parallel with) stat a/b Of course it would have to be repeated many times to hit the right timing window. Also, namecache could interfere with this scenario, but I am not sure. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 11:27:16 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 6B8CDA3B; Wed, 17 Jul 2013 11:27:16 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 1858CA3E; Wed, 17 Jul 2013 11:27:14 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA02725; Wed, 17 Jul 2013 14:27:13 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1UzPt3-00007u-04; Wed, 17 Jul 2013 14:27:13 +0300 Message-ID: <51E67F54.9080800@FreeBSD.org> Date: Wed, 17 Jul 2013 14:26:12 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130708 Thunderbird/17.0.7 MIME-Version: 1.0 To: Adrian Chadd Subject: Re: Deadlock in nullfs/zfs somewhere References: <51DCFEDA.1090901@FreeBSD.org> <51E59FD9.4020103@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, freebsd-current X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 11:27:16 -0000 on 16/07/2013 22:40 Adrian Chadd said the following: > :( So it's a deadlock. Ok, so what's next? A creative process... One possibility is to add getnewvnode_reserve() calls before the ZFS transaction beginnings in the places where a new vnode/znode may have to be allocated within a transaction. This looks like a quick and cheap solution but it makes the code somewhat messier. Another possibility is to change something in VFS machinery, so that VOP_RECLAIM getting blocked for one filesystem does not prevent vnode allocation for other filesystems. I could think of other possible solutions via infrastructural changes in VFS or ZFS... 
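A rough sh loop in the spirit of that scenario might look like the following (directory and file names are the ones from the example above; this is only an illustration of hammering the timing window, not a verified reproducer):

    # run inside a directory on the affected ZFS dataset
    mkdir -p a/b
    touch some-file
    while :; do
            mv some-file a/b/some-file && mv a/b/some-file some-file
    done &
    while :; do
            stat a/b > /dev/null
    done &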
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 15:30:02 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 7DAC37B9; Wed, 17 Jul 2013 15:30:02 +0000 (UTC) (envelope-from jhs@berklix.com) Received: from land.berklix.org (land.berklix.org [144.76.10.75]) by mx1.freebsd.org (Postfix) with ESMTP id EC7FC8D1; Wed, 17 Jul 2013 15:30:01 +0000 (UTC) Received: from park.js.berklix.net (pD9FBEF06.dip0.t-ipconnect.de [217.251.239.6]) (authenticated bits=128) by land.berklix.org (8.14.5/8.14.5) with ESMTP id r6HFTwND089577; Wed, 17 Jul 2013 15:29:59 GMT (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (fire.js.berklix.net [192.168.91.41]) by park.js.berklix.net (8.14.3/8.14.3) with ESMTP id r6HFTpWF004069; Wed, 17 Jul 2013 17:29:51 +0200 (CEST) (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (localhost [127.0.0.1]) by fire.js.berklix.net (8.14.4/8.14.4) with ESMTP id r6HFT4EK063849; Wed, 17 Jul 2013 17:29:10 +0200 (CEST) (envelope-from jhs@fire.js.berklix.net) Message-Id: <201307171529.r6HFT4EK063849@fire.js.berklix.net> To: Maurizio Vairani Subject: Re: [SOLVED] Re: Shutdown problem with an USB memory stick as ZFS cache device From: "Julian H. Stacey" Organization: http://berklix.com BSD Unix Linux Consultancy, Munich Germany User-agent: EXMH on FreeBSD http://berklix.com/free/ X-URL: http://www.berklix.com In-reply-to: Your message "Wed, 17 Jul 2013 12:28:23 +0200." <51E671C7.50409@cloverinformatica.it> Date: Wed, 17 Jul 2013 17:29:04 +0200 Sender: jhs@berklix.com Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org, Ronald Klop X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 15:30:02 -0000 Maurizio Vairani wrote: > On 17/07/2013 11:50, Ronald Klop wrote: > > On Wed, 17 Jul 2013 10:27:09 +0200, Maurizio Vairani > > wrote: > > > >> Hi all, > >> > >> > >> on a Compaq Presario laptop I have just installed the latest stable > >> > >> > >> #uname -a > >> > >> FreeBSD presario 9.2-PRERELEASE FreeBSD 9.2-PRERELEASE #0: Tue Jul 16 > >> 16:32:39 CEST 2013 root@presario:/usr/obj/usr/src/sys/GENERIC amd64 > >> > >> > >> For speed up the compilation I have added to the pool, tank0, a > >> SanDisk memory stick as cache device with the command: > >> > >> > >> # zpool add tank0 cache /dev/da0 > >> > >> > >> But when I shutdown the laptop the process will halt with this screen > >> shot: > >> > >> > >> http://www.dump-it.fr/freebsd-screen-shot/2f9169f18c7c77e52e873580f9c2d4bf.jpg.html > >> > >> > >> > >> and I need to press the power button for more than 4 seconds to > >> switch off the laptop. > >> > >> The problem is always reproducible. > > > > Does sysctl hw.usb.no_shutdown_wait=1 help? > > > > Ronald. > Thank you Ronald it works ! > > In /boot/loader.conf added the line > hw.usb.no_shutdown_wait=1 > > Maurizio I wonder (from ignorance as I dont use ZFS yet), if that merely masks the symptom or cures the fault ? Presumably one should use a ZFS command to disassociate whatever might have the cache open ? (in case something might need to be written out from cache, if it was a writeable cache ?) 
I too had a USB shutdown problem (non ZFS, now solved) & several people made useful comments on shutdown scripts etc, so I'm cross referencing: http://lists.freebsd.org/pipermail/freebsd-mobile/2013-July/012803.html Cheers, Julian -- Julian Stacey, BSD Unix Linux C Sys Eng Consultant, Munich http://berklix.com Reply below not above, like a play script. Indent old text with "> ". Send plain text. No quoted-printable, HTML, base64, multipart/alternative.

From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 17:01:31 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 97C63308 for ; Wed, 17 Jul 2013 17:01:31 +0000 (UTC) (envelope-from gezeala@gmail.com) Received: from mail-la0-x22d.google.com (mail-la0-x22d.google.com [IPv6:2a00:1450:4010:c03::22d]) by mx1.freebsd.org (Postfix) with ESMTP id 193E0DC3 for ; Wed, 17 Jul 2013 17:01:30 +0000 (UTC) Received: by mail-la0-f45.google.com with SMTP id fr10so1706083lab.18 for ; Wed, 17 Jul 2013 10:01:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=OCDhufSUWjnloz7XeBjVA+ybkZOpW2Ct111XH49MBew=; b=dP1xVA+ZqlZn+naAq4Iv7ONDKwjydCRy2FmifMLTssVUIWDQZXEYD3lpowObeeNZNx Z/fgn4flPZzHvu1eeavqlbDY0iZQckaa/Pvl5DIsR14g+VzMD9JTqD4UzYZH5OasmaU8 PGeW3Akeyaii/FEpHFMZvZc72TMDQrZS2OucqaZqO/HKiOSN53Omzhhzy1HVS0mPW07H CTzgJNx079B9pMsWhW5N3z+aTig6QYaK8SPFMrYj0qcymXagTrGYm63UXMGeXs2ED1MW 9KxR4qh9DAvZVpzRkNZbL9MKxjLH/lFet3FGP36joGfhJFL8uRhvZ4CGgeUPRkGV5rMb jf1g== X-Received: by 10.112.97.132 with SMTP id ea4mr3514560lbb.80.1374080490052; Wed, 17 Jul 2013 10:01:30 -0700 (PDT) MIME-Version: 1.0 Received: by 10.114.82.72 with HTTP; Wed, 17 Jul 2013 10:00:49 -0700 (PDT) In-Reply-To: References: <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> <3CFB4564D8EB4A6A9BCE2AFCC5B6E400@multiplay.co.uk> <51D6A206.2020303@digsys.bg> From: =?ISO-8859-1?Q?Gezeala_M=2E_Bacu=F1o_II?= Date: Wed, 17 Jul 2013 10:00:49 -0700 Message-ID: Subject: Re: Slow resilvering with mirrored ZIL To: Warren Block Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 17:01:31 -0000

On Tue, Jul 16, 2013 at 6:47 PM, Warren Block wrote: > On Tue, 16 Jul 2013, Gezeala M. Bacuño II wrote: > > On Fri, Jul 5, 2013 at 6:08 PM, Freddie Cash wrote: >> >> >>> ZFS-on-Linux has added this as "-o ashift=" property for zpool create. >>> >>> There's a thread on the illumos list about standardising this across >>> all >>> ZFS-using OSes. >>> >>> >>> >>> +1 on this. We tested zfs-on-linux last year and it does automatically >> handle disk partitioning for correct alignment. What we do is just add >> ashift=12 option during zpool create. No more gpart/gnop/ashift/import >> steps.
>> http://zfsonlinux.org/faq.html#HowDoesZFSonLinuxHandlesAdvacedFormatDrives >> >> >> Back to FreeBSD ZFS, >> >> After reading the thread, I'm still at a loss on this (too much info I >> guess).. regarding gpart/gnop/ashift tweaks for alignment, do we still >> need >> to perform gpart on newly purchased (SSD/SATA/SAS) Advanced Format drives? >> Or, skip gpart and proceed with gnop/ashift only? >> > > If ZFS goes on a bare drive, it will be aligned by default. If ZFS is > going in a partition, yes, align that partition to 4K boundaries or larger > multiples of 4K, like 1M. > > Your statement is enlightening and concise, exactly what I need. Thanks. > The gnop/ashift workaround is just to get ZFS to use the right block size. > So if you don't take care to get partition alignment right, you might end > up using the right block size but misaligned. > > And yes, it will be nice to be able to just explicitly tell ZFS the block > size to use.

We do add the entire drive (no partitions) to ZFS, perform gnop/ashift and other necessary steps and then verify ashift=12 through zdb. The gpart/gnop/ashift steps, if I understand correctly (do correct me if I'm stating this incorrectly), is needed for further SSD performance tuning. Taking into consideration leaving a certain chunk for wear leveling and also if the SSD has a size that may be too big for L2ARC.

From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 17:19:04 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4B102611; Wed, 17 Jul 2013 17:19:04 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-we0-x22b.google.com (mail-we0-x22b.google.com [IPv6:2a00:1450:400c:c03::22b]) by mx1.freebsd.org (Postfix) with ESMTP id 8B044E95; Wed, 17 Jul 2013 17:19:03 +0000 (UTC) Received: by mail-we0-f171.google.com with SMTP id m46so2072009wev.30 for ; Wed, 17 Jul 2013 10:19:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=QcE8DTskG1MeoodPIsqtquSYJMiHeaUmzK/gBYHIom4=; b=wzgeE/Vid9VR66auA1B1uq7BT9BHFsG7BA9L4fliwv/IYB0Mu8zwKA+/O2ro23N3+D pwnChAlscZsi1r9U8eOK/OmHTTMS1/Vk6+hIcq/xpi9u1Shir4WHr3WZEELIynmTdCtS nFjUQxSG1UmrfVP00z/pO46xzj12xxkl728HBzgVuVNNBOh2jCjvlssk28f4zzgGvDCN 7vbWdyGRn3Q80e8RKlyqWfDjSHDKAl7Kztaj04DmzwD7U/3d5chRNu8v7cDPwu0SyRmM EIhKrI7xkocOYh1nCTmRbiokJwIaqkuausv/xKwFr7HSUcwLeZiQtF/O/mtzozyBD+Je gkKA== MIME-Version: 1.0 X-Received: by 10.194.63.229 with SMTP id j5mr5541008wjs.79.1374081542505; Wed, 17 Jul 2013 10:19:02 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.217.94.132 with HTTP; Wed, 17 Jul 2013 10:19:02 -0700 (PDT) In-Reply-To: <51E67F54.9080800@FreeBSD.org> References: <51DCFEDA.1090901@FreeBSD.org> <51E59FD9.4020103@FreeBSD.org> <51E67F54.9080800@FreeBSD.org> Date: Wed, 17 Jul 2013 10:19:02 -0700 X-Google-Sender-Auth: ptSTsrEDyA2Cg2Pxbh0saE1rqCw Message-ID: Subject: Re: Deadlock in nullfs/zfs somewhere From: Adrian Chadd To: Andriy Gapon Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-fs@freebsd.org, freebsd-current X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 17:19:04 -0000

On 17 July 2013 04:26, Andriy Gapon wrote: > on 16/07/2013
22:40 Adrian Chadd said the following: >> :( So it's a deadlock. Ok, so what's next? > > A creative process... Wonderful. :) > One possibility is to add getnewvnode_reserve() calls before the ZFS transaction > beginnings in the places where a new vnode/znode may have to be allocated within > a transaction. > This looks like a quick and cheap solution but it makes the code somewhat messier. > > Another possibility is to change something in VFS machinery, so that VOP_RECLAIM > getting blocked for one filesystem does not prevent vnode allocation for other > filesystems. > > I could think of other possible solutions via infrastructural changes in VFS or > ZFS... Well, what do others think? This seems like a showstopper for systems with lots and lots of ZFS filesystems doing lots and lots of activity. -adrian

From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 17:35:27 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id F37BCB85 for ; Wed, 17 Jul 2013 17:35:26 +0000 (UTC) (envelope-from wblock@wonkity.com) Received: from wonkity.com (wonkity.com [67.158.26.137]) by mx1.freebsd.org (Postfix) with ESMTP id C20D6F60 for ; Wed, 17 Jul 2013 17:35:26 +0000 (UTC) Received: from wonkity.com (localhost [127.0.0.1]) by wonkity.com (8.14.7/8.14.7) with ESMTP id r6HHZJk2091667; Wed, 17 Jul 2013 11:35:19 -0600 (MDT) (envelope-from wblock@wonkity.com) Received: from localhost (wblock@localhost) by wonkity.com (8.14.7/8.14.7/Submit) with ESMTP id r6HHZJjE091664; Wed, 17 Jul 2013 11:35:19 -0600 (MDT) (envelope-from wblock@wonkity.com) Date: Wed, 17 Jul 2013 11:35:19 -0600 (MDT) From: Warren Block To: =?ISO-8859-15?Q?Gezeala_M=2E_Bacu=F1o_II?= Subject: Re: Slow resilvering with mirrored ZIL In-Reply-To: Message-ID: References: <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> <3CFB4564D8EB4A6A9BCE2AFCC5B6E400@multiplay.co.uk> <51D6A206.2020303@digsys.bg> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="3512871622-236036210-1374082519=:91446" X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (wonkity.com [127.0.0.1]); Wed, 17 Jul 2013 11:35:20 -0600 (MDT) Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 17:35:27 -0000

This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --3512871622-236036210-1374082519=:91446 Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8BIT

On Wed, 17 Jul 2013, Gezeala M. Bacuño II wrote: > If ZFS goes on a bare drive, it will be aligned by default. If ZFS is going in a partition, yes, align that partition to 4K boundaries or larger multiples of 4K, like 1M. > > Your statement is enlightening and concise, exactly what I need. Thanks. > > The gnop/ashift workaround is just to get ZFS to use the right block size.
So if you don't take care to get partition alignment right, you might end up using the right > block size but misaligned. > > And yes, it will be nice to be able to just explicitly tell ZFS the block size to use. > > > We do add the entire drive (no partitions) to ZFS, perform gnop/ashift and other necessary steps and then verify ashift=12 through zdb. > > The gpart/gnop/ashift steps, if I understand correctly (do correct me if I'm stating this incorrectly), is needed for further SSD performance tuning. Taking into consideration leaving a > certain chunk for wear leveling and also if the SSD has a size that may be too big for L2ARC. Well, there are several things going on. Partitions can be used for a couple of things. Limiting the size of space available to ZFS, leaving an unallocated part of the drive for wear leveling. Note that ZFS on FreeBSD now has TRIM, which should make leaving unused space on SSDs unnecessary. Aligning partitions preserves performance. If a partition is misaligned, writes can slow down to half speed. For example, a 4K filesystem block written to an aligned partition writes a single block. If the partition is misaligned, that 4K write is split over two disk blocks. Each block has to be read, partly modified, then written, taking roughly twice as long. Finally, ZFS's ashift controls the minimum size of block ZFS uses. ashift=12 (12 bits) sets that to 4K blocks (2^12=4096). Again, a performance thing, matching the filesystem block size to device block size. It would be interesting to see a benchmark of ZFS on a 4K drive with different ashift values. --3512871622-236036210-1374082519=:91446-- From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 18:22:33 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 177447B6 for ; Wed, 17 Jul 2013 18:22:33 +0000 (UTC) (envelope-from gezeala@gmail.com) Received: from mail-la0-x22c.google.com (mail-la0-x22c.google.com [IPv6:2a00:1450:4010:c03::22c]) by mx1.freebsd.org (Postfix) with ESMTP id 8CB5D224 for ; Wed, 17 Jul 2013 18:22:32 +0000 (UTC) Received: by mail-la0-f44.google.com with SMTP id er20so1795395lab.31 for ; Wed, 17 Jul 2013 11:22:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=yaEL1JjazyUBwi1Dsigqj7eHj8ZWc7148BhP83wQ7BM=; b=ZTA6qiWTLmMNVUI5c+7Fkja23IZE6aO4a7KjD2KZMsu/k3DrxpD1/fjROAhn6YPwJb VqukGDCxkvnDyXfPatndid7BcJmNNKKXjcJIoyEZKuoHXCJwBQDLTs58Q38wdm2IfLaN +jPf9lekFuwmUjXfilxYB3s+2jdytGg3LklV3bl38suNuDYOfyHowAD3XHJYDWEDH9rO 17LHK25wRVbjVQrrJ8sdieeOglbrmBc4H40J5zS91ORv/IzjXFpKYfJkr1eRG0l+t/Od xqEsQo2iGlaZQIiszI+dwjotHw1UrSZNeBAjEE0iXR7YzhS3QuXGM52hujc8UNquFBIp JqTQ== X-Received: by 10.112.51.16 with SMTP id g16mr3707852lbo.0.1374085351426; Wed, 17 Jul 2013 11:22:31 -0700 (PDT) MIME-Version: 1.0 Received: by 10.114.82.72 with HTTP; Wed, 17 Jul 2013 11:21:51 -0700 (PDT) In-Reply-To: References: <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <20130704000405.GA75529@icarus.home.lan> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan> <43015E9015084CA6BAC6978F39D22E8B@multiplay.co.uk> <3CFB4564D8EB4A6A9BCE2AFCC5B6E400@multiplay.co.uk> <51D6A206.2020303@digsys.bg> From: =?ISO-8859-1?Q?Gezeala_M=2E_Bacu=F1o_II?= Date: Wed, 17 Jul 2013 11:21:51 -0700 Message-ID: 
Subject: Re: Slow resilvering with mirrored ZIL To: Warren Block Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 18:22:33 -0000

On Wed, Jul 17, 2013 at 10:35 AM, Warren Block wrote: > On Wed, 17 Jul 2013, Gezeala M. Bacuño II wrote: > > If ZFS goes on a bare drive, it will be aligned by default. If ZFS is >> going in a partition, yes, align that partition to 4K boundaries or larger >> multiples of 4K, like 1M. >> >> Your statement is enlightening and concise, exactly what I need. Thanks. >> >> The gnop/ashift workaround is just to get ZFS to use the right >> block size. So if you don't take care to get partition alignment right, >> you might end up using the right >> block size but misaligned. >> >> And yes, it will be nice to be able to just explicitly tell ZFS the >> block size to use. >> >> >> We do add the entire drive (no partitions) to ZFS, perform gnop/ashift >> and other necessary steps and then verify ashift=12 through zdb. >> >> The gpart/gnop/ashift steps, if I understand correctly (do correct me if >> I'm stating this incorrectly), is needed for further SSD performance >> tuning. Taking into consideration leaving a >> certain chunk for wear leveling and also if the SSD has a size that may >> be too big for L2ARC. >> > > Well, there are several things going on. > > Partitions can be used for a couple of things. Limiting the size of space > available to ZFS, leaving an unallocated part of the drive for wear > leveling. Note that ZFS on FreeBSD now has TRIM, which should make leaving > unused space on SSDs unnecessary. > > Aligning partitions preserves performance. If a partition is misaligned, > writes can slow down to half speed. For example, a 4K filesystem block > written to an aligned partition writes a single block. If the partition is > misaligned, that 4K write is split over two disk blocks. Each block has to > be read, partly modified, then written, taking roughly twice as long. > > Finally, ZFS's ashift controls the minimum size of block ZFS uses. > ashift=12 (12 bits) sets that to 4K blocks (2^12=4096). Again, a > performance thing, matching the filesystem block size to device block size. > > It would be interesting to see a benchmark of ZFS on a 4K drive with > different ashift values.

Right on again. I forgot to include on my reply, that it is for a specific use case similar to ours, wherein we dedicate the entire drive to the pool. I believe it is totally time to put all these howto/faq stuff on a central FreeBSD repository, I think there's another thread requesting for the same thing. Scenarios: a] maximizing pool and drive size, don't need to partition -- these are the steps. a.1] new drives a.2] used drives b] For those with limited drives, limited enclosures etc -- these are the steps you may want to check out c] zfs-on-root d] and so on.. This will help a lot on deciding which steps to follow and which are necessary or not, therefore, avoiding all these repeated questions (just like mine) on ZFS setup/performance/tuning. https://wiki.freebsd.org/ZFSTuningGuide (WIP) - outdated, and there's no section for initial zpool/drive(s) setup.
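Since the gnop workaround keeps coming up in this thread without the exact commands, the commonly circulated recipe looks roughly like the following. The disk (da0) and pool (tank) names are placeholders, and this is a sketch of the usual sequence rather than commands taken from the thread, so verify against your own setup:

    # whole-disk pool, forcing ashift=12 via a temporary 4K gnop provider
    gnop create -S 4096 /dev/da0
    zpool create tank /dev/da0.nop
    zpool export tank
    gnop destroy /dev/da0.nop
    zpool import tank
    zdb -C tank | grep ashift        # expect ashift: 12

    # partitioned variant with 1M alignment, per Warren's note above
    gpart create -s gpt da0
    gpart add -t freebsd-zfs -a 1m -l disk0 da0
    zpool create tank gpt/disk0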
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 19:46:05 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id A8030E47; Wed, 17 Jul 2013 19:46:05 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 0E2C07BC; Wed, 17 Jul 2013 19:46:04 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id r6HJjvNV095405; Wed, 17 Jul 2013 22:45:57 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua r6HJjvNV095405 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id r6HJjvUH095403; Wed, 17 Jul 2013 22:45:57 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 17 Jul 2013 22:45:57 +0300 From: Konstantin Belousov To: Andriy Gapon Subject: Re: zfs_rename: another zfs+vfs deadlock Message-ID: <20130717194557.GU5991@kib.kiev.ua> References: <51E679FD.3040306@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="/Isdj7O9hWi8F9Bn" Content-Disposition: inline In-Reply-To: <51E679FD.3040306@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: freebsd-fs@FreeBSD.org, zfs-devel@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 19:46:05 -0000

--/Isdj7O9hWi8F9Bn Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Wed, Jul 17, 2013 at 02:03:25PM +0300, Andriy Gapon wrote: > A scenario to reproduce this bug could be like this. > mkdir a > mkdir a/b > mv some-file a/b/ (in parallel with) stat a/b > Of course it would have to be repeated many times to hit the right timing > window. Also, namecache could interfere with this scenario, but I am not sure. >

There are no questions or proposals on how to approach the fix; is this a JFYI mail? I recommend you to look at the ufs_checkpath() and its use in the ufs_rename().
--/Isdj7O9hWi8F9Bn Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJR5vR0AAoJEJDCuSvBvK1BevYP/03MlbINCVbX1tI9KuT02IPF KK1YPWykvf11h/GmeONiZv3qZjvYWe9jwkga4f9Hrb6DjAhIZS+3MuIwLK12yANd xfNNFF7XMHcoxyvuF4wDeufgn04ttRgREV0vaDFnODL+fMhzuz7sfjXI4lM9x6+0 nZaAjsS8eR2rYgC2z0oPRyBK+/mMldayM5FWUXBynLpkjgwlk7XP7A6BX9Fw7Mtp vFVKtGSg613ugUYZWwgI5gzJbUjtGCO7l6gQyYQCDGBeetWmyPLRHfz2aS+KsPEI cpG5vi7ruXcA9KMUg8jW9M+9qyMcCKWsnkkTUcpUOXNhbpDMaRKthGM1MVSu8HA6 Q1KfdVuXWPYgg8GJvrBXo6UjgPQmzp/Gw2a4SE/DcHhZ4ouusU0lxX0TOErf+wHW 4i8vWCJO4zk7HIpX546wLqF7eOzDSGJ3VdCkWNheeO6ca7f8wAW8f2/8mD1iBdZo s3wcGSfAKcYXJMX5J7SwTtFtv8V36lU4+XxOo0KiW/tDTu07sPyo7Zgw6iRwnlr+ +KYJzqTI0RftjD0lKlJPYZJTSYIPYffzu9fweiyrO9BbzQf/k+amDK00k30oy1D9 zf0olSwJN+2FhfnzQJf9P+3Urq10JilpmH4xJwuy3M8yKtqQ/eLh4no2ojAORErl nr17M0hGUNV9MUHmaDwL =zpSb -----END PGP SIGNATURE----- --/Isdj7O9hWi8F9Bn-- From owner-freebsd-fs@FreeBSD.ORG Thu Jul 18 07:29:23 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DF647735; Thu, 18 Jul 2013 07:29:23 +0000 (UTC) (envelope-from joe@karthauser.co.uk) Received: from babel.karthauser.co.uk (212-13-197-151.karthauser.co.uk [212.13.197.151]) by mx1.freebsd.org (Postfix) with ESMTP id 3C0EBBC7; Thu, 18 Jul 2013 07:29:22 +0000 (UTC) Received: from [192.168.10.240] (unknown [81.144.225.214]) (Authenticated sender: joemail@tao.org.uk) by babel.karthauser.co.uk (Postfix) with ESMTPSA id B4FFC290E; Thu, 18 Jul 2013 07:29:13 +0000 (UTC) Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue? From: Dr Josef Karthauser Date: Thu, 18 Jul 2013 08:29:14 +0100 Message-Id: <60F7BE75-5E2F-471E-A9CE-AF4CD17D96E2@karthauser.co.uk> References: <20130716225013.1C63B23A@babel.karthauser.co.uk> To: "freebsd-fs@freebsd.org" X-Mailer: Apple Mail (2.1508) Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: "freebsd-stable@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jul 2013 07:29:23 -0000 Hi there, I'm scratching my head. I've just migrated to a super micro chassis and = at the same time gone from FreeBSD 9.0 to 9.1-RELEASE. The machine in question is running a ZFS mirror configuration on two ada = devices (with a 8gb gmirror carved out for swap). Since doing so I've been having strange drop outs on the drives; the = just disappear from the bus like so: (ada2:ahcich2:0:0:0): removing device entry (aprobe0:ahcich2:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00 (aprobe0:ahcich2:0:0:0): CAM status: ATA Status Error (aprobe0:ahcich2:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 = (ABRT ) (aprobe0:ahcich2:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff (aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted (aprobe0:ahcich2:0:0:0): NOP. 
ACB: 00 00 00 00 00 00 00 00 00 00 00 00 (aprobe0:ahcich2:0:0:0): CAM status: ATA Status Error (aprobe0:ahcich2:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT ) (aprobe0:ahcich2:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff (aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted At first I thought it was a failing drive - one of the drives did this, and I limped on a single drive for a week until I could get someone up to the rack to plug a third drive in. We resilvered the zpool onto the new device and ran with the failed drive still plugged in (but not responding to a reset on the ada bus with camcontrol) for a week or so. Then, the new drive dropped out in exactly the same way, followed in short order by the remaining original drive!!! After rebooting the machine, and observing all three drives probing and available, I resilvered the gmirror and zpool again on the two devices that I thought were reliable, but before the resilvering was completed the new drive dropped out again. I'm scratching my head now. I can't imagine that it's a wiring problem, as they are all on individual SATA buses and individually cabled. Smart isn't reporting any drive issues either…. :/ So, I'm wondering, is it a driver issue with 9.1-RELEASE, and if I upgrade to 9-RELENG would I expect that to resolve the problem? (Have there been any ada bus issues reported since last December?) The hardware in question is: ahci0: port 0xf050-0xf057,0xf040-0xf043,0xf030-0xf037,0xf020-0xf023,0xf000-0xf01f mem 0xdfb02000-0xdfb027ff irq 19 at device 31.2 on pci0 ahci0: AHCI v1.30 with 6 3Gbps ports, Port Multiplier not supported ahcich0: at channel 0 on ahci0 ahcich1: at channel 1 on ahci0 ahcich2: at channel 2 on ahci0 ahcich3: at channel 3 on ahci0 ahcich4: at channel 4 on ahci0 ahcich5: at channel 5 on ahci0 ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 ada0: ATA-8 SATA 2.x device ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) ada0: Previously was known as ad4 ada1 at ahcich1 bus 0 scbus1 target 0 lun 0 ada1: ATA-8 SATA 2.x device ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada1: Command Queueing enabled ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) ada1: Previously was known as ad6 ada2 at ahcich2 bus 0 scbus2 target 0 lun 0 ada2: ATA-8 SATA 2.x device ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada2: Command Queueing enabled ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) ada2: Previously was known as ad8 Any ideas would be greatly welcomed.
Thanks, Joe From owner-freebsd-fs@FreeBSD.ORG Thu Jul 18 07:33:14 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 822B993A; Thu, 18 Jul 2013 07:33:14 +0000 (UTC) (envelope-from prvs=1911771df7=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 02AF4C0C; Thu, 18 Jul 2013 07:33:13 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50005041117.msg; Thu, 18 Jul 2013 08:33:10 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 18 Jul 2013 08:33:10 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1911771df7=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <33EF2240EDC1446D8E45F8C51974136B@multiplay.co.uk> From: "Steven Hartland" To: "Dr Josef Karthauser" , References: <20130716225013.1C63B23A@babel.karthauser.co.uk> <60F7BE75-5E2F-471E-A9CE-AF4CD17D96E2@karthauser.co.uk> Subject: Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue? Date: Thu, 18 Jul 2013 08:33:37 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="Windows-1252"; reply-type=original Content-Transfer-Encoding: 8bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jul 2013 07:33:14 -0000 What chassis is this? ----- Original Message ----- From: "Dr Josef Karthauser" To: Cc: Sent: Thursday, July 18, 2013 8:29 AM Subject: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue? Hi there, I'm scratching my head. I've just migrated to a super micro chassis and at the same time gone from FreeBSD 9.0 to 9.1-RELEASE. The machine in question is running a ZFS mirror configuration on two ada devices (with a 8gb gmirror carved out for swap). Since doing so I've been having strange drop outs on the drives; the just disappear from the bus like so: (ada2:ahcich2:0:0:0): removing device entry (aprobe0:ahcich2:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00 (aprobe0:ahcich2:0:0:0): CAM status: ATA Status Error (aprobe0:ahcich2:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT ) (aprobe0:ahcich2:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff (aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted (aprobe0:ahcich2:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00 (aprobe0:ahcich2:0:0:0): CAM status: ATA Status Error (aprobe0:ahcich2:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT ) (aprobe0:ahcich2:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff (aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted At first I though it was a failing drive - one of the drives did this, and I limped on a single drive for a week until I could get someone up to the rack to plug a third drive in. We resilvered the zpool onto the new device and ran with the failed drive still plugged in (but not responding to a reset on the ada bus with camcontrol) for a week or so. 
Then, the new drive dropped out in exactly the same way, followed in short order by the remaining original drive!!! After rebooting the machine, and observing all three drives probing and available, I resilvered the gmirror and zpool again on the two devices expected that I thought were reliable, but before the resilvering was completed the new drive dropped out again. I'm scratching my head now. I can't imagine that it's a wiring problem, as they are all on individual SATA buses and individually cabled. Smart isn't reporting an drive issues either…. :/ So, I'm wondering, is it a driver issuer with 9.1-RELEASE, if I upgrade to 9-RELENG would I expect that to resolve the problem? (Have there been any reported ada bus issuer reported since last December?) The hardware in question is: ahci0: port 0xf050-0xf057,0xf040-0xf043,0xf030-0xf037,0xf020-0xf023,0xf000-0xf01f mem 0xdfb02000-0xdfb027ff irq 19 at device 31.2 on pci0 ahci0: AHCI v1.30 with 6 3Gbps ports, Port Multiplier not supported ahcich0: at channel 0 on ahci0 ahcich1: at channel 1 on ahci0 ahcich2: at channel 2 on ahci0 ahcich3: at channel 3 on ahci0 ahcich4: at channel 4 on ahci0 ahcich5: at channel 5 on ahci0 ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 ada0: ATA-8 SATA 2.x device ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) ada0: Previously was known as ad4 ada1 at ahcich1 bus 0 scbus1 target 0 lun 0 ada1: ATA-8 SATA 2.x device ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada1: Command Queueing enabled ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) ada1: Previously was known as ad6 ada2 at ahcich2 bus 0 scbus2 target 0 lun 0 ada2: ATA-8 SATA 2.x device ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada2: Command Queueing enabled ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) ada2: Previously was known as ad8 Any ideas would be greatly welcomed. Thanks, Joe _______________________________________________ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 18 07:53:33 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C388FF67 for ; Thu, 18 Jul 2013 07:53:33 +0000 (UTC) (envelope-from maurizio.vairani@cloverinformatica.it) Received: from smtpdg10.aruba.it (smtpdg4.aruba.it [62.149.158.234]) by mx1.freebsd.org (Postfix) with ESMTP id ACD9CCFE for ; Thu, 18 Jul 2013 07:53:31 +0000 (UTC) Received: from cloverinformatica.it ([188.10.129.202]) by smtpcmd04.ad.aruba.it with bizsmtp id 1jsB1m0114N8xN401jsCrs; Thu, 18 Jul 2013 09:52:15 +0200 Received: from [192.168.0.100] (MAURIZIO-PC [192.168.0.100]) by cloverinformatica.it (Postfix) with ESMTP id 3FDB9FCB3; Thu, 18 Jul 2013 09:52:12 +0200 (CEST) Message-ID: <51E79EAD.5040602@cloverinformatica.it> Date: Thu, 18 Jul 2013 09:52:13 +0200 From: Maurizio Vairani User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1 MIME-Version: 1.0 To: "Julian H. Stacey" Subject: Re: [SOLVED] Re: Shutdown problem with an USB memory stick as ZFS cache device References: <201307171529.r6HFT4EK063849@fire.js.berklix.net> In-Reply-To: <201307171529.r6HFT4EK063849@fire.js.berklix.net> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org, Ronald Klop X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jul 2013 07:53:33 -0000 On 17/07/2013 17:29, Julian H. Stacey wrote: > Maurizio Vairani wrote: >> On 17/07/2013 11:50, Ronald Klop wrote: >>> On Wed, 17 Jul 2013 10:27:09 +0200, Maurizio Vairani >>> wrote: >>> >>>> Hi all, >>>> >>>> >>>> on a Compaq Presario laptop I have just installed the latest stable >>>> >>>> >>>> #uname -a >>>> >>>> FreeBSD presario 9.2-PRERELEASE FreeBSD 9.2-PRERELEASE #0: Tue Jul 16 >>>> 16:32:39 CEST 2013 root@presario:/usr/obj/usr/src/sys/GENERIC amd64 >>>> >>>> >>>> For speed up the compilation I have added to the pool, tank0, a >>>> SanDisk memory stick as cache device with the command: >>>> >>>> >>>> # zpool add tank0 cache /dev/da0 >>>> >>>> >>>> But when I shutdown the laptop the process will halt with this screen >>>> shot: >>>> >>>> >>>> http://www.dump-it.fr/freebsd-screen-shot/2f9169f18c7c77e52e873580f9c2d4bf.jpg.html >>>> >>>> >>>> >>>> and I need to press the power button for more than 4 seconds to >>>> switch off the laptop. >>>> >>>> The problem is always reproducible. >>> Does sysctl hw.usb.no_shutdown_wait=1 help? >>> >>> Ronald. >> Thank you Ronald it works ! >> >> In /boot/loader.conf added the line >> hw.usb.no_shutdown_wait=1 >> >> Maurizio > I wonder (from ignorance as I dont use ZFS yet), > if that merely masks the symptom or cures the fault ? > > Presumably one should use a ZFS command to disassociate whatever > might have the cache open ? (in case something might need to be > written out from cache, if it was a writeable cache ?) > > I too had a USB shutdown problem (non ZFS, now solved)& several people > made useful comments on shutdown scripts etc, so I'm cross referencing: > > http://lists.freebsd.org/pipermail/freebsd-mobile/2013-July/012803.html > > Cheers, > Julian Probably it masks the symptom. 
Andriy Gapon hypothesizes a bug in the ZFS clean up code: http://lists.freebsd.org/pipermail/freebsd-fs/2013-July/017857.html Surely one can use a startup script with the command: zpool add tank0 cache /dev/da0 and a shutdown script with: zpool remove tank0 /dev/da0 but this mask the symptom too. I prefer the Ronald solution because: - is simpler: it adds only one line (hw.usb.no_shutdown_wait=1) to one file (/boot/loader.conf). - is fastest: the zpool add/remove commands take time and “hw.usb.no_shutdown_wait=1” in /boot/loader.conf speeds up the shutdown process. - is cleaner: the zpool add/remove commands pair will fill up the tank0 pool history. Regards Maurizio From owner-freebsd-fs@FreeBSD.ORG Thu Jul 18 08:25:27 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 147B58BF; Thu, 18 Jul 2013 08:25:27 +0000 (UTC) (envelope-from rb@gid.co.uk) Received: from mx0.gid.co.uk (mx0.gid.co.uk [194.32.164.250]) by mx1.freebsd.org (Postfix) with ESMTP id C0550E7B; Thu, 18 Jul 2013 08:25:26 +0000 (UTC) Received: from [194.32.164.26] (80-46-130-69.static.dsl.as9105.com [80.46.130.69]) by mx0.gid.co.uk (8.14.2/8.14.2) with ESMTP id r6I8POHj066332; Thu, 18 Jul 2013 09:25:25 +0100 (BST) (envelope-from rb@gid.co.uk) Subject: Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue? Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=windows-1252 From: Bob Bishop In-Reply-To: <60F7BE75-5E2F-471E-A9CE-AF4CD17D96E2@karthauser.co.uk> Date: Thu, 18 Jul 2013 09:25:19 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <281DBD06-81D5-4DDD-9464-B96C80C22C3F@gid.co.uk> References: <20130716225013.1C63B23A@babel.karthauser.co.uk> <60F7BE75-5E2F-471E-A9CE-AF4CD17D96E2@karthauser.co.uk> To: Dr Josef Karthauser X-Mailer: Apple Mail (2.1283) Cc: "freebsd-fs@freebsd.org" , "freebsd-stable@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jul 2013 08:25:27 -0000 Hi, On 18 Jul 2013, at 08:29, Dr Josef Karthauser wrote: > Hi there, >=20 > I'm scratching my head. I've just migrated to a super micro chassis = and at the same time gone from FreeBSD 9.0 to 9.1-RELEASE. >=20 > The machine in question is running a ZFS mirror configuration on two = ada devices (with a 8gb gmirror carved out for swap). >=20 > Since doing so I've been having strange drop outs on the drives; the = just disappear from the bus like so: >=20 > (ada2:ahcich2:0:0:0): removing device entry > (aprobe0:ahcich2:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00 > (aprobe0:ahcich2:0:0:0): CAM status: ATA Status Error > (aprobe0:ahcich2:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 = (ABRT ) > (aprobe0:ahcich2:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff > (aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted > (aprobe0:ahcich2:0:0:0): NOP. 
ACB: 00 00 00 00 00 00 00 00 00 00 00 00 > (aprobe0:ahcich2:0:0:0): CAM status: ATA Status Error > (aprobe0:ahcich2:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 = (ABRT ) > (aprobe0:ahcich2:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff > (aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted >=20 >=20 > At first I though it was a failing drive - one of the drives did this, = and I limped on a single drive for a week until I could get someone up = to the rack to plug a third drive in. We resilvered the zpool onto the = new device and ran with the failed drive still plugged in (but not = responding to a reset on the ada bus with camcontrol) for a week or so. >=20 > Then, the new drive dropped out in exactly the same way, followed in = short order by the remaining original drive!!! >=20 > After rebooting the machine, and observing all three drives probing = and available, I resilvered the gmirror and zpool again on the two = devices expected that I thought were reliable, but before the = resilvering was completed the new drive dropped out again. >=20 > I'm scratching my head now. I can't imagine that it's a wiring = problem, as they are all on individual SATA buses and individually = cabled. >=20 > Smart isn't reporting an drive issues either=85. :/ >=20 > So, I'm wondering, is it a driver issuer with 9.1-RELEASE, if I = upgrade to 9-RELENG would I expect that to resolve the problem? (Have = there been any reported ada bus issuer reported since last December?) >=20 > The hardware in question is: >=20 > ahci0: port = 0xf050-0xf057,0xf040-0xf043,0xf030-0xf037,0xf020-0xf023,0xf000-0xf01f = mem 0xdfb02000-0xdfb027ff irq 19 at device 31.2 on pci0 > ahci0: AHCI v1.30 with 6 3Gbps ports, Port Multiplier not supported > ahcich0: at channel 0 on ahci0 > ahcich1: at channel 1 on ahci0 > ahcich2: at channel 2 on ahci0 > ahcich3: at channel 3 on ahci0 > ahcich4: at channel 4 on ahci0 > ahcich5: at channel 5 on ahci0 > ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 > ada0: ATA-8 SATA 2.x device > ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada0: Command Queueing enabled > ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) > ada0: Previously was known as ad4 > ada1 at ahcich1 bus 0 scbus1 target 0 lun 0 > ada1: ATA-8 SATA 2.x device > ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada1: Command Queueing enabled > ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) > ada1: Previously was known as ad6 > ada2 at ahcich2 bus 0 scbus2 target 0 lun 0 > ada2: ATA-8 SATA 2.x device > ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada2: Command Queueing enabled > ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) > ada2: Previously was known as ad8 >=20 >=20 > Any ideas would be greatly welcomed. >=20 > Thanks, > Joe Me too (over a long period, with various hardware). There is a general problem with energy-saving drives that controllers = don't understand them. Typically the drive decides to go into some = power-saving mode, the controller wants to do some operation, the drive = takes too long to come ready, the controller decides the drive has gone = away. You have to persuade the controller to wait longer for the drive to come = ready, and/or persuade the drive to stay awake. This isn't necessarily = easy, eg the controller's ready wait may not be programmable. (Or avoid such drives like the plague, life's too short). 
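If drive power saving is indeed the culprit as described above, one way to check for it and (where the drive supports it) switch it off is sketched below. This assumes sysutils/smartmontools is installed and that the drive honours APM and standby settings; ada2 is just the example device from this thread:

  # does the drive advertise Advanced Power Management?
  camcontrol identify ada2 | grep -i 'power management'
  # if supported, ask the drive not to power itself down
  smartctl -s apm,off /dev/ada2
  smartctl -s standby,off /dev/ada2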
-- Bob Bishop rb@gid.co.uk From owner-freebsd-fs@FreeBSD.ORG Thu Jul 18 09:34:58 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C5400504; Thu, 18 Jul 2013 09:34:58 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id B404C1AA; Thu, 18 Jul 2013 09:34:57 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id MAA19305; Thu, 18 Jul 2013 12:34:55 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Uzkbv-0004kP-I2; Thu, 18 Jul 2013 12:34:55 +0300 Message-ID: <51E7B686.4090509@FreeBSD.org> Date: Thu, 18 Jul 2013 12:33:58 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130708 Thunderbird/17.0.7 MIME-Version: 1.0 To: Adrian Chadd , Konstantin Belousov Subject: Re: Deadlock in nullfs/zfs somewhere References: <51DCFEDA.1090901@FreeBSD.org> <51E59FD9.4020103@FreeBSD.org> <51E67F54.9080800@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jul 2013 09:34:58 -0000 on 17/07/2013 20:19 Adrian Chadd said the following: > On 17 July 2013 04:26, Andriy Gapon wrote: >> One possibility is to add getnewvnode_reserve() calls before the ZFS transaction >> beginnings in the places where a new vnode/znode may have to be allocated within >> a transaction. >> This looks like a quick and cheap solution but it makes the code somewhat messier. >> >> Another possibility is to change something in VFS machinery, so that VOP_RECLAIM >> getting blocked for one filesystem does not prevent vnode allocation for other >> filesystems. >> >> I could think of other possible solutions via infrastructural changes in VFS or >> ZFS... > > Well, what do others think? This seems like a showstopper for systems > with lots and lots of ZFS filesystems doing lots and lots of activity. > Looks like others are not speaking yet :-) My current idea is that ZFS should set MNTK_SUSPEND in zfs_suspend_fs() path before acquiring its z_teardown* locks. This should make intentions of ZFS visible to VFS. And thus it should prevent VOP_RECLAIM call on a suspended ZFS filesystem and that should prevent vnlru_free() getting stuck. Hopefully this should break the deadlock cycle. Kostik, what is your opinion? 
For your convenience here is a message with my analysis of this issue: http://thread.gmane.org/gmane.os.freebsd.current/150889/focus=18534 -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 18 10:43:57 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E11CC69B; Thu, 18 Jul 2013 10:43:56 +0000 (UTC) (envelope-from godders@gmail.com) Received: from mail-qc0-x22a.google.com (mail-qc0-x22a.google.com [IPv6:2607:f8b0:400d:c01::22a]) by mx1.freebsd.org (Postfix) with ESMTP id 9755D664; Thu, 18 Jul 2013 10:43:56 +0000 (UTC) Received: by mail-qc0-f170.google.com with SMTP id s1so1641380qcw.1 for ; Thu, 18 Jul 2013 03:43:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=ZBPYMT1IGIdD6eDH1wQWLEzJObvK/lBROJZcfLNReWQ=; b=FdXSHoGfCY02bvlYMk51OOoKoi55LajFHINJ1U9faZoKySI1JwTCEo8ACcZZRcJJzB 3zdO8xutOhlfQAC+dqldZFILsp6FLqntFv1qnABrRBgVimg9IkA4dG7RDDGXPQbkYdLm 4NSzuqRnB26hjRPPmotkcndCqL32DKHv3r1hNP1wX1fqgPyxoxQr/FrPJfZ5+fdC1E8N P73WUaViq3Pg0/luZVuj0OZRNzIaVNIyYPNezYokfxQED8Buc2R7rpxRthkXryA8nY46 +ZUGrS5j5shDlgNVXZIioWFTzRoHESilO3TrJR1pfyXhsCyyG+OFSZPgeVj/J9NJxjxd fTOg== MIME-Version: 1.0 X-Received: by 10.229.105.218 with SMTP id u26mr2821014qco.8.1374144235032; Thu, 18 Jul 2013 03:43:55 -0700 (PDT) Received: by 10.49.52.65 with HTTP; Thu, 18 Jul 2013 03:43:54 -0700 (PDT) In-Reply-To: <20130717053431.GN5991@kib.kiev.ua> References: <201307151932.r6FJWSxM087108@chez.mckusick.com> <51E5CD7A.2020109@FreeBSD.org> <20130717053431.GN5991@kib.kiev.ua> Date: Thu, 18 Jul 2013 11:43:54 +0100 Message-ID: Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?) From: Dan Thomas To: Konstantin Belousov Content-Type: text/plain; charset=ISO-8859-1 Cc: Kirk McKusick , freebsd-fs@freebsd.org, Palle Girgensohn , Jeff Roberson , Julian Akehurst X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jul 2013 10:43:57 -0000 After a bit of experimentation, we've managed to nail down a reasonably short run that exhibits this leak. Postgres' verbose log output is linked below - whatever is causing the leak is in there somewhere, but alas I lack the necessary understanding of Postgres' internals to be able to pin it down any further. https://dl.dropboxusercontent.com/u/13916028/pg_leak_log.txt I've also got a 2.4M ktrace of this run, which is still pretty big, I'll admit. Unfortunately it's got some data in it that I'd rather not publish, but I'm happy to send it directly to anyone who might find it useful. Thanks, Dan On 17 July 2013 06:34, Konstantin Belousov wrote: > On Wed, Jul 17, 2013 at 12:47:22AM +0200, Palle Girgensohn wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Kirk McKusick skrev: >> >> Date: Mon, 15 Jul 2013 10:51:10 +0100 Subject: Re: leaking lots of >> >> unreferenced inodes (pg_xlog files?) From: Dan Thomas >> >> To: Kirk McKusick Cc: >> >> Palle Girgensohn , freebsd-fs@freebsd.org, Jeff >> >> Roberson , Julian Akehurst >> >> X-ASK-Info: Message Queued (2013/07/15 >> >> 02:51:22) X-ASK-Info: Confirmed by User (2013/07/15 02:55:04) >> >> >> >> On 11 June 2013 01:17, Kirk McKusick >> >> wrote: >> >>> OK, good to have it narrowed down. 
I will look to devise some >> >>> additional diagnostics that hopefully will help tease out the >> >>> bug. I'll hopefully get back to you soon. >> >> Hi, >> >> >> >> Is there any news on this issue? We're still running several >> >> servers that are exhibiting this problem (most recently, one that >> >> seems to be leaking around 10gb/hour), and it's getting to the >> >> point where we're looking at moving to a different OS until it's >> >> resolved. >> >> >> >> We have access to several production systems with this problem and >> >> (at least from time to time) will have systems with a significant >> >> leak on them that we can experiment with. Is there any way we can >> >> assist with tracking this down? Any diagnostics or testing that >> >> would be useful? >> >> >> >> Thanks, Dan >> > >> > Hi Dan (and Palle), >> > >> > Sorry for the long delay with no help / news. I have gotten >> > side-tracked on several projects and have had little time to try and >> > devise some tests that would help find the cause of the lost space. >> > It almost certainly is a one-line fix (a missing vput or vrele >> > probably in some error path), but finding where it goes is the hard >> > part :-) >> > >> > I have had little success in inserting code that tracks reference >> > counts (too many false positives). So, I am going to need some help >> > from you to narrow it down. My belief is that there is some set of >> > filesystem operations (system calls) that are leading to the >> > problem. Notably, a file is being created, data put into it, then the >> > file is deleted (either before or after being closed). Somehow a >> > reference to that file is persisting despite there being no valid >> > reference to it. Hence the filesystem thinks it is still live and is >> > not deleting it. When you do the forcible unmount, these files get >> > cleared and the space shows back up. >> > >> > What I need to devise is a small test program doing the set of system >> > calls that cause this to happen. The way that I would like to try and >> > get it is to have you `ktrace -i' your application and then run your >> > application just long enough to create at least one of these lost >> > files. The goal is to minimize the amount of ktrace data through >> > which we need to sift. >> > >> > In preparation for doing this test you need to have a kernel compiled >> > with `option DIAGNOSTIC' or if you prefer, just add `#define >> > DIAGNOSTIC 1' to the top of sys/kern/vfs_subr.c. You will know you >> > have at least one offending file when you try to unmount the affected >> > filesystem and find it busy. Before doing the `umount -f', enable >> > busy printing using `sysctl debug.busyprt=1'. Then capture the >> > console output which will show the details of all the vnodes that had >> > to be forcibly flushed. Hopefully we will then be able to correlate >> > them back to the files (NAMI in the ktrace output) with which they >> > were associated. We may need to augment the NAMI data with the inode >> > number of the associated file to make the association with the >> > busyprt output. Anyway, once we have that, we can look at all the >> > system calls done on those files and create a small test program that >> > exhibits the problem. Given a small test program, Jeff or I can track >> > down the offending system call path and nail this pernicious bug once >> > and for all. 
>> > >> > Kirk McKusick >> >> Hi, >> >> I have run ktrace -i on pg_ctl (which forks off all the postgresql >> processes) and I got two "busy" files that where "lost" after a few >> hours. dmesg reveals this: >> >> vflush: busy vnode >> 0xfffffe067cdde960: tag ufs, type VREG >> usecount 1, writecount 0, refcount 2 mountedhere 0 >> flags (VI(0x200)) >> VI_LOCKed v_object 0xfffffe0335922000 ref 0 pages 0 >> lock type ufs: EXCL by thread 0xfffffe01600eb8e0 (pid 56723) >> ino 11047146, on dev da2s1d >> vflush: busy vnode >> 0xfffffe039f35bb40: tag ufs, type VREG >> usecount 1, writecount 0, refcount 3 mountedhere 0 >> flags (VI(0x200)) >> VI_LOCKed v_object 0xfffffe03352701d0 ref 0 pages 0 >> lock type ufs: EXCL by thread 0xfffffe01600eb8e0 (pid 56723) >> ino 11045961, on dev da2s1d >> >> >> I had to umount -f, so they where "lost". >> >> So, now I have 55 GB ktrace output... ;) Is there anything I can do to >> filter it, or shall I compress it and put it on a web server for you to >> fetch as it is? > > I think that 55GB of ktrace is obviously useless. The Kirk' idea was to > have an isolated test case that would only create the situation triggering > the leak, without irrelevant activity. This indeed requires drilling down > and isolating the file activities to get to the core of problem. > > FWIW, I and Peter Holm used the following alternative approach quite > successfully when tracking down other vnode reference leaks. The approach > still requires some understanding of the specifics of the problematic > files to be useful, but not as much as isolated test. > > Basically, you take the patch below, and set the VV_DEBUGVREF flag for > the vnode that has characteristics as much specific for the leaked vnode > as possible. The patch has example of setting the flag for all new NFS > vnodes. You would probably want to do the same in vfs_vgetf(), > checking e.g. for the partition where your leaks happen. The limiting > of the vnodes for which the vref traces are accumulated is needed to > save the kernel memory. > > Then after the leak was observed, you just print the vnode with ddb > command 'show vnode addr' and send the output to developer. > > Index: sys/sys/vnode.h > =================================================================== > --- sys/sys/vnode.h (revision 248723) > +++ sys/sys/vnode.h (working copy) > @@ -94,6 +94,13 @@ struct vpollinfo { > > #if defined(_KERNEL) || defined(_KVM_VNODE) > > +struct debug_ref { > + TAILQ_ENTRY(debug_ref) link; > + int val; > + const char *op; > + struct stack stack; > +}; > + > struct vnode { > /* > * Fields which define the identity of the vnode. These fields are > @@ -169,6 +176,7 @@ struct vnode { > int v_writecount; /* v ref count of writers */ > u_int v_hash; > enum vtype v_type; /* u vnode type */ > + TAILQ_HEAD(, debug_ref) v_debug_ref; > }; > > #endif /* defined(_KERNEL) || defined(_KVM_VNODE) */ > @@ -253,6 +261,7 @@ struct xvnode { > #define VV_DELETED 0x0400 /* should be removed */ > #define VV_MD 0x0800 /* vnode backs the md device */ > #define VV_FORCEINSMQ 0x1000 /* force the insmntque to succeed */ > +#define VV_DEBUGVREF 0x2000 > > /* > * Vnode attributes. 
A field value of VNOVAL represents a field whose value > Index: sys/kern/vfs_subr.c > =================================================================== > --- sys/kern/vfs_subr.c (revision 248723) > +++ sys/kern/vfs_subr.c (working copy) > @@ -71,6 +71,7 @@ __FBSDID("$FreeBSD$"); > #include > #include > #include > +#include > #include > #include > #include > @@ -871,6 +872,23 @@ static struct kproc_desc vnlru_kp = { > }; > SYSINIT(vnlru, SI_SUB_KTHREAD_UPDATE, SI_ORDER_FIRST, kproc_start, > &vnlru_kp); > + > +MALLOC_DEFINE(M_RECORD_REF, "recordref", "recordref"); > +static void > +v_record_ref(struct vnode *vp, int val, const char *op) > +{ > + struct debug_ref *r; > + > + if ((vp->v_type != VREG && vp->v_type != VBAD) || > + (vp->v_vflag & VV_DEBUGVREF) == 0) > + return; > + r = malloc(sizeof(struct debug_ref), M_RECORD_REF, M_NOWAIT | > + M_USE_RESERVE); > + r->val = val; > + r->op = op; > + stack_save(&r->stack); > + TAILQ_INSERT_TAIL(&vp->v_debug_ref, r, link); > +} > > /* > * Routines having to do with the management of the vnode table. > @@ -1073,6 +1091,7 @@ alloc: > vp->v_vflag |= VV_NOKNOTE; > } > rangelock_init(&vp->v_rl); > + TAILQ_INIT(&vp->v_debug_ref); > > /* > * For the filesystems which do not use vfs_hash_insert(), > @@ -1082,6 +1101,7 @@ alloc: > */ > vp->v_hash = (uintptr_t)vp >> vnsz2log; > > + TAILQ_INIT(&vp->v_debug_ref); > *vpp = vp; > return (0); > } > @@ -2197,6 +2217,7 @@ vget(struct vnode *vp, int flags, struct thread *t > vinactive(vp, td); > vp->v_iflag &= ~VI_OWEINACT; > } > + v_record_ref(vp, 1, "vget"); > VI_UNLOCK(vp); > return (0); > } > @@ -2211,6 +2232,7 @@ vref(struct vnode *vp) > CTR2(KTR_VFS, "%s: vp %p", __func__, vp); > VI_LOCK(vp); > v_incr_usecount(vp); > + v_record_ref(vp, 1, "vref"); > VI_UNLOCK(vp); > } > > @@ -2253,6 +2275,7 @@ vputx(struct vnode *vp, int func) > KASSERT(func == VPUTX_VRELE, ("vputx: wrong func")); > CTR2(KTR_VFS, "%s: vp %p", __func__, vp); > VI_LOCK(vp); > + v_record_ref(vp, -1, "vputx"); > > /* Skip this v_writecount check if we're going to panic below. */ > VNASSERT(vp->v_writecount < vp->v_usecount || vp->v_usecount < 1, vp, > @@ -2409,6 +2432,7 @@ void > vdropl(struct vnode *vp) > { > struct bufobj *bo; > + struct debug_ref *r, *r1; > struct mount *mp; > int active; > > @@ -2489,6 +2513,9 @@ vdropl(struct vnode *vp) > lockdestroy(vp->v_vnlock); > mtx_destroy(&vp->v_interlock); > mtx_destroy(BO_MTX(bo)); > + TAILQ_FOREACH_SAFE(r, &vp->v_debug_ref, link, r1) { > + free(r, M_RECORD_REF); > + } > uma_zfree(vnode_zone, vp); > } > > @@ -2888,6 +2915,8 @@ vn_printf(struct vnode *vp, const char *fmt, ...) > va_list ap; > char buf[256], buf2[16]; > u_long flags; > + int ref; > + struct debug_ref *r; > > va_start(ap, fmt); > vprintf(fmt, ap); > @@ -2960,8 +2989,21 @@ vn_printf(struct vnode *vp, const char *fmt, ...) 
> vp->v_object->resident_page_count); > printf(" "); > lockmgr_printinfo(vp->v_vnlock); > - if (vp->v_data != NULL) > - VOP_PRINT(vp); > +#if DDB > + if (kdb_active) { > + if (vp->v_data != NULL) > + VOP_PRINT(vp); > + } > +#endif > + > + /* Getnewvnode() initial reference is not recorded due to VNON */ > + ref = 1; > + TAILQ_FOREACH(r, &vp->v_debug_ref, link) { > + ref += r->val; > + printf("REF %d %s\n", ref, r->op); > + stack_print(&r->stack); > + } > + > } > > #ifdef DDB > Index: sys/fs/nfsclient/nfs_clport.c > =================================================================== > --- sys/fs/nfsclient/nfs_clport.c (revision 248723) > +++ sys/fs/nfsclient/nfs_clport.c (working copy) > @@ -273,6 +273,7 @@ nfscl_nget(struct mount *mntp, struct vnode *dvp, > /* vfs_hash_insert() vput()'s the losing vnode */ > return (0); > } > + vp->v_vflag |= VV_DEBUGVREF; > *npp = np; > > return (0); > Index: sys/fs/nfsclient/nfs_clnode.c > =================================================================== > --- sys/fs/nfsclient/nfs_clnode.c (revision 248723) > +++ sys/fs/nfsclient/nfs_clnode.c (working copy) > @@ -179,6 +179,7 @@ ncl_nget(struct mount *mntp, u_int8_t *fhp, int fh > /* vfs_hash_insert() vput()'s the losing vnode */ > return (0); > } > + vp->v_vflag |= VV_DEBUGVREF; > *npp = np; > > return (0); From owner-freebsd-fs@FreeBSD.ORG Thu Jul 18 11:23:33 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 07292DFE; Thu, 18 Jul 2013 11:23:33 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 2AABB82F; Thu, 18 Jul 2013 11:23:31 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id r6IBNPfn014753; Thu, 18 Jul 2013 14:23:25 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua r6IBNPfn014753 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id r6IBNPwO014752; Thu, 18 Jul 2013 14:23:25 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 18 Jul 2013 14:23:25 +0300 From: Konstantin Belousov To: Dan Thomas Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?) 
Message-ID: <20130718112325.GZ5991@kib.kiev.ua> References: <201307151932.r6FJWSxM087108@chez.mckusick.com> <51E5CD7A.2020109@FreeBSD.org> <20130717053431.GN5991@kib.kiev.ua> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="MrbiU6dcJfOZ616B" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: Kirk McKusick , freebsd-fs@freebsd.org, Palle Girgensohn , Jeff Roberson , Julian Akehurst X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jul 2013 11:23:33 -0000 --MrbiU6dcJfOZ616B Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jul 18, 2013 at 11:43:54AM +0100, Dan Thomas wrote: > After a bit of experimentation, we've managed to nail down a > reasonably short run that exhibits this leak. Postgres' verbose log > output is linked below - whatever is causing the leak is in there > somewhere, but alas I lack the necessary understanding of Postgres' > internals to be able to pin it down any further. >=20 > https://dl.dropboxusercontent.com/u/13916028/pg_leak_log.txt This is of no use, at least for me. >=20 > I've also got a 2.4M ktrace of this run, which is still pretty big, > I'll admit. Unfortunately it's got some data in it that I'd rather not > publish, but I'm happy to send it directly to anyone who might find it > useful. Such big ktrace is also unusable. If you want me to look at the leak, use the patch which I sent earlier, and add the flag to the vnodes which are likely to be leaked. Then, after 'show vnode', I would be able to see what is going on, I hope. >=20 > Thanks, >=20 > Dan >=20 > On 17 July 2013 06:34, Konstantin Belousov wrote: > > On Wed, Jul 17, 2013 at 12:47:22AM +0200, Palle Girgensohn wrote: > >> -----BEGIN PGP SIGNED MESSAGE----- > >> Hash: SHA1 > >> > >> Kirk McKusick skrev: > >> >> Date: Mon, 15 Jul 2013 10:51:10 +0100 Subject: Re: leaking lots of > >> >> unreferenced inodes (pg_xlog files?) From: Dan Thomas > >> >> To: Kirk McKusick Cc: > >> >> Palle Girgensohn , freebsd-fs@freebsd.org, Jeff > >> >> Roberson , Julian Akehurst > >> >> X-ASK-Info: Message Queued (2013/07/15 > >> >> 02:51:22) X-ASK-Info: Confirmed by User (2013/07/15 02:55:04) > >> >> > >> >> On 11 June 2013 01:17, Kirk McKusick > >> >> wrote: > >> >>> OK, good to have it narrowed down. I will look to devise some > >> >>> additional diagnostics that hopefully will help tease out the > >> >>> bug. I'll hopefully get back to you soon. > >> >> Hi, > >> >> > >> >> Is there any news on this issue? We're still running several > >> >> servers that are exhibiting this problem (most recently, one that > >> >> seems to be leaking around 10gb/hour), and it's getting to the > >> >> point where we're looking at moving to a different OS until it's > >> >> resolved. > >> >> > >> >> We have access to several production systems with this problem and > >> >> (at least from time to time) will have systems with a significant > >> >> leak on them that we can experiment with. Is there any way we can > >> >> assist with tracking this down? Any diagnostics or testing that > >> >> would be useful? 
> >> >> > >> >> Thanks, Dan > >> > > >> > Hi Dan (and Palle), > >> > > >> > Sorry for the long delay with no help / news. I have gotten > >> > side-tracked on several projects and have had little time to try and > >> > devise some tests that would help find the cause of the lost space. > >> > It almost certainly is a one-line fix (a missing vput or vrele > >> > probably in some error path), but finding where it goes is the hard > >> > part :-) > >> > > >> > I have had little success in inserting code that tracks reference > >> > counts (too many false positives). So, I am going to need some help > >> > from you to narrow it down. My belief is that there is some set of > >> > filesystem operations (system calls) that are leading to the > >> > problem. Notably, a file is being created, data put into it, then the > >> > file is deleted (either before or after being closed). Somehow a > >> > reference to that file is persisting despite there being no valid > >> > reference to it. Hence the filesystem thinks it is still live and is > >> > not deleting it. When you do the forcible unmount, these files get > >> > cleared and the space shows back up. > >> > > >> > What I need to devise is a small test program doing the set of system > >> > calls that cause this to happen. The way that I would like to try and > >> > get it is to have you `ktrace -i' your application and then run your > >> > application just long enough to create at least one of these lost > >> > files. The goal is to minimize the amount of ktrace data through > >> > which we need to sift. > >> > > >> > In preparation for doing this test you need to have a kernel compiled > >> > with `option DIAGNOSTIC' or if you prefer, just add `#define > >> > DIAGNOSTIC 1' to the top of sys/kern/vfs_subr.c. You will know you > >> > have at least one offending file when you try to unmount the affected > >> > filesystem and find it busy. Before doing the `umount -f', enable > >> > busy printing using `sysctl debug.busyprt=3D1'. Then capture the > >> > console output which will show the details of all the vnodes that had > >> > to be forcibly flushed. Hopefully we will then be able to correlate > >> > them back to the files (NAMI in the ktrace output) with which they > >> > were associated. We may need to augment the NAMI data with the inode > >> > number of the associated file to make the association with the > >> > busyprt output. Anyway, once we have that, we can look at all the > >> > system calls done on those files and create a small test program that > >> > exhibits the problem. Given a small test program, Jeff or I can track > >> > down the offending system call path and nail this pernicious bug once > >> > and for all. > >> > > >> > Kirk McKusick > >> > >> Hi, > >> > >> I have run ktrace -i on pg_ctl (which forks off all the postgresql > >> processes) and I got two "busy" files that where "lost" after a few > >> hours. 
dmesg reveals this: > >> > >> vflush: busy vnode > >> 0xfffffe067cdde960: tag ufs, type VREG > >> usecount 1, writecount 0, refcount 2 mountedhere 0 > >> flags (VI(0x200)) > >> VI_LOCKed v_object 0xfffffe0335922000 ref 0 pages 0 > >> lock type ufs: EXCL by thread 0xfffffe01600eb8e0 (pid 56723) > >> ino 11047146, on dev da2s1d > >> vflush: busy vnode > >> 0xfffffe039f35bb40: tag ufs, type VREG > >> usecount 1, writecount 0, refcount 3 mountedhere 0 > >> flags (VI(0x200)) > >> VI_LOCKed v_object 0xfffffe03352701d0 ref 0 pages 0 > >> lock type ufs: EXCL by thread 0xfffffe01600eb8e0 (pid 56723) > >> ino 11045961, on dev da2s1d > >> > >> > >> I had to umount -f, so they where "lost". > >> > >> So, now I have 55 GB ktrace output... ;) Is there anything I can do to > >> filter it, or shall I compress it and put it on a web server for you to > >> fetch as it is? > > > > I think that 55GB of ktrace is obviously useless. The Kirk' idea was to > > have an isolated test case that would only create the situation trigger= ing > > the leak, without irrelevant activity. This indeed requires drilling d= own > > and isolating the file activities to get to the core of problem. > > > > FWIW, I and Peter Holm used the following alternative approach quite > > successfully when tracking down other vnode reference leaks. The appro= ach > > still requires some understanding of the specifics of the problematic > > files to be useful, but not as much as isolated test. > > > > Basically, you take the patch below, and set the VV_DEBUGVREF flag for > > the vnode that has characteristics as much specific for the leaked vnode > > as possible. The patch has example of setting the flag for all new NFS > > vnodes. You would probably want to do the same in vfs_vgetf(), > > checking e.g. for the partition where your leaks happen. The limiting > > of the vnodes for which the vref traces are accumulated is needed to > > save the kernel memory. > > > > Then after the leak was observed, you just print the vnode with ddb > > command 'show vnode addr' and send the output to developer. > > > > Index: sys/sys/vnode.h > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > --- sys/sys/vnode.h (revision 248723) > > +++ sys/sys/vnode.h (working copy) > > @@ -94,6 +94,13 @@ struct vpollinfo { > > > > #if defined(_KERNEL) || defined(_KVM_VNODE) > > > > +struct debug_ref { > > + TAILQ_ENTRY(debug_ref) link; > > + int val; > > + const char *op; > > + struct stack stack; > > +}; > > + > > struct vnode { > > /* > > * Fields which define the identity of the vnode. These fields= are > > @@ -169,6 +176,7 @@ struct vnode { > > int v_writecount; /* v ref count of write= rs */ > > u_int v_hash; > > enum vtype v_type; /* u vnode type */ > > + TAILQ_HEAD(, debug_ref) v_debug_ref; > > }; > > > > #endif /* defined(_KERNEL) || defined(_KVM_VNODE) */ > > @@ -253,6 +261,7 @@ struct xvnode { > > #define VV_DELETED 0x0400 /* should be removed */ > > #define VV_MD 0x0800 /* vnode backs the md device */ > > #define VV_FORCEINSMQ 0x1000 /* force the insmntque to succe= ed */ > > +#define VV_DEBUGVREF 0x2000 > > > > /* > > * Vnode attributes. 
A field value of VNOVAL represents a field whose= value > > Index: sys/kern/vfs_subr.c > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > --- sys/kern/vfs_subr.c (revision 248723) > > +++ sys/kern/vfs_subr.c (working copy) > > @@ -71,6 +71,7 @@ __FBSDID("$FreeBSD$"); > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -871,6 +872,23 @@ static struct kproc_desc vnlru_kp =3D { > > }; > > SYSINIT(vnlru, SI_SUB_KTHREAD_UPDATE, SI_ORDER_FIRST, kproc_start, > > &vnlru_kp); > > + > > +MALLOC_DEFINE(M_RECORD_REF, "recordref", "recordref"); > > +static void > > +v_record_ref(struct vnode *vp, int val, const char *op) > > +{ > > + struct debug_ref *r; > > + > > + if ((vp->v_type !=3D VREG && vp->v_type !=3D VBAD) || > > + (vp->v_vflag & VV_DEBUGVREF) =3D=3D 0) > > + return; > > + r =3D malloc(sizeof(struct debug_ref), M_RECORD_REF, M_NOWAIT | > > + M_USE_RESERVE); > > + r->val =3D val; > > + r->op =3D op; > > + stack_save(&r->stack); > > + TAILQ_INSERT_TAIL(&vp->v_debug_ref, r, link); > > +} > > > > /* > > * Routines having to do with the management of the vnode table. > > @@ -1073,6 +1091,7 @@ alloc: > > vp->v_vflag |=3D VV_NOKNOTE; > > } > > rangelock_init(&vp->v_rl); > > + TAILQ_INIT(&vp->v_debug_ref); > > > > /* > > * For the filesystems which do not use vfs_hash_insert(), > > @@ -1082,6 +1101,7 @@ alloc: > > */ > > vp->v_hash =3D (uintptr_t)vp >> vnsz2log; > > > > + TAILQ_INIT(&vp->v_debug_ref); > > *vpp =3D vp; > > return (0); > > } > > @@ -2197,6 +2217,7 @@ vget(struct vnode *vp, int flags, struct thread *t > > vinactive(vp, td); > > vp->v_iflag &=3D ~VI_OWEINACT; > > } > > + v_record_ref(vp, 1, "vget"); > > VI_UNLOCK(vp); > > return (0); > > } > > @@ -2211,6 +2232,7 @@ vref(struct vnode *vp) > > CTR2(KTR_VFS, "%s: vp %p", __func__, vp); > > VI_LOCK(vp); > > v_incr_usecount(vp); > > + v_record_ref(vp, 1, "vref"); > > VI_UNLOCK(vp); > > } > > > > @@ -2253,6 +2275,7 @@ vputx(struct vnode *vp, int func) > > KASSERT(func =3D=3D VPUTX_VRELE, ("vputx: wrong func")); > > CTR2(KTR_VFS, "%s: vp %p", __func__, vp); > > VI_LOCK(vp); > > + v_record_ref(vp, -1, "vputx"); > > > > /* Skip this v_writecount check if we're going to panic below. = */ > > VNASSERT(vp->v_writecount < vp->v_usecount || vp->v_usecount < = 1, vp, > > @@ -2409,6 +2432,7 @@ void > > vdropl(struct vnode *vp) > > { > > struct bufobj *bo; > > + struct debug_ref *r, *r1; > > struct mount *mp; > > int active; > > > > @@ -2489,6 +2513,9 @@ vdropl(struct vnode *vp) > > lockdestroy(vp->v_vnlock); > > mtx_destroy(&vp->v_interlock); > > mtx_destroy(BO_MTX(bo)); > > + TAILQ_FOREACH_SAFE(r, &vp->v_debug_ref, link, r1) { > > + free(r, M_RECORD_REF); > > + } > > uma_zfree(vnode_zone, vp); > > } > > > > @@ -2888,6 +2915,8 @@ vn_printf(struct vnode *vp, const char *fmt, ...) > > va_list ap; > > char buf[256], buf2[16]; > > u_long flags; > > + int ref; > > + struct debug_ref *r; > > > > va_start(ap, fmt); > > vprintf(fmt, ap); > > @@ -2960,8 +2989,21 @@ vn_printf(struct vnode *vp, const char *fmt, ...) 
> > vp->v_object->resident_page_count); > > printf(" "); > > lockmgr_printinfo(vp->v_vnlock); > > - if (vp->v_data !=3D NULL) > > - VOP_PRINT(vp); > > +#if DDB > > + if (kdb_active) { > > + if (vp->v_data !=3D NULL) > > + VOP_PRINT(vp); > > + } > > +#endif > > + > > + /* Getnewvnode() initial reference is not recorded due to VNON = */ > > + ref =3D 1; > > + TAILQ_FOREACH(r, &vp->v_debug_ref, link) { > > + ref +=3D r->val; > > + printf("REF %d %s\n", ref, r->op); > > + stack_print(&r->stack); > > + } > > + > > } > > > > #ifdef DDB > > Index: sys/fs/nfsclient/nfs_clport.c > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > --- sys/fs/nfsclient/nfs_clport.c (revision 248723) > > +++ sys/fs/nfsclient/nfs_clport.c (working copy) > > @@ -273,6 +273,7 @@ nfscl_nget(struct mount *mntp, struct vnode *dvp, > > /* vfs_hash_insert() vput()'s the losing vnode */ > > return (0); > > } > > + vp->v_vflag |=3D VV_DEBUGVREF; > > *npp =3D np; > > > > return (0); > > Index: sys/fs/nfsclient/nfs_clnode.c > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > --- sys/fs/nfsclient/nfs_clnode.c (revision 248723) > > +++ sys/fs/nfsclient/nfs_clnode.c (working copy) > > @@ -179,6 +179,7 @@ ncl_nget(struct mount *mntp, u_int8_t *fhp, int fh > > /* vfs_hash_insert() vput()'s the losing vnode */ > > return (0); > > } > > + vp->v_vflag |=3D VV_DEBUGVREF; > > *npp =3D np; > > > > return (0); --MrbiU6dcJfOZ616B Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJR59AsAAoJEJDCuSvBvK1BYsMQAJZj+18pwjeJQfyBoaK4Rif7 FK32mFljb85RxdGs9nX4Pmq91/00B6lv+8HmdlydYbOw4qiRm8x/hNverAr2GHSf 5ArGEJeTCwfXheG+kulivTo+sMrapeyR6XN5THIHjglBjrSBu8nUrAyNzyjaOFRq 2tLDn/NdibMJeBUKkVWMV3L7cmrIw3snF+kJc6f/1iDnBahOKPADxAHo/N1Exg2s AC5wUuG+d5lrb/jFYaSoND1eDnWIVVu588GQuXrIo9N9GM9D+UVk1OM2pLEOISLM 7hL5mKagLD4wHpzW9FW6nlQjDcGQqJfkvYp+PqsVO6KGaVCk740N3rItDk9WGTfn WNMGl0i9rDopvuaOBX/BwLrwdN/TQaXTHPdVdOHjDWyjqlmaG2r37AYZ2OOVfvhr EH2UOAj1bcRfucGOrczxjSRFEI7honiOuw48RYZYNd4WUujSaA61vINdDXIhrYky /+kTwvGpoBxNvE7pMmUg1fI0Ww/Kp1QsaObh/Kb9KmbOKDSNc8luVfLNbU+EnQaS bfS9eUUkPvk86lgMLVoXRoXV707IF0r7SBosRgrc9IVc9xZjj4fNqDIvUiZrSoow hQOPX6X9fZ1DSCngp9DuL4GZocv3levC/8wirpAydi5dTHs2XoP66P8Opn+SBzct TRwTCYGSbZslIgTHuvMd =NmSV -----END PGP SIGNATURE----- --MrbiU6dcJfOZ616B-- From owner-freebsd-fs@FreeBSD.ORG Thu Jul 18 11:28:18 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 8BA2FED2; Thu, 18 Jul 2013 11:28:18 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id F295F85A; Thu, 18 Jul 2013 11:28:17 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id r6IBSEJA015801; Thu, 18 Jul 2013 14:28:14 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua r6IBSEJA015801 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id r6IBSEMo015800; Thu, 18 Jul 2013 14:28:14 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik 
set sender to kostikbel@gmail.com using -f Date: Thu, 18 Jul 2013 14:28:14 +0300 From: Konstantin Belousov To: Andriy Gapon Subject: Re: Deadlock in nullfs/zfs somewhere Message-ID: <20130718112814.GA5991@kib.kiev.ua> References: <51DCFEDA.1090901@FreeBSD.org> <51E59FD9.4020103@FreeBSD.org> <51E67F54.9080800@FreeBSD.org> <51E7B686.4090509@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="GFPlsJ7YtLjXgs8j" Content-Disposition: inline In-Reply-To: <51E7B686.4090509@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: freebsd-fs@FreeBSD.org, Adrian Chadd X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jul 2013 11:28:18 -0000 --GFPlsJ7YtLjXgs8j Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jul 18, 2013 at 12:33:58PM +0300, Andriy Gapon wrote: > on 17/07/2013 20:19 Adrian Chadd said the following: > > On 17 July 2013 04:26, Andriy Gapon wrote: > >> One possibility is to add getnewvnode_reserve() calls before the ZFS t= ransaction > >> beginnings in the places where a new vnode/znode may have to be alloca= ted within > >> a transaction. > >> This looks like a quick and cheap solution but it makes the code somew= hat messier. > >> > >> Another possibility is to change something in VFS machinery, so that V= OP_RECLAIM > >> getting blocked for one filesystem does not prevent vnode allocation f= or other > >> filesystems. > >> > >> I could think of other possible solutions via infrastructural changes = in VFS or > >> ZFS... > >=20 > > Well, what do others think? This seems like a showstopper for systems > > with lots and lots of ZFS filesystems doing lots and lots of activity. > >=20 >=20 > Looks like others are not speaking yet :-) >=20 > My current idea is that ZFS should set MNTK_SUSPEND in zfs_suspend_fs() p= ath > before acquiring its z_teardown* locks. This should make intentions of Z= FS > visible to VFS. And thus it should prevent VOP_RECLAIM call on a suspend= ed ZFS > filesystem and that should prevent vnlru_free() getting stuck. > Hopefully this should break the deadlock cycle. >=20 > Kostik, >=20 > what is your opinion? > For your convenience here is a message with my analysis of this issue: > http://thread.gmane.org/gmane.os.freebsd.current/150889/focus=3D18534 Well, I have no opinion. Making the fs suspended, in other words, preventi= ng writers from entering the filesystem code, is probably good. I do not know zfs code to usefully comment on the approach. Note that you must drain existing writers, i.e. call vfs_write_suspend(), to set MNTK_SUSPEND. 
--GFPlsJ7YtLjXgs8j Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJR59FNAAoJEJDCuSvBvK1ByYMP/3njOyjvWN3fDjUVHiJmFgL+ 9STNHDkzaHDTBd7TDtybGrqkljLoSrjkC9LVl6MyRwq2olJ1yhYQKmOlkaBeOaJt rnuXvyGA2Wz4XTUIZVWaV/wtEPUMPskYv60ibYx00JuOFwA/oNR7J7fp/7bPirJ6 jPwQ+W9wU/Qzls3rMmhV2owqhSIUQD8egTB3Es/5Cda/+8zjR9yoQK0KLLCU4GbY n8740XueGxZkTvM2C0ZstQ4JvRAbrRLKT7mCHadISov+ErPPwnnuWIYtYhB/gcq0 i9U5/JMNRyiTlyyDSEiePBtxf+iY9sxWYHi1hwWIWG28rLH3exEGn6kKzXB4q4Pe NzRGJB4p8drGZb4NoUAikhqquY7Jmm8to5NMJzepV9AKa2a08WSHM4SMgk60oeUq NO+XSpnazZK9Bu7shrYnlWdUjXAPzUzUlQArTRmI9cQjkEWiTzwpY2TFn6AFbvwM HUu/AdDP4EBvrW/dyAeLmgocbErqZpNLlemLTBl6I3kfgB/Ytd3VcHbWZCMgP8cS 3DDbbaPqj6eFxXqObDgp+hAPhUaFvO8RW+FH3/SMj+zGjQ9+tmW9L47hB1jHbO8z QIqXAQaAoqhATNurGVqj4qUtb3YX157Csw5+nRMTQ/IRmJghb5W5OxEALNyranmY d4655Qai0ShZPuB/v8ZD =aHUZ -----END PGP SIGNATURE----- --GFPlsJ7YtLjXgs8j-- From owner-freebsd-fs@FreeBSD.ORG Thu Jul 18 13:41:49 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E90F79FC; Thu, 18 Jul 2013 13:41:49 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 0878DEBE; Thu, 18 Jul 2013 13:41:48 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA22718; Thu, 18 Jul 2013 16:41:39 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1UzoSg-00057j-T6; Thu, 18 Jul 2013 16:41:38 +0300 Message-ID: <51E7F05A.5020609@FreeBSD.org> Date: Thu, 18 Jul 2013 16:40:42 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130708 Thunderbird/17.0.7 MIME-Version: 1.0 To: Konstantin Belousov , Adrian Chadd Subject: Re: Deadlock in nullfs/zfs somewhere References: <51DCFEDA.1090901@FreeBSD.org> <51E59FD9.4020103@FreeBSD.org> <51E67F54.9080800@FreeBSD.org> <51E7B686.4090509@FreeBSD.org> <20130718112814.GA5991@kib.kiev.ua> In-Reply-To: <20130718112814.GA5991@kib.kiev.ua> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jul 2013 13:41:50 -0000 on 18/07/2013 14:28 Konstantin Belousov said the following: > Well, I have no opinion. Making the fs suspended, in other words, preventing > writers from entering the filesystem code, is probably good. I do not > know zfs code to usefully comment on the approach. OK, fair. > Note that you must drain existing writers, i.e. call vfs_write_suspend(), > to set MNTK_SUSPEND. Here is my take on it, not tested at all. 
diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c index 0fc59cc..59c8cbd 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c @@ -2263,8 +2263,12 @@ zfs_suspend_fs(zfsvfs_t *zfsvfs) { int error; - if ((error = zfsvfs_teardown(zfsvfs, B_FALSE)) != 0) + if ((error = vfs_write_suspend(zfsvfs->z_vfs)) != 0) return (error); + if ((error = zfsvfs_teardown(zfsvfs, B_FALSE)) != 0) { + vfs_write_resume(mp, 0); + return (error); + } dmu_objset_disown(zfsvfs->z_os, zfsvfs); return (0); @@ -2339,5 +2343,6 @@ bail: rrw_exit(&zfsvfs->z_teardown_lock, FTAG); + vfs_write_resume(mp, 0); if (err) { /* * Since we couldn't reopen zfsvfs::z_os, or -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 18 18:52:20 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B9B36FDA; Thu, 18 Jul 2013 18:52:20 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 13BFFFD1; Thu, 18 Jul 2013 18:52:19 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id r6IIqF2c013999; Thu, 18 Jul 2013 21:52:15 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua r6IIqF2c013999 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id r6IIqFBP013998; Thu, 18 Jul 2013 21:52:15 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 18 Jul 2013 21:52:15 +0300 From: Konstantin Belousov To: Andriy Gapon Subject: Re: Deadlock in nullfs/zfs somewhere Message-ID: <20130718185215.GE5991@kib.kiev.ua> References: <51DCFEDA.1090901@FreeBSD.org> <51E59FD9.4020103@FreeBSD.org> <51E67F54.9080800@FreeBSD.org> <51E7B686.4090509@FreeBSD.org> <20130718112814.GA5991@kib.kiev.ua> <51E7F05A.5020609@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="6PtN/jU//tuarfdA" Content-Disposition: inline In-Reply-To: <51E7F05A.5020609@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: freebsd-fs@FreeBSD.org, Adrian Chadd X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jul 2013 18:52:20 -0000 --6PtN/jU//tuarfdA Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jul 18, 2013 at 04:40:42PM +0300, Andriy Gapon wrote: > on 18/07/2013 14:28 Konstantin Belousov said the following: > > Well, I have no opinion. Making the fs suspended, in other words, prev= enting > > writers from entering the filesystem code, is probably good. I do not > > know zfs code to usefully comment on the approach. >=20 > OK, fair. >=20 > > Note that you must drain existing writers, i.e. call vfs_write_suspend(= ), > > to set MNTK_SUSPEND. >=20 > Here is my take on it, not tested at all. 
>=20 > diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c > b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c > index 0fc59cc..59c8cbd 100644 > --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c > +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c > @@ -2263,8 +2263,12 @@ zfs_suspend_fs(zfsvfs_t *zfsvfs) > { > int error; >=20 > - if ((error =3D zfsvfs_teardown(zfsvfs, B_FALSE)) !=3D 0) > + if ((error =3D vfs_write_suspend(zfsvfs->z_vfs)) !=3D 0) > return (error); > + if ((error =3D zfsvfs_teardown(zfsvfs, B_FALSE)) !=3D 0) { > + vfs_write_resume(mp, 0); > + return (error); > + } > dmu_objset_disown(zfsvfs->z_os, zfsvfs); >=20 > return (0); > @@ -2339,5 +2343,6 @@ bail: > rrw_exit(&zfsvfs->z_teardown_lock, FTAG); >=20 > + vfs_write_resume(mp, 0); > if (err) { > /* > * Since we couldn't reopen zfsvfs::z_os, or There is VFS method VFS_SUSP_CLEAN, called when the suspension is lifted. UFS uses it to clean the back-queue of work which were not performed during the suspend, mostly inactivate the postponed inactive vnodes. ZFS probably does not need it, since it does not check for MNTK_SUSPEND, but if it starts care, there is a place to put the code. On the other hand, I believe that your patch is notoriously incomplete, because there should be a lot of threads which mutate ZFS mounts state and which do not call vn_start_write() around the mutations. I.e. all ZFS top-level code which calls into ZFS ops and which is not coming from VFS. --6PtN/jU//tuarfdA Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJR6DlfAAoJEJDCuSvBvK1BMcAP/jw+dYeUw6RjJZCmJ7I6XrHa aMYHMVjPuUH3h5jyE3avPZXd1OU67GkKmu+pQMF9qaMzPie6IkNK7FnK8Ey2Wy9D 1chQXv1ccaAvJTDk1VZbEWtctZ38V5CPm34ZbwGWk0wImjMU03C6D+fZRc9VzozM hyxQAyDc89Nmmcxn6BnL4INJAAIASVB3QcRY2lO7/FKjAR/yxmy6R74/HvvRNDhk QYvvSFJzNnB9wQByupYY69hbz18VQI3hyTMzh3xZkt9JcH2oCHanl8WkQUIxB4eJ ZCBsKFZdV+rKLnXHfP4tlXmmXLTzFVJlf5u+vS1l4PhLWsV2IZceImFDRLaxq1OC v2pF0DM2txWRODrsGY5Ie/DUwm7DoUghpu27rirkTcx84w3BWoD4/F5iEM1J+ZWY MXyC7Gj4560v0lE4mDyw0ZKDVfOsmWH5dx4ElwFCk6Wvxk4/+Wg3qtnZlTsKMZkM tJkw8lCj7xzg/FCg5ukXRnfuPixB3eiAbyEaQF1qR391YYypwfwSTfg3E2lYnSs0 BoijDzDaCj+8xlvl6CPY+/YKY6j1rTXFXcQQ5ayviOHVJjWo3kQAHukyWnZw/glx fZR8rZkv13iAnXSxulJW4AZA3O0Ahy42IuWfVdL7g3cRFEzHPvkXkbi4h/l6J/Wf uLiHucS8MyFMDh5GyDg/ =Bddh -----END PGP SIGNATURE----- --6PtN/jU//tuarfdA-- From owner-freebsd-fs@FreeBSD.ORG Thu Jul 18 19:18:22 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 54DB782D; Thu, 18 Jul 2013 19:18:22 +0000 (UTC) (envelope-from joe@karthauser.co.uk) Received: from babel.karthauser.co.uk (212-13-197-151.karthauser.co.uk [212.13.197.151]) by mx1.freebsd.org (Postfix) with ESMTP id 24191188; Thu, 18 Jul 2013 19:18:21 +0000 (UTC) Received: from phoenix.fritz.box (unknown [81.187.183.70]) (Authenticated sender: joemail@tao.org.uk) by babel.karthauser.co.uk (Postfix) with ESMTPSA id 2C0A12AF5; Thu, 18 Jul 2013 19:18:21 +0000 (UTC) Subject: Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue? 
Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Content-Type: text/plain; charset=windows-1252 From: Dr Josef Karthauser X-Priority: 3 In-Reply-To: <33EF2240EDC1446D8E45F8C51974136B@multiplay.co.uk> Date: Thu, 18 Jul 2013 20:18:20 +0100 Content-Transfer-Encoding: 7bit Message-Id: References: <20130716225013.1C63B23A@babel.karthauser.co.uk> <60F7BE75-5E2F-471E-A9CE-AF4CD17D96E2@karthauser.co.uk> <33EF2240EDC1446D8E45F8C51974136B@multiplay.co.uk> To: "Steven Hartland" X-Mailer: Apple Mail (2.1508) Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jul 2013 19:18:22 -0000 On 18 Jul 2013, at 08:33, "Steven Hartland" wrote: > What chassis is this? Hey Steven, It's a Supermicro CSE-813MTQ-350CB. Cheers, Joe From owner-freebsd-fs@FreeBSD.ORG Thu Jul 18 19:35:45 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 12B244A2; Thu, 18 Jul 2013 19:35:45 +0000 (UTC) (envelope-from prvs=1911771df7=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 8719A2BC; Thu, 18 Jul 2013 19:35:44 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50005048668.msg; Thu, 18 Jul 2013 20:35:41 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 18 Jul 2013 20:35:41 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=1911771df7=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <964B87D56B7C4B529E995E7A660E9AAD@multiplay.co.uk> From: "Steven Hartland" To: "Dr Josef Karthauser" References: <20130716225013.1C63B23A@babel.karthauser.co.uk> <60F7BE75-5E2F-471E-A9CE-AF4CD17D96E2@karthauser.co.uk> <33EF2240EDC1446D8E45F8C51974136B@multiplay.co.uk> Subject: Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue? Date: Thu, 18 Jul 2013 20:36:03 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jul 2013 19:35:45 -0000 ----- Original Message ----- From: "Dr Josef Karthauser" > On 18 Jul 2013, at 08:33, "Steven Hartland" wrote: > >> What chassis is this? > > Hey Steven, > > It's a Supermicro CSE-813MTQ-350CB. We've seen issues on supermicro chassis before which cause timeouts and in extreme cases device drops so if you can try wiring the disks up directly to the MB via sata cables bypassing the hotswap midplane and see if that helps. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. 
In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Fri Jul 19 10:19:31 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CC5088D4; Fri, 19 Jul 2013 10:19:31 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id E07A5EB2; Fri, 19 Jul 2013 10:19:30 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA07949; Fri, 19 Jul 2013 13:19:27 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1V07mZ-0009cr-FX; Fri, 19 Jul 2013 13:19:27 +0300 Message-ID: <51E91277.3070309@FreeBSD.org> Date: Fri, 19 Jul 2013 13:18:31 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130708 Thunderbird/17.0.7 MIME-Version: 1.0 To: Konstantin Belousov Subject: Re: Deadlock in nullfs/zfs somewhere References: <51DCFEDA.1090901@FreeBSD.org> <51E59FD9.4020103@FreeBSD.org> <51E67F54.9080800@FreeBSD.org> <51E7B686.4090509@FreeBSD.org> <20130718112814.GA5991@kib.kiev.ua> <51E7F05A.5020609@FreeBSD.org> <20130718185215.GE5991@kib.kiev.ua> In-Reply-To: <20130718185215.GE5991@kib.kiev.ua> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, Adrian Chadd X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Jul 2013 10:19:31 -0000 on 18/07/2013 21:52 Konstantin Belousov said the following: > There is VFS method VFS_SUSP_CLEAN, called when the suspension is > lifted. UFS uses it to clean the back-queue of work which were > not performed during the suspend, mostly inactivate the postponed > inactive vnodes. ZFS probably does not need it, since it does > not check for MNTK_SUSPEND, but if it starts care, there is a place > to put the code. I will keep this in mind. > On the other hand, I believe that your patch is notoriously incomplete, > because there should be a lot of threads which mutate ZFS mounts state > and which do not call vn_start_write() around the mutations. I.e. > all ZFS top-level code which calls into ZFS ops and which is not > coming from VFS. I agree. What I am trying to fix right now is VFS<->ZFS interaction. I think that ZFS<->ZFS should already be fine - it's protected by internal ZFS locking. OTOH, perhaps my understanding of what you said is incomplete or incorrect, because VFS suspension mechanism is completely unknown to me yet. 
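To make the "non-VFS mutators" point above concrete, here is a minimal sketch of what bracketing a ZFS-internal entry point could look like; the function name is invented, and it assumes the mount-based form of vn_start_write() (NULL vnode, mount supplied by the caller) is acceptable in such paths:

    /* Hypothetical ZFS-internal operation that mutates fs state. */
    static int
    zfs_internal_mutator(zfsvfs_t *zfsvfs)
    {
            struct mount *mp;
            int error;

            mp = zfsvfs->z_vfs;
            /* Honour an active suspension the same way VFS callers do. */
            error = vn_start_write(NULL, &mp, V_WAIT);
            if (error != 0)
                    return (error);
            /* ... mutate in-core/on-disk state under ZFS's own locks ... */
            vn_finished_write(mp);
            return (0);
    }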
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Fri Jul 19 10:22:18 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BBB96A92; Fri, 19 Jul 2013 10:22:18 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id D466BED5; Fri, 19 Jul 2013 10:22:17 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA07981; Fri, 19 Jul 2013 13:22:15 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1V07pG-0009d9-Vk; Fri, 19 Jul 2013 13:22:15 +0300 Message-ID: <51E9131F.1060707@FreeBSD.org> Date: Fri, 19 Jul 2013 13:21:19 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130708 Thunderbird/17.0.7 MIME-Version: 1.0 To: Konstantin Belousov Subject: Re: zfs_rename: another zfs+vfs deadlock References: <51E679FD.3040306@FreeBSD.org> <20130717194557.GU5991@kib.kiev.ua> In-Reply-To: <20130717194557.GU5991@kib.kiev.ua> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, zfs-devel@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Jul 2013 10:22:18 -0000 on 17/07/2013 22:45 Konstantin Belousov said the following: > On Wed, Jul 17, 2013 at 02:03:25PM +0300, Andriy Gapon wrote: >> A scenario to reproduce this bug could be like this. >> mkdir a >> mkdir a/b >> mv some-file a/b/ (in parallel with) stat a/b >> Of course it would have to be repeated many times to hit the right timing >> window. Also, namecache could interfere with this scenario, but I am not sure. >> > > There is no questions or proposals on how to approach the fix, JFYI mail ? I was just reporting the problem and my analysis of it. A question of "how to fix" was implied. > I recommend you to look at the ufs_checkpath() and its use in the > ufs_rename(). Thank you. That code is enlightening. I do not think that the approach is directly applicable to zfs_rename, unfortunately. But I will try to see if the same kind of approach could be used. Also, I noticed that ufs_rename() checks for cross-device rename. Should all filesystems do that or should that check belong to VFS layer (if not already done there)? 
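For context, the check in question is small; roughly, in the style of ufs_rename() (error-path cleanup elided, variable names as in that function):

    /*
     * fvp: source vnode, tdvp: target directory vnode, tvp: existing
     * target vnode or NULL.
     */
    if (fvp->v_mount != tdvp->v_mount ||
        (tvp != NULL && fvp->v_mount != tvp->v_mount)) {
            error = EXDEV;
            goto abortit;
    }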
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Fri Jul 19 10:30:29 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 83AD6D48; Fri, 19 Jul 2013 10:30:29 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 2654AF3A; Fri, 19 Jul 2013 10:30:28 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id r6JAUPGc027572; Fri, 19 Jul 2013 13:30:25 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua r6JAUPGc027572 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id r6JAUPxH027549; Fri, 19 Jul 2013 13:30:25 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 19 Jul 2013 13:30:25 +0300 From: Konstantin Belousov To: Andriy Gapon Subject: Re: Deadlock in nullfs/zfs somewhere Message-ID: <20130719103025.GJ5991@kib.kiev.ua> References: <51E59FD9.4020103@FreeBSD.org> <51E67F54.9080800@FreeBSD.org> <51E7B686.4090509@FreeBSD.org> <20130718112814.GA5991@kib.kiev.ua> <51E7F05A.5020609@FreeBSD.org> <20130718185215.GE5991@kib.kiev.ua> <51E91277.3070309@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="WHG05yakhlzm8Hk1" Content-Disposition: inline In-Reply-To: <51E91277.3070309@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: freebsd-fs@FreeBSD.org, Adrian Chadd X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Jul 2013 10:30:29 -0000 --WHG05yakhlzm8Hk1 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Jul 19, 2013 at 01:18:31PM +0300, Andriy Gapon wrote: > on 18/07/2013 21:52 Konstantin Belousov said the following: > > There is VFS method VFS_SUSP_CLEAN, called when the suspension is > > lifted. UFS uses it to clean the back-queue of work which were > > not performed during the suspend, mostly inactivate the postponed > > inactive vnodes. ZFS probably does not need it, since it does > > not check for MNTK_SUSPEND, but if it starts care, there is a place > > to put the code. >=20 > I will keep this in mind. >=20 > > On the other hand, I believe that your patch is notoriously incomplete, > > because there should be a lot of threads which mutate ZFS mounts state > > and which do not call vn_start_write() around the mutations. I.e. > > all ZFS top-level code which calls into ZFS ops and which is not > > coming from VFS. >=20 > I agree. What I am trying to fix right now is VFS<->ZFS interaction. I = think > that ZFS<->ZFS should already be fine - it's protected by internal ZFS lo= cking. > OTOH, perhaps my understanding of what you said is incomplete or incorrec= t, > because VFS suspension mechanism is completely unknown to me yet. 
>=20 I think that you should satisfy the VFS invariants, and prevent mutators =66rom operating on the filesystem when MNTK_SUSPEND is set, for the case mutators are running outside the context where VFS could call vn_start_write() around. --WHG05yakhlzm8Hk1 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJR6RVAAAoJEJDCuSvBvK1Bjq4P/2tWXpxKdyvuEJiqeer5wEsm DfErzv7U3fE9tDE0XNUgzDroXTWEQATJr3brdxUTpOvBVcQYKGWVR2jEAavZaWz9 YIH8tbat783yjiem0mvULlNRUJ7QRY12yPLMzetJkgmZAt3ocH4P6k+aHgqyisVu InzNL+Ekc1+0uD4AqEShuueQ2raypLUnY8B7FfAM6APcSO4ARvo8O8Z808hjXk4g cO8VGwvwwFxVT8j+7Woocs0pypRXyQkIhR6xVeBjst81VOzPdvvut8Ic9EH0nOdu 62YPkq4zwGQnyNLoYlWWYMYqNoA1D8AyzPpnmrT2PlVI6lZ3uBcRTRVIKxoQ37b9 h8zIrkHZK7f0o/f8X77VDVlFgzxQst637CjtDio+t9FKWYh5fG3DnR5kFqetXM6G uRuGjn2f2YLKfL2om2bYNdb0CQePdwhehnnegiIA/atAnPGHY6+YZTLi/CTD1Aal 3RwGChQuVsFecZuhlCdaEAeiWMCj+e2wuJEnkX5zzwrWc93t7QYUqBPMXLvYEcEy RwSlm2oN1HO3NX5q7vn9bWlRCyiYQILB4iGC5TIEM8RxYI46kQC8/+NVnIP5AC+n JzWd7cS99E8QdCBkr1DLmEa8H9dQMAtpJegeN6pILtzh56DmASFtXgcKo58PkIHj S8rrSKhnP9Cu5faDKVS2 =9m4K -----END PGP SIGNATURE----- --WHG05yakhlzm8Hk1-- From owner-freebsd-fs@FreeBSD.ORG Fri Jul 19 15:36:56 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 68B131FF for ; Fri, 19 Jul 2013 15:36:56 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id AD391DD for ; Fri, 19 Jul 2013 15:36:55 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA11234; Fri, 19 Jul 2013 18:36:52 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1V0Cjk-000A2D-Md; Fri, 19 Jul 2013 18:36:52 +0300 Message-ID: <51E95CDD.7030702@FreeBSD.org> Date: Fri, 19 Jul 2013 18:35:57 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130708 Thunderbird/17.0.7 MIME-Version: 1.0 To: Konstantin Belousov Subject: Re: Deadlock in nullfs/zfs somewhere References: <51E59FD9.4020103@FreeBSD.org> <51E67F54.9080800@FreeBSD.org> <51E7B686.4090509@FreeBSD.org> <20130718112814.GA5991@kib.kiev.ua> <51E7F05A.5020609@FreeBSD.org> <20130718185215.GE5991@kib.kiev.ua> <51E91277.3070309@FreeBSD.org> <20130719103025.GJ5991@kib.kiev.ua> In-Reply-To: <20130719103025.GJ5991@kib.kiev.ua> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Jul 2013 15:36:56 -0000 on 19/07/2013 13:30 Konstantin Belousov said the following: > I think that you should satisfy the VFS invariants, and prevent mutators > from operating on the filesystem when MNTK_SUSPEND is set, for the > case mutators are running outside the context where VFS could call > vn_start_write() around. I would like to inquire more about this suggestion. 
With the proposed patch zfs_suspend_fs would first call vfs_write_suspend, which would wait for all threads that came via VFS (and called vn_start_write) to leave and it would also mark a filesystem as suspended and that would prevent new VFS writers. Then zfs_suspend_fs calls zfsvfs_teardown, which would wait for all threads in ZFS vnode ops and vfs ops to leave and would block new calls to those ops. So there is a window between the filesystem being marked as "VFS-suspended" and it becoming fully "ZFS-suspended". As I understand you are concerned about this window. I would like to understand what assumptions VFS code makes or could make about a filesystem marked as suspended. I also would like to be pointed to the code that makes any such assumptions. I need to understand this, because if there is any code that assumes that a suspended filesystem is really frozen, then there can be a much larger problem. Unlike UFS, ZFS does not use fs suspension for creating snapshots. It does not need to because of its COW nature and use of transactions. ZFS uses suspension for rollbacks, receiving of ZFS streams and fs version upgrades. That is for operations that modify the on-disk and in-memory data and metadata. So even without that window the filesystem is going to be modified. That's the whole purpose of ZFS suspend. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Fri Jul 19 16:28:39 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B970FF8; Fri, 19 Jul 2013 16:28:39 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id BA03A2D6; Fri, 19 Jul 2013 16:28:38 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA11557; Fri, 19 Jul 2013 19:28:36 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1V0DXo-000A5u-FQ; Fri, 19 Jul 2013 19:28:36 +0300 Message-ID: <51E968FC.20905@FreeBSD.org> Date: Fri, 19 Jul 2013 19:27:40 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130708 Thunderbird/17.0.7 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org, freebsd-arch@FreeBSD.org Subject: VOP_MKDIR/VOP_CREATE and namecache X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=X-VIET-VPS Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Jul 2013 16:28:39 -0000 Should VOP_MKDIR and VOP_CREATE immediately insert newly created vnodes into the namecache? If yes, where would it be done best? FS code, VFS code, VOP post-hooks, something else? 
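One possible shape of the per-filesystem answer, sketched only to make the question concrete (the filesystem name and the creation helper are invented; whether MAKEENTRY should gate the insertion is part of what is being asked):

    static int
    xxx_mkdir(struct vop_mkdir_args *ap)
    {
            int error;

            error = xxx_create_dir(ap);     /* invented helper: do the real mkdir */
            if (error == 0 && (ap->a_cnp->cn_flags & MAKEENTRY) != 0)
                    cache_enter(ap->a_dvp, *ap->a_vpp, ap->a_cnp);
            return (error);
    }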
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Fri Jul 19 18:35:04 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D9C407A5; Fri, 19 Jul 2013 18:35:04 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 501CDB06; Fri, 19 Jul 2013 18:35:04 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id r6JIZ0E7029586; Fri, 19 Jul 2013 21:35:00 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua r6JIZ0E7029586 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id r6JIZ0Kr029585; Fri, 19 Jul 2013 21:35:00 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 19 Jul 2013 21:35:00 +0300 From: Konstantin Belousov To: Andriy Gapon Subject: Re: zfs_rename: another zfs+vfs deadlock Message-ID: <20130719183500.GL5991@kib.kiev.ua> References: <51E679FD.3040306@FreeBSD.org> <20130717194557.GU5991@kib.kiev.ua> <51E9131F.1060707@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Vo48LVc30GAQuLuW" Content-Disposition: inline In-Reply-To: <51E9131F.1060707@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: freebsd-fs@FreeBSD.org, zfs-devel@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Jul 2013 18:35:04 -0000 --Vo48LVc30GAQuLuW Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Fri, Jul 19, 2013 at 01:21:19PM +0300, Andriy Gapon wrote: > Also, I noticed that ufs_rename() checks for cross-device rename. Should all > filesystems do that or should that check belong to VFS layer (if not already > done there)? In principle yes, this sounds right. The only concern I see is layered filesystems like nullfs interaction with filesystems below the bypass. In other words, if any bypass provided the aggregation, this should be checked at the bypass layer too, in addition to the kern_renameat(). 
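To make the layered-filesystem concern concrete, a purely illustrative fragment: even when the upper (nullfs) vnodes share a mount, the lower vnodes they map to may not, so a bypass-level check would compare the lower mounts. NULLVPTOLOWERVP() is the existing nullfs accessor; where exactly such a check would live is an assumption here.

    /* fvp/tdvp are the upper vnodes involved in the rename. */
    struct vnode *lfvp, *ltdvp;

    lfvp = NULLVPTOLOWERVP(fvp);
    ltdvp = NULLVPTOLOWERVP(tdvp);
    if (lfvp->v_mount != ltdvp->v_mount)
            return (EXDEV); /* lower objects live on different filesystems */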
--Vo48LVc30GAQuLuW Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJR6YbTAAoJEJDCuSvBvK1BhGsQAJ7PmCb57BzSzDUJCwydMcr3 fOD9M8UwR1NKxAWboxmIbhqjL7bfzzzeNGTHwhPj6NqQZQeBg8Lq0lvoKEqKTT6Z lPl0acR7+V2IIwBD5wj7NBN6LkZvztXc92pUt7PmLOTi7sbNOC2r8eUIvEjyMjCC O1tN4/eZiKGOk3F6ityRNjn4h2JUkwAhfn85gMrJOQvOuxVvo/AgARcxdplZdZIv 1WzZFtfWYrRGCjNwxQ0w4qE2amZ5aJudcXJdU3qiKh8Ss9s9TkLV+ZDj6+kofng+ YCbVuQ3xD9N8EpG/bmYnZV4gzWuD4hDsHBYf3Ba3DE7rdJfek7/K4TRVLnQxBCa6 toTkJijznXFjM33qpjORaNwOvFu+dWnWKmzgDMs6Ky32eeRPPqQz7Fe8IgJMD1C9 JDMZbGHJ/wqCR+vNKGaGrlZO4EL/L54IhqY2i1r3f2/fyMKBVq5bxwIxs3c3F2sw qqF64vwsnfd1aeKUTtgCVVdaSRmrsG6hdjfgri4sMqX6GfjppAcqXf6sah9SzEcv ibNiMut4q8Z6lfb9xPwsYrzubmyelQilf111bB9g7VzZuEDsEfTJoSbIPWKXPikP 6tn29wD6+E04zvL0KrCB5QtMHUhoS1l6mfrEMPXrm2wDgGhI1zBGNIdvZfSE4XxG ToPDy+vHlxrf9lDmwM3J =ojft -----END PGP SIGNATURE----- --Vo48LVc30GAQuLuW-- From owner-freebsd-fs@FreeBSD.ORG Fri Jul 19 18:42:47 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7E326A39; Fri, 19 Jul 2013 18:42:47 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id CCCABB53; Fri, 19 Jul 2013 18:42:46 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id r6JIghO5031678; Fri, 19 Jul 2013 21:42:43 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua r6JIghO5031678 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id r6JIghWh031677; Fri, 19 Jul 2013 21:42:43 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 19 Jul 2013 21:42:43 +0300 From: Konstantin Belousov To: Andriy Gapon Subject: Re: Deadlock in nullfs/zfs somewhere Message-ID: <20130719184243.GM5991@kib.kiev.ua> References: <51E67F54.9080800@FreeBSD.org> <51E7B686.4090509@FreeBSD.org> <20130718112814.GA5991@kib.kiev.ua> <51E7F05A.5020609@FreeBSD.org> <20130718185215.GE5991@kib.kiev.ua> <51E91277.3070309@FreeBSD.org> <20130719103025.GJ5991@kib.kiev.ua> <51E95CDD.7030702@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="GD0jJf8rm+K0B4Sk" Content-Disposition: inline In-Reply-To: <51E95CDD.7030702@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Jul 2013 18:42:47 -0000 --GD0jJf8rm+K0B4Sk Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Jul 19, 2013 at 06:35:57PM +0300, Andriy Gapon wrote: > on 19/07/2013 13:30 Konstantin Belousov said the following: > > I think that you should satisfy the VFS invariants, and prevent mutators > > from operating on the filesystem when MNTK_SUSPEND is set, for the > > case mutators are running outside the context where VFS could call > > vn_start_write() around. 
>=20 > I would like to inquire more about this suggestion. > > With the proposed patch zfs_suspend_fs would first call > vfs_write_suspend, which would wait for all threads that came via > VFS (and called vn_start_write) to leave and it would also mark a > filesystem as suspended and that would prevent new VFS writers. Then > zfs_suspend_fs calls zfsvfs_teardown, which would wait for all threads > in ZFS vnode ops and vfs ops to leave and would block new calls to > those ops. > > So there is a window between the filesystem being marked as > "VFS-suspended" and it becoming fully "ZFS-suspended". As I understand > you are concerned about this window. I would like to understand what > assumptions VFS code makes or could make about a filesystem marked as > suspended. I also would like to be pointed to the code that makes any > such assumptions. > > I need to understand this, because if there is any code that assumes > that a suspended filesystem is really frozen, then there can be a much > larger problem. The expectation that the suspended filesystem does not have user-visible changes (e.g. seeing changes using the syscalls) or on-disk structures changes is the guarantee of the suspend mechanism. > > Unlike UFS, ZFS does not use fs suspension for creating snapshots. It > does not need to because of its COW nature and use of transactions. > ZFS uses suspension for rollbacks, receiving of ZFS streams and fs > version upgrades. That is for operations that modify the on-disk and > in-memory data and metadata. > > So even without that window the filesystem is going to be modified. > That's the whole purpose of ZFS suspend. > Then, you cannot use VFS suspension. Or, in other words, you are directed to abuse the VFS interface. I assure you that any changes to the interface would not take into account such abuse and probably break your hack. 
--GD0jJf8rm+K0B4Sk Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJR6YiiAAoJEJDCuSvBvK1BjiYP/RaiZQSt+pHZaceUt8aUrNUl iAOoEsM+pwOzOcbHHovn4m/XnXWtC5UJDAJZH6M1HXjehOlRLx8tphJtanE9yorq Q0mMzq3SjnoMRf9ZvUzA0xDakplA/Zlk4CfxyQ/KdizCFVM6QlrfTyw/OOQijvl+ uncNQ/6t6HYxh/UVqZPkUZvOKtlH1soG7qyBV5XDi7FVGhvweJlLdJCkKlidEaZi XQMsLtoIYSCJrtldpZ/1Ah7sYUEPXOLbktTCdlhEr17YD+N0OPfrISEZO+vL4HW6 vK5yAAXiH730b+jgsAt/PuqIQCDjeIoWz/1v68deBQilZJQElV78aE4Iv8uP+w0e 5+4IPjvu1iM43sBzQG9f1gfUB3JuqgvgFQoQ1nDgXLhuops9+hAQpQC1Qv1Uzkrj dYR5aoHEVHR5WIuJfunRPwpqWKPJR0VcO8YNtBzsIdbZ9Xwl+dRbSQYbHd9vY1ng WAT/zK8PC2ntH13PQIVCHTdLU24/2gXEI6LnR8LWVm40ap0WVUn6fyDt/h55txcA KmaSFghN21/S6atZm/Gx6vf8Y/TJAuoOLTU/ikNNCw1qY+ejpR34JeSYu/700kP+ A77JnWZP9XkwA7x7Q4HQZT5GU63Zy87uK497S/d+lDKYaCLY1xmjXyHzx+h9P1lR hJ4/E5DCcV2dieownWKj =Veoy -----END PGP SIGNATURE----- --GD0jJf8rm+K0B4Sk-- From owner-freebsd-fs@FreeBSD.ORG Fri Jul 19 19:34:11 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 83BF578B for ; Fri, 19 Jul 2013 19:34:11 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id CB4BDD76 for ; Fri, 19 Jul 2013 19:34:10 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id WAA12708; Fri, 19 Jul 2013 22:34:07 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1V0GRL-000AKP-G5; Fri, 19 Jul 2013 22:34:07 +0300 Message-ID: <51E99477.1030308@FreeBSD.org> Date: Fri, 19 Jul 2013 22:33:11 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130708 Thunderbird/17.0.7 MIME-Version: 1.0 To: Konstantin Belousov Subject: Re: Deadlock in nullfs/zfs somewhere References: <51E67F54.9080800@FreeBSD.org> <51E7B686.4090509@FreeBSD.org> <20130718112814.GA5991@kib.kiev.ua> <51E7F05A.5020609@FreeBSD.org> <20130718185215.GE5991@kib.kiev.ua> <51E91277.3070309@FreeBSD.org> <20130719103025.GJ5991@kib.kiev.ua> <51E95CDD.7030702@FreeBSD.org> <20130719184243.GM5991@kib.kiev.ua> In-Reply-To: <20130719184243.GM5991@kib.kiev.ua> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Jul 2013 19:34:11 -0000 on 19/07/2013 21:42 Konstantin Belousov said the following: > Then, you cannot use VFS suspension. Or, in other words, you are directed > to abuse the VFS interface. I assure you that any changes to the interface > would not take into account such abuse and probably break your hack. So what would be your recommendation about this problem? Should we add another flavor of VFS suspension? The one that would mean "all external accesses to this fs must be put on hold", but would not imply "this fs is frozen". 
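If it helps to make the proposal concrete, the semantics might look like the sketch below; every name and value here is invented purely for illustration, nothing like it exists in the tree, and how waiters would actually block is left open:

    /*
     * Hypothetical flag: external (VFS-originating) accesses are held
     * off, but the filesystem remains free to change its own state.
     * The bit value would have to be an unused one in mnt_kern_flag.
     */
    #define MNTK_EXTHOLD    0x01000000      /* hypothetical */

    /* Conceptually next to the existing MNTK_SUSPEND checks: */
    if ((mp->mnt_kern_flag & MNTK_EXTHOLD) != 0)
            return (EBUSY);         /* or sleep until the hold is lifted */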
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Fri Jul 19 20:08:29 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D758CF95 for ; Fri, 19 Jul 2013 20:08:29 +0000 (UTC) (envelope-from david.i.noel@gmail.com) Received: from mail-wi0-x235.google.com (mail-wi0-x235.google.com [IPv6:2a00:1450:400c:c05::235]) by mx1.freebsd.org (Postfix) with ESMTP id 74F1CEB4 for ; Fri, 19 Jul 2013 20:08:29 +0000 (UTC) Received: by mail-wi0-f181.google.com with SMTP id hq4so192546wib.14 for ; Fri, 19 Jul 2013 13:08:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:in-reply-to:references:date:message-id :subject:from:to:content-type; bh=e5t5AjGio/+oZfur6aZ2BerWgq6RPVcFkMdDhQT7abE=; b=ub0W/93Uoe8hfQUUvyJSdhkZDDpwhanNczWbo2AcRhh3qGwoVkDHOfr3xYzBiUpWlq tPjBvaO1McWegQAKA/5WLtolruZtOMf4jcHIkZPHNBp69XayWznrof8QJ2oXZjJNpQlz D1dJ3W8YnQWa0DTs6E+TcGF8RtJMJoEV5b6ADdKFheZo2KOz3cIY/YrUPkgE6kuIq4ZH 5Cx9RXmfqMaTskRAiRlZQ3yrvDBfXUo8gBWyZKrWoadq28/RvPsKAkWomTTuGH3eCCnh tndZ6Q/YlsckS8iUaRHOnWrBidAdCNeXBEnNtIZuCzzzb5DVDirRQWsaC/VNnsDcnPoi 7OMg== MIME-Version: 1.0 X-Received: by 10.180.20.228 with SMTP id q4mr12685116wie.1.1374264508482; Fri, 19 Jul 2013 13:08:28 -0700 (PDT) Received: by 10.216.180.138 with HTTP; Fri, 19 Jul 2013 13:08:28 -0700 (PDT) In-Reply-To: References: Date: Fri, 19 Jul 2013 15:08:28 -0500 Message-ID: Subject: Re: FreeBSD upgrade woes (8.3 -> 8.4) From: David Noel To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: David.I.Noel@gmail.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Jul 2013 20:08:29 -0000 On 7/11/13, David Noel wrote: > I've been directed to the freebsd-fs list, so hopefully I'm in the > right place for this question. > > I have 4 servers I'm upgrading from 8.3 to 8.4. Two of them went > without a hitch, two of them blew up in my face. The only difference > between the two is the ones that worked have a 2-disk ZFS mirror and > the ones that didn't have a 4-disk ZFS striped mirror configuration > (RAID10). They both use the GPT. > > After installworld && installkernel they made it through boot, but > right before the login prompt I'm getting a panic and stack dump. The > backtrace looks something like this (roughly): > > 0 kdb_backtrace > 1 panic > 2 trap_fatal > 3 trap_pfault > 4 trap > 5 calltrap > 6 vdev_mirror_child_select > 7 vdev_mirror_io_start > 8 zio_vdev_io_start > 9 zio_execute > 10 arc_read > 11 dbuf_read > 12 dbuf_findbp > 13 dbuf_hold_impl > 14 dbuf_hold > 15 dnode_hold_impl > 16 dmu_buf_hold > 17 zap_lockdir > > Does anyone have any idea what went wrong? > > Does anyone have any suggestions on how to get past this? > > Is there any more information I could provide to help debug this? > > Thanks, > > David I replaced the kernel with the one on the 8.4 memstick and it booted just fine. I then built and installed a kernel without using the j flag to test the idea suggested on freebsd-questions@ that it could have been a buggy kernel caused by j>1. It booted without problem. Maybe there's something to this -j >1 causing buggy kernels rumor? At any rate, I don't think I'll try buildkernel with j>1 again. 
From owner-freebsd-fs@FreeBSD.ORG Sat Jul 20 17:10:03 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 82BAEF55; Sat, 20 Jul 2013 17:10:03 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 5E8E3BAC; Sat, 20 Jul 2013 17:10:03 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r6KHA32B086702; Sat, 20 Jul 2013 17:10:03 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r6KHA3Q8086701; Sat, 20 Jul 2013 17:10:03 GMT (envelope-from linimon) Date: Sat, 20 Jul 2013 17:10:03 GMT Message-Id: <201307201710.r6KHA3Q8086701@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/180678: [NFS] succesfully exported filesystems being reported as failed X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Jul 2013 17:10:03 -0000 Old Synopsis: succesfully exported filesystems being reported as failed New Synopsis: [NFS] succesfully exported filesystems being reported as failed Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Sat Jul 20 17:09:44 UTC 2013 Responsible-Changed-Why: reclassify. http://www.freebsd.org/cgi/query-pr.cgi?pr=180678