From owner-freebsd-fs@FreeBSD.ORG  Sun Jul 24 17:34:42 2011
From: Ben Kaduk
Date: Sun, 24 Jul 2011 13:04:28 -0400
To: freebsd-fs@freebsd.org
Subject: Re: svn commit: r220241 - in stable/8/sys: kern sys

[replying to -fs since that is where the original discussion of adding
O_CLOEXEC occurred]

On Fri, Apr 1, 2011 at 8:50 AM, Konstantin Belousov wrote:
> Author: kib
> Date: Fri Apr  1 12:50:29 2011
> New Revision: 220241
> URL: http://svn.freebsd.org/changeset/base/220241
>
> Log:
>   MFC r219999:
>   Add O_CLOEXEC flag to open(2) and fhopen(2).

I saw mail go by on debian-bsd@lists.debian.org that they are going to
pick up on these O_CLOEXEC definitions and export them, which included
the comment:
    No O_SEARCH yet, since FreeBSD doesn't seem to implement it.

Would there be any reason for us to support O_SEARCH?
http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
does not make it very clear to me whether we would want to....

-Ben Kaduk
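A minimal sketch of what the merged flag provides (illustrative only, not
code from the commit; the file path below is arbitrary): O_CLOEXEC marks
the descriptor close-on-exec atomically at open time, so a threaded
program that forks and execs does not need the separate, racy
fcntl(F_SETFD) step.

        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int
        main(void)
        {
                int fd = open("/etc/motd", O_RDONLY | O_CLOEXEC);

                if (fd == -1) {
                        perror("open");
                        return (1);
                }
                /* FD_CLOEXEC is already set; no extra fcntl(F_SETFD) call. */
                printf("close-on-exec: %s\n",
                    (fcntl(fd, F_GETFD) & FD_CLOEXEC) ? "yes" : "no");
                close(fd);
                return (0);
        }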
From owner-freebsd-fs@FreeBSD.ORG  Sun Jul 24 18:00:36 2011
From: Willem Jan Withagen
Date: Sun, 24 Jul 2011 20:00:31 +0200
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS and large directories - caveat report

On 21-7-2011 19:18, Ivan Voras wrote:
> On 21 July 2011 18:38, Luiz Otavio O Souza wrote:
>
>> The general usage on this server is fine, but the periodic (daily)
>> scripts take almost a day to complete and the server is slow as hell
>> while the daily scripts are running.
>
> Yes, this is how my problem was first diagnosed.
>
>> So, yes, i can confirm that running 'find' on a ZFS FS with a lot of
>> files is very, very slow (and looks like it isn't related to how the
>> files are distributed on the FS).
>
> Only it's not just "find" - it's any directory operations - including
> file creation and removal. I cannot say that is not related to how
> files are distributed on the file system, except the unusually long
> operations on the parent of the shard directories in my case.

A little late in the thread:

Running on 8.2-STABLE, ZFS version 15.
Quad core, 8 GB memory, /home is on a 6-disk (SATA) raidz2 filesystem.

The directory is a 3-week revolving log of images taken from a security
cam, so if anything its directory file should be horribly thrashed.
It holds around 170,000 files in one directory.

[/home/sonycam] wjw@zfs.digiware.nl> ls periodical | wc
  177364  177364 3369916
0.421u 6.999s 2:00.15 6.1%      37+1522k 0+0io 0pf+0w
[/home/sonycam] wjw@zfs.digiware.nl> ls -asl periodical | wc
  177401 1774002 13659785
1.747u 11.087s 1:42.98 12.4%    36+1562k 0+0io 0pf+0w

Repeated finds after this complete within 10 secs.

On average I see about 100 IOPS/disk and reading is at 5 MByte/disk.
But I do not feel the system is really loaded while doing the ls:
I can easily log in again and do other work.

--WjW
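For anyone who wants to reproduce this kind of measurement, a small sketch
that counts entries with readdir(3) alone, which separates raw directory
traversal from the extra per-file stat(2) traffic that "ls -asl" adds (the
path argument is whatever large directory you want to test):

        #include <dirent.h>
        #include <err.h>
        #include <stdio.h>

        /*
         * Roughly what "ls | wc" does, minus the sorting and without the
         * per-file stat(2) calls that "ls -asl" performs.
         */
        int
        main(int argc, char **argv)
        {
                DIR *d;
                struct dirent *de;
                long n = 0;

                if (argc != 2)
                        errx(1, "usage: %s directory", argv[0]);
                if ((d = opendir(argv[1])) == NULL)
                        err(1, "opendir %s", argv[1]);
                while ((de = readdir(d)) != NULL)
                        n++;
                closedir(d);
                printf("%ld entries\n", n);
                return (0);
        }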
From owner-freebsd-fs@FreeBSD.ORG  Sun Jul 24 18:44:09 2011
From: Kostik Belousov
Date: Sun, 24 Jul 2011 21:44:04 +0300
To: Ben Kaduk
Cc: freebsd-fs@freebsd.org
Subject: Re: svn commit: r220241 - in stable/8/sys: kern sys

On Sun, Jul 24, 2011 at 01:04:28PM -0400, Ben Kaduk wrote:
> [replying to -fs since that is where the original discussion of adding
> O_CLOEXEC occurred]
>
> On Fri, Apr 1, 2011 at 8:50 AM, Konstantin Belousov wrote:
> > Author: kib
> > Date: Fri Apr  1 12:50:29 2011
> > New Revision: 220241
> > URL: http://svn.freebsd.org/changeset/base/220241
> >
> > Log:
> >   MFC r219999:
> >   Add O_CLOEXEC flag to open(2) and fhopen(2).
>
> I saw mail go by on debian-bsd@lists.debian.org that they are going to
> pick up on these O_CLOEXEC definitions and export them, which included
> the comment:
>     No O_SEARCH yet, since FreeBSD doesn't seem to implement it.

What do you mean by exporting them ?

> Would there be any reason for us to support O_SEARCH?
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
> does not make it very clear to me whether we would want to....

We do not support O_SEARCH because nobody implemented it yet.
From owner-freebsd-fs@FreeBSD.ORG  Sun Jul 24 18:50:48 2011
From: Benjamin Kaduk
Date: Sun, 24 Jul 2011 14:50:44 -0400 (EDT)
To: Kostik Belousov
Cc: freebsd-fs@freebsd.org
Subject: Re: svn commit: r220241 - in stable/8/sys: kern sys
On Sun, 24 Jul 2011, Kostik Belousov wrote:

> On Sun, Jul 24, 2011 at 01:04:28PM -0400, Ben Kaduk wrote:
>> [replying to -fs since that is where the original discussion of adding
>> O_CLOEXEC occurred]
>>
>> On Fri, Apr 1, 2011 at 8:50 AM, Konstantin Belousov wrote:
>>> Author: kib
>>> Date: Fri Apr  1 12:50:29 2011
>>> New Revision: 220241
>>> URL: http://svn.freebsd.org/changeset/base/220241
>>>
>>> Log:
>>>   MFC r219999:
>>>   Add O_CLOEXEC flag to open(2) and fhopen(2).
>>
>> I saw mail go by on debian-bsd@lists.debian.org that they are going to
>> pick up on these O_CLOEXEC definitions and export them, which included
>> the comment:
>>     No O_SEARCH yet, since FreeBSD doesn't seem to implement it.
> What do you mean by exporting them ?

Per http://lists.debian.org/debian-bsd/2011/07/msg00299.html , it is not
possible for them to use our sys/fcntl.h directly, so its contents must be
copied into a bits/fcntl.h that enters somehow into their framework.  (I
am not familiar with how this framework works.)

>
>> Would there be any reason for us to support O_SEARCH?
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
>> does not make it very clear to me whether we would want to....
>
> We do not support O_SEARCH because nobody implemented it yet.

Sure, but is it worth filing a PR as a reminder?

-Ben
From owner-freebsd-fs@FreeBSD.ORG  Sun Jul 24 18:56:34 2011
From: Kostik Belousov
Date: Sun, 24 Jul 2011 21:56:29 +0300
To: Benjamin Kaduk
Cc: freebsd-fs@freebsd.org
Subject: Re: svn commit: r220241 - in stable/8/sys: kern sys

On Sun, Jul 24, 2011 at 02:50:44PM -0400, Benjamin Kaduk wrote:
> On Sun, 24 Jul 2011, Kostik Belousov wrote:
> > On Sun, Jul 24, 2011 at 01:04:28PM -0400, Ben Kaduk wrote:
> >> [replying to -fs since that is where the original discussion of adding
> >> O_CLOEXEC occurred]
> >>
> >> On Fri, Apr 1, 2011 at 8:50 AM, Konstantin Belousov wrote:
> >>> Author: kib
> >>> Date: Fri Apr  1 12:50:29 2011
> >>> New Revision: 220241
> >>> URL: http://svn.freebsd.org/changeset/base/220241
> >>>
> >>> Log:
> >>>   MFC r219999:
> >>>   Add O_CLOEXEC flag to open(2) and fhopen(2).
> >>
> >> I saw mail go by on debian-bsd@lists.debian.org that they are going to
> >> pick up on these O_CLOEXEC definitions and export them, which included
> >> the comment:
> >>     No O_SEARCH yet, since FreeBSD doesn't seem to implement it.
> > What do you mean by exporting them ?
>
> Per http://lists.debian.org/debian-bsd/2011/07/msg00299.html , it is not
> possible for them to use our sys/fcntl.h directly, so its contents must
> be copied into a bits/fcntl.h that enters somehow into their framework.
> (I am not familiar with how this framework works.)
>
> >> Would there be any reason for us to support O_SEARCH?
> >> http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
> >> does not make it very clear to me whether we would want to....
> >
> > We do not support O_SEARCH because nobody implemented it yet.
>
> Sure, but is it worth filing a PR as a reminder?

Without the patch?  No.
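For reference, the semantics being asked about: per the Open Group page
linked earlier, O_SEARCH opens a directory for use as the dirfd of the
*at() calls when the caller has only search (x) permission on it, not read
permission.  A purely hypothetical sketch -- FreeBSD defines no O_SEARCH,
so this does not build there, and the paths are placeholders:

        #include <fcntl.h>
        #include <err.h>
        #include <unistd.h>

        int
        main(void)
        {
                /* Hypothetical: O_SEARCH is not in FreeBSD's <fcntl.h>. */
                int dfd = open("/some/dir", O_SEARCH);
                if (dfd == -1)
                        err(1, "open directory");

                /* The descriptor is only good for lookups relative to it. */
                int fd = openat(dfd, "file-inside", O_RDONLY);
                if (fd == -1)
                        err(1, "openat");

                close(fd);
                close(dfd);
                return (0);
        }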
From owner-freebsd-fs@FreeBSD.ORG  Sun Jul 24 22:22:10 2011
From: Gary Corcoran
Date: Sun, 24 Jul 2011 17:52:25 -0400
To: freebsd-fs@freebsd.org
Subject: 3TB drives on ZFS and booting

I have seen conflicting information on the internet about this, and so
I would like a direct answer from someone who knows for sure.  Does
FreeBSD's ZFS work with 3TB drives, and is it possible to do a ZFS-only
(i.e. boot from ZFS) installation with 3TB drives on FreeBSD?  I presume
that since ZFS was designed to handle huge filesystems, it would have no
problem with 3TB drives, but I guess the real question is the ZFS boot
code - can it currently handle >2TB drives?  Bottom line: would I be able
to successfully build (and of course boot) a FreeBSD ZFS-only system
using only 3TB drives?
Thanks,
Gary

From owner-freebsd-fs@FreeBSD.ORG  Mon Jul 25 05:42:50 2011
From: Attila Nagy
Date: Mon, 25 Jul 2011 07:42:47 +0200
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS and large directories - caveat report

On 07/21/11 18:38, Ivan Voras wrote:
> On 21 July 2011 17:50, Freddie Cash wrote:
>> On Thu, Jul 21, 2011 at 8:45 AM, Ivan Voras wrote:
>>> Is there an equivalent of UFS dirhash memory setting for ZFS? (i.e. the
>>> size of the metadata cache)
>> vfs.zfs.arc_meta_limit
>>
>> This sets the amount of ARC that can be used for metadata. The default
>> is 1/8th of ARC, I believe. This setting lets you use "primarycache=all"
>> (store metadata and file data in ARC) but then tune how much is used
>> for each.
>>
>> Not sure if that will help in your case or not, but it's a sysctl you
>> can play with.
> I don't think that it works, or at least is not as efficient as dirhash:
>
> www:~> sysctl -a | grep meta
> kern.metadelay: 28
> vfs.zfs.mfu_ghost_metadata_lsize: 129082368
> vfs.zfs.mfu_metadata_lsize: 116224
> vfs.zfs.mru_ghost_metadata_lsize: 113958912
> vfs.zfs.mru_metadata_lsize: 16384
> vfs.zfs.anon_metadata_lsize: 0
> vfs.zfs.arc_meta_limit: 322412800
> vfs.zfs.arc_meta_used: 506907792
> kstat.zfs.misc.arcstats.demand_metadata_hits: 4471705
> kstat.zfs.misc.arcstats.demand_metadata_misses: 2110328
> kstat.zfs.misc.arcstats.prefetch_metadata_hits: 27
> kstat.zfs.misc.arcstats.prefetch_metadata_misses: 51
>
> arc_meta_used is nearly 500 MB which should be enough even in this
> case. With filenames of 32 characters, all the filenames alone for
> 130,000 files in a directory take about 4 MB - I doubt the ZFS
> introduces so much extra metadata it doesn't fit in 500 MB.
>
> I am now deleting the session files, and I hope it will not take days
> to complete...
>
Worse than that: I've seen a similar issue with hashed directories
holding about 1M+ files.  After deleting all those files, even a find
on the (now empty) directories took ages...
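A quick way to watch the two tunables quoted above from a program instead
of sysctl(8).  A sketch only: it assumes (as the values in the output
above suggest) that both OIDs are exported as 64-bit integers.

        #include <sys/types.h>
        #include <sys/sysctl.h>
        #include <err.h>
        #include <stdint.h>
        #include <stdio.h>

        static uint64_t
        get64(const char *name)
        {
                uint64_t v;
                size_t len = sizeof(v);

                /* Assumes the OID is a 64-bit integer. */
                if (sysctlbyname(name, &v, &len, NULL, 0) == -1)
                        err(1, "%s", name);
                return (v);
        }

        int
        main(void)
        {
                uint64_t limit = get64("vfs.zfs.arc_meta_limit");
                uint64_t used = get64("vfs.zfs.arc_meta_used");

                printf("arc_meta_used %ju of %ju bytes (%.1f%%)\n",
                    (uintmax_t)used, (uintmax_t)limit,
                    limit ? 100.0 * used / limit : 0.0);
                return (0);
        }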
From owner-freebsd-fs@FreeBSD.ORG  Mon Jul 25 11:07:06 2011
From: FreeBSD bugmaster
Date: Mon, 25 Jul 2011 11:07:05 GMT
To: freebsd-fs@FreeBSD.org
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp. Description
--------------------------------------------------------------------------------
o kern/159077  fs  [zfs] Can't cd .. with latest zfs version
o kern/159048  fs  [smbfs] smb mount corrupts large files
o kern/159045  fs  [zfs] [hang] ZFS scrub freezes system
o kern/158839  fs  [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802  fs  [amd] amd(8) ICMP storm and unkillable process.
o kern/158711  fs  [ffs] [panic] panic in ffs_blkfree and ffs_valloc
o kern/158231  fs  [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929  fs  [nfs] NFS slow read
o kern/157728  fs  [zfs] zfs (v28) incremental receive may leave behind t
o kern/157722  fs  [geli] unable to newfs a geli encrypted partition
o kern/157399  fs  [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179  fs  [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156933  fs  [zfs] ZFS receive after read on readonly=on filesystem
o kern/156797  fs  [zfs] [panic] Double panic with FreeBSD 9-CURRENT and
o kern/156781  fs  [zfs] zfs is losing the snapshot directory,
p kern/156545  fs  [ufs] mv could break UFS on SMP systems
o kern/156193  fs  [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156168  fs  [nfs] [panic] Kernel panic under concurrent access ove
o kern/156039  fs  [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615  fs  [zfs] zfs v28 broken on sparc64 -current
o kern/155587  fs  [zfs] [panic] kernel panic with zfs
o kern/155411  fs  [regression] [8.2-release] [tmpfs]: mount: tmpfs : No
o kern/155199  fs  [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104   fs  [zfs][patch] use /dev prefix by default when importing
o kern/154930  fs  [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828  fs  [msdosfs] Unable to create directories on external USB
o kern/154491  fs  [smbfs] smb_co_lock: recursive lock for object 1
o kern/154447  fs  [zfs] [panic] Occasional panics - solaris assert somew
p kern/154228  fs  [md] md getting stuck in wdrain state
o kern/153996  fs  [zfs] zfs root mount error while kernel is not located
o kern/153847  fs  [nfs] [panic] Kernel panic from incorrect m_free in nf
o kern/153753  fs  [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716  fs  [zfs] zpool scrub time remaining is incorrect
o kern/153695  fs  [patch] [zfs] Booting from zpool created on 4k-sector
o kern/153680  fs  [xfs] 8.1 failing to mount XFS partitions
o kern/153520  fs  [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418  fs  [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351  fs  [zfs] locking directories/files in ZFS
o bin/153258   fs  [patch][zfs] creating ZVOLs requires `refreservation'
s kern/153173  fs  [zfs] booting from a gzip-compressed dataset doesn't w
o kern/153126  fs  [zfs] vdev failure, zpool=peegel type=vdev.too_small
p kern/152488  fs  [tmpfs] [patch] mtime of file updated when only inode
o kern/152022  fs  [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs  [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs  [zfs] page fault under load in /sbin/zfs
o kern/151845  fs  [smbfs] [patch] smbfs should be upgraded to support Un
o bin/151713   fs  [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs  [zfs] disk wait bug
o kern/151629  fs  [fs] [patch] Skip empty directory entries during name
o kern/151330  fs  [zfs] will unshare all zfs filesystem after execute a
o kern/151326  fs  [nfs] nfs exports fail if netgroups contain duplicate
o kern/151251  fs  [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs  [zfs] can't delete zfs snapshot
o kern/151111  fs  [zfs] vnodes leakage during zfs unmount
o kern/150503  fs  [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs  [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs  [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs  [nfs] mountd/nfsd became confused; refused to reload n
o kern/150207  fs
                   zpool(1): zpool import -d /dev tries to open weird dev
o kern/149208  fs  mksnap_ffs(8) hang/deadlock
o kern/149173  fs  [patch] [zfs] make OpenSolaris installa
o kern/149015  fs  [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs  [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013  fs  [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs  [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs  [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs  [zfs] ZFS hanging forever on 8.1-PRERELEASE
o bin/148296   fs  [zfs] [loader] [patch] Very slow probe in /usr/src/sys
o kern/148204  fs  [nfs] UDP NFS causes overload
o kern/148138  fs  [zfs] zfs raidz pool commands freeze
o kern/147903  fs  [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs  [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147790  fs  [zfs] zfs set acl(mode|inherit) fails on existing zfs
o kern/147560  fs  [zfs] [boot] Booting 8.1-PRERELEASE raidz system take
o kern/147420  fs  [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/146941  fs  [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs  [zfs] zpool import hangs with checksum errors
o kern/146708  fs  [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs  [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs  [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712  fs  [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs  [xfs] [panic] Kernel panics shortly after mounting an
o bin/145309   fs  bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs  [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246  fs  [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs  [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs  [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs  [nfs] nfsd performs abysmally under load
o kern/144929  fs  [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447  fs  [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs  [panic] Kernel panic on online filesystem optimization
s kern/144415  fs  [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs  [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825  fs  [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs  [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212  fs  [nfs] NFSv4 client strange work ...
o kern/143184  fs  [zfs] [lor] zfs/bufwait LOR
o kern/142914  fs  [zfs] ZFS performance degradation over time
o kern/142878  fs  [zfs] [vfs] lock order reversal
o kern/142597  fs  [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs  [zfs] [lor] allproc/zfs LOR
o kern/142466  fs  Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306  fs  [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068  fs  [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs  [msdosfs] [panic] Kernel panic.
                   msdofs: file name leng
o kern/141463  fs  [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs  [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs  [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs  [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs  [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs  [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs  [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs  [zfs] snapshot crash
o kern/140068  fs  [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs  [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs  [zfs] vfs.numvnodes leak on busy zfs
p bin/139651   fs  [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs  [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs  [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs  [smbfs] [panic] smb mount causes system crash if remot
o kern/138662  fs  [panic] ffs_blkfree: freeing free block
o kern/138421  fs  [ufs] [patch] remove UFS label limitations
o kern/138202  fs  mount_msdosfs(1) see only 2Gb
o kern/136968  fs  [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs  [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs  [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs  [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs  [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470  fs  [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs  [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs  [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs  [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs  [zfs] Hot spares are rather cold...
o kern/133676  fs  [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133174  fs  [msdosfs] [patch] msdosfs must support multibyte inter
o kern/132960  fs  [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs  reboot causes filesystem corruption (failure to sync b
o kern/132331  fs  [ufs] [lor] LOR ufs and syncer
o kern/132237  fs  [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs  [panic] File System Hard Crashes
o kern/131441  fs  [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs  [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs  [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs  makefs: error "Bad file descriptor" on the mount poin
o kern/130920  fs  [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs  [nullfs] Error by check nullfs
f kern/130133  fs  [panic] [zfs] 'kmem_map too small' caused by make clea
o kern/129760  fs  [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs  [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231  fs  [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs  [panic] non-userfriendly panic when trying to mount(8)
o kern/127787  fs  [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
f kern/127375  fs  [zfs] If vm.kmem_size_max>"1073741823" then write spee
o bin/127270   fs  fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs  [panic] mount(8): trying to mount a write protected zi
f kern/126703  fs  [panic] [zfs] _mtx_lock_sleep: recursed on non-recursi
o kern/126287  fs  [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs  [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs  [zfs] [request] SHA256 acceleration in ZFS
o kern/123939  fs  [msdosfs] corrupts new files
f sparc/123566 fs  [zfs] zpool import issue: EOVERFLOW
o kern/122380  fs  [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs  [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs  [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121366   fs  [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072   fs  [smbfs] mount_smbfs(8) cannot normally convert the cha
o kern/120483  fs  [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs  [ntfs] [patch] Sync style changes between NetBSD and F
f kern/120210  fs  [zfs] [panic] reboot after panic: solaris assert: arc_
o kern/118912  fs  [2tb] disk sizing/geometry problem with large array
o kern/118713  fs  [minidump] [patch] Display media size required for a k
o bin/118249   fs  [ufs] mv(1): moving a directory changes its mtime
o kern/118126  fs  [nfs] [patch] Poor NFS server write performance
o kern/118107  fs  [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs  [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs  [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs  [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs  [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs  [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs  lack of fsck_cd9660 prevents mounting iso images with
o kern/116583  fs  [ffs] [hang] System freezes for short time when using
o bin/115361   fs  [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs  [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs  [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs  [ufs] snapshot creation panics:
                   snapacct_ufs2: bad blo
o bin/114468   fs  [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs  [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs  [patch] [request] mount(8): add support for relative p
o bin/113049   fs  [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs  [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs  [msdosfs] Long Names of files are incorrectly created
o kern/111782  fs  [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs  [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs  [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat
o kern/109010  fs  [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs  [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs  [ufs] left-over fsck_snapshot after unfinished backgro
o kern/104406  fs  [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs  [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs  [ntfs] Directories in NTFS mounted disc images appear
o kern/101324  fs  [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs  [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs  [request] newfs(8) has no option to clear the first 12
o kern/97377   fs  [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs  [cd9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs  [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs  fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs  [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs  [smbfs] smbfs may cause double unlock
o kern/93942   fs  [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs  [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134   fs  [smbfs] [patch] Preserve access and modification time
a kern/90815   fs  [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs  [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs  [panic] ffs_blkfree: freeing free frag on AMD 64
o kern/88266   fs  [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o bin/87966    fs  [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs  [smbfs] System reboot while umount smbfs.
o kern/86587   fs  [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs  fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/80088   fs  [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs  Background-fsck checks one filesystem twice and omits
o kern/73484   fs  [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs  [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs  [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs  fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs  [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs  [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs  [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs  [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs  [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs  [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs  [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs  [smbfs] System reboot with dead smb mount and umount
o kern/33464   fs  [ufs] soft update inconsistencies after system crash
o bin/27687    fs  fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs  [2TB] 32bit NFS servers export wrong negative values t

237 problems total.

From owner-freebsd-fs@FreeBSD.ORG  Mon Jul 25 16:30:24 2011
From: Gary Palmer
Date: Mon, 25 Jul 2011 16:30:24 GMT
To: freebsd-fs@FreeBSD.org
Subject: Re: kern/159077: Can't cd .. with latest zfs version

The following reply was made to PR kern/159077; it has been noted by GNATS.

From: Gary Palmer
To: Michael Haro
Cc: FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: kern/159077: Can't cd .. with latest zfs version
Date: Mon, 25 Jul 2011 12:25:34 -0400

On Wed, Jul 20, 2011 at 11:37:21PM -0700, Michael Haro wrote:
> >Number:         159077
> >Category:       kern
> >Synopsis:       Can't cd .. with latest zfs version
> >Confidential:   no
> >Severity:       serious
> >Priority:       medium
> >Responsible:    freebsd-bugs
> >State:          open
> >Quarter:
> >Keywords:
> >Date-Required:
> >Class:          sw-bug
> >Submitter-Id:   current-users
> >Arrival-Date:   Thu Jul 21 07:10:07 UTC 2011
> >Closed-Date:
> >Last-Modified:
> >Originator:     Michael Haro
> >Release:        FreeBSD 8.2-STABLE amd64
> >Organization:
> >Environment:
> System: FreeBSD backups.mtv.bitsurf.net 8.2-STABLE FreeBSD 8.2-STABLE #1:
> Sat Jul 16 19:26:28 PDT 2011
> root@backups.mtv.bitsurf.net:/usr/obj/usr/src/sys/KERNEL amd64
>
> freebsd 8.2 stable as of july 16th
> zpool version 28
> zfs version 3
>
> >Description:
>
> trying to cd up one level using 'cd ..' gives permission denied
>
> >How-To-Repeat:
>
> use sh or tcsh, not bash...
>
> $ pwd
> /home/mharo
> $ cd ..
> cd: can't cd to ..
> $ ls -ald /home
> drwxr-xr-x  4 root  wheel  4 Nov 29  2009 /home
> $ ls -ald /home/mharo
> drwxr-xr-x  3 mharo  users  15 Jul 20 22:49 /home/mharo
> $ cd /home
> $ pwd
> /home
> $ ls -ald mharo
> drwxr-xr-x  3 mharo  users  15 Jul 20 22:49 mharo
> $ cd mharo
> $ cd ..
> cd: can't cd to ..
>
> so obviously I can cd into /home, just not via ..
>
> $ zfs list -r zroot/home
> NAME               USED  AVAIL  REFER  MOUNTPOINT
> zroot/home         162K  2.70G    26K  /home
> zroot/home/mharo   119K  2.70G  35.5K  /home/mharo

It may be worth unmounting /home/mharo and checking the permissions of
the directory underneath the mount point.  e.g.

% mkdir /tmp/159077
% chmod 0 /tmp/159077
% ls -la /tmp/159077
total 274
d---------   2 root  wheel      512 Jul 25 17:22 .
drwxrwxrwt  57 root  wheel   249344 Jul 25 17:22 ..
% mount /dev/md0 /tmp/159077
% ls -la /tmp/159077
total 276
drwxr-xr-x   3 root  wheel      512 Jul 25 17:21 .
drwxrwxrwt  57 root  wheel   249344 Jul 25 17:22 ..
drwxrwxr-x   2 root  operator   512 Jul 25 17:21 .snap
%

and then as a regular user:

$ cd /tmp/159077/
$ pwd
/tmp/159077
$ ls -la
ls: ..: Permission denied
total 4
drwxr-xr-x  3 root  wheel      512 Jul 25 17:21 .
drwxrwxr-x  2 root  operator   512 Jul 25 17:21 .snap
$ ls -la ..
ls: ..: Permission denied
$ cd ..
cd: can't cd to ..
$

Regards,
Gary

From owner-freebsd-fs@FreeBSD.ORG  Mon Jul 25 17:01:33 2011
From: Clinton Adams
Date: Mon, 25 Jul 2011 19:01:32 +0200
To: Rick Macklem
Cc: FreeBSD FS
Subject: Re: nfsd server cache flooded, try to increase nfsrc_floodlevel

On Sat, Jul 23, 2011 at 12:16 AM, Rick Macklem wrote:
> Clinton Adams wrote:
> [stuff snipped for brevity]
>>
>> Running four clients now and the LockOwners are steadily climbing,
>> nfsstat consistently reported it as 0 prior to users logging into the
>> nfsv4 test systems - my testing via ssh didn't show anything like
>> this. Attached tcpdump file is from when I first noticed the jump in
>> LockOwners from 0 to ~600. I tried wireshark on this and didn't see
>> any releaselockowner operations.
>>
> [stuff snipped for brevity]
>> OpenOwner  Opens  LockOwner  Locks  Delegs
>>         6    242       2481     22       0
>> Server Cache Stats:
>>   Inprog  Idem  Non-idem   Misses  CacheSize  TCPPeak
>>        0     0         2  2518251       2502     4772
>>
> I've written a small test program:
>   http://people.freebsd.org/~rmacklem/childlock.c (also attached)
>
> where a parent process opens a file and then forks children that do
> lock ops and then exit. (I'm guessing that this is what some process
> in your clients are doing, that result in the LockOwner count growing.)
>
> When I run this program on Fedora15, it generates ReleaseLockOwner Ops
> and the LockOwner count doesn't increase as it runs.
>
> You can run this program by giving it an argument that can be any file
> on the nfsv4 mount for which you have read/write access, then watch
> the server via "nfsstat -e -s" to see if the LockOwner count increases.
>
> If the LockOwner count does increase, then it appears that a newer Linux
> kernel will avoid the problem.

Yes, a client running a newer kernel (2.6.38) does generate the
release_lockowner ops.

Thanks for all the help!

> If you are interested in what the packet trace looks like when running
> the program on Fedora15, it's at:
>   http://people.freebsd.org/~rmacklem/childlock.pcap
>
> rick
> ps: The FreeBSD NFSv4 client doesn't currently generate the
>     ReleaseLockOwner Ops for this case either. I need to come up with a
>     patch that does that.
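Not the childlock.c linked above, just a sketch of the pattern Rick
describes: the parent opens the file, each child takes and drops a lock
and exits, so every child shows up as a distinct lock owner on the server
unless the client sends ReleaseLockOwner.

        #include <sys/wait.h>
        #include <err.h>
        #include <fcntl.h>
        #include <unistd.h>

        int
        main(int argc, char **argv)
        {
                struct flock fl;
                int fd, i;

                if (argc != 2)
                        errx(1, "usage: %s <file on NFSv4 mount>", argv[0]);
                if ((fd = open(argv[1], O_RDWR)) == -1)
                        err(1, "open");
                for (i = 0; i < 100; i++) {
                        if (fork() == 0) {
                                /* Lock the first byte, unlock it, exit. */
                                fl.l_type = F_WRLCK;
                                fl.l_whence = SEEK_SET;
                                fl.l_start = 0;
                                fl.l_len = 1;
                                if (fcntl(fd, F_SETLKW, &fl) == -1)
                                        err(1, "lock");
                                fl.l_type = F_UNLCK;
                                (void)fcntl(fd, F_SETLK, &fl);
                                _exit(0);
                        }
                        wait(NULL);
                }
                return (0);
        }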
From owner-freebsd-fs@FreeBSD.ORG  Mon Jul 25 17:30:05 2011
From: John Baldwin
Date: Mon, 25 Jul 2011 09:39:31 -0400
To: freebsd-fs@freebsd.org
Subject: Re: 3TB drives on ZFS and booting

On Sunday, July 24, 2011 5:52:25 pm Gary Corcoran wrote:
> I have seen conflicting information on the internet about this, and so
> I would like a direct answer from someone who knows for sure.  Does
> FreeBSD's ZFS work with 3TB drives, and is it possible to do a ZFS-only
> (i.e. boot from ZFS) installation with 3TB drives on FreeBSD?  I presume
> that since ZFS was designed to handle huge filesystems, it would have no
> problem with 3TB drives, but I guess the real question is the ZFS boot
> code - can it currently handle >2TB drives?
> Bottom line: would I be able to successfully build (and of course boot)
> a FreeBSD ZFS-only system using only 3TB drives?

You probably want to use GPT instead of MBR, but the GPT ZFS boot code
should fully handle 64-bit LBAs just fine.
--
John Baldwin

From owner-freebsd-fs@FreeBSD.ORG  Mon Jul 25 21:58:36 2011
From: Rick Macklem
Date: Mon, 25 Jul 2011 17:58:35 -0400 (EDT)
To: Zack Kirsch
Cc: freebsd-fs@freebsd.org
Subject: Re: nfsd server cache flooded, try to increase nfsrc_floodlevel

Zack Kirsch wrote:
> Just wanted to add a bit of Isilon color. We've hit this limit before,
> but I believe it was mostly due to strange client behavior of 1) Using
> a new lockowner for each lock and 2) Using a new TCP connection for
> each 'test run'.

When I saw this before, I remarked that this shouldn't be relevant.
I realize now that you were referring to a test environment (not a real
NFS client) where it keeps creating new TCP connections, even if the
previous connection wasn't broken due to a network partitioning or
similar.  Sorry about that.

> As far as I know, we haven't hit this in the field.

It appears that this case was a result of using an old Linux NFSv4
client and was resolved via a kernel upgrade.  (i.e. I suspect there are
others out there that will run into the same thing sooner or later.)

> We've done a few things to combat this problem:
> 1) We increased the floodlevel to 65536.
> 2) We made the floodlevel configurable via sysctl.
> 3) We made significant changes to the replay cache itself. Specific
>    gains were drastic performance improvements and freeing of cache
>    entries from stale TCP connections.

It is important to note that the request cache holds onto replies for
inactive TCP connections because it assumes that the client might be
network partitioned for long enough that it is forced to reconnect using
a fresh TCP connection and will then retry all outstanding RPCs.  This
could take a looonnngggg time to happen, so these replies can't be
free'd quickly, or the whole purpose of the cache (avoiding redoing
non-idempotent operations when an RPC is retried) is defeated.

The fact that some artificial test program (pynfs maybe?)
chooses to do fresh TCP connections isn't relevant imho, since it isn't
a real client and, as far as I know, real clients only reconnect when
the old TCP connection no longer works.

I thought I'd try and clarify this for anyone interested, rick

From owner-freebsd-fs@FreeBSD.ORG  Mon Jul 25 23:22:42 2011
From: Rick Macklem
Date: Mon, 25 Jul 2011 19:22:40 -0400 (EDT)
To: FreeBSD FS
Subject: Does msodsfs_readdir() require a exclusively locked vnode

Hi,

Currently both NFS servers set the vnode lock LK_SHARED and so do the
local syscalls (at least that's how it looks by inspection?).

Peter Holm just posted me this panic, where a test for an exclusive
vnode lock fails in msdosfs_readdir().

KDB: stack backtrace:
db_trace_self_wrapper(c0efa6f6,c71627f8,c79230b0,c0f2ef29,f19154b8,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c7f20b38,f19154fc,c0d586d5,f191550c,c7f20ae0,...) at kdb_backtrace+0x2a
vfs_badlock(c101b180,f191550c,c1055580,c7f20ae0) at vfs_badlock+0x23
assert_vop_elocked(c7f20ae0,c0ee5f4f,c09f3213,8,0,...) at assert_vop_elocked+0x55
pcbmap(c7966e00,0,f191560c,f1915618,f191561c,...) at pcbmap+0x45
msdosfs_readdir(f1915960,c0f4b343,c7f20ae0,f1915940,0,...) at msdosfs_readdir+0x528
VOP_READDIR_APV(c101b180,f1915960,2,f1915a68,c7923000,...) at VOP_READDIR_APV+0xc5
nfsrvd_readdir(f1915b64,0,c7f20ae0,c7923000,f1915a68,...) at nfsrvd_readdir+0x38e
nfsrvd_dorpc(f1915b64,0,c7923000,c842a200,4,...) at nfsrvd_dorpc+0x1f79
nfssvc_program(c7793800,c842a200,c0f24d67,492,0,...) at nfssvc_program+0x40f
svc_run_internal(f1915d14,c09d9a98,c73dfa80,f1915d28,c0ef1130,...) at svc_run_internal+0x952
svc_thread_start(c73dfa80,f1915d28,c0ef1130,3a5,c7e4b2c0,...) at svc_thread_start+0x10
at svc_thread_start+0x10 fork_exit(c0bed7d0,c73dfa80,f1915d28) at fork_exit+0xb8 fork_trampoline() at fork_trampoline+0x8 --- trap 0x804c12e, eip = 0xc, esp = 0x33, ebp = 0x1 --- pcbmap: 0xc7f20ae0 is not exclusive locked but should be KDB: enter: lock violation So, does anyone know if the msdosfs_readdir() really requires a LK_EXCLUSIVE locked vnode or is the ASSERT_VOP_ELOCKED() too strong in pcbmap()? rick From owner-freebsd-fs@FreeBSD.ORG Tue Jul 26 08:30:20 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EAEE01065674; Tue, 26 Jul 2011 08:30:20 +0000 (UTC) (envelope-from dim@FreeBSD.org) Received: from tensor.andric.com (cl-327.ede-01.nl.sixxs.net [IPv6:2001:7b8:2ff:146::2]) by mx1.freebsd.org (Postfix) with ESMTP id AF8118FC1A; Tue, 26 Jul 2011 08:30:20 +0000 (UTC) Received: from [IPv6:2001:7b8:3a7:0:8c40:e6eb:d519:2a58] (unknown [IPv6:2001:7b8:3a7:0:8c40:e6eb:d519:2a58]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by tensor.andric.com (Postfix) with ESMTPSA id 7FA945C37; Tue, 26 Jul 2011 10:30:19 +0200 (CEST) Message-ID: <4E2E7B1B.2020906@FreeBSD.org> Date: Tue, 26 Jul 2011 10:30:19 +0200 From: Dimitry Andric Organization: The FreeBSD Project User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20110624 Thunderbird/5.0 MIME-Version: 1.0 To: John Baldwin References: <4E2C9419.4000205@rcn.com> <201107250939.31746.jhb@freebsd.org> In-Reply-To: <201107250939.31746.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: 3TB drives on ZFS and booting X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2011 08:30:21 -0000 On 2011-07-25 15:39, John Baldwin wrote: > On Sunday, July 24, 2011 5:52:25 pm Gary Corcoran wrote: ... >> Bottom line: would I be able to successfully build (and of course boot) a FreeBSD >> ZFS-only system using only 3TB drives? > You probably want to use GPT instead of MBR, but the GPT ZFS boot code shoul > fully handle 64-bit LBAs just fine. Isn't that also dependent on the BIOS's ability to handle 64-bit LBA's? 
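[Editorial aside, not part of any mail in this thread: the 2TB boundary raised above is just the 32-bit LBA ceiling at 512-byte sectors. A minimal standalone sketch of that arithmetic follows; the program and the nominal 3TB figure are illustrative only.]

/*
 * Illustrative only: why a "3TB" drive needs 64-bit LBAs end to end
 * (partition scheme, boot code, BIOS EDD packet interface), assuming
 * 512-byte sectors.
 */
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	const uint64_t sector_size = 512;
	uint64_t lba32_sectors = (uint64_t)UINT32_MAX + 1;	/* 2^32 sectors */
	uint64_t lba32_bytes = lba32_sectors * sector_size;	/* = 2 TiB */
	uint64_t drive_bytes = 3000000000000ULL;		/* nominal "3TB" drive */
	uint64_t last_lba = drive_bytes / sector_size - 1;

	printf("32-bit LBA limit: %ju bytes (%.2f TiB)\n",
	    (uintmax_t)lba32_bytes, (double)lba32_bytes / (1ULL << 40));
	printf("last LBA on a 3TB drive: %ju (%s UINT32_MAX)\n",
	    (uintmax_t)last_lba, last_lba > UINT32_MAX ? "above" : "below");
	return (0);
}

Since the last LBA of a 3TB drive does not fit in 32 bits, both the MBR scheme (32-bit start/size fields) and a BIOS without 64-bit EDD support become limiting factors above 2 TiB, which is why GPT plus an EDD-capable BIOS is the combination discussed in this thread.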
From owner-freebsd-fs@FreeBSD.ORG Tue Jul 26 09:04:46 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 939E5106566B for ; Tue, 26 Jul 2011 09:04:46 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 2F1378FC14 for ; Tue, 26 Jul 2011 09:04:45 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p6Q94g8r008125 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 26 Jul 2011 12:04:42 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p6Q94fUS017456; Tue, 26 Jul 2011 12:04:41 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p6Q94fx7017455; Tue, 26 Jul 2011 12:04:41 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 26 Jul 2011 12:04:41 +0300 From: Kostik Belousov To: Rick Macklem Message-ID: <20110726090441.GD17489@deviant.kiev.zoral.com.ua> References: <2086374310.991475.1311636160720.JavaMail.root@erie.cs.uoguelph.ca> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="6iAIcqJi9p/aaN8Y" Content-Disposition: inline In-Reply-To: <2086374310.991475.1311636160720.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: FreeBSD FS Subject: Re: Does msodsfs_readdir() require a exclusively locked vnode X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2011 09:04:46 -0000 --6iAIcqJi9p/aaN8Y Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Jul 25, 2011 at 07:22:40PM -0400, Rick Macklem wrote: > Hi, >=20 > Currently both NFS servers set the vnode lock LK_SHARED > and so do the local syscalls (at least that's how it looks > by inspection?). >=20 > Peter Holm just posted me this panic, where a test for an > exclusive vnode lock fails in msdosfs_readdir(). > KDB: stack backtrace: > db_trace_self_wrapper(c0efa6f6,c71627f8,c79230b0,c0f2ef29,f19154b8,...) a= t db_trace_self_wrapper+0x26 > kdb_backtrace(c7f20b38,f19154fc,c0d586d5,f191550c,c7f20ae0,...) at kdb_ba= cktrace+0x2a > vfs_badlock(c101b180,f191550c,c1055580,c7f20ae0) at vfs_badlock+0x23 > assert_vop_elocked(c7f20ae0,c0ee5f4f,c09f3213,8,0,...) at assert_vop_eloc= ked+0x55 > pcbmap(c7966e00,0,f191560c,f1915618,f191561c,...) at pcbmap+0x45 > msdosfs_readdir(f1915960,c0f4b343,c7f20ae0,f1915940,0,...) at msdosfs_rea= ddir+0x528 > VOP_READDIR_APV(c101b180,f1915960,2,f1915a68,c7923000,...) at VOP_READDIR= _APV+0xc5 > nfsrvd_readdir(f1915b64,0,c7f20ae0,c7923000,f1915a68,...) 
at nfsrvd_readd= ir+0x38e > nfsrvd_dorpc(f1915b64,0,c7923000,c842a200,4,...) at nfsrvd_dorpc+0x1f79 > nfssvc_program(c7793800,c842a200,c0f24d67,492,0,...) at nfssvc_program+0x= 40f > svc_run_internal(f1915d14,c09d9a98,c73dfa80,f1915d28,c0ef1130,...) at svc= _run_internal+0x952 > svc_thread_start(c73dfa80,f1915d28,c0ef1130,3a5,c7e4b2c0,...) at svc_thre= ad_start+0x10 > fork_exit(c0bed7d0,c73dfa80,f1915d28) at fork_exit+0xb8 > fork_trampoline() at fork_trampoline+0x8 > --- trap 0x804c12e, eip =3D 0xc, esp =3D 0x33, ebp =3D 0x1 --- > pcbmap: 0xc7f20ae0 is not exclusive locked but should be > KDB: enter: lock violation >=20 > So, does anyone know if the msdosfs_readdir() really requires a LK_EXCLUS= IVE > locked vnode or is the ASSERT_VOP_ELOCKED() too strong in pcbmap()? Yes, msdosfs currently requires all vnode locks to be exclusive. One of the reasons is that each denode (the msdosfs-private vnode data) carries the fat entries cache, and this cache is updated even by the operations that do not modify vnode from the VFS POV. The locking regime is enforced by the getnewvnode() initializing the vnode lock with LK_NOSHARE flag, and msdosfs code not calling VN_LOCK_ASHARE() on the newly instantiated vnode. My question is, was the vnode in question locked at all ? --6iAIcqJi9p/aaN8Y Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk4ugykACgkQC3+MBN1Mb4hh2ACfS72MfHc6jb7XUh7FsaqkV8py 0lsAn1QwwRgW1mdqjxD5ACBsWz35fci2 =/7qP -----END PGP SIGNATURE----- --6iAIcqJi9p/aaN8Y-- From owner-freebsd-fs@FreeBSD.ORG Tue Jul 26 13:22:35 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9862D1065670; Tue, 26 Jul 2011 13:22:35 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 70F398FC0C; Tue, 26 Jul 2011 13:22:35 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 11F2946B62; Tue, 26 Jul 2011 09:22:35 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 9DE8C8A02C; Tue, 26 Jul 2011 09:22:34 -0400 (EDT) From: John Baldwin To: Dimitry Andric Date: Tue, 26 Jul 2011 09:03:58 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; ) References: <4E2C9419.4000205@rcn.com> <201107250939.31746.jhb@freebsd.org> <4E2E7B1B.2020906@FreeBSD.org> In-Reply-To: <4E2E7B1B.2020906@FreeBSD.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201107260903.58265.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Tue, 26 Jul 2011 09:22:34 -0400 (EDT) Cc: freebsd-fs@freebsd.org Subject: Re: 3TB drives on ZFS and booting X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2011 13:22:35 -0000 On Tuesday, July 26, 2011 4:30:19 am Dimitry Andric wrote: > On 2011-07-25 15:39, John Baldwin wrote: > > On Sunday, July 24, 2011 5:52:25 pm Gary Corcoran wrote: > ... 
> >> Bottom line: would I be able to successfully build (and of course boot) a FreeBSD > >> ZFS-only system using only 3TB drives? > > You probably want to use GPT instead of MBR, but the GPT ZFS boot code shoul > > fully handle 64-bit LBAs just fine. > > Isn't that also dependent on the BIOS's ability to handle 64-bit LBA's? Yes, but the original EDD 1.0 spec that included the 'packet' and extended INT 13h functions included 64-bit LBAs, so at this point I would expect most BIOSes to support that. Also, only BIOSes for controllers that support logical disks > 2TB (either RAID volumes or large physical disks) have to actually support having the upper 32-bits be non-zero. I strongly suspect that that is in fact true. That is, if you have a controller new enough to support a 3 TB drive, it's accompanying BIOS ROM should support 64-bit LBAs. -- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Tue Jul 26 13:27:03 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5D3A51065670 for ; Tue, 26 Jul 2011 13:27:03 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 1BD298FC16 for ; Tue, 26 Jul 2011 13:27:03 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1Qlhf3-0003b0-26 for freebsd-fs@freebsd.org; Tue, 26 Jul 2011 15:27:01 +0200 Received: from ib-jtotz.ib.ic.ac.uk ([155.198.110.220]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 26 Jul 2011 15:27:01 +0200 Received: from jtotz by ib-jtotz.ib.ic.ac.uk with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 26 Jul 2011 15:27:01 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Johannes Totz Date: Tue, 26 Jul 2011 14:26:48 +0100 Lines: 39 Message-ID: Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: ib-jtotz.ib.ic.ac.uk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.18) Gecko/20110616 Thunderbird/3.1.11 Subject: panic: snapacct_ufs2: bad block X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2011 13:27:03 -0000 Hi! Just got a panic on my 8-stable box: panic: snapacct_ufs2: bad block cpuid = 0 KDB: stack backtrace: #0 0xffffffff805fd350 at kdb_backtrace+0x60 #1 0xffffffff805cb194 at panic+0x1b4 #2 0xffffffff807eed0e at snapacct_ufs2+0xfe #3 0xffffffff807ee55f at indiracct_ufs2+0x2ff #4 0xffffffff807ee4f7 at indiracct_ufs2+0x297 #5 0xffffffff807ef0ce at expunge_ufs2+0x30e #6 0xffffffff807f2f79 at ffs_snapshot+0x1e59 #7 0xffffffff80802e38 at ffs_mount+0x1628 #8 0xffffffff80654f1c at vfs_donmount+0xf9c #9 0xffffffff80655903 at nmount+0x73 #10 0xffffffff8060a17e at syscallenter+0x2fe #11 0xffffffff808bcd31 at syscall+0x41 #12 0xffffffff808a52f2 at Xfast_syscall+0xe2 This box is running: FreeBSD XXX 8.2-STABLE FreeBSD 8.2-STABLE #0 r224227: Wed Jul 20 16:55:23 BST 2011 root@XXX:/usr/obj/usr/src/sys/GENERIC amd64 The crash happened during automated backup with dump(8) on the root file system. There was plenty of free space left. I have deleted all remaining snapshot files now. 
Either dump or savecore didnt work, so that's the only info i have (interestingly the above backtrace ended up in the logs anyway). This panic has happened before (twice or so) with older versions of fbsd. Would it be prudent to newfs and restore from backup, just to make sure there are no remaining glitches? Johannes From owner-freebsd-fs@FreeBSD.ORG Tue Jul 26 14:07:30 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E9111106566C for ; Tue, 26 Jul 2011 14:07:29 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id A3F218FC0C for ; Tue, 26 Jul 2011 14:07:29 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap4EAILJLk6DaFvO/2dsb2JhbAA1AQEFKQRGEh0YAgINBx4CFlEHhG2jfrkTkUCBK4F7gguBDwSScogxiEs X-IronPort-AV: E=Sophos;i="4.67,269,1309752000"; d="scan'208";a="132289229" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 26 Jul 2011 10:07:28 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id C2301B3F08; Tue, 26 Jul 2011 10:07:28 -0400 (EDT) Date: Tue, 26 Jul 2011 10:07:28 -0400 (EDT) From: Rick Macklem To: Kostik Belousov Message-ID: <429452924.1012322.1311689248782.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110726090441.GD17489@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: FreeBSD FS Subject: Re: Does msodsfs_readdir() require a exclusively locked vnode X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2011 14:07:30 -0000 Kostik Belousov wrote: > On Mon, Jul 25, 2011 at 07:22:40PM -0400, Rick Macklem wrote: > > Hi, > > > > Currently both NFS servers set the vnode lock LK_SHARED > > and so do the local syscalls (at least that's how it looks > > by inspection?). > > > > Peter Holm just posted me this panic, where a test for an > > exclusive vnode lock fails in msdosfs_readdir(). > > KDB: stack backtrace: > > db_trace_self_wrapper(c0efa6f6,c71627f8,c79230b0,c0f2ef29,f19154b8,...) > > at db_trace_self_wrapper+0x26 > > kdb_backtrace(c7f20b38,f19154fc,c0d586d5,f191550c,c7f20ae0,...) at > > kdb_backtrace+0x2a > > vfs_badlock(c101b180,f191550c,c1055580,c7f20ae0) at vfs_badlock+0x23 > > assert_vop_elocked(c7f20ae0,c0ee5f4f,c09f3213,8,0,...) at > > assert_vop_elocked+0x55 > > pcbmap(c7966e00,0,f191560c,f1915618,f191561c,...) at pcbmap+0x45 > > msdosfs_readdir(f1915960,c0f4b343,c7f20ae0,f1915940,0,...) at > > msdosfs_readdir+0x528 > > VOP_READDIR_APV(c101b180,f1915960,2,f1915a68,c7923000,...) at > > VOP_READDIR_APV+0xc5 > > nfsrvd_readdir(f1915b64,0,c7f20ae0,c7923000,f1915a68,...) at > > nfsrvd_readdir+0x38e > > nfsrvd_dorpc(f1915b64,0,c7923000,c842a200,4,...) at > > nfsrvd_dorpc+0x1f79 > > nfssvc_program(c7793800,c842a200,c0f24d67,492,0,...) at > > nfssvc_program+0x40f > > svc_run_internal(f1915d14,c09d9a98,c73dfa80,f1915d28,c0ef1130,...) > > at svc_run_internal+0x952 > > svc_thread_start(c73dfa80,f1915d28,c0ef1130,3a5,c7e4b2c0,...) 
at > > svc_thread_start+0x10 > > fork_exit(c0bed7d0,c73dfa80,f1915d28) at fork_exit+0xb8 > > fork_trampoline() at fork_trampoline+0x8 > > --- trap 0x804c12e, eip = 0xc, esp = 0x33, ebp = 0x1 --- > > pcbmap: 0xc7f20ae0 is not exclusive locked but should be > > KDB: enter: lock violation > > > > So, does anyone know if the msdosfs_readdir() really requires a > > LK_EXCLUSIVE > > locked vnode or is the ASSERT_VOP_ELOCKED() too strong in pcbmap()? > > Yes, msdosfs currently requires all vnode locks to be exclusive. One > of > the reasons is that each denode (the msdosfs-private vnode data) > carries > the fat entries cache, and this cache is updated even by the > operations > that do not modify vnode from the VFS POV. > > The locking regime is enforced by the getnewvnode() initializing the > vnode > lock with LK_NOSHARE flag, and msdosfs code not calling > VN_LOCK_ASHARE() > on the newly instantiated vnode. > > My question is, was the vnode in question locked at all ? I think the problem is that I do a LK_DOWNGRADE. From a quick look at __lockmgr_args(), it doesn't check LK_NOSHARE for a LK_DOWNGRADE. Maybe __lockmgr_args() should have something like: if (op == LK_DOWNGRADE && (lk->lock_object.lo_flags & LK_NOSHARE)) return (0); /* noop */ after the if (op == LK_SHARED && (lk->lock_object.lo_flags & LK_NOSHARE)) op = LK_EXCLUSIVE; lines? Anyhow, I'll get pho@ to test a patch without the LK_DOWNGRADE in it. (It was pretty useless and would go away soon anyhow, once the lkflags argument to VFS_FHTOVP() gets used.) Thanks for the info, rick From owner-freebsd-fs@FreeBSD.ORG Tue Jul 26 14:22:00 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 809B9106564A; Tue, 26 Jul 2011 14:22:00 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 1EABA8FC19; Tue, 26 Jul 2011 14:21:59 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p6QELuVH036984 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 26 Jul 2011 17:21:56 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p6QELubY077960; Tue, 26 Jul 2011 17:21:56 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p6QELuP2077959; Tue, 26 Jul 2011 17:21:56 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 26 Jul 2011 17:21:56 +0300 From: Kostik Belousov To: Rick Macklem Message-ID: <20110726142156.GJ17489@deviant.kiev.zoral.com.ua> References: <20110726090441.GD17489@deviant.kiev.zoral.com.ua> <429452924.1012322.1311689248782.JavaMail.root@erie.cs.uoguelph.ca> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="b4Xfh4GKY2byHbNw" Content-Disposition: inline In-Reply-To: <429452924.1012322.1311689248782.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, 
DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: FreeBSD FS , attilio@freebsd.org Subject: Re: Does msodsfs_readdir() require a exclusively locked vnode X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2011 14:22:00 -0000 --b4Xfh4GKY2byHbNw Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jul 26, 2011 at 10:07:28AM -0400, Rick Macklem wrote: > Kostik Belousov wrote: > > On Mon, Jul 25, 2011 at 07:22:40PM -0400, Rick Macklem wrote: > > > Hi, > > > > > > Currently both NFS servers set the vnode lock LK_SHARED > > > and so do the local syscalls (at least that's how it looks > > > by inspection?). > > > > > > Peter Holm just posted me this panic, where a test for an > > > exclusive vnode lock fails in msdosfs_readdir(). > > > KDB: stack backtrace: > > > db_trace_self_wrapper(c0efa6f6,c71627f8,c79230b0,c0f2ef29,f19154b8,..= .) > > > at db_trace_self_wrapper+0x26 > > > kdb_backtrace(c7f20b38,f19154fc,c0d586d5,f191550c,c7f20ae0,...) at > > > kdb_backtrace+0x2a > > > vfs_badlock(c101b180,f191550c,c1055580,c7f20ae0) at vfs_badlock+0x23 > > > assert_vop_elocked(c7f20ae0,c0ee5f4f,c09f3213,8,0,...) at > > > assert_vop_elocked+0x55 > > > pcbmap(c7966e00,0,f191560c,f1915618,f191561c,...) at pcbmap+0x45 > > > msdosfs_readdir(f1915960,c0f4b343,c7f20ae0,f1915940,0,...) at > > > msdosfs_readdir+0x528 > > > VOP_READDIR_APV(c101b180,f1915960,2,f1915a68,c7923000,...) at > > > VOP_READDIR_APV+0xc5 > > > nfsrvd_readdir(f1915b64,0,c7f20ae0,c7923000,f1915a68,...) at > > > nfsrvd_readdir+0x38e > > > nfsrvd_dorpc(f1915b64,0,c7923000,c842a200,4,...) at > > > nfsrvd_dorpc+0x1f79 > > > nfssvc_program(c7793800,c842a200,c0f24d67,492,0,...) at > > > nfssvc_program+0x40f > > > svc_run_internal(f1915d14,c09d9a98,c73dfa80,f1915d28,c0ef1130,...) > > > at svc_run_internal+0x952 > > > svc_thread_start(c73dfa80,f1915d28,c0ef1130,3a5,c7e4b2c0,...) at > > > svc_thread_start+0x10 > > > fork_exit(c0bed7d0,c73dfa80,f1915d28) at fork_exit+0xb8 > > > fork_trampoline() at fork_trampoline+0x8 > > > --- trap 0x804c12e, eip =3D 0xc, esp =3D 0x33, ebp =3D 0x1 --- > > > pcbmap: 0xc7f20ae0 is not exclusive locked but should be > > > KDB: enter: lock violation > > > > > > So, does anyone know if the msdosfs_readdir() really requires a > > > LK_EXCLUSIVE > > > locked vnode or is the ASSERT_VOP_ELOCKED() too strong in pcbmap()? > >=20 > > Yes, msdosfs currently requires all vnode locks to be exclusive. One > > of > > the reasons is that each denode (the msdosfs-private vnode data) > > carries > > the fat entries cache, and this cache is updated even by the > > operations > > that do not modify vnode from the VFS POV. > >=20 > > The locking regime is enforced by the getnewvnode() initializing the > > vnode > > lock with LK_NOSHARE flag, and msdosfs code not calling > > VN_LOCK_ASHARE() > > on the newly instantiated vnode. > >=20 > > My question is, was the vnode in question locked at all ? > I think the problem is that I do a LK_DOWNGRADE. From a quick > look at __lockmgr_args(), it doesn't check LK_NOSHARE for a > LK_DOWNGRADE. 
>=20 > Maybe __lockmgr_args() should have something like: > if (op =3D=3D LK_DOWNGRADE && (lk->lock_object.lo_flags & LK_NOSHARE)) > return (0); /* noop */ > after the > if (op =3D=3D LK_SHARED && (lk->lock_object.lo_flags & LK_NOSHARE)) > op =3D LK_EXCLUSIVE; > lines? The RELENG_7 lockmgr does not check the NOSHARE flag on downgrade, but I agree with the essence of your proposal. >=20 > Anyhow, I'll get pho@ to test a patch without the LK_DOWNGRADE in > it. (It was pretty useless and would go away soon anyhow, once the > lkflags argument to VFS_FHTOVP() gets used.) >=20 > Thanks for the info, rick --b4Xfh4GKY2byHbNw Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk4uzYQACgkQC3+MBN1Mb4jUqwCfd0psq10eFKVOBjT6Ih4XKH55 THAAoPXF5vfaXy/LPtnjRmSK9i2d4IdK =TV5Y -----END PGP SIGNATURE----- --b4Xfh4GKY2byHbNw-- From owner-freebsd-fs@FreeBSD.ORG Tue Jul 26 20:07:28 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1DDE5106564A for ; Tue, 26 Jul 2011 20:07:28 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id D27F48FC15 for ; Tue, 26 Jul 2011 20:07:27 +0000 (UTC) Received: by gyf3 with SMTP id 3so703080gyf.13 for ; Tue, 26 Jul 2011 13:07:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=E3mXvGBf5f3u1wP19CHgp8dr/Jaf1PM2VUBGAGQRwjw=; b=hi8Xz1t+tBb1oGLoWPSSG6tDlX6IZYIgcliXLsM+gzfipVFaO2RCmiOQ0K09sFggMA K7NHsp19Am4puuzYGZMa2pU7Q/VFieAhb1wNXVKox9D1uk3SzQLpUYS5zLcGlAKcPnkB zI8rKyRq4cFmkkFujWCDUcjlbiKmIFQ+4g4EI= MIME-Version: 1.0 Received: by 10.236.137.140 with SMTP id y12mr7253226yhi.191.1311710846912; Tue, 26 Jul 2011 13:07:26 -0700 (PDT) Received: by 10.236.103.15 with HTTP; Tue, 26 Jul 2011 13:07:26 -0700 (PDT) In-Reply-To: <4E2C9419.4000205@rcn.com> References: <4E2C9419.4000205@rcn.com> Date: Tue, 26 Jul 2011 21:07:26 +0100 Message-ID: From: krad To: Gary Corcoran Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: 3TB drives on ZFS and booting X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2011 20:07:28 -0000 On 24 July 2011 22:52, Gary Corcoran wrote: > I have seen conflicting information on the internet about this, and so > I would like a direct answer from someone who knows for sure. Does > FreeBSD's > ZFS work with 3TB drives, and is it possible to do a ZFS-only (i.e. boot > from > ZFS) installation with 3TB drives on FreeBSD? I presume that since ZFS was > designed > to handle huge filesystems, it would have no problem with 3TB drives, but I > guess > the real question is the ZFS boot code - can it currently handle >2TB > drives? > Bottom line: would I be able to successfully build (and of course boot) a > FreeBSD > ZFS-only system using only 3TB drives? 
> > Thanks, > Gary > > ______________________________**_________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/**mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@**freebsd.org > " > The main issue with 3tb drives will be the 4k sector size. This applies to most of the 2tb drives as well. Make sure you use GPT layout and align it. IE make sure each gpt partiion start sector is / by 8 and its size is. Here is mine for an example $ gpart show ada0 => 34 3907029101 ada0 GPT (1.8T) 34 6 - free - (3.0k) 40 128 1 freebsd-boot (64k) 168 6291456 2 freebsd-swap (3.0G) 6291624 3900213229 3 freebsd-zfs (1.8T) also make sure you use gpt boot blocks that can cope with 4k aligned drives. Use the ones from current or these binary ones http://people.freebsd.org/~pjd/zfsboot/ From owner-freebsd-fs@FreeBSD.ORG Tue Jul 26 20:18:56 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 025B1106566B for ; Tue, 26 Jul 2011 20:18:56 +0000 (UTC) (envelope-from mike@sentex.net) Received: from smarthost1.sentex.ca (smarthost1-6.sentex.ca [IPv6:2607:f3e0:0:1::12]) by mx1.freebsd.org (Postfix) with ESMTP id BCD3B8FC0A for ; Tue, 26 Jul 2011 20:18:55 +0000 (UTC) Received: from [IPv6:2607:f3e0:0:4:f025:8813:7603:7e4a] (saphire3.sentex.ca [IPv6:2607:f3e0:0:4:f025:8813:7603:7e4a]) by smarthost1.sentex.ca (8.14.4/8.14.4) with ESMTP id p6QKIsCI081218 for ; Tue, 26 Jul 2011 16:18:54 -0400 (EDT) (envelope-from mike@sentex.net) Message-ID: <4E2F2122.6080204@sentex.net> Date: Tue, 26 Jul 2011 16:18:42 -0400 From: Mike Tancsa Organization: Sentex Communications User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101207 Thunderbird/3.1.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org X-Enigmail-Version: 1.1.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.71 on IPv6:2607:f3e0:0:1::12 Subject: zfs error - snapshot: Bad file descriptor X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2011 20:18:56 -0000 I googled around for an answer to this, but other than reboot, I never found any other strategies. On my backup server (RELENG_8 from Jun 20th, AMD64 8G of RAM), I have one big pool # zpool status -v pool: zbackup1 state: ONLINE scan: scrub repaired 0 in 11h11m with 0 errors on Mon Jul 25 19:51:11 2011 config: NAME STATE READ WRITE CKSUM zbackup1 ONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 ada5 ONLINE 0 0 0 ada7 ONLINE 0 0 0 ada4 ONLINE 0 0 0 ada6 ONLINE 0 0 0 raidz1-1 ONLINE 0 0 0 ada0 ONLINE 0 0 0 ada1 ONLINE 0 0 0 ada2 ONLINE 0 0 0 ada3 ONLINE 0 0 0 errors: No known data errors and a number of file systems zbackup1 5241690240 2248788502 2992901738 43% /zbackup1 zbackup1/archive 2992901771 33 2992901738 0% /zbackup1/archive zbackup1/cust1 3254254853 261353115 2992901738 8% /zbackup1/cust1 When I would change to /zbackup1/cust1/.zfs and do a ls -l # ls -l ls: snapshot: Bad file descriptor total 4 dr-xr-xr-x 4 root wheel - 4 Mar 4 08:43 . drwxr-xr-x 21 root wheel - 21 Jun 29 11:46 .. 
dr-xr-xr-x 2 root wheel - 2 Mar 4 08:43 shares snapshot was set to visible zbackup1/cust1 snapdir visible inherited from zbackup1 And I could even list them in zfs get all zbackup1/cust1@20110715 type snapshot - zbackup1/cust1@20110715 creation Fri Jul 15 8:10 2011 - zbackup1/cust1@20110715 used 7.41G - But I could never change to the directory and do an ls -l, let along get files I ran a full scrub, but it did not help. I did a reboot and all worked after that. # ls -l total 4 dr-xr-xr-x 4 root wheel - 4 Mar 4 08:43 . drwxr-xr-x 21 root wheel - 21 Jun 29 11:46 .. dr-xr-xr-x 2 root wheel - 2 Mar 4 08:43 shares dr-xr-xr-x 5 root wheel - 5 Jul 26 12:11 snapshot # cd snapshot/ # ls -l total 6 dr-xr-xr-x 5 root wheel - 5 Jul 26 12:11 . dr-xr-xr-x 4 root wheel - 4 Mar 4 08:43 .. drwxr-xr-x 20 root wheel - 21 Jun 29 11:46 20110715 drwxr-xr-x 20 root wheel - 21 Jun 29 11:46 20110722 drwxr-xr-x 20 root wheel - 21 Jun 29 11:46 test In the future, are there any other things I can do to fix the issue short of rebooting ? ---Mike -- ------------------- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, mike@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/ From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 02:21:28 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0A22D106567A for ; Wed, 27 Jul 2011 02:21:28 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com [209.85.218.54]) by mx1.freebsd.org (Postfix) with ESMTP id BC9268FC17 for ; Wed, 27 Jul 2011 02:21:27 +0000 (UTC) Received: by yic13 with SMTP id 13so955952yic.13 for ; Tue, 26 Jul 2011 19:21:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=OWP5iWE+/Ai/6dO5Kbhdt1zcily/Cnl3UxokZIHFcnE=; b=Jlza4NTimdiKa3TWnKY3n5OfJAp8dZtsKFNNGVJFrDDkuTbW3E0oKO0cMyE8AGrMwi RCL0Qn8XWpeiZwxVs29BRWBTy6HJ2xtNxIGkl4awz6nhPWztoaJ/Ps1ZY/Eo4L6/DE7t 4PiQdRw5ctCHY3tQ6F+LyCDvl1vIz8Kz9kUn0= MIME-Version: 1.0 Received: by 10.236.136.226 with SMTP id w62mr8008259yhi.93.1311731797540; Tue, 26 Jul 2011 18:56:37 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.236.108.129 with HTTP; Tue, 26 Jul 2011 18:56:37 -0700 (PDT) In-Reply-To: <20110726142156.GJ17489@deviant.kiev.zoral.com.ua> References: <20110726090441.GD17489@deviant.kiev.zoral.com.ua> <429452924.1012322.1311689248782.JavaMail.root@erie.cs.uoguelph.ca> <20110726142156.GJ17489@deviant.kiev.zoral.com.ua> Date: Wed, 27 Jul 2011 03:56:37 +0200 X-Google-Sender-Auth: 8Uk8oDYksBsC2CPzvXTYBymqk-c Message-ID: From: Attilio Rao To: Kostik Belousov Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: FreeBSD FS Subject: Re: Does msodsfs_readdir() require a exclusively locked vnode X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 02:21:28 -0000 2011/7/26 Kostik Belousov : > On Tue, Jul 26, 2011 at 10:07:28AM -0400, Rick Macklem wrote: >> Kostik Belousov wrote: >> > On Mon, Jul 25, 2011 at 07:22:40PM -0400, Rick Macklem wrote: >> > > Hi, >> > > >> > > Currently both NFS servers set the vnode lock LK_SHARED >> 
> > and so do the local syscalls (at least that's how it looks >> > > by inspection?). >> > > >> > > Peter Holm just posted me this panic, where a test for an >> > > exclusive vnode lock fails in msdosfs_readdir(). >> > > KDB: stack backtrace: >> > > db_trace_self_wrapper(c0efa6f6,c71627f8,c79230b0,c0f2ef29,f19154b8,.= ..) >> > > at db_trace_self_wrapper+0x26 >> > > kdb_backtrace(c7f20b38,f19154fc,c0d586d5,f191550c,c7f20ae0,...) at >> > > kdb_backtrace+0x2a >> > > vfs_badlock(c101b180,f191550c,c1055580,c7f20ae0) at vfs_badlock+0x23 >> > > assert_vop_elocked(c7f20ae0,c0ee5f4f,c09f3213,8,0,...) at >> > > assert_vop_elocked+0x55 >> > > pcbmap(c7966e00,0,f191560c,f1915618,f191561c,...) at pcbmap+0x45 >> > > msdosfs_readdir(f1915960,c0f4b343,c7f20ae0,f1915940,0,...) at >> > > msdosfs_readdir+0x528 >> > > VOP_READDIR_APV(c101b180,f1915960,2,f1915a68,c7923000,...) at >> > > VOP_READDIR_APV+0xc5 >> > > nfsrvd_readdir(f1915b64,0,c7f20ae0,c7923000,f1915a68,...) at >> > > nfsrvd_readdir+0x38e >> > > nfsrvd_dorpc(f1915b64,0,c7923000,c842a200,4,...) at >> > > nfsrvd_dorpc+0x1f79 >> > > nfssvc_program(c7793800,c842a200,c0f24d67,492,0,...) at >> > > nfssvc_program+0x40f >> > > svc_run_internal(f1915d14,c09d9a98,c73dfa80,f1915d28,c0ef1130,...) >> > > at svc_run_internal+0x952 >> > > svc_thread_start(c73dfa80,f1915d28,c0ef1130,3a5,c7e4b2c0,...) at >> > > svc_thread_start+0x10 >> > > fork_exit(c0bed7d0,c73dfa80,f1915d28) at fork_exit+0xb8 >> > > fork_trampoline() at fork_trampoline+0x8 >> > > --- trap 0x804c12e, eip =3D 0xc, esp =3D 0x33, ebp =3D 0x1 --- >> > > pcbmap: 0xc7f20ae0 is not exclusive locked but should be >> > > KDB: enter: lock violation >> > > >> > > So, does anyone know if the msdosfs_readdir() really requires a >> > > LK_EXCLUSIVE >> > > locked vnode or is the ASSERT_VOP_ELOCKED() too strong in pcbmap()? >> > >> > Yes, msdosfs currently requires all vnode locks to be exclusive. One >> > of >> > the reasons is that each denode (the msdosfs-private vnode data) >> > carries >> > the fat entries cache, and this cache is updated even by the >> > operations >> > that do not modify vnode from the VFS POV. >> > >> > The locking regime is enforced by the getnewvnode() initializing the >> > vnode >> > lock with LK_NOSHARE flag, and msdosfs code not calling >> > VN_LOCK_ASHARE() >> > on the newly instantiated vnode. >> > >> > My question is, was the vnode in question locked at all ? >> I think the problem is that I do a LK_DOWNGRADE. From a quick >> look at __lockmgr_args(), it doesn't check LK_NOSHARE for a >> LK_DOWNGRADE. >> >> Maybe __lockmgr_args() should have something like: >> =C2=A0 =C2=A0if (op =3D=3D LK_DOWNGRADE && (lk->lock_object.lo_flags & L= K_NOSHARE)) >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 return (0); =C2=A0 /* noop */ >> after the >> =C2=A0 =C2=A0if (op =3D=3D LK_SHARED && (lk->lock_object.lo_flags & LK_N= OSHARE)) >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 op =3D LK_EXCLUSIVE; >> lines? > The RELENG_7 lockmgr does not check the NOSHARE flag on downgrade, > but I agree with the essence of your proposal. As long as the difference in semantic with the old lockmgr is correctly stressed out in the doc (and eventually comments) I'm fine with this change. Attilio --=20 Peace can only be achieved by understanding - A. 
Einstein From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 03:10:51 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5FB86106566C; Wed, 27 Jul 2011 03:10:51 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 36BAB8FC12; Wed, 27 Jul 2011 03:10:51 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p6R3Apxc002723; Wed, 27 Jul 2011 03:10:51 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p6R3AoV1002664; Wed, 27 Jul 2011 03:10:50 GMT (envelope-from linimon) Date: Wed, 27 Jul 2011 03:10:50 GMT Message-Id: <201107270310.p6R3AoV1002664@freefall.freebsd.org> To: universite@ukr.net, linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/159210: [zfs] [hang] ZFS (scrub???) freezes system X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 03:10:51 -0000 Old Synopsis: [ZFS] ZFS (scrub???) freezes system New Synopsis: [zfs] [hang] ZFS (scrub???) freezes system State-Changed-From-To: open->closed State-Changed-By: linimon State-Changed-When: Wed Jul 27 03:09:35 UTC 2011 State-Changed-Why: Duplicate of kern/159045. Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 27 03:09:35 UTC 2011 Responsible-Changed-Why: http://www.freebsd.org/cgi/query-pr.cgi?pr=159210 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 07:52:14 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1BC03106564A for ; Wed, 27 Jul 2011 07:52:14 +0000 (UTC) (envelope-from gerrit@pmp.uni-hannover.de) Received: from mrelay1.uni-hannover.de (mrelay1.uni-hannover.de [130.75.2.106]) by mx1.freebsd.org (Postfix) with ESMTP id A3E0C8FC08 for ; Wed, 27 Jul 2011 07:52:13 +0000 (UTC) Received: from www.pmp.uni-hannover.de (www.pmp.uni-hannover.de [130.75.117.2]) by mrelay1.uni-hannover.de (8.14.4/8.14.4) with ESMTP id p6R7q3qR026990; Wed, 27 Jul 2011 09:52:07 +0200 Received: from pmp.uni-hannover.de (unknown [130.75.117.3]) by www.pmp.uni-hannover.de (Postfix) with SMTP id 9682E72; Wed, 27 Jul 2011 09:52:03 +0200 (CEST) Date: Wed, 27 Jul 2011 09:52:03 +0200 From: Gerrit =?ISO-8859-1?Q?K=FChn?= To: Mike Tancsa Message-Id: <20110727095203.50f3c0d6.gerrit@pmp.uni-hannover.de> In-Reply-To: <4E2F2122.6080204@sentex.net> References: <4E2F2122.6080204@sentex.net> Organization: Albert-Einstein-Institut (MPI =?ISO-8859-1?Q?f=FCr?= Gravitationsphysik & IGP =?ISO-8859-1?Q?Universit=E4t?= Hannover) X-Mailer: Sylpheed 3.0.3 (GTK+ 2.22.1; amd64-portbld-freebsd8.1) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-PMX-Version: 5.5.9.395186, Antispam-Engine: 2.7.2.376379, Antispam-Data: 2011.7.27.73314 Cc: freebsd-fs@freebsd.org Subject: Re: zfs error - snapshot: Bad file descriptor X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: gerrit.kuehn@aei.mpg.de List-Id: Filesystems 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 07:52:14 -0000 On Tue, 26 Jul 2011 16:18:42 -0400 Mike Tancsa wrote about zfs error - snapshot: Bad file descriptor: MT> I googled around for an answer to this, but other than reboot, I never MT> found any other strategies. MT> When I would change to /zbackup1/cust1/.zfs MT> MT> and do a MT> ls -l MT> MT> # ls -l MT> ls: snapshot: Bad file descriptor Just for the record: I have exactly the same issue here with zfsv28 and 8-stable from 14th of July. cu Gerrit From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 08:06:31 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0E95A1065673; Wed, 27 Jul 2011 08:06:31 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id DB9808FC0C; Wed, 27 Jul 2011 08:06:30 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p6R86Ukh008412; Wed, 27 Jul 2011 08:06:30 GMT (envelope-from mm@freefall.freebsd.org) Received: (from mm@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p6R86UiA008408; Wed, 27 Jul 2011 08:06:30 GMT (envelope-from mm) Date: Wed, 27 Jul 2011 08:06:30 GMT Message-Id: <201107270806.p6R86UiA008408@freefall.freebsd.org> To: miks.mikelsons@gmail.com, mm@FreeBSD.org, freebsd-fs@FreeBSD.org From: mm@FreeBSD.org Cc: Subject: Re: kern/142914: [zfs] ZFS performance degradation over time X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 08:06:31 -0000 Synopsis: [zfs] ZFS performance degradation over time State-Changed-From-To: open->closed State-Changed-By: mm State-Changed-When: Wed Jul 27 08:06:30 UTC 2011 State-Changed-Why: Closed on submitter request. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=142914 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 11:27:11 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C4CC71065672 for ; Wed, 27 Jul 2011 11:27:11 +0000 (UTC) (envelope-from prvs=11896da94d=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id C99718FC0C for ; Wed, 27 Jul 2011 11:27:10 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 12:16:25 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 12:16:25 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014339848.msg for ; Wed, 27 Jul 2011 12:16:24 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=11896da94d=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.ORG Message-ID: From: "Steven Hartland" To: Date: Wed, 27 Jul 2011 12:16:50 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: Subject: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 11:27:11 -0000 Got a machine which is hung accessing a specific zfs pool, other volumes seem unaffected, there are no errors showing on zpool status and no errors in /var/log/messages Processes seem to be hung in a variety of states including:- STOP, zfs, zio->i, db->db, tx->tx zfs list also hangs.
Here's some procstat -k -k from some hung processes and the output from zfs-stats -a The machine is running 8.2-RELEASE procstat -k -k 94003 PID TID COMM TDNAME KSTACK 94003 100341 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 100417 java initial thread mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_wait+0x72a __umtx_op_wait+0x5e syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 100459 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 100536 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 100544 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _cv_timedwait_sig+0x134 seltdwait+0x98 poll+0x2f8 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 100751 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 100925 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101268 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101319 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101417 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101486 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101498 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101555 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_wait+0x72a __umtx_op_wait_uint_private+0x64 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101563 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101565 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101566 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101804 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101897 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c 
syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101971 java - mi_switch+0x176 sleepq_wait+0x42 _sx_xlock_hard+0x305 _sx_xlock+0x4e txg_hold_open+0x4f dmu_tx_assign+0x189 zfs_freebsd_create+0x2f2 VOP_CREATE_APV+0x31 vn_open_cred+0x4ab kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101984 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 102164 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _cv_wait_sig+0x128 seltdwait+0x110 poll+0x2f8 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 102627 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 103546 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 103636 java - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x39a dbuf_findbp+0xf7 dbuf_hold_impl+0xc2 dbuf_hold_level+0x1a dmu_tx_check_ioerr+0x52 dmu_tx_count_write+0x297 dmu_tx_hold_write+0x4a zfs_freebsd_write+0x397 VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x8b kern_writev+0x60 write+0x55 syscallenter+0x1e5 94003 103676 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 103728 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 103748 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 103749 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 103750 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 103751 java - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x39a dbuf_findbp+0xf7 dbuf_hold_impl+0xc2 dbuf_hold_level+0x1a dmu_tx_check_ioerr+0x52 dmu_tx_count_write+0x297 dmu_tx_hold_write+0x4a zfs_freebsd_write+0x397 VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x8b kern_writev+0x60 write+0x55 syscallenter+0x1e5 procstat -k -k 39568 PID TID COMM TDNAME KSTACK 39568 100303 find - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x39a dmu_buf_hold+0xcc zap_lockdir+0x55 zap_cursor_retrieve+0x194 zfs_freebsd_readdir+0x2b6 kern_getdirentries+0x217 getdirentries+0x23 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 zfs-stats -a ------------------------------------------------------------------------ ZFS Subsystem Report Wed Jul 27 10:57:12 2011 ------------------------------------------------------------------------ System Information: Kernel Version: 802000 (osreldate) Hardware Platform: amd64 Processor Architecture: amd64 FreeBSD 8.2-RELEASE #0: Fri Mar 18 10:58:44 UTC 2011 root 10:57AM up 112 days, 21 mins, 1 user, load averages: 0.56, 0.39, 0.34 
------------------------------------------------------------------------ System Memory Statistics: Physical Memory: 24555.62M Kernel Memory: 2076.86M DATA: 99.50% 2066.54M TEXT: 0.50% 10.32M ------------------------------------------------------------------------ ZFS pool information: Storage pool Version (spa): 15 Filesystem Version (zpl): 4 ------------------------------------------------------------------------ ARC Misc: Deleted: 122887476 Recycle Misses: 2985802 Mutex Misses: 51968 Evict Skips: 51968 ARC Size: Current Size (arcsize): 12.50% 2847.37M Target Size (Adaptive, c): 12.50% 2847.48M Min Size (Hard Limit, c_min): 12.50% 2847.48M Max Size (High Water, c_max): ~8:1 22779.81M ARC Size Breakdown: Recently Used Cache Size (p): 29.47% 839.09M Freq. Used Cache Size (c-p): 70.53% 2008.38M ARC Hash Breakdown: Elements Max: 1093246 Elements Current: 34.92% 381795 Collisions: 496215463 Chain Max: 11 Chains: 86474 ARC Eviction Statistics: Evicts Total: 2581704887296 Evicts Eligible for L2: 93.73% 2419921712128 Evicts Ineligible for L2: 6.27% 161783175168 Evicts Cached to L2: 0 ARC Efficiency: Cache Access Total: 17042848480 Cache Hit Ratio: 99.85% 17017691729 Cache Miss Ratio: 0.15% 25156751 Actual Hit Ratio: 88.30% 15049590290 Data Demand Efficiency: 99.90% Data Prefetch Efficiency: 86.74% CACHE HITS BY CACHE LIST: Anonymously Used: 11.48% 1952878601 Most Recently Used (mru): 5.30% 901233481 Most Frequently Used (mfu): 83.14% 14148356809 MRU Ghost (mru_ghost): 0.03% 5560514 MFU Ghost (mfu_ghost): 0.06% 9662324 CACHE HITS BY DATA TYPE: Demand Data: 65.68% 11176417108 Prefetch Data: 0.26% 44828929 Demand Metadata: 6.89% 1173279354 Prefetch Metadata: 27.17% 4623166338 CACHE MISSES BY DATA TYPE: Demand Data: 44.65% 11232287 Prefetch Data: 27.24% 6852876 Demand Metadata: 23.17% 5829877 Prefetch Metadata: 4.94% 1241711 ------------------------------------------------------------------------ VDEV Cache Summary: Access Total: 6646743 Hits Ratio: 67.29% 4472518 Miss Ratio: 32.71% 2174225 Delegations: 350939 ------------------------------------------------------------------------ File-Level Prefetch Stats (DMU): DMU Efficiency: Access Total: 119890457768 Hit Ratio: 91.27% 109423001264 Miss Ratio: 8.73% 10467456504 Colinear Access Total: 10467456504 Colinear Hit Ratio: 0.01% 632312 Colinear Miss Ratio: 99.99% 10466824192 Stride Access Total: 107333142359 Stride Hit Ratio: 99.99% 107326197075 Stride Miss Ratio: 0.01% 6945284 DMU misc: Reclaim successes: 2967491512 Reclaim failures: 7499332680 Stream resets: 58396 Stream noresets: 1273657946 Bogus streams: 0 ------------------------------------------------------------------------ ZFS Tunable (sysctl): kern.maxusers=384 vfs.zfs.l2c_only_size=0 vfs.zfs.mfu_ghost_data_lsize=495015424 vfs.zfs.mfu_ghost_metadata_lsize=96888320 vfs.zfs.mfu_ghost_size=591903744 vfs.zfs.mfu_data_lsize=17817088 vfs.zfs.mfu_metadata_lsize=286173184 vfs.zfs.mfu_size=563621376 vfs.zfs.mru_ghost_data_lsize=1540046336 vfs.zfs.mru_ghost_metadata_lsize=849328128 vfs.zfs.mru_ghost_size=2389374464 vfs.zfs.mru_data_lsize=232126464 vfs.zfs.mru_metadata_lsize=118643712 vfs.zfs.mru_size=531284992 vfs.zfs.anon_data_lsize=0 vfs.zfs.anon_metadata_lsize=0 vfs.zfs.anon_size=68533248 vfs.zfs.l2arc_norw=1 vfs.zfs.l2arc_feed_again=1 vfs.zfs.l2arc_noprefetch=0 vfs.zfs.l2arc_feed_min_ms=200 vfs.zfs.l2arc_feed_secs=1 vfs.zfs.l2arc_headroom=2 vfs.zfs.l2arc_write_boost=8388608 vfs.zfs.l2arc_write_max=8388608 vfs.zfs.arc_meta_limit=5971591168 vfs.zfs.arc_meta_used=2665786824 vfs.zfs.mdcomp_disable=0 
vfs.zfs.arc_min=2985795584 vfs.zfs.arc_max=23886364672 vfs.zfs.zfetch.array_rd_sz=1048576 vfs.zfs.zfetch.block_cap=256 vfs.zfs.zfetch.min_sec_reap=2 vfs.zfs.zfetch.max_streams=8 vfs.zfs.prefetch_disable=0 vfs.zfs.check_hostid=1 vfs.zfs.recover=0 vfs.zfs.txg.write_limit_override=0 vfs.zfs.txg.synctime=5 vfs.zfs.txg.timeout=30 vfs.zfs.scrub_limit=10 vfs.zfs.vdev.cache.bshift=16 vfs.zfs.vdev.cache.size=10485760 vfs.zfs.vdev.cache.max=16384 vfs.zfs.vdev.aggregation_limit=131072 vfs.zfs.vdev.ramp_rate=2 vfs.zfs.vdev.time_shift=6 vfs.zfs.vdev.min_pending=4 vfs.zfs.vdev.max_pending=10 vfs.zfs.cache_flush_disable=0 vfs.zfs.zil_disable=0 vfs.zfs.zio.use_uma=0 vfs.zfs.version.zpl=4 vfs.zfs.version.spa=15 vfs.zfs.version.dmu_backup_stream=1 vfs.zfs.version.dmu_backup_header=2 vfs.zfs.version.acl=1 vfs.zfs.debug=0 vfs.zfs.super_owner=0 vm.kmem_size=24960106496 vm.kmem_size_scale=1 vm.kmem_size_min=0 vm.kmem_size_max=329853485875 ------------------------------------------------------------------------ procstat -k -k 36541 PID TID COMM TDNAME KSTACK 36541 100971 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 101251 java initial thread mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_wait+0x72a __umtx_op_wait+0x5e syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 101591 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 101648 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 101781 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 101798 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 101993 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102037 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102059 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102191 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_wait+0x72a __umtx_op_wait_uint_private+0x64 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102208 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102210 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102221 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102253 java - 
mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102279 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102298 java - mi_switch+0x176 sleepq_wait+0x42 _sx_xlock_hard+0x305 _sx_xlock+0x4e txg_hold_open+0x4f dmu_tx_assign+0x189 zfs_freebsd_create+0x2f2 VOP_CREATE_APV+0x31 vn_open_cred+0x4ab kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102426 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102594 java - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dmu_buf_hold_array_by_dnode+0x217 dmu_buf_hold_array+0x6a dmu_read_uio+0x3f zfs_freebsd_read+0x5d3 vn_read+0x2cc dofileread+0xa1 kern_readv+0x60 read+0x55 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102775 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 103293 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 103790 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 103791 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 103797 java - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x39a dmu_buf_hold+0xcc zap_lockdir+0x55 zap_lookup_norm+0x45 zap_lookup+0x2e zfs_dirent_lock+0x534 zfs_dirlook+0x69 zfs_lookup+0x26b zfs_freebsd_lookup+0x81 vfs_cache_lookup+0xf0 VOP_LOOKUP_APV+0x40 lookup+0x452 namei+0x53a kern_statat_vnhook+0x8f 36541 103809 java - mi_switch+0x176 sleepq_wait+0x42 __lockmgr_args+0x75a vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup+0x50f vfs_cache_lookup+0xc0 VOP_LOOKUP_APV+0x40 lookup+0x452 namei+0x53a vn_open_cred+0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 procstat -k -k 36732 PID TID COMM TDNAME KSTACK 36732 100369 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 100790 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 100794 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 100830 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 100853 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 100873 java - mi_switch+0x176 
sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 100957 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 100977 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 101124 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_wait+0x72a __umtx_op_wait_uint_private+0x64 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 101130 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 101190 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 101326 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 102146 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 102257 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 102316 java - mi_switch+0x176 sleepq_wait+0x42 _sx_xlock_hard+0x305 _sx_xlock+0x4e txg_hold_open+0x4f dmu_tx_assign+0x189 zfs_freebsd_create+0x2f2 VOP_CREATE_APV+0x31 vn_open_cred+0x4ab kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 102484 java - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 dmu_buf_hold_array_by_dnode+0x28f dmu_buf_hold_array+0x6a dmu_read_uio+0x3f zfs_freebsd_read+0x5d3 vn_read+0x2cc dofileread+0xa1 kern_readv+0x60 read+0x55 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 102943 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _cv_wait_sig+0x128 seltdwait+0x110 poll+0x2f8 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 103102 java initial thread mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_wait+0x72a __umtx_op_wait+0x5e syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 103255 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 103292 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _cv_timedwait_sig+0x134 seltdwait+0x98 poll+0x2f8 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 103481 java - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 dmu_buf_hold_array_by_dnode+0x28f dmu_buf_hold_array+0x6a dmu_read_uio+0x3f zfs_freebsd_read+0x5d3 vn_read+0x2cc dofileread+0xa1 kern_readv+0x60 read+0x55 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. 
In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 12:06:46 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4188E106567C for ; Wed, 27 Jul 2011 12:06:46 +0000 (UTC) (envelope-from prvs=11896da94d=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id C037D8FC19 for ; Wed, 27 Jul 2011 12:06:45 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 13:06:14 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 13:06:14 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014340588.msg for ; Wed, 27 Jul 2011 13:06:13 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=11896da94d=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.ORG Message-ID: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> From: "Steven Hartland" To: References: Date: Wed, 27 Jul 2011 13:06:52 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 12:06:46 -0000 I've checked the raw disk and all seems fine there, so does look like its some sort of zfs livelock. I'm trying to keep the machine available in case someone needs more information, but its a production machine so I'm going to have to reboot it in the next few hours. Disk tests:- dd if=/dev/da1 of=/dev/null bs=10m 5724+1 records in 5724+1 records out 60022480896 bytes transferred in 430.479894 secs (139431555 bytes/sec) smartctl -a /dev/da1 smartctl 5.40 2010-10-16 r3189 [FreeBSD 8.2-RELEASE amd64] (local build) Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net === START OF INFORMATION SECTION === Model Family: SandForce Driven SSDs Device Model: Corsair CSSD-F60GB2 Serial Number: 10446509320009990024 Firmware Version: 1.1 User Capacity: 60,022,480,896 bytes Device is: In smartctl database [for details use: -P show] ATA Version is: 8 ATA Standard is: ATA-8-ACS revision 6 Local Time is: Wed Jul 27 11:27:30 2011 UTC SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled. 
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 0) seconds. Offline data collection capabilities: (0x7f) SMART execute Offline immediate. Auto Offline data collection on/off support. Abort Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 48) minutes. Conveyance self-test routine recommended polling time: ( 2) minutes. SCT capabilities: (0x003d) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 119 100 050 Pre-fail Always - 0/238293224 5 Retired_Block_Count 0x0033 097 097 003 Pre-fail Always - 256 9 Power_On_Hours_and_Msec 0x0032 100 100 000 Old_age Always - 5513h+00m+39.450s 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 2 171 Program_Fail_Count 0x0000 000 000 000 Old_age Offline - 0 172 Erase_Fail_Count 0x0000 000 000 000 Old_age Offline - 0 174 Unexpect_Power_Loss_Ct 0x0030 000 000 000 Old_age Offline - 0 177 Wear_Range_Delta 0x0000 000 000 --- Old_age Offline - 1 181 Program_Fail_Count 0x0000 000 000 000 Old_age Offline - 0 182 Erase_Fail_Count 0x0000 000 000 000 Old_age Offline - 0 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0 194 Temperature_Celsius 0x0022 022 026 000 Old_age Always - 22 (Min/Max 0/26) 195 ECC_Uncorr_Error_Count 0x001c 119 100 000 Old_age Offline - 0/238293224 196 Reallocated_Event_Count 0x0033 100 100 003 Pre-fail Always - 0 231 SSD_Life_Left 0x0013 057 057 010 Pre-fail Always - 0 233 SandForce_Internal 0x0000 000 000 000 Old_age Offline - 152704 234 SandForce_Internal 0x0000 000 000 000 Old_age Offline - 90688 241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always - 90688 242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always - 3584 Error SMART Error Log Read failed: Input/output error Smartctl: SMART Error Log Read Failed Error SMART Error Self-Test Log Read failed: Input/output error Smartctl: SMART Self Test Log Read Failed SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 12:50:34 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3214A106564A; Wed, 27 Jul 2011 12:50:34 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id E748C8FC16; Wed, 27 Jul 2011 12:50:33 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 6BED346B5B; Wed, 27 Jul 2011 08:50:33 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id AC4CC8A02F; Wed, 27 Jul 2011 08:50:32 -0400 (EDT) From: John Baldwin To: freebsd-fs@freebsd.org Date: Wed, 27 Jul 2011 08:40:56 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; ) References: <20110726090441.GD17489@deviant.kiev.zoral.com.ua> <20110726142156.GJ17489@deviant.kiev.zoral.com.ua> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <201107270840.57104.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Wed, 27 Jul 2011 08:50:32 -0400 (EDT) Cc: Attilio Rao Subject: Re: Does msodsfs_readdir() require a exclusively locked vnode X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 12:50:34 -0000 On Tuesday, July 26, 2011 9:56:37 pm Attilio Rao wrote: > 2011/7/26 Kostik Belousov : > > On Tue, Jul 26, 2011 at 10:07:28AM -0400, Rick Macklem wrote: > >> Kostik Belousov wrote: > >> > On Mon, Jul 25, 2011 at 07:22:40PM -0400, Rick Macklem wrote: > >> > > Hi, > >> > > > >> > > Currently both NFS servers set the vnode lock LK_SHARED > >> > > and so do the local syscalls (at least that's how it looks > >> > > by inspection?). > >> > > > >> > > Peter Holm just posted me this panic, where a test for an > >> > > exclusive vnode lock fails in msdosfs_readdir(). > >> > > KDB: stack backtrace: > >> > > db_trace_self_wrapper(c0efa6f6,c71627f8,c79230b0,c0f2ef29,f19154b8,...) > >> > > at db_trace_self_wrapper+0x26 > >> > > kdb_backtrace(c7f20b38,f19154fc,c0d586d5,f191550c,c7f20ae0,...) at > >> > > kdb_backtrace+0x2a > >> > > vfs_badlock(c101b180,f191550c,c1055580,c7f20ae0) at vfs_badlock+0x23 > >> > > assert_vop_elocked(c7f20ae0,c0ee5f4f,c09f3213,8,0,...) at > >> > > assert_vop_elocked+0x55 > >> > > pcbmap(c7966e00,0,f191560c,f1915618,f191561c,...) at pcbmap+0x45 > >> > > msdosfs_readdir(f1915960,c0f4b343,c7f20ae0,f1915940,0,...) at > >> > > msdosfs_readdir+0x528 > >> > > VOP_READDIR_APV(c101b180,f1915960,2,f1915a68,c7923000,...) at > >> > > VOP_READDIR_APV+0xc5 > >> > > nfsrvd_readdir(f1915b64,0,c7f20ae0,c7923000,f1915a68,...) at > >> > > nfsrvd_readdir+0x38e > >> > > nfsrvd_dorpc(f1915b64,0,c7923000,c842a200,4,...) at > >> > > nfsrvd_dorpc+0x1f79 > >> > > nfssvc_program(c7793800,c842a200,c0f24d67,492,0,...) at > >> > > nfssvc_program+0x40f > >> > > svc_run_internal(f1915d14,c09d9a98,c73dfa80,f1915d28,c0ef1130,...) > >> > > at svc_run_internal+0x952 > >> > > svc_thread_start(c73dfa80,f1915d28,c0ef1130,3a5,c7e4b2c0,...) 
at > >> > > svc_thread_start+0x10 > >> > > fork_exit(c0bed7d0,c73dfa80,f1915d28) at fork_exit+0xb8 > >> > > fork_trampoline() at fork_trampoline+0x8 > >> > > --- trap 0x804c12e, eip = 0xc, esp = 0x33, ebp = 0x1 --- > >> > > pcbmap: 0xc7f20ae0 is not exclusive locked but should be > >> > > KDB: enter: lock violation > >> > > > >> > > So, does anyone know if the msdosfs_readdir() really requires a > >> > > LK_EXCLUSIVE > >> > > locked vnode or is the ASSERT_VOP_ELOCKED() too strong in pcbmap()? > >> > > >> > Yes, msdosfs currently requires all vnode locks to be exclusive. One > >> > of > >> > the reasons is that each denode (the msdosfs-private vnode data) > >> > carries > >> > the fat entries cache, and this cache is updated even by the > >> > operations > >> > that do not modify vnode from the VFS POV. > >> > > >> > The locking regime is enforced by the getnewvnode() initializing the > >> > vnode > >> > lock with LK_NOSHARE flag, and msdosfs code not calling > >> > VN_LOCK_ASHARE() > >> > on the newly instantiated vnode. > >> > > >> > My question is, was the vnode in question locked at all ? > >> I think the problem is that I do a LK_DOWNGRADE. From a quick > >> look at __lockmgr_args(), it doesn't check LK_NOSHARE for a > >> LK_DOWNGRADE. > >> > >> Maybe __lockmgr_args() should have something like: > >> if (op == LK_DOWNGRADE && (lk->lock_object.lo_flags & LK_NOSHARE)) > >> return (0); /* noop */ > >> after the > >> if (op == LK_SHARED && (lk->lock_object.lo_flags & LK_NOSHARE)) > >> op = LK_EXCLUSIVE; > >> lines? > > The RELENG_7 lockmgr does not check the NOSHARE flag on downgrade, > > but I agree with the essence of your proposal. > > As long as the difference in semantic with the old lockmgr is > correctly stressed out in the doc (and eventually comments) I'm fine > with this change. I think it is a bug in the LK_NOSHARE implementation if the old lockmgr() didn't silently nop downgrade requests when LK_NOSHARE was set. :) We should definitely fix it to ignore downgrades for LK_NOSHARE. 
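A minimal sketch of the check being discussed, patterned on the lines Rick quoted above; the exact form
and placement inside __lockmgr_args() in sys/kern/kern_lock.c are assumptions here, not the committed change:

	/*
	 * Sketch only, not the actual commit.  __lockmgr_args() already
	 * turns shared requests into exclusive ones for LK_NOSHARE locks;
	 * the proposal is to likewise treat a downgrade of an LK_NOSHARE
	 * lock as a no-op, since such a lock is never really held shared.
	 */
	if (op == LK_SHARED && (lk->lock_object.lo_flags & LK_NOSHARE))
		op = LK_EXCLUSIVE;	/* existing behaviour */
	if (op == LK_DOWNGRADE && (lk->lock_object.lo_flags & LK_NOSHARE))
		return (0);		/* proposed: silently ignore the downgrade */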
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 13:45:42 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 52B11106566C for ; Wed, 27 Jul 2011 13:45:42 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id AEFEF8FC1B for ; Wed, 27 Jul 2011 13:45:41 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA04058; Wed, 27 Jul 2011 16:34:24 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E3013DF.10803@FreeBSD.org> Date: Wed, 27 Jul 2011 16:34:23 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> In-Reply-To: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 13:45:42 -0000 on 27/07/2011 15:06 Steven Hartland said the following: > I've checked the raw disk and all seems fine there, so does look like its > some sort of zfs livelock. > > I'm trying to keep the machine available in case someone needs more information, > but its a production machine so I'm going to have to reboot it in the next > few hours. > > Disk tests:- > > dd if=/dev/da1 of=/dev/null bs=10m 5724+1 records in > 5724+1 records out > 60022480896 bytes transferred in 430.479894 secs (139431555 bytes/sec) > > > smartctl -a /dev/da1 Is this the only disk associated with the troubled pool? > smartctl 5.40 2010-10-16 r3189 [FreeBSD 8.2-RELEASE amd64] (local build) > Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net > > === START OF INFORMATION SECTION === > Model Family: SandForce Driven SSDs > Device Model: Corsair CSSD-F60GB2 > Serial Number: 10446509320009990024 > Firmware Version: 1.1 > User Capacity: 60,022,480,896 bytes > Device is: In smartctl database [for details use: -P show] > ATA Version is: 8 > ATA Standard is: ATA-8-ACS revision 6 > Local Time is: Wed Jul 27 11:27:30 2011 UTC > SMART support is: Available - device has SMART capability. > SMART support is: Enabled > > === START OF READ SMART DATA SECTION === > SMART overall-health self-assessment test result: PASSED > > General SMART Values: > Offline data collection status: (0x00) Offline data collection activity > was never started. > Auto Offline Data Collection: Disabled. > Self-test execution status: ( 0) The previous self-test routine completed > without error or no self-test has ever > been run. > Total time to complete Offline data collection: ( 0) seconds. > Offline data collection > capabilities: (0x7f) SMART execute Offline immediate. > Auto Offline data collection on/off support. > Abort Offline collection upon new > command. > Offline surface scan supported. > Self-test supported. > Conveyance Self-test supported. > Selective Self-test supported. > SMART capabilities: (0x0003) Saves SMART data before entering > power-saving mode. 
> Supports SMART auto save timer. > Error logging capability: (0x01) Error logging supported. > General Purpose Logging supported. > Short self-test routine recommended polling time: ( 1) minutes. > Extended self-test routine > recommended polling time: ( 48) minutes. > Conveyance self-test routine > recommended polling time: ( 2) minutes. > SCT capabilities: (0x003d) SCT Status supported. > SCT Error Recovery Control supported. > SCT Feature Control supported. > SCT Data Table supported. > > SMART Attributes Data Structure revision number: 10 > Vendor Specific SMART Attributes with Thresholds: > ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED > WHEN_FAILED RAW_VALUE > 1 Raw_Read_Error_Rate 0x000f 119 100 050 Pre-fail Always > - 0/238293224 > 5 Retired_Block_Count 0x0033 097 097 003 Pre-fail Always > - 256 > 9 Power_On_Hours_and_Msec 0x0032 100 100 000 Old_age Always > - 5513h+00m+39.450s > 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always > - 2 > 171 Program_Fail_Count 0x0000 000 000 000 Old_age Offline > - 0 > 172 Erase_Fail_Count 0x0000 000 000 000 Old_age Offline > - 0 > 174 Unexpect_Power_Loss_Ct 0x0030 000 000 000 Old_age Offline > - 0 > 177 Wear_Range_Delta 0x0000 000 000 --- Old_age Offline > - 1 > 181 Program_Fail_Count 0x0000 000 000 000 Old_age Offline > - 0 > 182 Erase_Fail_Count 0x0000 000 000 000 Old_age Offline > - 0 > 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always > - 0 > 194 Temperature_Celsius 0x0022 022 026 000 Old_age Always > - 22 (Min/Max 0/26) > 195 ECC_Uncorr_Error_Count 0x001c 119 100 000 Old_age Offline > - 0/238293224 > 196 Reallocated_Event_Count 0x0033 100 100 003 Pre-fail Always > - 0 > 231 SSD_Life_Left 0x0013 057 057 010 Pre-fail Always > - 0 > 233 SandForce_Internal 0x0000 000 000 000 Old_age Offline > - 152704 > 234 SandForce_Internal 0x0000 000 000 000 Old_age Offline > - 90688 > 241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always > - 90688 > 242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always > - 3584 > > Error SMART Error Log Read failed: Input/output error > Smartctl: SMART Error Log Read Failed > Error SMART Error Self-Test Log Read failed: Input/output error > Smartctl: SMART Self Test Log Read Failed > SMART Selective self-test log data structure revision number 1 > SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS > 1 0 0 Not_testing > 2 0 0 Not_testing > 3 0 0 Not_testing > 4 0 0 Not_testing > 5 0 0 Not_testing > Selective self-test flags (0x0): > After scanning selected spans, do NOT read-scan remainder of disk. > If Selective self-test is pending on power-up, resume after 0 minute delay. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 13:55:30 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D2296106566B for ; Wed, 27 Jul 2011 13:55:30 +0000 (UTC) (envelope-from prvs=11896da94d=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 5FACC8FC12 for ; Wed, 27 Jul 2011 13:55:29 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 14:54:58 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 14:54:57 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014341803.msg; Wed, 27 Jul 2011 14:54:56 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=11896da94d=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> Date: Wed, 27 Jul 2011 14:55:36 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-fs@FreeBSD.org Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 13:55:30 -0000 ----- Original Message ----- From: "Andriy Gapon" >> smartctl -a /dev/da1 > > Is this the only disk associated with the troubled pool? Yes, there's two disks in the machine 1 x 500GB HD (root etc) and 1 x 60GB SSD which is the the pool we're having issues with. As you can see a full raw device dd completed fine. My admins tell me they have had a number of cases like this requiring a power cycle after which all is fine. Apparently it seems to affect machines with high uptimes, if thats of help. This machine shows:- uptime 1:55PM up 112 days, 3:19, 1 user, load averages: 0.00, 0.00, 0.00 Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 14:10:33 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EBD921065670 for ; Wed, 27 Jul 2011 14:10:33 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 2F01C8FC13 for ; Wed, 27 Jul 2011 14:10:32 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA04530; Wed, 27 Jul 2011 17:10:30 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E301C55.7090105@FreeBSD.org> Date: Wed, 27 Jul 2011 17:10:29 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> In-Reply-To: <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 14:10:34 -0000 on 27/07/2011 16:55 Steven Hartland said the following: > Apparently it seems to affect machines > with high uptimes, if thats of help. This machine shows:- > > uptime > 1:55PM up 112 days, 3:19, 1 user, load averages: 0.00, 0.00, 0.00 Just a guess, perhaps it's another manifestation of this issue: http://lists.freebsd.org/pipermail/freebsd-fs/2011-May/011584.html -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 14:18:32 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 111261065670; Wed, 27 Jul 2011 14:18:32 +0000 (UTC) (envelope-from prvs=11896da94d=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 5FE5D8FC13; Wed, 27 Jul 2011 14:18:31 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 15:17:59 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 15:17:59 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014342069.msg; Wed, 27 Jul 2011 15:17:59 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=11896da94d=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> <4E301C55.7090105@FreeBSD.org> Date: Wed, 27 Jul 2011 15:18:38 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit 
X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-fs@FreeBSD.org Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 14:18:32 -0000 ----- Original Message ----- From: "Andriy Gapon" To: "Steven Hartland" Cc: Sent: Wednesday, July 27, 2011 3:10 PM Subject: Re: zfs process hang on pool access > on 27/07/2011 16:55 Steven Hartland said the following: >> Apparently it seems to affect machines >> with high uptimes, if thats of help. This machine shows:- >> >> uptime >> 1:55PM up 112 days, 3:19, 1 user, load averages: 0.00, 0.00, 0.00 > > Just a guess, perhaps it's another manifestation of this issue: > http://lists.freebsd.org/pipermail/freebsd-fs/2011-May/011584.html Not sure looking at that thread and comparing to the machine:- top -SHb 500 | grep arc 7 root -8 - 0K 88K arc_re 0 11:07 0.00% {arc_reclaim_thre} 7 root -8 - 0K 88K l2arc_ 2 0:52 0.00% {l2arc_feed_threa} So no excessive cpu for reclaim is present and evict_skip is not incrementing: sysctl kstat.zfs.misc.arcstats.evict_skip kstat.zfs.misc.arcstats.evict_skip: 235572240 sleep 60 sysctl kstat.zfs.misc.arcstats.evict_skip kstat.zfs.misc.arcstats.evict_skip: 235572240 Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
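For anyone wanting to watch that counter over a longer window, a tiny sketch that polls the same kstat
sysctl shown above from C instead of re-running sysctl(8) by hand; it assumes a FreeBSD system with the
ZFS module loaded and keeps error handling minimal:

	/*
	 * Poll kstat.zfs.misc.arcstats.evict_skip once a minute, mirroring
	 * the manual "sysctl ... ; sleep 60 ; sysctl ..." check above.
	 * Stop it with Ctrl-C.
	 */
	#include <sys/types.h>
	#include <sys/sysctl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <unistd.h>

	int
	main(void)
	{
		uint64_t val;
		size_t len;

		for (;;) {
			len = sizeof(val);
			if (sysctlbyname("kstat.zfs.misc.arcstats.evict_skip",
			    &val, &len, NULL, 0) == -1) {
				perror("sysctlbyname");
				return (1);
			}
			printf("evict_skip: %ju\n", (uintmax_t)val);
			sleep(60);
		}
	}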
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 14:22:13 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E1C82106564A for ; Wed, 27 Jul 2011 14:22:13 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 3E3298FC15 for ; Wed, 27 Jul 2011 14:22:12 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA04726; Wed, 27 Jul 2011 17:22:09 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E301F10.6060708@FreeBSD.org> Date: Wed, 27 Jul 2011 17:22:08 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> <4E301C55.7090105@FreeBSD.org> <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk> In-Reply-To: <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 14:22:14 -0000 on 27/07/2011 17:18 Steven Hartland said the following: > > ----- Original Message ----- From: "Andriy Gapon" > To: "Steven Hartland" > Cc: > Sent: Wednesday, July 27, 2011 3:10 PM > Subject: Re: zfs process hang on pool access > > >> on 27/07/2011 16:55 Steven Hartland said the following: >>> Apparently it seems to affect machines >>> with high uptimes, if thats of help. This machine shows:- >>> >>> uptime >>> 1:55PM up 112 days, 3:19, 1 user, load averages: 0.00, 0.00, 0.00 >> >> Just a guess, perhaps it's another manifestation of this issue: >> http://lists.freebsd.org/pipermail/freebsd-fs/2011-May/011584.html > > Not sure looking at that thread and comparing to the machine:- I meant the same root cause, not the same symptoms, of course. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 14:32:35 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 201C31065670; Wed, 27 Jul 2011 14:32:35 +0000 (UTC) (envelope-from prvs=11896da94d=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 6E32E8FC0C; Wed, 27 Jul 2011 14:32:34 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 15:32:02 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 15:32:02 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014342226.msg; Wed, 27 Jul 2011 15:32:02 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=11896da94d=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <63705B5AEEAD4BB88ADB9EF770AB6C76@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> <4E301C55.7090105@FreeBSD.org> <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk> <4E301F10.6060708@FreeBSD.org> Date: Wed, 27 Jul 2011 15:32:41 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-fs@FreeBSD.org Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 14:32:35 -0000 ----- Original Message ----- From: "Andriy Gapon" >>> Just a guess, perhaps it's another manifestation of this issue: >>> http://lists.freebsd.org/pipermail/freebsd-fs/2011-May/011584.html >> >> Not sure looking at that thread and comparing to the machine:- > > I meant the same root cause, not the same symptoms, of course. Ahh, is there anyway to confirm that before I reboot, or any other information we could glean that might be useful? Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 14:34:49 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 67770106564A for ; Wed, 27 Jul 2011 14:34:49 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id A60438FC0A for ; Wed, 27 Jul 2011 14:34:48 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA05040; Wed, 27 Jul 2011 17:34:44 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E302204.2030009@FreeBSD.org> Date: Wed, 27 Jul 2011 17:34:44 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> <4E301C55.7090105@FreeBSD.org> <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk> <4E301F10.6060708@FreeBSD.org> <63705B5AEEAD4BB88ADB9EF770AB6C76@multiplay.co.uk> In-Reply-To: <63705B5AEEAD4BB88ADB9EF770AB6C76@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 14:34:49 -0000 on 27/07/2011 17:32 Steven Hartland said the following: > ----- Original Message ----- From: "Andriy Gapon" >>>> Just a guess, perhaps it's another manifestation of this issue: >>>> http://lists.freebsd.org/pipermail/freebsd-fs/2011-May/011584.html >>> >>> Not sure looking at that thread and comparing to the machine:- >> >> I meant the same root cause, not the same symptoms, of course. > > Ahh, is there anyway to confirm that before I reboot, or any other > information we could glean that might be useful? No quick ideas, unfortunately. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 17:56:10 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3CD781065673; Wed, 27 Jul 2011 17:56:10 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 70FB08FC12; Wed, 27 Jul 2011 17:56:09 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: As4AAPpPME6DaFvO/2dsb2JhbAA1AQEFKQRGEh0OCgICDQceAhYSPwcXhFaTLJA/uWyRSIErgXuCC4EPBJJ1iDOBOIcT X-IronPort-AV: E=Sophos;i="4.67,277,1309752000"; d="scan'208";a="132455253" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 27 Jul 2011 13:56:08 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 57F7BB40F7; Wed, 27 Jul 2011 13:56:08 -0400 (EDT) Date: Wed, 27 Jul 2011 13:56:08 -0400 (EDT) From: Rick Macklem To: John Baldwin Message-ID: <1847245041.1083168.1311789368340.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <201107270840.57104.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: Attilio Rao , freebsd-fs@freebsd.org Subject: Re: Does msodsfs_readdir() require a exclusively locked vnode X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 17:56:10 -0000 John Baldwin wrote: > On Tuesday, July 26, 2011 9:56:37 pm Attilio Rao wrote: > > 2011/7/26 Kostik Belousov : > > > On Tue, Jul 26, 2011 at 10:07:28AM -0400, Rick Macklem wrote: > > >> Kostik Belousov wrote: > > >> > On Mon, Jul 25, 2011 at 07:22:40PM -0400, Rick Macklem wrote: > > >> > > Hi, > > >> > > > > >> > > Currently both NFS servers set the vnode lock LK_SHARED > > >> > > and so do the local syscalls (at least that's how it looks > > >> > > by inspection?). > > >> > > > > >> > > Peter Holm just posted me this panic, where a test for an > > >> > > exclusive vnode lock fails in msdosfs_readdir(). > > >> > > KDB: stack backtrace: > > >> > > > db_trace_self_wrapper(c0efa6f6,c71627f8,c79230b0,c0f2ef29,f19154b8,...) > > >> > > at db_trace_self_wrapper+0x26 > > >> > > kdb_backtrace(c7f20b38,f19154fc,c0d586d5,f191550c,c7f20ae0,...) > > >> > > at > > >> > > kdb_backtrace+0x2a > > >> > > vfs_badlock(c101b180,f191550c,c1055580,c7f20ae0) at > > >> > > vfs_badlock+0x23 > > >> > > assert_vop_elocked(c7f20ae0,c0ee5f4f,c09f3213,8,0,...) at > > >> > > assert_vop_elocked+0x55 > > >> > > pcbmap(c7966e00,0,f191560c,f1915618,f191561c,...) at > > >> > > pcbmap+0x45 > > >> > > msdosfs_readdir(f1915960,c0f4b343,c7f20ae0,f1915940,0,...) at > > >> > > msdosfs_readdir+0x528 > > >> > > VOP_READDIR_APV(c101b180,f1915960,2,f1915a68,c7923000,...) at > > >> > > VOP_READDIR_APV+0xc5 > > >> > > nfsrvd_readdir(f1915b64,0,c7f20ae0,c7923000,f1915a68,...) at > > >> > > nfsrvd_readdir+0x38e > > >> > > nfsrvd_dorpc(f1915b64,0,c7923000,c842a200,4,...) at > > >> > > nfsrvd_dorpc+0x1f79 > > >> > > nfssvc_program(c7793800,c842a200,c0f24d67,492,0,...) 
at > > >> > > nfssvc_program+0x40f > > >> > > svc_run_internal(f1915d14,c09d9a98,c73dfa80,f1915d28,c0ef1130,...) > > >> > > at svc_run_internal+0x952 > > >> > > svc_thread_start(c73dfa80,f1915d28,c0ef1130,3a5,c7e4b2c0,...) > > >> > > at > > >> > > svc_thread_start+0x10 > > >> > > fork_exit(c0bed7d0,c73dfa80,f1915d28) at fork_exit+0xb8 > > >> > > fork_trampoline() at fork_trampoline+0x8 > > >> > > --- trap 0x804c12e, eip = 0xc, esp = 0x33, ebp = 0x1 --- > > >> > > pcbmap: 0xc7f20ae0 is not exclusive locked but should be > > >> > > KDB: enter: lock violation > > >> > > > > >> > > So, does anyone know if the msdosfs_readdir() really requires > > >> > > a > > >> > > LK_EXCLUSIVE > > >> > > locked vnode or is the ASSERT_VOP_ELOCKED() too strong in > > >> > > pcbmap()? > > >> > > > >> > Yes, msdosfs currently requires all vnode locks to be > > >> > exclusive. One > > >> > of > > >> > the reasons is that each denode (the msdosfs-private vnode > > >> > data) > > >> > carries > > >> > the fat entries cache, and this cache is updated even by the > > >> > operations > > >> > that do not modify vnode from the VFS POV. > > >> > > > >> > The locking regime is enforced by the getnewvnode() > > >> > initializing the > > >> > vnode > > >> > lock with LK_NOSHARE flag, and msdosfs code not calling > > >> > VN_LOCK_ASHARE() > > >> > on the newly instantiated vnode. > > >> > > > >> > My question is, was the vnode in question locked at all ? > > >> I think the problem is that I do a LK_DOWNGRADE. From a quick > > >> look at __lockmgr_args(), it doesn't check LK_NOSHARE for a > > >> LK_DOWNGRADE. > > >> > > >> Maybe __lockmgr_args() should have something like: > > >> if (op == LK_DOWNGRADE && (lk->lock_object.lo_flags & > > >> LK_NOSHARE)) > > >> return (0); /* noop */ > > >> after the > > >> if (op == LK_SHARED && (lk->lock_object.lo_flags & > > >> LK_NOSHARE)) > > >> op = LK_EXCLUSIVE; > > >> lines? > > > The RELENG_7 lockmgr does not check the NOSHARE flag on downgrade, > > > but I agree with the essence of your proposal. > > > > As long as the difference in semantic with the old lockmgr is > > correctly stressed out in the doc (and eventually comments) I'm fine > > with this change. > > I think it is a bug in the LK_NOSHARE implementation if the old > lockmgr() > didn't silently nop downgrade requests when LK_NOSHARE was set. :) We > should definitely fix it to ignore downgrades for LK_NOSHARE. > By the way, I think that __lockmgr_args() in -current doesn't check for LK_NOSHARE. That was what pho@ was testing when he found the problem. At this point, I believe that the new NFS server (which I have a patch for that pho@ is testing to avoid LK_DOWNGRADE) is the only place that is broken. (compute_cn_lkflags() only sets LK_SHARED if MNT_LOOKUP_SHARED is set and the only LK_DOWNGRADE I see is in vfs_cache.c when cn_lkflags == LK_SHARED. The rest are in file systems that handle LK_SHARED locked vnodes, from what I can see at a glance.) So, it isn't a difference between old/current behaviour, just a suggestion that adding a check in __lockmgr_args() might be a nice safety belt for the future, since __lockargs_mgr() already checks for the LK_SHARED case. 
rick, who will get the fix for the new NFS server to re@ soon From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 20:41:43 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A06C91065672 for ; Wed, 27 Jul 2011 20:41:43 +0000 (UTC) (envelope-from dpd@bitgravity.com) Received: from mail1.sjc1.bitgravity.com (mail1.sjc1.bitgravity.com [209.131.97.19]) by mx1.freebsd.org (Postfix) with ESMTP id 81A228FC1D for ; Wed, 27 Jul 2011 20:41:43 +0000 (UTC) Received: from mail-pz0-f52.google.com ([209.85.210.52]) by mail1.sjc1.bitgravity.com with esmtps (TLSv1:RC4-SHA:128) (Exim 4.69 (FreeBSD)) (envelope-from ) id 1QmAvH-000JWN-57; Wed, 27 Jul 2011 13:41:43 -0700 Received: by pzd13 with SMTP id 13so2863640pzd.25 for ; Wed, 27 Jul 2011 13:41:37 -0700 (PDT) Received: by 10.68.40.131 with SMTP id x3mr424521pbk.128.1311799296984; Wed, 27 Jul 2011 13:41:36 -0700 (PDT) Received: from netops-153.sfo1.bitgravity.com (netops-153.sfo1.bitgravity.com [209.131.110.153]) by mx.google.com with ESMTPS id m7sm217166pbk.70.2011.07.27.13.41.35 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 27 Jul 2011 13:41:36 -0700 (PDT) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: David P Discher In-Reply-To: <4E302204.2030009@FreeBSD.org> Date: Wed, 27 Jul 2011 13:41:34 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: <6703F0BB-D4FC-4417-B519-CAFC62E5BC39@bitgravity.com> References: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> <4E301C55.7090105@FreeBSD.org> <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk> <4E301F10.6060708@FreeBSD.org> <63705B5AEEAD4BB88ADB9EF770AB6C76@multiplay.co.uk> <4E302204.2030009@FreeBSD.org> To: Steven Hartland X-Mailer: Apple Mail (2.1084) Cc: freebsd-fs@FreeBSD.org, Andriy Gapon Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 20:41:43 -0000

The way I found this was breaking into the debugger, do some back traces, continue, break in again, do some more back traces on the hung processes ... see what is going on, then walk through the code. Then, once I had specific loops and code locations, I asked the higher powers of the FreeBSD kernel world.

Of course, I had the high cpu and was peaking at the arc_reclaim_thread. I've seen this nearly like clockwork in production at 106-107 days. If it goes on too much longer than that, then things deadlock.

But 112 days, and 8.2 ... you for sure have the LBOLT overflow.

Otherwise, reboot and patch. However, I have not fully vetted the patch under heavy load, and am currently seeing another deadlock issue with 8.1+ zfs v14 - but seemingly during writes after 6-40 hours. Still investigating.

Note, my proposal of "time_uptime" doesn't work - as it causes a buildworld error in zfs userland tools.
This is what I'm currently running to fix the 26 day issue with l2arc feeder and arc_reclaim_thread with LBOLT in 8.1.

Index: sys/cddl/compat/opensolaris/sys/time.h
===================================================================
--- sys/cddl/compat/opensolaris/sys/time.h	(.../8.1-BGOS-20110105)	(revision 3322)
+++ sys/cddl/compat/opensolaris/sys/time.h	(.../8.1-BGOS-20110613)	(working copy)
@@ -38,7 +38,7 @@
 
 typedef longlong_t hrtime_t;
 
-#define LBOLT ((gethrtime() * hz) / NANOSEC)
+#define LBOLT (gethrtime() * (NANOSEC/hz))
 
 #if defined(__i386__) || defined(__powerpc__)
 #define TIMESPEC_OVERFLOW(ts) \
Index: sys/cddl/compat/opensolaris/sys/types.h
===================================================================
--- sys/cddl/compat/opensolaris/sys/types.h	(.../8.1-BGOS-20110105)	(revision 3322)
+++ sys/cddl/compat/opensolaris/sys/types.h	(.../8.1-BGOS-20110613)	(working copy)
@@ -34,6 +34,12 @@
  */
 
 #include
+
+#ifdef _KERNEL
+typedef int64_t clock_t;
+#define _CLOCK_T_DECLARED
+#endif
+
 #include_next
 
 #define MAXNAMELEN 256

---
David P. Discher
dpd@bitgravity.com * AIM: bgDavidDPD
BITGRAVITY * http://www.bitgravity.com

On Jul 27, 2011, at 7:34 AM, Andriy Gapon wrote:

>> Ahh, is there anyway to confirm that before I reboot, or any other
>> information we could glean that might be useful?
>
> No quick ideas, unfortunately.
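As a back-of-the-envelope check on the figures quoted above (assuming hz=1000 and hrtime_t being a signed
64-bit count of nanoseconds since boot, as in the header being patched), the old LBOLT expression multiplies
nanoseconds by hz before dividing, and that intermediate product wraps INT64_MAX after roughly 106.75 days;
a tick count kept in a 32-bit clock_t wraps after roughly 24.9 days, which is presumably the "26 day issue"
the types.h hunk targets. Illustration only, not kernel code:

	#include <stdint.h>
	#include <stdio.h>

	int
	main(void)
	{
		const int64_t hz = 1000;
		const int64_t nanosec = 1000000000;

		/* Old LBOLT: (gethrtime() * hz) / NANOSEC -- the product wraps first. */
		double wrap64_days = ((double)(INT64_MAX / hz) / nanosec) / 86400.0;

		/* A tick count stored in a 32-bit clock_t wraps much sooner. */
		double wrap32_days = ((double)INT32_MAX / hz) / 86400.0;

		printf("gethrtime() * hz wraps after ~%.2f days\n", wrap64_days);	/* ~106.75 */
		printf("32-bit tick count wraps after ~%.2f days\n", wrap32_days);	/* ~24.86 */
		return (0);
	}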
List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 22:39:44 -0000 There seems to be loads of info about this but nothing concrete so I'm hoping someone here can answer some questions:- 1. Does newfs -E work on all controllers or only in combination with ahci ada driver? In our case the drivers are off an LSI controller using the mpt driver mpt0: port 0xfc00-0xfcff mem 0xdf2ec000-0xdf2effff,0xdf2f0000-0xdf2fffff irq 16 at device 0.0 on pci2 mpt0: [ITHREAD] mpt0: MPI Version=1.5.18.0 mpt0: Capabilities: ( RAID-0 RAID-1E RAID-1 ) mpt0: 0 Active Volumes (2 Max) mpt0: 0 Hidden Drive Members (14 Max) 2. If newfs -E doesn't work, which I suspect is the case, is using something like partedmagic boot cd and the secure erase app in that still an option or is that again thwarted by the LSI controller? 3. If neither #1 or #2 work is there an alternative which will without taking the drive out of the machine putting it in something which supports ada and running one of the above on that machine? My current testing seems to indicate neither #1 or #2 work in this case as write performance on Corsair SSD is still terrible after both. If #1 does require ata then it would be good to note this in the man page for newfs which currently indicates it will just work. da1 at mpt0 bus 0 scbus0 target 1 lun 0 da1: Fixed Direct Access SCSI-5 device da1: 300.000MB/s transfers da1: Command Queueing enabled da1: 57241MB (117231408 512 byte sectors: 255H 63S/T 7297C) By terrible I mean under 20MB/s sequential write speed where as a new drive in a similar machine is showing closer to 200MB/s write oldssd# dd if=/data/test of=/ssd/test bs=1m 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 60.430616 secs (17351734 bytes/sec) newssd# dd if=/data/test of=/ssd/test bs=1m 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 0.555287 secs (1888349211 bytes/sec) In both tests /data/test was just created from /dev/random onto a standard HD but is still in ARC so read speed is very high, hence not the limiting factor. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 23:50:02 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A15921065670; Wed, 27 Jul 2011 23:50:02 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 70F238FC13; Wed, 27 Jul 2011 23:50:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p6RNo217080334; Wed, 27 Jul 2011 23:50:02 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p6RNo2rE080330; Wed, 27 Jul 2011 23:50:02 GMT (envelope-from linimon) Date: Wed, 27 Jul 2011 23:50:02 GMT Message-Id: <201107272350.p6RNo2rE080330@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/159232: [ext2fs] [patch] fs/ext2fs: merge ext2_readwrite into ext2_vnops X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 23:50:02 -0000 Old Synopsis: fs/ext2fs: merge ext2_readwrite into ext2_vnops New Synopsis: [ext2fs] [patch] fs/ext2fs: merge ext2_readwrite into ext2_vnops Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 27 23:49:41 UTC 2011 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=159232 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 23:50:35 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0A294106568A; Wed, 27 Jul 2011 23:50:35 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id D66978FC13; Wed, 27 Jul 2011 23:50:34 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p6RNoY9p083051; Wed, 27 Jul 2011 23:50:34 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p6RNoY2C083039; Wed, 27 Jul 2011 23:50:34 GMT (envelope-from linimon) Date: Wed, 27 Jul 2011 23:50:34 GMT Message-Id: <201107272350.p6RNoY2C083039@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/159233: [ext2fs] [patch] fs/ext2fs: finish reallocblk implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 23:50:35 -0000 Old Synopsis: fs/ext2fs: finish reallocblk implementation New Synopsis: [ext2fs] [patch] fs/ext2fs: finish reallocblk implementation Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 27 23:50:21 UTC 2011 Responsible-Changed-Why: Over to maintainer(s). 
http://www.freebsd.org/cgi/query-pr.cgi?pr=159233 From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 01:24:59 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8CCE01065673 for ; Thu, 28 Jul 2011 01:24:59 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta03.westchester.pa.mail.comcast.net (qmta03.westchester.pa.mail.comcast.net [76.96.62.32]) by mx1.freebsd.org (Postfix) with ESMTP id 4D3ED8FC08 for ; Thu, 28 Jul 2011 01:24:59 +0000 (UTC) Received: from omta17.westchester.pa.mail.comcast.net ([76.96.62.89]) by qmta03.westchester.pa.mail.comcast.net with comcast id DDPt1h0011vXlb853DQzih; Thu, 28 Jul 2011 01:24:59 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta17.westchester.pa.mail.comcast.net with comcast id DDQf1h00C1t3BNj3dDQg9Y; Thu, 28 Jul 2011 01:24:41 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 91798102C36; Wed, 27 Jul 2011 18:24:37 -0700 (PDT) Date: Wed, 27 Jul 2011 18:24:37 -0700 From: Jeremy Chadwick To: Steven Hartland Message-ID: <20110728012437.GA23430@icarus.home.lan> References: <13BEC27B17D24D0CBF2E6A98FD3227F3@multiplay.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <13BEC27B17D24D0CBF2E6A98FD3227F3@multiplay.co.uk> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Questions about erasing an ssd to restore performance under FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 01:24:59 -0000 On Wed, Jul 27, 2011 at 11:39:46PM +0100, Steven Hartland wrote: > There seems to be loads of info about this but nothing concrete so > I'm hoping someone here can answer some questions:- > > 1. Does newfs -E work on all controllers or only in combination > with ahci ada driver? In our case the drivers are off an LSI controller > using the mpt driver > > mpt0: port 0xfc00-0xfcff mem 0xdf2ec000-0xdf2effff,0xdf2f0000-0xdf2fffff irq 16 at device 0.0 on pci2 > mpt0: [ITHREAD] > mpt0: MPI Version=1.5.18.0 > mpt0: Capabilities: ( RAID-0 RAID-1E RAID-1 ) > mpt0: 0 Active Volumes (2 Max) > mpt0: 0 Hidden Drive Members (14 Max) > > 2. If newfs -E doesn't work, which I suspect is the case, is using > something like partedmagic boot cd and the secure erase app in that > still an option or is that again thwarted by the LSI controller? newfs -E is not the same thing as "Secure Erase" (issuing SECURE ERASE UNIT ATA command per ATA security data set spec). newfs -E does exactly what the man page says it does: it writes zeros over every LBA on the disk (but it does so in blocks, not on a literal per-LBA basis; e.g. it does not write 512 bytes (LBA size) of zeros to LBA 0, then 512 bytes of zeros to LBA 1, etc. -- it does so in larger chunks). The important thing to take away from this is that the FTL will not be reset to its factory-default configuration when erasing in this fashion. > 3. If neither #1 or #2 work is there an alternative which will > without taking the drive out of the machine putting it in something > which supports ada and running one of the above on that machine? > > My current testing seems to indicate neither #1 or #2 work in this > case as write performance on Corsair SSD is still terrible after > both. 
If #1 does require ata then it would be good to note this in > the man page for newfs which currently indicates it will just work. > > da1 at mpt0 bus 0 scbus0 target 1 lun 0 > da1: Fixed Direct Access SCSI-5 device > da1: 300.000MB/s transfers > da1: Command Queueing enabled > da1: 57241MB (117231408 512 byte sectors: 255H 63S/T 7297C) > > By terrible I mean under 20MB/s sequential write speed where as a > new drive in a similar machine is showing closer to 200MB/s write > > oldssd# dd if=/data/test of=/ssd/test bs=1m > 1000+0 records in > 1000+0 records out > 1048576000 bytes transferred in 60.430616 secs (17351734 bytes/sec) There are many factors to consider with SSDs when write speeds plummet. The biggest and most noticeable is how much free space is available on the drive itself. The less free space available, the worse wear levelling performs. I just got done dealing with a person on the Intel Community Forums who complained of shoddy write performance, where lots of "techs" completely ignored the fact that his drive was showing 90% full (only 7GB left). Is the /ssd partition actually aligned properly? I want to assume it's UFS, not ZFS, given your earlier questions, but is the partition aligned to a 8KByte boundary? (Most consumers tend to start their partitions at the 1MByte mark, but this is a bit overkill; I don't know what Corsair uses for NAND cell size nor erase page size, but with Intel the drives use 8KByte cells). Also, PRIOR to performing these tests, did you tunefs -t enable the filesystem? It matters; TRIM is a much nicer way to ensure the drive restores itself to performance when LBAs on the drive become unused by the filesystem (rather than letting the internal drive GC "figure it out" as best as it can, it's always best to just tell the drive up front with TRIM what's no longer used. Saves the FTL extra work) > newssd# dd if=/data/test of=/ssd/test bs=1m 1000+0 records in > 1000+0 records out > 1048576000 bytes transferred in 0.555287 secs (1888349211 bytes/sec) > > In both tests /data/test was just created from /dev/random onto > a standard HD but is still in ARC so read speed is very high, hence > not the limiting factor. Is there some reason your tests couldn't just use /dev/urandom directly to absolutely positively rule out read I/O (from if=) being a potential limiting factor? I absolutely believe you, but just sayin'... Worth reading is this whitepaper, by the way. http://www.stec-inc.com/downloads/whitepapers/Benchmarking_Enterprise_SSDs.pdf By the way, your above dd is the first time I've seen an SSD write 1.8GBytes in 0.5 seconds. Though I cannot rely entirely on benchmark reviews, the one I just skimmed indicated a fresh drive of your model tends to write, sequentially, at about 60MBytes/sec.. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 09:09:50 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AC6B5106566C for ; Thu, 28 Jul 2011 09:09:50 +0000 (UTC) (envelope-from mavbsd@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 3BD0E8FC0C for ; Thu, 28 Jul 2011 09:09:50 +0000 (UTC) Received: by fxe4 with SMTP id 4so1402843fxe.13 for ; Thu, 28 Jul 2011 02:09:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:content-type :content-transfer-encoding; bh=pQT8aa34JxgFKqhcpbbBYTsV/8/tXzw/cuPEX8ACQ2Y=; b=bM0pUAg6F23Le5oQu2tisbQYvcjgm61me//vv5uJ3BQ7QziDNpp8fiCdriXKo+Kx3S 839oTw63Lk2G5VYrpR7Y4qJz1/mDPQHZiW+uxXJp6TdZyTbC+GBSF19QJwcaz+BY3GOT NH4xUf0YrYPrYAEuG6VB+Yg+hp6oWAslX38Y8= Received: by 10.204.32.201 with SMTP id e9mr235937bkd.392.1311844189109; Thu, 28 Jul 2011 02:09:49 -0700 (PDT) Received: from mavbook2.mavhome.dp.ua (pc.mavhome.dp.ua [212.86.226.226]) by mx.google.com with ESMTPS id sz1sm205187bkb.58.2011.07.28.02.09.46 (version=SSLv3 cipher=OTHER); Thu, 28 Jul 2011 02:09:47 -0700 (PDT) Sender: Alexander Motin Message-ID: <4E312747.3020009@FreeBSD.org> Date: Thu, 28 Jul 2011 12:09:27 +0300 From: Alexander Motin User-Agent: Thunderbird 2.0.0.23 (X11/20091212) MIME-Version: 1.0 To: Steven Hartland References: In-Reply-To: X-Enigmail-Version: 0.96.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: Questions about erasing an ssd to restore performance under FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 09:09:50 -0000 Steven Hartland wrote: > There seems to be loads of info about this but nothing concrete so > I'm hoping someone here can answer some questions:- > > 1. Does newfs -E work on all controllers or only in combination > with ahci ada driver? In our case the drivers are off an LSI controller > using the mpt driver > > mpt0: port 0xfc00-0xfcff mem > 0xdf2ec000-0xdf2effff,0xdf2f0000-0xdf2fffff irq 16 at device 0.0 on pci2 > mpt0: [ITHREAD] > mpt0: MPI Version=1.5.18.0 > mpt0: Capabilities: ( RAID-0 RAID-1E RAID-1 ) > mpt0: 0 Active Volumes (2 Max) > mpt0: 0 Hidden Drive Members (14 Max) `newfs -E` depends on disk driver's support for BIO_DELETE request. For now, AFAIR it is supported at least by ada, mmcsd, some cases of ad and few other cases. da driver doesn't support it now. Also, except da driver, TRIM command should be supported by the controller firmware, that implements SCSI<->ATA protocol translation, and AFAIK often it isn't so. > 2. If newfs -E doesn't work, which I suspect is the case, is using > something like partedmagic boot cd and the secure erase app in that > still an option or is that again thwarted by the LSI controller? Secure erase for the whole disk can be done using special ATA commands, unrelated to TRIM, but with the same end result. I have no idea if those commands have SCSI alternatives, but if so and they are used by mentioned software, there is a chance that controller firmware support them. 
-- Alexander Motin From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 09:20:15 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3E307106564A for ; Thu, 28 Jul 2011 09:20:15 +0000 (UTC) (envelope-from prvs=1190a6d8e6=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id BED7B8FC17 for ; Thu, 28 Jul 2011 09:20:14 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Thu, 28 Jul 2011 10:09:26 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 28 Jul 2011 10:09:25 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014354658.msg for ; Thu, 28 Jul 2011 10:09:24 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1190a6d8e6=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.ORG Message-ID: From: "Steven Hartland" To: "Jeremy Chadwick" References: <13BEC27B17D24D0CBF2E6A98FD3227F3@multiplay.co.uk> <20110728012437.GA23430@icarus.home.lan> Date: Thu, 28 Jul 2011 10:10:03 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Questions about erasing an ssd to restore performance under FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 09:20:15 -0000 ----- Original Message ----- From: "Jeremy Chadwick" > newfs -E is not the same thing as "Secure Erase" (issuing SECURE ERASE > UNIT ATA command per ATA security data set spec). newfs -E does exactly > what the man page says it does: it writes zeros over every LBA on the > disk (but it does so in blocks, not on a literal per-LBA basis; e.g. it > does not write 512 bytes (LBA size) of zeros to LBA 0, then 512 bytes of > zeros to LBA 1, etc. -- it does so in larger chunks). > > The important thing to take away from this is that the FTL will not be > reset to its factory-default configuration when erasing in this fashion. It was my impression this was combined with a BIO_DELETE, which is the key part I thought, but that seems to only be supported under ada? > There are many factors to consider with SSDs when write speeds plummet. > > The biggest and most noticeable is how much free space is available on > the drive itself. The less free space available, the worse wear > levelling performs. I just got done dealing with a person on the Intel > Community Forums who complained of shoddy write performance, where lots > of "techs" completely ignored the fact that his drive was showing 90% > full (only 7GB left). Not the case here, the drive has over 60% free space, but a large move of data from one volume to many smaller volumes had just taken place. Still suprisingly large drop, over 10x slower that it was. > Is the /ssd partition actually aligned properly? 
I want to assume it's > UFS, not ZFS, given your earlier questions, but is the partition aligned > to a 8KByte boundary? (Most consumers tend to start their partitions at > the 1MByte mark, but this is a bit overkill; I don't know what Corsair > uses for NAND cell size nor erase page size, but with Intel the drives > use 8KByte cells). No its ZFS, and not exhibiting performance problems in the initial 6 months or so alignment is not the issue in this case, only after the data move yesterday did the lower performance get noticed. > > Also, PRIOR to performing these tests, did you tunefs -t enable the > filesystem? It matters; TRIM is a much nicer way to ensure the drive > restores itself to performance when LBAs on the drive become unused by > the filesystem (rather than letting the internal drive GC "figure it > out" as best as it can, it's always best to just tell the drive up front > with TRIM what's no longer used. Saves the FTL extra work) ZFS so no TRIM support :( > Is there some reason your tests couldn't just use /dev/urandom directly > to absolutely positively rule out read I/O (from if=) being a potential > limiting factor? I absolutely believe you, but just sayin'... Didn't realise there was a /dev/urandom, but /dev/random was very much limited, which reading the man page makes sense now, something to remember for next time :) > Worth reading is this whitepaper, by the way. > > http://www.stec-inc.com/downloads/whitepapers/Benchmarking_Enterprise_SSDs.pdf > > By the way, your above dd is the first time I've seen an SSD write > 1.8GBytes in 0.5 seconds. Though I cannot rely entirely on benchmark > reviews, the one I just skimmed indicated a fresh drive of your model > tends to write, sequentially, at about 60MBytes/sec.. Hmm, I must have copied the wrong results there some where, here's the correct one which shows 180MB/s, which is still lower than the spec's 285MB/s but its random data so not benefiting as much as it can from the compression on the sandforce controller, most defintielty not 1.8GB/s ;-) dd if=/data/test of=/ssd/test bs=1m 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 5.542815 secs (189177506 bytes/sec) As an update I've manged to get the drive back to full performance using Parted Magic boot cd, but using the manual process shown on the following page "instead" of using Disk Erase utility. Not sure why this didnt work yet. https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase Obviously having to boot to an alternative OS is far from ideal, so could really do with a BSD solution that has the ability to secure erase the disk, to restore performance, given the lack of TRIM in ZFS. Is this something that could be added to camcontrol or may be its already possible with "camcontrol cmd"? Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
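
For reference, the manual procedure on the ATA_Secure_Erase page linked above boils down, roughly, to the following hdparm sequence on the Linux side; the device name and the password ("Eins" here) are placeholders, and hdparm -I must report the drive as "not frozen" before the erase will be accepted:

hdparm -I /dev/sdX                                         # check security state: not locked, not frozen
hdparm --user-master u --security-set-pass Eins /dev/sdX
hdparm --user-master u --security-erase Eins /dev/sdX
hdparm -I /dev/sdX                                         # security should read "not enabled" again

If the sequence is interrupted part way through, the same password can be used with --security-unlock and then --security-disable to get the drive back into a usable state, as described later in this thread.
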
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 09:26:14 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 44A67106566B; Thu, 28 Jul 2011 09:26:14 +0000 (UTC) (envelope-from prvs=1190a6d8e6=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 957448FC16; Thu, 28 Jul 2011 09:26:13 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Thu, 28 Jul 2011 10:25:42 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 28 Jul 2011 10:25:41 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014354790.msg; Thu, 28 Jul 2011 10:25:40 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1190a6d8e6=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <1E487B0F985745459272F052426964C7@multiplay.co.uk> From: "Steven Hartland" To: "Alexander Motin" References: <4E312747.3020009@FreeBSD.org> Date: Thu, 28 Jul 2011 10:26:19 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-fs@freebsd.org Subject: Re: Questions about erasing an ssd to restore performance under FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 09:26:14 -0000 ----- Original Message ----- From: "Alexander Motin" > `newfs -E` depends on disk driver's support for BIO_DELETE request. For > now, AFAIR it is supported at least by ada, mmcsd, some cases of ad and > few other cases. da driver doesn't support it now. Also, except da > driver, TRIM command should be supported by the controller firmware, > that implements SCSI<->ATA protocol translation, and AFAIK often it > isn't so. That's what I thought might be the case, would be good to mention this in the man page as atm its very misleading. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
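
A rough way to see which side of that ada/da split a given disk falls on is camcontrol devlist, which lists each drive together with the peripheral driver it attached as; where a drive does attach via ada(4), the identify data it returns also indicates whether it advertises TRIM at all. The exact identify output wording varies between releases, so the grep below is only a coarse filter (device names are examples):

camcontrol devlist                        # shows whether the SSD came up as adaN or daN
camcontrol identify ada0 | grep -i trim   # DSM/TRIM capability as reported by the drive
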
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 10:32:39 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BE26A106566B for ; Thu, 28 Jul 2011 10:32:39 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta06.westchester.pa.mail.comcast.net (qmta06.westchester.pa.mail.comcast.net [76.96.62.56]) by mx1.freebsd.org (Postfix) with ESMTP id 6A39A8FC1F for ; Thu, 28 Jul 2011 10:32:38 +0000 (UTC) Received: from omta04.westchester.pa.mail.comcast.net ([76.96.62.35]) by qmta06.westchester.pa.mail.comcast.net with comcast id DNWl1h0020ldTLk56NYeyx; Thu, 28 Jul 2011 10:32:38 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta04.westchester.pa.mail.comcast.net with comcast id DNYc1h00T1t3BNj3QNYdfL; Thu, 28 Jul 2011 10:32:38 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 02132102C36; Thu, 28 Jul 2011 03:32:35 -0700 (PDT) Date: Thu, 28 Jul 2011 03:32:34 -0700 From: Jeremy Chadwick To: Steven Hartland Message-ID: <20110728103234.GA33275@icarus.home.lan> References: <13BEC27B17D24D0CBF2E6A98FD3227F3@multiplay.co.uk> <20110728012437.GA23430@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Questions about erasing an ssd to restore performance under FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 10:32:39 -0000 On Thu, Jul 28, 2011 at 10:10:03AM +0100, Steven Hartland wrote: > ----- Original Message ----- From: "Jeremy Chadwick" > > >> [snipping parts about BIO_DELETE and details pertaining to ZFS, >> hoping TRIM support gets added eventually, or possibly through GEOM >> directly someday...] > > Didn't realise there was a /dev/urandom, but /dev/random was very much > limited, which reading the man page makes sense now, something to remember > for next time :) Well, on FreeBSD /dev/urandom is a symlink to /dev/random. I've discussed in the past why I use /dev/urandom instead of /dev/random (I happen to work in a heterogeneous OS environment at work, where urandom and random are different things). I was mainly curious why you were using if=/some/actual/file rather than if=/dev/urandom directly. 'tis okay, not of much importance. > >Worth reading is this whitepaper, by the way. > > > >http://www.stec-inc.com/downloads/whitepapers/Benchmarking_Enterprise_SSDs.pdf > > > >By the way, your above dd is the first time I've seen an SSD write > >1.8GBytes in 0.5 seconds. Though I cannot rely entirely on benchmark > >reviews, the one I just skimmed indicated a fresh drive of your model > >tends to write, sequentially, at about 60MBytes/sec.. 
> > Hmm, I must have copied the wrong results there some where, here's > the correct one which shows 180MB/s, which is still lower than the spec's > 285MB/s but its random data so not benefiting as much as it can from the > compression on the sandforce controller, most defintielty not 1.8GB/s ;-) > > dd if=/data/test of=/ssd/test bs=1m 1000+0 records in > 1000+0 records out > 1048576000 bytes transferred in 5.542815 secs (189177506 bytes/sec) > > As an update I've manged to get the drive back to full performance using > Parted Magic boot cd, but using the manual process shown on the following > page "instead" of using Disk Erase utility. Not sure why this didnt work > yet. > https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase Okay, so it sounds like what happened -- if I understand correctly -- is that your ZFS-based Corsair SSD volume (/ssd) recently had a bunch of data copied to it. It still had 60% free space available. After, the SSD performance for writes really plummeted (~20MByte/sec), but reads were still decent. Performing an actual ATA-level secure erase brought the drive back to normal write performance (~190MByte/sec). If all of that is correct, then I would say the issue is that the internal GC on the Corsair SSD in question sucks. With 60% of the drive still available, performance should not have dropped to such an abysmal rate; the FTL and wear levelling should have, ideally, dealt with this just fine. But it didn't. Why I'm focusing on the GC aspect: because ZFS (or GEOM; whatever, that's an engineering discussion for elsewhere) lacks TRIM. The underlying filesystem is therefore unable to tell the drive "hey, these LBAs aren't used any more, you can consider them free and perform a NAND page erase when an entire NAND page is unused". The FTL has to track all LBAs you've written to, otherwise if erasing a NAND page which still had used data in it (for the filesystem) it would result in loss of data. So in summary I'm not too surprised by this situation happening, but I *AM* surprised at just how horrible writes became for you. The white paper I linked you goes over this to some degree -- it talks about how everyone thinks SSDs are "so amazingly fast" yet nobody does benchmarks or talks about how horrible they perform when very little free space is available, or if the GC is badly implemented. Maybe Corsair's GC is badly implemented -- I don't know. I would see if there are any F/W updates for that model of drive. The firmware controls the GC model/method. Otherwise, if this issue is reproducible, I'll add this model of Corsair SSD to my list of drives to avoid. > Obviously having to boot to an alternative OS is far from ideal, so could > really do with a BSD solution that has the ability to secure erase the disk, > to restore performance, given the lack of TRIM in ZFS. > > Is this something that could be added to camcontrol or may be its already > possible with "camcontrol cmd"? Is it possible to accomplish Secure Erase via "camcontrol cmd" with ada(4)? Yes, but the procedure will be extremely painful, drawn out, and very error-prone. Given that you've followed the procedure on the Linux hdparm/ATA Secure Erase web page, you're aware of the security and "locked" status one has to deal with using password-protection to accomplish the erase. hdparm makes this easy because it's just a bunch of command-line flags; the ""heavy lifting"" on the ATA layer is done elsewhere. With "camcontrol cmd", you get to submit the raw ATA CDB yourself, multiple times, at different phases. 
Just how familiar with the ATA protocol are you? :-) Why I sound paranoid: a typo could potentially "brick" your drive. If you issue a set-password on the drive, ***ALL*** LBA accesses (read and write) return I/O errors from that point forward. Make a typo in the password, formulate the CDB wrong, whatever -- suddenly you have a drive that you can't access or use any more because the password was wrong, etc... If the user doesn't truly understand what they're doing (including the formulation of the CDB), then they're going to panic. camcontrol and atacontrol could both be modified to do the heavy lifting, making similar options/arguments that would mimic hdparm in operation. This would greatly diminish the risks, but the *EXACT PROCEDURE* would need to be explained in the man page. But keep reading for why that may not be enough. I've been in the situation where I've gone through the procedure you followed on said web page, only to run into a quirk with the ATA/IDE subsystem on Windows XP, requiring a power-cycle of the system. The secure erase finished, but I was panicking when I saw the drive spitting out I/O errors on every LBA. I realised that I needed to unlock the drive using --security-unlock then disable security by using --security-disable. Once I did that it was fine. The web page omits that part, in the case of emergency or anomalies are witnessed. This ordeal happened to me today, no joke, while tinkering with my new Intel 510 SSD. So here's a better page: http://tinyapps.org/docs/wipe_drives_hdparm.html Why am I pointing this out? Because, in effect, an entire "HOW TO DO THIS AND WHAT TO DO IF IT GOES HORRIBLY WRONG" section would need to be added to camcontrol/atacontrol to ensure people don't end up with "bricked" drives and blame FreeBSD. Trust me, it will happen. Give users tools to shoot themselves in the foot and they will do so. Furthermore, SCSI drives (which is what camcontrol has historically been for up until recently) have a completely different secure erase CDB command for them. ATA has SECURITY ERASE UNIT, SCSI has SECURITY INITIALIZE -- and in the SCSI realm, this feature is optional! So there's that error-prone issue as well. Do you know how many times I've issued "camcontrol inquiry" instead of "camcontrol identify" on my ada(4)-based systems? Too many. Food for thought. :-) Anyway, this is probably the only time you will ever find me saying this, but: if improving camcontrol/atacontrol to accomplish the above is what you want, patches are welcome. I could try to spend some time on this if there is great interest in the community for such (I'm more familiar with atacontrol's code given my SMART work in the past), and I do have an unused Intel 320-series SSD which I can test with. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 11:32:54 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B10E8106564A for ; Thu, 28 Jul 2011 11:32:54 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 7179A8FC16 for ; Thu, 28 Jul 2011 11:32:54 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1QmOph-00044i-Dh for freebsd-fs@freebsd.org; Thu, 28 Jul 2011 13:32:53 +0200 Received: from 52-212.dsl.iskon.hr ([89.164.52.212]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 28 Jul 2011 13:32:53 +0200 Received: from ivoras by 52-212.dsl.iskon.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 28 Jul 2011 13:32:53 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Thu, 28 Jul 2011 13:32:41 +0200 Lines: 11 Message-ID: Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: 52-212.dsl.iskon.hr User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20110624 Thunderbird/5.0 Subject: ZFS how to find out if ZIL is currently enabled? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 11:32:54 -0000 Grepping for "zil" in sysctls doesn't give anything useful: # sysctl -a | grep zil vfs.zfs.zil_replay_disable: 0 (its description is "Disable intent logging replay" so it looks like a crash recovery option) ... so is there a way to find out if ZIL is enabled? I can look at kenv but for some reason I can't trust its value right now. From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 12:24:03 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 774011065672; Thu, 28 Jul 2011 12:24:03 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 7BB238FC0C; Thu, 28 Jul 2011 12:24:02 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA22839; Thu, 28 Jul 2011 15:24:01 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E3154E0.1030206@FreeBSD.org> Date: Thu, 28 Jul 2011 15:24:00 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Ivan Voras References: In-Reply-To: X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: ZFS how to find out if ZIL is currently enabled? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 12:24:03 -0000 on 28/07/2011 14:32 Ivan Voras said the following: > Grepping for "zil" in sysctls doesn't give anything useful: > > # sysctl -a | grep zil > vfs.zfs.zil_replay_disable: 0 > > (its description is "Disable intent logging replay" so it looks like a > crash recovery option) > > ... so is there a way to find out if ZIL is enabled? > > I can look at kenv but for some reason I can't trust its value right now. Here is a hammer: kgdb. But perhaps there is a more suitable tool :) -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 12:49:28 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E89C6106564A; Thu, 28 Jul 2011 12:49:28 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com [209.85.218.54]) by mx1.freebsd.org (Postfix) with ESMTP id 8495F8FC14; Thu, 28 Jul 2011 12:49:28 +0000 (UTC) Received: by yic13 with SMTP id 13so2251290yic.13 for ; Thu, 28 Jul 2011 05:49:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; bh=M5WN/R5T0gaVem4blEAcqF4ORKMuqRPlNcvPr8LJWvM=; b=V2zAZL77MR3xgfYEmOENxwUCVLmg3Q3Oxljs8sXmN86VVIDNsIXJSfxwqOo9brc+eO wCVRRiuTlKeEnWlqWgBvnRrpAn9Fl2GnUqNjsdC3RzUmkea5ZMzmMZ2cfKhQo3/A5Qml /sl67+ociHu/EWz05P7W8BRyA3kylNeDfTlNw= Received: by 10.100.233.21 with SMTP id f21mr734003anh.83.1311857367877; Thu, 28 Jul 2011 05:49:27 -0700 (PDT) MIME-Version: 1.0 Sender: ivoras@gmail.com Received: by 10.100.198.5 with HTTP; Thu, 28 Jul 2011 05:48:47 -0700 (PDT) In-Reply-To: <4E3154E0.1030206@FreeBSD.org> References: <4E3154E0.1030206@FreeBSD.org> From: Ivan Voras Date: Thu, 28 Jul 2011 14:48:47 +0200 X-Google-Sender-Auth: pOKrl9paXayALItLrjGF-XMzFjQ Message-ID: To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS how to find out if ZIL is currently enabled? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 12:49:29 -0000 On 28 July 2011 14:24, Andriy Gapon wrote: > on 28/07/2011 14:32 Ivan Voras said the following: >> Grepping for "zil" in sysctls doesn't give anything useful: >> >> # sysctl -a | grep zil >> vfs.zfs.zil_replay_disable: 0 >> >> (its description is "Disable intent logging replay" so it looks like a >> crash recovery option) >> >> ... so is there a way to find out if ZIL is enabled? >> >> I can look at kenv but for some reason I can't trust its value right now. > > Here is a hammer: kgdb. > But perhaps there is a more suitable tool :) Hmmm, no, it looks like the zil_disable code is missing in the latest 8-stable! This confirmes what I noticed in operation and why I didn't trust kenv. >From the various csup dates I have on the servers it looks like it's been removed somewhere between April and now, possibly with ZFS 28 MFC? I.e. 
this code is missing: *:/sys> grep -rn zil_disable * cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h:382:extern int zil_disable; cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:897: if (zil_disable) { cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:69:int zil_disable = 0; /* disable intent logging */ cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:71:TUNABLE_INT("vfs.zfs.zil_disable", &zil_disable); cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:72:SYSCTL_INT(_vfs_zfs, OID_AUTO, zil_disable, CTLFLAG_RW, &zil_disable, 0, cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c:450: if (bp->bio_cmd == BIO_FLUSH && !zil_disable) Any ideas? From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 13:00:43 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 77762106566B; Thu, 28 Jul 2011 13:00:43 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 601D98FC15; Thu, 28 Jul 2011 13:00:42 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA23637; Thu, 28 Jul 2011 16:00:40 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E315D78.90209@FreeBSD.org> Date: Thu, 28 Jul 2011 16:00:40 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Ivan Voras References: <4E3154E0.1030206@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, Martin Matuska Subject: Re: ZFS how to find out if ZIL is currently enabled? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 13:00:43 -0000 on 28/07/2011 15:48 Ivan Voras said the following: > On 28 July 2011 14:24, Andriy Gapon wrote: >> on 28/07/2011 14:32 Ivan Voras said the following: >>> Grepping for "zil" in sysctls doesn't give anything useful: >>> >>> # sysctl -a | grep zil >>> vfs.zfs.zil_replay_disable: 0 >>> >>> (its description is "Disable intent logging replay" so it looks like a >>> crash recovery option) >>> >>> ... so is there a way to find out if ZIL is enabled? >>> >>> I can look at kenv but for some reason I can't trust its value right now. >> >> Here is a hammer: kgdb. >> But perhaps there is a more suitable tool :) > > Hmmm, no, it looks like the zil_disable code is missing in the latest > 8-stable! This confirmes what I noticed in operation and why I didn't > trust kenv. > >>From the various csup dates I have on the servers it looks like it's > been removed somewhere between April and now, possibly with ZFS 28 > MFC? http://www.mail-archive.com/freebsd-stable@freebsd.org/msg114251.html > I.e. 
this code is missing: > > *:/sys> grep -rn zil_disable * > cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h:382:extern int zil_disable; > cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:897: if > (zil_disable) { > cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:69:int zil_disable = > 0; /* disable intent logging */ > cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:71:TUNABLE_INT("vfs.zfs.zil_disable", > &zil_disable); > cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:72:SYSCTL_INT(_vfs_zfs, > OID_AUTO, zil_disable, CTLFLAG_RW, &zil_disable, 0, > cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c:450: if > (bp->bio_cmd == BIO_FLUSH && !zil_disable) > > Any ideas? -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 13:22:19 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6D8D2106564A for ; Thu, 28 Jul 2011 13:22:19 +0000 (UTC) (envelope-from prvs=1190a6d8e6=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id DE0218FC12 for ; Thu, 28 Jul 2011 13:22:18 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Thu, 28 Jul 2011 14:21:46 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 28 Jul 2011 14:21:46 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014357285.msg for ; Thu, 28 Jul 2011 14:21:44 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1190a6d8e6=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.ORG Message-ID: From: "Steven Hartland" To: "Jeremy Chadwick" References: <13BEC27B17D24D0CBF2E6A98FD3227F3@multiplay.co.uk> <20110728012437.GA23430@icarus.home.lan> <20110728103234.GA33275@icarus.home.lan> Date: Thu, 28 Jul 2011 14:22:21 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Questions about erasing an ssd to restore performance under FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 13:22:19 -0000 ----- Original Message ----- From: "Jeremy Chadwick" > Well, on FreeBSD /dev/urandom is a symlink to /dev/random. I've > discussed in the past why I use /dev/urandom instead of /dev/random (I > happen to work in a heterogeneous OS environment at work, where urandom > and random are different things). > > I was mainly curious why you were using if=/some/actual/file rather than > if=/dev/urandom directly. 'tis okay, not of much importance. /dev/urandom seems to bottle neck at ~60MB/s a cached file generated from it doesn't e.g. 
dd if=/dev/random of=/dev/null bs=1m count=1000 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 16.152686 secs (64916509 bytes/sec) dd if=/dev/random of=/data/test bs=1m count=1000 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 16.178811 secs (64811685 bytes/sec) dd if=/data/test of=/dev/null bs=1m 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 0.240348 secs (4362738865 bytes/sec) > Okay, so it sounds like what happened -- if I understand correctly -- is > that your ZFS-based Corsair SSD volume (/ssd) recently had a bunch of > data copied to it. It still had 60% free space available. After, the > SSD performance for writes really plummeted (~20MByte/sec), but reads > were still decent. Performing an actual ATA-level secure erase brought > the drive back to normal write performance (~190MByte/sec). Yes this is correct. > If all of that is correct, then I would say the issue is that the > internal GC on the Corsair SSD in question sucks. With 60% of the drive > still available, performance should not have dropped to such an abysmal > rate; the FTL and wear levelling should have, ideally, dealt with this > just fine. But it didn't. Agreed > Why I'm focusing on the GC aspect: because ZFS (or GEOM; whatever, > that's an engineering discussion for elsewhere) lacks TRIM. The > underlying filesystem is therefore unable to tell the drive "hey, these > LBAs aren't used any more, you can consider them free and perform a NAND > page erase when an entire NAND page is unused". The FTL has to track > all LBAs you've written to, otherwise if erasing a NAND page which still > had used data in it (for the filesystem) it would result in loss of > data. > > So in summary I'm not too surprised by this situation happening, but I > *AM* surprised at just how horrible writes became for you. The white > paper I linked you goes over this to some degree -- it talks about how > everyone thinks SSDs are "so amazingly fast" yet nobody does benchmarks > or talks about how horrible they perform when very little free space is > available, or if the GC is badly implemented. Maybe Corsair's GC is > badly implemented -- I don't know. Agreed again, we've seen a few disks now drop to this level of performance at first we thought the disk was failing, as the newfs -E didn't fix it when the man page indicates it should. But seems thats explained now, only works if its ada not da, and also not quite as good as a secure erase. > I would see if there are any F/W updates for that model of drive. The > firmware controls the GC model/method. Otherwise, if this issue is > reproducible, I'll add this model of Corsair SSD to my list of drives to > avoid. Its the latest firmware version, already checked that. The performance has been good till now and I suspect it could be a generic sandforce thing if its a firmware issue. > Is it possible to accomplish Secure Erase via "camcontrol cmd" with > ada(4)? Yes, but the procedure will be extremely painful, drawn out, > and very error-prone. > > Given that you've followed the procedure on the Linux hdparm/ATA Secure > Erase web page, you're aware of the security and "locked" status one has > to deal with using password-protection to accomplish the erase. hdparm > makes this easy because it's just a bunch of command-line flags; the > ""heavy lifting"" on the ATA layer is done elsewhere. With "camcontrol > cmd", you get to submit the raw ATA CDB yourself, multiple times, at > different phases. 
Just how familiar with the ATA protocol are you? :-) > > Why I sound paranoid: a typo could potentially "brick" your drive. If > you issue a set-password on the drive, ***ALL*** LBA accesses (read and > write) return I/O errors from that point forward. Make a typo in the > password, formulate the CDB wrong, whatever -- suddenly you have a drive > that you can't access or use any more because the password was wrong, > etc... If the user doesn't truly understand what they're doing > (including the formulation of the CDB), then they're going to panic. > > camcontrol and atacontrol could both be modified to do the heavy > lifting, making similar options/arguments that would mimic hdparm in > operation. This would greatly diminish the risks, but the *EXACT > PROCEDURE* would need to be explained in the man page. But keep reading > for why that may not be enough. > > I've been in the situation where I've gone through the procedure you > followed on said web page, only to run into a quirk with the ATA/IDE > subsystem on Windows XP, requiring a power-cycle of the system. The > secure erase finished, but I was panicking when I saw the drive spitting > out I/O errors on every LBA. I realised that I needed to unlock the > drive using --security-unlock then disable security by using > --security-disable. Once I did that it was fine. The web page omits > that part, in the case of emergency or anomalies are witnessed. This > ordeal happened to me today, no joke, while tinkering with my new Intel > 510 SSD. So here's a better page: > > http://tinyapps.org/docs/wipe_drives_hdparm.html > > Why am I pointing this out? Because, in effect, an entire "HOW TO DO > THIS AND WHAT TO DO IF IT GOES HORRIBLY WRONG" section would need to be > added to camcontrol/atacontrol to ensure people don't end up with > "bricked" drives and blame FreeBSD. Trust me, it will happen. Give > users tools to shoot themselves in the foot and they will do so. > > Furthermore, SCSI drives (which is what camcontrol has historically been > for up until recently) have a completely different secure erase CDB > command for them. ATA has SECURITY ERASE UNIT, SCSI has SECURITY > INITIALIZE -- and in the SCSI realm, this feature is optional! So > there's that error-prone issue as well. Do you know how many times I've > issued "camcontrol inquiry" instead of "camcontrol identify" on my > ada(4)-based systems? Too many. Food for thought. :-) > > Anyway, this is probably the only time you will ever find me saying > this, but: if improving camcontrol/atacontrol to accomplish the above is > what you want, patches are welcome. I could try to spend some time on > this if there is great interest in the community for such (I'm more > familiar with atacontrol's code given my SMART work in the past), and I > do have an unused Intel 320-series SSD which I can test with. This is of definite of interest here and I suspect to the rest of the community as well. I'm not at all familiar with ATA codes etc so I expect it would take me ages to come up with this. In our case SSD's are a must as HD's don't have the IOPs to deal with our application, we'll just need to manage the write speed drop offs. Performing offline maintenance to have them run at good speed is not ideal but much easier and more acceptable than booting another OS, which would a total PITA as some machines don't have IPMI with virtual media so means remote hands etc. 
Using a Backup -> Erase -> Restore direct from BSD would hence be my preferred workaround until TRIM support is added, but I guess that could well be some time for ZFS. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 13:35:45 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5F29E106564A; Thu, 28 Jul 2011 13:35:45 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by mx1.freebsd.org (Postfix) with ESMTP id E3FA88FC1F; Thu, 28 Jul 2011 13:35:44 +0000 (UTC) Received: by yxl31 with SMTP id 31so1905745yxl.13 for ; Thu, 28 Jul 2011 06:35:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; bh=SLKq5rNuC+cQaVhJeVz8M0zBh2uz2nhytsbarKKU2+U=; b=O9K3gaHQnFYIBZ/oAtzmJkwQM1NLBohx5qFa+XNGyGI3JmV06XSdMAaaeykiiPhpsp NkFR/Z2PHA41C9vZqp3RUAHF9Y+zgluia8xxFez3rCRvW4UYj3qGTbvOJ7gFAGsKCceX 93Rw6aRgQNnN8kWo4hgik2D8oX0v/6LtUunnY= Received: by 10.101.158.19 with SMTP id k19mr27538ano.61.1311860144139; Thu, 28 Jul 2011 06:35:44 -0700 (PDT) MIME-Version: 1.0 Sender: ivoras@gmail.com Received: by 10.100.198.5 with HTTP; Thu, 28 Jul 2011 06:35:04 -0700 (PDT) In-Reply-To: <4E315D78.90209@FreeBSD.org> References: <4E3154E0.1030206@FreeBSD.org> <4E315D78.90209@FreeBSD.org> From: Ivan Voras Date: Thu, 28 Jul 2011 15:35:04 +0200 X-Google-Sender-Auth: UqEDwuNPHgOAGyi48aDrh1UyNfY Message-ID: To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS how to find out if ZIL is currently enabled? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 13:35:45 -0000 On 28 July 2011 15:00, Andriy Gapon wrote: > on 28/07/2011 15:48 Ivan Voras said the following: >>>From the various csup dates I have on the servers it looks like it's >> been removed somewhere between April and now, possibly with ZFS 28 >> MFC? > > http://www.mail-archive.com/freebsd-stable@freebsd.org/msg114251.html > >> I.e. this code is missing: I don't suppose that complaining about the removal of useful code will do any good? Sometimes you consciously need performance more than 100% reliability (and if the old documentation is right, disabling ZIL will not damage the file system itself, just increase the risk of user data loss). 
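
If the v28 import follows the upstream change linked earlier in this thread, the old global tunable is not simply gone but superseded by the per-dataset sync property, which exposes the same trade-off with finer granularity. A sketch, assuming a pool named tank (the name is a placeholder):

zfs get sync tank
zfs set sync=disabled tank    # accept possible loss of recent synchronous writes
zfs set sync=standard tank    # restore the default behaviour
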
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 14:05:59 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 076261065670; Thu, 28 Jul 2011 14:05:59 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id EC9B58FC13; Thu, 28 Jul 2011 14:05:57 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA24612; Thu, 28 Jul 2011 17:05:55 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E316CC3.6070604@FreeBSD.org> Date: Thu, 28 Jul 2011 17:05:55 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Ivan Voras References: <4E3154E0.1030206@FreeBSD.org> <4E315D78.90209@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, Martin Matuska Subject: Re: ZFS how to find out if ZIL is currently enabled? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 14:05:59 -0000 on 28/07/2011 16:35 Ivan Voras said the following: > On 28 July 2011 15:00, Andriy Gapon wrote: >> on 28/07/2011 15:48 Ivan Voras said the following: > >>> >From the various csup dates I have on the servers it looks like it's >>> been removed somewhere between April and now, possibly with ZFS 28 >>> MFC? >> >> http://www.mail-archive.com/freebsd-stable@freebsd.org/msg114251.html >> >>> I.e. this code is missing: > > I don't suppose that complaining about the removal of useful code will > do any good? The question is obviously not directed to me? :-) > Sometimes you consciously need performance more than 100% reliability > (and if the old documentation is right, disabling ZIL will not damage > the file system itself, just increase the risk of user data loss). 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 14:24:36 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7784E106564A; Thu, 28 Jul 2011 14:24:36 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: from mail-gw0-f50.google.com (mail-gw0-f50.google.com [74.125.83.50]) by mx1.freebsd.org (Postfix) with ESMTP id EF8B88FC18; Thu, 28 Jul 2011 14:24:35 +0000 (UTC) Received: by gwj16 with SMTP id 16so2392149gwj.37 for ; Thu, 28 Jul 2011 07:24:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; bh=SUCGkfXFsAIQUWTZsq5RiA9wKDKbx1WitagPJxygiIE=; b=Y0GncMH68hsLX8FueBw/jkPLIgnG9YGqu/FfrZjHmHL6UvvqyiJdNVbra/6jIpmaWM iU3wqToF/snNUIpVWj6H3GKIxQ5GlFL8BMpKK8gL95ysYv+rzzzcpHn2bsKPqj2HhY+4 4iVru7BQjlS2ySCk1cltuYEvp5Y+fgO+cbOOc= Received: by 10.101.18.6 with SMTP id v6mr82499ani.39.1311863075153; Thu, 28 Jul 2011 07:24:35 -0700 (PDT) MIME-Version: 1.0 Sender: ivoras@gmail.com Received: by 10.100.198.5 with HTTP; Thu, 28 Jul 2011 07:23:55 -0700 (PDT) In-Reply-To: <4E316CC3.6070604@FreeBSD.org> References: <4E3154E0.1030206@FreeBSD.org> <4E315D78.90209@FreeBSD.org> <4E316CC3.6070604@FreeBSD.org> From: Ivan Voras Date: Thu, 28 Jul 2011 16:23:55 +0200 X-Google-Sender-Auth: B7Cx1M7zYSik2x_GwMzJMiNJrS8 Message-ID: To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS how to find out if ZIL is currently enabled? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 14:24:36 -0000 On 28 July 2011 16:05, Andriy Gapon wrote: > on 28/07/2011 16:35 Ivan Voras said the following: >> On 28 July 2011 15:00, Andriy Gapon wrote: >>> on 28/07/2011 15:48 Ivan Voras said the following: >> >>>> >From the various csup dates I have on the servers it looks like it's >>>> been removed somewhere between April and now, possibly with ZFS 28 >>>> MFC? >>> >>> http://www.mail-archive.com/freebsd-stable@freebsd.org/msg114251.html >>> >>>> I.e. this code is missing: >> >> I don't suppose that complaining about the removal of useful code will >> do any good? > > The question is obviously not directed to me? 
:-) No, it was a question to the ZFS cabal :) We'll see if it remains rhetorical :) From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 14:59:26 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2A4ED106566B for ; Thu, 28 Jul 2011 14:59:26 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta14.emeryville.ca.mail.comcast.net (qmta14.emeryville.ca.mail.comcast.net [76.96.27.212]) by mx1.freebsd.org (Postfix) with ESMTP id 0F40A8FC0A for ; Thu, 28 Jul 2011 14:59:25 +0000 (UTC) Received: from omta05.emeryville.ca.mail.comcast.net ([76.96.30.43]) by qmta14.emeryville.ca.mail.comcast.net with comcast id DSxZ1h0090vp7WLAESzNfo; Thu, 28 Jul 2011 14:59:22 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta05.emeryville.ca.mail.comcast.net with comcast id DSzW1h00E1t3BNj8RSzYn2; Thu, 28 Jul 2011 14:59:34 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 198AA102C36; Thu, 28 Jul 2011 07:59:17 -0700 (PDT) Date: Thu, 28 Jul 2011 07:59:17 -0700 From: Jeremy Chadwick To: Steven Hartland Message-ID: <20110728145917.GA37805@icarus.home.lan> References: <13BEC27B17D24D0CBF2E6A98FD3227F3@multiplay.co.uk> <20110728012437.GA23430@icarus.home.lan> <20110728103234.GA33275@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Questions about erasing an ssd to restore performance under FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 14:59:26 -0000 On Thu, Jul 28, 2011 at 02:22:21PM +0100, Steven Hartland wrote: > ----- Original Message ----- From: "Jeremy Chadwick" > > >Well, on FreeBSD /dev/urandom is a symlink to /dev/random. I've > >discussed in the past why I use /dev/urandom instead of /dev/random (I > >happen to work in a heterogeneous OS environment at work, where urandom > >and random are different things). > > > >I was mainly curious why you were using if=/some/actual/file rather than > >if=/dev/urandom directly. 'tis okay, not of much importance. > > /dev/urandom seems to bottle neck at ~60MB/s a cached file generated from > it doesn't e.g. > dd if=/dev/random of=/dev/null bs=1m count=1000 > 1000+0 records in > 1000+0 records out > 1048576000 bytes transferred in 16.152686 secs (64916509 bytes/sec) > > dd if=/dev/random of=/data/test bs=1m count=1000 > 1000+0 records in > 1000+0 records out > 1048576000 bytes transferred in 16.178811 secs (64811685 bytes/sec) > > dd if=/data/test of=/dev/null bs=1m > 1000+0 records in > 1000+0 records out > 1048576000 bytes transferred in 0.240348 secs (4362738865 bytes/sec) /dev/urandom is highly CPU-bound. For example, on my home box it tops out at about 79MBytes/sec. I tend to use /dev/zero for I/O testing, since I really don't need the CPU tied up generating random data from entropy sources. The difference in speed is dramatic. So yes, I guess if you wanted to test high write speeds with purely randomised data as your source, creating a temporary file with content from /dev/urandom first would be your best bet. (Assuming, of course, that the source you plan to read from can transfer as fast as the writes to the destination, but that goes without saying). 
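If you want to see exactly where a source device tops out, independent of dd and its block sizes, a few lines of C are enough. Rough sketch, nothing clever -- point it at /dev/urandom or /dev/zero and give it a size in MiB:

/*
 * Rough sketch: time sequential 1 MiB reads from a source device,
 * e.g. /dev/urandom vs /dev/zero, to see where the source itself tops out.
 * Usage: ./readbench /dev/urandom 1000
 */
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	static char buf[1024 * 1024];		/* 1 MiB per read */
	struct timespec t0, t1;
	double secs;
	ssize_t n;
	long i, mib;
	int fd;

	if (argc != 3)
		errx(1, "usage: %s <device> <MiB>", argv[0]);
	mib = strtol(argv[2], NULL, 10);
	if (mib <= 0)
		errx(1, "MiB must be a positive number");

	fd = open(argv[1], O_RDONLY);
	if (fd < 0)
		err(1, "open %s", argv[1]);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < mib; i++) {
		n = read(fd, buf, sizeof(buf));
		if (n < 0)
			err(1, "read");
		if (n != (ssize_t)sizeof(buf))
			errx(1, "short read (%zd bytes)", n);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%ld MiB in %.2f s = %.1f MiB/s\n", mib, secs, mib / secs);
	close(fd);
	return (0);
}

It should land in the same ballpark as the dd figures above if the bottleneck really is the source device rather than the consumer.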
> >Okay, so it sounds like what happened -- if I understand correctly -- is > >that your ZFS-based Corsair SSD volume (/ssd) recently had a bunch of > >data copied to it. It still had 60% free space available. After, the > >SSD performance for writes really plummeted (~20MByte/sec), but reads > >were still decent. Performing an actual ATA-level secure erase brought > >the drive back to normal write performance (~190MByte/sec). > > Yes this is correct. > > >If all of that is correct, then I would say the issue is that the > >internal GC on the Corsair SSD in question sucks. With 60% of the drive > >still available, performance should not have dropped to such an abysmal > >rate; the FTL and wear levelling should have, ideally, dealt with this > >just fine. But it didn't. > > Agreed > > >Why I'm focusing on the GC aspect: because ZFS (or GEOM; whatever, > >that's an engineering discussion for elsewhere) lacks TRIM. The > >underlying filesystem is therefore unable to tell the drive "hey, these > >LBAs aren't used any more, you can consider them free and perform a NAND > >page erase when an entire NAND page is unused". The FTL has to track > >all LBAs you've written to, otherwise if erasing a NAND page which still > >had used data in it (for the filesystem) it would result in loss of > >data. > > > >So in summary I'm not too surprised by this situation happening, but I > >*AM* surprised at just how horrible writes became for you. The white > >paper I linked you goes over this to some degree -- it talks about how > >everyone thinks SSDs are "so amazingly fast" yet nobody does benchmarks > >or talks about how horrible they perform when very little free space is > >available, or if the GC is badly implemented. Maybe Corsair's GC is > >badly implemented -- I don't know. > > Agreed again, we've seen a few disks now drop to this level of performance > at first we thought the disk was failing, as the newfs -E didn't fix it when > the man page indicates it should. But seems thats explained now, only > works if its ada not da, and also not quite as good as a secure erase. I guess the newfs(8) man page should be rephrased then. When I read the description for the -E option, I see this paragraph: Erasing may take a long time as it writes to every sector on the disk. And immediately think "Oh, all it does is write zeros to every LBA, probably in blocks of some size that's unknown to me (vs. 512 bytes)". I can submit a PR + patch for this, but I'd propose the man page description for -E be changed to this: -E Erase the content of the disk before making the filesystem. The reserved area in front of the superblock (for bootcode) will not be erased. This option writes zeros to every sector (LBA) on the disk, in transfer sizes of, at most, 65536 * sectorsize bytes. Basically remove the mention of wear-leveling and "intended for use with flash devices". Any device can use this option as well; it's a UFS-esque equivalent of dd if=/dev/zero of=/dev/device bs=..., sans the exclusions mentioned. The tricky part is the "transfer sizes of, at most..." line. I'm certain someone will ask me where I got that from, so I'll explain it. Sorry for the long-winded stuff, but this is more or less how I learn, and I hope it benefits someone in the process. And man I sure hope I'm reading this code right... Down the rabbit hole we go: newfs(8) calls berase(3), which is part of libufs: 501 if (Eflag && !Nflag) { ... 
505 berase(&disk, sblock.fs_sblockloc / disk.d_bsize, 506 sblock.fs_size * sblock.fs_fsize - sblock.fs_sblockloc); The man page for berase(3) doesn't tell you the size of I/O transfer (the "block size") when it asks the kernel to effectively write zeros to the device. Looking at src/lib/libufs/block.c, we find this: 143 berase(struct uufsd *disk, ufs2_daddr_t blockno, ufs2_daddr_t size) ... 154 ioarg[0] = blockno * disk->d_bsize; 155 ioarg[1] = size; 156 rv = ioctl(disk->d_fd, DIOCGDELETE, ioarg); This ioctl(2) (DIOCGDELETE) is not documented anywhere in the entire source code tree (grep -r DIOCGDELETE /usr/src returns absolutely no documentation references). Furthermore, at this point we still have no idea whow the arguments being passed to ioctl are used; is "size" the total size, or is it the transfer size of the write we're going to issue? DIOCGDELETE is handled in src/sys/geom/geom_dev.c, where we finally get some answers: 293 case DIOCGDELETE: 294 offset = ((off_t *)data)[0]; 295 length = ((off_t *)data)[1]; ... 303 while (length > 0) { 304 chunk = length; 305 if (chunk > 65536 * cp->provider->sectorsize) 306 chunk = 65536 * cp->provider->sectorsize; 307 error = g_delete_data(cp, offset, chunk); 308 length -= chunk; 309 offset += chunk; So ioctl[0] is the offset, and ioctl[1] represents the actual TOTAL SIZE of what we want erased, NOT the transfer block size itself. The block size itself is calculated on line 306, so 65536 * the actual GEOM provider's "advertised sector size". On SSDs, this would be 512 bytes (no I am not kidding). But we're still not finished. What is g_delete_data? It's an internal GEOM function which does what it's told (heh :-) ). src/sys/geom/geom_io.c sheds light on that: 739 g_delete_data(struct g_consumer *cp, off_t offset, off_t length) 740 { 741 struct bio *bp; 742 int error; 743 744 KASSERT(length > 0 && length >= cp->provider->sectorsize, 745 ("g_delete_data(): invalid length %jd", (intmax_t)length)); 746 747 bp = g_alloc_bio(); 748 bp->bio_cmd = BIO_DELETE; 749 bp->bio_done = NULL; 750 bp->bio_offset = offset; 751 bp->bio_length = length; 752 bp->bio_data = NULL; 753 g_io_request(bp, cp); 754 error = biowait(bp, "gdelete"); ... Okay, so without going into g_io_request() (did I not say something about rabbit holes earlier?), we can safely assume that's even more abstraction around a BIO_DELETE call. bp->bio_length is the size of the data to tinker with, in bytes. So in summary, with a 512-byte "advertised sector" disk, the erase would happen in 32MByte "transfer size blocks". Let's test that theory with an mdconfig(8) "disk" and a slightly modified version of newfs(8) that tells us what the value of the 3rd argument is that it's passing to berase(3): # mdconfig -a -t malloc -s 256m -o reserve -u 0 md0 # sysctl -b kern.geom.conftxt | strings | grep md0 0 MD md0 268435456 512 u 0 s 512 f 0 fs 0 l 268435456 t malloc Sector size of the md0 pseudo-disk is 512 bytes (5th parameter). Now the modified newfs: # ~jdc/tmp/newfs/newfs -E /dev/md0 /dev/md0: 256.0MB (524288 sectors) block size 16384, fragment size 2048 using 4 cylinder groups of 64.02MB, 4097 blks, 8256 inodes. Erasing sectors [128...524287] berase() 3rd arg: 268369920 super-block backups (for fsck -b #) at: 160, 131264, 262368, 393472 There's the printf() I added ("berase()..."). So the argument passed to berase() is 268369920 (the size of the pseudo-disk, sans the area before the superblock, in this case 4 CGs at 16384 block size, so 65536 bytes; 268435456 - 268369920 == 65536). 
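(Side note: you can also drive that ioctl from userland yourself. A bare-bones sketch follows -- the /dev/md0 path and the 64KB front offset are only illustrative, and it really will erase whatever you aim it at, so stick to scratch devices:)

/*
 * Bare-bones sketch (illustrative only): issue the same DIOCGDELETE
 * ioctl that berase(3) uses, directly against a GEOM provider.
 * ioarg[0] is the byte offset, ioarg[1] the TOTAL length to erase;
 * geom_dev.c then chops the request into 65536 * sectorsize chunks.
 * WARNING: this destroys data on whatever device you point it at.
 */
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/disk.h>		/* DIOCGDELETE, DIOCGMEDIASIZE */
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	const char *dev = "/dev/md0";	/* scratch device from the mdconfig test */
	off_t ioarg[2], mediasize;
	int fd;

	fd = open(dev, O_RDWR);
	if (fd < 0)
		err(1, "open %s", dev);
	if (ioctl(fd, DIOCGMEDIASIZE, &mediasize) < 0)
		err(1, "DIOCGMEDIASIZE");

	ioarg[0] = 65536;		/* skip the front area, like newfs -E did above */
	ioarg[1] = mediasize - 65536;	/* total bytes to erase, not a chunk size */
	if (ioctl(fd, DIOCGDELETE, ioarg) < 0)
		err(1, "DIOCGDELETE");

	printf("deleted %jd bytes starting at offset %jd on %s\n",
	    (intmax_t)ioarg[1], (intmax_t)ioarg[0], dev);
	close(fd);
	return (0);
}

That is essentially all berase(3) does before handing the work to GEOM.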
Now back to the geom_dev.c code with the data we know: - Line 395 would assign length to 268369920 - Line 304 would assign chunk to 268369920 - Line 305 conditional would prove true; 268369920 > 33554432 (65536*512), so chunk becomes 33554432 - Line 307 "and within" does the actual zeroing The reason the man page can't say 32MBytes explicitly is because it's dynamic (based on sector size). I imagine, somewhere down the road, we WILL have disks that start advertising non-512-byte sector sizes. As of this writing none I have seen do (SSDs nor WD -EARS drives). > >I would see if there are any F/W updates for that model of drive. The > >firmware controls the GC model/method. Otherwise, if this issue is > >reproducible, I'll add this model of Corsair SSD to my list of drives to > >avoid. > > Its the latest firmware version, already checked that. The performance > has been good till now and I suspect it could be a generic sandforce > thing if its a firmware issue. SandForce-based SSDs have a history of being extremely good with their GC, but I've never used one. However, if I remember right (something I read not more than a week ago, I just can't remember where!), it's very rare that any SF-based SSD vendor uses the stock SF firmware. They modify the hell out of it. Meaning: two SSDs using the exact same model of SF controller doesn't mean they'll behave the exact same. Hmm, I probably read this on some SSD review site, maybe Anandtech. I imagine the same applies to Marvell-based SSD controllers too. > >Is it possible to accomplish Secure Erase via "camcontrol cmd" with > >ada(4)? Yes, but the procedure will be extremely painful, drawn out, > >and very error-prone. > > > >Given that you've followed the procedure on the Linux hdparm/ATA Secure > >Erase web page, you're aware of the security and "locked" status one has > >to deal with using password-protection to accomplish the erase. hdparm > >makes this easy because it's just a bunch of command-line flags; the > >""heavy lifting"" on the ATA layer is done elsewhere. With "camcontrol > >cmd", you get to submit the raw ATA CDB yourself, multiple times, at > >different phases. Just how familiar with the ATA protocol are you? :-) > > > >Why I sound paranoid: a typo could potentially "brick" your drive. If > >you issue a set-password on the drive, ***ALL*** LBA accesses (read and > >write) return I/O errors from that point forward. Make a typo in the > >password, formulate the CDB wrong, whatever -- suddenly you have a drive > >that you can't access or use any more because the password was wrong, > >etc... If the user doesn't truly understand what they're doing > >(including the formulation of the CDB), then they're going to panic. > > > >camcontrol and atacontrol could both be modified to do the heavy > >lifting, making similar options/arguments that would mimic hdparm in > >operation. This would greatly diminish the risks, but the *EXACT > >PROCEDURE* would need to be explained in the man page. But keep reading > >for why that may not be enough. > > > >I've been in the situation where I've gone through the procedure you > >followed on said web page, only to run into a quirk with the ATA/IDE > >subsystem on Windows XP, requiring a power-cycle of the system. The > >secure erase finished, but I was panicking when I saw the drive spitting > >out I/O errors on every LBA. I realised that I needed to unlock the > >drive using --security-unlock then disable security by using > >--security-disable. Once I did that it was fine. 
The web page omits > >that part, in the case of emergency or anomalies are witnessed. This > >ordeal happened to me today, no joke, while tinkering with my new Intel > >510 SSD. So here's a better page: > > > >http://tinyapps.org/docs/wipe_drives_hdparm.html > > > >Why am I pointing this out? Because, in effect, an entire "HOW TO DO > >THIS AND WHAT TO DO IF IT GOES HORRIBLY WRONG" section would need to be > >added to camcontrol/atacontrol to ensure people don't end up with > >"bricked" drives and blame FreeBSD. Trust me, it will happen. Give > >users tools to shoot themselves in the foot and they will do so. > > > >Furthermore, SCSI drives (which is what camcontrol has historically been > >for up until recently) have a completely different secure erase CDB > >command for them. ATA has SECURITY ERASE UNIT, SCSI has SECURITY > >INITIALIZE -- and in the SCSI realm, this feature is optional! So > >there's that error-prone issue as well. Do you know how many times I've > >issued "camcontrol inquiry" instead of "camcontrol identify" on my > >ada(4)-based systems? Too many. Food for thought. :-) > > > >Anyway, this is probably the only time you will ever find me saying > >this, but: if improving camcontrol/atacontrol to accomplish the above is > >what you want, patches are welcome. I could try to spend some time on > >this if there is great interest in the community for such (I'm more > >familiar with atacontrol's code given my SMART work in the past), and I > >do have an unused Intel 320-series SSD which I can test with. > > This is of definite of interest here and I suspect to the rest of the > community as well. I'm not at all familiar with ATA codes etc so I > expect it would take me ages to come up with this. > > In our case SSD's are a must as HD's don't have the IOPs to deal with > our application, we'll just need to manage the write speed drop offs. > > Performing offline maintenance to have them run at good speed is > not ideal but much easier and more acceptable than booting another OS, > which would a total PITA as some machines don't have IPMI with virtual > media so means remote hands etc. > > Using a Backup -> Erase -> Restore direct from BSD would hence be my > preferred workaround until TRIM support is added, but I guess that could > well be some time for ZFS. Understood. I'm off work this week so I'll see if I can dedicate some time to it. Too many non-work projects I'm juggling right now, argh. I'll have to start with camcontrol since the test system I have uses ada(4) and not classic ata(4). I'm not even sure what I'm really in for given that I've never looked at camcontrol's code before. If I "brick" my SSD I'll send you a bill, Steven. Kidding. :-) -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 15:27:11 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 93FC5106566B for ; Thu, 28 Jul 2011 15:27:11 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta09.emeryville.ca.mail.comcast.net (qmta09.emeryville.ca.mail.comcast.net [76.96.30.96]) by mx1.freebsd.org (Postfix) with ESMTP id 7950E8FC15 for ; Thu, 28 Jul 2011 15:27:11 +0000 (UTC) Received: from omta19.emeryville.ca.mail.comcast.net ([76.96.30.76]) by qmta09.emeryville.ca.mail.comcast.net with comcast id DTSW1h0011eYJf8A9TT72B; Thu, 28 Jul 2011 15:27:08 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta19.emeryville.ca.mail.comcast.net with comcast id DTNR1h01R1t3BNj01TNde6; Thu, 28 Jul 2011 15:22:41 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id BE4CE102C36; Thu, 28 Jul 2011 08:21:51 -0700 (PDT) Date: Thu, 28 Jul 2011 08:21:51 -0700 From: Jeremy Chadwick To: Ivan Voras Message-ID: <20110728152151.GA39317@icarus.home.lan> References: <4E3154E0.1030206@FreeBSD.org> <4E315D78.90209@FreeBSD.org> <4E316CC3.6070604@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, Andriy Gapon Subject: Re: ZFS how to find out if ZIL is currently enabled? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 15:27:11 -0000 On Thu, Jul 28, 2011 at 04:23:55PM +0200, Ivan Voras wrote: > On 28 July 2011 16:05, Andriy Gapon wrote: > > on 28/07/2011 16:35 Ivan Voras said the following: > >> On 28 July 2011 15:00, Andriy Gapon wrote: > >>> on 28/07/2011 15:48 Ivan Voras said the following: > >> > >>>> >From the various csup dates I have on the servers it looks like it's > >>>> been removed somewhere between April and now, possibly with ZFS 28 > >>>> MFC? > >>> > >>> http://www.mail-archive.com/freebsd-stable@freebsd.org/msg114251.html > >>> > >>>> I.e. this code is missing: > >> > >> I don't suppose that complaining about the removal of useful code will > >> do any good? > > > > The question is obviously not directed to me? :-) > > No, it was a question to the ZFS cabal :) We'll see if it remains rhetorical :) What about this? http://blog.tschokko.de/archives/786 http://milek.blogspot.com/2010/05/zfs-synchronous-vs-asynchronous-io.html # zfs get all backups | grep sync backups sync standard default -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 15:47:15 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BF8F7106564A for ; Thu, 28 Jul 2011 15:47:15 +0000 (UTC) (envelope-from prvs=1190a6d8e6=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 30B6B8FC0C for ; Thu, 28 Jul 2011 15:47:14 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Thu, 28 Jul 2011 16:46:42 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 28 Jul 2011 16:46:42 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014359165.msg for ; Thu, 28 Jul 2011 16:46:40 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1190a6d8e6=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.ORG Message-ID: <2A07CD8AE6AE49A5BAED59A7E547D1F9@multiplay.co.uk> From: "Steven Hartland" To: "Jeremy Chadwick" References: <13BEC27B17D24D0CBF2E6A98FD3227F3@multiplay.co.uk> <20110728012437.GA23430@icarus.home.lan> <20110728103234.GA33275@icarus.home.lan> <20110728145917.GA37805@icarus.home.lan> Date: Thu, 28 Jul 2011 16:47:21 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Questions about erasing an ssd to restore performance under FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 15:47:15 -0000 ----- Original Message ----- From: "Jeremy Chadwick" > I guess the newfs(8) man page should be rephrased then. When I read the > description for the -E option, I see this paragraph: > > Erasing may take a long time as it writes to every sector > on the disk. > > And immediately think "Oh, all it does is write zeros to every LBA, > probably in blocks of some size that's unknown to me (vs. 512 bytes)". > > I can submit a PR + patch for this, but I'd propose the man page > description for -E be changed to this: > > -E Erase the content of the disk before making the filesystem. > The reserved area in front of the superblock (for bootcode) > will not be erased. > > This option writes zeros to every sector (LBA) on the disk, > in transfer sizes of, at most, 65536 * sectorsize bytes. It actually does more than this using BIO_DELETE to tell the disk its unallocated now aka (TRIM) but needs to state its only suppored on some controllers / disk drivers. > Basically remove the mention of wear-leveling and "intended for use > with flash devices". Any device can use this option as well; it's a > UFS-esque equivalent of dd if=/dev/zero of=/dev/device bs=..., sans the > exclusions mentioned. I believe it does this if its supported, which atm means ada, thats what needs clarifying. 
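For the record, a trivial bit of C will show what that per-request ceiling works out to on a given provider (the /dev/ada0 default below is just an example, pass any device node you like):

/*
 * Trivial sketch: print a provider's sector size and the resulting
 * per-request BIO_DELETE ceiling (65536 * sectorsize) used by the
 * DIOCGDELETE loop in geom_dev.c.  The default device path is only
 * an example.
 */
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/disk.h>		/* DIOCGSECTORSIZE, DIOCGMEDIASIZE */
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "/dev/ada0";
	off_t mediasize;
	u_int sectorsize;
	int fd;

	fd = open(dev, O_RDONLY);
	if (fd < 0)
		err(1, "open %s", dev);
	if (ioctl(fd, DIOCGSECTORSIZE, &sectorsize) < 0)
		err(1, "DIOCGSECTORSIZE");
	if (ioctl(fd, DIOCGMEDIASIZE, &mediasize) < 0)
		err(1, "DIOCGMEDIASIZE");

	printf("%s: %u byte sectors, %jd bytes total\n",
	    dev, sectorsize, (intmax_t)mediasize);
	printf("max delete chunk: %ju bytes (65536 * sectorsize)\n",
	    (uintmax_t)65536 * sectorsize);
	close(fd);
	return (0);
}

On a 512-byte sector device that prints the 32MB figure discussed above.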
> SandForce-based SSDs have a history of being extremely good with their > GC, but I've never used one. However, if I remember right (something I > read not more than a week ago, I just can't remember where!), it's very > rare that any SF-based SSD vendor uses the stock SF firmware. They > modify the hell out of it. Meaning: two SSDs using the exact same model > of SF controller doesn't mean they'll behave the exact same. Hmm, I > probably read this on some SSD review site, maybe Anandtech. I imagine > the same applies to Marvell-based SSD controllers too. Yer quite possibly. >> Using a Backup -> Erase -> Restore direct from BSD would hence be my >> preferred workaround until TRIM support is added, but I guess that could >> well be some time for ZFS. > > Understood. I'm off work this week so I'll see if I can dedicate some > time to it. Too many non-work projects I'm juggling right now, argh. > > I'll have to start with camcontrol since the test system I have uses > ada(4) and not classic ata(4). I'm not even sure what I'm really in for > given that I've never looked at camcontrol's code before. > > If I "brick" my SSD I'll send you a bill, Steven. Kidding. :-) If you need a test SSD lmk an address offlist and I'll sort, least we can do :) Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 15:52:49 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8BD9E1065674; Thu, 28 Jul 2011 15:52:49 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: from mail-gx0-f182.google.com (mail-gx0-f182.google.com [209.85.161.182]) by mx1.freebsd.org (Postfix) with ESMTP id 34A1D8FC12; Thu, 28 Jul 2011 15:52:48 +0000 (UTC) Received: by gxk28 with SMTP id 28so2441043gxk.13 for ; Thu, 28 Jul 2011 08:52:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; bh=khQM03DMmVfAoxmvtLo4vJNYDXW7h2379wCfzqJiGMc=; b=IXbGPVsSoD0cl7MMKF0W7QZzQ8Y0TbYPP19ucSV5uncJ/TTWKTPqNoxwyv6ZC+ZJsY 67cTu6maBh5EkVS/plWqA+UDNC7+We24sWx5lpNGSRVZp2s+QAuFQw2yPmlUYkonqIkk 1qSsZKsvFZ7+RbC6G8cLztfAm2DJSPFO4dcjM= Received: by 10.100.211.11 with SMTP id j11mr174419ang.17.1311868368147; Thu, 28 Jul 2011 08:52:48 -0700 (PDT) MIME-Version: 1.0 Sender: ivoras@gmail.com Received: by 10.100.198.5 with HTTP; Thu, 28 Jul 2011 08:52:08 -0700 (PDT) In-Reply-To: <20110728152151.GA39317@icarus.home.lan> References: <4E3154E0.1030206@FreeBSD.org> <4E315D78.90209@FreeBSD.org> <4E316CC3.6070604@FreeBSD.org> <20110728152151.GA39317@icarus.home.lan> From: Ivan Voras Date: Thu, 28 Jul 2011 17:52:08 +0200 X-Google-Sender-Auth: mnnxSNZlP5mQGc5qLK_IbRWy4q8 Message-ID: To: Jeremy Chadwick Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org, Andriy Gapon Subject: Re: ZFS how to find out if ZIL is currently enabled? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 15:52:49 -0000 On 28 July 2011 17:21, Jeremy Chadwick wrote: > On Thu, Jul 28, 2011 at 04:23:55PM +0200, Ivan Voras wrote: >> On 28 July 2011 16:05, Andriy Gapon wrote: >> > on 28/07/2011 16:35 Ivan Voras said the following: >> >> On 28 July 2011 15:00, Andriy Gapon wrote: >> >>> on 28/07/2011 15:48 Ivan Voras said the following: >> >> >> >>>> >From the various csup dates I have on the servers it looks like it's >> >>>> been removed somewhere between April and now, possibly with ZFS 28 >> >>>> MFC? >> >>> >> >>> http://www.mail-archive.com/freebsd-stable@freebsd.org/msg114251.html >> >>> >> >>>> I.e. this code is missing: >> >> >> >> I don't suppose that complaining about the removal of useful code will >> >> do any good? >> > >> > The question is obviously not directed to me? :-) >> >> No, it was a question to the ZFS cabal :) We'll see if it remains rhetorical :) > > What about this? > > http://blog.tschokko.de/archives/786 > http://milek.blogspot.com/2010/05/zfs-synchronous-vs-asynchronous-io.html Hey, that's great! I didn't know about it - the sync property is good enough for me! (even better as it is per-fs) I've just enabled it where I need it and I can see the difference. From owner-freebsd-fs@FreeBSD.ORG Fri Jul 29 05:55:05 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 658C2106566C; Fri, 29 Jul 2011 05:55:05 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 35E378FC0C; Fri, 29 Jul 2011 05:55:05 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p6T5t5Ii093912; Fri, 29 Jul 2011 05:55:05 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p6T5t5tb093908; Fri, 29 Jul 2011 05:55:05 GMT (envelope-from linimon) Date: Fri, 29 Jul 2011 05:55:05 GMT Message-Id: <201107290555.p6T5t5tb093908@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/159251: [zfs] [request]: add FLETCHER4 as DEDUP hash option X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Jul 2011 05:55:05 -0000 Old Synopsis: FEATURE REQUEST: add FLETCHER4 as DEDUP hash option New Synopsis: [zfs] [request]: add FLETCHER4 as DEDUP hash option Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Fri Jul 29 05:54:25 UTC 2011 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=159251