From owner-freebsd-fs@FreeBSD.ORG  Sun Sep 26 10:39:30 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A13781065670
	for <freebsd-fs@freebsd.org>; Sun, 26 Sep 2010 10:39:30 +0000 (UTC)
	(envelope-from pjd@garage.freebsd.pl)
Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60])
	by mx1.freebsd.org (Postfix) with ESMTP id 4E8918FC21
	for <freebsd-fs@freebsd.org>; Sun, 26 Sep 2010 10:39:24 +0000 (UTC)
Received: by mail.garage.freebsd.pl (Postfix, from userid 65534)
	id E827D45C98; Sun, 26 Sep 2010 12:39:21 +0200 (CEST)
Received: from localhost (chello089077043238.chello.pl [89.77.43.238])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.garage.freebsd.pl (Postfix) with ESMTP id 1D10945C99;
	Sun, 26 Sep 2010 12:39:17 +0200 (CEST)
Date: Sun, 26 Sep 2010 12:38:56 +0200
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Mikolaj Golub <to.my.trociny@gmail.com>
Message-ID: <20100926103856.GI47356@garage.freebsd.pl>
References: <86mxr7x0ih.fsf@kopusha.home.net>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="1X+6QtwRodzgDPAC"
Content-Disposition: inline
In-Reply-To: <86mxr7x0ih.fsf@kopusha.home.net>
User-Agent: Mutt/1.4.2.3i
X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc
X-OS: FreeBSD 9.0-CURRENT amd64
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on 
	mail.garage.freebsd.pl
X-Spam-Level: 
X-Spam-Status: No, score=-0.6 required=4.5 tests=BAYES_00,RCVD_IN_SORBS_DUL 
	autolearn=no version=3.0.4
Cc: freebsd-fs@freebsd.org
Subject: Re: hastd: memory leaks if fork() fails
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Sep 2010 10:39:30 -0000


--1X+6QtwRodzgDPAC
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Sep 24, 2010 at 06:51:02AM +0300, Mikolaj Golub wrote:
> Hi,
> 
> Although it is rather unlikely situation but anyway :-)
> 
> If fork() fails in hook_execv() hastd leaks some bytes referred by hp. See the
> attached patch.

Committed, thanks!

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

--1X+6QtwRodzgDPAC
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)

iEYEARECAAYFAkyfIsAACgkQForvXbEpPzS4lACeMynn016Q0+1nGpUtgA6j1sKe
aUUAn3C+MQo2jk8fafYvIy64F9SR1xZ2
=BDnp
-----END PGP SIGNATURE-----

--1X+6QtwRodzgDPAC--

From owner-freebsd-fs@FreeBSD.ORG  Sun Sep 26 21:19:35 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5461E106566C;
	Sun, 26 Sep 2010 21:19:35 +0000 (UTC)
	(envelope-from linimon@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 2B5088FC14;
	Sun, 26 Sep 2010 21:19:35 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o8QLJZNJ019219;
	Sun, 26 Sep 2010 21:19:35 GMT
	(envelope-from linimon@freefall.freebsd.org)
Received: (from linimon@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o8QLJZ9W019215;
	Sun, 26 Sep 2010 21:19:35 GMT (envelope-from linimon)
Date: Sun, 26 Sep 2010 21:19:35 GMT
Message-Id: <201009262119.o8QLJZ9W019215@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: linimon@FreeBSD.org
Cc: 
Subject: Re: kern/150910: [nfs] wsize=16384 on udp nfs mount unusable
X-List-Received-Date: Sun, 26 Sep 2010 21:19:35 -0000

Old Synopsis: wsize=16384 on udp nfs mount unusable
New Synopsis: [nfs] wsize=16384 on udp nfs mount unusable

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Sun Sep 26 21:18:58 UTC 2010
Responsible-Changed-Why: 
reclassify.

http://www.freebsd.org/cgi/query-pr.cgi?pr=150910

From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 27 00:05:26 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 59C4F106566C
	for <freebsd-fs@freebsd.org>; Mon, 27 Sep 2010 00:05:26 +0000 (UTC)
	(envelope-from fbsd@dannysplace.net)
Received: from mailgw.dannysplace.net (mailgw.dannysplace.net [204.109.56.184])
	by mx1.freebsd.org (Postfix) with ESMTP id 21EEF8FC14
	for <freebsd-fs@freebsd.org>; Mon, 27 Sep 2010 00:05:25 +0000 (UTC)
Received: from [203.206.171.212] (helo=[192.168.10.10])
	by mailgw.dannysplace.net with esmtpsa (TLSv1:CAMELLIA256-SHA:256)
	(Exim 4.72 (FreeBSD)) (envelope-from <fbsd@dannysplace.net>)
	id 1P01E6-00035b-No; Mon, 27 Sep 2010 10:05:53 +1000
Message-ID: <4C9FDFBC.8030406@dannysplace.net>
Date: Mon, 27 Sep 2010 10:05:16 +1000
From: Danny Carroll <fbsd@dannysplace.net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB;
	rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Martin Simmons <martin@lispworks.com>
References: <4C9AC1F6.90305@dannysplace.net>
	<201009231340.o8NDeNl5017806@higson.cam.lispworks.com>
In-Reply-To: <201009231340.o8NDeNl5017806@higson.cam.lispworks.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Authenticated-User: danny
X-Authenticator: plain
X-Exim-Version: 4.72 (build at 12-Jul-2010 18:31:29)
X-Date: 2010-09-27 10:05:51
X-Connected-IP: 203.206.171.212:55109
X-Message-Linecount: 32
X-Body-Linecount: 17
X-Message-Size: 1299
X-Body-Size: 571
X-Received-Count: 1
X-Recipient-Count: 2
X-Local-Recipient-Count: 2
X-Local-Recipient-Defer-Count: 0
X-Local-Recipient-Fail-Count: 0
X-SA-Exim-Connect-IP: 203.206.171.212
X-SA-Exim-Mail-From: fbsd@dannysplace.net
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	damka.dannysplace.net
X-Spam-Level: 
X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00
	autolearn=ham version=3.3.1
X-SA-Exim-Version: 4.2
X-SA-Exim-Scanned: Yes (on mailgw.dannysplace.net)
Cc: freebsd-fs@freebsd.org
Subject: Re: Devices disappeared after drive shuffle - or - how to recover
 and mount a slice with UFS partitions.
Reply-To: fbsd@dannysplace.net
X-List-Received-Date: Mon, 27 Sep 2010 00:05:26 -0000

 On 23/09/2010 11:40 PM, Martin Simmons wrote:
>>>>>> On Thu, 23 Sep 2010 12:56:54 +1000, Danny Carroll said:
>> My only real question is.   Why did the devices fail to be created in
>> /dev from the original disk?
> See if the partitions are listed in the output of:
>
> sysctl -b kern.geom.conftxt
>
> If not, then it looks like a kernel/geom problem.
>

Thanks for the tip. Unfortunately I've already wiped the slices and
re-partitioned with gpart. Gpart seems to be OK.
I'm not too worried about it all. I just thought it was interesting
enough to share.
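
The check Martin suggests above could be scripted roughly as follows. This is only a sketch: it assumes each line of the conftxt output has the form "<depth> <class> <name> ...", the helper name geom_provider_names is made up for the example, and the sample text is hypothetical rather than output from the machine discussed here.

```python
def geom_provider_names(conftxt):
    """Return the provider names found in kern.geom.conftxt-style text.

    Assumes each non-empty line looks like '<depth> <class> <name> ...'.
    """
    names = []
    for line in conftxt.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Third whitespace-separated field is taken to be the name.
            names.append(fields[2])
    return names


# Hypothetical sample, not output from the machine discussed in this thread.
SAMPLE = """\
0 DISK ad4 500107862016 512 hd 16 sc 63
1 PART ad4s1 500105217024 512 i 1 o 32256 ty freebsd xs MBR xt 165
2 PART ad4s1a 1073741824 512 i 1 o 0 ty freebsd-ufs xs BSD xt 7
"""

names = geom_provider_names(SAMPLE)
print("ad4s1a" in names)  # partition is known to GEOM
print("ad4s1d" in names)  # partition is missing from the listing
```

On a live system the text would come from the command quoted above, for example via subprocess.check_output(["sysctl", "-b", "kern.geom.conftxt"], text=True). A partition that is absent from the list would point at the kernel/geom side, as Martin says.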

-D

From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 27 01:37:43 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0770B106564A
	for <freebsd-fs@freebsd.org>; Mon, 27 Sep 2010 01:37:43 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id B8B168FC18
	for <freebsd-fs@freebsd.org>; Mon, 27 Sep 2010 01:37:42 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEAJKSn0yDaFvO/2dsb2JhbACDG6ALtVWRSYEigy50BIo6
X-IronPort-AV: E=Sophos;i="4.57,240,1283745600"; d="scan'208";a="95180231"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 26 Sep 2010 21:37:41 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id D52E6B3F36;
	Sun, 26 Sep 2010 21:37:41 -0400 (EDT)
Date: Sun, 26 Sep 2010 21:37:41 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: "Sam Fourman Jr." <sfourman@gmail.com>
Message-ID: <1735156067.135783.1285551461797.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <AANLkTim2fgvKdn3yWdyGSPGG1_KEATradDmTyzMiskSa@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [24.65.230.102]
X-Mailer: Zimbra 6.0.7_GA_2476.RHEL4 (ZimbraWebClient - SAF3
	(Mac)/6.0.7_GA_2473.RHEL4_64)
Cc: freebsd-fs@freebsd.org
Subject: Re: NFSRoot pxe available disk space
X-List-Received-Date: Mon, 27 Sep 2010 01:37:43 -0000

> I am running FreeBSD 9 amd64 via pxe NFSRoot
> I get a negative value for my available space on /
> 
> my FreeBSD NFS server is also running FreeBSD 9 built from today's SVN
> sources
> this did not happen in FreeBSD 8.1
> 
> 
> Sam# uname -a
> FreeBSD Sam.PuffyBSD.Com 9.0-CURRENT FreeBSD 9.0-CURRENT #2: Thu Sep 23 18:24:25 CDT 2010 root@FNFS.PuffyBSD.Com:/usr/obj/usr/src/sys/WORKSTATION amd64
> 
> 
> Sam# df -h
> Filesystem                                        Size   Used  Avail Capacity    Mounted on
> 192.168.8.10:/Network/pxe/FreeBSD_AMD64_CURRENT  -807G    28G  -834G      -3%    /
> devfs                                             1.0K   1.0K     0B     100%    /dev
> tmpfs                                             218M   4.0K   218M       0%    /tmp
> linprocfs                                         4.0K   4.0K     0B     100%    /compat/linux/proc
> 192.168.8.10:/Network/distfiles                   1.2T    26G   1.2T       2%    /usr/ports/distfiles
> 192.168.8.10:/Network/tv                          1.5T   373G   1.2T      24%    /Network/tv
> 192.168.8.10:/Network/iso                         1.5T   277G   1.2T      19%    /Network/iso
> 192.168.8.10:/Network/wow                         1.2T    35G   1.2T       3%    /Network/wow
> 192.168.8.10:/Network/music                       1.3T   106G   1.2T       8%    /Network/music
> 192.168.8.10:/Network/public                      1.3T   148G   1.2T      11%    /Network/public
> 192.168.8.10:/Network/pxe/FreeBSD_i386_8_1        1.2T   4.4G   1.2T       0%    /compat/FreeBSD-i386
> 192.168.8.10:/Network/home/sfourman               1.2T    33G   1.2T       3%    /usr/home/sfourman
> 
> 
I did send out a heads up when I committed the change. If you replace
the kernel but not pxeboot (with one built from recent sources), the
default reverts to NFSv2. Either add "nfsv3" as an option on the "/"
line in /etc/fstab for the root fs on the NFS server, or replace
"pxeboot" with one built from recent sources.
I think this is probably what is causing it.

rick
ps: NFSv2 only used 32-bit numbers for the sizes, so an overflow
    can easily happen.
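
The wraparound mentioned in the ps can be sketched with a little arithmetic. The snippet below is an illustration only, not the NFS client code: it assumes the size travels as a count of 512-byte blocks in a 32-bit field that ends up being read as a signed value. The block size, the helper names and the 1241 GiB input are all assumptions chosen for the example (the input was picked so that the result lines up with the -807G shown by df above; it is not a measured value).

```python
import struct

BLOCK_SIZE = 512  # assumed block size, not stated in the thread


def as_signed_32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


def reported_bytes(real_bytes, block_size=BLOCK_SIZE):
    """Size a client would compute if the block count wrapped at 32 bits."""
    blocks = real_bytes // block_size
    return as_signed_32(blocks) * block_size


GIB = 1024 ** 3

# A file system below the wrap point is reported unchanged.
print(reported_bytes(500 * GIB) // GIB)    # 500
# Past 2**31 blocks (1 TiB at 512 bytes per block) the value goes negative.
print(reported_bytes(1241 * GIB) // GIB)   # -807
```

Under these assumptions any file system larger than 1 TiB would show a negative size, which is consistent with the roughly 1.2T exports in the listing, although the thread itself does not confirm the exact mechanism.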


From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 27 11:06:54 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9AA6C106564A
	for <freebsd-fs@FreeBSD.org>; Mon, 27 Sep 2010 11:06:54 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 6571F8FC1D
	for <freebsd-fs@FreeBSD.org>; Mon, 27 Sep 2010 11:06:54 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o8RB6sOK023458
	for <freebsd-fs@FreeBSD.org>; Mon, 27 Sep 2010 11:06:54 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o8RB6r2Y023454
	for freebsd-fs@FreeBSD.org; Mon, 27 Sep 2010 11:06:53 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 27 Sep 2010 11:06:53 GMT
Message-Id: <201009271106.o8RB6r2Y023454@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-fs@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-List-Received-Date: Mon, 27 Sep 2010 11:06:54 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/150910  fs         [nfs] wsize=16384 on udp nfs mount unusable
o kern/150796  fs         [panic] [suj] [ufs] [softupdates] Panic on portbuild
o kern/150503  fs         [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs         [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs         [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs         [nfs] mountd/nfsd became confused; refused to reload n
o kern/150207  fs         zpool(1): zpool import -d /dev tries to open weird dev
o kern/149855  fs         [gvinum] growfs causes fsck to report errors in Filesy
o kern/149495  fs         [zfs] chflags sappend on zfs not working right
o kern/149173  fs         [patch] [zfs] make OpenSolaris <sys/nvpair.h> installa
o kern/149022  fs         [hang] File system operations hangs with suspfs state
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities 
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o bin/148296   fs         [zfs] [loader] [patch] Very slow probe in /usr/src/sys
o kern/148204  fs         [nfs] UDP NFS causes overload
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147790  fs         [zfs] zfs set acl(mode|inherit) fails on existing zfs
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt 
o kern/147292  fs         [nfs] [patch] readahead missing in nfs client options 
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
o kern/146375  fs         [nfs] [patch] Typos in macro variables names in sys/fs
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an 
o bin/145309   fs         bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on 
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
o kern/144458  fs         [nfs] [patch] nfsd fails as a kld
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code 
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o kern/143345  fs         [ext2fs] [patch] extfs minor header cleanups to better
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142924  fs         [ext2fs] [patch] Small cleanup for the inode struct in
o kern/142914  fs         [zfs] ZFS performance degradation over time
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142401  fs         [ntfs] [patch] Minor updates to NTFS from NetBSD
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two 
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140134  fs         [msdosfs] write and fsck destroy filesystem integrity
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
o bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs         [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs         [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/139363  fs         [nfs] diskless root nfs mount from non FreeBSD server 
o kern/138790  fs         [zfs] ZFS ceases caching when mem demand is high
o kern/138662  fs         [panic] ffs_blkfree: freeing free block
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
f kern/137037  fs         [zfs] [hang] zfs rollback on root causes FreeBSD to fr
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
o kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135667  fs         [lor] LORs causing ufs filesystem corruption on XEN Do
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133614  fs         [panic] panic: ffs_truncate: read-only filesystem
o kern/133174  fs         [msdosfs] [patch] msdosfs must support utf-encoded int
f kern/133150  fs         [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor"  on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs         [nullfs] Error by check nullfs
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: 
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/129059  fs         [zfs] [patch] ZFS bootloader whitelistable via WITHOUT
f kern/128829  fs         smbd(8) causes periodic panic on 7-RELEASE
o kern/127787  fs         [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270   fs         fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs         [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
p kern/124621  fs         [ext3] [patch] Cannot mount ext2fs partition
f bin/124424   fs         [zfs] zfs(8): zfs list -r shows strange snapshots' siz
o kern/123939  fs         [msdosfs] corrupts new files
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121779   fs         [ufs] snapinfo(8) (and related tools?) only work for t
o bin/121366   fs         [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
f kern/120991  fs         [panic] [fs] [snapshot] System crashes when manipulati
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
f kern/119735  fs         [zfs] geli + ZFS + samba starting on boot panics 7.0-B
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs         [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs         [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs         lack of fsck_cd9660 prevents mounting iso images with 
p kern/116608  fs         [msdosfs] [patch] msdosfs fails to check mount options
o kern/116583  fs         [ffs] [hang] System freezes for short time when using 
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o kern/115645  fs         [snapshots] [panic] lockmgr: thread 0xc4c00d80, not ex
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala 
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show 
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created 
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs         [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat
o kern/109010  fs         [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs         [ufs] left-over fsck_snapshot after unfinished backgro
o kern/106030  fs         [ufs] [panic] panic in ufs from geom when a dead disk 
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear 
o kern/101324  fs         [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs         [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs         [request] newfs(8) has no option to clear the first 12
o kern/97377   fs         [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs         [iso9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs         [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs         fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs         [smbfs] smbfs may cause double unlock
o bin/94635    fs         snapinfo(8)/libufs only works for disk-backed filesyst
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
f kern/91568   fs         [ufs] [panic] writing to UFS/softupdates DVD media in 
o kern/91134   fs         [smbfs] [patch] Preserve access and modification time 
a kern/90815   fs         [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs         [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs         [panic] ffs_blkfree: freeing free frag on AMD 64
o kern/88266   fs         [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o bin/87966    fs         [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs         [smbfs] System reboot while umount smbfs.
o kern/86587   fs         [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs         fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/85326   fs         [smbfs] [panic] saving a file via samba to an overquot
o kern/84589   fs         [2TB] 5.4-STABLE unresponsive during background fsck 2
o kern/80088   fs         [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs         Background-fsck checks one filesystem twice and omits 
o kern/73484   fs         [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs         [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs         [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs         fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs         [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs         [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs         [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs         [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs         [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs         [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs         [smbfs] System reboot with dead smb mount and umount
o kern/33464   fs         [ufs] soft update inconsistencies after system crash
o bin/27687    fs         fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs         [2TB] 32bit NFS servers export wrong negative values t

207 problems total.


From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 27 14:24:49 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F18F6106564A
	for <freebsd-fs@freebsd.org>; Mon, 27 Sep 2010 14:24:49 +0000 (UTC)
	(envelope-from cal@linu.gs)
Received: from mail.adm.hostpoint.ch (mail.adm.hostpoint.ch [217.26.48.124])
	by mx1.freebsd.org (Postfix) with ESMTP id B2CF28FC14
	for <freebsd-fs@freebsd.org>; Mon, 27 Sep 2010 14:24:49 +0000 (UTC)
Received: from [77.109.131.203] (port=39421 helo=aare.localnet)
	by mail.adm.hostpoint.ch with esmtpsa (TLSv1:AES256-SHA:256)
	(Exim 4.69 (FreeBSD)) (envelope-from <cal@linu.gs>)
	id 1P0EdL-0006oU-3B
	for freebsd-fs@freebsd.org; Mon, 27 Sep 2010 16:24:48 +0200
From: Michael Naef <cal@linu.gs>
To: "freebsd-fs" <freebsd-fs@freebsd.org>
Date: Mon, 27 Sep 2010 16:24:42 +0200
User-Agent: KMail/1.13.5 (Linux/2.6.34-gentoo-r1; KDE/4.4.5; i686; ; )
References: <201009231938.09548.cal@linu.gs>
	<66757A1E-E445-4AAD-8F57-382D85BFD579@hostpoint.ch>
In-Reply-To: <66757A1E-E445-4AAD-8F57-382D85BFD579@hostpoint.ch>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201009271624.46655.cal@linu.gs>
Subject: Re: Strange behaviour with sappend flag set on ZFS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Sep 2010 14:24:50 -0000

Hi all

On Friday 24 September 2010 01:15:55 Markus Gebert wrote:

> CURRENT and STABLE-8 seem to be affected too. The following patch
> seems to fix it (at least Michi's test case works fine with
> it):
> 
> ----
> diff -ru ../src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c ./sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> --- ../src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c   2010-05-19 08:49:52.000000000 +0200
> +++ ./sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c   2010-09-23 23:24:43.549846948 +0200
> @@ -709,7 +709,7 @@
>          */
>         pflags = zp->z_phys->zp_flags;
>         if ((pflags & (ZFS_IMMUTABLE | ZFS_READONLY)) ||
> -           ((pflags & ZFS_APPENDONLY) && !(ioflag & FAPPEND) &&
> +           ((pflags & ZFS_APPENDONLY) && !(ioflag & IO_APPEND) &&
>             (uio->uio_loffset < zp->z_phys->zp_size))) {
>                 ZFS_EXIT(zfsvfs);
>                 return (EPERM);
> ----
> 
> Can someone commit this if the patch is ok? Or should I (or
> Michi) open a PR?

What's the next step? Is anybody willing and able to commit the
patch, or should I open a PR? (I have a patch for bash that
solves the most urgent problem, but I still need a decision.)

cheers and thanks, Michi

From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 27 16:12:38 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7D038106564A
	for <freebsd-fs@freebsd.org>; Mon, 27 Sep 2010 16:12:38 +0000 (UTC)
	(envelope-from gljennjohn@googlemail.com)
Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com
	[209.85.161.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 0429A8FC14
	for <freebsd-fs@freebsd.org>; Mon, 27 Sep 2010 16:12:37 +0000 (UTC)
Received: by fxm9 with SMTP id 9so3746236fxm.13
	for <freebsd-fs@freebsd.org>; Mon, 27 Sep 2010 09:12:36 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=googlemail.com; s=gamma;
	h=domainkey-signature:received:received:date:from:to:cc:subject
	:message-id:in-reply-to:references:reply-to:x-mailer:mime-version
	:content-type:content-transfer-encoding;
	bh=2fKP+WSswqpLH4tDTFPmeuAveSr1kuZ2qfc/AGE3GVw=;
	b=dYGLrL4i2ByURrP5aKgCvocADISbCpTISQuS4n5iE5ybPDdmTtTrUCeRy0k/JRU/6G
	Av6FXjj+i1hQwv93SRQ7p5P6vHOiloD5BKNsMUbJRGpwZ1leGN/BaACi8ZWuyi3sVGpk
	xwHCiNaMf5yMzoJnPxWNXiL8hBX+PEuSVDRD0=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma;
	h=date:from:to:cc:subject:message-id:in-reply-to:references:reply-to
	:x-mailer:mime-version:content-type:content-transfer-encoding;
	b=wVurB1pNVV8K9bcbX4Yr4wDHA/yMnVLT2C5sVXS+j4R2PZm6D5OKy00ij9swtou9mo
	/CTY89a6uYd6tSd7AzhNWl9ETpVodGBAwmYTeLlG9JZAnFWyoGYb8WDRZuJemgMrvGzl
	+JCs/jLenkHaplfVzK8c5q5StIdrjsvCjsrxs=
Received: by 10.223.120.72 with SMTP id c8mr3500278far.65.1285603956860;
	Mon, 27 Sep 2010 09:12:36 -0700 (PDT)
Received: from ernst.jennejohn.org (p578E3B24.dip.t-dialin.net [87.142.59.36])
	by mx.google.com with ESMTPS id r8sm2507997faq.34.2010.09.27.09.12.35
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Mon, 27 Sep 2010 09:12:35 -0700 (PDT)
Date: Mon, 27 Sep 2010 18:12:33 +0200
From: Gary Jennejohn <gljennjohn@googlemail.com>
To: Michael Naef <cal@linu.gs>
Message-ID: <20100927181233.0e8c2869@ernst.jennejohn.org>
In-Reply-To: <201009271624.46655.cal@linu.gs>
References: <201009231938.09548.cal@linu.gs>
	<66757A1E-E445-4AAD-8F57-382D85BFD579@hostpoint.ch>
	<201009271624.46655.cal@linu.gs>
X-Mailer: Claws Mail 3.7.6 (GTK+ 2.18.7; amd64-portbld-freebsd9.0)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: Strange behaviour with sappend flag set on ZFS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: gljennjohn@googlemail.com
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Sep 2010 16:12:38 -0000

On Mon, 27 Sep 2010 16:24:42 +0200
Michael Naef <cal@linu.gs> wrote:

> Hi all
> 
> On Friday 24 September 2010 01:15:55 Markus Gebert wrote:
> 
> > CURRENT and STABLE-8 seem to be affected too. The following patch
> > seems to fix it (at least Michi's test case works fine with
> > it):
> > 
> > ----
> > diff -ru ../src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c ./sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> > --- ../src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c   2010-05-19 08:49:52.000000000 +0200
> > +++ ./sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c   2010-09-23 23:24:43.549846948 +0200
> > @@ -709,7 +709,7 @@
> >          */
> >         pflags = zp->z_phys->zp_flags;
> >         if ((pflags & (ZFS_IMMUTABLE | ZFS_READONLY)) ||
> > -           ((pflags & ZFS_APPENDONLY) && !(ioflag & FAPPEND) &&
> > +           ((pflags & ZFS_APPENDONLY) && !(ioflag & IO_APPEND) &&
> >             (uio->uio_loffset < zp->z_phys->zp_size))) {
> >                 ZFS_EXIT(zfsvfs);
> >                 return (EPERM);
> > ----
> > 
> > Can someone commit this if the patch is ok? Or should I (or
> > Michi) open a PR?
> 
> What's the next step? Is anybody willing and able to commit the
> patch, or should I open a PR? (I have a patch for bash that
> solves the most urgent problem, but I still need a decision.)
> 

Sending a PR is always a good idea - it ends up in the tracking
system and doesn't get lost in the mailing-list noise.

--
Gary Jennejohn

From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 27 16:28:38 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CED3D1065673
	for <freebsd-fs@freebsd.org>; Mon, 27 Sep 2010 16:28:38 +0000 (UTC)
	(envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 0AF9C8FC08
	for <freebsd-fs@freebsd.org>; Mon, 27 Sep 2010 16:28:37 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA13048;
	Mon, 27 Sep 2010 19:28:31 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4CA0C62E.4080809@icyb.net.ua>
Date: Mon, 27 Sep 2010 19:28:30 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: gljennjohn@googlemail.com
References: <201009231938.09548.cal@linu.gs>	<66757A1E-E445-4AAD-8F57-382D85BFD579@hostpoint.ch>	<201009271624.46655.cal@linu.gs>
	<20100927181233.0e8c2869@ernst.jennejohn.org>
In-Reply-To: <20100927181233.0e8c2869@ernst.jennejohn.org>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: Strange behaviour with sappend flag set on ZFS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Sep 2010 16:28:38 -0000

on 27/09/2010 19:12 Gary Jennejohn said the following:
> 
> Sending a PR is always a good idea - it ends up in the tracking
> system and doesn't get lost in the mailing-list noise.

Yeah, it just gets lost in the PR database instead :)
While the thread is active there is some hope that someone would get hooked.

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 27 21:23:11 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 96212106566B
	for <freebsd-fs@freebsd.org>; Mon, 27 Sep 2010 21:23:11 +0000 (UTC)
	(envelope-from jhellenthal@gmail.com)
Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 53C008FC22
	for <freebsd-fs@freebsd.org>; Mon, 27 Sep 2010 21:23:11 +0000 (UTC)
Received: by iwn34 with SMTP id 34so6388153iwn.13
	for <freebsd-fs@freebsd.org>; Mon, 27 Sep 2010 14:23:10 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:sender:message-id:date:from
	:user-agent:mime-version:to:cc:subject:references:in-reply-to
	:x-enigmail-version:content-type:content-transfer-encoding;
	bh=EA9NdXItbk6sWNSzraOx0Y1AfiVVwznnxFf7JqcqZM0=;
	b=uOf5zcH4WsmULiD5E+nt6ZqEhgvhh+3o8PEUnatIGcAvAzrvJ21mh05x0ZRrm8zTnQ
	UrZ3SALquDUsu5fHhdsk67ZMsKDXhp9F2t5GvKmvLUsW1mONfz7aRNTTi6m2gDcXtzRF
	DNzifbXCcEpZbwO1nmTczMOb7o0T9X1VpmV28=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject
	:references:in-reply-to:x-enigmail-version:content-type
	:content-transfer-encoding;
	b=FmlEMClIUAsFqkQgUofuwNkgTD+U86jeZmHWn7XI+M6ilAElue45nBdWCjctLV7SpH
	8LNgKsMbl92CK/hNdIrDHinUoGSI6tGWHBv99SSDaQ3uk+hdKb3EkL/SytAqYw9dWOSt
	LWc1mXiWXnnBpY47YXOxuaXaP2gPKJKwnoB0s=
Received: by 10.231.33.203 with SMTP id i11mr8914079ibd.8.1285622590414;
	Mon, 27 Sep 2010 14:23:10 -0700 (PDT)
Received: from centel.dataix.local
	(adsl-99-181-158-110.dsl.klmzmi.sbcglobal.net [99.181.158.110])
	by mx.google.com with ESMTPS id i6sm6655851iba.20.2010.09.27.14.23.08
	(version=SSLv3 cipher=RC4-MD5); Mon, 27 Sep 2010 14:23:08 -0700 (PDT)
Sender: "J. Hellenthal" <jhellenthal@gmail.com>
Message-ID: <4CA10B3A.10401@DataIX.net>
Date: Mon, 27 Sep 2010 17:23:06 -0400
From: jhell <jhell@DataIX.net>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US;
	rv:1.9.2.9) Gecko/20100917 Lightning/1.0b1 Thunderbird
MIME-Version: 1.0
To: FreeBSD Filesystems <freebsd-fs@freebsd.org>
References: <201009231938.09548.cal@linu.gs>	<66757A1E-E445-4AAD-8F57-382D85BFD579@hostpoint.ch>	<201009271624.46655.cal@linu.gs>
	<20100927181233.0e8c2869@ernst.jennejohn.org>
	<4CA0C62E.4080809@icyb.net.ua>
In-Reply-To: <4CA0C62E.4080809@icyb.net.ua>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Andriy Gapon <avg@icyb.net.ua>
Subject: Re: Strange behaviour with sappend flag set on ZFS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Sep 2010 21:23:11 -0000

On 09/27/2010 12:28, Andriy Gapon wrote:
> on 27/09/2010 19:12 Gary Jennejohn said the following:
>>
>> Sending a PR is always a good idea - it ends up in the tracking
>> system and doesn't get lost in the mailing-list noise.
> 
> Yeah, it just gets lost in the PR database instead :)
> While the thread is active there is some hope that someone would get hooked.
> 

Yeah! ;)

On the same note, hopes and wishes high, it would be really nice if all
userland flags worked. Specifically arch, opaque, uappend, uchg,
uunlink. But like I said, this is just a wish and more a convenience than
anything else. If I had to put something at the top of the list, it would
be opaque and then uchg.


Regards,

-- 

 jhell,v

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 02:38:15 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E21DD106566B
	for <freebsd-fs@freebsd.org>; Tue, 28 Sep 2010 02:38:15 +0000 (UTC)
	(envelope-from bsdunix44@gmail.com)
Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com
	[209.85.213.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 9B9E38FC0C
	for <freebsd-fs@freebsd.org>; Tue, 28 Sep 2010 02:38:15 +0000 (UTC)
Received: by yxn35 with SMTP id 35so2200909yxn.13
	for <freebsd-fs@freebsd.org>; Mon, 27 Sep 2010 19:38:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:from:to
	:content-type:content-transfer-encoding:mime-version:subject:date
	:x-mailer; bh=yjWxlVVcQdNyYIImg2i8flXoTwAmx9ObTJSDPT6IMJ8=;
	b=dkndVuG5ykpA3orIu/+nkdaBYJZWjoyTSQ40i7qlI7JcTbgU5DnsSwUNYl1va87VKL
	eBO7wMJiUf9Rr83whdlmdJozKuiuaQRwzwmCFWTvYlnKxyaGS0jobQdN/aNTM0BXD+m9
	aIF0aVUN78pDEK11NTrMESDBpuvPWgwotZvHM=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=message-id:from:to:content-type:content-transfer-encoding
	:mime-version:subject:date:x-mailer;
	b=ga6zFZikABErEfuTkuiLbua12xacL2F5L7lm6j3mGy58Rr9M/gBugZarr+pYTZn0FQ
	JJH0i92RpqwiwSweInhBaLH6QWhmKViiLxiF6GakLsmF3jOcFPEmBYvezX4Flw7LG+ms
	vXlXX1pxG1U4tCnNZrj3msSw+PMcrWJb6rUV8=
Received: by 10.150.12.9 with SMTP id 9mr10106824ybl.213.1285641492895;
	Mon, 27 Sep 2010 19:38:12 -0700 (PDT)
Received: from [192.168.1.4] (ip98-164-15-137.ks.ks.cox.net [98.164.15.137])
	by mx.google.com with ESMTPS id u42sm9966157yba.12.2010.09.27.19.38.11
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Mon, 27 Sep 2010 19:38:11 -0700 (PDT)
Message-Id: <246F9240-10FE-4BD6-B72B-D374F9BB1FC9@gmail.com>
From: Chris Watson <bsdunix44@gmail.com>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (Apple Message framework v936)
Date: Mon, 27 Sep 2010 21:38:04 -0500
X-Mailer: Apple Mail (2.936)
Subject: zdb and zpool status inconsistency question...
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 02:38:16 -0000

Apologies if this is common knowledge but I am confused about the
output of zdb and zpool status. Running a:

priyanka# zdb data
     version=14
     name='data'
     state=0
     txg=23
     pool_guid=7697236283104447800
     hostid=1421614680
     hostname='priyanka.open-systems.net'
     vdev_tree
         type='root'
         id=0
         guid=7697236283104447800
         children[0]
                 type='mirror'
                 id=0
                 guid=13989036133163076272
                 metaslab_array=26
                 metaslab_shift=33
                 ashift=9
                 asize=1000199946240
                 is_log=0
                 children[0]
                         type='disk'
                         id=0
                         guid=15173803910329500054
                         path='/dev/ada2'
                         whole_disk=0
                 children[1]
                         type='disk'
                         id=1
                         guid=17277025077506889808
                         path='/dev/ada3'
                         whole_disk=0
         children[1]
                 type='mirror'
                 id=1
                 guid=5773672864445772603
                 metaslab_array=23
                 metaslab_shift=33
                 ashift=9
                 asize=1000199946240
                 is_log=0
                 children[0]
                         type='disk'
                         id=0
                         guid=2441189965306101196
                         path='/dev/ada4'
                         whole_disk=0
                 children[1]
                         type='disk'
                         id=1
                         guid=6210476332908709518
                         path='/dev/ada5'
                         whole_disk=0
Uberblock

	magic = 0000000000bab10c
	version = 14
	txg = 11387
	guid_sum = 13222208345635842403
	timestamp = 1285637267 UTC = Mon Sep 27 20:27:47 2010

Dataset mos [META], ID 0, cr_txg 4, 1.06M, 44 objects
Dataset data/Aperture [ZPL], ID 31, cr_txg 37, 47.9G, 17596 objects
Dataset data [ZPL], ID 16, cr_txg 1, 19.0K, 5 objects
[...]
                             capacity   operations   bandwidth  ---- errors ----
description                used avail  read write  read write  read write cksum
data                      47.9G 1.77T   292     0 32.6M     0     0     0     2
   mirror                  23.9G  904G   146     0 16.3M     0     0     0     6
     /dev/ada2                           141     0 16.5M     0     0     0     6
     /dev/ada3                           141     0 16.5M     0     0     0     6
   mirror                  23.9G  904G   146     0 16.3M     0     0     0     2
     /dev/ada4                           141     0 16.5M     0     0     0     2
     /dev/ada5                           141     0 16.5M     0     0     0     2
priyanka#

produces cksum errors of 2,6,6,6,2,2,2 respectively.

While a:

priyanka# zpool status -v data
   pool: data
  state: ONLINE
  scrub: scrub completed after 0h6m with 0 errors on Mon Sep 27 20:22:10 2010
config:

	NAME        STATE     READ WRITE CKSUM
	data        ONLINE       0     0     0
	  mirror    ONLINE       0     0     0
	    ada2    ONLINE       0     0     0
	    ada3    ONLINE       0     0     0
	  mirror    ONLINE       0     0     0
	    ada4    ONLINE       0     0     0
	    ada5    ONLINE       0     0     0

errors: No known data errors
priyanka#

So the two things I don't understand are the following:

1) Why does zdb report cksum errors while zpool status does not?

2) Assuming zdb is correct, shouldn't the errors from zdb for the pool
"data" be 8 instead of 2, since the first mirror has 6 and the second
mirror has 2 cksum errors?
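
The arithmetic behind question 2 can be checked mechanically. This is
only a minimal sketch: it assumes the zdb rows have been saved with each
row on a single line and the cksum count as the last field, and the
function name is made up for illustration. It says nothing about how zdb
itself aggregates the counters.

```shell
#!/bin/sh
# Sum the cksum column (last field) of the top-level "mirror" rows
# read from stdin. Hypothetical helper, not part of zdb or zpool.
sum_mirror_cksum() {
    awk '$1 == "mirror" { total += $NF } END { print total + 0 }'
}
```

Feeding it the two mirror rows shown above (6 and 2) gives the sum that
can then be compared with the value on the "data" row.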

The zdb man page is pretty sparse. And I know it's not meant to be run  
by the average joe. I'm just trying to learn ZFS as thoroughly as I  
can. So while I have a test system I am trying many configs and  
options to learn how it works and why. And the above inconsistency  
confused me. Again apologies if this is covered elsewhere.

Thanks for any schooling about the above!

Chris


From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 11:24:38 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 622EE1065670;
	Tue, 28 Sep 2010 11:24:38 +0000 (UTC) (envelope-from wjw@digiware.nl)
Received: from mail.digiware.nl (mail.ip6.digiware.nl
	[IPv6:2001:4cb8:1:106::2])
	by mx1.freebsd.org (Postfix) with ESMTP id F003C8FC22;
	Tue, 28 Sep 2010 11:24:37 +0000 (UTC)
Received: from localhost (localhost.digiware.nl [127.0.0.1])
	by mail.digiware.nl (Postfix) with ESMTP id F0615153434;
	Tue, 28 Sep 2010 13:24:36 +0200 (CEST)
X-Virus-Scanned: amavisd-new at digiware.nl
Received: from mail.digiware.nl ([127.0.0.1])
	by localhost (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id d7i5zrXYXYnO; Tue, 28 Sep 2010 13:24:31 +0200 (CEST)
Received: from [127.0.0.1] (opteron [192.168.10.67])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.digiware.nl (Postfix) with ESMTPSA id 08986153433;
	Tue, 28 Sep 2010 13:24:31 +0200 (CEST)
Message-ID: <4CA1D06C.9050305@digiware.nl>
Date: Tue, 28 Sep 2010 13:24:28 +0200
From: Willem Jan Withagen <wjw@digiware.nl>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US;
	rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: stable@freebsd.org, fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 11:24:38 -0000

Hi,

This is with stable as of yesterday, but with an untuned ZFS box I was
still able to generate a kmem exhausted panic.
Hard panic, just 3 lines.

The box contains 12GB memory, runs on a 6-core (with HT) Xeon.
6* 2T WD black caviar in raidz2 with 2*512MB mirrored log.

The box died while rsyncing 5.8T from its partnering system.
(that was the only activity on the box)

So the obvious conclusion would be that auto-tuning for ZFS on 8.1-STABLE
is not yet quite there.

So I guess that we still need tuning advice even for 8.1,
to prevent a hard panic.

At the moment trying to 'zfs send | rsh zfs receive' the stuff.
Which seems to run at about 40Mb/sec, and is a lot faster than the rsync 
stuff.

--WjW

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 12:04:05 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 124591065673
	for <fs@freebsd.org>; Tue, 28 Sep 2010 12:04:05 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta01.westchester.pa.mail.comcast.net
	(qmta01.westchester.pa.mail.comcast.net [76.96.62.16])
	by mx1.freebsd.org (Postfix) with ESMTP id B4A3C8FC0C
	for <fs@freebsd.org>; Tue, 28 Sep 2010 12:04:04 +0000 (UTC)
Received: from omta10.westchester.pa.mail.comcast.net ([76.96.62.28])
	by qmta01.westchester.pa.mail.comcast.net with comcast
	id CAax1f0060cZkys51BqqWX; Tue, 28 Sep 2010 11:50:50 +0000
Received: from koitsu.dyndns.org ([98.248.41.155])
	by omta10.westchester.pa.mail.comcast.net with comcast
	id CBqo1f00P3LrwQ23WBqp1Z; Tue, 28 Sep 2010 11:50:50 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 5DA489B418; Tue, 28 Sep 2010 04:50:47 -0700 (PDT)
Date: Tue, 28 Sep 2010 04:50:47 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Willem Jan Withagen <wjw@digiware.nl>
Message-ID: <20100928115047.GA62142@icarus.home.lan>
References: <4CA1D06C.9050305@digiware.nl>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4CA1D06C.9050305@digiware.nl>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 12:04:05 -0000

On Tue, Sep 28, 2010 at 01:24:28PM +0200, Willem Jan Withagen wrote:
> This is with stable as of yesterday, but with an untuned ZFS box I
> was still able to generate a kmem exhausted panic.
> Hard panic, just 3 lines.
> 
> The box contains 12GB memory, runs on a 6-core (with HT) Xeon.
> 6* 2T WD black caviar in raidz2 with 2*512MB mirrored log.
> 
> The box died while rsyncing 5.8T from its partnering system.
> (that was the only activity on the box)

It would help if you could provide output from the following commands
(even after the box has rebooted):

$ sysctl -a | egrep ^vm.kmem
$ sysctl -a | egrep ^vfs.zfs.arc
$ sysctl kstat.zfs.misc.arcstats

> So the obvious conclusion would be that auto-tuning for ZFS on
> 8.1-STABLE is not yet quite there.
> 
> So I guess that we still need tuning advice even for 8.1,
> to prevent a hard panic.

Andriy Gapon provides this general recommendation:

http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/059114.html

The advice I've given for RELENG_8 (as of the time of this writing),
8.1-STABLE, and 8.1-RELEASE, is that for amd64 you'll need to tune:

vm.kmem_size
vfs.zfs.arc_max

An example machine: amd64, with 4GB physical RAM installed (3916MB
available for use (verified via dmesg)) uses values:

vm.kmem_size="4096M"
vfs.zfs.arc_max="3584M"

Another example machine: amd64, with 8GB physical RAM installed (7875MB
available for use) uses values:

vm.kmem_size="8192M"
vfs.zfs.arc_max="6144M"
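
Both are loader tunables, so lines like the above would normally go in
/boot/loader.conf. As a rough sanity check before rebooting, the sketch
below tests that the ARC ceiling stays below the kmem size, which both
example pairs above satisfy. The function names are made up for
illustration, and the check is an assumption about what a sensible pair
looks like, not an official rule.

```shell
#!/bin/sh
# Convert a loader.conf-style size such as "4096M" or "8G" to bytes.
# Plain numbers are passed through unchanged.
to_bytes() {
    case "$1" in
        *[Kk]) echo $(( ${1%?} * 1024 )) ;;
        *[Mm]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo "$1" ;;
    esac
}

# Succeed only if the ARC limit ($2) is strictly below the kmem size ($1).
arc_fits_in_kmem() {
    kmem=$(to_bytes "$1")
    arc=$(to_bytes "$2")
    [ "$arc" -lt "$kmem" ]
}

# On a running FreeBSD system the live values (in bytes) could be checked with:
#   arc_fits_in_kmem "$(sysctl -n vm.kmem_size)" "$(sysctl -n vfs.zfs.arc_max)"
```

For example, `arc_fits_in_kmem 4096M 3584M` succeeds, while passing the
same value for both fails.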

I believe the trick -- Andriy, please correct me if I'm wrong -- is the
tuning of vfs.zfs.arc_max, which is now a hard limit rather than a "high
watermark".

However, I believe there have been occasional reports of exhaustion
panics despite both of these being set[1].  Those reports are being
investigated on an individual basis.

I set some other ZFS-related parameters as well (disabling prefetch,
adjusting txg.timeout, etc.), but those shouldn't be necessary to gain
stability at this point in time.

I can't provide tuning advice for i386.


[1]: http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/059109.html

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 12:37:28 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D35BC106564A
	for <fs@freebsd.org>; Tue, 28 Sep 2010 12:37:28 +0000 (UTC)
	(envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 1E7FF8FC0C
	for <fs@freebsd.org>; Tue, 28 Sep 2010 12:37:27 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA03464;
	Tue, 28 Sep 2010 15:22:02 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4CA1DDE9.8090107@icyb.net.ua>
Date: Tue, 28 Sep 2010 15:22:01 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
In-Reply-To: <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 12:37:28 -0000

on 28/09/2010 14:50 Jeremy Chadwick said the following:
> I believe the trick -- Andriy, please correct me if I'm wrong -- is the

Wouldn't hurt to CC me, so that I could do it :-)

> tuning of vfs.zfs.arc_max, which is now a hard limit rather than a "high
> watermark".

Not sure what you mean here.
What is a hard limit, what is a high watermark, what is the difference, and
when is "now"? :-)

I believe that "the trick" is to set vm.kmem_size high enough, either using this
tunable or vm.kmem_size_scale.
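
To make the relationship between the two tunables concrete, here is a
rough sketch. It assumes that the kmem size derived from
vm.kmem_size_scale is approximately physical memory divided by the
scale; the kernel also applies minimum and maximum limits that are
ignored here. The function name and the scale value are illustrative
only.

```shell
#!/bin/sh
# Approximate kmem size in MB implied by a given vm.kmem_size_scale.
# Assumption: kmem ~= physical memory / scale, ignoring kernel limits.
kmem_from_scale() {
    physmem_mb=$1
    scale=$2
    echo $(( physmem_mb / scale ))
}

# A 12GB box (12288 MB) with a scale of 3, purely as an illustration:
kmem_from_scale 12288 3
```

Setting vm.kmem_size directly skips this calculation and uses the given
value instead.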

> However, I believe there have been occasional reports of exhaustion
> panics despite both of these being set[1].  Those reports are being
> investigated on an individual basis.

I don't believe that the report that you quote actually demonstrates what you say
it does.
Two quotes from it:
"During these panics no tuning or /boot/loader.conf values where present."
"Only after hitting this behaviour yesterday i created boot/loader.conf"

> 
> [1]: http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/059109.html
> 


-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 13:25:39 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 927601065673;
	Tue, 28 Sep 2010 13:25:39 +0000 (UTC) (envelope-from wjw@digiware.nl)
Received: from mail.digiware.nl (mail.ip6.digiware.nl
	[IPv6:2001:4cb8:1:106::2])
	by mx1.freebsd.org (Postfix) with ESMTP id DF9AC8FC17;
	Tue, 28 Sep 2010 13:25:38 +0000 (UTC)
Received: from localhost (localhost.digiware.nl [127.0.0.1])
	by mail.digiware.nl (Postfix) with ESMTP id 47227153434;
	Tue, 28 Sep 2010 15:25:37 +0200 (CEST)
X-Virus-Scanned: amavisd-new at digiware.nl
Received: from mail.digiware.nl ([127.0.0.1])
	by localhost (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id Cri7K4Ph43q9; Tue, 28 Sep 2010 15:25:33 +0200 (CEST)
Received: from [127.0.0.1] (unknown [192.168.254.10])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.digiware.nl (Postfix) with ESMTPSA id 63A24153433;
	Tue, 28 Sep 2010 15:25:33 +0200 (CEST)
Message-ID: <4CA1ECCC.4070801@digiware.nl>
Date: Tue, 28 Sep 2010 15:25:32 +0200
From: Willem Jan Withagen <wjw@digiware.nl>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US;
	rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142@icarus.home.lan>
In-Reply-To: <20100928115047.GA62142@icarus.home.lan>
X-Enigmail-Version: 1.1.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, "avg@icyb.net.ua >> Andriy Gapon" <avg@icyb.net.ua>,
	fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 13:25:39 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 28-9-2010 13:50, Jeremy Chadwick wrote:
> On Tue, Sep 28, 2010 at 01:24:28PM +0200, Willem Jan Withagen wrote:
>> This is with stable as of yesterday, but with an untuned ZFS box I
>> was still able to generate a kmem exhausted panic.
>> Hard panic, just 3 lines.
>>
>> The box contains 12Gb memory, runs on a 6 core (with HT) xeon.
>> 6* 2T WD black caviar in raidz2 with 2*512Mb mirrored log.
>>
>> The box died while rsyncing 5.8T from its partnering system.
>> (that was the only activity on the box)
> 
> It would help if you could provide output from the following commands
> (even after the box has rebooted):

It is currently in the process of a zfs receive of that same 5.8T.

> $ sysctl -a | egrep ^vm.kmem
> $ sysctl -a | egrep ^vfs.zfs.arc
> $ sysctl kstat.zfs.misc.arcstats

> sysctl -a | egrep ^vm.kmem
vm.kmem_size_scale: 3
vm.kmem_size_max: 329853485875
vm.kmem_size_min: 0
vm.kmem_size: 4156850176

> sysctl -a | egrep ^vfs.zfs.arc
vfs.zfs.arc_meta_limit: 770777088
vfs.zfs.arc_meta_used: 33449648
vfs.zfs.arc_min: 385388544
vfs.zfs.arc_max: 3083108352

>  sysctl kstat.zfs.misc.arcstats
kstat.zfs.misc.arcstats.hits: 3119873
kstat.zfs.misc.arcstats.misses: 98710
kstat.zfs.misc.arcstats.demand_data_hits: 3043947
kstat.zfs.misc.arcstats.demand_data_misses: 3699
kstat.zfs.misc.arcstats.demand_metadata_hits: 67981
kstat.zfs.misc.arcstats.demand_metadata_misses: 90005
kstat.zfs.misc.arcstats.prefetch_data_hits: 121
kstat.zfs.misc.arcstats.prefetch_data_misses: 48
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 7824
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 4958
kstat.zfs.misc.arcstats.mru_hits: 34828
kstat.zfs.misc.arcstats.mru_ghost_hits: 21736
kstat.zfs.misc.arcstats.mfu_hits: 3077133
kstat.zfs.misc.arcstats.mfu_ghost_hits: 47605
kstat.zfs.misc.arcstats.allocated: 5507025
kstat.zfs.misc.arcstats.deleted: 5349715
kstat.zfs.misc.arcstats.stolen: 4468221
kstat.zfs.misc.arcstats.recycle_miss: 83995
kstat.zfs.misc.arcstats.mutex_miss: 231
kstat.zfs.misc.arcstats.evict_skip: 130461
kstat.zfs.misc.arcstats.evict_l2_cached: 0
kstat.zfs.misc.arcstats.evict_l2_eligible: 592200836608
kstat.zfs.misc.arcstats.evict_l2_ineligible: 11000092160
kstat.zfs.misc.arcstats.hash_elements: 20585
kstat.zfs.misc.arcstats.hash_elements_max: 150543
kstat.zfs.misc.arcstats.hash_collisions: 761847
kstat.zfs.misc.arcstats.hash_chains: 780
kstat.zfs.misc.arcstats.hash_chain_max: 6
kstat.zfs.misc.arcstats.p: 2266075295
kstat.zfs.misc.arcstats.c: 2410082200
kstat.zfs.misc.arcstats.c_min: 385388544
kstat.zfs.misc.arcstats.c_max: 3083108352
kstat.zfs.misc.arcstats.size: 2410286720
kstat.zfs.misc.arcstats.hdr_size: 7565040
kstat.zfs.misc.arcstats.data_size: 2394099200
kstat.zfs.misc.arcstats.other_size: 8622480
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_read_bytes: 0
kstat.zfs.misc.arcstats.l2_write_bytes: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 0
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 0
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 0
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0
kstat.zfs.misc.arcstats.l2_write_in_l2: 0
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 0
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 85908
kstat.zfs.misc.arcstats.l2_write_full: 0
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 0
kstat.zfs.misc.arcstats.l2_write_pios: 0
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 0
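
A quick way to read the output above (a sketch added for illustration;
parse_sysctl is a hypothetical helper, and the sample lines are copied from
this message): compute the ARC hit ratio and compare the ARC size against
vfs.zfs.arc_max and vm.kmem_size. For these particular numbers the autotuned
limits also line up as arc_max = kmem_size - 1 GiB, arc_min = arc_max / 8 and
arc_meta_limit = arc_max / 4; whether that holds on other versions is not
something this output can show.

```python
# Sample lines copied from the sysctl output above.
# parse_sysctl() is a hypothetical helper, not an existing tool.
SAMPLE = """\
vm.kmem_size: 4156850176
vfs.zfs.arc_meta_limit: 770777088
vfs.zfs.arc_min: 385388544
vfs.zfs.arc_max: 3083108352
kstat.zfs.misc.arcstats.hits: 3119873
kstat.zfs.misc.arcstats.misses: 98710
kstat.zfs.misc.arcstats.size: 2410286720
"""

GIB = 1 << 30


def parse_sysctl(text):
    """Turn 'name: value' lines into a dict of integers."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value)
    return values


s = parse_sysctl(SAMPLE)
hits = s["kstat.zfs.misc.arcstats.hits"]
misses = s["kstat.zfs.misc.arcstats.misses"]
arc_size = s["kstat.zfs.misc.arcstats.size"]
arc_max = s["vfs.zfs.arc_max"]
kmem_size = s["vm.kmem_size"]

hit_ratio = hits / (hits + misses)
print(f"ARC hit ratio:       {hit_ratio:.1%}")
print(f"ARC size / arc_max:  {arc_size / arc_max:.1%}")
print(f"ARC size / kmem:     {arc_size / kmem_size:.1%}")

# Relationships that happen to hold for the values in this output.
print("arc_max == kmem_size - 1 GiB:", arc_max == kmem_size - GIB)
print("arc_min == arc_max / 8:      ", s["vfs.zfs.arc_min"] * 8 == arc_max)
print("meta_limit == arc_max / 4:   ", s["vfs.zfs.arc_meta_limit"] * 4 == arc_max)
```

At the time of this snapshot the ARC sits well below both arc_max and
vm.kmem_size, so the snapshot on its own does not show what was consuming kmem
at the moment of the panic.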


>> So the obvious conclusion would be that auto-tuning for ZFS on
>> 8.1-Stable is not quite there yet.
>>
>> So I guess that we still need tuning advice even for 8.1,
>> to prevent a hard panic.
> 
> Andriy Gapon provides this general recommendation:
> 
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/059114.html
> 
> The advice I've given for RELENG_8 (as of the time of this writing),
> 8.1-STABLE, and 8.1-RELEASE, is that for amd64 you'll need to tune:

Well, advice seems to vary, and the latest I understood was that
8.1-stable did not need any tuning. (The other system, with a much older
kernel, is tuned according to what most here are suggesting.)
And I was sure led to believe that ever since 8.0 panics were no longer
among us......
> 
> vm.kmem_size
> vfs.zfs.arc_max

real memory  = 12889096192 (12292 MB)
avail memory = 12408684544 (11833 MB)

So that prompts vm.kmem_size=18G.

From the other post:
> As to arc_max/arc_min, set them based on your needs according to general
> ZFS recommendations.

I'm seriously at a loss as to what the general recommendations would be.

The other box has 8G
loader.conf:
vm.kmem_size="14G"      # 2* phys RAM size for ZFS perf.
vm.kmem_size_scale="1"
vfs.zfs.arc_min="1G"
vfs.zfs.arc_max="6G"

So I'd select something like 11G for arc_max on a box with 12G mem.

> I believe the trick -- Andriy, please correct me if I'm wrong -- is the
> tuning of vfs.zfs.arc_max, which is now a hard limit rather than a "high
> watermark".

> I can't provide tuning advice for i386.

This is amd64.

- --WjW
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (MingW32)

iQEcBAEBAgAGBQJMoezMAAoJEP4k4K6R6rBhEScIAI/rZH5/VTmASMGyEYu4NZHU
SSFo3TOSOkYPEJicd8/NgM7w7D3xgMA0Xse0fu3tQOsjX940Z6fUKvnM7LCX2OJK
vvkW0LpGuKbv/9sFFvkklodjkArtRzzoptLtiCVsaYsoieRqnmYMpBxU9WFYCY2I
HoRx1nMbArg2HvKPzeZjf9knnQaU6YOR/PUiFBo6YuHkDJ40noqRElewbPEiOVZz
zqnUh90ZDFVdHMYNuZegOKtfSVCA1AifHR3e7+zn8jSco/+svESd7tBIxmHZWQ8u
BA1AKyYVTHs+wKsTw2J7u1v8yg74HxJNyVqwPRP048Z8onoPlGgtnFCTWbl2ICU=
=KiyH
-----END PGP SIGNATURE-----

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 13:36:47 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1C76E106564A;
	Tue, 28 Sep 2010 13:36:47 +0000 (UTC) (envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 27D078FC0C;
	Tue, 28 Sep 2010 13:36:45 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA04596;
	Tue, 28 Sep 2010 16:36:41 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4CA1EF69.4040402@icyb.net.ua>
Date: Tue, 28 Sep 2010 16:36:41 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
	<4CA1DDE9.8090107@icyb.net.ua>
	<20100928132355.GA63149@icarus.home.lan>
In-Reply-To: <20100928132355.GA63149@icarus.home.lan>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 13:36:47 -0000

on 28/09/2010 16:23 Jeremy Chadwick said the following:
> On Tue, Sep 28, 2010 at 03:22:01PM +0300, Andriy Gapon wrote:
>> on 28/09/2010 14:50 Jeremy Chadwick said the following:
>>> I believe the trick -- Andriy, please correct me if I'm wrong -- is the
>>
>> Wouldn't hurt to CC me, so that I could do it :-)
>>
>>> tuning of vfs.zfs.arc_max, which is now a hard limit rather than a "high
>>> watermark".
>>
>> Not sure what you mean here.
>> What is hard limit, what is high watermark, what is the difference and when is
>> "now"? :-)
> 
> There was some speculation on the part of users a while back which led
> to this understanding.  Folks were seeing actual ARC usage higher than
> what vfs.zfs.arc_max was set to (automatically or administratively).  I
> believe it started here:
> 
> http://www.mailinglistarchive.com/freebsd-current@freebsd.org/msg28884.html
> 
> With the "high-water mark" statements being here:
> 
> http://www.mailinglistarchive.com/freebsd-current@freebsd.org/msg28887.html
> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2010-04/msg00129.html
> 
> The term implies that there is not an explicitly hard limit on the ARC
> utilisation/growth.  As stated in the unix.derkeiler.com URL above, this
> behaviour was in fact changed.  Why/when/how?  I had to go digging up
> the commits -- this took me some time.  Here they are, labelled r197816,
> for RELENG_8 and RELENG_7 respectively.  These were both committed on
> 2010/01/08 UTC:
> 
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.22.2.2
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.15.2.6
> 
> In HEAD/CURRENT (yet to be MFC'd), it looks like the above code got removed
> on 2010/09/17 UTC, citing they should be "enforced by actual
> calculations of delta":
> 
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.46
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.45
> 
> So what's this "delta" code piece that's mentioned?  That appears to
> have been committed to RELENG_8 on 2010/05/24 UTC (thus, between the
> above two dates):
> 
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.22.2.4
> 
> (Side note: the "delta stuff" was never committed to RELENG_7 -- and
> that's fine.  I'm pointing this out not out of retaliation or insult,
> but because people will almost certainly Google, find this post, and
> wonder if their 7.x machines might be affected.)
> 
> This situation with the ARC, and all its changes over time, is one of
> the reasons why I rant aggressively about the need for more
> communication transparency (re: what the changes actually affect).  Most
> SAs and users don't follow commits.


Well, no time for me to dig through all that history.
arc_max should be a hard limit and it is now. If it ever wasn't then it was a bug.

Besides, "high watermark" is still an ambiguous term: for you it "implies" that it
is not a hard limit, but for me it "implies" exactly a hard limit.

Additionally, going from "non-hard limit" to a "hard limit" on ARC size should
improve things memory-wise, not vice versa, right? :)

P.S.  All that I said above is a hint that this is a pointless branch of the thread :)

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 13:37:10 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5E5B610657C7
	for <fs@freebsd.org>; Tue, 28 Sep 2010 13:37:10 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta04.emeryville.ca.mail.comcast.net
	(qmta04.emeryville.ca.mail.comcast.net [76.96.30.40])
	by mx1.freebsd.org (Postfix) with ESMTP id 445A68FC1D
	for <fs@freebsd.org>; Tue, 28 Sep 2010 13:37:10 +0000 (UTC)
Received: from omta06.emeryville.ca.mail.comcast.net ([76.96.30.51])
	by qmta04.emeryville.ca.mail.comcast.net with comcast
	id CC4a1f00416AWCUA4DPxtk; Tue, 28 Sep 2010 13:23:57 +0000
Received: from koitsu.dyndns.org ([98.248.41.155])
	by omta06.emeryville.ca.mail.comcast.net with comcast
	id CDPw1f0013LrwQ28SDPwRH; Tue, 28 Sep 2010 13:23:57 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id EC1849B418; Tue, 28 Sep 2010 06:23:55 -0700 (PDT)
Date: Tue, 28 Sep 2010 06:23:55 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Andriy Gapon <avg@icyb.net.ua>
Message-ID: <20100928132355.GA63149@icarus.home.lan>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
	<4CA1DDE9.8090107@icyb.net.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4CA1DDE9.8090107@icyb.net.ua>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 13:37:10 -0000

On Tue, Sep 28, 2010 at 03:22:01PM +0300, Andriy Gapon wrote:
> on 28/09/2010 14:50 Jeremy Chadwick said the following:
> > I believe the trick -- Andriy, please correct me if I'm wrong -- is the
> 
> Wouldn't hurt to CC me, so that I could do it :-)
> 
> > tuning of vfs.zfs.arc_max, which is now a hard limit rather than a "high
> > watermark".
> 
> Not sure what you mean here.
> What is hard limit, what is high watermark, what is the difference and when is
> "now"? :-)

There was some speculation on the part of users a while back which led
to this understanding.  Folks were seeing actual ARC usage higher than
what vfs.zfs.arc_max was set to (automatically or administratively).  I
believe it started here:

http://www.mailinglistarchive.com/freebsd-current@freebsd.org/msg28884.html

With the "high-water mark" statements being here:

http://www.mailinglistarchive.com/freebsd-current@freebsd.org/msg28887.html
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2010-04/msg00129.html

The term implies that there is not an explicitly hard limit on the ARC
utilisation/growth.  As stated in the unix.derkeiler.com URL above, this
behaviour was in fact changed.  Why/when/how?  I had to go digging up
the commits -- this took me some time.  Here they are, labelled r197816,
for RELENG_8 and RELENG_7 respectively.  These were both committed on
2010/01/08 UTC:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.22.2.2
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.15.2.6

In HEAD/CURRENT (yet to be MFC'd), it looks like the above code got removed
on 2010/09/17 UTC, citing they should be "enforced by actual
calculations of delta":

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.46
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.45

So what's this "delta" code piece that's mentioned?  That appears to
have been committed to RELENG_8 on 2010/05/24 UTC (thus, between the
above two dates):

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.22.2.4

(Side note: the "delta stuff" was never committed to RELENG_7 -- and
that's fine.  I'm pointing this out not out of retaliation or insult,
but because people will almost certainly Google, find this post, and
wonder if their 7.x machines might be affected.)

This situation with the ARC, and all its changes over time, is one of
the reasons why I rant aggressively about the need for more
communication transparency (re: what the changes actually affect).  Most
SAs and users don't follow commits.

> I believe that "the trick" is to set vm.kmem_size high enough, either using this
> tunable or vm.kmem_size_scale.

Thanks for the clarification.  I just wish I knew how vm.kmem_size_scale
fit into the picture (meaning what it does, etc.).  The sysctl
description isn't very helpful.  Again, my lack of VM knowledge...

> > However, I believe there have been occasional reports of exhaustion
> > panics despite both of these being set[1].  Those reports are being
> > investigated on an individual basis.
> 
> I don't believe that the report that you quote actually demonstrates what you say
> it does.
> Two quotes from it:
> "During these panics no tuning or /boot/loader.conf values where present."
> "Only after hitting this behaviour yesterday i created boot/loader.conf"
> 
> > [1]: http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/059109.html

You're right -- the report I'm quoting is not the one I thought it was.
I'll see if I can dig up the correct mail/report.  It could be that I'm
thinking of something quite old (pre-ARC-changes (see above
paragraphs)).  I can barely keep track of all the changes going on.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 13:39:10 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 430CF10656B2;
	Tue, 28 Sep 2010 13:39:10 +0000 (UTC) (envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 5AD548FC1E;
	Tue, 28 Sep 2010 13:39:09 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA04620;
	Tue, 28 Sep 2010 16:39:06 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4CA1EFF9.1050802@icyb.net.ua>
Date: Tue, 28 Sep 2010 16:39:05 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
	<4CA1DDE9.8090107@icyb.net.ua>
	<20100928132355.GA63149@icarus.home.lan>
In-Reply-To: <20100928132355.GA63149@icarus.home.lan>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 13:39:10 -0000

on 28/09/2010 16:23 Jeremy Chadwick said the following:
> On Tue, Sep 28, 2010 at 03:22:01PM +0300, Andriy Gapon wrote:
>> I believe that "the trick" is to set vm.kmem_size high enough, either using this
>> tunable or vm.kmem_size_scale.
> 
> Thanks for the clarification.  I just wish I knew how vm.kmem_size_scale
> fit into the picture (meaning what it does, etc.).  The sysctl
> description isn't very helpful.  Again, my lack of VM knowledge...
> 

Roughly, vm.kmem_size would get set to <available memory> divided by
vm.kmem_size_scale.
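
To put numbers on that (a sketch for illustration, using figures reported
elsewhere in this thread: avail memory = 12408684544, vm.kmem_size_scale = 3,
and an autotuned vm.kmem_size of 4156850176):

```python
# Figures reported elsewhere in this thread for the 12G box.
avail_memory = 12408684544       # "avail memory" from the boot messages
kmem_size_scale = 3              # vm.kmem_size_scale as reported by sysctl
reported_kmem_size = 4156850176  # vm.kmem_size as reported by sysctl

# The rough rule: available memory divided by the scale.
estimate = avail_memory // kmem_size_scale
relative_gap = abs(reported_kmem_size - estimate) / reported_kmem_size

print(f"estimate: {estimate} bytes")
print(f"reported: {reported_kmem_size} bytes")
print(f"gap:      {relative_gap:.2%}")
```

The estimate lands within about half a percent of the reported value, which
fits the "roughly" above; the kernel's own accounting of memory is not
modelled here.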

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 13:46:34 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 04837106566C;
	Tue, 28 Sep 2010 13:46:34 +0000 (UTC) (envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 10D8B8FC08;
	Tue, 28 Sep 2010 13:46:32 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA04711;
	Tue, 28 Sep 2010 16:46:29 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4CA1F1B4.1020700@icyb.net.ua>
Date: Tue, 28 Sep 2010 16:46:28 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Willem Jan Withagen <wjw@digiware.nl>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142@icarus.home.lan>
	<4CA1ECCC.4070801@digiware.nl>
In-Reply-To: <4CA1ECCC.4070801@digiware.nl>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 13:46:34 -0000

on 28/09/2010 16:25 Willem Jan Withagen said the following:
> Well, advice seems to vary, and the latest I understood was that
> 8.1-stable did not need any tuning. (The other system, with a much older
> kernel, is tuned according to what most here are suggesting.)
> And I was sure led to believe that ever since 8.0 panics were no longer
> among us......

Well, now you have demonstrated yourself that it is not always so.

>> vm.kmem_size
>> vfs.zfs.arc_max
> 
> real memory  = 12889096192 (12292 MB)
> avail memory = 12408684544 (11833 MB)
> 
> So that prompts vm.kmem_size=18G.
> 
> From the other post:
>> As to arc_max/arc_min, set them based on your needs according to general
>> ZFS recommendations.
> 
> I'm seriously at a loss as to what the general recommendations would be.

Have you asked Mr. Google? :)
- http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
  Search for "Memory and Dynamic Reconfiguration Recommendation"
- http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache

Short version - decide how much memory you need for everything else but ZFS ARC.
If autotuned value suits you, then you don't need to change anything.
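
As a sketch of that budgeting (the 4 GiB reserve is a made-up figure for
illustration, and suggest_arc_max and loader_conf_line are hypothetical
helpers, not a recommendation):

```python
GIB = 1 << 30


def suggest_arc_max(physical_bytes, reserved_bytes):
    """ARC ceiling = physical memory minus what everything else needs."""
    if reserved_bytes >= physical_bytes:
        raise ValueError("reserve must be smaller than physical memory")
    return physical_bytes - reserved_bytes


def loader_conf_line(arc_max_bytes):
    """Format the value as a loader.conf line, in whole gibibytes."""
    return f'vfs.zfs.arc_max="{arc_max_bytes // GIB}G"'


# Hypothetical example: a 12 GiB box where 4 GiB is kept for non-ARC use.
arc_max = suggest_arc_max(12 * GIB, 4 * GIB)
line = loader_conf_line(arc_max)
print(line)
```

The reserve is the only real decision here; it depends on what else the box
runs and has to be estimated per machine.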

> The other box has 8G
> loader.conf:
> vm.kmem_size="14G"      # 2* phys RAM size for ZFS perf.
> vm.kmem_size_scale="1"

No need to set both of the above.  vm.kmem_size overrides vm.kmem_size_scale.
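
In other words (a simplified sketch; effective_kmem_size is a hypothetical
name, the 8 GiB figure is a round number for the quoted box, and any further
clamping the kernel may apply is deliberately left out):

```python
GIB = 1 << 30


def effective_kmem_size(avail_memory, kmem_size_scale, kmem_size=None):
    """Simplified rule: an explicit vm.kmem_size wins; otherwise the
    available memory is divided by vm.kmem_size_scale."""
    if kmem_size is not None:
        return kmem_size
    return avail_memory // kmem_size_scale


# The quoted loader.conf: vm.kmem_size="14G" together with vm.kmem_size_scale="1".
with_both = effective_kmem_size(8 * GIB, 1, kmem_size=14 * GIB)
# The same box with only vm.kmem_size_scale="1" set.
scale_only = effective_kmem_size(8 * GIB, 1)

print(with_both // GIB, "GiB with both tunables set")
print(scale_only // GIB, "GiB with only the scale set")
```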

> vfs.zfs.arc_min="1G"
> vfs.zfs.arc_max="6G"
> 
> So I'd select something like 11G for arc_max on a box with 12G mem.

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 14:02:33 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0DD92106567A;
	Tue, 28 Sep 2010 14:02:33 +0000 (UTC) (envelope-from wjw@digiware.nl)
Received: from mail.digiware.nl (mail.ip6.digiware.nl
	[IPv6:2001:4cb8:1:106::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 721E38FC15;
	Tue, 28 Sep 2010 14:02:32 +0000 (UTC)
Received: from localhost (localhost.digiware.nl [127.0.0.1])
	by mail.digiware.nl (Postfix) with ESMTP id C4C7515346A;
	Tue, 28 Sep 2010 16:02:31 +0200 (CEST)
X-Virus-Scanned: amavisd-new at digiware.nl
Received: from mail.digiware.nl ([127.0.0.1])
	by localhost (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id WAt0f0YQumbi; Tue, 28 Sep 2010 16:02:25 +0200 (CEST)
Received: from [127.0.0.1] (unknown [192.168.254.10])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.digiware.nl (Postfix) with ESMTPSA id 9B7D215346C;
	Tue, 28 Sep 2010 16:02:25 +0200 (CEST)
Message-ID: <4CA1F570.6000602@digiware.nl>
Date: Tue, 28 Sep 2010 16:02:24 +0200
From: Willem Jan Withagen <wjw@digiware.nl>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US;
	rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Andriy Gapon <avg@icyb.net.ua>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142@icarus.home.lan>
	<4CA1ECCC.4070801@digiware.nl> <4CA1F1B4.1020700@icyb.net.ua>
In-Reply-To: <4CA1F1B4.1020700@icyb.net.ua>
X-Enigmail-Version: 1.1.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 14:02:33 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 28-9-2010 15:46, Andriy Gapon wrote:
> on 28/09/2010 16:25 Willem Jan Withagen said the following:
>> Well, advice seems to vary, and the latest I understood was that
>> 8.1-stable did not need any tuning. (The other system, with a much
>> older kernel, is tuned according to what most here are suggesting.) And I
>> was sure led to believe that ever since 8.0 panics were no longer
>> among us......
> 
> Well, now you have demonstrated yourself that it is not always so.

I thought I should share the knowledge. ;)
Which is not a bad thing for those using (or starting to use) ZFS.

I do not read commits, but do read a lot of FreeBSD groups.
And for me there is still a shroud of black art over ZFS.
Just glad that my main fileserver doesn't crash. (knock on wood).

>>> vm.kmem_size
>>> vfs.zfs.arc_max
>>
>> real memory  = 12889096192 (12292 MB)
>> avail memory = 12408684544 (11833 MB)
>>
>> So that prompts vm.kmem_size=18G.
>>
>> From the other post:
>>> As to arc_max/arc_min, set them based on your needs according to
>>> general ZFS recommendations.
>>
>> I'm seriously at a loss as to what the general recommendations would be.
> 
> Have you asked Mr. Google? :)
> - http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
>   Search for "Memory and Dynamic Reconfiguration Recommendation"
> - http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache
> 
> Short version - decide how much memory you need for everything else
> but ZFS ARC.
> If autotuned value suits you, then you don't need to change
> anything.

I do have (read) this document, but still that doesn't really give you
guidelines for tuning on FreeBSD. It is a fileserver without any serious
other apps.
I was using "auto-tuned", and that crashed my box. That is what started
this whole thread.

- --WjW
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (MingW32)

iQEcBAEBAgAGBQJMofVwAAoJEP4k4K6R6rBhFaUH/3wahrGWO71+xBhHi/ayNoaf
DfbOWMD262XfualJudPRgoji7xb9lGaRmd4emv7QBcDjqzmcsiyIeXskT5IYKj7P
DvJDULIH66iKQrRZeIBouMXMhLfiLjjT85Lj1hE8fuGg8NAOv97dnUwvVIwC0/Ai
yzeeEHYivCYbRmzBhISlAWjdpSXk7xVs6gZnaLUUp953+Uv/8KmNLeG+laoWn+Hn
wdKHUG3kR0g/XwJIMc5dZzYvs2kdDPh47uLythoYGC0yaLCwtxLHqEGIPtb/Gypy
nIIWxOGtueJo2HjpS0+HlX/pTRW8tfYzXTzKgFKDd90t9fDt2p18BPSexuJSLVc=
=hSAg
-----END PGP SIGNATURE-----

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 14:07:34 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D1CBA106566C;
	Tue, 28 Sep 2010 14:07:34 +0000 (UTC) (envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id C6EA28FC1A;
	Tue, 28 Sep 2010 14:07:33 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA05060;
	Tue, 28 Sep 2010 17:07:29 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4CA1F6A0.20109@icyb.net.ua>
Date: Tue, 28 Sep 2010 17:07:28 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Willem Jan Withagen <wjw@digiware.nl>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142@icarus.home.lan>
	<4CA1ECCC.4070801@digiware.nl> <4CA1F1B4.1020700@icyb.net.ua>
	<4CA1F570.6000602@digiware.nl>
In-Reply-To: <4CA1F570.6000602@digiware.nl>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 14:07:34 -0000

on 28/09/2010 17:02 Willem Jan Withagen said the following:
> I do have (read) this document, but still that doesn't really give you
> guidelines for tuning on FreeBSD. It is a fileserver without any serious
> other apps.
> I was using "auto-tuned", and that crashed my box. That is what started
> this whole thread.

Well, as I've said, in my opinion FreeBSD-specific tuning ends at setting kmem size.

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 14:09:06 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0B4DE106564A;
	Tue, 28 Sep 2010 14:09:06 +0000 (UTC) (envelope-from wjw@digiware.nl)
Received: from mail.digiware.nl (mail.ip6.digiware.nl
	[IPv6:2001:4cb8:1:106::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 7A4AC8FC25;
	Tue, 28 Sep 2010 14:09:05 +0000 (UTC)
Received: from localhost (localhost.digiware.nl [127.0.0.1])
	by mail.digiware.nl (Postfix) with ESMTP id D8026153434;
	Tue, 28 Sep 2010 16:09:04 +0200 (CEST)
X-Virus-Scanned: amavisd-new at digiware.nl
Received: from mail.digiware.nl ([127.0.0.1])
	by localhost (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id NQjP-WbBQvEU; Tue, 28 Sep 2010 16:09:02 +0200 (CEST)
Received: from [127.0.0.1] (unknown [192.168.254.10])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.digiware.nl (Postfix) with ESMTPSA id 1AE54153433;
	Tue, 28 Sep 2010 16:09:02 +0200 (CEST)
Message-ID: <4CA1F6FD.5090807@digiware.nl>
Date: Tue, 28 Sep 2010 16:09:01 +0200
From: Willem Jan Withagen <wjw@digiware.nl>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US;
	rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Andriy Gapon <avg@icyb.net.ua>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142@icarus.home.lan>
	<4CA1ECCC.4070801@digiware.nl> <4CA1F1B4.1020700@icyb.net.ua>
	<4CA1F570.6000602@digiware.nl> <4CA1F6A0.20109@icyb.net.ua>
In-Reply-To: <4CA1F6A0.20109@icyb.net.ua>
X-Enigmail-Version: 1.1.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 14:09:06 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 28-9-2010 16:07, Andriy Gapon wrote:
> on 28/09/2010 17:02 Willem Jan Withagen said the following:
>> I do have (read) this document, but still that doesn't really give you
>> guidelines for tuning on FreeBSD. It is a fileserver without any serious
>> other apps.
>> I was using "auto-tuned", and that crashed my box. That is what started
>> this whole thread.
> 
> Well, as I've said, in my opinion FreeBSD-specific tuning ends at setting kmem size.
> 

I consider that a useful statement.

- --WjW

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (MingW32)

iQEcBAEBAgAGBQJMofb9AAoJEP4k4K6R6rBhqaUH/iFd1GG/pGLEKY+savwCRQDA
iitWtiBnUVfscP3Cfy81Mrg0m3SNik+lgRD2ywC03jsE+6sJbExuw52G46RjpExc
EleJZTW74KvbLHBnVQd+gWUoULKfGx4sZSBuYlkFpANhbrucpYmyPftbpFzmpD7N
IOeeY6H7iOa4vnb03DLYY0iErL+ak8NtiSKqYTLYqDA/UWqVfOsvdcRbywrMIOoV
JoaoD+65ZQpFYkugiFr7/BtcxXA9GJNpsUI+vIADbDgr77XmhKfu0ky4/Ci5f/L9
8YbEzhobOtRBTjX4/JAl60ZC2ToPwyZ8F4Al7Kj8r7FJnpnhddw7XlVXqEouJxQ=
=X2gD
-----END PGP SIGNATURE-----

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 14:25:44 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C0A471065695;
	Tue, 28 Sep 2010 14:25:44 +0000 (UTC) (envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id CC02B8FC13;
	Tue, 28 Sep 2010 14:25:43 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA05445;
	Tue, 28 Sep 2010 17:25:39 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4CA1FAE3.9090200@icyb.net.ua>
Date: Tue, 28 Sep 2010 17:25:39 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Willem Jan Withagen <wjw@digiware.nl>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142@icarus.home.lan>
	<4CA1ECCC.4070801@digiware.nl> <4CA1F1B4.1020700@icyb.net.ua>
	<4CA1F570.6000602@digiware.nl> <4CA1F6A0.20109@icyb.net.ua>
	<4CA1F6FD.5090807@digiware.nl>
In-Reply-To: <4CA1F6FD.5090807@digiware.nl>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 14:25:44 -0000

on 28/09/2010 17:09 Willem Jan Withagen said the following:
> On 28-9-2010 16:07, Andriy Gapon wrote:
>> on 28/09/2010 17:02 Willem Jan Withagen said the following:
>>> I do have (read) this document, but still that doesn't really give you
>>> guidelines for tuning on FreeBSD. It is a fileserver without any serious
>>> other apps.
>>> I was using "auto-tuned", and that crashed my box. That is what started
>>> this whole thread.
> 
>> Well, as I've said, in my opinion FreeBSD-specific tuning ends at setting kmem size.
> 
> 
> I consider that a useful statement.

Hm, looks like I've just given bad advice.
It seems that the auto-tuned arc_max is based on kmem size.
So if you use a kmem size that is larger than available physical memory, then you
had better limit arc_max to the available memory minus 1GB or so, if the autotuned
value is larger than that.

I think this needs to be fixed in the code.
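
A rough sketch of that rule of thumb in Python. The function name, the
headroom parameter, and the 16G "autotuned" input are hypothetical and only
serve the example; how the kernel actually derives the autotuned value is not
modelled here.

```python
GIB = 1024 ** 3

def suggested_arc_max(physmem, kmem_size, autotuned_arc_max, headroom=GIB):
    """Return an arc_max cap in bytes following the rule of thumb above."""
    # The rule only applies when kmem is sized larger than physical RAM.
    if kmem_size <= physmem:
        return autotuned_arc_max
    cap = physmem - headroom
    # Lower the limit only if the autotuned value exceeds the cap.
    return min(autotuned_arc_max, cap)

# Example: 12G of RAM, vm.kmem_size=17G, and an assumed autotuned value of 16G.
print(suggested_arc_max(12 * GIB, 17 * GIB, 16 * GIB) // GIB)
```

The headroom is a parameter because "1GB or so" is an estimate, not a fixed
constant.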

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 14:30:28 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BFA661065675;
	Tue, 28 Sep 2010 14:30:28 +0000 (UTC) (envelope-from wjw@digiware.nl)
Received: from mail.digiware.nl (mail.ip6.digiware.nl
	[IPv6:2001:4cb8:1:106::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 395C28FC13;
	Tue, 28 Sep 2010 14:30:28 +0000 (UTC)
Received: from localhost (localhost.digiware.nl [127.0.0.1])
	by mail.digiware.nl (Postfix) with ESMTP id 09F23153433;
	Tue, 28 Sep 2010 16:30:27 +0200 (CEST)
X-Virus-Scanned: amavisd-new at digiware.nl
Received: from mail.digiware.nl ([127.0.0.1])
	by localhost (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id I07McwiaIwDQ; Tue, 28 Sep 2010 16:30:24 +0200 (CEST)
Received: from [127.0.0.1] (unknown [192.168.254.10])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.digiware.nl (Postfix) with ESMTPSA id 87AB3153435;
	Tue, 28 Sep 2010 16:30:23 +0200 (CEST)
Message-ID: <4CA1FBFE.3020107@digiware.nl>
Date: Tue, 28 Sep 2010 16:30:22 +0200
From: Willem Jan Withagen <wjw@digiware.nl>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US;
	rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Andriy Gapon <avg@icyb.net.ua>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142@icarus.home.lan>
	<4CA1ECCC.4070801@digiware.nl> <4CA1F1B4.1020700@icyb.net.ua>
	<4CA1F570.6000602@digiware.nl> <4CA1F6A0.20109@icyb.net.ua>
	<4CA1F6FD.5090807@digiware.nl> <4CA1FAE3.9090200@icyb.net.ua>
In-Reply-To: <4CA1FAE3.9090200@icyb.net.ua>
X-Enigmail-Version: 1.1.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 14:30:28 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 28-9-2010 16:25, Andriy Gapon wrote:
> on 28/09/2010 17:09 Willem Jan Withagen said the following:
>> On 28-9-2010 16:07, Andriy Gapon wrote:
>>> on 28/09/2010 17:02 Willem Jan Withagen said the following:
>>>> I do have (read) this document, but still that doesn't really give you
>>>> guidelines for tuning on FreeBSD. It is a fileserver without any serious
>>>> other apps.
>>>> I was using "auto-tuned", and that crashed my box. That is what started
>>>> this whole thread.
>>
>>> Well, as I've said, in my opinion FreeBSD-specific tuning ends at setting kmem size.
>>
>>
>> I consider that a useful statement.
> 
> Hm, looks like I've just given bad advice.
> It seems that the auto-tuned arc_max is based on kmem size.
> So if you use a kmem size that is larger than available physical memory, then you
> had better limit arc_max to the available memory minus 1GB or so, if the autotuned
> value is larger than that.
> 
> I think this needs to be fixed in the code.

So in my case (no other serious apps) with 12G phys mem:

vm.kmem_size=17G
vfs.zfs.arc_max=11G

- --WjW

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (MingW32)

iQEcBAEBAgAGBQJMofv9AAoJEP4k4K6R6rBhrksH/0L7EP9oSi4hhITZTB0uIk8q
0IEKnc2ltnPUSFJXS9wP1r9iLzNFJJXGqrO1ZvZUFcJeXXwSzSjhD+zbd237yf/r
f5nQ7yBNPd7MxZlZjDkIXB9ZJYuE1u0KMfuQSxptzOWB7oin8MpXHa1YdX6CVE7A
3+hSykteHFFqs8qwUSzoUs47r0dW2WxXE2qAEurelL6VFn++K86d32F5WNv/SX4u
aN43r+/CgrjiJVNrxG+gchoicEnIaI90jepkjzpEMp8M85VF4skIZbflZrSSNheY
Wzi4LD2h8dFf/La+9EB5AYkMgRcTvXcgNkppIsZ94nf7oSyYNZFuxLYC3ilQetY=
=WYzV
-----END PGP SIGNATURE-----

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 14:32:19 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BE0AE106564A;
	Tue, 28 Sep 2010 14:32:19 +0000 (UTC) (envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id AE7CA8FC14;
	Tue, 28 Sep 2010 14:32:18 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA05575;
	Tue, 28 Sep 2010 17:32:13 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4CA1FC6D.1060000@icyb.net.ua>
Date: Tue, 28 Sep 2010 17:32:13 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Willem Jan Withagen <wjw@digiware.nl>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142@icarus.home.lan>
	<4CA1ECCC.4070801@digiware.nl> <4CA1F1B4.1020700@icyb.net.ua>
	<4CA1F570.6000602@digiware.nl> <4CA1F6A0.20109@icyb.net.ua>
	<4CA1F6FD.5090807@digiware.nl> <4CA1FAE3.9090200@icyb.net.ua>
	<4CA1FBFE.3020107@digiware.nl>
In-Reply-To: <4CA1FBFE.3020107@digiware.nl>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 14:32:19 -0000

on 28/09/2010 17:30 Willem Jan Withagen said the following:
> So in my case (no other serious apps) with 12G phys mem:
> 
> vm.kmem_size=17G
> vfs.zfs.arc_max=11G
> 

Should be good.

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 16:24:48 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A33D21065674;
	Tue, 28 Sep 2010 16:24:48 +0000 (UTC)
	(envelope-from ben@wanderview.com)
Received: from mail.wanderview.com (mail.wanderview.com [66.92.166.102])
	by mx1.freebsd.org (Postfix) with ESMTP id 297328FC1B;
	Tue, 28 Sep 2010 16:24:47 +0000 (UTC)
Received: from xykon.in.wanderview.com (xykon.in.wanderview.com [10.76.10.152])
	(authenticated bits=0)
	by mail.wanderview.com (8.14.4/8.14.4) with ESMTP id o8SFo5c9027002
	(version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO);
	Tue, 28 Sep 2010 15:50:06 GMT (envelope-from ben@wanderview.com)
Mime-Version: 1.0 (Apple Message framework v1081)
Content-Type: text/plain; charset=us-ascii
From: Ben Kelly <ben@wanderview.com>
In-Reply-To: <4CA1EF69.4040402@icyb.net.ua>
Date: Tue, 28 Sep 2010 11:50:05 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <FE116FEC-714D-4BF5-86D8-E29BFA713C69@wanderview.com>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
	<4CA1DDE9.8090107@icyb.net.ua>
	<20100928132355.GA63149@icarus.home.lan>
	<4CA1EF69.4040402@icyb.net.ua>
To: Andriy Gapon <avg@icyb.net.ua>
X-Mailer: Apple Mail (2.1081)
X-Spam-Score: -1.01 () ALL_TRUSTED,T_RP_MATCHES_RCVD
X-Scanned-By: MIMEDefang 2.67 on 10.76.20.1
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 16:24:48 -0000


On Sep 28, 2010, at 9:36 AM, Andriy Gapon wrote:
> Well, no time for me to dig through all that history.
> arc_max should be a hard limit and it is now. If it ever wasn't then it
> was a bug.

I believe the size of the arc could exceed the limit if your working set
was larger than arc_max.  The arc can't (couldn't then, anyway) evict
data that is still referenced.

A contributing factor at the time was that the page daemon did not take
into account back pressure from the arc when deciding which pages to
move from active to inactive, etc.  So data was more likely to be
referenced and therefore forced to remain in the arc.

I'm not sure if this is still the current state.  I seem to remember
some changesets mentioning arc back pressure at some point, but I don't
know the details.

- Ben

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 16:30:16 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DC0DE1065694;
	Tue, 28 Sep 2010 16:30:16 +0000 (UTC) (envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id E12CE8FC17;
	Tue, 28 Sep 2010 16:30:15 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA07743;
	Tue, 28 Sep 2010 19:30:02 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4CA21809.7090504@icyb.net.ua>
Date: Tue, 28 Sep 2010 19:30:01 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Ben Kelly <ben@wanderview.com>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
	<4CA1DDE9.8090107@icyb.net.ua>
	<20100928132355.GA63149@icarus.home.lan>
	<4CA1EF69.4040402@icyb.net.ua>
	<FE116FEC-714D-4BF5-86D8-E29BFA713C69@wanderview.com>
In-Reply-To: <FE116FEC-714D-4BF5-86D8-E29BFA713C69@wanderview.com>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 16:30:17 -0000

on 28/09/2010 18:50 Ben Kelly said the following:
> 
> On Sep 28, 2010, at 9:36 AM, Andriy Gapon wrote:
>> Well, no time for me to dig through all that history. arc_max should be a
>> hard limit and it is now. If it ever wasn't then it was a bug.
> 
> I believe the size of the arc could exceed the limit if your working set was
> larger than arc_max.  The arc can't (couldn't then, anyway) evict data that is
> still referenced.

I think that you are correct and I was wrong.
ARC would still allocate a new buffer even if it's at or above arc_max and cannot
re-use any existing buffer.
But I think that this is more likely to happen with a "tiny" ARC size.  I have a hard
time imagining a workload in which gigabytes of data would be simultaneously and
continuously used (see below for definition of "used").

> A contributing factor at the time was that the page daemon did not take into
> account back pressure from the arc when deciding which pages to move from
> active to inactive, etc.  So data was more likely to be referenced and
> therefore forced to remain in the arc.

I don't think that this is what happened and I don't think that pagedaemon has
anything to do with the discussed issue.
I think that ARC buffers exist independently of pagedaemon and page cache.
I think that they are held only during time when I/O is happening to or from them.

> I'm not sure if this is still the current state.  I seem to remember some
> changesets mentioning arc back pressure at some point, but I don't know the
> details.

I think that backpressure has nothing to do with it.
If ZFS truly does I/O with all existing buffers and it needs a new buffer, then
the choices are limited: either block and wait, or go over the limit.
Apparently ZFS designers went with the latter option.
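
A toy model of that choice, sketched in Python. This is not the code from
arc_get_data_buf(); the class, its fields, and the over_limit_allocs counter
are all invented for illustration, and buffers are recycled only when the
size matches exactly.

```python
class ToyArc:
    def __init__(self, limit):
        self.limit = limit          # stand-in for arc_max, in bytes
        self.size = 0               # bytes currently allocated
        self.buffers = []           # [nbytes, held] entries
        self.over_limit_allocs = 0  # hypothetical counter, not a real kstat

    def get_buf(self, nbytes):
        if self.size + nbytes > self.limit:
            # At the limit: try to recycle a buffer that is not held for I/O.
            for buf in self.buffers:
                if not buf[1] and buf[0] == nbytes:
                    buf[1] = True
                    return buf
            # Nothing can be recycled: allocate anyway instead of blocking.
            self.over_limit_allocs += 1
        buf = [nbytes, True]
        self.buffers.append(buf)
        self.size += nbytes
        return buf

    def release(self, buf):
        buf[1] = False              # I/O finished, buffer may be recycled

arc = ToyArc(limit=4)
bufs = [arc.get_buf(1) for _ in range(4)]  # fills the cache to its limit
extra = arc.get_buf(1)                     # all buffers held: goes over
arc.release(bufs[0])
again = arc.get_buf(1)                     # recycles the released buffer
print(arc.size, arc.over_limit_allocs, again is bufs[0])
```

The alternative design would block in get_buf() until some buffer is
released, which keeps the limit hard at the cost of stalling I/O.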

But as I've said, for non-tiny ARC sizes it's hard to imagine such an amount of
parallel I/O that would tie up all ARC buffers.  Given the adaptive nature of ARC I
can still see it happening, but only when ARC size is near its minimum, not when it is
at maximum.

It seems that kstat.zfs.misc.arcstats.recycle_miss is a counter of allocations
when ARC refused to grow and no existing buffer could be recycled, but this is not
the same as going above ARC maximum size.

BTW, such allocation over the limit could be considered as a form of memory
pressure from ARC on the rest of the system.

P.S.
The code is in arc_get_data_buf().

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 16:46:44 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2144C106566C;
	Tue, 28 Sep 2010 16:46:44 +0000 (UTC)
	(envelope-from ben@wanderview.com)
Received: from mail.wanderview.com (mail.wanderview.com [66.92.166.102])
	by mx1.freebsd.org (Postfix) with ESMTP id 9C4638FC1E;
	Tue, 28 Sep 2010 16:46:43 +0000 (UTC)
Received: from xykon.in.wanderview.com (xykon.in.wanderview.com [10.76.10.152])
	(authenticated bits=0)
	by mail.wanderview.com (8.14.4/8.14.4) with ESMTP id o8SGkc6j027489
	(version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO);
	Tue, 28 Sep 2010 16:46:39 GMT (envelope-from ben@wanderview.com)
Mime-Version: 1.0 (Apple Message framework v1081)
Content-Type: text/plain; charset=us-ascii
From: Ben Kelly <ben@wanderview.com>
In-Reply-To: <4CA21809.7090504@icyb.net.ua>
Date: Tue, 28 Sep 2010 12:46:39 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
	<4CA1DDE9.8090107@icyb.net.ua>
	<20100928132355.GA63149@icarus.home.lan>
	<4CA1EF69.4040402@icyb.net.ua>
	<FE116FEC-714D-4BF5-86D8-E29BFA713C69@wanderview.com>
	<4CA21809.7090504@icyb.net.ua>
To: Andriy Gapon <avg@icyb.net.ua>
X-Mailer: Apple Mail (2.1081)
X-Spam-Score: -1.01 () ALL_TRUSTED,T_RP_MATCHES_RCVD
X-Scanned-By: MIMEDefang 2.67 on 10.76.20.1
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 16:46:44 -0000


On Sep 28, 2010, at 12:30 PM, Andriy Gapon wrote:

> on 28/09/2010 18:50 Ben Kelly said the following:
>>
>> On Sep 28, 2010, at 9:36 AM, Andriy Gapon wrote:
>>> Well, no time for me to dig through all that history. arc_max should be a
>>> hard limit and it is now. If it ever wasn't then it was a bug.
>>
>> I believe the size of the arc could exceed the limit if your working set was
>> larger than arc_max.  The arc can't (couldn't then, anyway) evict data that is
>> still referenced.
>
> I think that you are correct and I was wrong.
> ARC would still allocate a new buffer even if it's at or above arc_max and cannot
> re-use any existing buffer.
> But I think that this is more likely to happen with a "tiny" ARC size.  I have a hard
> time imagining a workload in which gigabytes of data would be simultaneously and
> continuously used (see below for definition of "used").
>
>> A contributing factor at the time was that the page daemon did not take into
>> account back pressure from the arc when deciding which pages to move from
>> active to inactive, etc.  So data was more likely to be referenced and
>> therefore forced to remain in the arc.
>
> I don't think that this is what happened and I don't think that pagedaemon has
> anything to do with the discussed issue.
> I think that ARC buffers exist independently of pagedaemon and page cache.
> I think that they are held only during time when I/O is happening to or from them.


Hmm.  My server is currently idle with no I/O happening:

  kstat.zfs.misc.arcstats.c: 25165824
  kstat.zfs.misc.arcstats.c_max: 46137344
  kstat.zfs.misc.arcstats.size: 91863156

If what you say is true, this shouldn't happen, should it?  This system
is an i386 machine with kmem max at 800M and arc set to 40M.  This is
running head from April 6, 2010, so it is a bit old, though.

At one point I had patches running on my system that triggered the
pagedaemon based on arc load and it did allow me to keep my arc below
the max.  Or at least I thought it did.

In any case, I've never really been able to wrap my head around the VFS
layer and how it interacts with zfs.  So I'm more than willing to
believe I'm confused.  Any insights are greatly appreciated.

Thanks!

- Ben

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 17:17:58 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 536271065695;
	Tue, 28 Sep 2010 17:17:58 +0000 (UTC) (envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 667268FC08;
	Tue, 28 Sep 2010 17:17:56 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA08291;
	Tue, 28 Sep 2010 20:17:44 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4CA22337.2010900@icyb.net.ua>
Date: Tue, 28 Sep 2010 20:17:43 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Ben Kelly <ben@wanderview.com>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
	<4CA1DDE9.8090107@icyb.net.ua>
	<20100928132355.GA63149@icarus.home.lan>
	<4CA1EF69.4040402@icyb.net.ua>
	<FE116FEC-714D-4BF5-86D8-E29BFA713C69@wanderview.com>
	<4CA21809.7090504@icyb.net.ua>
	<71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com>
In-Reply-To: <71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 17:17:58 -0000

on 28/09/2010 19:46 Ben Kelly said the following:
> Hmm.  My server is currently idle with no I/O happening:
> 
>   kstat.zfs.misc.arcstats.c: 25165824
>   kstat.zfs.misc.arcstats.c_max: 46137344
>   kstat.zfs.misc.arcstats.size: 91863156
> 
> If what you say is true, this shouldn't happen, should it?  This system is an i386 machine with kmem max at 800M and arc set to 40M.  This is running head from April 6, 2010, so it is a bit old, though.

Well, your system is a bit old indeed.
And the branch is unknown, so I can't really see what sources you have.
And I am not sure if I'll be able to say anything about those sources.

As to the numbers - yes, with current code I'd expect arcstats.size to go down to
arcstats.c when there is no I/O.  arc_reclaim_thread should do that.
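
For reference, the three counters quoted above can be compared with a few
lines of Python. This is only a sketch: the parse_arcstats helper is made up
for the example, and the sample text is the output pasted earlier in this
thread rather than a live sysctl call.

```python
MIB = 1024 * 1024

def parse_arcstats(text):
    """Map the short counter name (e.g. 'c_max') to its integer value."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(": ")
        if sep and name.startswith("kstat.zfs.misc.arcstats."):
            stats[name.rsplit(".", 1)[1]] = int(value)
    return stats

sample = """\
  kstat.zfs.misc.arcstats.c: 25165824
  kstat.zfs.misc.arcstats.c_max: 46137344
  kstat.zfs.misc.arcstats.size: 91863156
"""

stats = parse_arcstats(sample)
for key in ("c", "c_max", "size"):
    print(f"{key:6s} {stats[key] / MIB:6.1f} MiB")
print("size exceeds c_max:", stats["size"] > stats["c_max"])
```

With those values size is roughly twice c_max and well above the target c,
which is the discrepancy under discussion.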

> At one point I had patches running on my system that triggered the pagedaemon based on arc load and it did allow me to keep my arc below the max.  Or at least I thought it did.
> 
> In any case, I've never really been able to wrap my head around the VFS layer and how it interacts with zfs.  So I'm more than willing to believe I'm confused.  Any insights are greatly appreciated.

ARC is a ZFS private cache.
ZFS doesn't use the unified buffer/page cache.
So ARC is not directly affected by pagedaemon.
But this is not exactly a VFS layer thing.

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 17:24:36 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 303E31065673;
	Tue, 28 Sep 2010 17:24:36 +0000 (UTC) (envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 744F88FC14;
	Tue, 28 Sep 2010 17:24:33 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA08370;
	Tue, 28 Sep 2010 20:24:21 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4CA224C5.8000202@icyb.net.ua>
Date: Tue, 28 Sep 2010 20:24:21 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Ben Kelly <ben@wanderview.com>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
	<4CA1DDE9.8090107@icyb.net.ua>
	<20100928132355.GA63149@icarus.home.lan>
	<4CA1EF69.4040402@icyb.net.ua>
	<FE116FEC-714D-4BF5-86D8-E29BFA713C69@wanderview.com>
	<4CA21809.7090504@icyb.net.ua>
	<71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com>
	<4CA22337.2010900@icyb.net.ua>
In-Reply-To: <4CA22337.2010900@icyb.net.ua>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 17:24:36 -0000

on 28/09/2010 20:17 Andriy Gapon said the following:
> on 28/09/2010 19:46 Ben Kelly said the following:
>> If what you say is true, this shouldn't happen, should it?  This system is an i386 machine with kmem max at 800M and arc set to 40M.  This is running head from April 6, 2010, so it is a bit old, though.
> 
> Well, your system is a bit old indeed.
> And the branch is unknown, so I can't really see what sources you have.

Apologies, missed "head" in your description of the system.

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 18:37:20 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3DD2F106566B
	for <fs@freebsd.org>; Tue, 28 Sep 2010 18:37:20 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id A79368FC16
	for <fs@freebsd.org>; Tue, 28 Sep 2010 18:37:19 +0000 (UTC)
Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua
	[10.1.1.148])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o8SIDRLF015692
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 28 Sep 2010 21:13:27 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id
	o8SIDRIr087366; Tue, 28 Sep 2010 21:13:27 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o8SIDRM3087365; 
	Tue, 28 Sep 2010 21:13:27 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Tue, 28 Sep 2010 21:13:27 +0300
From: Kostik Belousov <kostikbel@gmail.com>
To: Andriy Gapon <avg@icyb.net.ua>
Message-ID: <20100928181327.GS43070@deviant.kiev.zoral.com.ua>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
	<4CA1DDE9.8090107@icyb.net.ua>
	<20100928132355.GA63149@icarus.home.lan>
	<4CA1EF69.4040402@icyb.net.ua>
	<FE116FEC-714D-4BF5-86D8-E29BFA713C69@wanderview.com>
	<4CA21809.7090504@icyb.net.ua>
	<71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com>
	<4CA22337.2010900@icyb.net.ua>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="v6YRErhBvjoBjrPV"
Content-Disposition: inline
In-Reply-To: <4CA22337.2010900@icyb.net.ua>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.7 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_05,
	DNS_FROM_OPENWHOIS autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 18:37:20 -0000


--v6YRErhBvjoBjrPV
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Sep 28, 2010 at 08:17:43PM +0300, Andriy Gapon wrote:
> ARC is a ZFS private cache.
> ZFS doesn't use unified buffer/page cache.
> So ARC is not directly affected by pagedaemon.
> But this is not exactly VFS layer thing.
This is pure speculation, unbacked by any code reading or understanding
of the principles.  Can ARC be changed to use some custom vm pager
instead of managing memory on its own?  As I understand it, ARC
uses wired kernel mappings right now.

If it starts using managed pages backed by a new pager, then pagedaemon
might make the actual decisions on shrinking the cache by putting and
reclaiming pages.  Does ARC have some `active' count for the caching
unit?  It might be translated to the active count for the page etc.

Did I say that this is Pure Speculation?  Seems so, right at the
beginning.

--v6YRErhBvjoBjrPV
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (FreeBSD)

iEYEARECAAYFAkyiMEcACgkQC3+MBN1Mb4iyIwCghq2eRbNL1kxbdsWjRcijVT3e
WH4An0aCYQpyzr3sawdW5TTcA6Lzjtpc
=wO8m
-----END PGP SIGNATURE-----

--v6YRErhBvjoBjrPV--

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 18:40:07 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E64A9106566C;
	Tue, 28 Sep 2010 18:40:06 +0000 (UTC)
	(envelope-from ben@wanderview.com)
Received: from mail.wanderview.com (mail.wanderview.com [66.92.166.102])
	by mx1.freebsd.org (Postfix) with ESMTP id 9EB6F8FC1D;
	Tue, 28 Sep 2010 18:40:06 +0000 (UTC)
Received: from xykon.in.wanderview.com (xykon.in.wanderview.com [10.76.10.152])
	(authenticated bits=0)
	by mail.wanderview.com (8.14.4/8.14.4) with ESMTP id o8SIdx1R028419
	(version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO);
	Tue, 28 Sep 2010 18:40:00 GMT (envelope-from ben@wanderview.com)
Mime-Version: 1.0 (Apple Message framework v1081)
Content-Type: text/plain; charset=us-ascii
From: Ben Kelly <ben@wanderview.com>
In-Reply-To: <4CA22337.2010900@icyb.net.ua>
Date: Tue, 28 Sep 2010 14:40:00 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <F244BA6D-3347-4D76-BAFB-D8B975783877@wanderview.com>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
	<4CA1DDE9.8090107@icyb.net.ua>
	<20100928132355.GA63149@icarus.home.lan>
	<4CA1EF69.4040402@icyb.net.ua>
	<FE116FEC-714D-4BF5-86D8-E29BFA713C69@wanderview.com>
	<4CA21809.7090504@icyb.net.ua>
	<71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com>
	<4CA22337.2010900@icyb.net.ua>
To: Andriy Gapon <avg@icyb.net.ua>
X-Mailer: Apple Mail (2.1081)
X-Spam-Score: -1.01 () ALL_TRUSTED,T_RP_MATCHES_RCVD
X-Scanned-By: MIMEDefang 2.67 on 10.76.20.1
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 18:40:07 -0000


On Sep 28, 2010, at 1:17 PM, Andriy Gapon wrote:

> on 28/09/2010 19:46 Ben Kelly said the following:
>> Hmm.  My server is currently idle with no I/O happening:
>>
>>  kstat.zfs.misc.arcstats.c: 25165824
>>  kstat.zfs.misc.arcstats.c_max: 46137344
>>  kstat.zfs.misc.arcstats.size: 91863156
>>
>> If what you say is true, this shouldn't happen, should it?  This
>> system is an i386 machine with kmem max at 800M and arc set to 40M.
>> This is running head from April 6, 2010, so it is a bit old, though.
>
> Well, your system is a bit old indeed.
> And the branch is unknown, so I can't really see what sources you have.
> And I am not sure if I'll be able to say anything about those sources.

Quite old.  I've been intending to update, but haven't found the time
lately.  I'll try to do the upgrade this weekend and see if it changes
anything.

> As to the numbers - yes, with current code I'd expect arcstats.size to
> go down to arcstats.c when there is no I/O.  arc_reclaim_thread should
> do that.

That's what I thought as well, but when I debugged it a year or two ago I
found that the buffers were still referenced and thus could not be
reclaimed.  As far as I can remember they needed a vfs/vnops like
zfs_vnops_inactive or zfs_vnops_reclaim to be executed in order to free
the reference.  What is responsible for making those calls?

>
>> At one point I had patches running on my system that triggered the
>> pagedaemon based on arc load and it did allow me to keep my arc below
>> the max.  Or at least I thought it did.
>>
>> In any case, I've never really been able to wrap my head around the
>> VFS layer and how it interacts with zfs.  So I'm more than willing to
>> believe I'm confused.  Any insights are greatly appreciated.
>
> ARC is a ZFS private cache.
> ZFS doesn't use unified buffer/page cache.
> So ARC is not directly affected by pagedaemon.
> But this is not exactly VFS layer thing.

Can you explain the difference in how the vfs/vnode operations are
called or used for those two situations?

I thought that the buffer cache was used by filesystems to implement
these operations.  So that the buffer cache was below the vfs/vnops
layer.  So while zfs implemented its operations in terms of the arc,
things like UFS implemented vfs/vnops in terms of the buffer cache.  I
thought the layers further up the chain like the page daemon did not
distinguish that much between these two implementations due to the VFS
interface layer.  (Although there seems to be a layering violation in
that the buffer cache signals directly to the upper page daemon layer to
trigger page reclamation.)

The old (ancient) patch I tried previously to help reduce the arc
working set and allow it to shrink is here:

  http://www.wanderview.com/svn/public/misc/zfs/zfs_kmem_limit.diff

Unfortunately, there are a couple ideas on fighting fragmentation mixed
into that patch.  See the part about arc_reclaim_pages().  This patch
did seem to allow my arc to stay under the target maximum even when
under load that previously caused the system to exceed the maximum.
When I update this weekend I'll try a stripped down version of the patch
to see if it helps or not with the latest zfs.

Thanks for your help in understanding this stuff!

- Ben

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 20:19:21 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 86D271065670
	for <fs@freebsd.org>; Tue, 28 Sep 2010 20:19:21 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au
	[211.29.132.185])
	by mx1.freebsd.org (Postfix) with ESMTP id 24B8C8FC0A
	for <fs@freebsd.org>; Tue, 28 Sep 2010 20:19:20 +0000 (UTC)
Received: from besplex.bde.org (c122-107-116-249.carlnfd1.nsw.optusnet.com.au
	[122.107.116.249])
	by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	o8SKJAGf002392
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 29 Sep 2010 06:19:18 +1000
Date: Wed, 29 Sep 2010 06:19:10 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
In-Reply-To: <20100929031825.L683@besplex.bde.org>
Message-ID: <20100929054826.E797@besplex.bde.org>
References: <20100929031825.L683@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: fs@freebsd.org
Subject: Re: ext2fs now extremely slow
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 20:19:21 -0000

On Wed, 29 Sep 2010, Bruce Evans wrote:

> For benchmarks on ext2fs:
>
> Under FreeBSD-~5.2 rerun today:
> untar:     59.17 real
> tar:       19.52 real
>
> Under -current run today:
> untar:    101.16 real
> tar:      172.03 real
>
> So, -current is 8.8 times slower for tar, but only 1.7 times slower for
> untar.
> ...
> dumpe2fs seems to show a bizarre layout:
> % ...
> % Group 3: (Blocks 98304-131071)
> %   Backup superblock at 98304, Group descriptors at 98305-98305
> %   Block bitmap at 98306 (+2), Inode bitmap at 98307 (+3)
> %   Inode table at 98308-98816 (+4)
> %   6882 free blocks, 16288 free inodes, 0 directories
> %   Free blocks: 123207, 123209-123215, 123217-123223, 123225-123231, 
> 123233-123239, 123241-123247, ...
>
> The last line was about 15000 characters long, and seems to have the 
> following
> pattern except for the first free block:
>
>    1 block used (123208)
>    7 blocks free (123209-123215)
>    1 block used (123216)
>    7 blocks free (123217-123223)
>    1 block used ...
>    7 blocks free ...
>
> So it seems that only 1 block in every 8 is used, and there is a seek
> after every block.  This asks for an 8-fold reduction in throughput,
> and it seems to have got that and a bit more for reading although not
> for writing.  Even (or especially) with perfect hardware, it must give
> an 8-fold reduction.  And it is likely to give more, since it defeats
> vfs clustering by making all runs of contiguous blocks have length 1.
>
> Simple sequential allocation should be used unless the allocation policy
> and implementation are very good.

This works a bit better after zapping the 8-fold way:

% Index: ext2_alloc.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v
% retrieving revision 1.2
% diff -u -2 -r1.2 ext2_alloc.c
% --- ext2_alloc.c	1 Sep 2010 05:34:17 -0000	1.2
% +++ ext2_alloc.c	28 Sep 2010 19:12:46 -0000
% @@ -1,2 +1,5 @@
% +int bde_blkpref = 0;
% +int bde_alloc8 = 1;
% +
%  /*-
%   *  modified for Lites 1.1
% @@ -542,6 +545,12 @@
%  	   then set the goal to what we thought it should be
%  	*/
% +if (bde_blkpref == 0) {
%  	if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0)
%  		return ip->i_next_alloc_goal;
% +} else if (bde_blkpref == 1) {
% +	if(ip->i_next_alloc_block == lbn)
% +		return ip->i_next_alloc_goal;
% +} else
% +	return 0;
% 
%  	/* now check whether we were provided with an array that basically
% @@ -662,4 +671,5 @@
%  	 * block.
%  	 */
% +if (bde_alloc8 == 0) {
%  	if (bpref)
%  		start = dtogd(fs, bpref) / NBBY;
% @@ -679,4 +689,5 @@
%  		}
%  	}
% +}
% 
%  	bno = ext2_mapsearch(fs, bbp, bpref);

This gives an improvement of:

untar:    101.16 real -> 63.46
tar:      172.03 real -> 50.70

Now -current is only 1.1 times slower for untar and 2.6 times slower for
tar.
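
As a quick check of the ratios quoted here and in the first mail, each
one is just the -current time divided by the ~5.2 time for the same
step.  A minimal sketch in Python; the dictionary and function names are
made up for illustration, only the timings come from the runs above:

```python
# Wall-clock ("real") seconds for the ext2fs-4096-4096 case, copied from
# the benchmark output in this thread.
OLD = {"untar": 59.17, "tar": 19.52}          # FreeBSD-~5.2
CURRENT = {"untar": 101.16, "tar": 172.03}    # -current, unpatched
PATCHED = {"untar": 63.46, "tar": 50.70}      # -current, 8-block search skipped


def slowdown(new, base):
    """Return how many times slower `new` is than `base`, per step."""
    return {step: new[step] / base[step] for step in base}


if __name__ == "__main__":
    for label, times in (("unpatched", CURRENT), ("patched", PATCHED)):
        for step, ratio in slowdown(times, OLD).items():
            print(f"{label:9s} {step:5s} {ratio:.1f}x slower than ~5.2")
```

Rounded to one decimal place this should reproduce the 8.8/1.7 and
2.6/1.1 figures.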

There must be a problem with bpref for things to have been so bad.  There
is some point to leaving a gap of 7 blocks for expansion, but the gap was
left even between blocks in a single file.

I don't have a userland program for displaying the layout produced by
ext2fs, but I have kernel printfs for it in several foofs_bmaparray()
functions.  Turning this on for ext2fs gives for 3 files:

% ino 231895: size 99982(25), lbn 0, bn 3704960-3704967, indir 913288-913295, runp 0
% ino 231895: size 99982(25), lbn 1, bn 913200-913287, indir 913288-913295, runp 10
% ino 231895: size 99982(25), lbn 12, bn 913296-913399, indir 913288-913295, runp 12

25 is the number of 4K blocks.  These should be allocated contiguously,
except for an indirect block in the middle.  (ffs also gets this wrong,
by allocating the indirect block far away.)  The above and the below
show bn's for the lbn 0's all nearby.  Then in all cases, the bn for
lbn 1 is far away.  For lbn1-lbn<end>, the allocation is perfectly
contiguous, except for the indirect block in the correct place in the
middle.
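
The block counts can be cross-checked against the printed bn ranges.
The sketch below assumes that the bn's are counted in 512-byte units;
that is only inferred from each 4K block spanning 8 consecutive bn's and
is not stated by the printfs.  The helper names are hypothetical:

```python
import math

BLOCK_SIZE = 4096      # fs block size, from dumpe2fs
SECTOR_SIZE = 512      # assumption: bn's are counted in 512-byte units


def blocks_for_size(size_bytes):
    """Number of fs blocks needed to hold size_bytes."""
    return math.ceil(size_bytes / BLOCK_SIZE)


def blocks_in_run(first_bn, last_bn):
    """Number of fs blocks covered by an inclusive bn range."""
    sectors = last_bn - first_bn + 1
    return sectors * SECTOR_SIZE // BLOCK_SIZE


# Data runs printed for ino 231895 (the indirect block is not a data block).
RUNS_231895 = [(3704960, 3704967), (913200, 913287), (913296, 913399)]

if __name__ == "__main__":
    per_run = [blocks_in_run(first, last) for first, last in RUNS_231895]
    print("blocks per run:", per_run, "total:", sum(per_run))
    print("blocks needed for 99982 bytes:", blocks_for_size(99982))
```

Under that assumption the three runs hold 1, 11 and 13 blocks, which
adds up to the 25 blocks needed for 99982 bytes.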

% ino 231880: size 82877(21), lbn 0, bn 3704848-3704855, indir 912224-912231, runp 0
% ino 231880: size 82877(21), lbn 1, bn 912136-912223, indir 912224-912231, runp 10
% ino 231880: size 82877(21), lbn 12, bn 912232-912303, indir 912224-912231, runp 8
% ino 231881: size 82343(21), lbn 0, bn 3704856-3704863, indir 912392-912399, runp 0
% ino 231881: size 82343(21), lbn 1, bn 912304-912391, indir 912392-912399, runp 10
% ino 231881: size 82343(21), lbn 12, bn 912400-912471, indir 912392-912399, runp 8

Same pattern for all files examined.  The last 2 have sequential ino's and
were probably created sequentially.  Everything is perfectly sequential
except for jumping back and forth between lbn0 and lbn1.  Perhaps bpref
(and/or the 'goal' variable) is working as intended to keep the lbn0's
together, but something fails so the bn's for all other lbn's are allocated
sequentially starting from the beginning of the disk (912K is much smaller
than 3704K).  Cylinder groups can't be working right either.

I haven't tried the bde_blkpref hack in the above.  It should kill bpref
completely so that there is no jump between lbn0 and lbn1, and break
cylinder group based allocation even better.  Setting bde_blkpref to 1
restores the bug that was present in ext2fs in FreeBSD between 1995 and
2010.  This bug gave sequential allocation starting at the beginning of
the disk in almost all cases, so map searches were slow and early groups
filled up before later groups were used at all.
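
To spell out what the three settings of bde_blkpref do in the first
hunk of the diff, here is a rough model of just that early return.  It
is written in Python for brevity and is not the kernel code; the
function name and the use of None for "fall through to the later
heuristics" are illustrative choices:

```python
def blkpref_shortcut(mode, next_alloc_block, next_alloc_goal, lbn):
    """Model the patched early return at the top of the preference code.

    Returns the preferred block number, or None when the original code
    would fall through to its later heuristics.
    """
    if mode == 0:
        # Behaviour as committed: trust the goal only if it is non-zero.
        if next_alloc_block == lbn and next_alloc_goal != 0:
            return next_alloc_goal
        return None
    if mode == 1:
        # Old behaviour: return the goal even when it is still 0.
        if next_alloc_block == lbn:
            return next_alloc_goal
        return None
    # Any other value: no preference at all.
    return 0


if __name__ == "__main__":
    for mode in (0, 1, 2):
        print(mode, blkpref_shortcut(mode, 5, 0, 5))
```

The only difference between modes 0 and 1 is the case where the goal is
still 0: mode 0 falls through, mode 1 hands back 0 as the preference.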

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 20:25:52 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CB35A106566C
	for <fs@freebsd.org>; Tue, 28 Sep 2010 20:25:52 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from fallbackmx08.syd.optusnet.com.au
	(fallbackmx08.syd.optusnet.com.au [211.29.132.10])
	by mx1.freebsd.org (Postfix) with ESMTP id 4EEB18FC17
	for <fs@freebsd.org>; Tue, 28 Sep 2010 20:25:51 +0000 (UTC)
Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au
	[211.29.132.186])
	by fallbackmx08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	o8SHq28I030607 for <fs@freebsd.org>; Wed, 29 Sep 2010 03:52:02 +1000
Received: from besplex.bde.org (c122-107-116-249.carlnfd1.nsw.optusnet.com.au
	[122.107.116.249])
	by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	o8SHpwQ3002339
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <fs@freebsd.org>; Wed, 29 Sep 2010 03:52:00 +1000
Date: Wed, 29 Sep 2010 03:51:58 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: fs@freebsd.org
Message-ID: <20100929031825.L683@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: 
Subject: ext2fs now extremely slow
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 20:25:52 -0000

For benchmarks on ext2fs:

Under FreeBSD-~5.2 rerun today:
untar:     59.17 real
tar:       19.52 real

Under -current run today:
untar:    101.16 real
tar:      172.03 real

So, -current is 8.8 times slower for tar, but only 1.7 times slower for
untar.

FreeBSD-~5.2 is my version of FreeBSD-5.2-CURRENT-old, which has
significant changes in ext2fs which make it a few percent faster (real)
and a few percent slower (sys) by using the BSD buffer cache instead
of a private cache for inodes.  I committed most of my changes to
ext2fs except the ones that made it slower.

More details: the untar benchmark copies about 400 MB of sources from
a large subset of /usr/src to a freshly mkfs.ext2'd and mounted file
system using 2 tars in a pipe (ends up with 488828 1K-blocks used on
ext2fs with 4K-blocks).  The source is supposed to be cached, so that
the untar is almost from memory.  The tar benchmark unmounts the
file system, remounts it, and tars up its contents to /dev/zero.  This
benchmark was originally mainly for finding fs layout problems.  In
fact, it was originally for figuring out why ext2fs was faster than
ffs in 1997 (*).  Since the tar part of it is not much affected by
caching, its results are much easier to reproduce than for the untar
benchmark.  Slowness in it normally means that the fs layout is bad, and
that shouldn't happen for a freshly laid out file system.

(*) This turned out to be because the ext2fs layout policy was completely
broken (essentially sequential, ignoring cylinder groups), but this
was actually an optimization for the relatively small file sets tested
by the benchmark (even smaller then), when combined with lack of caching
in my disk drive -- the drive was very slow for even small seeks, and
the broken allocation policy accidentally avoided lots of small seeks,
while ffs's fancier policy tends to generate too many of them.

Rawer results with all relevant possible fs parameters:

FreeBSD-~5.2:
%%%
ext2fs-1024-1024:
tarcp /f srcs:                 68.85 real         0.35 user         7.15 sys
tar cf /dev/zero srcs:         22.36 real         0.15 user         4.90 sys
ext2fs-1024-1024-as:
tarcp /f srcs:                 46.00 real         0.27 user         6.23 sys
tar cf /dev/zero srcs:         22.89 real         0.08 user         4.94 sys
ext2fs-4096-4096:
tarcp /f srcs:                 59.17 real         0.22 user         5.89 sys
tar cf /dev/zero srcs:         19.52 real         0.12 user         2.13 sys
ext2fs-4096-4096-as:
tarcp /f srcs:                 37.73 real         0.22 user         4.94 sys
tar cf /dev/zero srcs:         19.40 real         0.19 user         2.05 sys
%%%

ext2fs-1024-1024 means ext2fs with 1024-blocks and 1024-frags, and the -as
suffix means an async mount, etc.  tarcp is 2 tars in a pipe (untar).

FreeBSD-current:
%%%
ext2fs-1024-1024:
tarcp /f srcs:                130.18 real         0.26 user         6.39 sys
tar cf /dev/zero srcs:         73.90 real         0.15 user         2.30 sys
ext2fs-1024-1024-as:
tarcp /f srcs:                 98.22 real         0.30 user         6.38 sys
tar cf /dev/zero srcs:         70.36 real         0.13 user         2.29 sys
ext2fs-4096-4096:
tarcp /f srcs:                101.16 real         0.33 user         5.04 sys
tar cf /dev/zero srcs:        172.03 real         0.13 user         1.26 sys
ext2fs-4096-4096-as:
tarcp /f srcs:                 78.23 real         0.21 user         5.09 sys
tar cf /dev/zero srcs:        147.87 real         0.15 user         1.23 sys
%%%

The benchmark also prints the i/o counts using mount -v.  This is broken
in -current, so it is not easy to see if there are too many i/o's.

I guess the problem is mainly a bad layout policy, since the efficiency of
the tar step doesn't depend much on anything except the layout.  Testing
under ~5.2 confirms this: for the file system left at the end of the above
run, but tarred up by ~5.2 after reboot:

%%%
tar cf /dev/zero srcs:        151.88 real         0.14 user         2.30 sys
%%%

So -current is actually 1.03 times faster, not 8.8 times slower, for tar :-/.

dumpe2fs seems to show a bizarre layout:

% Filesystem volume name:   <none>
% Last mounted on:          <not available>
% Filesystem UUID:          a792ae57-2438-4e78-bad6-4ef939fde0df
% Filesystem magic number:  0xEF53
% Filesystem revision #:    1 (dynamic)
% Filesystem features:      filetype sparse_super
% Default mount options:    (none)
% Filesystem state:         not clean
% Errors behavior:          Continue
% Filesystem OS type:       unknown
% Inode count:              1531072
% Block count:              3058374
% Reserved block count:     152918
% Free blocks:              2888113
% Free inodes:              1498688
% First block:              0
% Block size:               4096
% Fragment size:            4096
% Blocks per group:         32768
% Fragments per group:      32768
% Inodes per group:         16288
% Inode blocks per group:   509
% Filesystem created:       Wed Sep 29 02:16:32 2010
% Last mount time:          n/a
% Last write time:          Wed Sep 29 03:15:24 2010
% Mount count:              0
% Maximum mount count:      28
% Last checked:             Wed Sep 29 02:16:32 2010
% Check interval:           15552000 (6 months)
% Next check after:         Mon Mar 28 03:16:32 2011
% Reserved blocks uid:      0 (user root)
% Reserved blocks gid:      0 (group wheel)
% First inode:              11
% Inode size:		  128
% Default directory hash:   tea
% Directory Hash Seed:      036f029e-7924-4a73-91ec-730fd18e832d
% 
% 
% Group 0: (Blocks 0-32767)
%   Primary superblock at 0, Group descriptors at 1-1
%   Block bitmap at 2 (+2), Inode bitmap at 3 (+3)
%   Inode table at 4-512 (+4)
%   0 free blocks, 16277 free inodes, 2 directories
%   Free blocks: 
%   Free inodes: 12-16288
% Group 1: (Blocks 32768-65535)
%   Backup superblock at 32768, Group descriptors at 32769-32769
%   Block bitmap at 32770 (+2), Inode bitmap at 32771 (+3)
%   Inode table at 32772-33280 (+4)
%   0 free blocks, 16288 free inodes, 0 directories
%   Free blocks: 
%   Free inodes: 16289-32576
% Group 2: (Blocks 65536-98303)
%   Block bitmap at 65536 (+0), Inode bitmap at 65537 (+1)
%   Inode table at 65538-66046 (+2)
%   32257 free blocks, 16288 free inodes, 0 directories
%   Free blocks: 66047-98303
%   Free inodes: 32577-48864
% Group 3: (Blocks 98304-131071)
%   Backup superblock at 98304, Group descriptors at 98305-98305
%   Block bitmap at 98306 (+2), Inode bitmap at 98307 (+3)
%   Inode table at 98308-98816 (+4)
%   6882 free blocks, 16288 free inodes, 0 directories
%   Free blocks: 123207, 123209-123215, 123217-123223, 123225-123231, 123233-123239, 123241-123247, ...

The last line was about 15000 characters long, and seems to have the following
pattern except for the first free block:

     1 block used (123208)
     7 blocks free (123209-123215)
     1 block used (123216)
     7 blocks free (123217-123223)
     1 block used ...
     7 blocks free ...

So it seems that only 1 block in every 8 is used, and there is a seek
after every block.  This asks for an 8-fold reduction in throughput,
and it seems to have got that and a bit more for reading although not
for writing.  Even (or especially) with perfect hardware, it must give
an 8-fold reduction.  And it is likely to give more, since it defeats
vfs clustering by making all runs of contiguous blocks have length 1.
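
The 1-used/7-free pattern can be checked mechanically rather than by
eye.  A small sketch in Python, assuming only the comma-separated
"first-last" list format that dumpe2fs prints on its "Free blocks:"
line; the function names are invented for this example and the sample is
the visible start of the group 3 line:

```python
def parse_free_list(text):
    """Parse a dumpe2fs-style 'a, b-c, ...' list into inclusive ranges."""
    ranges = []
    for item in text.split(","):
        item = item.strip()
        if not item or item == "...":
            continue
        first, _, last = item.partition("-")
        ranges.append((int(first), int(last or first)))
    return ranges


def run_lengths(ranges):
    """Return (free_run_lengths, used_gap_lengths) for sorted ranges."""
    free = [last - first + 1 for first, last in ranges]
    used = [nxt[0] - cur[1] - 1 for cur, nxt in zip(ranges, ranges[1:])]
    return free, used


SAMPLE = ("123207, 123209-123215, 123217-123223, 123225-123231, "
          "123233-123239, 123241-123247, ...")

if __name__ == "__main__":
    free, used = run_lengths(parse_free_list(SAMPLE))
    print("free runs:", free)
    print("used gaps:", used)
```

For the sample, every gap between free ranges is a single used block
and every free run after the first is 7 blocks long.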

Simple sequential allocation should be used unless the allocation policy
and implementation are very good.

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 21:31:17 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A7A6D1065673;
	Tue, 28 Sep 2010 21:31:17 +0000 (UTC) (envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id C35838FC13;
	Tue, 28 Sep 2010 21:31:15 +0000 (UTC)
Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id AAA11500;
	Wed, 29 Sep 2010 00:31:00 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Received: from localhost.topspin.kiev.ua ([127.0.0.1])
	by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1P0hlL-0000Hk-Qa; Wed, 29 Sep 2010 00:30:59 +0300
Message-ID: <4CA25E92.4060904@icyb.net.ua>
Date: Wed, 29 Sep 2010 00:30:58 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Ben Kelly <ben@wanderview.com>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
	<4CA1DDE9.8090107@icyb.net.ua>
	<20100928132355.GA63149@icarus.home.lan>
	<4CA1EF69.4040402@icyb.net.ua>
	<FE116FEC-714D-4BF5-86D8-E29BFA713C69@wanderview.com>
	<4CA21809.7090504@icyb.net.ua>
	<71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com>
	<4CA22337.2010900@icyb.net.ua>
	<F244BA6D-3347-4D76-BAFB-D8B975783877@wanderview.com>
In-Reply-To: <F244BA6D-3347-4D76-BAFB-D8B975783877@wanderview.com>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 21:31:17 -0000

on 28/09/2010 21:40 Ben Kelly said the following:
> 
> On Sep 28, 2010, at 1:17 PM, Andriy Gapon wrote:
> 
>> on 28/09/2010 19:46 Ben Kelly said the following:
>>> Hmm.  My server is currently idle with no I/O happening:
>>> 
>>> kstat.zfs.misc.arcstats.c: 25165824 kstat.zfs.misc.arcstats.c_max:
>>> 46137344 kstat.zfs.misc.arcstats.size: 91863156
>>> 
>>> If what you say is true, this shouldn't happen, should it?  This system
>>> is an i386 machine with kmem max at 800M and arc set to 40M.  This is
>>> running head from April 6, 2010, so it is a bit old, though.
>> 
>> Well, your system is a bit old indeed. And the branch is unknown, so I
>> can't really see what sources you have. And I am not sure if I'll be able
>> to say anything about those sources.
> 
> Quite old.  I've been intending to update, but haven't found the time lately.
> I'll try to do the upgrade this weekend and see if it changes anything.
> 
>> As to the numbers - yes, with current code I'd expect arcstats.size to go
>> down to arcstats.c when there is no I/O.  arc_reclaim_thread should do
>> that.
> 
> Thats what I thought as well, but when I debugged it a year or two ago I
> found that the buffers were still referenced and thus could not be reclaimed.
> As far as I can remember they needed a vfs/vnops like zfs_vnops_inactive or
> zfs_vnops_reclaim to be executed in order to free the reference.  What is
> responsible for making those calls?

It's time that we should start showing each other places in code :)
Because I don't think that that's how the code works.
E.g. I look at how zfs_read() calls dmu_read_uio(), which calls
dmu_buf_hold_array() and dmu_buf_rele_array() around the uiomove() call.
As far as I can see, dmu_buf_hold_array() calls
dmu_buf_hold_array_by_dnode(), which calls dbuf_hold(), which calls
arc_buf_add_ref() or arc_buf_alloc().
And conversely, dmu_buf_rele_array() calls dbuf_rele(), which calls
arc_buf_remove_ref().

So, I am quite sure that ARC buffers are held/referenced only during ongoing I/O
to or from them.
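
To make the hold/release pattern concrete, here is a minimal userland sketch.
It is not the ZFS code: struct toy_buf and the toy_*() helpers are made-up
names, and memcpy() merely stands in for the copy step.  It only shows a
buffer being referenced for the duration of the copy and becoming
reclaimable again right after it.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a cached buffer; not the real arc_buf_t. */
struct toy_buf {
	int	refcnt;		/* holds currently outstanding */
	char	data[64];
};

static void
toy_hold(struct toy_buf *b)
{
	b->refcnt++;
}

static void
toy_rele(struct toy_buf *b)
{
	b->refcnt--;
}

/* A buffer may be reclaimed only while nobody holds it. */
static int
toy_evictable(const struct toy_buf *b)
{
	return (b->refcnt == 0);
}

/* Copy out of the buffer, holding it only for the duration of the copy. */
static size_t
toy_read(struct toy_buf *b, char *dst, size_t len)
{
	if (len > sizeof(b->data))
		len = sizeof(b->data);
	toy_hold(b);
	memcpy(dst, b->data, len);	/* stands in for the copy to the caller */
	toy_rele(b);
	return (len);
}
```

In this model toy_evictable() is true both before and after toy_read(), which
is the behaviour I would expect from the real buffers once I/O has finished.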

Perhaps, on the other hand, you had in mind the life-cycle of other things (not
ARC buffers) that are accounted against ARC size (with type ARC_SPACE_OTHER),
such as the dmu_buf_impl_t-s allocated in dbuf_create()?
I have to admit that I haven't investigated the behavior of that part of
ARC-assigned memory.  It's only a small proportion (~10%) of the whole ARC size
on my systems.

>>> At one point I had patches running on my system that triggered the
>>> pagedaemon based on arc load and it did allow me to keep my arc below the
>>> max.  Or at least I thought it did.
>>> 
>>> In any case, I've never really been able to wrap my head around the VFS
>>> layer and how it interacts with zfs.  So I'm more than willing to believe
>>> I'm confused.  Any insights are greatly appreciated.
>> 
>> ARC is a ZFS private cache. ZFS doesn't use unified buffer/page cache. So
>> ARC is not directly affected by pagedaemon. But this is not exactly VFS
>> layer thing.
> 
> Can you explain the difference in how the vfs/vnode operations are called or
> used for those two situations?

They are called exactly the same.
VFS layer and code above it are not aware of FS implementation details.

> I thought that the buffer cache was used by filesystems to implement these
> operations.  So that the buffer cache was below the vfs/vnops layer.  So

The buffer cache works as part of the unified VM system, and its buffers use
the same pages as the page cache does.

> while zfs implemented its operations in terms of the arc, things like UFS
> implemented vfs/vnops in terms of the buffer cache.  I thought the layers

Yes.  Filesystems like UFS are "sandwiched" between the buffer cache and the
page cache, which work in concert.  Also, they don't (have to) implement their
own buffer/page caching policies, because it's all managed by the unified VM
system.

In contrast, ZFS has its own private cache.
So, first of all, its data may be cached in two places at once: the page cache
and the ARC.  And, because of that, some assumptions of the higher level code
get violated, so ZFS has to jump through hoops to meet those assumptions (e.g.
see UIO_NOCOPY).

> further up the chain like the page daemon did not distinguish that much
> between these two implementation due to the VFS interface layer.  (Although

Right, but see above.

> there seems to be a layering violation in that the buffer cache signals
> directly to the upper page daemon layer to trigger page reclamation.)

Umm, not sure if that is a fact.

> The old (ancient) patch I tried previously to help reduce the arc working set
> and allow it to shrink is here:
> 
> http://www.wanderview.com/svn/public/misc/zfs/zfs_kmem_limit.diff
> 
> Unfortunately, there are a couple ideas on fighting fragmentation mixed into
> that patch.  See the part about arc_reclaim_pages().  This patch did seem to
> allow my arc to stay under the target maximum even when under load that
> previously caused the system to exceed the maximum.  When I update this
> weekend I'll try a stripped down version of the patch to see if it helps or
> not with the latest zfs.
> 
> Thanks for your help in understanding this stuff!

The patch seems good, especially the part about taking into account the kmem
fragmentation.  But it also seems to be heavily tuned towards "tiny ARC" systems
like yours, so I am not sure yet how suitable it is for "mainstream" systems.

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 22:01:27 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F036B106566C;
	Tue, 28 Sep 2010 22:01:27 +0000 (UTC)
	(envelope-from ben@wanderview.com)
Received: from mail.wanderview.com (mail.wanderview.com [66.92.166.102])
	by mx1.freebsd.org (Postfix) with ESMTP id 6EE088FC0A;
	Tue, 28 Sep 2010 22:01:27 +0000 (UTC)
Received: from xykon.in.wanderview.com (xykon.in.wanderview.com [10.76.10.152])
	(authenticated bits=0)
	by mail.wanderview.com (8.14.4/8.14.4) with ESMTP id o8SM1LVX031742
	(version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO);
	Tue, 28 Sep 2010 22:01:21 GMT (envelope-from ben@wanderview.com)
Mime-Version: 1.0 (Apple Message framework v1081)
Content-Type: text/plain; charset=us-ascii
From: Ben Kelly <ben@wanderview.com>
In-Reply-To: <4CA25E92.4060904@icyb.net.ua>
Date: Tue, 28 Sep 2010 18:01:21 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <5BD33772-C0EA-48A9-BE9A-C8FBAF0008D7@wanderview.com>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
	<4CA1DDE9.8090107@icyb.net.ua>
	<20100928132355.GA63149@icarus.home.lan>
	<4CA1EF69.4040402@icyb.net.ua>
	<FE116FEC-714D-4BF5-86D8-E29BFA713C69@wanderview.com>
	<4CA21809.7090504@icyb.net.ua>
	<71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com>
	<4CA22337.2010900@icyb.net.ua>
	<F244BA6D-3347-4D76-BAFB-D8B975783877@wanderview.com>
	<4CA25E92.4060904@icyb.net.ua>
To: Andriy Gapon <avg@icyb.net.ua>
X-Mailer: Apple Mail (2.1081)
X-Spam-Score: -1.01 () ALL_TRUSTED,T_RP_MATCHES_RCVD
X-Scanned-By: MIMEDefang 2.67 on 10.76.20.1
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 22:01:28 -0000


On Sep 28, 2010, at 5:30 PM, Andriy Gapon wrote:

<< snipped lots of good info here... probably won't have time to look at
it in detail until the weekend >>

>> there seems to be a layering violation in that the buffer cache signals
>> directly to the upper page daemon layer to trigger page reclamation.)
>
> Umm, not sure if that is a fact.

I was referring to the code in vfs_bio.c that used to twiddle
vm_pageout_deficit directly.  That seems to have been replaced with a
call to vm_page_grab().

>> The old (ancient) patch I tried previously to help reduce the arc working
>> set and allow it to shrink is here:
>>
>> http://www.wanderview.com/svn/public/misc/zfs/zfs_kmem_limit.diff
>>
>> Unfortunately, there are a couple ideas on fighting fragmentation mixed
>> into that patch.  See the part about arc_reclaim_pages().  This patch did
>> seem to allow my arc to stay under the target maximum even when under load
>> that previously caused the system to exceed the maximum.  When I update
>> this weekend I'll try a stripped down version of the patch to see if it
>> helps or not with the latest zfs.
>>
>> Thanks for your help in understanding this stuff!
>
> The patch seems good, especially the part about taking into account the
> kmem fragmentation.  But it also seems to be heavily tuned towards "tiny
> ARC" systems like yours, so I am not sure yet how suitable it is for
> "mainstream" systems.

Thanks.  Yea, there is a lot of aggressive tuning there.  In particular,
the slow growth algorithm is somewhat dubious.  What I found, though,
was that the fragmentation jumped whenever the arc was reduced in size,
so it was an attempt to make the size slowly approach peak load without
overshooting.

A better long term solution would probably be to enhance UMA to support
custom slab sizes on a zone-by-zone basis.  That way all zfs/arc
allocations can use slabs of 128k (at a memory efficiency penalty of
course).  I prototyped this with a dumbed down block pool allocator at
one point and was able to avoid most, if not all, of the fragmentation.
Adding the support to UMA seemed non-trivial, though.
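
For illustration only, a fixed-size block pool of the kind described above
could look roughly like the sketch below.  This is not the prototype and not
UMA; struct pool, pool_init(), pool_alloc(), pool_free() and the pool size
are all made up.  The one idea it shows is that every request, whatever its
size, consumes one whole 128k block, so a freed block always fits the next
request exactly.

```c
#include <stddef.h>

#define POOL_BLOCK_SIZE	(128 * 1024)	/* one size for every allocation */
#define POOL_NBLOCKS	8		/* arbitrary, for the example only */

/* Hypothetical fixed-size pool; all names here are invented. */
struct pool {
	unsigned char	mem[POOL_NBLOCKS][POOL_BLOCK_SIZE];
	int		free_idx[POOL_NBLOCKS];	/* stack of free block indexes */
	int		nfree;
};

static void
pool_init(struct pool *p)
{
	int i;

	/* Push indexes so that block 0 is handed out first. */
	for (i = 0; i < POOL_NBLOCKS; i++)
		p->free_idx[i] = POOL_NBLOCKS - 1 - i;
	p->nfree = POOL_NBLOCKS;
}

/* Any request up to the block size consumes one whole block. */
static void *
pool_alloc(struct pool *p, size_t size)
{
	if (size == 0 || size > POOL_BLOCK_SIZE || p->nfree == 0)
		return (NULL);
	return (p->mem[p->free_idx[--p->nfree]]);
}

/* Return a block obtained from pool_alloc() to the free stack. */
static void
pool_free(struct pool *p, void *ptr)
{
	int idx;

	idx = (int)(((unsigned char *)ptr - &p->mem[0][0]) / POOL_BLOCK_SIZE);
	p->free_idx[p->nfree++] = idx;
}
```

The memory efficiency penalty is visible directly: a 512 byte request still
takes a full 128k block.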

Thanks again for the information.  I hope to get a chance to look at the
code this weekend.

- Ben

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 22:22:59 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F401A106566B;
	Tue, 28 Sep 2010 22:22:58 +0000 (UTC) (envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 0E61D8FC08;
	Tue, 28 Sep 2010 22:22:57 +0000 (UTC)
Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id BAA12308;
	Wed, 29 Sep 2010 01:22:45 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Received: from localhost.topspin.kiev.ua ([127.0.0.1])
	by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1P0iZR-0000Lq-6i; Wed, 29 Sep 2010 01:22:45 +0300
Message-ID: <4CA26AB4.3050108@icyb.net.ua>
Date: Wed, 29 Sep 2010 01:22:44 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Ben Kelly <ben@wanderview.com>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
	<4CA1DDE9.8090107@icyb.net.ua>
	<20100928132355.GA63149@icarus.home.lan>
	<4CA1EF69.4040402@icyb.net.ua>
	<FE116FEC-714D-4BF5-86D8-E29BFA713C69@wanderview.com>
	<4CA21809.7090504@icyb.net.ua>
	<71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com>
	<4CA22337.2010900@icyb.net.ua>
	<F244BA6D-3347-4D76-BAFB-D8B975783877@wanderview.com>
	<4CA25E92.4060904@icyb.net.ua>
	<5BD33772-C0EA-48A9-BE9A-C8FBAF0008D7@wanderview.com>
In-Reply-To: <5BD33772-C0EA-48A9-BE9A-C8FBAF0008D7@wanderview.com>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 22:22:59 -0000

on 29/09/2010 01:01 Ben Kelly said the following:
> Thanks.  Yea, there is a lot of aggressive tuning there.  In particular, the
> slow growth algorithm is somewhat dubious.  What I found, though, was that
> the fragmentation jumped whenever the arc was reduced in size, so it was an
> attempt to make the size slowly approach peak load without overshooting.
> 
> A better long term solution would probably be to enhance UMA to support
> custom slab sizes on a zone-by-zone basis.  That way all zfs/arc allocations
> can use slabs of 128k (at a memory efficiency penalty of course).  I
> prototyped this with a dumbed down block pool allocator at one point and was
> able to avoid most, if not all, of the fragmentation.  Adding the support to
> UMA seemed non-trivial, though.

BTW, have you seen my posts about UMA and ZFS on hackers@ ?
I found it advantageous to use UMA for ZFS I/O buffers, but only after reducing
size of per-CPU caches for the zones with large-sized items.
I further modified the code in my local tree to completely disable per-CPU
caches for items > 32KB.
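
As a rough sketch of the kind of policy I mean (this is not the UMA code, and
not my patch): the function name, the per-CPU byte budget and the entry cap
below are assumptions made up for the example; only the 32KB cut-off comes
from what I described above.

```c
#include <stddef.h>

#define PCPU_CACHE_MAX_ITEM	(32 * 1024)	/* cut-off mentioned above */
#define PCPU_CACHE_BUDGET	(64 * 1024)	/* assumed per-CPU byte budget */
#define PCPU_CACHE_MAX_ENTRIES	128		/* assumed upper bound */

/*
 * How many free items of a given size to keep cached per CPU.
 * Hypothetical policy: large items get no per-CPU cache at all,
 * smaller ones get fewer entries as the item size grows.
 */
static size_t
pcpu_cache_entries(size_t item_size)
{
	size_t n;

	if (item_size == 0 || item_size > PCPU_CACHE_MAX_ITEM)
		return (0);
	n = PCPU_CACHE_BUDGET / item_size;
	if (n > PCPU_CACHE_MAX_ENTRIES)
		n = PCPU_CACHE_MAX_ENTRIES;
	return (n);
}
```

With these made-up constants a 128KB item is never cached per CPU, while a
16KB item gets four entries.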

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 28 23:15:02 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DC29C106566B
	for <fs@freebsd.org>; Tue, 28 Sep 2010 23:15:01 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au
	[211.29.132.184])
	by mx1.freebsd.org (Postfix) with ESMTP id 78F678FC2E
	for <fs@freebsd.org>; Tue, 28 Sep 2010 23:15:01 +0000 (UTC)
Received: from besplex.bde.org (c122-107-116-249.carlnfd1.nsw.optusnet.com.au
	[122.107.116.249])
	by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	o8SNEvp4006110
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 29 Sep 2010 09:14:58 +1000
Date: Wed, 29 Sep 2010 09:14:57 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
In-Reply-To: <20100929054826.E797@besplex.bde.org>
Message-ID: <20100929084801.M948@besplex.bde.org>
References: <20100929031825.L683@besplex.bde.org>
	<20100929054826.E797@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: fs@freebsd.org
Subject: Re: ext2fs now extremely slow
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2010 23:15:02 -0000

On Wed, 29 Sep 2010, Bruce Evans wrote:

> On Wed, 29 Sep 2010, Bruce Evans wrote:
>
>> For benchmarks on ext2fs:
>> 
>> Under FreeBSD-~5.2 rerun today:
>> untar:     59.17 real
>> tar:       19.52 real
>> 
>> Under -current run today:
>> untar:    101.16 real
>> tar:      172.03 real
>> 
>> So, -current is 8.8 times slower for tar, but only 1.7 times slower for
>> untar.
>> ...
>> So it seems that only 1 block in every 8 is used, and there is a seek
>> after every block.  This asks for an 8-fold reduction in throughput,
>> and it seems to have got that and a bit more for reading although not
>> for writing.  Even (or especially) with perfect hardware, it must give
>> an 8-fold reduction.  And it is likely to give more, since it defeats
>> vfs clustering by making all runs of contiguous blocks have length 1.
>> 
>> Simple sequential allocation should be used unless the allocation policy
>> and implementation are very good.
>
> This work a bit better after zapping the 8-fold way:
   Things
> ...
> This gives an improvement of:
>
> untar:    101.16 real -> 63.46
> tar:      172.03 real -> 50.70
>
> Now -current is only 1.1 times slower for untar and 2.6 times slower for
> tar.
>
> There must be a problem with bpref for things to have been so bad.  There
> is some point to leaving a gap of 7 blocks for expansion, but the gap was
> left even between blocks in a single file.
> ...
> I haven't tried the bde_blkpref hack in the above.  It should kill bpref
> completely so that there is no jump between lbn0 and lbn1, and break
> cylinder group based allocation even better.  Setting bde_blkpref to 1
> restores the bug that was present in ext2fs in FreeBSD between 1995 and
> 2010.  This bug gave sequential allocation starting at the beginning of
> the disk in almost all cases, so map searches were slow and early groups
> filled up before later groups were used at all.

Tried this (patch repeated below), and it gave essentially the same
speed as old versions.

The main problem seems to be that the `goal' variables aren't initialized.
After restoring bits verbatim from an old version, things seem to work as
expected:

% Index: ext2_alloc.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v
% retrieving revision 1.2
% diff -u -2 -r1.2 ext2_alloc.c
% --- ext2_alloc.c	1 Sep 2010 05:34:17 -0000	1.2
% +++ ext2_alloc.c	28 Sep 2010 21:08:42 -0000
% @@ -1,2 +1,5 @@
% +int bde_blkpref = 0;
% +int bde_alloc8 = 0;
% +
%  /*-
%   *  modified for Lites 1.1
% @@ -117,4 +120,8 @@
%                                                   ext2_alloccg);
%          if (bno > 0) {
% +		/* set next_alloc fields as done in block_getblk */
% +		ip->i_next_alloc_block = lbn;
% +		ip->i_next_alloc_goal = bno;
% +
%                  ip->i_blocks += btodb(fs->e2fs_bsize);
%                  ip->i_flag |= IN_CHANGE | IN_UPDATE;

The only things that changed recently in this block were the 4 deleted
lines and 4 lines with tabs corrupted to spaces.  Perhaps an editing
error.
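
A toy model of why the hint matters (not the ext2fs code; struct toy_inode
and the toy_*() helpers are made-up names, and only the two field names are
taken from the patch).  The patch itself records (lbn, bno); how the real
code advances the hint to the next block is not shown in the hunk above, so
the sketch simply assumes the hint ends up pointing at (lbn + 1, bno + 1).

```c
#include <stdint.h>

/* Toy inode carrying only the two hint fields named in the patch. */
struct toy_inode {
	int32_t	i_next_alloc_block;	/* logical block the hint applies to */
	int32_t	i_next_alloc_goal;	/* physical block to try for it */
};

/* Preferred physical block for lbn: the hint if it matches, else 0 (none). */
static int32_t
toy_blkpref(const struct toy_inode *ip, int32_t lbn)
{
	if (ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0)
		return (ip->i_next_alloc_goal);
	return (0);
}

/*
 * Record an allocation so that the following logical block prefers the
 * following physical block (assumed behaviour, see the text above).
 */
static void
toy_note_alloc(struct toy_inode *ip, int32_t lbn, int32_t bno)
{
	ip->i_next_alloc_block = lbn + 1;
	ip->i_next_alloc_goal = bno + 1;
}
```

If toy_note_alloc() is never called, the fields stay 0 and toy_blkpref()
never returns a goal, so every block is placed by the fallback search.  That
is the uninitialized case in this model.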

% @@ -542,6 +549,12 @@
%  	   then set the goal to what we thought it should be
%  	*/
% +if (bde_blkpref == 0) {
%  	if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0)
%  		return ip->i_next_alloc_goal;
% +} else if (bde_blkpref == 1) {
% +	if(ip->i_next_alloc_block == lbn)
% +		return ip->i_next_alloc_goal;
% +} else
% +	return 0;
% 
%  	/* now check whether we were provided with an array that basically

Not needed now.

% @@ -662,4 +675,5 @@
%  	 * block.
%  	 */
% +if (bde_alloc8 == 0) {
%  	if (bpref)
%  		start = dtogd(fs, bpref) / NBBY;
% @@ -679,4 +693,5 @@
%  		}
%  	}
% +}
% 
%  	bno = ext2_mapsearch(fs, bbp, bpref);

The code to skip to the next 8-block boundary should be removed permanently.
After fixing the initialization, it doesn't generate holes inside files but
it still generates holes between files.  The holes are quite large with
4K-blocks.
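
A simplified model of the two search policies (not the ext2_alloccg() code;
the function names are made up, and a set bit means "block in use").  One
search takes the first free bit, the other insists on a wholly free byte of
the bitmap, i.e. an 8-block boundary.

```c
#include <stddef.h>

#define NBBY	8	/* bits per byte */

/* First free bit at or after 'start', or -1 if there is none. */
static int
first_free_bit(const unsigned char *map, size_t nbits, size_t start)
{
	size_t i;

	for (i = start; i < nbits; i++)
		if ((map[i / NBBY] & (1u << (i % NBBY))) == 0)
			return ((int)i);
	return (-1);
}

/*
 * First bit of the first wholly free byte, searching from the byte that
 * contains 'start'; -1 if there is none.
 */
static int
first_free_byte(const unsigned char *map, size_t nbits, size_t start)
{
	size_t b;

	for (b = start / NBBY; (b + 1) * NBBY <= nbits; b++)
		if (map[b] == 0)
			return ((int)(b * NBBY));
	return (-1);
}

/* Mark one block as allocated. */
static void
mark_used(unsigned char *map, int bit)
{
	map[bit / NBBY] |= (unsigned char)(1u << (bit % NBBY));
}
```

Starting from an empty map, three single-block allocations through
first_free_byte() land on blocks 0, 8 and 16, leaving 7-block holes, while
first_free_bit() would hand out 0, 1 and 2.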

Benchmark results with just the initialization of `goal' variables restored:

%%%
ext2fs-1024-1024:
tarcp /f srcs:                 78.79 real         0.31 user         4.94 sys
tar cf /dev/zero srcs:         24.62 real         0.19 user         1.82 sys
ext2fs-1024-1024-as:
tarcp /f srcs:                 52.07 real         0.26 user         4.95 sys
tar cf /dev/zero srcs:         24.80 real         0.10 user         1.93 sys
ext2fs-4096-4096:
tarcp /f srcs:                 74.14 real         0.34 user         3.96 sys
tar cf /dev/zero srcs:         33.82 real         0.10 user         1.19 sys
ext2fs-4096-4096-as:
tarcp /f srcs:                 53.54 real         0.36 user         3.87 sys
tar cf /dev/zero srcs:         33.91 real         0.14 user         1.15 sys
%%%

The much larger holes between the files are apparently responsible for the
decreased speed with 4K-blocks.  1K-blocks are really too small, so 4K-blocks
should be faster.

Benchmark results with the fix and bde_alloc8 = 1.

ext2fs-1024-1024:
tarcp /f srcs:                 71.60 real         0.15 user         2.04 sys
tar cf /dev/zero srcs:         22.34 real         0.05 user         0.79 sys
ext2fs-1024-1024-as:
tarcp /f srcs:                 46.03 real         0.14 user         2.02 sys
tar cf /dev/zero srcs:         21.97 real         0.05 user         0.80 sys
ext2fs-4096-4096:
tarcp /f srcs:                 59.66 real         0.13 user         1.63 sys
tar cf /dev/zero srcs:         19.88 real         0.07 user         0.46 sys
ext2fs-4096-4096-as:
tarcp /f srcs:                 37.30 real         0.12 user         1.60 sys
tar cf /dev/zero srcs:         19.93 real         0.05 user         0.49 sys
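
To compare the two sets of results, the ratio of the real times is enough.
A trivial helper (the name is made up) for that arithmetic:

```c
/* Ratio of two elapsed times; > 1.0 means 'before' was slower. */
static double
speedup(double before_sec, double after_sec)
{
	return (before_sec / after_sec);
}
```

For ext2fs-4096-4096, speedup(33.82, 19.88) is about 1.70 for the read
(tar cf) and speedup(74.14, 59.66) is about 1.24 for the write (tarcp).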

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 00:01:48 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5D560106564A
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 00:01:48 +0000 (UTC)
	(envelope-from scottj75074@yahoo.com)
Received: from web110702.mail.gq1.yahoo.com (web110702.mail.gq1.yahoo.com
	[67.195.13.209]) by mx1.freebsd.org (Postfix) with SMTP id 313388FC15
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 00:01:47 +0000 (UTC)
Received: (qmail 32029 invoked by uid 60001); 29 Sep 2010 00:01:47 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024;
	t=1285718507; bh=s/EOty7iIA36f4lTrWCyHwF+ttMCv0QGH65S0rCsXtc=;
	h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:MIME-Version:Content-Type;
	b=Hq1AYeX3c0EfU2w3XnDlHWHf7IQjl0fKVnRCUmNtNAL3iFOJV3jEfBv/xVzeqprZgfFN49R3BqTkda+JtNEnjYr9b0NSJVAxfer1ZVaTvoCNUmXh7AUxAzGd2xX0jVL28s4IFyvmh/j8nQSMPCNQtquuqV22/F6cJ8YFyxsE6oc=
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com;
	h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:MIME-Version:Content-Type;
	b=tW2NDrhQZgDMJtLXORuswSLB90F0IaB8b0EbLrBoFdgclrxpkC98GKvQEXOLbR5uT26nneCZyjIpnIIK68OdPLur336giYRp8D4c9Sw0wGosMGIooSWmulIdy+08XLLwF5WNdNDB2nEj+GbMdnOo4V3xYVKuxzpljqq6ZZghY/E=;
Message-ID: <707981.30589.qm@web110702.mail.gq1.yahoo.com>
X-YMail-OSG: ta4L74QVM1lbp3oOg83p_MXvzmUNEE.MAGOTy9PqB_HjHYo
	enSIYlcvnHIN6GZjbERiSkg3KFj1IT5PmxWNc16g1GJtUcslJx0OPL_yhxMJ
	MlcJYFk0IGIL_FWASOfQGHPuVeJAOtlOMVagA22eJcgPrvJXyFIi0tQ3YGPG
	4YixBXNlRrlVy0m6fgnQ23kxHiHf0hcqCB0hVZmn1bt6Ni7jCphPLbBdgJlV
	wTfae6ItMXy.9O5L2feclM4J8VjsJTJh.OsdqrCYwJ29XvP1T.rGqhBJBGZ9
	UJabzBSNbCRYehAh1aGuQb748Y7INisNymzYqYAHI97O3oZ4versM2w--
Received: from [99.189.91.206] by web110702.mail.gq1.yahoo.com via HTTP;
	Tue, 28 Sep 2010 17:01:47 PDT
X-Mailer: YahooMailRC/497 YahooMailWebService/0.8.105.279950
Date: Tue, 28 Sep 2010 17:01:47 -0700 (PDT)
From: Scott Johnson <scottj75074@yahoo.com>
To: freebsd-fs@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Subject: zfs+smb checksum errors
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 00:01:48 -0000

I've been running FBSD 8.0-rel and now 8.1-rel for 6 months or so, with a 4 disk 
raidz, weekly scrubs. Never saw a checksum error until last week when I got 
about 5 during operation, followed by another 10 or so on the next scrub.

They all occurred when I was accessing the server through SMB from my WinXP 
desktop. I've been doing this for months, transferring files to and from 
regularly, but this was the first time I'd been doing simultaneous heavy reading 
& writing.

I was running Imgburn on WinXP, creating an iso file from folders. Both source 
files and destination iso file were on the same SMB share on the FBSD server.

After all the checksum errors, there was 1 unrecoverable error, on one of the 
destination iso files.

There are 0 read errors, 0 write errors, and 0 new SMART errors on the 4 disks. 
The checksum errors were spread roughly equally across the 4 disks. All of which 
leads me to believe this is a software problem at the filesystem level.

What is my next move for diagnosing and eventually resolving this?

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 01:08:24 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 03EDA1065672;
	Wed, 29 Sep 2010 01:08:24 +0000 (UTC)
	(envelope-from artemb@gmail.com)
Received: from mail-qy0-f182.google.com (mail-qy0-f182.google.com
	[209.85.216.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 8DB6C8FC08;
	Wed, 29 Sep 2010 01:08:23 +0000 (UTC)
Received: by qyk7 with SMTP id 7so434487qyk.13
	for <multiple recipients>; Tue, 28 Sep 2010 18:08:22 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:sender:received
	:in-reply-to:references:date:x-google-sender-auth:message-id:subject
	:from:to:cc:content-type;
	bh=zs8nNjLa8+S++1leXXRprGA1GllZtA/+P7WblUbSuA4=;
	b=vPMfki3vezl7Cwbcpf+v5Cr8CJRtXDqiOMRCFpaEqDzzvAebWC8nY12GwUAPfTxp3a
	4HpHXruhS/UsJc2pYY4kGaFGWqJ0iA+stJkLKDeE7MumEnkGifSCYc+NTNGDkEkKCmAk
	kv+E4MaVbFnMRvvc8a01S8YXOEduATm2pW5II=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type;
	b=HXJtpsEkDkLYcIlYB8eyYZdEW7DRWljp2nRTM5YHwVuBWn+KS4WkVebgYZYcXzl3JY
	0bQ6O5dejUCIdy5scgoicAQkrDXfxurR/y6VXMKcsvTqT50vOFFtjBB9xgPtpbD8i8Nk
	QxdPecCfi8qZZJSvCviiRfiq6B42Wky4m0hmk=
MIME-Version: 1.0
Received: by 10.220.63.5 with SMTP id z5mr188074vch.105.1285720710225; Tue, 28
	Sep 2010 17:38:30 -0700 (PDT)
Sender: artemb@gmail.com
Received: by 10.220.176.77 with HTTP; Tue, 28 Sep 2010 17:38:30 -0700 (PDT)
In-Reply-To: <4CA26AB4.3050108@icyb.net.ua>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
	<4CA1DDE9.8090107@icyb.net.ua>
	<20100928132355.GA63149@icarus.home.lan>
	<4CA1EF69.4040402@icyb.net.ua>
	<FE116FEC-714D-4BF5-86D8-E29BFA713C69@wanderview.com>
	<4CA21809.7090504@icyb.net.ua>
	<71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com>
	<4CA22337.2010900@icyb.net.ua>
	<F244BA6D-3347-4D76-BAFB-D8B975783877@wanderview.com>
	<4CA25E92.4060904@icyb.net.ua>
	<5BD33772-C0EA-48A9-BE9A-C8FBAF0008D7@wanderview.com>
	<4CA26AB4.3050108@icyb.net.ua>
Date: Tue, 28 Sep 2010 17:38:30 -0700
X-Google-Sender-Auth: n2VHTRWuhNlv4iTT-adWLojl3Vg
Message-ID: <AANLkTi=aZp=46pHn9NtYsKBbMq1JwxPKb1D3o3Wngy1V@mail.gmail.com>
From: Artem Belevich <fbsdlist@src.cx>
To: Andriy Gapon <avg@icyb.net.ua>
Content-Type: text/plain; charset=ISO-8859-1
Cc: stable@freebsd.org, fs@freebsd.org, Ben Kelly <ben@wanderview.com>
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 01:08:24 -0000

On Tue, Sep 28, 2010 at 3:22 PM, Andriy Gapon <avg@icyb.net.ua> wrote:
> BTW, have you seen my posts about UMA and ZFS on hackers@ ?
> I found it advantageous to use UMA for ZFS I/O buffers, but only after reducing
> size of per-CPU caches for the zones with large-sized items.
> I further modified the code in my local tree to completely disable per-CPU
> caches for items > 32KB.

Do you have an updated patch disabling per-cpu caches for large items?
I've just rebuilt FreeBSD-8 with your uma-2.diff (it needed r209050
from -head to compile) and so far things look good. I'll re-enable UMA
for ZFS and see how it flies in a couple of days.

--Artem

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 04:43:21 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D954A1065674
	for <fs@freebsd.org>; Wed, 29 Sep 2010 04:43:20 +0000 (UTC)
	(envelope-from sarawgi.aditya@gmail.com)
Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com
	[209.85.161.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 64BB68FC19
	for <fs@freebsd.org>; Wed, 29 Sep 2010 04:43:19 +0000 (UTC)
Received: by fxm9 with SMTP id 9so314881fxm.13
	for <fs@freebsd.org>; Tue, 28 Sep 2010 21:43:19 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:date:from:to:cc:subject
	:message-id:references:mime-version:content-type:content-disposition
	:in-reply-to:user-agent;
	bh=md0dTwv/f8csoKd+A6bpUnt6BePVtdQ4PC+Xyecld3w=;
	b=YICrY4xgMZzS6WKobl558pPzNp5ZvvebGLvAeopBzSm2udlpF9QlUiNSG52ag4XOcg
	cjWMGcS4ASDgdaKYOva2xi3GKrZGTexPHXjvU5b0TcwYRboHF1nzHcbBsYwCiaWfsdRe
	1LNh8pFASa7G07ApK6f+//pgAIbwI+4HR8yF8=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=date:from:to:cc:subject:message-id:references:mime-version
	:content-type:content-disposition:in-reply-to:user-agent;
	b=hXC4RMPTevUkcX1Ck3dz+Fma+GgHRUV5xNUfV8HFSNlgbS7PWbpLCsSuhSeMJj8LPY
	kx51drIhHYr8jQDQnZtsopA6kSrJ9v4HQXtqL2+Ov9/wP/5AGXkZ3DwTj5S0OyUz5T/K
	P1us+kKBkCkzbgqMBnorL6jg8uarDOPUqHFmM=
Received: by 10.223.104.17 with SMTP id m17mr1068337fao.22.1285733698510;
	Tue, 28 Sep 2010 21:14:58 -0700 (PDT)
Received: from aditya ([183.87.49.235])
	by mx.google.com with ESMTPS id k25sm3548185fac.41.2010.09.28.21.14.54
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Tue, 28 Sep 2010 21:14:57 -0700 (PDT)
Date: Wed, 29 Sep 2010 09:46:55 +0530
From: Aditya Sarawgi <sarawgi.aditya@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20100929041650.GA1553@aditya>
References: <20100929031825.L683@besplex.bde.org>
	<20100929054826.E797@besplex.bde.org>
	<20100929084801.M948@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20100929084801.M948@besplex.bde.org>
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: fs@freebsd.org
Subject: Re: ext2fs now extremely slow
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 04:43:21 -0000

On Wed, Sep 29, 2010 at 09:14:57AM +1000, Bruce Evans wrote:
> On Wed, 29 Sep 2010, Bruce Evans wrote:
> 
> > On Wed, 29 Sep 2010, Bruce Evans wrote:
> >
> >> For benchmarks on ext2fs:
> >> 
> >> Under FreeBSD-~5.2 rerun today:
> >> untar:     59.17 real
> >> tar:       19.52 real
> >> 
> >> Under -current run today:
> >> untar:    101.16 real
> >> tar:      172.03 real
> >> 
> >> So, -current is 8.8 times slower for tar, but only 1.7 times slower for
> >> untar.
> >> ...
> >> So it seems that only 1 block in every 8 is used, and there is a seek
> >> after every block.  This asks for an 8-fold reduction in throughput,
> >> and it seems to have got that and a bit more for reading although not
> >> for writing.  Even (or especially) with perfect hardware, it must give
> >> an 8-fold reduction.  And it is likely to give more, since it defeats
> >> vfs clustering by making all runs of contiguous blocks have length 1.
> >> 
> >> Simple sequential allocation should be used unless the allocation policy
> >> and implementation are very good.
> >
> > This work a bit better after zapping the 8-fold way:
>    Things
> > ...
> > This gives an improvement of:
> >
> > untar:    101.16 real -> 63.46
> > tar:      172.03 real -> 50.70
> >
> > Now -current is only 1.1 times slower for untar and 2.6 times slower for
> > tar.
> >
> > There must be a problem with bpref for things to have been so bad.  There
> > is some point to leaving a gap of 7 blocks for expansion, but the gap was
> > left even between blocks in a single file.
> > ...
> > I haven't tried the bde_blkpref hack in the above.  It should kill bpref
> > completely so that there is no jump between lbn0 and lbn1, and break
> > cylinder group based allocation even better.  Setting bde_blkpref to 1
> > restores the bug that was present in ext2fs in FreeBSD between 1995 and
> > 2010.  This bug gave sequential allocation starting at the beginning of
> > the disk in almost all cases, so map searches were slow and early groups
> > filled up before later groups were used at all.
> 
> Tried this (patch repeated below), and it gave essentially the same
> speed as old versions.
> 
> The main problem seems to be that the `goal' variables aren't initialized.
> After restoring bits verbatim from an old version, things seem to work as
> expected:
> 
> % Index: ext2_alloc.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v
> % retrieving revision 1.2
> % diff -u -2 -r1.2 ext2_alloc.c
> % --- ext2_alloc.c	1 Sep 2010 05:34:17 -0000	1.2
> % +++ ext2_alloc.c	28 Sep 2010 21:08:42 -0000
> % @@ -1,2 +1,5 @@
> % +int bde_blkpref = 0;
> % +int bde_alloc8 = 0;
> % +
> %  /*-
> %   *  modified for Lites 1.1
> % @@ -117,4 +120,8 @@
> %                                                   ext2_alloccg);
> %          if (bno > 0) {
> % +		/* set next_alloc fields as done in block_getblk */
> % +		ip->i_next_alloc_block = lbn;
> % +		ip->i_next_alloc_goal = bno;
> % +
> %                  ip->i_blocks += btodb(fs->e2fs_bsize);
> %                  ip->i_flag |= IN_CHANGE | IN_UPDATE;
> 
> The only things that changed recently in this block were the 4 deleted
> lines and 4 lines with tabs corrupted to spaces.  Perhaps an editing
> error.
> 
> % @@ -542,6 +549,12 @@
> %  	   then set the goal to what we thought it should be
> %  	*/
> % +if (bde_blkpref == 0) {
> %  	if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0)
> %  		return ip->i_next_alloc_goal;
> % +} else if (bde_blkpref == 1) {
> % +	if(ip->i_next_alloc_block == lbn)
> % +		return ip->i_next_alloc_goal;
> % +} else
> % +	return 0;
> % 
> %  	/* now check whether we were provided with an array that basically
> 
> Not needed now.
> 
> % @@ -662,4 +675,5 @@
> %  	 * block.
> %  	 */
> % +if (bde_alloc8 == 0) {
> %  	if (bpref)
> %  		start = dtogd(fs, bpref) / NBBY;
> % @@ -679,4 +693,5 @@
> %  		}
> %  	}
> % +}
> % 
> %  	bno = ext2_mapsearch(fs, bbp, bpref);
> 
> The code to skip to the next 8-block boundary should be removed permanently.
> After fixing the initialization, it doesn't generate holes inside files but
> it still generates holes between files.  The holes are quite large with
> 4K-blocks.
> 
> Benchmark results with just the initialization of `goal' variables restored:
> 
> %%%
> ext2fs-1024-1024:
> tarcp /f srcs:                 78.79 real         0.31 user         4.94 sys
> tar cf /dev/zero srcs:         24.62 real         0.19 user         1.82 sys
> ext2fs-1024-1024-as:
> tarcp /f srcs:                 52.07 real         0.26 user         4.95 sys
> tar cf /dev/zero srcs:         24.80 real         0.10 user         1.93 sys
> ext2fs-4096-4096:
> tarcp /f srcs:                 74.14 real         0.34 user         3.96 sys
> tar cf /dev/zero srcs:         33.82 real         0.10 user         1.19 sys
> ext2fs-4096-4096-as:
> tarcp /f srcs:                 53.54 real         0.36 user         3.87 sys
> tar cf /dev/zero srcs:         33.91 real         0.14 user         1.15 sys
> %%%
> 
> The much larger holes between the files are apparently responsible for the
> decreased speed with 4K-blocks.  1K-blocks are really too small, so 4K-blocks
> should be faster.
> 
> Benchmark results with the fix and bde_alloc8 = 1.
> 
> ext2fs-1024-1024:
> tarcp /f srcs:                 71.60 real         0.15 user         2.04 sys
> tar cf /dev/zero srcs:         22.34 real         0.05 user         0.79 sys
> ext2fs-1024-1024-as:
> tarcp /f srcs:                 46.03 real         0.14 user         2.02 sys
> tar cf /dev/zero srcs:         21.97 real         0.05 user         0.80 sys
> ext2fs-4096-4096:
> tarcp /f srcs:                 59.66 real         0.13 user         1.63 sys
> tar cf /dev/zero srcs:         19.88 real         0.07 user         0.46 sys
> ext2fs-4096-4096-as:
> tarcp /f srcs:                 37.30 real         0.12 user         1.60 sys
> tar cf /dev/zero srcs:         19.93 real         0.05 user         0.49 sys
> 
> Bruce

Hi,

I see what you are saying. The gap of 8 blocks between files 
is due to the old preallocation, which used to allocate an additional 
8 blocks in advance for an inode whenever a block was allocated
for it. The gap between blocks of the same file shouldn't be there 
either. Both of these cases should be removed. I will look into this 
during this week. The slowness is also due to the lack of preallocation
in the new code.
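
To make the gap concrete, here is a toy user-space model. It is not the
ext2fs code: the toy_* names and the 128-block bitmap are made up for
illustration, and it only assumes, going by Bruce's description, that
the slow path takes the first block of a completely free 8-block run
while the intended path takes the first free block at or after the goal.

```c
#define TOY_NBBY	8	/* bits per byte, like NBBY in the kernel */
#define TOY_MAP_BYTES	16	/* toy bitmap covering 128 blocks */

/*
 * Hypothetical stand-in for the "skip to the next 8-block boundary"
 * search: take the first block of the first byte that is entirely free.
 * Returns the block number, or -1 if no fully free byte is left.
 */
static int
toy_alloc_aligned8(unsigned char *map)
{
	for (int loc = 0; loc < TOY_MAP_BYTES; loc++) {
		if (map[loc] == 0) {
			map[loc] |= 1;		/* mark bit 0 of this byte used */
			return (loc * TOY_NBBY);
		}
	}
	return (-1);
}

/*
 * Hypothetical stand-in for goal-based allocation: take the first free
 * block at or after goal.  Returns the block number, or -1 if none.
 */
static int
toy_alloc_sequential(unsigned char *map, int goal)
{
	for (int bno = goal; bno < TOY_MAP_BYTES * TOY_NBBY; bno++) {
		int mask = 1 << (bno % TOY_NBBY);

		if ((map[bno / TOY_NBBY] & mask) == 0) {
			map[bno / TOY_NBBY] |= mask;
			return (bno);
		}
	}
	return (-1);
}
```

On an empty map, four calls to toy_alloc_aligned8() return blocks 0, 8,
16 and 24, so only 1 block in 8 is used. Four calls to
toy_alloc_sequential() with the goal set to the previous result plus
one return 0, 1, 2 and 3.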

Thanks
Aditya Sarawgi

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 07:25:09 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E4C321065670;
	Wed, 29 Sep 2010 07:25:09 +0000 (UTC) (envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id EFA738FC0A;
	Wed, 29 Sep 2010 07:25:08 +0000 (UTC)
Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id KAA19592;
	Wed, 29 Sep 2010 10:24:51 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Received: from localhost.topspin.kiev.ua ([127.0.0.1])
	by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1P0r23-00037A-0A; Wed, 29 Sep 2010 10:24:51 +0300
Message-ID: <4CA2E9C2.3030806@icyb.net.ua>
Date: Wed, 29 Sep 2010 10:24:50 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Artem Belevich <fbsdlist@src.cx>
References: <4CA1D06C.9050305@digiware.nl>	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>	<4CA1DDE9.8090107@icyb.net.ua>	<20100928132355.GA63149@icarus.home.lan>	<4CA1EF69.4040402@icyb.net.ua>	<FE116FEC-714D-4BF5-86D8-E29BFA713C69@wanderview.com>	<4CA21809.7090504@icyb.net.ua>	<71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com>	<4CA22337.2010900@icyb.net.ua>	<F244BA6D-3347-4D76-BAFB-D8B975783877@wanderview.com>	<4CA25E92.4060904@icyb.net.ua>	<5BD33772-C0EA-48A9-BE9A-C8FBAF0008D7@wanderview.com>	<4CA26AB4.3050108@icyb.net.ua>
	<AANLkTi=aZp=46pHn9NtYsKBbMq1JwxPKb1D3o3Wngy1V@mail.gmail.com>
In-Reply-To: <AANLkTi=aZp=46pHn9NtYsKBbMq1JwxPKb1D3o3Wngy1V@mail.gmail.com>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: stable@freebsd.org, fs@freebsd.org, Ben Kelly <ben@wanderview.com>
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 07:25:10 -0000

on 29/09/2010 03:38 Artem Belevich said the following:
> On Tue, Sep 28, 2010 at 3:22 PM, Andriy Gapon <avg@icyb.net.ua> wrote:
>> BTW, have you seen my posts about UMA and ZFS on hackers@ ?
>> I found it advantageous to use UMA for ZFS I/O buffers, but only after reducing
>> size of per-CPU caches for the zones with large-sized items.
>> I further modified the code in my local tree to completely disable per-CPU
>> caches for items > 32KB.
> 
> Do you have updated patch disabling per-cpu caches for large items?
> I've just rebuilt FreeBSD-8 with your uma-2.diff (it needed r209050
> from -head to compile) and so far things look good. I'll re-enable UMA
> for ZFS and see how it flies in a couple of days.

I've just uploaded uma-3.diff.
It implements what uma-1.diff did, plus totally skips per-CPU caches for items >
32KB, and also has code from uma-2.diff for flushing per-CPU caches on
significant memory shortage.

Will appreciate your feedback.
Thank you for testing!

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 09:33:56 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 87E041065674
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 09:33:56 +0000 (UTC)
	(envelope-from kpielorz_lst@tdx.co.uk)
Received: from mail.tdx.com (mail.tdx.com [62.13.128.18])
	by mx1.freebsd.org (Postfix) with ESMTP id 2BF758FC18
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 09:33:55 +0000 (UTC)
Received: from HexaDeca64.dmpriest.net.uk (HPQuadro64.dmpriest.net.uk
	[62.13.130.30]) (authenticated bits=0)
	by mail.tdx.com (8.14.3/8.14.3/Kp) with ESMTP id o8T9L1ww029415
	(version=TLSv1/SSLv3 cipher=DHE-DSS-AES256-SHA bits=256 verify=NO)
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 10:21:02 +0100 (BST)
Date: Wed, 29 Sep 2010 10:20:22 +0100
From: Karl Pielorz <kpielorz_lst@tdx.co.uk>
To: freebsd-fs@freebsd.org
Message-ID: <E535F873EBCF64D19AC01035@HexaDeca64.dmpriest.net.uk>
X-Mailer: Mulberry/4.0.8 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Subject: FreeBSD 8.1-R/amd64 - zfs 'hangs' - help tracing?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 09:33:56 -0000


Hi All,

I moved my machine from FreeBSD 7.2-S/amd64 to 8.1-R/amd64 about a week 
ago. Since then I've noticed that ZFS just 'hangs' - e.g. it'll work fine 
for a few days, then a process will get 'hung up' waiting on ZFS.

The machine is a Tyan motherboard (dual Opteron, dual cores, w/10GB of 
RAM). 7.2-S & ZFS ran perfectly on it.

Once that happens, anything else that touches the pools also 'hangs'. In 
top, the original process shows as:

"
1927 root        1  44    0  8224K  1544K zio->i  0   0:00  0.00% ls
"

Any other process that touches the ZFS pools ends up like:

"
2082 root        1  44    0 10284K  2976K zfs     3   0:00  0.00% csh
"

I saw a while ago a command under 8.1 to get 'more info' for these stuck 
processes, but can't for the life of me remember it?

Could someone give me some pointers to help track down what's hanging?

The drives are spread over two Marvell 88SX6081's. I've tried the mvs 
driver for that controller, which gave me a bucket load of errors, and data 
corruption :(

Switching back to the standard ATA drivers for that card, I just get hangs 
:( - nothing is logged on the console, or syslog when this happens.


Thanks,

-Karl

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 10:24:52 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A120C10656C3
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 10:24:52 +0000 (UTC)
	(envelope-from martin@lispworks.com)
Received: from lwfs1-cam.cam.lispworks.com (mail.lispworks.com
	[193.34.186.230])
	by mx1.freebsd.org (Postfix) with ESMTP id 27AAA8FC1D
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 10:24:51 +0000 (UTC)
Received: from higson.cam.lispworks.com
	(IDENT:U2FsdGVkX19XpWCdIITMFBIhEBJ0bWFJ98KLHWFUmcA@higson
	[192.168.1.7])
	by lwfs1-cam.cam.lispworks.com (8.14.3/8.14.3) with ESMTP id
	o8TAOnui064769; Wed, 29 Sep 2010 11:24:49 +0100 (BST)
	(envelope-from martin@lispworks.com)
Received: from higson.cam.lispworks.com by higson.cam.lispworks.com (8.13.1)
	id o8TAOn1N013733; Wed, 29 Sep 2010 11:24:49 +0100
Received: (from martin@localhost)
	by higson.cam.lispworks.com (8.13.1/8.13.1/Submit) id o8TAOnph013730;
	Wed, 29 Sep 2010 11:24:49 +0100
Date: Wed, 29 Sep 2010 11:24:49 +0100
Message-Id: <201009291024.o8TAOnph013730@higson.cam.lispworks.com>
From: Martin Simmons <martin@lispworks.com>
To: freebsd-fs@freebsd.org
In-reply-to: <E535F873EBCF64D19AC01035@HexaDeca64.dmpriest.net.uk> (message
	from Karl Pielorz on Wed, 29 Sep 2010 10:20:22 +0100)
References: <E535F873EBCF64D19AC01035@HexaDeca64.dmpriest.net.uk>
Subject: Re: FreeBSD 8.1-R/amd64 - zfs 'hangs' - help tracing?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 10:24:52 -0000

>>>>> On Wed, 29 Sep 2010 10:20:22 +0100, Karl Pielorz said:
> 
> I saw a while ago a command under 8.1 to get 'more info' for these stuck 
> processes, but can't for the life of me remember it?

Maybe procstat -k -k $pid is what you are looking for (i.e. a kernel
backtrace)?  Use -a instead of $pid to get all processes.

__Martin

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 10:31:12 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A9F37106564A
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 10:31:12 +0000 (UTC)
	(envelope-from kpielorz_lst@tdx.co.uk)
Received: from mail.tdx.com (mail.tdx.com [62.13.128.18])
	by mx1.freebsd.org (Postfix) with ESMTP id 4B5E38FC14
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 10:31:11 +0000 (UTC)
Received: from HexaDeca64.dmpriest.net.uk (HPQuadro64.dmpriest.net.uk
	[62.13.130.30]) (authenticated bits=0)
	by mail.tdx.com (8.14.3/8.14.3/Kp) with ESMTP id o8TAV90r035552
	(version=TLSv1/SSLv3 cipher=DHE-DSS-AES256-SHA bits=256 verify=NO);
	Wed, 29 Sep 2010 11:31:10 +0100 (BST)
Date: Wed, 29 Sep 2010 11:30:29 +0100
From: Karl Pielorz <kpielorz_lst@tdx.co.uk>
To: Martin Simmons <martin@lispworks.com>, freebsd-fs@freebsd.org
Message-ID: <8CF1F1F15531907E2F8DC2A2@HexaDeca64.dmpriest.net.uk>
In-Reply-To: <201009291024.o8TAOnph013730@higson.cam.lispworks.com>
References: <E535F873EBCF64D19AC01035@HexaDeca64.dmpriest.net.uk>
	<201009291024.o8TAOnph013730@higson.cam.lispworks.com>
X-Mailer: Mulberry/4.0.8 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Cc: 
Subject: Re: FreeBSD 8.1-R/amd64 - zfs 'hangs' - help tracing?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 10:31:12 -0000

--On 29 September 2010 11:24 +0100 Martin Simmons <martin@lispworks.com> 
wrote:

>> I saw a while ago a command under 8.1 to get 'more info' for these stuck
>> processes, but can't for the life of me remember it?
>
> Maybe procstat -k -k $pid is what you are looking for (i.e. a kernel
> backtrace)?  Use -a instead of $pid to get all processes.

Yup, that's it - thanks!

Having run it I get:

procstat -k -k 1927 (PID 1927 is the 'ls' that's locked up)

  PID    TID COMM             TDNAME           KSTACK
 1927 100206 ls               -                mi_switch+0x16f 
sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 dbuf_read+0x39a 
dnode_hold_impl+0xe7 dmu_bonus_hold+0x2a zfs_zget+0x227 
zfs_dirent_lock+0x4e3 zfs_dirlook+0x69 zfs_lookup+0x1f0 
zfs_freebsd_lookup+0x81 vfs_cache_lookup+0xf0 VOP_LOOKUP_APV+0x40 
lookup+0x40a namei+0x52b kern_statat_vnhook+0x8f kern_statat+0x15


Which will hopefully mean something more to someone here than it does me at 
the moment ;)

-Karl

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 13:26:06 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 878E710656B9
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 13:26:06 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 0947A8FC17
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 13:26:06 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 99CBA46C12;
	Wed, 29 Sep 2010 09:26:05 -0400 (EDT)
Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 236898A04E;
	Wed, 29 Sep 2010 09:26:04 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-fs@freebsd.org
Date: Wed, 29 Sep 2010 09:17:04 -0400
User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; )
References: <20100929031825.L683@besplex.bde.org>
	<20100929084801.M948@besplex.bde.org>
	<20100929041650.GA1553@aditya>
In-Reply-To: <20100929041650.GA1553@aditya>
MIME-Version: 1.0
Content-Type: Multipart/Mixed;
  boundary="Boundary-00=_QxzoMs/Iug8+N80"
Message-Id: <201009290917.05269.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1
	(bigwig.baldwin.cx); Wed, 29 Sep 2010 09:26:04 -0400 (EDT)
X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham
	version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx
Cc: 
Subject: Re: ext2fs now extremely slow
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 13:26:06 -0000

--Boundary-00=_QxzoMs/Iug8+N80
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

On Wednesday, September 29, 2010 12:16:55 am Aditya Sarawgi wrote:
> On Wed, Sep 29, 2010 at 09:14:57AM +1000, Bruce Evans wrote:
> > On Wed, 29 Sep 2010, Bruce Evans wrote:
> > 
> > > On Wed, 29 Sep 2010, Bruce Evans wrote:
> > >
> > >> For benchmarks on ext2fs:
> > >> 
> > >> Under FreeBSD-~5.2 rerun today:
> > >> untar:     59.17 real
> > >> tar:       19.52 real
> > >> 
> > >> Under -current run today:
> > >> untar:    101.16 real
> > >> tar:      172.03 real
> > >> 
> > >> So, -current is 8.8 times slower for tar, but only 1.7 times slower for
> > >> untar.
> > >> ...
> > >> So it seems that only 1 block in every 8 is used, and there is a seek
> > >> after every block.  This asks for an 8-fold reduction in throughput,
> > >> and it seems to have got that and a bit more for reading although not
> > >> for writing.  Even (or especially) with perfect hardware, it must give
> > >> an 8-fold reduction.  And it is likely to give more, since it defeats
> > >> vfs clustering by making all runs of contiguous blocks have length 1.
> > >> 
> > >> Simple sequential allocation should be used unless the allocation policy
> > >> and implementation are very good.
> > >
> > > This work a bit better after zapping the 8-fold way:
> >    Things
> > > ...
> > > This gives an improvement of:
> > >
> > > untar:    101.16 real -> 63.46
> > > tar:      172.03 real -> 50.70
> > >
> > > Now -current is only 1.1 times slower for untar and 2.6 times slower for
> > > tar.
> > >
> > > There must be a problem with bpref for things to have been so bad.  There
> > > is some point to leaving a gap of 7 blocks for expansion, but the gap was
> > > left even between blocks in a single file.
> > > ...
> > > I haven't tried the bde_blkpref hack in the above.  It should kill bpref
> > > completely so that there is no jump between lbn0 and lbn1, and break
> > > cylinder group based allocation even better.  Setting bde_blkpref to 1
> > > restores the bug that was present in ext2fs in FreeBSD between 1995 and
> > > 2010.  This bug gave sequential allocation starting at the beginning of
> > > the disk in almost all cases, so map searches were slow and early groups
> > > filled up before later groups were used at all.
> > 
> > Tried this (patch repeated below), and it gave essentially the same
> > speed as old versions.
> > 
> > The main problem seems to be that the `goal' variables aren't initialized.
> > After restoring bits verbatim from an old version, things seem to work as
> > expected:
> > 
> > % Index: ext2_alloc.c
> > % ===================================================================
> > % RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v
> > % retrieving revision 1.2
> > % diff -u -2 -r1.2 ext2_alloc.c
> > % --- ext2_alloc.c	1 Sep 2010 05:34:17 -0000	1.2
> > % +++ ext2_alloc.c	28 Sep 2010 21:08:42 -0000
> > % @@ -1,2 +1,5 @@
> > % +int bde_blkpref = 0;
> > % +int bde_alloc8 = 0;
> > % +
> > %  /*-
> > %   *  modified for Lites 1.1
> > % @@ -117,4 +120,8 @@
> > %                                                   ext2_alloccg);
> > %          if (bno > 0) {
> > % +		/* set next_alloc fields as done in block_getblk */
> > % +		ip->i_next_alloc_block = lbn;
> > % +		ip->i_next_alloc_goal = bno;
> > % +
> > %                  ip->i_blocks += btodb(fs->e2fs_bsize);
> > %                  ip->i_flag |= IN_CHANGE | IN_UPDATE;
> > 
> > The only things that changed recently in this block were the 4 deleted
> > lines and 4 lines with tabs corrupted to spaces.  Perhaps an editing
> > error.
> > 
> > % @@ -542,6 +549,12 @@
> > %  	   then set the goal to what we thought it should be
> > %  	*/
> > % +if (bde_blkpref == 0) {
> > %  	if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0)
> > %  		return ip->i_next_alloc_goal;
> > % +} else if (bde_blkpref == 1) {
> > % +	if(ip->i_next_alloc_block == lbn)
> > % +		return ip->i_next_alloc_goal;
> > % +} else
> > % +	return 0;
> > % 
> > %  	/* now check whether we were provided with an array that basically
> > 
> > Not needed now.
> > 
> > % @@ -662,4 +675,5 @@
> > %  	 * block.
> > %  	 */
> > % +if (bde_alloc8 == 0) {
> > %  	if (bpref)
> > %  		start = dtogd(fs, bpref) / NBBY;
> > % @@ -679,4 +693,5 @@
> > %  		}
> > %  	}
> > % +}
> > % 
> > %  	bno = ext2_mapsearch(fs, bbp, bpref);
> > 
> > The code to skip to the next 8-block boundary should be removed permanently.
> > After fixing the initialization, it doesn't generate holes inside files but
> > it still generates holes between files.  The holes are quite large with
> > 4K-blocks.
> > 
> > Benchmark results with just the initialization of `goal' variables restored:
> > 
> > %%%
> > ext2fs-1024-1024:
> > tarcp /f srcs:                 78.79 real         0.31 user         4.94 sys
> > tar cf /dev/zero srcs:         24.62 real         0.19 user         1.82 sys
> > ext2fs-1024-1024-as:
> > tarcp /f srcs:                 52.07 real         0.26 user         4.95 sys
> > tar cf /dev/zero srcs:         24.80 real         0.10 user         1.93 sys
> > ext2fs-4096-4096:
> > tarcp /f srcs:                 74.14 real         0.34 user         3.96 sys
> > tar cf /dev/zero srcs:         33.82 real         0.10 user         1.19 sys
> > ext2fs-4096-4096-as:
> > tarcp /f srcs:                 53.54 real         0.36 user         3.87 sys
> > tar cf /dev/zero srcs:         33.91 real         0.14 user         1.15 sys
> > %%%
> > 
> > The much larger holes between the files are apparently responsible for the
> > decreased speed with 4K-blocks.  1K-blocks are really too small, so 4K-blocks
> > should be faster.
> > 
> > Benchmark results with the fix and bde_alloc8 = 1.
> > 
> > ext2fs-1024-1024:
> > tarcp /f srcs:                 71.60 real         0.15 user         2.04 sys
> > tar cf /dev/zero srcs:         22.34 real         0.05 user         0.79 sys
> > ext2fs-1024-1024-as:
> > tarcp /f srcs:                 46.03 real         0.14 user         2.02 sys
> > tar cf /dev/zero srcs:         21.97 real         0.05 user         0.80 sys
> > ext2fs-4096-4096:
> > tarcp /f srcs:                 59.66 real         0.13 user         1.63 sys
> > tar cf /dev/zero srcs:         19.88 real         0.07 user         0.46 sys
> > ext2fs-4096-4096-as:
> > tarcp /f srcs:                 37.30 real         0.12 user         1.60 sys
> > tar cf /dev/zero srcs:         19.93 real         0.05 user         0.49 sys
> > 
> > Bruce
> 
> Hi,
> 
> I see what you are saying. The gap of 8 block between the files 
> is due to the old preallocation which used to allocate additional 
> 8 blocks in advance for a particular inode when allocating a block
> for it. The gap between blocks of the same file shouldn't be there 
> too. Both of these cases should be removed. I will look into this 
> during this week. The slowness is also due to lack of preallocation
> in the new code.

One of the GSoC students worked on a patch to add preallocation back to
ext2fs this summer.  Would you be interested in reviewing and/or testing
that patch?  (I've attached it).  Here is his original e-mail:

<quote>
Hi all,

Attached is a patch that implements a preallocation algorithm in 
ext2fs. I implemented this algorithm during FreeBSD SoC 2010.

This patch implements the in-memory ext2/3 block preallocation algorithm 
based on reservation windows. It uses an RB-tree to index block 
allocation requests and reserves a number of blocks for each file that 
has requested a block. When a file first requests a block, the allocator 
looks for one that is in the same cylinder group as the inode, is not 
inside any other reservation window in the RB-tree, and is followed by 
some contiguous free blocks. It records this block's position and the 
length of the contiguous free run in a data structure (the reservation 
window) and inserts it into the RB-tree. When the file requests another 
block, the allocator looks up the corresponding window in the RB-tree. 
If one is found, the next free block in it is allocated to the file 
directly. Otherwise, it searches for a new block again.

I have run some benchmarks to test this algorithm. Please review the 
results on the wiki page (http://wiki.freebsd.org/SOC2010ZhengLiu). 
Performance is better when the number of threads is smaller than 4. When 
the number of threads is greater than 4, performance improves a little.

Please test it.


Thanks and best regards,

lz
</quote>
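
The window search described in the quoted message can be sketched in a
few lines. This is only a rough illustration under stated assumptions,
not the code from the attached patch: the toy_* names are hypothetical,
a sorted array stands in for the RB-tree, and the block bitmap and
cylinder group limits are ignored.

```c
#include <stddef.h>

/* Hypothetical reservation window: blocks start..end, inclusive. */
struct toy_rsv_win {
	int start;
	int end;
};

/*
 * Return the first block >= goal where a window of len blocks fits
 * without overlapping any existing window.  wins[] must be sorted by
 * start and non-overlapping (the patch keeps windows in an RB-tree;
 * a sorted array is used here only to keep the sketch short).
 */
static int
toy_rsv_find_start(const struct toy_rsv_win *wins, size_t n, int goal,
    int len)
{
	int start = goal;

	for (size_t i = 0; i < n; i++) {
		if (wins[i].end < start)
			continue;	/* window lies entirely before us */
		if (wins[i].start > start + len - 1)
			break;		/* the gap before it is big enough */
		start = wins[i].end + 1;	/* overlap: skip past it */
	}
	return (start);
}

/*
 * Hand out the next block from a file's own window.  *next is the
 * caller's cursor into the window.  Returns -1 once the window is
 * used up, at which point a new window would have to be searched for.
 */
static int
toy_rsv_next(const struct toy_rsv_win *w, int *next)
{
	if (*next > w->end)
		return (-1);
	return ((*next)++);
}
```

With existing windows [10,17] and [20,27], a request for 8 blocks with
goal 8 is pushed to block 28, since neither the gap at 8-9 nor the one
at 18-19 is large enough. A request with goal 0 fits at block 0.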

-- 
John Baldwin

--Boundary-00=_QxzoMs/Iug8+N80
Content-Type: text/x-patch;
  charset="UTF-8";
  name="ext2fs_prealloc.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="ext2fs_prealloc.patch"

diff -urN /usr/src/sys/fs/ext2fs/ext2_alloc.c new/ext2_alloc.c
--- /usr/src/sys/fs/ext2fs/ext2_alloc.c	2010-01-14 22:30:54.000000000 +0800
+++ new/ext2_alloc.c	2010-08-19 02:47:29.000000000 +0800
@@ -50,6 +50,9 @@
 #include <fs/ext2fs/ext2fs.h>
 #include <fs/ext2fs/fs.h>
 #include <fs/ext2fs/ext2_extern.h>
+#include <fs/ext2fs/ext2_rsv_win.h>
+
+#define phy_blk(cg, fs) (((cg) * (fs->e2fs->e2fs_fpg)) + fs->e2fs->e2fs_first_dblock)
 
 static daddr_t	ext2_alloccg(struct inode *, int, daddr_t, int);
 static u_long	ext2_dirpref(struct inode *);
@@ -59,37 +62,524 @@
 						int));
 static daddr_t	ext2_nodealloccg(struct inode *, int, daddr_t, int);
 static daddr_t  ext2_mapsearch(struct m_ext2fs *, char *, daddr_t);
+
+/* For reservation window */
+static u_long   ext2_alloc_blk(struct inode *, int, struct buf *, int32_t, struct ext2_rsv_win *);
+static int      ext2_alloc_new_rsv(struct inode *, int, struct buf *, int32_t);
+static int      ext2_bpref_in_rsv(struct ext2_rsv_win *, int32_t);
+static int      ext2_find_rsv(struct ext2_rsv_win *, struct ext2_rsv_win *,
+                              struct m_ext2fs *, int32_t, int);
+static void	ext2_remove_rsv_win(struct m_ext2fs *, struct ext2_rsv_win *);
+static u_long   ext2_rsvalloc(struct m_ext2fs *, struct inode *,
+                              int, struct buf *, int32_t, int);
+static daddr_t  ext2_search_next_block(struct m_ext2fs *, char *, int, int);
+static struct ext2_rsv_win *ext2_search_rsv(struct ext2_rsv_win_tree *, int32_t);
+
+RB_GENERATE(ext2_rsv_win_tree, ext2_rsv_win, rsv_link, ext2_rsv_win_cmp);
+
 /*
  * Allocate a block in the file system.
  *
- * A preference may be optionally specified. If a preference is given
- * the following hierarchy is used to allocate a block:
- *   1) allocate the requested block.
- *   2) allocate a rotationally optimal block in the same cylinder.
- *   3) allocate a block in the same cylinder group.
- *   4) quadradically rehash into other cylinder groups, until an
- *        available block is located.
- * If no block preference is given the following hierarchy is used
- * to allocate a block:
- *   1) allocate a block in the cylinder group that contains the
- *        inode for the file.
- *   2) quadradically rehash into other cylinder groups, until an
- *        available block is located.
- *
- * A preference may be optionally specified. If a preference is given
- * the following hierarchy is used to allocate a block:
- *   1) allocate the requested block.
- *   2) allocate a rotationally optimal block in the same cylinder.
- *   3) allocate a block in the same cylinder group.
- *   4) quadradically rehash into other cylinder groups, until an
- *        available block is located.
- * If no block preference is given the following hierarchy is used
- * to allocate a block:
- *   1) allocate a block in the cylinder group that contains the
- *        inode for the file.
- *   2) quadradically rehash into other cylinder groups, until an
- *        available block is located.
+ * Given a preference:
+ *   If the inode has a reservation window and the preference
+ *   lies within it, try to allocate a free block from
+ *   that reservation window.
+ *   If not, traverse the RB tree to find a range that is not in
+ *   any window, insert it into the RB tree, and try to allocate
+ *   a free block again.
+ *   If that fails, try to allocate a free block in other cylinder
+ *   groups without a preference.
+ */
+
+/*
+ * Allocate a free block.
+ *
+ * First check whether reservation window is used.
+ * If reservation window is used, try to allocate a free
+ * block from the reservation window. If it fails, traverse
+ * the bitmap to find a free block.
+ * If reservation window is not used, try to allocate
+ * a free block by bpref. If it fails, traverse the bitmap
+ * to find a free block.
  */
+static u_long
+ext2_alloc_blk(struct inode *ip, int cg, struct buf *bp,
+    int32_t bpref, struct ext2_rsv_win *rp)
+{
+	struct m_ext2fs *fs;
+	struct ext2mount *ump;
+	int bno, start, end;
+	char *bbp;
+
+	fs = ip->i_e2fs;
+	ump = ip->i_ump;
+	bbp = (char *)bp->b_data;
+
+	if (fs->e2fs_gd[cg].ext2bgd_nbfree == 0)
+		return (0);
+
+        if (bpref < 0)
+                bpref = 0;
+
+        /* Check whether a reservation window is in use */
+        if (rp != NULL) {
+                /*
+                 * If window's start is not in this cylinder group,
+                 * try to allocate from the beginning, otherwise
+                 * try to allocate from the beginning of the
+                 * window.
+                 */
+                if (dtog(fs, rp->rsv_start) < cg)
+                        start = 0;
+                else
+                        start = rp->rsv_start;
+
+                /*
+                 * If window's end crosses the end of this group,
+                 * set end variable to the end of this group.
+                 * Otherwise, set it to the window's end.
+                 */
+                if (dtog(fs, rp->rsv_end) > cg)
+                        end = phy_blk(cg + 1, fs) - 1;
+                else
+                        end = rp->rsv_end;
+
+                /* If preference block is within the window, try to allocate it. */
+                if (start <= bpref && bpref <= end) {
+                        bpref = dtogd(fs, bpref);
+                        if (isclr(bbp, bpref)) {
+                                rp->rsv_alloc_hit++;
+                                bno = bpref;
+                                goto gotit;
+                        }
+                } else
+                        if (dtog(fs, rp->rsv_start) == cg)
+                                bpref = dtogd(fs, rp->rsv_start);
+                        else
+                                bpref = 0;
+        } else {
+                if (dtog(fs, bpref) != cg)
+                        bpref = 0;
+                if (bpref != 0) {
+                        bpref = dtogd(fs, bpref);
+                        if (isclr(bbp, bpref)) {
+                                bno = bpref;
+                                goto gotit;
+                        }
+                }
+        }
+
+	bno = ext2_mapsearch(fs, bbp, bpref);
+	if (bno < 0)
+		return (0);
+
+gotit:
+	setbit(bbp, (daddr_t)bno);
+	EXT2_LOCK(ump);
+	fs->e2fs->e2fs_fbcount--;
+	fs->e2fs_gd[cg].ext2bgd_nbfree--;
+	fs->e2fs_fmod = 1;
+	EXT2_UNLOCK(ump);
+	bdwrite(bp);
+	bno = phy_blk(cg, fs) + bno;
+        return (bno);
+}
+
+/*
+ * Initialize reservation window per inode.
+ */
+void
+ext2_init_rsv(struct inode *ip)
+{
+	struct ext2_rsv_win *rp;
+
+	rp = malloc(sizeof(struct ext2_rsv_win),
+	    M_EXT2NODE, M_WAITOK | M_ZERO);
+
+	/*
+	 * malloc(9) with M_WAITOK does not return NULL, so this check is
+	 * defensive only; without a window, reservations are not used.
+	 */
+	if (rp == NULL)
+		return;
+
+	rp->rsv_start = EXT2_RSV_NOT_ALLOCATED;
+	rp->rsv_end = EXT2_RSV_NOT_ALLOCATED;
+
+	rp->rsv_goal_size = EXT2_RSV_DEFAULT_RESERVE_BLKS;
+	rp->rsv_alloc_hit = 0;
+
+	ip->i_rsv = rp;
+} 
+
+/*
+ * Discard reservation window.
+ *
+ * It is called during the following situations:
+ * 1. free an inode
+ * 2. sync inode
+ * 3. truncate a file
+ */
+void
+ext2_discard_rsv(struct inode *ip)
+{
+	struct ext2_rsv_win *rp;
+
+	if (ip->i_rsv == NULL) 
+                return;
+
+	rp = ip->i_rsv;
+
+        /* If reservation window is empty, nothing to do */
+	if (rp->rsv_end == EXT2_RSV_NOT_ALLOCATED)
+                return;
+
+        EXT2_TREE_LOCK(ip->i_e2fs);
+        ext2_remove_rsv_win(ip->i_e2fs, rp);
+        EXT2_TREE_UNLOCK(ip->i_e2fs);
+        rp->rsv_goal_size = EXT2_RSV_DEFAULT_RESERVE_BLKS;
+}
+
+/*
+ * Remove an ext2_rsv_win structure from the RB tree.
+ */
+static void
+ext2_remove_rsv_win(struct m_ext2fs *fs, struct ext2_rsv_win *rp)
+{
+	RB_REMOVE(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rp);
+	rp->rsv_start = EXT2_RSV_NOT_ALLOCATED;
+	rp->rsv_end = EXT2_RSV_NOT_ALLOCATED;
+	rp->rsv_alloc_hit = 0;
+}
+
+/*
+ * Check whether bpref is in the reservation window.
+ */
+static int
+ext2_bpref_in_rsv(struct ext2_rsv_win *rp, int32_t bpref)
+{
+        if (bpref >= 0 && (bpref < rp->rsv_start || bpref > rp->rsv_end))
+                return (0);
+
+        return (1);
+}
+
+/*
+ * Search the RB tree for the window that contains the given block,
+ * or the preceding window if the block is not in any window.
+ */
+static struct ext2_rsv_win *
+ext2_search_rsv(struct ext2_rsv_win_tree *root, int32_t start)
+{
+        struct ext2_rsv_win *prev, *next;
+
+        if (RB_EMPTY(root))
+                return (NULL);
+
+        next = RB_ROOT(root);
+        do {
+                prev = next;
+                if (start < next->rsv_start)
+                        next = RB_LEFT(next, rsv_link);
+                else if (start > next->rsv_end)
+                        next = RB_RIGHT(next, rsv_link);
+                else
+                        return (next);
+        } while (next != NULL);
+
+        if (prev->rsv_start > start) {
+                next = RB_PREV(ext2_rsv_win_tree, root, prev);
+                if (next != NULL)
+                        prev = next;
+        }
+
+        return (prev);
+}
+
+/*
+ * Find a reservation window by given range from start to
+ * the end of this cylinder group.
+ */
+static int
+ext2_find_rsv(struct ext2_rsv_win *search, struct ext2_rsv_win *rp,
+    struct m_ext2fs *fs, int32_t start, int cg)
+{
+        struct ext2_rsv_win *rsv, *prev;
+        int32_t cur;
+        int size = rp->rsv_goal_size;
+
+        if (search == NULL) {
+                rp->rsv_start = start & ~7;
+                rp->rsv_end = start + size - 1;
+                rp->rsv_alloc_hit = 0;
+
+                RB_INSERT(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rp);
+
+                return (0);
+        }
+
+        /*
+         * Make the start of the reservation window byte-aligned
+         * so that a free block can be found with bit operations
+         * in the ext2_search_next_block() function.
+         */
+        cur = start & ~7;
+        rsv = search;
+        prev = NULL;
+
+        while (1) {
+                if (cur <= rsv->rsv_end)
+                        cur = rsv->rsv_end + 1;
+
+                if (dtog(fs, cur) != cg)
+                        return (-1);
+
+                prev = rsv;
+                rsv = RB_NEXT(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rsv);
+
+                if (rsv == NULL)
+                        break;
+
+                if (cur + size <= rsv->rsv_start)
+                        break;
+        }
+
+        if (prev != rp && rp->rsv_end != EXT2_RSV_NOT_ALLOCATED)
+                ext2_remove_rsv_win(fs, rp);
+
+        rp->rsv_start = cur;
+        rp->rsv_end = cur + size - 1;
+        rp->rsv_alloc_hit = 0;
+
+        if (prev != rp)
+                RB_INSERT(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rp);
+
+        return (0);
+}
+
+/*
+ * Find a free block by given range from bpref to
+ * the end of this cylinder group.
+ */
+static daddr_t
+ext2_search_next_block(struct m_ext2fs *fs, char *bbp, int bpref, int cg)
+{
+        daddr_t bno;
+        int start, loc, len, map, i;
+
+        start = bpref / NBBY;
+        len = howmany(fs->e2fs->e2fs_fpg, NBBY) - start;
+        loc = skpc(0xff, len, &bbp[start]);
+        if (loc == 0)
+                return (-1);
+
+        i = start + len - loc;
+        map = bbp[i];
+        bno = i * NBBY;
+        for (i = 1; i < (1 << NBBY); i <<= 1, bno++) {
+                if ((map & i) == 0)
+                        return (bno);
+        }
+
+        return (-1);
+}
+
+/*
+ * Allocate a new reservation window.
+ */
+static int
+ext2_alloc_new_rsv(struct inode *ip, int cg, struct buf *bp, int32_t bpref)
+{
+        struct m_ext2fs *fs;
+        struct ext2_rsv_win *rp, *search;
+        char *bbp;
+        int start, size, ret;
+
+        fs = ip->i_e2fs;
+        rp = ip->i_rsv;
+        bbp = bp->b_data;
+        size = rp->rsv_goal_size;
+
+        if (bpref <= 0)
+                start = phy_blk(cg, fs);
+        else
+                start = bpref;
+
+        /* Dynamically increase the size of window */
+        if (rp->rsv_end != EXT2_RSV_NOT_ALLOCATED) {
+                if (rp->rsv_alloc_hit >
+                    ((rp->rsv_end - rp->rsv_start + 1) / 2)) {
+                        size = size * 2;
+                        if (size > EXT2_RSV_MAX_RESERVE_BLKS)
+                                size = EXT2_RSV_MAX_RESERVE_BLKS;
+                        rp->rsv_goal_size = size;
+                }
+        }
+
+        EXT2_TREE_LOCK(fs);
+
+        search = ext2_search_rsv(fs->e2fs_rsv_tree, start);
+
+repeat:
+        ret = ext2_find_rsv(search, rp, fs, start, cg);
+        if (ret < 0) {
+                if (rp->rsv_end != EXT2_RSV_NOT_ALLOCATED)
+                        ext2_remove_rsv_win(fs, rp);
+                EXT2_TREE_UNLOCK(fs);
+                return (-1);
+        }
+        EXT2_TREE_UNLOCK(fs);
+
+        start = dtogd(fs, rp->rsv_start);
+        start = ext2_search_next_block(fs, bbp, start, cg);
+        if (start < 0) {
+                EXT2_TREE_LOCK(fs);
+                if (rp->rsv_end != EXT2_RSV_NOT_ALLOCATED)
+                        ext2_remove_rsv_win(fs, rp);
+                EXT2_TREE_UNLOCK(fs);
+                return (-1);
+        }
+
+        start = phy_blk(cg, fs) + start;
+        if (start >= rp->rsv_start && start <= rp->rsv_end)
+                return (0);
+
+        search = rp;
+        EXT2_TREE_LOCK(fs);
+        goto repeat;
+}
+
+/*
+ * Allocate a free block from reservation window.
+ */
+static u_long
+ext2_rsvalloc(struct m_ext2fs *fs, struct inode *ip, int cg,
+    struct buf *bp, int32_t bpref, int size)
+{
+        struct ext2_rsv_win *rp;
+        int ret;
+
+        rp = ip->i_rsv;
+        if (rp == NULL)
+                return (ext2_alloc_blk(ip, cg, bp, bpref, NULL));
+
+        if (rp->rsv_end == EXT2_RSV_NOT_ALLOCATED ||
+            !ext2_bpref_in_rsv(rp, bpref)) {
+                ret = ext2_alloc_new_rsv(ip, cg, bp, bpref);
+                if (ret < 0)
+                        return (0);
+        }
+
+        return (ext2_alloc_blk(ip, cg, bp, bpref, rp));
+}
+
+/*
+ * Allocate a block using reservation window in ext2 file system.
+ *
+ * NOTE: This function will replace the ext2_alloc() function.
+ */
+int
+ext2_alloc_rsv(struct inode *ip, int32_t lbn, int32_t bpref,
+    int size, struct ucred *cred, int32_t *bnp)
+{
+	struct m_ext2fs *fs;
+	struct ext2mount *ump;
+        struct buf *bp;
+	int32_t bno = 0;
+	int i, cg, error;
+
+	*bnp = 0;
+	fs = ip->i_e2fs;
+	ump = ip->i_ump;
+	mtx_assert(EXT2_MTX(ump), MA_OWNED);
+
+	if (size == fs->e2fs_bsize && fs->e2fs->e2fs_fbcount == 0)
+		goto nospace;
+	if (cred->cr_uid != 0 && 
+	    fs->e2fs->e2fs_fbcount < fs->e2fs->e2fs_rbcount)
+		goto nospace;
+
+	if (bpref >= fs->e2fs->e2fs_bcount)
+		bpref = 0;
+	if (bpref == 0)
+		cg = ino_to_cg(fs, ip->i_number);
+	else
+		cg = dtog(fs, bpref);
+
+        /* If cg has some free blocks, then try to allocate a free block from this cg */
+        if (fs->e2fs_gd[cg].ext2bgd_nbfree > 0) {
+                /* Read block bitmap from buffer */
+                EXT2_UNLOCK(ump);
+                error = bread(ip->i_devvp,
+                    fsbtodb(fs, fs->e2fs_gd[cg].ext2bgd_b_bitmap),
+                    (int)fs->e2fs_bsize, NOCRED, &bp);
+                if (error) {
+                        brelse(bp);
+                        goto ioerror;
+                }
+
+                EXT2_RSV_LOCK(ip);
+                /* Try to allocate from reservation window */
+                bno = ext2_rsvalloc(fs, ip, cg, bp, bpref, size);
+                EXT2_RSV_UNLOCK(ip);
+                if (bno > 0)
+                        goto allocated;
+
+                brelse(bp);
+                EXT2_LOCK(ump);
+        }
+
+        /* Otherwise, try to allocate a free block from the remaining groups. */
+        cg = (cg + 1) % fs->e2fs_gcount;
+        for (i = 1; i < fs->e2fs_gcount; i++) {
+                if (fs->e2fs_gd[cg].ext2bgd_nbfree > 0) {
+                        /* Read block bitmap from buffer */
+                        EXT2_UNLOCK(ump);
+                        error = bread(ip->i_devvp,
+                            fsbtodb(fs, fs->e2fs_gd[cg].ext2bgd_b_bitmap),
+                            (int)fs->e2fs_bsize, NOCRED, &bp);
+                        if (error) {
+                                brelse(bp);
+                                goto ioerror;
+                        }
+
+                        EXT2_RSV_LOCK(ip);
+                        bno = ext2_rsvalloc(fs, ip, cg, bp, -1, size);
+                        EXT2_RSV_UNLOCK(ip);
+                        if (bno > 0)
+                                goto allocated;
+
+                        brelse(bp);
+                        EXT2_LOCK(ump);
+                }
+
+                cg++;
+                if (cg == fs->e2fs_gcount)
+                        cg = 0;
+        }
+
+allocated:
+        if (bno > 0) {
+                ip->i_next_alloc_block = lbn;
+                ip->i_next_alloc_goal = bno;
+
+                ip->i_blocks += btodb(fs->e2fs_bsize);
+                ip->i_flag |= IN_CHANGE | IN_UPDATE;
+                *bnp = bno;
+                return (0);
+        }
+
+nospace:
+	EXT2_UNLOCK(ump);
+	ext2_fserr(fs, cred->cr_uid, "file system full");
+	uprintf("\n%s: write failed, file system is full\n", fs->e2fs_fsmnt);
+	return (ENOSPC);
+
+ioerror:
+        ext2_fserr(fs, cred->cr_uid, "file system IO error");
+        uprintf("\n%s: write failed, file system IO error\n", fs->e2fs_fsmnt);
+        return (EIO);
+}
 
 int
 ext2_alloc(ip, lbn, bpref, size, cred, bnp)
@@ -923,9 +1413,11 @@
 		start = 0;
 		loc = skpc(0xff, len, &bbp[start]);
 		if (loc == 0) {
-			printf("start = %d, len = %d, fs = %s\n",
-				start, len, fs->e2fs_fsmnt);
-			panic("ext2fs_alloccg: map corrupted");
+                        /* XXX: just for reservation window */
+                        return -1;
+			/*printf("start = %d, len = %d, fs = %s\n",*/
+				/*start, len, fs->e2fs_fsmnt);*/
+			/*panic("ext2fs_alloccg: map corrupted");*/
 			/* NOTREACHED */
 		}
 	}
diff -urN /usr/src/sys/fs/ext2fs/ext2_balloc.c new/ext2_balloc.c
--- /usr/src/sys/fs/ext2fs/ext2_balloc.c	2010-01-14 22:30:54.000000000 +0800
+++ new/ext2_balloc.c	2010-08-19 02:47:29.000000000 +0800
@@ -49,6 +49,7 @@
 #include <fs/ext2fs/fs.h>
 #include <fs/ext2fs/ext2_extern.h>
 #include <fs/ext2fs/ext2_mount.h>
+#include <fs/ext2fs/ext2_rsv_win.h>
 /*
  * Balloc defines the structure of file system storage
  * by allocating the physical blocks on a device given
@@ -78,6 +79,9 @@
 	fs = ip->i_e2fs;
 	ump = ip->i_ump;
 
+        if (ip->i_rsv == NULL)
+                ext2_init_rsv(ip);
+
 	/*
 	 * check if this is a sequential block allocation. 
 	 * If so, increment next_alloc fields to allow ext2_blkpref 
@@ -136,9 +140,9 @@
 			else
 				nsize = fs->e2fs_bsize;
 			EXT2_LOCK(ump);
-			error = ext2_alloc(ip, lbn,
-			    ext2_blkpref(ip, lbn, (int)lbn, &ip->i_db[0], 0),
-			    nsize, cred, &newb);
+			error = ext2_alloc_rsv(ip, lbn,
+				    ext2_blkpref(ip, lbn, (int)lbn, &ip->i_db[0], 0),
+				    nsize, cred, &newb);
 			if (error)
 				return (error);
 			bp = getblk(vp, lbn, nsize, 0, 0, 0);
@@ -170,9 +174,9 @@
 		EXT2_LOCK(ump);
 		pref = ext2_blkpref(ip, lbn, indirs[0].in_off + 
 					     EXT2_NDIR_BLOCKS, &ip->i_db[0], 0);
-	        if ((error = ext2_alloc(ip, lbn, pref, 
-			(int)fs->e2fs_bsize, cred, &newb)))
-			return (error);
+	        if ((error = ext2_alloc_rsv(ip, lbn, pref, 
+				(int)fs->e2fs_bsize, cred, &newb)))
+				return (error);
 		nb = newb;
 		bp = getblk(vp, indirs[1].in_lbn, fs->e2fs_bsize, 0, 0, 0);
 		bp->b_blkno = fsbtodb(fs, newb);
@@ -211,7 +215,7 @@
 		if (pref == 0)
 			pref = ext2_blkpref(ip, lbn, indirs[i].in_off, bap,
 						bp->b_lblkno);
-		error =  ext2_alloc(ip, lbn, pref, (int)fs->e2fs_bsize, cred, &newb);
+		error =  ext2_alloc_rsv(ip, lbn, pref, (int)fs->e2fs_bsize, cred, &newb);
 		if (error) {
 			brelse(bp);
 			return (error);
@@ -250,8 +254,8 @@
 		EXT2_LOCK(ump);
 		pref = ext2_blkpref(ip, lbn, indirs[i].in_off, &bap[0], 
 				bp->b_lblkno);
-		if ((error = ext2_alloc(ip,
-		    lbn, pref, (int)fs->e2fs_bsize, cred, &newb)) != 0) {
+		if ((error = ext2_alloc_rsv(ip, lbn, pref,
+				(int)fs->e2fs_bsize, cred, &newb)) != 0) {
 			brelse(bp);
 			return (error);
 		}
diff -urN /usr/src/sys/fs/ext2fs/ext2_inode.c new/ext2_inode.c
--- /usr/src/sys/fs/ext2fs/ext2_inode.c	2010-01-14 22:30:54.000000000 +0800
+++ new/ext2_inode.c	2010-08-19 02:47:29.000000000 +0800
@@ -52,6 +52,7 @@
 #include <fs/ext2fs/ext2fs.h>
 #include <fs/ext2fs/fs.h>
 #include <fs/ext2fs/ext2_extern.h>
+#include <fs/ext2fs/ext2_rsv_win.h>
 
 static int ext2_indirtrunc(struct inode *, int32_t, int32_t, int32_t, int,
 	    long *);
@@ -153,6 +154,11 @@
 	}
 	fs = oip->i_e2fs;
 	osize = oip->i_size;
+
+        EXT2_RSV_LOCK(oip);
+	ext2_discard_rsv(oip);
+        EXT2_RSV_UNLOCK(oip);
+
 	/*
 	 * Lengthen the size of the file. We must ensure that the
 	 * last byte of the file is allocated. Since the smallest
@@ -484,6 +490,10 @@
 	if (prtactive && vrefcnt(vp) != 0)
 		vprint("ext2_inactive: pushing active", vp);
 
+        EXT2_RSV_LOCK(ip);
+        ext2_discard_rsv(ip);
+        EXT2_RSV_UNLOCK(ip);
+
 	/*
 	 * Ignore inodes related to stale file handles.
 	 */
@@ -525,11 +535,21 @@
 	if (prtactive && vrefcnt(vp) != 0)
 		vprint("ufs_reclaim: pushing active", vp);
 	ip = VTOI(vp);
+
 	if (ip->i_flag & IN_LAZYMOD) {
 		ip->i_flag |= IN_MODIFIED;
 		ext2_update(vp, 0);
 	}
 	vfs_hash_remove(vp);
+
+        EXT2_RSV_LOCK(ip);
+        if (ip->i_rsv != NULL) {
+                free(ip->i_rsv, M_EXT2NODE);
+                ip->i_rsv = NULL;
+        }
+        EXT2_RSV_UNLOCK(ip);
+        mtx_destroy(&ip->i_rsv_lock);
+
 	free(vp->v_data, M_EXT2NODE);
 	vp->v_data = 0;
 	vnode_destroy_vobject(vp);
diff -urN /usr/src/sys/fs/ext2fs/ext2_rsv_win.h new/ext2_rsv_win.h
--- /usr/src/sys/fs/ext2fs/ext2_rsv_win.h	1970-01-01 08:00:00.000000000 +0800
+++ new/ext2_rsv_win.h	2010-08-19 02:47:29.000000000 +0800
@@ -0,0 +1,78 @@
+/*-
+ * Copyright (c) 2010, 2010 Zheng Liu <lz@freebsd.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD: src/sys/fs/ext2fs/ext2_rsv_win.h,v 0.1 2010/05/08 12:41:51 lz Exp $
+ */
+#ifndef _FS_EXT2FS_EXT2_RSV_WIN_H_
+#define _FS_EXT2FS_EXT2_RSV_WIN_H_
+
+#include <sys/tree.h>
+
+#define EXT2_RSV_DEFAULT_RESERVE_BLKS 8
+#define EXT2_RSV_MAX_RESERVE_BLKS     1024
+#define EXT2_RSV_NOT_ALLOCATED        0
+
+#define EXT2_RSV_LOCK(ip)   mtx_lock(&(ip)->i_rsv_lock)
+#define EXT2_RSV_UNLOCK(ip) mtx_unlock(&(ip)->i_rsv_lock)
+
+#define EXT2_TREE_LOCK(fs)   mtx_lock(&(fs)->e2fs_rsv_lock)
+#define EXT2_TREE_UNLOCK(fs) mtx_unlock(&(fs)->e2fs_rsv_lock)
+
+/*
+ * Reservation window entry
+ */
+struct ext2_rsv_win {
+	RB_ENTRY(ext2_rsv_win) rsv_link; /* RB tree links */
+
+	int32_t rsv_goal_size; /* Goal size of window, in blocks */
+	int32_t rsv_alloc_hit; /* Number of allocations that hit the window */
+
+	int32_t rsv_start; /* First block of window */
+	int32_t rsv_end;   /* Last block of window */
+};
+
+RB_HEAD(ext2_rsv_win_tree, ext2_rsv_win);
+
+static __inline int
+ext2_rsv_win_cmp(const struct ext2_rsv_win *a,
+		 const struct ext2_rsv_win *b)
+{
+	if (a->rsv_start < b->rsv_start)
+		return (-1);
+	if (a->rsv_start == b->rsv_start)
+		return (0);
+
+	return (1);
+}
+RB_PROTOTYPE(ext2_rsv_win_tree, ext2_rsv_win, rsv_link, ext2_rsv_win_cmp);
+
+/* forward declaration */
+struct inode;
+/* ext2_alloc.c */
+void    ext2_init_rsv(struct inode *ip);
+void    ext2_discard_rsv(struct inode *ip);
+int     ext2_alloc_rsv(struct inode *, int32_t, int32_t, int, struct ucred *, int32_t *);
+
+#endif /* !_FS_EXT2FS_EXT2_RSV_WIN_H_ */
diff -urN /usr/src/sys/fs/ext2fs/ext2_vfsops.c new/ext2_vfsops.c
--- /usr/src/sys/fs/ext2fs/ext2_vfsops.c	2010-01-14 22:30:54.000000000 +0800
+++ new/ext2_vfsops.c	2010-08-19 02:47:29.000000000 +0800
@@ -1,4 +1,4 @@
-/*-
+/*
  *  modified for EXT2FS support in Lites 1.1
  *
  *  Aug 1995, Godmar Back (gback@cs.utah.edu)
@@ -61,6 +61,7 @@
 #include <fs/ext2fs/fs.h>
 #include <fs/ext2fs/ext2_extern.h>
 #include <fs/ext2fs/ext2fs.h>
+#include <fs/ext2fs/ext2_rsv_win.h>
 
 static int	ext2_flushfiles(struct mount *mp, int flags, struct thread *td);
 static int	ext2_mountfs(struct vnode *, struct mount *);
@@ -95,9 +96,9 @@
 static int	compute_sb_data(struct vnode * devvp,
 		    struct ext2fs * es, struct m_ext2fs * fs);
 
-static const char *ext2_opts[] = { "from", "export", "acls", "noexec",
-    "noatime", "union", "suiddir", "multilabel", "nosymfollow",
-    "noclusterr", "noclusterw", "force", NULL };
+static const char *ext2_opts[] = { "acls", "async", "export", "force",
+    "from", "multilabel", "noatime", "noclusterr", "noclusterw",
+    "noexec", "nosymfollow", "suiddir", "union", NULL };
 
 /*
  * VFS Operations.
@@ -581,6 +582,14 @@
 	if ((error = compute_sb_data(devvp, ump->um_e2fs->e2fs, ump->um_e2fs)))
 		goto out;
 
+	/* Initialize reservation window index and lock */
+	bzero(&ump->um_e2fs->e2fs_rsv_lock, sizeof(struct mtx));
+	mtx_init(&ump->um_e2fs->e2fs_rsv_lock,
+            "rsv tree lock", NULL, MTX_DEF);
+        ump->um_e2fs->e2fs_rsv_tree = malloc(sizeof(struct ext2_rsv_win_tree),
+            M_EXT2MNT, M_WAITOK | M_ZERO);
+	RB_INIT(ump->um_e2fs->e2fs_rsv_tree);
+
 	brelse(bp);
 	bp = NULL;
 	fs = ump->um_e2fs;
@@ -680,6 +689,8 @@
 	g_topology_unlock();
 	PICKUP_GIANT();
 	vrele(ump->um_devvp);
+        free(fs->e2fs_rsv_tree, M_EXT2MNT);
+	mtx_destroy(&fs->e2fs_rsv_lock);
 	free(fs->e2fs_gd, M_EXT2MNT);
 	free(fs->e2fs_contigdirs, M_EXT2MNT);
 	free(fs->e2fs, M_EXT2MNT);
@@ -919,6 +930,10 @@
 	ip->i_prealloc_count = 0;
 	ip->i_prealloc_block = 0;
 
+	bzero(&ip->i_rsv_lock, sizeof(struct mtx));
+	mtx_init(&ip->i_rsv_lock, "inode rsv lock", NULL, MTX_DEF);
+        ip->i_rsv = NULL;
+
 	/*
 	 * Now we want to make sure that block pointers for unused
 	 * blocks are zeroed out - ext2_balloc depends on this
diff -urN /usr/src/sys/fs/ext2fs/ext2fs.h new/ext2fs.h
--- /usr/src/sys/fs/ext2fs/ext2fs.h	2010-01-14 22:30:54.000000000 +0800
+++ new/ext2fs.h	2010-08-19 02:47:29.000000000 +0800
@@ -38,6 +38,7 @@
 #define _FS_EXT2FS_EXT2_FS_H
 
 #include <sys/types.h>
+#include <sys/lock.h>
 
 /*
  * Special inode numbers
@@ -174,6 +175,9 @@
 	char e2fs_wasvalid;       /* valid at mount time */
 	off_t e2fs_maxfilesize;
 	struct ext2_gd *e2fs_gd; /* Group Descriptors */
+
+	struct mtx e2fs_rsv_lock;                /* Protect reservation window RB tree */
+	struct ext2_rsv_win_tree *e2fs_rsv_tree; /* Reservation window index */
 };
 
 /*
diff -urN /usr/src/sys/fs/ext2fs/inode.h new/inode.h
--- /usr/src/sys/fs/ext2fs/inode.h	2010-01-14 22:30:54.000000000 +0800
+++ new/inode.h	2010-08-19 02:47:29.000000000 +0800
@@ -100,6 +100,10 @@
 	int32_t		i_gen;		/* Generation number. */
 	u_int32_t	i_uid;		/* File owner. */
 	u_int32_t	i_gid;		/* File group. */
+
+        /* Fields for reservation window */
+        struct mtx          i_rsv_lock; /* Protects i_rsv */
+	struct ext2_rsv_win *i_rsv;     /* Reservation window */
 };
 
 /*


--Boundary-00=_QxzoMs/Iug8+N80--
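
The byte-alignment requirement noted in ext2_find_rsv() in the patch above
may be easier to see in a userland sketch. This is not code from the patch:
first_free_bit() is a hypothetical helper that mimics what
ext2_search_next_block() does with skpc(), namely skip bytes that are
entirely 0xff and then scan the first other byte bit by bit. It assumes
8-bit bytes and the same bit order as the kernel's isclr()/setbit().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the index of the first clear bit found
 * when scanning from the byte that contains bit `start`, or -1 if
 * every remaining byte is full.  Bit n lives in map[n / 8], at
 * position n % 8, as with isclr()/setbit().
 */
long
first_free_bit(const unsigned char *map, size_t nbytes, long start)
{
	size_t i;
	int bit;

	assert(map != NULL && start >= 0);

	for (i = (size_t)start / CHAR_BIT; i < nbytes; i++) {
		if (map[i] == 0xff)
			continue;	/* all blocks in this byte are in use */
		for (bit = 0; bit < CHAR_BIT; bit++) {
			if ((map[i] & (1u << bit)) == 0)
				return ((long)(i * CHAR_BIT + bit));
		}
	}
	return (-1);
}
```

Because the scan begins at the start of the byte, asking for bit 12 in a map
whose bit 10 is free returns 10, a block before the requested position. That
is presumably why the patch rounds window starts down with "start & ~7", so
that the first bit examined is never below the window start.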

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 15:03:53 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F2CF31065674
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 15:03:53 +0000 (UTC)
	(envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 450D88FC14
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 15:03:52 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA29301;
	Wed, 29 Sep 2010 18:03:48 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4CA35553.1080804@icyb.net.ua>
Date: Wed, 29 Sep 2010 18:03:47 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Karl Pielorz <kpielorz_lst@tdx.co.uk>
References: <E535F873EBCF64D19AC01035@HexaDeca64.dmpriest.net.uk>	<201009291024.o8TAOnph013730@higson.cam.lispworks.com>
	<8CF1F1F15531907E2F8DC2A2@HexaDeca64.dmpriest.net.uk>
In-Reply-To: <8CF1F1F15531907E2F8DC2A2@HexaDeca64.dmpriest.net.uk>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: FreeBSD 8.1-R/amd64 - zfs 'hangs' - help tracing?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 15:03:54 -0000

on 29/09/2010 13:30 Karl Pielorz said the following:
> --On 29 September 2010 11:24 +0100 Martin Simmons <martin@lispworks.com> wrote:
> 
>>> I saw a while ago a command under 8.1 to get 'more info' for these stuck
>>> processes, but can't for the life of me remember it?
>>
>> Maybe procstat -k -k $pid is what you are looking for (i.e. a kernel
>> backtrace)?  Use -a instead of $pid to get all processes.
> 
> Yup, that's it - thanks!
> 
> Having run it I get:
> 
> procstat -k -k 1927 (PID 1927 is the 'ls' that's locked up)
> 
>  PID    TID COMM             TDNAME           KSTACK
> 1927 100206 ls               -                mi_switch+0x16f sleepq_wait+0x42
> _cv_wait+0x111 zio_wait+0x61 dbuf_read+0x39a dnode_hold_impl+0xe7
> dmu_bonus_hold+0x2a zfs_zget+0x227 zfs_dirent_lock+0x4e3 zfs_dirlook+0x69
> zfs_lookup+0x1f0 zfs_freebsd_lookup+0x81 vfs_cache_lookup+0xf0 VOP_LOOKUP_APV+0x40
> lookup+0x40a namei+0x52b kern_statat_vnhook+0x8f kern_statat+0x15
> 
> 
> Which will hopefully mean something more to someone here than it does me at the
> moment ;)

This looks like the process is stuck waiting for I/O completion.
I can't tell whether it's an I/O problem, or whether the I/O operation
completed long ago but the wakeup for it was lost...
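
To make "lost wakeup" concrete, here is a minimal userland sketch using
pthreads. It is not the ZFS zio code; struct io_request and the io_*()
functions are made-up names for illustration only. The waiter sleeps on a
condition variable until a completion flag is set, which is roughly what the
_cv_wait frame under zio_wait in the trace suggests.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an in-flight I/O request. */
struct io_request {
	pthread_mutex_t lock;
	pthread_cond_t  cv;
	bool            done;	/* set once the I/O has completed */
};

void
io_init(struct io_request *io)
{
	assert(io != NULL);
	pthread_mutex_init(&io->lock, NULL);
	pthread_cond_init(&io->cv, NULL);
	io->done = false;
}

/* Completion side: record the result, then wake any waiters. */
void
io_complete(struct io_request *io)
{
	pthread_mutex_lock(&io->lock);
	io->done = true;
	pthread_cond_broadcast(&io->cv);
	pthread_mutex_unlock(&io->lock);
}

/* Waiter side: re-check the flag under the lock before sleeping. */
void
io_wait(struct io_request *io)
{
	pthread_mutex_lock(&io->lock);
	while (!io->done)
		pthread_cond_wait(&io->cv, &io->lock);
	pthread_mutex_unlock(&io->lock);
}
```

In this sketch a completion that happens before io_wait() is not lost,
because the flag is checked under the lock. If the completion side signalled
without setting the flag under the lock, or never ran at all because the I/O
never finished, the waiter would sleep indefinitely, and from the outside both
cases look like the hang described here.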

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 15:11:13 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D65371065673
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 15:11:13 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta03.emeryville.ca.mail.comcast.net
	(qmta03.emeryville.ca.mail.comcast.net [76.96.30.32])
	by mx1.freebsd.org (Postfix) with ESMTP id BCBBD8FC13
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 15:11:13 +0000 (UTC)
Received: from omta22.emeryville.ca.mail.comcast.net ([76.96.30.89])
	by qmta03.emeryville.ca.mail.comcast.net with comcast
	id CdRT1f0061vN32cA3fBDjw; Wed, 29 Sep 2010 15:11:13 +0000
Received: from koitsu.dyndns.org ([98.248.41.155])
	by omta22.emeryville.ca.mail.comcast.net with comcast
	id CfBB1f00R3LrwQ28ifBCu6; Wed, 29 Sep 2010 15:11:13 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 602A39B418; Wed, 29 Sep 2010 08:11:11 -0700 (PDT)
Date: Wed, 29 Sep 2010 08:11:11 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Karl Pielorz <kpielorz_lst@tdx.co.uk>
Message-ID: <20100929151111.GA91705@icarus.home.lan>
References: <E535F873EBCF64D19AC01035@HexaDeca64.dmpriest.net.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <E535F873EBCF64D19AC01035@HexaDeca64.dmpriest.net.uk>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: FreeBSD 8.1-R/amd64 - zfs 'hangs' - help tracing?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 15:11:13 -0000

On Wed, Sep 29, 2010 at 10:20:22AM +0100, Karl Pielorz wrote:
> 
> Hi All,
> 
> I moved my machine from FreeBSD 7.2-S/amd64 to 8.1-R/amd64 about a
> week ago. Since then I've noticed that ZFS just 'hangs' - e.g. it'll
> work fine for a few days, then a process will get 'hung up' waiting
> on ZFS.
> 
> The machine is a Tyan motherboard (dual Opteron, dual cores, w/10Gb
> of RAM). 7.2-S & ZFS ran perfectly under it.
> 
> Anything else then that touches the pools, also 'hangs' - in top the
> original process shows as:
> 
> "
> 1927 root        1  44    0  8224K  1544K zio->i  0   0:00  0.00% ls
> "
> 
> Anything else that touches the ZFS pools, ends up like:
> 
> "
> 2082 root        1  44    0 10284K  2976K zfs     3   0:00  0.00% csh
> "
> 
> I saw a while ago a command under 8.1 to get 'more info' for these
> stuck processes, but can't for the life of me remember it?
> 
> If someone can give me some pointers to try and track down what's hanging?
> 
> The drives are spread over two Marvell 88SX6081's. I've tried the
> mvs driver for that controller, which gave me a bucket load of
> errors, and data corruption :(
> 
> Switching back to the standard ATA drivers for that card, I just get
> hangs :( - nothing is logged on the console, or syslog when this
> happens.

Can you provide (with the disks hooked to an ATA/SATA controller and
the system not wedged waiting for ZFS I/O) the output from smartctl
(you'll probably have to install ports/sysutils/smartmontools) as:

smartctl -a /dev/adXX (one command per device)

I can review these statistics to see if any of the disks look like they
may be misbehaving.

It would also help if you could provide dmesg output associated with the
ATA/SATA controller which they're hooked to.

Thanks!

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 15:19:10 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 07EDA106566C
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 15:19:10 +0000 (UTC)
	(envelope-from kpielorz_lst@tdx.co.uk)
Received: from mail.tdx.com (mail.tdx.com [62.13.128.18])
	by mx1.freebsd.org (Postfix) with ESMTP id 983FA8FC0C
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 15:19:09 +0000 (UTC)
Received: from HexaDeca64.dmpriest.net.uk (HPQuadro64.dmpriest.net.uk
	[62.13.130.30]) (authenticated bits=0)
	by mail.tdx.com (8.14.3/8.14.3/Kp) with ESMTP id o8TFJ74e061126
	(version=TLSv1/SSLv3 cipher=DHE-DSS-AES256-SHA bits=256 verify=NO);
	Wed, 29 Sep 2010 16:19:08 +0100 (BST)
Date: Wed, 29 Sep 2010 16:18:21 +0100
From: Karl Pielorz <kpielorz_lst@tdx.co.uk>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Message-ID: <24249EBE70346EDE973308F6@HexaDeca64.dmpriest.net.uk>
In-Reply-To: <20100929151111.GA91705@icarus.home.lan>
References: <E535F873EBCF64D19AC01035@HexaDeca64.dmpriest.net.uk>
	<20100929151111.GA91705@icarus.home.lan>
X-Mailer: Mulberry/4.0.8 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Cc: freebsd-fs@freebsd.org
Subject: Re: FreeBSD 8.1-R/amd64 - zfs 'hangs' - help tracing?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 15:19:10 -0000


--On 29 September 2010 08:11 -0700 Jeremy Chadwick 
<freebsd@jdc.parodius.com> wrote:

> Can you provide (with the disks hooked to an ATA/SATA controller and
> the system not wedged waiting for ZFS I/O) the output from smartctl
> (you'll probably have to install ports/sysutils/smartmontools) as:
>
> smartctl -a /dev/adXX (one command per device)
>
> I can review these statistics to see if any of the disks look like they
> may be misbehaving.
>
> It would also help if you could provide dmesg output associated with the
> ATA/SATA controller which they're hooked to.

I can do - I'll try to get all the relevant stuff up on a site and send you 
the URL  (i.e. dmesg output, zpool status output, smartctl output etc.)

The same system works fine under 7.2-Stable with the same drives, hardware 
etc. - But under 8.1, anything more than "a little ZFS I/O" (e.g. scrubs, 
copying lots of files etc.) just hangs at a random point, with no error 
given - and anything that touches the zpool 'hangs next'.

I already run smartmontools - though it's currently disabled (in case it was 
that causing issues). The drives on the whole look OK, I think two have 
picked up a couple of reallocated blocks (and failed reads) - but it's 
nothing that's increasing or anything - and like I said, 7.2-S is fine with 
the same setup...

If I break to KDB with the current system, is there anything I can do to 
get a better look at what's hung up/where?

-Karl

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 16:43:07 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 92E6B1065673
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 16:43:07 +0000 (UTC)
	(envelope-from martin@lispworks.com)
Received: from lwfs1-cam.cam.lispworks.com (mail.lispworks.com
	[193.34.186.230])
	by mx1.freebsd.org (Postfix) with ESMTP id 327FD8FC1D
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 16:43:06 +0000 (UTC)
Received: from higson.cam.lispworks.com
	(IDENT:U2FsdGVkX1/DDsrOp/fc0hHDkudR8RwKYqZIPzbTCBA@higson
	[192.168.1.7])
	by lwfs1-cam.cam.lispworks.com (8.14.3/8.14.3) with ESMTP id
	o8TGh4Sg094770; Wed, 29 Sep 2010 17:43:04 +0100 (BST)
	(envelope-from martin@lispworks.com)
Received: from higson.cam.lispworks.com by higson.cam.lispworks.com (8.13.1)
	id o8TGh41B016652; Wed, 29 Sep 2010 17:43:04 +0100
Received: (from martin@localhost)
	by higson.cam.lispworks.com (8.13.1/8.13.1/Submit) id o8TGh4hx016649;
	Wed, 29 Sep 2010 17:43:04 +0100
Date: Wed, 29 Sep 2010 17:43:04 +0100
Message-Id: <201009291643.o8TGh4hx016649@higson.cam.lispworks.com>
From: Martin Simmons <martin@lispworks.com>
To: freebsd-fs@freebsd.org
In-reply-to: <707981.30589.qm@web110702.mail.gq1.yahoo.com> (message from
	Scott Johnson on Tue, 28 Sep 2010 17:01:47 -0700 (PDT))
References: <707981.30589.qm@web110702.mail.gq1.yahoo.com>
Subject: Re: zfs+smb checksum errors
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 16:43:07 -0000

>>>>> On Tue, 28 Sep 2010 17:01:47 -0700 (PDT), Scott Johnson said:
> 
> I've been running FBSD 8.0-rel and now 8.1-rel for 6 months or so, with a 4 disk 
> raidz, weekly scrubs. Never saw a checksum error until last week when I got 
> about 5 during operation, followed by another 10 or so on the next scrub.
> 
> They all occurred when I was accessing the server through SMB from my WinXP 
> desktop. I've been doing this for months, transferring files to and from 
> regularly, but this was the first time I'd been doing simultaneous heavy reading 
> & writing.
> 
> I was running Imgburn on WinXP, creating an iso file from folders. Both source 
> files and destination iso file were on the same SMB share on the FBSD server.
> 
> After all the checksum errors, there was 1 unrecoverable error, on one of the 
> destination iso files.
> 
> There are 0 read errors, 0 write errors, and 0 new SMART errors on the 4 disks. 
> The checksum errors were spread roughly equally across the 4 disks. All of which 
> leads me to believe this is a software problem at the filesystem level.

Or maybe it is some other hardware problem, such as the disk controller or
RAM?

__Martin

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 18:10:49 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EBA5B106564A
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 18:10:49 +0000 (UTC)
	(envelope-from avg@freebsd.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 304588FC17
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 18:10:48 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id VAA02414;
	Wed, 29 Sep 2010 21:10:45 +0300 (EEST)
	(envelope-from avg@freebsd.org)
Message-ID: <4CA38124.60902@freebsd.org>
Date: Wed, 29 Sep 2010 21:10:44 +0300
From: Andriy Gapon <avg@freebsd.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: <5DB6E7C798E44D33A05673F4B773405E@multiplay.co.uk><4C8D087B.5040404@freebsd.org><03537796FAB54E02959E2D64FC83004F@multiplay.co.uk><4C8D280F.3040803@freebsd.org><3FBF66BF11AA4CBBA6124CA435A4A31B@multiplay.co.uk><4C8E4212.30000@freebsd.org>
	<B98EBECBD399417CA5390C20627384B1@multiplay.co.uk>
	<D79F15FEB5794315BD8668E40B414BF0@multiplay.co.uk>
	<4C90B4C8.90203@freebsd.org>
	<6DFACB27CA8A4A22898BC81E55C4FD36@multiplay.co.uk>
	<4C90D3A1.7030008@freebsd.org>
	<0B1A90A08DFE4ADA9540F9F3846FDF38@multiplay.co.uk>
	<4C90EDB8.3040709@freebsd.org>
	<3F29E8CED7B24805B2D93F62A4EC9559@multiplay.co.uk>
	<4C9126FB.2020707@freebsd.org>
	<1E0B9C1145784776A773B99FC1139CD5@multiplay.co.uk>
	<4C987F90.6000006@freebsd.org> <4C98803F.7000901@freebsd.org>
	<879BF5981D1B4C7290BDF18286BA1EEC@multiplay.co.uk> <4C989201.2
	0506@freebsd.org>
	<A77828512281413B8B38EF02732D081C@multiplay.co.uk>
	<4C98A2BA.1080004@freebsd.org>
	<EAA0054303614AB2A8CB8863BD54F68C@multiplay.co.uk>
	<4C98BFCE.2020202@freebsd.org>
In-Reply-To: <4C98BFCE.2020202@freebsd.org>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs very poor performance compared to ufs due to lack of cache?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 18:10:50 -0000


[ping]

on 21/09/2010 17:23 Andriy Gapon said the following:
> on 21/09/2010 16:53 Steven Hartland said the following:
>> That's what I thought you were saying. Is there a test you would suggest to confirm
>> either way more accurately?
> 
> Perhaps you can try the test scenario that you described and monitor parameters
> suggested by Wiktor in this thread.
> 
> That is, have two large files and set arc max size such that one of them can fit
> in ARC readily, but two of them won't fit by a large margin.  Make sure that
> remaining RAM is large enough to hold both files in page cache.
> 
> 1. sendfile one file, then the other
> 2. record kstat.zfs.misc.arcstats values
> 3. sendfile the first file again
> 4. record kstat.zfs.misc.arcstats values
> 
> If the first file's data was re-used from page cache, then you won't see much
> change in kstat.zfs.misc.arcstats.  If it had to be taken from ARC or from disk,
> then either ARC hits or ARC misses will grow noticeably.
> 
> Make sure to not have any parallel activity that could affect kstat.zfs.misc.arcstats.
> 
> I think kstat.zfs.misc.arcstats.hits and kstat.zfs.misc.arcstats.misses are two
> primary indicators in this test.
> 
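
A minimal sketch of steps 2 and 4 in C, in case it helps: it snapshots the
two counters and computes the difference between two snapshots. The names
arc_snapshot, arc_read and arc_delta are made up for illustration, the
sysctlbyname() part only builds on FreeBSD, and the counters are assumed to
be 64-bit and to only ever increase.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One snapshot of the two counters used as indicators in the test. */
struct arc_snapshot {
    uint64_t hits;   /* kstat.zfs.misc.arcstats.hits */
    uint64_t misses; /* kstat.zfs.misc.arcstats.misses */
};

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>

/* Read both counters via sysctl; returns 0 on success, -1 on error. */
int
arc_read(struct arc_snapshot *s)
{
    size_t len;

    len = sizeof(s->hits);
    if (sysctlbyname("kstat.zfs.misc.arcstats.hits",
        &s->hits, &len, NULL, 0) != 0)
        return (-1);
    len = sizeof(s->misses);
    if (sysctlbyname("kstat.zfs.misc.arcstats.misses",
        &s->misses, &len, NULL, 0) != 0)
        return (-1);
    return (0);
}
#endif

/* Growth of each counter between two snapshots (after - before). */
struct arc_snapshot
arc_delta(struct arc_snapshot before, struct arc_snapshot after)
{
    struct arc_snapshot d;

    /* Assumption: the counters never decrease between snapshots. */
    assert(after.hits >= before.hits);
    assert(after.misses >= before.misses);
    d.hits = after.hits - before.hits;
    d.misses = after.misses - before.misses;
    return (d);
}
```

Call arc_read() before and after step 3 and pass both snapshots to
arc_delta(). If both deltas stay small, the data most likely came from page
cache; a large jump in either one points at ARC or disk.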


-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 18:44:28 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 090A310656AA;
	Wed, 29 Sep 2010 18:44:28 +0000 (UTC)
	(envelope-from prvs=1888902647=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
	by mx1.freebsd.org (Postfix) with ESMTP id 5E1388FC0A;
	Wed, 29 Sep 2010 18:44:27 +0000 (UTC)
X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 29 Sep 2010 19:33:27 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Wed, 29 Sep 2010 19:33:27 +0100
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
	shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 by mail1.multiplay.co.uk (MDaemon PRO v10.0.4)
	with ESMTP id md50011327826.msg; Wed, 29 Sep 2010 19:33:26 +0100
X-Authenticated-Sender: Killing@multiplay.co.uk
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=1888902647=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <2A903E7281D340BBA77338F714AACB84@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Andriy Gapon" <avg@freebsd.org>
References: <5DB6E7C798E44D33A05673F4B773405E@multiplay.co.uk><4C8D087B.5040404@freebsd.org><03537796FAB54E02959E2D64FC83004F@multiplay.co.uk><4C8D280F.3040803@freebsd.org><3FBF66BF11AA4CBBA6124CA435A4A31B@multiplay.co.uk><4C8E4212.30000@freebsd.org>
	<D79F15FEB5794315BD8668E40B414BF0@multiplay.co.uk>
	<4C90B4C8.90203@freebsd.org>
	<6DFACB27CA8A4A22898BC81E55C4FD36@multiplay.co.uk>
	<4C90D3A1.7030008@freebsd.org>
	<0B1A90A08DFE4ADA9540F9F3846FDF38@multiplay.co.uk>
	<4C90EDB8.3040709@freebsd.org>
	<3F29E8CED7B24805B2D93F62A4EC9559@multiplay.co.uk>
	<4C9126FB.2020707@freebsd.org>
	<1E0B9C1145784776A773B99FC1139CD5@multiplay.co.uk>
	<4C987F90.6000006@freebsd.org> <4C98803F.7000901@freebsd.org>
	<879BF5981D1B4C7290BDF18286BA1EEC@multiplay.co.uk> <4C989201.2
	0506@freebsd.org>
	<A77828512281413B8B38EF02732D081C@multiplay.co.uk>
	<4C98A2BA.1080004@freebsd.org>
	<EAA0054303614AB2A8CB8863BD54F68C@multiplay.co.uk>
	<4C98BFCE.2020202@freebsd.org> <4CA38124.60902@freebsd.org>
Date: Wed, 29 Sep 2010 19:33:32 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="utf-8"; reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5994
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs very poor performance compared to ufs due to lack of cache?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 18:44:28 -0000

----- Original Message ----- 
From: "Andriy Gapon" <avg@freebsd.org>
To: "Steven Hartland" <killing@multiplay.co.uk>
Cc: <freebsd-fs@freebsd.org>
Sent: Wednesday, September 29, 2010 7:10 PM
Subject: Re: zfs very poor performance compared to ufs due to lack of cache?


> 
> [ping]

Sorry Andriy, I haven't had a chance to pick this back up yet; day-to-day work is getting in the way.

Don't worry, it's not forgotten ;-)

    Regards
    Steve



From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 19:13:55 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BC926106564A
	for <fs@freebsd.org>; Wed, 29 Sep 2010 19:13:55 +0000 (UTC)
	(envelope-from crossd@cs.rpi.edu)
Received: from newman.cs.rpi.edu (newman.cs.rpi.edu [128.113.126.12])
	by mx1.freebsd.org (Postfix) with ESMTP id 6D36D8FC18
	for <fs@freebsd.org>; Wed, 29 Sep 2010 19:13:54 +0000 (UTC)
X-Hash: SCt|9a86925a1455fd7f633a0196eb4c71511e3200b9|0a17d6733c62c0f80357da99bdbe33ee
X-Countries: United States
X-SMTP-From: accepted <crossd@cs.rpi.edu> monica.cs.rpi.edu [128.213.56.13]
	(monica.cs.rpi.edu) {United States}
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=cs.rpi.edu; h=date
	:from:to:subject:message-id:mime-version:content-type; s=
	default; i=128.213.56.13@cs.rpi.edu; t=1285786090; x=1286390890;
	l=1709; bh=IYFtVGeVOpLTMFPRtT2ctEv3dRE=; b=BZD1T3aQGYhUUz1J0uK1
	vCBwnCmZYPrY8wvb5npEct+/6wiwrbfapQLTT8sArYq/NYKILJ00I0vpV714qInn
	oeTc1b7c0HdsarwWAZDGosa3RvxkF7fDJvr7eg2z5F0uWIDCFuS0tWEkUnhkK0Ma
	LKunmEhZ38Hvs6pleYGdEqU=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=cs.rpi.edu; h=date:from:to
	:subject:message-id:mime-version:content-type; q=dns; s=default; b=
	Q/OxuGh0WcE0NRIz7VMJf4ZwrTOZNcW5eCQp8wYHKewMgeTUs1LWzYAd/65wpG8t
	i8c73rwh3QEmDOEc+KsMFWrqBmGqNBkWL2BPL0coSNa/uh+BQHAx0BN6+Kck6Be1
	9EdiwxPEu2Dw+jX1JTxMPggZVDqhape//Om3aHiqMn4=
X-Spam-Report: Spam Report from newman.cs.rpi.edu (SA:3.2.5):
	-1.8 ALL_TRUSTED Passed through trusted hosts only via SMTP
	-0.9 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score:
	0.0000] 0.0 AWL AWL: From: address is in the auto white-list
X-Spam-Info: -2.7, local; ALL_TRUSTED,AWL,BAYES_00
X-Spam-Scanned-By: newman.cs.rpi.edu using SpamAssassin 3.2.5
X-Virus-Scanned-By: newman.cs.rpi.edu
Received: from monica.cs.rpi.edu (root@monica.cs.rpi.edu [128.213.56.13])
	by newman.cs.rpi.edu (8.14.3/8.14.3) with ESMTP id o8TIm83O053818
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <fs@freebsd.org>; Wed, 29 Sep 2010 14:48:08 -0400 (EDT)
	(envelope-from crossd@cs.rpi.edu)
Received: from monica.cs.rpi.edu (crossd@localhost [127.0.0.1])
	by monica.cs.rpi.edu (8.14.3/8.12.6) with ESMTP id o8TIm8xG008130
	for <fs@freebsd.org>; Wed, 29 Sep 2010 14:48:08 -0400 (EDT)
	(envelope-from crossd@monica.cs.rpi.edu)
Received: (from crossd@localhost)
	by monica.cs.rpi.edu (8.14.3/8.12.6/Submit) id o8TIlwVn008122;
	Wed, 29 Sep 2010 14:47:58 -0400 (EDT) (envelope-from crossd)
Date: Wed, 29 Sep 2010 14:47:58 -0400 (EDT)
From: "David E. Cross" <crossd@cs.rpi.edu>
To: fs@freebsd.org
Message-ID: <20100929144527.Q7702@monica.cs.rpi.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Scanned-By: MIMEDefang 2.67 on 128.113.126.12
Cc: 
Subject: Unnecessary reads on write load
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 19:13:55 -0000

(redirected from hackers, this is on both 8.0-RELEASE with GENERIC and 
8.1-RELEASE with GENERIC on amd64)

Tracking down a performance issue where the system apparently needlessly 
reads on a 100% write load... consider the following C test case:  (more 
after the code)

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
     char dir1[4], dir2[4], filename[15], pathname[1024];
     /* large enough for the biggest write: 130943 + 128 = 131071 bytes */
     unsigned char buffer[131072];
     unsigned int filenum, count, filesize;
     int fd;

     if (argc < 2) {
         fprintf(stderr, "usage: %s count\n", argv[0]);
         return 1;
     }

     arc4random_buf(buffer, sizeof(buffer));

     count=atoi(argv[1]);

     for(;count>0;count--) {
         /* random 8-hex-digit name, random size in [128, 131071] */
         filenum=arc4random();
         filesize=arc4random_uniform(130944) + 128;
         sprintf(filename, "%08x", filenum);
         /* first 3 digits name the top directory, next 3 the subdirectory */
         strncpy(dir1,filename,3);
         strncpy(dir2,filename+3, 3);
         dir1[3]=dir2[3]=0;
         sprintf(pathname, "%s/%s/%s", dir1, dir2, filename);
         fd=open(pathname, O_CREAT | O_WRONLY, 0644);
         if (fd < 0) {
             /* directories missing: create them, then retry the open */
             sprintf(pathname, "%s/%s", dir1, dir2);
             if (mkdir(pathname, 0755) < 0) {
                 mkdir(dir1, 0755);
                 mkdir(pathname, 0755);
             }
             sprintf(pathname, "%s/%s/%s", dir1, dir2, filename);
             fd=open(pathname, O_CREAT | O_WRONLY, 0644);
         }
         if (fd <0)
             continue;
         write(fd,buffer, filesize);
         close(fd);
     }
     return 0;
}

In running that in an empty directory (it takes one argument, the number of 
files to create), I see that it spends most of its time in BIORD?!  If I 
have a debugging kernel I can see that it's all in NAMI cache misses, and 
doing block fetches... but "why?"  The only directories that exist are ones 
that it just created, and should therefore be in the cache, right?  Any 
ideas?  Give it a try yourself and tell me what you get.

Thank you.
-- 
David E. Cross

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 19:16:04 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 412BC10656E8
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 19:16:04 +0000 (UTC)
	(envelope-from torbjoern@gmail.com)
Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com
	[209.85.214.54])
	by mx1.freebsd.org (Postfix) with ESMTP id BD5DD8FC1E
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 19:16:03 +0000 (UTC)
Received: by bwz15 with SMTP id 15so1072774bwz.13
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 12:16:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:date:message-id
	:subject:from:to:content-type;
	bh=XUH5HffCLJhDQyb5xz7dTxdRBBpyL3nKpOelLEq5dIA=;
	b=P0BtylrISKyE1al/fR6xvkpdAPc2dYJaM5TIIL7Y2ZiaL3yGb/2XeA2YfyK3Ncmc5p
	g9Q2kdYDGr1pHnbM02HmxDuWX6LzygQ+zQ23Av3hVWUG9jDo2BGl9dbJDMPIPz8KXH4d
	hdjucdxQ2l1dyKkL4ZNdvZKr9u89w4eUJHfEc=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:date:message-id:subject:from:to:content-type;
	b=v/fKO7Rpv8ITm+SZGhzCEbt/X5NKX20KxHFY1ZYs+O36o/XViXi95fWfnBRR+72g36
	CMZzOD/5nQXM5ZqSfaWShQOMyO+2ElDlUTmXi7hdIA0GZZWej1MlimWJY6macyohHf3m
	iitFQewETX1BJlJGmR0edKvNn7RyK+uOjZMkE=
MIME-Version: 1.0
Received: by 10.204.65.145 with SMTP id j17mr1451833bki.209.1285785998249;
	Wed, 29 Sep 2010 11:46:38 -0700 (PDT)
Received: by 10.204.71.138 with HTTP; Wed, 29 Sep 2010 11:46:38 -0700 (PDT)
Date: Wed, 29 Sep 2010 20:46:38 +0200
Message-ID: <AANLkTimRmBi=th1oia5ZuKcEtLR+YjK04KNYeZhu931A@mail.gmail.com>
From: Torbjorn Kristoffersen <torbjoern@gmail.com>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: Strange ZFS problem,
	filesystem claims to be full when clearly not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 19:16:04 -0000

I have a ZFS "tank" called tpool, the server runs a couple of jails (each
with a zfs filesystem).  There is a problem with one of these filesystems.
First, its disk usage as shown in ``df -h'':
...
tpool/rb.org      100G     95G    4.6G    95%    /jails/rb.org
...

The command ``zfs list'' shows the same:
..
tpool/rb.org    95.4G  4.56G  95.4G  /jails/rb.org
..

However, there is a very mysterious problem somewhere.
Something inside this jail is eating disk space, but we can't find any
directory that is actually taking the disk space. We first suspected either
fetchmail or spamassassin of causing a lot of space to be used, since some
of their directories were huge. (These were later deleted, which is why
you see that 4.6GB is now available; before that, 0GB was available.)

However, we can't find *any trace* of an actual directory or file that is
taking all the space.

Take this for instance:

outsidejail# du -sh rb.org
 43G    rb.org

How can this be?  df and zfs are showing that the entire drive is nearly
full, yet I can't find any directory that is actually taking all this space.
 I've carefully looked through every single directory within the jail trying
to find something that's taking all that space, but to no avail.

----
My system stats:
# uname -a
FreeBSD grim 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 UTC
2010     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
# zpool get version tpool
NAME   PROPERTY  VALUE    SOURCE
tpool  version   14       default
# zpool status
  pool: tpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tpool       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad4s1d  ONLINE       0     0     0
            ad6s1d  ONLINE       0     0     0

errors: No known data errors

[ Note that I've also done a scrub recently ]
----

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 19:20:49 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AD979106566C;
	Wed, 29 Sep 2010 19:20:49 +0000 (UTC)
	(envelope-from jamesbrandongooch@gmail.com)
Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com
	[74.125.82.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 160AA8FC2A;
	Wed, 29 Sep 2010 19:20:48 +0000 (UTC)
Received: by wyb32 with SMTP id 32so76104wyb.13
	for <multiple recipients>; Wed, 29 Sep 2010 12:20:48 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=0vw3NwTXKwU/zjRZjXzMAHLVGVoBt5XY0IUfjigoMdc=;
	b=sjddUp9XeYJNKF1LPjQXwR4I2T76eeG9vKd54quNTZYw4EuZyXJ+fKSwzH8YBXpFX2
	DNw4WbJqOg/k1Ys7R7xLWGhXMZqte1mKZwIK4H0vZarsExnJrPUJKrVSbH8bzCM8mcjD
	j9wXs8u6qhv4qy/fIpB2S/ZVBpk9R3hJE6Wzw=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	b=JBpzGXlDBt3wPzkYa5lFkb+R3OPgDWlcpom97pgP12yHo+qC9Uo452jlM2fNEPVwPZ
	lNdOQApf2no4rwUcHwzCPT32mjYMMWR//ysd4r+DSfD7ESn9w4WQ8yxmJ2WgVHNW1irz
	fOZVLyVDj6nQn1Xa6IR8mXZpGYhcJZlcJq6CQ=
MIME-Version: 1.0
Received: by 10.216.156.21 with SMTP id l21mr2888856wek.83.1285786216920; Wed,
	29 Sep 2010 11:50:16 -0700 (PDT)
Received: by 10.216.133.133 with HTTP; Wed, 29 Sep 2010 11:50:16 -0700 (PDT)
In-Reply-To: <201009290917.05269.jhb@freebsd.org>
References: <20100929031825.L683@besplex.bde.org>
	<20100929084801.M948@besplex.bde.org>
	<20100929041650.GA1553@aditya> <201009290917.05269.jhb@freebsd.org>
Date: Wed, 29 Sep 2010 13:50:16 -0500
Message-ID: <AANLkTim_0mPVZeP1b2z38apvskaLeKnjCvVwH0BK9dAN@mail.gmail.com>
From: Brandon Gooch <jamesbrandongooch@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org
Subject: Re: ext2fs now extremely slow
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 19:20:49 -0000

On Wed, Sep 29, 2010 at 8:17 AM, John Baldwin <jhb@freebsd.org> wrote:
> On Wednesday, September 29, 2010 12:16:55 am Aditya Sarawgi wrote:
>> On Wed, Sep 29, 2010 at 09:14:57AM +1000, Bruce Evans wrote:
>> > On Wed, 29 Sep 2010, Bruce Evans wrote:
>> >
>> > > On Wed, 29 Sep 2010, Bruce Evans wrote:
>> > >
>> > >> For benchmarks on ext2fs:
>> > >>
>> > >> Under FreeBSD-~5.2 rerun today:
>> > >> untar:     59.17 real
>> > >> tar:       19.52 real
>> > >>
>> > >> Under -current run today:
>> > >> untar:    101.16 real
>> > >> tar:      172.03 real
>> > >>
>> > >> So, -current is 8.8 times slower for tar, but only 1.7 times slower for
>> > >> untar.
>> > >> ...
>> > >> So it seems that only 1 block in every 8 is used, and there is a seek
>> > >> after every block.  This asks for an 8-fold reduction in throughput,
>> > >> and it seems to have got that and a bit more for reading although not
>> > >> for writing.  Even (or especially) with perfect hardware, it must give
>> > >> an 8-fold reduction.  And it is likely to give more, since it defeats
>> > >> vfs clustering by making all runs of contiguous blocks have length 1.
>> > >>
>> > >> Simple sequential allocation should be used unless the allocation policy
>> > >> and implementation are very good.
>> > >
>> > > This work a bit better after zapping the 8-fold way:
>> >    Things
>> > > ...
>> > > This gives an improvement of:
>> > >
>> > > untar:    101.16 real -> 63.46
>> > > tar:      172.03 real -> 50.70
>> > >
>> > > Now -current is only 1.1 times slower for untar and 2.6 times slower for
>> > > tar.
>> > >
>> > > There must be a problem with bpref for things to have been so bad.  There
>> > > is some point to leaving a gap of 7 blocks for expansion, but the gap was
>> > > left even between blocks in a single file.
>> > > ...
>> > > I haven't tried the bde_blkpref hack in the above.  It should kill bpref
>> > > completely so that there is no jump between lbn0 and lbn1, and break
>> > > cylinder group based allocation even better.  Setting bde_blkpref to 1
>> > > restores the bug that was present in ext2fs in FreeBSD between 1995 and
>> > > 2010.  This bug gave sequential allocation starting at the beginning of
>> > > the disk in almost all cases, so map searches were slow and early groups
>> > > filled up before later groups were used at all.
>> >
>> > Tried this (patch repeated below), and it gave essentially the same
>> > speed as old versions.
>> >
>> > The main problem seems to be that the `goal' variables aren't initialized.
>> > After restoring bits verbatim from an old version, things seem to work as
>> > expected:
>> >
>> > % Index: ext2_alloc.c
>> > % ===================================================================
>> > % RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v
>> > % retrieving revision 1.2
>> > % diff -u -2 -r1.2 ext2_alloc.c
>> > % --- ext2_alloc.c  1 Sep 2010 05:34:17 -0000       1.2
>> > % +++ ext2_alloc.c  28 Sep 2010 21:08:42 -0000
>> > % @@ -1,2 +1,5 @@
>> > % +int bde_blkpref = 0;
>> > % +int bde_alloc8 = 0;
>> > % +
>> > %  /*-
>> > %   *  modified for Lites 1.1
>> > % @@ -117,4 +120,8 @@
>> > %                                                   ext2_alloccg);
>> > %          if (bno > 0) {
>> > % +         /* set next_alloc fields as done in block_getblk */
>> > % +         ip->i_next_alloc_block = lbn;
>> > % +         ip->i_next_alloc_goal = bno;
>> > % +
>> > %                  ip->i_blocks += btodb(fs->e2fs_bsize);
>> > %                  ip->i_flag |= IN_CHANGE | IN_UPDATE;
>> >
>> > The only things that changed recently in this block were the 4 deleted
>> > lines and 4 lines with tabs corrupted to spaces.  Perhaps an editing
>> > error.
>> >
>> > % @@ -542,6 +549,12 @@
>> > %      then set the goal to what we thought it should be
>> > %   */
>> > % +if (bde_blkpref == 0) {
>> > %   if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0)
>> > %           return ip->i_next_alloc_goal;
>> > % +} else if (bde_blkpref == 1) {
>> > % + if(ip->i_next_alloc_block == lbn)
>> > % +         return ip->i_next_alloc_goal;
>> > % +} else
>> > % + return 0;
>> > %
>> > %   /* now check whether we were provided with an array that basically
>> >
>> > Not needed now.
>> >
>> > % @@ -662,4 +675,5 @@
>> > %    * block.
>> > %    */
>> > % +if (bde_alloc8 == 0) {
>> > %   if (bpref)
>> > %           start = dtogd(fs, bpref) / NBBY;
>> > % @@ -679,4 +693,5 @@
>> > %           }
>> > %   }
>> > % +}
>> > %
>> > %   bno = ext2_mapsearch(fs, bbp, bpref);
>> >
>> > The code to skip to the next 8-block boundary should be removed permanently.
>> > After fixing the initialization, it doesn't generate holes inside files but
>> > it still generates holes between files.  The holes are quite large with
>> > 4K-blocks.
>> >
>> > Benchmark results with just the initialization of `goal' variables restored:
>> >
>> > %%%
>> > ext2fs-1024-1024:
>> > tarcp /f srcs:                 78.79 real         0.31 user         4.94 sys
>> > tar cf /dev/zero srcs:         24.62 real         0.19 user         1.82 sys
>> > ext2fs-1024-1024-as:
>> > tarcp /f srcs:                 52.07 real         0.26 user         4.95 sys
>> > tar cf /dev/zero srcs:         24.80 real         0.10 user         1.93 sys
>> > ext2fs-4096-4096:
>> > tarcp /f srcs:                 74.14 real         0.34 user         3.96 sys
>> > tar cf /dev/zero srcs:         33.82 real         0.10 user         1.19 sys
>> > ext2fs-4096-4096-as:
>> > tarcp /f srcs:                 53.54 real         0.36 user         3.87 sys
>> > tar cf /dev/zero srcs:         33.91 real         0.14 user         1.15 sys
>> > %%%
>> >
>> > The much larger holes between the files are apparently responsible for the
>> > decreased speed with 4K-blocks.  1K-blocks are really too small, so 4K-blocks
>> > should be faster.
>> >
>> > Benchmark results with the fix and bde_alloc8 = 1.
>> >
>> > ext2fs-1024-1024:
>> > tarcp /f srcs:                 71.60 real         0.15 user         2.04 sys
>> > tar cf /dev/zero srcs:         22.34 real         0.05 user         0.79 sys
>> > ext2fs-1024-1024-as:
>> > tarcp /f srcs:                 46.03 real         0.14 user         2.02 sys
>> > tar cf /dev/zero srcs:         21.97 real         0.05 user         0.80 sys
>> > ext2fs-4096-4096:
>> > tarcp /f srcs:                 59.66 real         0.13 user         1.63 sys
>> > tar cf /dev/zero srcs:         19.88 real         0.07 user         0.46 sys
>> > ext2fs-4096-4096-as:
>> > tarcp /f srcs:                 37.30 real         0.12 user         1.60 sys
>> > tar cf /dev/zero srcs:         19.93 real         0.05 user         0.49 sys
>> >
>> > Bruce
>>
>> Hi,
>>
>> I see what you are saying. The gap of 8 block between the files
>> is due to the old preallocation which used to allocate additional
>> 8 blocks in advance for a particular inode when allocating a block
>> for it. The gap between blocks of the same file shouldn't be there
>> too. Both of these cases should be removed. I will look into this
>> during this week. The slowness is also due to lack of preallocation
>> in the new code.
>
> One of the GSoC students worked on a patch to add preallocation back to
> ext2fs this summer.  Would you be interested in reviewing and/or testing
> that patch?  (I've attached it).  Here is his original e-mail:
>
> <quote>
> Hi all,
>
> There is a patch in attachment which implements a preallocation
> algorithm in ext2fs. I implement this algorithm during FreeBSD SoC 2010.
>
> This patch implements the in-memory ext2/3 block preallocation algorithm
> from reservation window. It uses a RB-tree to index block allocation
> request and reserve a number of blocks for each file which has requested
> to allocate a block. When a file request to allocate a block, it will
> find a block to allocate to this file. When it find the block to
> allocate, it will try to allocate a block, which is in the same cylinder
> group with inode and is not in other reservation window in RB-tree.
> Meanwhile there are some contiguous free blocks after this block. It
> uses a data structure to store this block's position and the length of
> contiguous free blocks. Then it inserts this data structure into
> RB-tree. When this file request to allocate a block again, It will find
> corresponding data structure in RB-tree. If it can find, the next free
> block will be allocated to this file directly. Otherwise, it will search
> a new block again.
>
> I have run some benchmarks to test this algorithm. Please review it in
> wiki page (' http://wiki.freebsd.org/SOC2010ZhengLiu'). The performance
> is better when the number of threads is smaller than 4. When the number
> of threads is greater than 4, the performance can be increased a little.
>
> Please test it.
>
>
> Thanks and best regards,
>
> lz
> </quote>

Wow, this is really awesome! What are the chances of this code being
committed before a 9.0 release (assuming we have enough user testing)?

-Brandon

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 19:25:36 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AFE89106566B
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 19:25:36 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta06.emeryville.ca.mail.comcast.net
	(qmta06.emeryville.ca.mail.comcast.net [76.96.30.56])
	by mx1.freebsd.org (Postfix) with ESMTP id 959D78FC1A
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 19:25:36 +0000 (UTC)
Received: from omta11.emeryville.ca.mail.comcast.net ([76.96.30.36])
	by qmta06.emeryville.ca.mail.comcast.net with comcast
	id CcTG1f0070mlR8UA6jRcVP; Wed, 29 Sep 2010 19:25:36 +0000
Received: from koitsu.dyndns.org ([98.248.41.155])
	by omta11.emeryville.ca.mail.comcast.net with comcast
	id CjRa1f00T3LrwQ28XjRbow; Wed, 29 Sep 2010 19:25:35 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id C6B8F9B418; Wed, 29 Sep 2010 12:25:34 -0700 (PDT)
Date: Wed, 29 Sep 2010 12:25:34 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Torbjorn Kristoffersen <torbjoern@gmail.com>
Message-ID: <20100929192534.GA97031@icarus.home.lan>
References: <AANLkTimRmBi=th1oia5ZuKcEtLR+YjK04KNYeZhu931A@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <AANLkTimRmBi=th1oia5ZuKcEtLR+YjK04KNYeZhu931A@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly
 not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 19:25:36 -0000

On Wed, Sep 29, 2010 at 08:46:38PM +0200, Torbjorn Kristoffersen wrote:
> I have a ZFS "tank" called tpool, the server runs a couple of jails (each
> with a zfs filesystem).  There is a problem with one of these filesystems.
> First, its disk usage as shown in ``df -h'':
> ...
> tpool/rb.org      100G     95G    4.6G    95%    /jails/rb.org
> ...
> 
> The command ``zfs list'' shows the same:
> ..
> tpool/rb.org    95.4G  4.56G  95.4G  /jails/rb.org
> ..
> 
> However, there is a very mysterious problem somewhere.
> Something inside this jail is eating diskspace, but we can't find any
> directories that are actually taking the diskspace. We first suspected either
> fetchmail or spamassassin of causing a lot of space to be used, since some
> of their directories were huge. (These were later deleted, which is why
> you see that 4.6GB is now available; before that 0GB was available).
> 
> However, we can't find *any trace* of an actual directory or file that is
> taking all the space.
> 
> Take this for instance:
> 
> outsidejail# du -sh rb.org
>  43G    rb.org
> 
> How can this be?  df and zfs are showing that the entire drive is nearly
> full, yet I can't find any directory that is actually taking all this space.
>  I've carefully looked through every single directory within the jail trying
> to find something that's taking all that space, but to no avail.
> 
> ----
> My system stats:
> # uname -a
> FreeBSD grim 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 UTC
> 2010     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
> # zpool get version tpool
> NAME   PROPERTY  VALUE    SOURCE
> tpool  version   14       default
> # zpool status
>   pool: tpool
>  state: ONLINE
>  scrub: none requested
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         tpool       ONLINE       0     0     0
>           mirror    ONLINE       0     0     0
>             ad4s1d  ONLINE       0     0     0
>             ad6s1d  ONLINE       0     0     0
> 
> errors: No known data errors
> 
> [ Note that I've also done a scrub recently ]

1) Have you checked using fstat to ensure that no file descriptors
remain open on any of your ZFS filesystems (not pools)?

2) Are you using compression on any of your ZFS filesystems?

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 19:30:25 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 27CD01065670
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 19:30:25 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id E9B368FC15
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 19:30:24 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 9782D46B8B;
	Wed, 29 Sep 2010 15:30:24 -0400 (EDT)
Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6A22B8A03C;
	Wed, 29 Sep 2010 15:30:23 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Brandon Gooch <jamesbrandongooch@gmail.com>
Date: Wed, 29 Sep 2010 15:30:13 -0400
User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; )
References: <20100929031825.L683@besplex.bde.org>
	<201009290917.05269.jhb@freebsd.org>
	<AANLkTim_0mPVZeP1b2z38apvskaLeKnjCvVwH0BK9dAN@mail.gmail.com>
In-Reply-To: <AANLkTim_0mPVZeP1b2z38apvskaLeKnjCvVwH0BK9dAN@mail.gmail.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201009291530.13434.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1
	(bigwig.baldwin.cx); Wed, 29 Sep 2010 15:30:23 -0400 (EDT)
X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham
	version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx
Cc: freebsd-fs@freebsd.org
Subject: Re: ext2fs now extremely slow
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 19:30:25 -0000

On Wednesday, September 29, 2010 2:50:16 pm Brandon Gooch wrote:
> On Wed, Sep 29, 2010 at 8:17 AM, John Baldwin <jhb@freebsd.org> wrote:
> > On Wednesday, September 29, 2010 12:16:55 am Aditya Sarawgi wrote:
> >> On Wed, Sep 29, 2010 at 09:14:57AM +1000, Bruce Evans wrote:
> >> > On Wed, 29 Sep 2010, Bruce Evans wrote:
> >> >
> >> > > On Wed, 29 Sep 2010, Bruce Evans wrote:
> >> > >
> >> > >> For benchmarks on ext2fs:
> >> > >>
> >> > >> Under FreeBSD-~5.2 rerun today:
> >> > >> untar:     59.17 real
> >> > >> tar:       19.52 real
> >> > >>
> >> > >> Under -current run today:
> >> > >> untar:    101.16 real
> >> > >> tar:      172.03 real
> >> > >>
> >> > >> So, -current is 8.8 times slower for tar, but only 1.7 times slower for
> >> > >> untar.
> >> > >> ...
> >> > >> So it seems that only 1 block in every 8 is used, and there is a seek
> >> > >> after every block.  This asks for an 8-fold reduction in throughput,
> >> > >> and it seems to have got that and a bit more for reading although not
> >> > >> for writing.  Even (or especially) with perfect hardware, it must give
> >> > >> an 8-fold reduction.  And it is likely to give more, since it defeats
> >> > >> vfs clustering by making all runs of contiguous blocks have length 1.
> >> > >>
> >> > >> Simple sequential allocation should be used unless the allocation policy
> >> > >> and implementation are very good.
> >> > >
> >> > > This work a bit better after zapping the 8-fold way:
> >> >    Things
> >> > > ...
> >> > > This gives an improvement of:
> >> > >
> >> > > untar:    101.16 real -> 63.46
> >> > > tar:      172.03 real -> 50.70
> >> > >
> >> > > Now -current is only 1.1 times slower for untar and 2.6 times slower for
> >> > > tar.
> >> > >
> >> > > There must be a problem with bpref for things to have been so bad.  There
> >> > > is some point to leaving a gap of 7 blocks for expansion, but the gap was
> >> > > left even between blocks in a single file.
> >> > > ...
> >> > > I haven't tried the bde_blkpref hack in the above.  It should kill bpref
> >> > > completely so that there is no jump between lbn0 and lbn1, and break
> >> > > cylinder group based allocation even better.  Setting bde_blkpref to 1
> >> > > restores the bug that was present in ext2fs in FreeBSD between 1995 and
> >> > > 2010.  This bug gave sequential allocation starting at the beginning of
> >> > > the disk in almost all cases, so map searches were slow and early groups
> >> > > filled up before later groups were used at all.
> >> >
> >> > Tried this (patch repeated below), and it gave essentially the same
> >> > speed as old versions.
> >> >
> >> > The main problem seems to be that the `goal' variables aren't initialized.
> >> > After restoring bits verbatim from an old version, things seem to work as
> >> > expected:
> >> >
> >> > % Index: ext2_alloc.c
> >> > % ===================================================================
> >> > % RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v
> >> > % retrieving revision 1.2
> >> > % diff -u -2 -r1.2 ext2_alloc.c
> >> > % --- ext2_alloc.c  1 Sep 2010 05:34:17 -0000       1.2
> >> > % +++ ext2_alloc.c  28 Sep 2010 21:08:42 -0000
> >> > % @@ -1,2 +1,5 @@
> >> > % +int bde_blkpref = 0;
> >> > % +int bde_alloc8 = 0;
> >> > % +
> >> > %  /*-
> >> > %   *  modified for Lites 1.1
> >> > % @@ -117,4 +120,8 @@
> >> > %                                                   ext2_alloccg);
> >> > %          if (bno > 0) {
> >> > % +         /* set next_alloc fields as done in block_getblk */
> >> > % +         ip->i_next_alloc_block = lbn;
> >> > % +         ip->i_next_alloc_goal = bno;
> >> > % +
> >> > %                  ip->i_blocks += btodb(fs->e2fs_bsize);
> >> > %                  ip->i_flag |= IN_CHANGE | IN_UPDATE;
> >> >
> >> > The only things that changed recently in this block were the 4 deleted
> >> > lines and 4 lines with tabs corrupted to spaces.  Perhaps an editing
> >> > error.
> >> >
> >> > % @@ -542,6 +549,12 @@
> >> > %      then set the goal to what we thought it should be
> >> > %   */
> >> > % +if (bde_blkpref == 0) {
> >> > %   if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0)
> >> > %           return ip->i_next_alloc_goal;
> >> > % +} else if (bde_blkpref == 1) {
> >> > % + if(ip->i_next_alloc_block == lbn)
> >> > % +         return ip->i_next_alloc_goal;
> >> > % +} else
> >> > % + return 0;
> >> > %
> >> > %   /* now check whether we were provided with an array that basically
> >> >
> >> > Not needed now.
> >> >
> >> > % @@ -662,4 +675,5 @@
> >> > %    * block.
> >> > %    */
> >> > % +if (bde_alloc8 == 0) {
> >> > %   if (bpref)
> >> > %           start = dtogd(fs, bpref) / NBBY;
> >> > % @@ -679,4 +693,5 @@
> >> > %           }
> >> > %   }
> >> > % +}
> >> > %
> >> > %   bno = ext2_mapsearch(fs, bbp, bpref);
> >> >
> >> > The code to skip to the next 8-block boundary should be removed permanently.
> >> > After fixing the initialization, it doesn't generate holes inside files but
> >> > it still generates holes between files.  The holes are quite large with
> >> > 4K-blocks.
> >> >
> >> > Benchmark results with just the initialization of `goal' variables restored:
> >> >
> >> > %%%
> >> > ext2fs-1024-1024:
> >> > tarcp /f srcs:                 78.79 real         0.31 user         4.94 sys
> >> > tar cf /dev/zero srcs:         24.62 real         0.19 user         1.82 sys
> >> > ext2fs-1024-1024-as:
> >> > tarcp /f srcs:                 52.07 real         0.26 user         4.95 sys
> >> > tar cf /dev/zero srcs:         24.80 real         0.10 user         1.93 sys
> >> > ext2fs-4096-4096:
> >> > tarcp /f srcs:                 74.14 real         0.34 user         3.96 sys
> >> > tar cf /dev/zero srcs:         33.82 real         0.10 user         1.19 sys
> >> > ext2fs-4096-4096-as:
> >> > tarcp /f srcs:                 53.54 real         0.36 user         3.87 sys
> >> > tar cf /dev/zero srcs:         33.91 real         0.14 user         1.15 sys
> >> > %%%
> >> >
> >> > The much larger holes between the files are apparently responsible for the
> >> > decreased speed with 4K-blocks.  1K-blocks are really too small, so 4K-blocks
> >> > should be faster.
> >> >
> >> > Benchmark results with the fix and bde_alloc8 = 1.
> >> >
> >> > ext2fs-1024-1024:
> >> > tarcp /f srcs:                 71.60 real         0.15 user         2.04 sys
> >> > tar cf /dev/zero srcs:         22.34 real         0.05 user         0.79 sys
> >> > ext2fs-1024-1024-as:
> >> > tarcp /f srcs:                 46.03 real         0.14 user         2.02 sys
> >> > tar cf /dev/zero srcs:         21.97 real         0.05 user         0.80 sys
> >> > ext2fs-4096-4096:
> >> > tarcp /f srcs:                 59.66 real         0.13 user         1.63 sys
> >> > tar cf /dev/zero srcs:         19.88 real         0.07 user         0.46 sys
> >> > ext2fs-4096-4096-as:
> >> > tarcp /f srcs:                 37.30 real         0.12 user         1.60 sys
> >> > tar cf /dev/zero srcs:         19.93 real         0.05 user         0.49 sys
> >> >
> >> > Bruce
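
Bruce's observation that only 1 block in every 8 was used, so that every
run of contiguous blocks had length 1, can be pictured with a toy model.
This is a sketch under assumptions and not the ext2_alloccg() code: it
takes an empty disk and models the 8-block skip simply as rounding each
search start up to the next multiple of 8. toy_layout() and count_runs()
are hypothetical names.

```c
#include <stddef.h>

/* Round a block number up to the next multiple of 8. */
static long
next_boundary8(long bno)
{
	return ((bno + 7) & ~7L);
}

/*
 * Lay out an n-block file on an empty toy disk.  With skip8 set, every
 * allocation starts its search at the next 8-block boundary after the
 * previous block; otherwise allocation is simply sequential.
 */
void
toy_layout(long *blocks, size_t n, int skip8)
{
	long next = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (skip8)
			next = next_boundary8(next);
		blocks[i] = next;
		next = blocks[i] + 1;
	}
}

/* Count runs of physically contiguous blocks in the layout. */
size_t
count_runs(const long *blocks, size_t n)
{
	size_t i, runs;

	if (n == 0)
		return (0);
	runs = 1;
	for (i = 1; i < n; i++)
		if (blocks[i] != blocks[i - 1] + 1)
			runs++;
	return (runs);
}
```

For a 16-block file the skipping layout ends at block 120 and consists of
16 separate runs, while the sequential layout ends at block 15 in a single
run, which is the case clustering can exploit.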
> >>
> >> Hi,
> >>
> >> I see what you are saying. The gap of 8 block between the files
> >> is due to the old preallocation which used to allocate additional
> >> 8 blocks in advance for a particular inode when allocating a block
> >> for it. The gap between blocks of the same file shouldn't be there
> >> too. Both of these cases should be removed. I will look into this
> >> during this week. The slowness is also due to lack of preallocation
> >> in the new code.
> >
> > One of the GSoC students worked on a patch to add preallocation back to
> > ext2fs this summer.  Would you be interested in reviewing and/or testing
> > that patch?  (I've attached it).  Here is his original e-mail:
> >
> > <quote>
> > Hi all,
> >
> > There is a patch in attachment which implements a preallocation
> > algorithm in ext2fs. I implement this algorithm during FreeBSD SoC 2010.
> >
> > This patch implements the in-memory ext2/3 block preallocation algorithm
> > from reservation window. It uses a RB-tree to index block allocation
> > request and reserve a number of blocks for each file which has requested
> > to allocate a block. When a file request to allocate a block, it will
> > find a block to allocate to this file. When it find the block to
> > allocate, it will try to allocate a block, which is in the same cylinder
> > group with inode and is not in other reservation window in RB-tree.
> > Meanwhile there are some contiguous free blocks after this block. It
> > uses a data structure to store this block's position and the length of
> > contiguous free blocks. Then it inserts this data structure into
> > RB-tree. When this file request to allocate a block again, It will find
> > corresponding data structure in RB-tree. If it can find, the next free
> > block will be allocated to this file directly. Otherwise, it will search
> > a new block again.
> >
> > I have run some benchmarks to test this algorithm. Please review it in
> > wiki page (' http://wiki.freebsd.org/SOC2010ZhengLiu'). The performance
> > is better when the number of threads is smaller than 4. When the number
> > of threads is greater than 4, the performance can be increased a little.
> >
> > Please test it.
> >
> >
> > Thanks and best regards,
> >
> > lz
> > </quote>
> 
> Wow, this is really awesome! What are the chances of this code being
> committed before a 9.0 release (assuming we have enough user testing)?

Good if it gets testing and review.  He also worked on read-only support for 
ext4 (in a second patch).  Both patches were posted to this list (fs@) several 
weeks ago.
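
For anyone who has not opened the patch, the reservation-window idea in
the quoted mail can be sketched roughly as below. This is not code from
the patch, only a toy under stated assumptions: it handles a single block
group, uses a small fixed array where the patch is described as using an
RB-tree, and the window size of 8 and all names (toy_group, resv_window,
toy_alloc) are made up for illustration.

```c
#include <stdbool.h>

#define NBLOCKS		64	/* blocks in the toy group (assumed) */
#define MAXFILES	4	/* files tracked by the toy (assumed) */
#define WINSIZE		8	/* blocks reserved per window (assumed) */

struct resv_window {
	int start;	/* first block of the window, -1 if none */
	int len;	/* number of blocks in the window */
	int next;	/* next block to hand out from the window */
};

struct toy_group {
	bool used[NBLOCKS];
	struct resv_window win[MAXFILES];
};

void
toy_init(struct toy_group *g)
{
	for (int b = 0; b < NBLOCKS; b++)
		g->used[b] = false;
	for (int f = 0; f < MAXFILES; f++)
		g->win[f] = (struct resv_window){ -1, 0, 0 };
}

/* True if block b lies inside a window owned by another file. */
static bool
reserved_by_other(const struct toy_group *g, int file, int b)
{
	for (int f = 0; f < MAXFILES; f++) {
		const struct resv_window *w = &g->win[f];

		if (f != file && w->start >= 0 &&
		    b >= w->start && b < w->start + w->len)
			return (true);
	}
	return (false);
}

/* Reserve the first run of free, unreserved blocks (up to WINSIZE). */
static bool
new_window(struct toy_group *g, int file)
{
	for (int b = 0; b < NBLOCKS; b++) {
		int len = 0;

		while (b + len < NBLOCKS && len < WINSIZE &&
		    !g->used[b + len] && !reserved_by_other(g, file, b + len))
			len++;
		if (len > 0) {
			g->win[file] = (struct resv_window){ b, len, b };
			return (true);
		}
	}
	return (false);
}

/* Allocate one block for file; returns the block number or -1. */
int
toy_alloc(struct toy_group *g, int file)
{
	struct resv_window *w;

	if (file < 0 || file >= MAXFILES)
		return (-1);
	w = &g->win[file];
	/* No window yet, or the current one is used up: find a new one. */
	if (w->start < 0 || w->next >= w->start + w->len)
		if (!new_window(g, file))
			return (-1);
	g->used[w->next] = true;
	return (w->next++);
}
```

With two files allocating alternately, file 0 receives blocks 0, 1, 2, ...
and file 1 receives 8, 9, 10, ..., so each file stays contiguous even
though the requests are interleaved.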

-- 
John Baldwin

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 20:52:42 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 38EA01065672;
	Wed, 29 Sep 2010 20:52:42 +0000 (UTC)
	(envelope-from sarawgi.aditya@gmail.com)
Received: from mail-pz0-f54.google.com (mail-pz0-f54.google.com
	[209.85.210.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 03CA38FC18;
	Wed, 29 Sep 2010 20:52:40 +0000 (UTC)
Received: by pzk7 with SMTP id 7so371564pzk.13
	for <multiple recipients>; Wed, 29 Sep 2010 13:52:40 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:date:from:to:cc:subject
	:message-id:references:mime-version:content-type:content-disposition
	:in-reply-to:user-agent;
	bh=P/PES2pIGV/yZ0yr5D/nGkQUodVYXjaVN7tf6/D/EvE=;
	b=ODQV8y+iFjfsczbMuHQZp4r+/4gEDxM7oTlQEWRkLv910xyjaMYwuzZ4TZF3lOPs3+
	FZyQBsoHRTve7W9Unf56P1UH47yI7n+KGamQIxbQtDx1yu3UxEYTTkcFJk+rhsiHtk4q
	dqwUzqpkh5c7GcM/z7PzPmGOryFzYQWk0XGWw=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=date:from:to:cc:subject:message-id:references:mime-version
	:content-type:content-disposition:in-reply-to:user-agent;
	b=MM1W3/+bqnduqQgcYtwATTvXzyTWbugtzMRseD/qyiJpgIXoMM+UAGe46jh5lL7Q9s
	E3JjF2ZLln1pE/6qTKJ6AoPj5tBerv1epcD7vBmTItCqb8SxymaaO1ds1bazm0f65Z9a
	/lPhbS1z38SANed2QCNaKI1EPzP3G7HH2umtU=
Received: by 10.142.191.10 with SMTP id o10mr1994505wff.16.1285791806679;
	Wed, 29 Sep 2010 13:23:26 -0700 (PDT)
Received: from aditya ([183.87.49.240])
	by mx.google.com with ESMTPS id o9sm10653265wfd.4.2010.09.29.13.23.24
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Wed, 29 Sep 2010 13:23:26 -0700 (PDT)
Date: Thu, 30 Sep 2010 01:55:29 +0530
From: Aditya Sarawgi <sarawgi.aditya@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20100929202526.GA1564@aditya>
References: <20100929031825.L683@besplex.bde.org>
	<20100929084801.M948@besplex.bde.org>
	<20100929041650.GA1553@aditya> <201009290917.05269.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201009290917.05269.jhb@freebsd.org>
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: freebsd-fs@freebsd.org
Subject: Re: ext2fs now extremely slow
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 20:52:42 -0000

[snip]
> > I see what you are saying. The gap of 8 block between the files 
> > is due to the old preallocation which used to allocate additional 
> > 8 blocks in advance for a particular inode when allocating a block
> > for it. The gap between blocks of the same file shouldn't be there 
> > too. Both of these cases should be removed. I will look into this 
> > during this week. The slowness is also due to lack of preallocation
> > in the new code.
> 
> One of the GSoC students worked on a patch to add preallocation back to
> ext2fs this summer.  Would you be interested in reviewing and/or testing
> that patch?  (I've attached it).  Here is his original e-mail:
[snip]

Hi John,

I reviewed Zheng Liu's reservation window patch last week and
suggested a few changes to him. Otherwise the code looks awesome.
But it would be great if someone else could review the patch too, and if
everything goes well, we should merge this to HEAD.
For the ext4 part, I still have to review his patches and I am planning
to do it soon. Zheng is planning to have a separate module for ext4,
and it does make sense. We are aiming at bringing ext4 to a usable state
for 9-RELEASE (at least read-only).

Cheers
Aditya Sarawgi

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 22:11:12 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5AE8D106566C
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 22:11:12 +0000 (UTC)
	(envelope-from torbjoern@gmail.com)
Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com
	[209.85.214.54])
	by mx1.freebsd.org (Postfix) with ESMTP id C49AB8FC19
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 22:11:11 +0000 (UTC)
Received: by bwz15 with SMTP id 15so1261483bwz.13
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 15:11:10 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:content-type;
	bh=ITbXkRueR7DEq0EkNFGyTjw/gfv1J1W6lhjlXTA/P0c=;
	b=txYN2aYlFiRnPRJQlvZXRVuG5HAY68CdQr+MbzIeEtgH3sMbV0+1umWMJPnmzjcOoc
	jdRn9DxDVvJZT82c9kEVRpVRtRy7rXTCE9pjDIsR0lcrAVsQ+yBeQAAhzU1DJZAMFCbq
	zCsjlWfwxCQpOQaYPvq0nZynQKd3l+poFOOko=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:content-type;
	b=TLoivsc+bGi2PM65lHWEcQRlOfxwPkNE9w7pDBl5PH1LSmqW1yJwQQZYuBO/ksQInl
	BsungBjZ8qkcbTLNSJjFzM/l0L0xr+HtV8AWdNgwf68dEnZJAlfm5xA+H1wTIjgnFYV6
	Bpr9AuMb2ZQyTJQOOsDNmid301lUbIvbEBiE0=
MIME-Version: 1.0
Received: by 10.204.118.65 with SMTP id u1mr1796491bkq.169.1285798269363; Wed,
	29 Sep 2010 15:11:09 -0700 (PDT)
Received: by 10.204.71.138 with HTTP; Wed, 29 Sep 2010 15:11:09 -0700 (PDT)
In-Reply-To: <AANLkTi=q6adZv57mwNZVivOwLsfXBjVHki7tzP6-jD0G@mail.gmail.com>
References: <AANLkTimRmBi=th1oia5ZuKcEtLR+YjK04KNYeZhu931A@mail.gmail.com>
	<20100929192534.GA97031@icarus.home.lan>
	<AANLkTi=q6adZv57mwNZVivOwLsfXBjVHki7tzP6-jD0G@mail.gmail.com>
Date: Thu, 30 Sep 2010 00:11:09 +0200
Message-ID: <AANLkTikqX1Y3qbcdr-2six+Pwv61k-Exwh142w5FFqbS@mail.gmail.com>
From: Torbjorn Kristoffersen <torbjoern@gmail.com>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly
 not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 22:11:12 -0000

I'm at a complete loss here. I shut down the jail completely, and I am
watching the jail's ZFS filesystem grow as we speak.  No process is using
it.   It only grows in "df" and "zfs list", I can't find any files that are
growing.  I have to re-set the quota to be higher and higher to accommodate
the space.

On Wed, Sep 29, 2010 at 10:46 PM, Torbjorn Kristoffersen <torbjoern@gmail.com> wrote:

> Hi Jeremy.
>
> 1) I checked now, and found nothing extraordinary. Just processes that have
> been running for a long while, such as screen, cron, sshd, bash, irssi,
> syslogd, etc.
>
> 2) No compression used on this zfs filesystem (or any of the others).
>
> I completely stopped the jail now, and removed some of the directories
> with the most data in them, but to no avail.
>
>
> On Wed, Sep 29, 2010 at 9:25 PM, Jeremy Chadwick <freebsd@jdc.parodius.com> wrote:
>
>> On Wed, Sep 29, 2010 at 08:46:38PM +0200, Torbjorn Kristoffersen wrote:
>> > I have a ZFS "tank" called tpool, the server runs a couple of jails
>> (each
>> > with a zfs filesystem).  There is a problem with one of these
>> filesystems.
>> > First, its disk usage as shown in ``df -h'':
>> > ...
>> > tpool/rb.org      100G     95G    4.6G    95%    /jails/rb.org
>> > ...
>> >
>> > The command ``zfs list'' shows the same:
>> > ..
>> > tpool/rb.org    95.4G  4.56G  95.4G  /jails/rb.org
>> > ..
>> >
>> > However, there is a very mysterious problem somewhere.
>> > Something inside this jail is eating diskspace, but we can't find any
>> > directories that are actually taking the diskspace. We first suspected
>> either
>> > fetchmail or spamassassin of causing a lot of space to be used, since
>> some
>> > of their directories were huge. (These were later deleted, and which is
>> why
>> > you see that 4.6GB is now available, before that 0GB was available).
>> >
>> > However, we can't find *any trace* of an actual directory or file that
>> is
>> > taking all the space.
>> >
>> > Take this for instance:
>> >
>> > outsidejail# du -sh rb.org
>> >  43G    rb.org
>> >
>> > How can this be?  df and zfs are showing that the entire drive is nearly
>> > full, yet I can't find any directory that is actually taking all this
>> space.
>> >  I've carefully looked through every single directory within the jail
>> trying
>> > to find something that's taking all that space, but to no avail.
>> >
>> > ----
>> > My system stats:
>> > # uname -a
>> > FreeBSD grim 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 UTC
>> > 2010     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
>> > # zpool get version tpool
>> > NAME   PROPERTY  VALUE    SOURCE
>> > tpool  version   14       default
>> > # zpool status
>> >   pool: tpool
>> >  state: ONLINE
>> >  scrub: none requested
>> > config:
>> >
>> >         NAME        STATE     READ WRITE CKSUM
>> >         tpool       ONLINE       0     0     0
>> >           mirror    ONLINE       0     0     0
>> >             ad4s1d  ONLINE       0     0     0
>> >             ad6s1d  ONLINE       0     0     0
>> >
>> > errors: No known data errors
>> >
>> > [ Note that I've also done a scrub recently ]
>>
>> 1) Have you checked using fstat to ensure that no file descriptors
>> remain open on any of your ZFS filesystems (not pools)?
>>
>> 2) Are you using compression on any of your ZFS filesystems?
>>
>> --
>> | Jeremy Chadwick                                   jdc@parodius.com |
>> | Parodius Networking                       http://www.parodius.com/ |
>> | UNIX Systems Administrator                  Mountain View, CA, USA |
>> | Making life hard for others since 1977.              PGP: 4BD6C0CB |
>>
>>
>

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 29 22:15:52 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 910CE1065673
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 22:15:52 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta03.westchester.pa.mail.comcast.net
	(qmta03.westchester.pa.mail.comcast.net [76.96.62.32])
	by mx1.freebsd.org (Postfix) with ESMTP id 3A0288FC12
	for <freebsd-fs@freebsd.org>; Wed, 29 Sep 2010 22:15:51 +0000 (UTC)
Received: from omta16.westchester.pa.mail.comcast.net ([76.96.62.88])
	by qmta03.westchester.pa.mail.comcast.net with comcast
	id CbLE1f0061uE5Es53mFsit; Wed, 29 Sep 2010 22:15:52 +0000
Received: from koitsu.dyndns.org ([98.248.41.155])
	by omta16.westchester.pa.mail.comcast.net with comcast
	id CmFq1f00E3LrwQ23cmFrtZ; Wed, 29 Sep 2010 22:15:52 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 6B6099B418; Wed, 29 Sep 2010 15:15:49 -0700 (PDT)
Date: Wed, 29 Sep 2010 15:15:49 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Torbjorn Kristoffersen <torbjoern@gmail.com>
Message-ID: <20100929221549.GA343@icarus.home.lan>
References: <AANLkTimRmBi=th1oia5ZuKcEtLR+YjK04KNYeZhu931A@mail.gmail.com>
	<20100929192534.GA97031@icarus.home.lan>
	<AANLkTi=q6adZv57mwNZVivOwLsfXBjVHki7tzP6-jD0G@mail.gmail.com>
	<AANLkTikqX1Y3qbcdr-2six+Pwv61k-Exwh142w5FFqbS@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <AANLkTikqX1Y3qbcdr-2six+Pwv61k-Exwh142w5FFqbS@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek <pjd@FreeBSD.org>,
	Andriy Gapon <avg@icyb.net.ua>
Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly
 not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2010 22:15:52 -0000

On Thu, Sep 30, 2010 at 12:11:09AM +0200, Torbjorn Kristoffersen wrote:
> I'm at a complete loss here. I shut down the jail completely, and I am
> watching the jail's ZFS filesystem grow as we speak.  No process is using
> it.   It only grows in "df" and "zfs list", I can't find any files that are
> growing.  I have to re-set the quota to be higher and higher to accommodate
> the space.
> 
> On Wed, Sep 29, 2010 at 10:46 PM, Torbjorn Kristoffersen <
> torbjoern@gmail.com> wrote:
> 
> > Hi Jeremy.
> >
> > 1) I checked now, and found nothing extraordinary. Just processes that have
> > been running for a long while, such as screen, cron, sshd, bash, irssi,
> > syslogd, etc.
> >
> > 2) No compression used on this zfs filesystem (or any of the others).
> >
> > I completely stopped the jail now, and removed some of the directories
> > with the most data in them, but to no avail.
> >
> >
> > On Wed, Sep 29, 2010 at 9:25 PM, Jeremy Chadwick <freebsd@jdc.parodius.com
> > > wrote:
> >
> >> On Wed, Sep 29, 2010 at 08:46:38PM +0200, Torbjorn Kristoffersen wrote:
> >> > I have a ZFS "tank" called tpool, the server runs a couple of jails
> >> (each
> >> > with a zfs filesystem).  There is a problem with one of these
> >> filesystems.
> >> > First, its disk usage as shown in ``df -h'':
> >> > ...
> >> > tpool/rb.org      100G     95G    4.6G    95%    /jails/rb.org
> >> > ...
> >> >
> >> > The command ``zfs list'' shows the same:
> >> > ..
> >> > tpool/rb.org    95.4G  4.56G  95.4G  /jails/rb.org
> >> > ..
> >> >
> >> > However, there is a very mysterious problem somewhere.
> >> > Something inside this jail is eating diskspace, but we can't find any
> >> > directories that are actually taking the diskspace. We first suspected
> >> either
> >> > fetchmail or spamassassin of causing a lot of space to be used, since
> >> some
> >> > of their directories were huge. (These were later deleted, and which is
> >> why
> >> > you see that 4.6GB is now available, before that 0GB was available).
> >> >
> >> > However, we can't find *any trace* of an actual directory or file that
> >> is
> >> > taking all the space.
> >> >
> >> > Take this for instance:
> >> >
> >> > outsidejail# du -sh rb.org
> >> >  43G    rb.org
> >> >
> >> > How can this be?  df and zfs are showing that the entire drive is nearly
> >> > full, yet I can't find any directory that is actually taking all this
> >> space.
> >> >  I've carefully looked through every single directory within the jail
> >> trying
> >> > to find something that's taking all that space, but to no avail.
> >> >
> >> > ----
> >> > My system stats:
> >> > # uname -a
> >> > FreeBSD grim 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 UTC
> >> > 2010     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
> >> > # zpool get version tpool
> >> > NAME   PROPERTY  VALUE    SOURCE
> >> > tpool  version   14       default
> >> > # zpool status
> >> >   pool: tpool
> >> >  state: ONLINE
> >> >  scrub: none requested
> >> > config:
> >> >
> >> >         NAME        STATE     READ WRITE CKSUM
> >> >         tpool       ONLINE       0     0     0
> >> >           mirror    ONLINE       0     0     0
> >> >             ad4s1d  ONLINE       0     0     0
> >> >             ad6s1d  ONLINE       0     0     0
> >> >
> >> > errors: No known data errors
> >> >
> >> > [ Note that I've also done a scrub recently ]
> >>
> >> 1) Have you checked using fstat to ensure that no file descriptors
> >> remain open on any of your ZFS filesystems (not pools)?
> >>
> >> 2) Are you using compression on any of your ZFS filesystems?

Andriy and Pawel,

Do either of you have ideas as to what could cause the issue Torbjorn's
experiencing?  I swear I remember some bug or quirk that got fixed with
regards to free space on ZFS, but as has been proven time and time again
my memory is horrible.  His kernel's 8.1-RELEASE dated July 19th.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 06:36:02 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7953C1065670
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 06:36:02 +0000 (UTC)
	(envelope-from cal@linu.gs)
Received: from mxout006.mail.hostpoint.ch (mxout006.mail.hostpoint.ch
	[217.26.49.185])
	by mx1.freebsd.org (Postfix) with ESMTP id 39D068FC1C
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 06:36:02 +0000 (UTC)
Received: from [10.0.2.10] (helo=asmtp001.mail.hostpoint.ch)
	by mxout006.mail.hostpoint.ch with esmtp (Exim 4.72 (FreeBSD))
	(envelope-from <cal@linu.gs>) id 1P1CJV-0000eu-3u
	for freebsd-fs@freebsd.org; Thu, 30 Sep 2010 08:08:17 +0200
Received: from [46.127.80.198] (helo=helvetia.localnet)
	by asmtp001.mail.hostpoint.ch with esmtpa (Exim 4.72 (FreeBSD))
	(envelope-from <cal@linu.gs>) id 1P1CJV-000CzQ-0L
	for freebsd-fs@freebsd.org; Thu, 30 Sep 2010 08:08:17 +0200
X-Authenticated-Sender-Id: cal@rubberfrog.net
From: Michael Naef <cal@linu.gs>
To: "freebsd-fs" <freebsd-fs@freebsd.org>
Date: Thu, 30 Sep 2010 08:08:11 +0200
User-Agent: KMail/1.13.2 (Linux/2.6.32-24-generic; KDE/4.4.2; x86_64; ; )
References: <201009231938.09548.cal@linu.gs> <201009271624.46655.cal@linu.gs>
	<20100927181233.0e8c2869@ernst.jennejohn.org>
In-Reply-To: <20100927181233.0e8c2869@ernst.jennejohn.org>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201009300808.12514.cal@linu.gs>
Subject: Re: Strange behaviour with sappend flag set on ZFS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 06:36:02 -0000

Hi 

> Sending a PR is always a good idea - it ends up in the tracking
> system and doesn't get lost in the mailing-list noise.

It's now (hopefully not) getting lost in the PR system:

http://www.freebsd.org/cgi/query-pr.cgi?pr=151082

Thank you all, Michi

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 08:09:48 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6E5451065670
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 08:09:48 +0000 (UTC)
	(envelope-from kpielorz_lst@tdx.co.uk)
Received: from mail.tdx.com (mail.tdx.com [62.13.128.18])
	by mx1.freebsd.org (Postfix) with ESMTP id 0EDFD8FC14
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 08:09:47 +0000 (UTC)
Received: from HexaDeca64.dmpriest.net.uk (HPQuadro64.dmpriest.net.uk
	[62.13.130.30]) (authenticated bits=0)
	by mail.tdx.com (8.14.3/8.14.3/Kp) with ESMTP id o8U89k47047154
	(version=TLSv1/SSLv3 cipher=DHE-DSS-AES256-SHA bits=256 verify=NO);
	Thu, 30 Sep 2010 09:09:46 +0100 (BST)
Date: Thu, 30 Sep 2010 09:08:39 +0100
From: Karl Pielorz <kpielorz_lst@tdx.co.uk>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Message-ID: <BA59949D5CDBDAAE579D39CC@HexaDeca64.dmpriest.net.uk>
In-Reply-To: <20100929151111.GA91705@icarus.home.lan>
References: <E535F873EBCF64D19AC01035@HexaDeca64.dmpriest.net.uk>
	<20100929151111.GA91705@icarus.home.lan>
X-Mailer: Mulberry/4.0.8 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Cc: freebsd-fs@freebsd.org
Subject: Re: FreeBSD 8.1-R/amd64 - zfs 'hangs' - help tracing?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 08:09:48 -0000


--On 29 September 2010 08:11 -0700 Jeremy Chadwick 
<freebsd@jdc.parodius.com> wrote:

> I can review these statistics to see if any of the disks look like they
> may be misbehaving.

I had to back off to 7.3-R. Unfortunately the machine is the 'everything' 
server at home (routing, dhcp, storage, mail etc.) - so it wasn't proving 
very popular messing around with it :(

I put 7.3-R back on - everything works as it did. ZFS re-scrubbed the 
pools, and I'm good to go.

We have a 'very similar' machine at the office (same controllers etc.) - 
I'll see if I can get enough drives together and run that up. If that fails 
as well it's a much better platform to debug on :)

-Kp

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 08:37:02 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 23B07106566B;
	Thu, 30 Sep 2010 08:37:02 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de
	[217.11.53.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 979938FC14;
	Thu, 30 Sep 2010 08:37:01 +0000 (UTC)
Received: from outgoing.leidinger.net (p57B3ABE8.dip.t-dialin.net
	[87.179.171.232])
	by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id D986684400A;
	Thu, 30 Sep 2010 10:36:56 +0200 (CEST)
Received: from webmail.leidinger.net (unknown [IPv6:fd73:10c7:2053:1::2:102])
	by outgoing.leidinger.net (Postfix) with ESMTP id D41A91198;
	Thu, 30 Sep 2010 10:36:53 +0200 (CEST)
Received: (from www@localhost)
	by webmail.leidinger.net (8.14.4/8.13.8/Submit) id o8U8alK8006368;
	Thu, 30 Sep 2010 10:36:47 +0200 (CEST)
	(envelope-from Alexander@Leidinger.net)
Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by
	webmail.leidinger.net (Horde Framework) with HTTP; Thu, 30 Sep 2010
	10:36:47 +0200
Message-ID: <20100930103647.62193lbkp9yqx5k4@webmail.leidinger.net>
Date: Thu, 30 Sep 2010 10:36:47 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <AANLkTimRmBi=th1oia5ZuKcEtLR+YjK04KNYeZhu931A@mail.gmail.com>
	<20100929192534.GA97031@icarus.home.lan>
	<AANLkTi=q6adZv57mwNZVivOwLsfXBjVHki7tzP6-jD0G@mail.gmail.com>
	<AANLkTikqX1Y3qbcdr-2six+Pwv61k-Exwh142w5FFqbS@mail.gmail.com>
	<20100929221549.GA343@icarus.home.lan>
In-Reply-To: <20100929221549.GA343@icarus.home.lan>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8;
 DelSp="Yes";
 format="flowed"
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.4)
X-EBL-MailScanner-Information: Please contact the ISP for more information
X-EBL-MailScanner-ID: D986684400A.A90B1
X-EBL-MailScanner: Found to be clean
X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN,
	SpamAssassin (not cached, score=2.028, required 6,
	autolearn=disabled, J_CHICKENPOX_41 0.60, RDNS_NONE 1.27, TW_JL 0.08,
	TW_ZF 0.08)
X-EBL-MailScanner-SpamScore: ss
X-EBL-MailScanner-From: alexander@leidinger.net
X-EBL-MailScanner-Watermark: 1286440618.16161@dE0iwi07uJgVRYkV8syYQQ
X-EBL-Spam-Status: No
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>,
	Andriy Gapon <avg@icyb.net.ua>
Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly
 not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 08:37:02 -0000


Quoting Jeremy Chadwick <freebsd@jdc.parodius.com> (from Wed, 29 Sep  
2010 15:15:49 -0700):

> On Thu, Sep 30, 2010 at 12:11:09AM +0200, Torbjorn Kristoffersen wrote:
>> I'm at a complete loss here. I shut down the jail completely, and I am
>> watching the jail's ZFS filesystem grow as we speak.  No process is using
>> it.   It only grows in "df" and "zfs list", I can't find any files that are
>> growing.  I have to re-set the quota to be higher and higher to accommodate
>> the space.
>>
>> On Wed, Sep 29, 2010 at 10:46 PM, Torbjorn Kristoffersen <
>> torbjoern@gmail.com> wrote:
>>
>> > Hi Jeremy.
>> >
>> > 1) I checked now, and found nothing extraordinary. Just processes  
>> that have
>> > been running for a long while, such as screen, cron, sshd, bash, irssi,
>> > syslogd, etc.
>> >
>> > 2) No compression used on this zfs filesystem (or any of the others).
>> >
>> > I completely stopped the jail now, and removed some of the directories
>> > with the most data in them, but to no avail.
>> >
>> >
>> > On Wed, Sep 29, 2010 at 9:25 PM, Jeremy Chadwick <freebsd@jdc.parodius.com
>> > > wrote:
>> >
>> >> On Wed, Sep 29, 2010 at 08:46:38PM +0200, Torbjorn Kristoffersen wrote:
>> >> > I have a ZFS "tank" called tpool, the server runs a couple of jails
>> >> (each
>> >> > with a zfs filesystem).  There is a problem with one of these
>> >> filesystems.
>> >> > First, its disk usage as shown in ``df -h'':
>> >> > ...
>> >> > tpool/rb.org      100G     95G    4.6G    95%    /jails/rb.org
>> >> > ...
>> >> >
>> >> > The command ``zfs list'' shows the same:
>> >> > ..
>> >> > tpool/rb.org    95.4G  4.56G  95.4G  /jails/rb.org
>> >> > ..
>> >> >
>> >> > However, there is a very mysterious problem somewhere.
>> >> > Something inside this jail is eating diskspace, but we can't find any
>> >> > directories that are actually taking the diskspace. We first suspected
>> >> either
>> >> > fetchmail or spamassassin of causing a lot of space to be used, since
>> >> some
>> >> > of their directories were huge. (These were later deleted, and which is
>> >> why
>> >> > you see that 4.6GB is now available, before that 0GB was available).
>> >> >
>> >> > However, we can't find *any trace* of an actual directory or file that
>> >> is
>> >> > taking all the space.
>> >> >
>> >> > Take this for instance:
>> >> >
>> >> > outsidejail# du -sh rb.org
>> >> >  43G    rb.org
>> >> >
>> >> > How can this be?  df and zfs are showing that the entire drive  
>> is nearly
>> >> > full, yet I can't find any directory that is actually taking all this
>> >> space.
>> >> >  I've carefully looked through every single directory within the jail
>> >> trying
>> >> > to find something that's taking all that space, but to no avail.
>> >> >
>> >> > ----
>> >> > My system stats:
>> >> > # uname -a
>> >> > FreeBSD grim 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19  
>> 02:36:49 UTC
>> >> > 2010     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
>> >> > # zpool get version tpool
>> >> > NAME   PROPERTY  VALUE    SOURCE
>> >> > tpool  version   14       default
>> >> > # zpool status
>> >> >   pool: tpool
>> >> >  state: ONLINE
>> >> >  scrub: none requested
>> >> > config:
>> >> >
>> >> >         NAME        STATE     READ WRITE CKSUM
>> >> >         tpool       ONLINE       0     0     0
>> >> >           mirror    ONLINE       0     0     0
>> >> >             ad4s1d  ONLINE       0     0     0
>> >> >             ad6s1d  ONLINE       0     0     0
>> >> >
>> >> > errors: No known data errors
>> >> >
>> >> > [ Note that I've also done a scrub recently ]
>> >>
>> >> 1) Have you checked using fstat to ensure that no file descriptors
>> >> remain open on any of your ZFS filesystems (not pools)?
>> >>
>> >> 2) Are you using compression on any of your ZFS filesystems?
>
> Andriy and Pawel,
>
> Do either of you have ideas as to what could cause the issue Torbjorn's
> experiencing?  I swear I remember some bug or quirk that got fixed with
> regards to free space on ZFS, but as has been proven time and time again
> my memory is horrible.  His kernel's 8.1-RELEASE dated July 19th.

IIRC the commit you talk about was by Martin (CCed). I do not know if  
it is (already) MFCed.

I'm not sure the bug you talk about is related to what Torbjorn is
describing. The fact that the free space is going down while the
jail is shut down (and I assume jls does not show its JID anymore, so
all of its processes are really gone) points more to some other
process (outside of the jail) which is filling a file (maybe one that
is already deleted, so no longer visible with du).

Bye,
Alexander.

-- 
A wide-eyed, innocent UNICORN, poised delicately in a MEADOW filled
with LILACS, LOLLIPOPS & small CHILDREN at the HUSH of twilight??

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 09:11:48 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 01F2610656A4
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 09:11:48 +0000 (UTC)
	(envelope-from fbsd@dannysplace.net)
Received: from mailgw.dannysplace.net (mailgw.dannysplace.net [204.109.56.184])
	by mx1.freebsd.org (Postfix) with ESMTP id BFF038FC15
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 09:11:47 +0000 (UTC)
Received: from [203.206.171.212] (helo=[192.168.10.10])
	by mailgw.dannysplace.net with esmtpsa (TLSv1:CAMELLIA256-SHA:256)
	(Exim 4.72 (FreeBSD)) (envelope-from <fbsd@dannysplace.net>)
	id 1P1FBZ-0003Ii-Gg
	for freebsd-fs@freebsd.org; Thu, 30 Sep 2010 19:12:19 +1000
Message-ID: <4CA45444.6070002@dannysplace.net>
Date: Thu, 30 Sep 2010 19:11:32 +1000
From: Danny Carroll <fbsd@dannysplace.net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB;
	rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <AANLkTimRmBi=th1oia5ZuKcEtLR+YjK04KNYeZhu931A@mail.gmail.com>	<20100929192534.GA97031@icarus.home.lan>	<AANLkTi=q6adZv57mwNZVivOwLsfXBjVHki7tzP6-jD0G@mail.gmail.com>	<AANLkTikqX1Y3qbcdr-2six+Pwv61k-Exwh142w5FFqbS@mail.gmail.com>	<20100929221549.GA343@icarus.home.lan>
	<20100930103647.62193lbkp9yqx5k4@webmail.leidinger.net>
In-Reply-To: <20100930103647.62193lbkp9yqx5k4@webmail.leidinger.net>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Authenticated-User: danny
X-Authenticator: plain
X-Exim-Version: 4.72 (build at 12-Jul-2010 18:31:29)
X-Date: 2010-09-30 19:12:17
X-Connected-IP: 203.206.171.212:57823
X-Message-Linecount: 159
X-Body-Linecount: 145
X-Message-Size: 5967
X-Body-Size: 5077
X-Received-Count: 1
X-Recipient-Count: 1
X-Local-Recipient-Count: 1
X-Local-Recipient-Defer-Count: 0
X-Local-Recipient-Fail-Count: 0
X-SA-Exim-Connect-IP: 203.206.171.212
X-SA-Exim-Mail-From: fbsd@dannysplace.net
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	damka.dannysplace.net
X-Spam-Level: 
X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00
	autolearn=ham version=3.3.1
X-SA-Exim-Version: 4.2
X-SA-Exim-Scanned: Yes (on mailgw.dannysplace.net)
Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly
 not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: fbsd@dannysplace.net
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 09:11:48 -0000

 On 30/09/2010 6:36 PM, Alexander Leidinger wrote:
>
> Quoting Jeremy Chadwick <freebsd@jdc.parodius.com> (from Wed, 29 Sep
> 2010 15:15:49 -0700):
>
>> On Thu, Sep 30, 2010 at 12:11:09AM +0200, Torbjorn Kristoffersen wrote:
>>> I'm at a complete loss here. I shut down the jail completely, and I am
>>> watching the jail's ZFS filesystem grow as we speak.  No process is
>>> using
>>> it.   It only grows in "df" and "zfs list", I can't find any files
>>> that are
>>> growing.  I have to re-set the quota to be higher and higher to
>>> accommodate
>>> the space.
>>>
>>> On Wed, Sep 29, 2010 at 10:46 PM, Torbjorn Kristoffersen <
>>> torbjoern@gmail.com> wrote:
>>>
>>> > Hi Jeremy.
>>> >
>>> > 1) I checked now, and found nothing extraordinary. Just processes
>>> that have
>>> > been running for a long while, such as screen, cron, sshd, bash,
>>> irssi,
>>> > syslogd, etc.
>>> >
>>> > 2) No compression used on this zfs filesystem (or any of the others).
>>> >
>>> > I completely stopped the jail now, and removed some of the
>>> directories
>>> > with the most data in them, but to no avail.
>>> >
>>> >
>>> > On Wed, Sep 29, 2010 at 9:25 PM, Jeremy Chadwick
>>> <freebsd@jdc.parodius.com
>>> > > wrote:
>>> >
>>> >> On Wed, Sep 29, 2010 at 08:46:38PM +0200, Torbjorn Kristoffersen
>>> wrote:
>>> >> > I have a ZFS "tank" called tpool, the server runs a couple of
>>> jails
>>> >> (each
>>> >> > with a zfs filesystem).  There is a problem with one of these
>>> >> filesystems.
>>> >> > First, its disk usage as shown in ``df -h'':
>>> >> > ...
>>> >> > tpool/rb.org      100G     95G    4.6G    95%    /jails/rb.org
>>> >> > ...
>>> >> >
>>> >> > The command ``zfs list'' shows the same:
>>> >> > ..
>>> >> > tpool/rb.org    95.4G  4.56G  95.4G  /jails/rb.org
>>> >> > ..
>>> >> >
>>> >> > However, there is a very mysterious problem somewhere.
>>> >> > Something inside this jail is eating diskspace, but we can't
>>> find any
>>> >> > directories that are actually taking the diskspace. We first
>>> suspected
>>> >> either
>>> >> > fetchmail or spamassassin of causing a lot of space to be used,
>>> since
>>> >> some
>>> >> > of their directories were huge. (These were later deleted, and
>>> which is
>>> >> why
>>> >> > you see that 4.6GB is now available, before that 0GB was
>>> available).
>>> >> >
>>> >> > However, we can't find *any trace* of an actual directory or
>>> file that
>>> >> is
>>> >> > taking all the space.
>>> >> >
>>> >> > Take this for instance:
>>> >> >
>>> >> > outsidejail# du -sh rb.org
>>> >> >  43G    rb.org
>>> >> >
>>> >> > How can this be?  df and zfs are showing that the entire drive
>>> is nearly
>>> >> > full, yet I can't find any directory that is actually taking
>>> all this
>>> >> space.
>>> >> >  I've carefully looked through every single directory within
>>> the jail
>>> >> trying
>>> >> > to find something that's taking all that space, but to no avail.
>>> >> >
>>> >> > ----
>>> >> > My system stats:
>>> >> > # uname -a
>>> >> > FreeBSD grim 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19
>>> 02:36:49 UTC
>>> >> > 2010    
>>> root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
>>> >> > # zpool get version tpool
>>> >> > NAME   PROPERTY  VALUE    SOURCE
>>> >> > tpool  version   14       default
>>> >> > # zpool status
>>> >> >   pool: tpool
>>> >> >  state: ONLINE
>>> >> >  scrub: none requested
>>> >> > config:
>>> >> >
>>> >> >         NAME        STATE     READ WRITE CKSUM
>>> >> >         tpool       ONLINE       0     0     0
>>> >> >           mirror    ONLINE       0     0     0
>>> >> >             ad4s1d  ONLINE       0     0     0
>>> >> >             ad6s1d  ONLINE       0     0     0
>>> >> >
>>> >> > errors: No known data errors
>>> >> >
>>> >> > [ Note that I've also done a scrub recently ]
>>> >>
>>> >> 1) Have you checked using fstat to ensure that no file descriptors
>>> >> remain open on any of your ZFS filesystems (not pools)?
>>> >>
>>> >> 2) Are you using compression on any of your ZFS filesystems?
>>
>> Andriy and Pawel,
>>
>> Do either of you have ideas as to what could cause the issue Torbjorn's
>> experiencing?  I swear I remember some bug or quirk that got fixed with
>> regards to free space on ZFS, but as has been proven time and time again
>> my memory is horrible.  His kernel's 8.1-RELEASE dated July 19th.
>
> IIRC the commit you talk about was by Martin (CCed). I do not know if
> it is (already) MFCed.
>
> I'm not sure the bug you talk about is related to what Torbjorn is
> talking about. The fact that the free space is going down while the
> jail is shutdown (and I assume jls does not show his JID anymore, so
> all of its processes are really gone) points more to some other
> process (outside of the jail) which is filling some (maybe already
> deleted, so not visible anymore with du) file.
>

It certainly smells like a process still writing to a file that is unlinked.
I wonder if it would show up with lsof.

If dtrace is enabled on that machine then I think it should be easy to
see which process is performing write operations.
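
To illustrate why such a file would be invisible to du yet still count
against the filesystem, here is a small self-contained Python sketch. It is
an illustration only, not something run on the affected machine, and the
function name is hypothetical. It creates a file, unlinks it while the
descriptor is still open, and keeps writing through that descriptor.

```python
import os
import tempfile


def unlinked_file_still_grows(nbytes=1 << 20):
    """Unlink a file while it is open, keep writing, and report what is visible."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "growing.log")
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    try:
        os.unlink(path)  # the name is gone, so a directory walk cannot find it
        data = b"x" * nbytes
        written = 0
        while written < nbytes:  # os.write may write less than requested
            written += os.write(fd, data[written:])
        st = os.fstat(fd)  # the open descriptor still reaches the file
        return {
            "visible_names": os.listdir(workdir),  # what a du-style walk would see
            "link_count": st.st_nlink,             # no directory entry points here
            "size": st.st_size,                    # yet the file keeps growing
        }
    finally:
        os.close(fd)  # only now can the filesystem release the space
        os.rmdir(workdir)
```

On the real system the open descriptor would belong to some long-running
process, and the space would only be released once that process closes the
file or exits.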

-D

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 09:17:53 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E08631065693
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 09:17:53 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta09.emeryville.ca.mail.comcast.net
	(qmta09.emeryville.ca.mail.comcast.net [76.96.30.96])
	by mx1.freebsd.org (Postfix) with ESMTP id C5FD78FC0A
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 09:17:53 +0000 (UTC)
Received: from omta12.emeryville.ca.mail.comcast.net ([76.96.30.44])
	by qmta09.emeryville.ca.mail.comcast.net with comcast
	id CxC91f0070x6nqcA9xHtG2; Thu, 30 Sep 2010 09:17:53 +0000
Received: from koitsu.dyndns.org ([98.248.41.155])
	by omta12.emeryville.ca.mail.comcast.net with comcast
	id CxHr1f0083LrwQ28YxHsdd; Thu, 30 Sep 2010 09:17:52 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id B9C539B418; Thu, 30 Sep 2010 02:17:51 -0700 (PDT)
Date: Thu, 30 Sep 2010 02:17:51 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Karl Pielorz <kpielorz_lst@tdx.co.uk>
Message-ID: <20100930091751.GA13840@icarus.home.lan>
References: <E535F873EBCF64D19AC01035@HexaDeca64.dmpriest.net.uk>
	<20100929151111.GA91705@icarus.home.lan>
	<BA59949D5CDBDAAE579D39CC@HexaDeca64.dmpriest.net.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <BA59949D5CDBDAAE579D39CC@HexaDeca64.dmpriest.net.uk>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: FreeBSD 8.1-R/amd64 - zfs 'hangs' - help tracing?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 09:17:54 -0000

On Thu, Sep 30, 2010 at 09:08:39AM +0100, Karl Pielorz wrote:
> --On 29 September 2010 08:11 -0700 Jeremy Chadwick
> <freebsd@jdc.parodius.com> wrote:
> 
> >I can review these statistics to see if any of the disks look like they
> >may be misbehaving.
> 
> I had to back off to 7.3-R. Unfortunately the machine is the
> 'everything' server at home (routing, dhcp, storage, mail etc.) - so
> it wasn't proving very popular messing around with it :(
> 
> I put 7.3-R back on - everything works as it did. ZFS re-scrubbed
> the pools, and I'm good to go.
> 
> We have a 'very similar' machine at the office (same controllers
> etc.) - I'll see if I can get enough drives together and run that
> up. If that fails as well it's a much better platform to debug on :)

All I'm interested in at this point are the drives in the machine which
is having the problem.  It doesn't matter if you're running smartctl on
7.3-RELEASE or 8.x -- the disk SMART stats will be the same.

So if you can provide them, I can review them and point to issues which
might explain disk I/O deadlock.

You did mention some bad sectors.  People very often misread those
attributes and assume the wrong thing: "looks like the disk found some
bad LBAs, so they're fixed", when the situation is actually "the LBAs
are bad and not fully remapped".  The latter can cause I/O deadlock when
those blocks are read, and sometimes when they are written to.
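
The "fixed" versus "not fully remapped" distinction can be read off the
SMART attribute table.  What follows is only a rough sketch, assuming
smartmontools is installed and that the drive uses the common ATA
attribute names Reallocated_Sector_Ct, Current_Pending_Sector and
Offline_Uncorrectable (vendors differ, and nobody in this thread named
them).  The function name smart_sector_summary and the device /dev/ad4
are placeholders:

```sh
# smart_sector_summary: read "smartctl -A" output on stdin and print the
# raw values (10th column) of the sector-health attributes.
# The attribute names are an assumption; some vendors use other names.
smart_sector_summary() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ {
        print $2, $10
    }'
}

# Hypothetical usage (device name is a placeholder):
#   smartctl -A /dev/ad4 | smart_sector_summary
```

A non-zero Current_Pending_Sector value would be the "bad and not fully
remapped" case; a non-zero Reallocated_Sector_Ct on its own only says
that remapping has already happened.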

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 13:28:27 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3607F1065674
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 13:28:27 +0000 (UTC)
	(envelope-from torbjoern@gmail.com)
Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com
	[209.85.214.54])
	by mx1.freebsd.org (Postfix) with ESMTP id AC24F8FC1C
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 13:28:26 +0000 (UTC)
Received: by bwz15 with SMTP id 15so1764988bwz.13
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 06:28:25 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:content-type
	:content-transfer-encoding;
	bh=Ip0w/go5Hd/apJrlZ+IfX/SJ3XmKiOK97viq1nla3lk=;
	b=uGe3Uqf1tyQOUJkHj7obrWi0PxkF9VOFNn4P7KcPSQMQIfIypcolfsGF6xCIag1YYH
	hnsYLHdNHiRse0s6ixukUXAjQCjLcO/64xySTj2GYMrhQzRR71UjZjT2X5UW3m9NHydD
	Nsxh+350mLvW++/9N4p8Bi4gD+YKHwxJEwKlk=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:content-type:content-transfer-encoding;
	b=baBuh7MX3QnIirYZ7jJ4oPFISvTddUifvnloV4a8zmPepYGtS3MTbo8RJ+muz//3+w
	S++RwEORWQJnSjGbvGQBxwLqFWgsQjybaPluUpWtjNCPlx9YVm9MimpacMKkHgZmgeSg
	aejAnTreCig0q/WIV6I77l4533KIJMsxzErEI=
MIME-Version: 1.0
Received: by 10.204.126.92 with SMTP id b28mr2686416bks.47.1285853305557; Thu,
	30 Sep 2010 06:28:25 -0700 (PDT)
Received: by 10.204.71.138 with HTTP; Thu, 30 Sep 2010 06:28:25 -0700 (PDT)
In-Reply-To: <4CA45444.6070002@dannysplace.net>
References: <AANLkTimRmBi=th1oia5ZuKcEtLR+YjK04KNYeZhu931A@mail.gmail.com>
	<20100929192534.GA97031@icarus.home.lan>
	<AANLkTi=q6adZv57mwNZVivOwLsfXBjVHki7tzP6-jD0G@mail.gmail.com>
	<AANLkTikqX1Y3qbcdr-2six+Pwv61k-Exwh142w5FFqbS@mail.gmail.com>
	<20100929221549.GA343@icarus.home.lan>
	<20100930103647.62193lbkp9yqx5k4@webmail.leidinger.net>
	<4CA45444.6070002@dannysplace.net>
Date: Thu, 30 Sep 2010 15:28:25 +0200
Message-ID: <AANLkTi=x5irhAM8uhiZJLztE230=Q9CAMDeja=Bo4fVL@mail.gmail.com>
From: Torbjorn Kristoffersen <torbjoern@gmail.com>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly
 not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 13:28:27 -0000

On Thu, Sep 30, 2010 at 11:11 AM, Danny Carroll <fbsd@dannysplace.net> wrote:
>
>  On 30/09/2010 6:36 PM, Alexander Leidinger wrote:
> >
> > Quoting Jeremy Chadwick <freebsd@jdc.parodius.com> (from Wed, 29 Sep
> > 2010 15:15:49 -0700):
> >
> >> On Thu, Sep 30, 2010 at 12:11:09AM +0200, Torbjorn Kristoffersen wrote:
> >>> I'm at a complete loss here. I shut down the jail completely, and I am
> >>> watching the jail's ZFS filesystem grow as we speak.  No process is
> >>> using it.  It only grows in "df" and "zfs list", I can't find any files
> >>> that are growing.  I have to re-set the quota to be higher and higher to
> >>> accommodate the space.
> >>>
> >>> On Wed, Sep 29, 2010 at 10:46 PM, Torbjorn Kristoffersen <
> >>> torbjoern@gmail.com> wrote:
> >>>
> >>> > Hi Jeremy.
> >>> >
> >>> > 1) I checked now, and found nothing extraordinary. Just processes
> >>> > that have been running for a long while, such as screen, cron, sshd,
> >>> > bash, irssi, syslogd, etc.
> >>> >
> >>> > 2) No compression used on this zfs filesystem (or any of the others).
> >>> >
> >>> > I completely stopped the jail now, and removed some of the
> >>> > directories with the most data in them, but to no avail.
> >>> >
> >>> >
> >>> > On Wed, Sep 29, 2010 at 9:25 PM, Jeremy Chadwick
> >>> > <freebsd@jdc.parodius.com> wrote:
> >>> >
> >>> >> On Wed, Sep 29, 2010 at 08:46:38PM +0200, Torbjorn Kristoffersen wrote:
> >>> >> > I have a ZFS "tank" called tpool, the server runs a couple of jails
> >>> >> > (each with a zfs filesystem).  There is a problem with one of these
> >>> >> > filesystems.
> >>> >> > First, its disk usage as shown in ``df -h'':
> >>> >> > ...
> >>> >> > tpool/rb.org      100G     95G    4.6G    95%    /jails/rb.org
> >>> >> > ...
> >>> >> >
> >>> >> > The command ``zfs list'' shows the same:
> >>> >> > ..
> >>> >> > tpool/rb.org    95.4G  4.56G  95.4G  /jails/rb.org
> >>> >> > ..
> >>> >> >
> >>> >> > However, there is a very mysterious problem somewhere.
> >>> >> > Something inside this jail is eating diskspace, but we can't find any
> >>> >> > directory that is actually taking the diskspace. We first suspected
> >>> >> > either fetchmail or spamassassin of causing a lot of space to be used,
> >>> >> > since some of their directories were huge. (These were later deleted,
> >>> >> > which is why you see that 4.6GB is now available; before that 0GB was
> >>> >> > available).
> >>> >> >
> >>> >> > However, we can't find *any trace* of an actual directory or file that
> >>> >> > is taking all the space.
> >>> >> >
> >>> >> > Take this for instance:
> >>> >> >
> >>> >> > outsidejail# du -sh rb.org
> >>> >> >  43G    rb.org
> >>> >> >
> >>> >> > How can this be?  df and zfs are showing that the entire drive is nearly
> >>> >> > full, yet I can't find any directory that is actually taking all this
> >>> >> > space.  I've carefully looked through every single directory within the
> >>> >> > jail trying to find something that's taking all that space, but to no
> >>> >> > avail.
> >>> >> > ----
> >>> >> > My system stats:
> >>> >> > # uname -a
> >>> >> > FreeBSD grim 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 UTC 2010
> >>> >> > root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
> >>> >> > # zpool get version tpool
> >>> >> > NAME   PROPERTY  VALUE    SOURCE
> >>> >> > tpool  version   14       default
> >>> >> > # zpool status
> >>> >> >   pool: tpool
> >>> >> >  state: ONLINE
> >>> >> >  scrub: none requested
> >>> >> > config:
> >>> >> >
> >>> >> >         NAME        STATE     READ WRITE CKSUM
> >>> >> >         tpool       ONLINE       0     0     0
> >>> >> >           mirror    ONLINE       0     0     0
> >>> >> >             ad4s1d  ONLINE       0     0     0
> >>> >> >             ad6s1d  ONLINE       0     0     0
> >>> >> >
> >>> >> > errors: No known data errors
> >>> >> >
> >>> >> > [ Note that I've also done a scrub recently ]
> >>> >>
> >>> >> 1) Have you checked using fstat to ensure that no file descriptors
> >>> >> remain open on any of your ZFS filesystems (not pools)?
> >>> >>
> >>> >> 2) Are you using compression on any of your ZFS filesystems?
> >>
> >> Andriy and Pawel,
> >>
> >> Do either of you have ideas as to what could cause the issue Torbjorn's
> >> experiencing?  I swear I remember some bug or quirk that got fixed with
> >> regards to free space on ZFS, but as has been proven time and time again
> >> my memory is horrible.  His kernel's 8.1-RELEASE dated July 19th.
> >
> > IIRC the commit you talk about was by Martin (CCed). I do not know if
> > it is (already) MFCed.
> >
> > I'm not sure the bug you talk about is related to what Torbjorn is
> > talking about. The fact that the free space is going down while the
> > jail is shutdown (and I assume jls does not show his JID anymore, so
> > all of its processes are really gone) points more to some other
> > process (outside of the jail) which is filling some (maybe already
> > deleted, so not visible anymore with du) file.
> >
>
> It certainly smells like a process still writing to a file that is unlinked.
> I wonder if it would show up with lsof.
>
> If dtrace is enabled on that machine then I think it should be easy to
> see which process is performing write operations.
>

That could very well be.  Interestingly, dtrace is not installed and
doesn't even load.  When I do
kldload dtraceall it says:

    kldload: can't load dtraceall: Exec format error

Perhaps I should recompile the kernel on this server and build
DTrace into it.  Perhaps I should first update to
FreeBSD-STABLE, as it is more cutting edge?

Actually, I'll first do a complete backup of this jail, remove the zfs
filesystem, then re-create it, put the files back, and see what
happens.  The unfortunate thing is that I will be ruining a chance to
find out what really happened.
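
Before destroying the filesystem, the size of the discrepancy could at
least be recorded.  A small sketch, assuming a POSIX shell with df and
du; the function name space_gap is made up for illustration:

```sh
# space_gap DIR: compare what the filesystem reports as used with what
# du can reach by walking the directory tree.
space_gap() {
    dir=$1
    # df -kP: POSIX output in 1K blocks; field 3 of the data line is "Used".
    used_kb=$(df -kP "$dir" | awk 'NR == 2 { print $3 }')
    # du -skx: total of reachable files in 1K blocks, staying on one filesystem.
    seen_kb=$(du -skx "$dir" 2>/dev/null | awk '{ print $1 }')
    echo "df_used_kb=$used_kb du_kb=$seen_kb gap_kb=$((used_kb - seen_kb))"
}

# Possible usage with the mount point from this thread:
#   space_gap /jails/rb.org
```

Run a few minutes apart, this would show whether the gap itself is
growing, which is the space that no visible file accounts for.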

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 14:34:19 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1CC4D106566C
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 14:34:19 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de
	[217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id BFF428FC13
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 14:34:18 +0000 (UTC)
Received: from outgoing.leidinger.net (p57B3ABE8.dip.t-dialin.net
	[87.179.171.232])
	by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id D94E884400A;
	Thu, 30 Sep 2010 16:34:12 +0200 (CEST)
Received: from webmail.leidinger.net (unknown [IPv6:fd73:10c7:2053:1::2:102])
	by outgoing.leidinger.net (Postfix) with ESMTP id E51C611F6;
	Thu, 30 Sep 2010 16:34:09 +0200 (CEST)
Received: (from www@localhost)
	by webmail.leidinger.net (8.14.4/8.13.8/Submit) id o8UEY70w088149;
	Thu, 30 Sep 2010 16:34:07 +0200 (CEST)
	(envelope-from Alexander@Leidinger.net)
Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by
	webmail.leidinger.net (Horde Framework) with HTTP; Thu, 30 Sep 2010
	16:34:06 +0200
Message-ID: <20100930163406.330767vpzidjygow@webmail.leidinger.net>
Date: Thu, 30 Sep 2010 16:34:06 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: Torbjorn Kristoffersen <torbjoern@gmail.com>
References: <AANLkTimRmBi=th1oia5ZuKcEtLR+YjK04KNYeZhu931A@mail.gmail.com>
	<20100929192534.GA97031@icarus.home.lan>
	<AANLkTi=q6adZv57mwNZVivOwLsfXBjVHki7tzP6-jD0G@mail.gmail.com>
	<AANLkTikqX1Y3qbcdr-2six+Pwv61k-Exwh142w5FFqbS@mail.gmail.com>
	<20100929221549.GA343@icarus.home.lan>
	<20100930103647.62193lbkp9yqx5k4@webmail.leidinger.net>
	<4CA45444.6070002@dannysplace.net>
	<AANLkTi=x5irhAM8uhiZJLztE230=Q9CAMDeja=Bo4fVL@mail.gmail.com>
In-Reply-To: <AANLkTi=x5irhAM8uhiZJLztE230=Q9CAMDeja=Bo4fVL@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8;
 DelSp="Yes";
 format="flowed"
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.4)
X-EBL-MailScanner-Information: Please contact the ISP for more information
X-EBL-MailScanner-ID: D94E884400A.A8ADB
X-EBL-MailScanner: Found to be clean
X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN,
	SpamAssassin (not cached, score=1.351, required 6,
	autolearn=disabled, RDNS_NONE 1.27, TW_ZF 0.08)
X-EBL-MailScanner-SpamScore: s
X-EBL-MailScanner-From: alexander@leidinger.net
X-EBL-MailScanner-Watermark: 1286462055.53009@R0kNRr5etFsbNPTjpmK0kw
X-EBL-Spam-Status: No
Cc: freebsd-fs@freebsd.org
Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly
 not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 14:34:19 -0000

Quoting Torbjorn Kristoffersen <torbjoern@gmail.com> (from Thu, 30 Sep
2010 15:28:25 +0200):

> That could very well be.  Interestingly, dtrace is not installed and
> doesn't even load.  When I do
> kldload dtraceall it says:
>
>     kldload: can't load dtraceall: Exec format error
>
> Perhaps I should recompile the kernel on this server and build
> DTrace into it.  Perhaps I should first update to
> FreeBSD-STABLE, as it is more cutting edge?
>
> Actually, I'll first do a complete backup of this jail, remove the zfs
> filesystem, then re-create it, put the files back, and see what
> happens.  The unfortunate thing is that I will be ruining a chance to
> find out what really happened.

I would give lsof a try first. Installing it from ports or packages is
not as time-consuming as updating the server, and may pinpoint
the problem.

Bye,
Alexander.

-- 
Things are more like they are today than they ever were before.
                -- Dwight D. Eisenhower

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 14:39:08 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0F564106564A
	for <freebsd-fs@FreeBSD.ORG>; Thu, 30 Sep 2010 14:39:08 +0000 (UTC)
	(envelope-from olli@lurza.secnetix.de)
Received: from lurza.secnetix.de (lurza.secnetix.de [IPv6:2a01:170:102f::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 829E28FC17
	for <freebsd-fs@FreeBSD.ORG>; Thu, 30 Sep 2010 14:39:07 +0000 (UTC)
Received: from lurza.secnetix.de (localhost [127.0.0.1])
	by lurza.secnetix.de (8.14.3/8.14.3) with ESMTP id o8UEck7w019474;
	Thu, 30 Sep 2010 16:39:01 +0200 (CEST)
	(envelope-from oliver.fromme@secnetix.de)
Received: (from olli@localhost)
	by lurza.secnetix.de (8.14.3/8.14.3/Submit) id o8UEckoY019473;
	Thu, 30 Sep 2010 16:38:46 +0200 (CEST) (envelope-from olli)
Date: Thu, 30 Sep 2010 16:38:46 +0200 (CEST)
Message-Id: <201009301438.o8UEckoY019473@lurza.secnetix.de>
From: Oliver Fromme <olli@lurza.secnetix.de>
To: freebsd-fs@FreeBSD.ORG, fbsd@dannysplace.net, torbjoern@gmail.com
In-Reply-To: <4CA45444.6070002@dannysplace.net>
X-Newsgroups: list.freebsd-fs
User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX)
	(FreeBSD/6.4-PRERELEASE-20080904 (i386))
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.3.5
	(lurza.secnetix.de [127.0.0.1]);
	Thu, 30 Sep 2010 16:39:02 +0200 (CEST)
Cc: 
Subject: Re: Strange ZFS problem,
	filesystem claims to be full when clearly  not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: freebsd-fs@FreeBSD.ORG, fbsd@dannysplace.net, torbjoern@gmail.com
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 14:39:08 -0000

Danny Carroll <fbsd@dannysplace.net> wrote:
 > [...]
 > It certainly smells like a process still writing to a file that is unlinked.
 > I wonder if it would show up with lsof.

If it's a file that was unlinked that is still held open by
a process, then lsof will definitely list it.  The command

# lsof +L1

lists all open files with a link count of zero.  You can
restrict it to a certain file system like this:

# lsof +aL1 /var

Of course, lsof won't list the file name because the file
doesn't have a name anymore.  But it lists the process by
name, PID and user, the file system and the file size.
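
For anyone who wants to see the effect in isolation, the situation can
be reproduced in a scratch directory.  This is only an illustrative
sketch in plain sh; the file name ghost.log and the variable names are
made up:

```sh
scratch=$(mktemp -d)
victim="$scratch/ghost.log"

exec 3>"$victim"                 # keep the file open on descriptor 3
rm "$victim"                     # unlink it; the directory entry is gone
echo "still being written" >&3   # writes succeed and keep consuming space

if [ -e "$victim" ]; then name_visible=yes; else name_visible=no; fi
echo "name visible after unlink: $name_visible"

# While descriptor 3 is open, "lsof +L1" would list this shell.
exec 3>&-                        # closing the last descriptor frees the blocks
rmdir "$scratch"
```

Until the descriptor is closed, du cannot see the file, but the
filesystem still counts its blocks as used.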

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht
München, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

"IRIX is about as stable as a one-legged drunk with hypothermia
in a four-hundred mile per hour wind, balancing on a banana
peel on a greased cookie sheet -- when someone throws him an
elephant with bad breath and a worse temper."
        -- Ralf Hildebrandt

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 14:48:47 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5A89F1065672
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 14:48:47 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta02.emeryville.ca.mail.comcast.net
	(qmta02.emeryville.ca.mail.comcast.net [76.96.30.24])
	by mx1.freebsd.org (Postfix) with ESMTP id 409F48FC0A
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 14:48:46 +0000 (UTC)
Received: from omta11.emeryville.ca.mail.comcast.net ([76.96.30.36])
	by qmta02.emeryville.ca.mail.comcast.net with comcast
	id CzPr1f0040mlR8UA22omu2; Thu, 30 Sep 2010 14:48:46 +0000
Received: from koitsu.dyndns.org ([98.248.41.155])
	by omta11.emeryville.ca.mail.comcast.net with comcast
	id D2ol1f00U3LrwQ28X2olLh; Thu, 30 Sep 2010 14:48:46 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 5A82E9B418; Thu, 30 Sep 2010 07:48:45 -0700 (PDT)
Date: Thu, 30 Sep 2010 07:48:45 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: freebsd-fs@FreeBSD.ORG, fbsd@dannysplace.net, torbjoern@gmail.com
Message-ID: <20100930144845.GA19926@icarus.home.lan>
References: <4CA45444.6070002@dannysplace.net>
	<201009301438.o8UEckoY019473@lurza.secnetix.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201009301438.o8UEckoY019473@lurza.secnetix.de>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: 
Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly
 not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 14:48:47 -0000

On Thu, Sep 30, 2010 at 04:38:46PM +0200, Oliver Fromme wrote:
> Danny Carroll <fbsd@dannysplace.net> wrote:
>  > [...]
>  > It certainly smells like a process still writing to a file that is unlinked.
>  > I wonder if it would show up with lsof.
> 
> If it's a file that was unlinked that is still held open by
> a process, then lsof will definitely list it.  The command
> 
> # lsof +L1
> 
> lists all open files with a link count of zero.  You can
> restrict it to a certain file system like this:
> 
> # lsof +aL1 /var
> 
> Of course, lsof won't list the file name because the file
> doesn't have a name anymore.  But it lists the process by
> name, PID and user, the file system and the file size.

Can someone explain how use of lsof in this regard is different than use
of fstat(1) like I originally mentioned?  Does lsof do something more
thorough or differently than what fstat does?

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 15:48:02 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1847C106564A
	for <freebsd-fs@FreeBSD.ORG>; Thu, 30 Sep 2010 15:48:02 +0000 (UTC)
	(envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 60BB18FC13
	for <freebsd-fs@FreeBSD.ORG>; Thu, 30 Sep 2010 15:48:01 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA24598;
	Thu, 30 Sep 2010 18:47:56 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4CA4B12B.7050307@icyb.net.ua>
Date: Thu, 30 Sep 2010 18:47:55 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <4CA45444.6070002@dannysplace.net>	<201009301438.o8UEckoY019473@lurza.secnetix.de>
	<20100930144845.GA19926@icarus.home.lan>
In-Reply-To: <20100930144845.GA19926@icarus.home.lan>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly
 not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 15:48:02 -0000

on 30/09/2010 17:48 Jeremy Chadwick said the following:
> On Thu, Sep 30, 2010 at 04:38:46PM +0200, Oliver Fromme wrote:
>> Danny Carroll <fbsd@dannysplace.net> wrote:
>>  > [...]
>>  > It certainly smells like a process still writing to a file that is unlinked.
>>  > I wonder if it would show up with lsof.
>>
>> If it's a file that was unlinked that is still held open by
>> a process, then lsof will definitely list it.  The command
>>
>> # lsof +L1
>>
>> lists all open files with a link count of zero.  You can
>> restrict it to a certain file system like this:
>>
>> # lsof +aL1 /var
>>
>> Of course, lsof won't list the file name because the file
>> doesn't have a name anymore.  But it lists the process by
>> name, PID and user, the file system and the file size.
> 
> Can someone explain how use of lsof in this regard is different than use
> of fstat(1) like I originally mentioned?  Does lsof do something more
> thorough or differently than what fstat does?

I believe that there is no reason to prefer lsof except for those who spent more
time with Linux than with FreeBSD.

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 15:50:03 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9B1E31065672
	for <freebsd-fs@hub.freebsd.org>; Thu, 30 Sep 2010 15:50:03 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 8AD938FC15
	for <freebsd-fs@hub.freebsd.org>; Thu, 30 Sep 2010 15:50:03 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o8UFo3dp033266
	for <freebsd-fs@freefall.freebsd.org>; Thu, 30 Sep 2010 15:50:03 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o8UFo3Ua033265;
	Thu, 30 Sep 2010 15:50:03 GMT (envelope-from gnats)
Date: Thu, 30 Sep 2010 15:50:03 GMT
Message-Id: <201009301550.o8UFo3Ua033265@freefall.freebsd.org>
To: freebsd-fs@FreeBSD.org
From: Mark Atkinson <darkmark@filament.org>
Cc: 
Subject: Re: kern/133174: [msdosfs] [patch] msdosfs must support utf-encoded
 international characters in filr names
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Mark Atkinson <darkmark@filament.org>
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 15:50:03 -0000

The following reply was made to PR kern/133174; it has been noted by GNATS.

From: Mark Atkinson <darkmark@filament.org>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/133174: [msdosfs] [patch] msdosfs must support utf-encoded
 international characters in filr names
Date: Thu, 30 Sep 2010 08:40:52 -0700

 The current direct link to the patch is below.  I hope to try this patch
 out soon, as this bug bothers me when moving mp3 files back and forth to
 my phone over USB with non-ASCII characters in the filenames.
 
 http://btload.googlegroups.com/web/msdosfs.patch?gda=6OJa5z8AAABTKdAk9D4djfQOfSDW4ZV9vKlhdfRkDKO3uYPnaA-gp-toi5oIt3BJMRGeqGBbbj-ccyFKn-rNKC-d1pM_IdV0
 
 or via the google url shortener:
 
 http://goo.gl/CwRn
 

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 17:08:17 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 552A6106566B
	for <freebsd-fs@FreeBSD.ORG>; Thu, 30 Sep 2010 17:08:17 +0000 (UTC)
	(envelope-from olli@lurza.secnetix.de)
Received: from lurza.secnetix.de (lurza.secnetix.de [IPv6:2a01:170:102f::2])
	by mx1.freebsd.org (Postfix) with ESMTP id C0E668FC19
	for <freebsd-fs@FreeBSD.ORG>; Thu, 30 Sep 2010 17:08:16 +0000 (UTC)
Received: from lurza.secnetix.de (localhost [127.0.0.1])
	by lurza.secnetix.de (8.14.3/8.14.3) with ESMTP id o8UH807Q026169;
	Thu, 30 Sep 2010 19:08:15 +0200 (CEST)
	(envelope-from oliver.fromme@secnetix.de)
Received: (from olli@localhost)
	by lurza.secnetix.de (8.14.3/8.14.3/Submit) id o8UH7xAs026168;
	Thu, 30 Sep 2010 19:07:59 +0200 (CEST) (envelope-from olli)
Date: Thu, 30 Sep 2010 19:07:59 +0200 (CEST)
Message-Id: <201009301707.o8UH7xAs026168@lurza.secnetix.de>
From: Oliver Fromme <olli@lurza.secnetix.de>
To: freebsd-fs@FreeBSD.ORG, Jeremy Chadwick <freebsd@jdc.parodius.com>,
	Andriy Gapon <avg@icyb.net.ua>
In-Reply-To: <4CA4B12B.7050307@icyb.net.ua>
X-Newsgroups: list.freebsd-fs
User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX)
	(FreeBSD/6.4-PRERELEASE-20080904 (i386))
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.3.5
	(lurza.secnetix.de [127.0.0.1]);
	Thu, 30 Sep 2010 19:08:15 +0200 (CEST)
Cc: 
Subject: Re: Strange ZFS problem,
	filesystem claims to be full when clearly  not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: freebsd-fs@FreeBSD.ORG, Jeremy Chadwick <freebsd@jdc.parodius.com>,
	Andriy Gapon <avg@icyb.net.ua>
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 17:08:17 -0000

Andriy Gapon wrote:
 > on 30/09/2010 17:48 Jeremy Chadwick said the following:
 > > Can someone explain how use of lsof in this regard is different than use
 > > of fstat(1) like I originally mentioned?  Does lsof do something more
 > > thorough or differently that what fstat does?
 > 
 > I believe that there is no reason to prefer lsof except for those who spent more
 > time with Linux than with FreeBSD.

Last time I had a try at fstat(1), it wasn't able to print
actual file names, while lsof was able to do it.  That's
why I generally prefer lsof over fstat(1).  For most of my
needs fstat(1) is useless if it can't display file names.
(I think DragonFly's fstat(1) can do it, FWIW.)

Of course, in this particular case it might be irrelevant
because the files in question don't have names anymore.

On the other hand, I'm not sure how to use fstat(1) to
identify files with link count zero ...  I'm looking at
the manpage, but maybe it's just too late in the evening.
What command line would you suggest, exactly?  At least
it doesn't seem to be as easy as "lsof +L1".

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht
München, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

It's trivial to make fun of Microsoft products,
but it takes a real man to make them work,
and a God to make them do anything useful.

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 17:08:55 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D5A0F1065675
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 17:08:55 +0000 (UTC)
	(envelope-from torbjoern@gmail.com)
Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com
	[209.85.214.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 630E48FC1A
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 17:08:54 +0000 (UTC)
Received: by bwz15 with SMTP id 15so2045521bwz.13
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 10:08:54 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:content-type
	:content-transfer-encoding;
	bh=Q9C6ztQhJlATbYUMqtzfJH68X3l3NO9+/ezEjA2o9UE=;
	b=T3mN+4uZQfCf4C9C5FwTYprdKCmQCjNBq0pEFnWjT+kRwCWmEwdCi2nwpUIFanq1c6
	tCcSj4f0UApseUDMnPa4L8mSrhmt1v1ypWFu0lpbjHLZ+kVHRF5pX+ygNExRehCXgvjk
	4j0nb7T/5+AKWZkOs8PwUurc3DKeVWjINPw7Y=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:content-type:content-transfer-encoding;
	b=clGNU903fGvzl0vxsIWw5Ifcb2BKQSYRl7S5HfWXYNmyHScY1d91n+5jegVr2G7Lpk
	syQmzNKKBRiWFHsyQZ+9HfVRUZFL98GvWAocZjSz8aC8xv1RXCvNEHJEIOjsAbAXnQcc
	kQ/RwBepbeDi5vcqk6n6szCKLcyr5hyucdpBU=
MIME-Version: 1.0
Received: by 10.204.82.167 with SMTP id b39mr3002759bkl.164.1285866530661;
	Thu, 30 Sep 2010 10:08:50 -0700 (PDT)
Received: by 10.204.71.138 with HTTP; Thu, 30 Sep 2010 10:08:50 -0700 (PDT)
In-Reply-To: <AANLkTinHoxX4MfVCEB2rrdcS1ubwQp+c37uP2BcP2Crm@mail.gmail.com>
References: <4CA45444.6070002@dannysplace.net>
	<201009301438.o8UEckoY019473@lurza.secnetix.de>
	<20100930144845.GA19926@icarus.home.lan>
	<4CA4B12B.7050307@icyb.net.ua>
	<AANLkTinHoxX4MfVCEB2rrdcS1ubwQp+c37uP2BcP2Crm@mail.gmail.com>
Date: Thu, 30 Sep 2010 19:08:50 +0200
Message-ID: <AANLkTimsSpP4nCE18H+QJCS1iKqw-wmoUdCc1OdU1tM2@mail.gmail.com>
From: Torbjorn Kristoffersen <torbjoern@gmail.com>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Subject: Fwd: Strange ZFS problem, filesystem claims to be full when clearly
 not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 17:08:56 -0000

On Thu, Sep 30, 2010 at 5:47 PM, Andriy Gapon <avg@icyb.net.ua> wrote:
> on 30/09/2010 17:48 Jeremy Chadwick said the following:
>> On Thu, Sep 30, 2010 at 04:38:46PM +0200, Oliver Fromme wrote:
>>> Danny Carroll <fbsd@dannysplace.net> wrote:
>>>  > [...]
>>>  > It certainly smells like a process still writing to a file that is unlinked.
>>>  > I wonder if it would show up with lsof.
>>>
>>> If it's a file that was unlinked that is still held open by
>>> a process, then lsof will definitely list it.  The command
>>>
>>> # lsof +L1
>>>
>>> lists all open files with a link count of zero.  You can
>>> restrict it to a certain file system like this:
>>>
>>> # lsof +aL1 /var
>>>
>>> Of course, lsof won't list the file name because the file
>>> doesn't have a name anymore.  But it lists the process by
>>> name, PID and user, the file system and the file size.
>>
>> Can someone explain how use of lsof in this regard is different than use
>> of fstat(1) like I originally mentioned?  Does lsof do something more
>> thorough or differently that what fstat does?
>
> I believe that there is no reason to prefer lsof except for those who spent more
> time with Linux than with FreeBSD.
>

I tried fstat earlier and now I tried lsof as suggested.  Doing lsof
+L1 only gave me:
COMMAND  PID  USER   FD   TYPE DEVICE SIZE/OFF NLINK   NODE NAME
mysqld  1030 mysql    4u  VREG   0,99        0     0 800965 / (/dev/mirror/root)
mysqld  1030 mysql    5u  VREG   0,99        0     0 800969 / (/dev/mirror/root)
mysqld  1030 mysql    6u  VREG   0,99        0     0 800970 / (/dev/mirror/root)
....

Basically, it only lists mysqld, which runs outside the jails.
Nothing else was listed.

I noticed that the filesystem has stopped growing now, though, which
may also be why lsof no longer shows anything.  "du -sh /jails/rb.org"
still reports a low usage value.  Also, this is the output from df -h
(I've since resized the ZFS quota to make the filesystem bigger for this
jail):

tpool/rb.org      200G    111G     89G    56%    /jails/rb.org

If the process causing this is gone, or is now working correctly (which
I hope, seeing that the fs is no longer growing), can dead unlinked
files still remain?  Is there a way to purge them?

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 17:23:08 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F13751065695
	for <freebsd-fs@FreeBSD.ORG>; Thu, 30 Sep 2010 17:23:08 +0000 (UTC)
	(envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 462238FC32
	for <freebsd-fs@FreeBSD.ORG>; Thu, 30 Sep 2010 17:23:07 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA25971;
	Thu, 30 Sep 2010 20:23:06 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4CA4C77A.2030807@icyb.net.ua>
Date: Thu, 30 Sep 2010 20:23:06 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: freebsd-fs@FreeBSD.ORG, Jeremy Chadwick <freebsd@jdc.parodius.com>,
	Andriy Gapon <avg@icyb.net.ua>, Oliver Fromme <olli@lurza.secnetix.de>
References: <201009301707.o8UH7xAs026168@lurza.secnetix.de>
In-Reply-To: <201009301707.o8UH7xAs026168@lurza.secnetix.de>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly
 not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 17:23:09 -0000

on 30/09/2010 20:07 Oliver Fromme said the following:
> Last time I had a try at fstat(1), it wasn't able to print
> actual file names, while lsof was able to do it.  That's
> why I generally prefer lsof over fstat(1).  For most of my
> needs fstat(1) is useless if it can't display file names.
> (I think DragonFly's fstat(1) can do it, FWIW.)

Point taken.
However fstat still does print inode numbers.

> Of course, in this particular case it might be irrelevant
> because the files in questions don't have names anymore.

Right.

> On the other hand, I'm not sure how to use fstat(1) to
> identify files with link count zero ...  I'm looking at
> the manpage, but maybe it's just too late in the evening.
> What command line would you suggest, exactly?  At least
> it doesn't seem to be as easy as "lsof +L1".

Well, I am a believer in the Unix way: each tool does its own small job, and you
combine the tools to get a big job done.  One tool that does everything with a
million obscure options does not appeal to me.  But that's me.

And in this particular case what you ask is irrelevant.
We just need to find all processes that have open files on a particular filesystem.

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 18:55:05 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2DFCE1065695
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 18:55:05 +0000 (UTC)
	(envelope-from torbjoern@gmail.com)
Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com
	[209.85.214.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 9E5AB8FC36
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 18:55:04 +0000 (UTC)
Received: by bwz15 with SMTP id 15so2166078bwz.13
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 11:55:03 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:content-type
	:content-transfer-encoding;
	bh=Y6/wjsINFVgE26fthiBfoxxXxPmrPOlO26HFd6ie9/o=;
	b=xbXJ7YAPmd8MtY7qdKdp+4RX9K5GNLahFChupuEtASuAlUIPiCUEt/6hQkXQLVvR7V
	tJreIjwehCIRevZBKhxDgqgGU5Njn9+w3i+08LxRT+VC60JkeYbWnm06TS8iyDXeaMb8
	OBG0mr/vGVqrfr+VumwpokpMiS3JSi3qhVf1A=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:content-type:content-transfer-encoding;
	b=sUqVae0Qh5Y3ODu1e20SFoAuUzUvkOhfDBEuRdWKhdhL8YlVbsJvvt1UPKhQosS/jb
	xND2IWWl8m6PSY4u3BY5h/X1NcI+1vLP4SIO3DFP6fGRpEhTaroqD7VCPlN0prxBlBix
	3KawK+L/cRcaV16n9/infRjI/UyMPco0ZGHmc=
MIME-Version: 1.0
Received: by 10.204.85.90 with SMTP id n26mr3155922bkl.109.1285872903016; Thu,
	30 Sep 2010 11:55:03 -0700 (PDT)
Received: by 10.204.71.138 with HTTP; Thu, 30 Sep 2010 11:55:02 -0700 (PDT)
In-Reply-To: <4CA4C77A.2030807@icyb.net.ua>
References: <201009301707.o8UH7xAs026168@lurza.secnetix.de>
	<4CA4C77A.2030807@icyb.net.ua>
Date: Thu, 30 Sep 2010 20:55:02 +0200
Message-ID: <AANLkTikrf1V80gawDnAShqFW79zwkVzTKZEdKt_N9kgV@mail.gmail.com>
From: Torbjorn Kristoffersen <torbjoern@gmail.com>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly
 not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 18:55:05 -0000

On Thu, Sep 30, 2010 at 7:23 PM, Andriy Gapon <avg@icyb.net.ua> wrote:
> on 30/09/2010 20:07 Oliver Fromme said the following:
>> Last time I had a try at fstat(1), it wasn't able to print
>> actual file names, while lsof was able to do it.  That's
>> why I generally prefer lsof over fstat(1).  For most of my
>> needs fstat(1) is useless if it can't display file names.
>> (I think DragonFly's fstat(1) can do it, FWIW.)
>
> Point taken.
> However fstat still does print inode numbers.

Here's some news, I finally found a file in a user's .spamassassin directory.

$ ls -l .spamassassin/
total 39877936
-rw-------  1 gg  gg      76546048 Sep 30 01:13 auto-whitelist
-rw-------  1 gg  gg            48 Sep 30 01:51 bayes.lock
-rw-------  1 gg  gg      20840448 Sep 30 01:13 bayes_seen
----------  1 gg  gg  552902721536 Sep 30 01:52 temp
-rw-------  1 gg  gg          1573 Sep 30 07:51 user_prefs


Now that is an incredibly huge (and invalid) file! Something like
514GB, far more than the size of this ZFS filesystem.
I removed it, and there was no visible effect in df.  Some funny
business must be happening with spamassassin though,
otherwise this strange file would not be so huge.
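
The numbers in the listing above can be compared directly.  The following
is only a back-of-the-envelope sketch: it assumes that the "total" line
printed by ls counts 512-byte blocks (the usual default when BLOCKSIZE is
not set), which the message itself does not state.  Under that assumption
the apparent size of "temp" is about 515 GiB, while everything in the
directory together occupies only about 19 GiB on disk, which would be
consistent with a file that consists mostly of holes.

```python
# Figures copied from the "ls -l .spamassassin/" listing above.
apparent_bytes = 552902721536   # size column of the file named "temp"
total_blocks = 39877936         # the "total" line of the same listing

# ASSUMPTION (not stated in the message): ls counts 512-byte blocks.
block_size = 512

apparent_gib = apparent_bytes / 2**30
allocated_gib = total_blocks * block_size / 2**30

print(f"apparent size of temp     : {apparent_gib:.1f} GiB")
print(f"allocated in the directory: {allocated_gib:.1f} GiB")
```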

I then checked the entire filesystem for files that show up as very
large in 'ls', but freak-sized files came up.

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 20:52:14 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D61D71065673;
	Thu, 30 Sep 2010 20:52:14 +0000 (UTC)
	(envelope-from arundel@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id AC7DA8FC0A;
	Thu, 30 Sep 2010 20:52:14 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o8UKqENw055347;
	Thu, 30 Sep 2010 20:52:14 GMT
	(envelope-from arundel@freefall.freebsd.org)
Received: (from arundel@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o8UKqD9W055342;
	Thu, 30 Sep 2010 20:52:14 GMT (envelope-from arundel)
Date: Thu, 30 Sep 2010 20:52:14 GMT
Message-Id: <201009302052.o8UKqD9W055342@freefall.freebsd.org>
To: postmaster@uni-bielefeld.de, arundel@FreeBSD.org, freebsd-fs@FreeBSD.org
From: arundel@FreeBSD.org
Cc: 
Subject: Re: kern/115645: [ffs] [snapshots] [panic] lockmgr: thread
	0xc4c00d80, not exclusive lock holder 0xc4dd7c00 unlocking
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 20:52:14 -0000

Synopsis: [ffs] [snapshots] [panic] lockmgr: thread 0xc4c00d80, not exclusive lock holder 0xc4dd7c00 unlocking

State-Changed-From-To: open->feedback
State-Changed-By: arundel
State-Changed-When: Thu Sep 30 20:50:12 UTC 2010
State-Changed-Why: 
Can you still reproduce this PR with a more recent 6.X or 7.X release?
Please note that RELENG_6 went EoL a few weeks ago.

http://www.freebsd.org/cgi/query-pr.cgi?pr=115645

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 22:15:26 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2712C106566B
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 22:15:26 +0000 (UTC)
	(envelope-from fbsd@dannysplace.net)
Received: from mailgw.dannysplace.net (mailgw.dannysplace.net [204.109.56.184])
	by mx1.freebsd.org (Postfix) with ESMTP id E626C8FC24
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 22:15:25 +0000 (UTC)
Received: from [203.206.171.212] (helo=[192.168.10.10])
	by mailgw.dannysplace.net with esmtpsa (TLSv1:CAMELLIA256-SHA:256)
	(Exim 4.72 (FreeBSD)) (envelope-from <fbsd@dannysplace.net>)
	id 1P1RPs-0001EC-1m
	for freebsd-fs@freebsd.org; Fri, 01 Oct 2010 08:15:57 +1000
Message-ID: <4CA50BF1.60503@dannysplace.net>
Date: Fri, 01 Oct 2010 08:15:13 +1000
From: Danny Carroll <fbsd@dannysplace.net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB;
	rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <4CA45444.6070002@dannysplace.net>	<201009301438.o8UEckoY019473@lurza.secnetix.de>	<20100930144845.GA19926@icarus.home.lan>	<4CA4B12B.7050307@icyb.net.ua>	<AANLkTinHoxX4MfVCEB2rrdcS1ubwQp+c37uP2BcP2Crm@mail.gmail.com>
	<AANLkTimsSpP4nCE18H+QJCS1iKqw-wmoUdCc1OdU1tM2@mail.gmail.com>
In-Reply-To: <AANLkTimsSpP4nCE18H+QJCS1iKqw-wmoUdCc1OdU1tM2@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Authenticated-User: danny
X-Authenticator: plain
X-Exim-Version: 4.72 (build at 12-Jul-2010 18:31:29)
X-Date: 2010-10-01 08:15:52
X-Connected-IP: 203.206.171.212:51791
X-Message-Linecount: 37
X-Body-Linecount: 23
X-Message-Size: 1795
X-Body-Size: 934
X-Received-Count: 1
X-Recipient-Count: 1
X-Local-Recipient-Count: 1
X-Local-Recipient-Defer-Count: 0
X-Local-Recipient-Fail-Count: 0
X-SA-Exim-Connect-IP: 203.206.171.212
X-SA-Exim-Mail-From: fbsd@dannysplace.net
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	damka.dannysplace.net
X-Spam-Level: 
X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00
	autolearn=ham version=3.3.1
X-SA-Exim-Version: 4.2
X-SA-Exim-Scanned: Yes (on mailgw.dannysplace.net)
Subject: Re: Fwd: Strange ZFS problem,
 filesystem claims to be full when clearly not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: fbsd@dannysplace.net
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 22:15:26 -0000

 On 1/10/2010 3:08 AM, Torbjorn Kristoffersen wrote:
>
> If the process causing this is gone, or is working correctly (seeing
> that the fs is no longer growing, I hope),
> can dead unlinked files still remain, is there a way to purge them?

I can't remember exactly what happens, and it's probably different for
each flavour of Unix and *nix.
If a file is deleted, then the directory entry for the inode is
unlinked.  If it's the last link to that inode, then usually that inode
is freed.

But when a process holds a handle to a file when it's deleted, then the
reclaim does not happen AFAIK until *after* the file handle is closed.
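
The behaviour described above can be observed from user space.  This is
a minimal sketch, assuming a POSIX system and a scratch file created
with tempfile (the file and its 4096-byte size are purely illustrative
and have nothing to do with the jail in question).  After the unlink the
name is gone and the link count is zero, but the data stays readable
through the open descriptor until it is closed.

```python
import os
import tempfile

# Hypothetical scratch file; its name and size are illustrative only.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 4096)

os.unlink(path)                    # remove the last directory entry
st = os.fstat(fd)                  # the inode is still reachable via fd

name_gone = not os.path.exists(path)
links_after_unlink = st.st_nlink   # 0: the kind of file "lsof +L1" lists
size_after_unlink = st.st_size     # the data is still there

os.lseek(fd, 0, os.SEEK_SET)
still_readable = os.read(fd, 4) == b"xxxx"

os.close(fd)                       # only now can the space be reclaimed
```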

<speculation>
I wonder what happens when a file handle is open for writing and
someone else comes along and truncates the file.
Perhaps the seek position of the open handle is reset to 0, or perhaps
(not likely) a write operation after truncation would result in an error.
</speculation>

-D

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 30 23:31:11 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0E0A8106566B
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 23:31:11 +0000 (UTC)
	(envelope-from markus.gebert@hostpoint.ch)
Received: from mail.adm.hostpoint.ch (mail.adm.hostpoint.ch [217.26.48.124])
	by mx1.freebsd.org (Postfix) with ESMTP id BDD8B8FC0A
	for <freebsd-fs@freebsd.org>; Thu, 30 Sep 2010 23:31:10 +0000 (UTC)
Received: from 46-127-29-79.dclient.hispeed.ch ([46.127.29.79]:36814
	helo=[172.16.1.3])
	by mail.adm.hostpoint.ch with esmtpsa (TLSv1:AES128-SHA:128)
	(Exim 4.69 (FreeBSD)) (envelope-from <markus.gebert@hostpoint.ch>)
	id 1P1Saj-00074T-6Z; Fri, 01 Oct 2010 01:31:09 +0200
Mime-Version: 1.0 (Apple Message framework v1081)
Content-Type: text/plain; charset=us-ascii
From: Markus Gebert <markus.gebert@hostpoint.ch>
In-Reply-To: <4CA50BF1.60503@dannysplace.net>
Date: Fri, 1 Oct 2010 01:31:08 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <8762A442-5027-48E4-B51F-73F29658CA2F@hostpoint.ch>
References: <4CA45444.6070002@dannysplace.net>	<201009301438.o8UEckoY019473@lurza.secnetix.de>	<20100930144845.GA19926@icarus.home.lan>	<4CA4B12B.7050307@icyb.net.ua>	<AANLkTinHoxX4MfVCEB2rrdcS1ubwQp+c37uP2BcP2Crm@mail.gmail.com>
	<AANLkTimsSpP4nCE18H+QJCS1iKqw-wmoUdCc1OdU1tM2@mail.gmail.com>
	<4CA50BF1.60503@dannysplace.net>
To: fbsd@dannysplace.net
X-Mailer: Apple Mail (2.1081)
Cc: freebsd-fs@freebsd.org
Subject: Re: Strange ZFS problem,
	filesystem claims to be full when clearly not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2010 23:31:11 -0000


On 01.10.2010, at 00:15, Danny Carroll wrote:

>>
>> If the process causing this is gone, or is working correctly (seeing
>> that the fs is no longer growing, I hope),
>> can dead unlinked files still remain, is there a way to purge them?
>
> I can't remember exactly what happens and it's probably different for
> each flavour of unix and *nux.
> If a file is deleted, then the directory entry for the inode is
> de-linked.   If it's the last link to that inode then usually that inode
> is freed.
>
> But when a process holds a handle to a file when it's deleted, then the
> reclaim does not happen AFAIK until *after* the file handle is closed.
>
> <speculation>
> I wonder what happens when, if a file handle is opened for writing,
> someone else comes along and truncates the file.
> Perhaps a the seek position of the open handle is reset to 0, or perhaps
> (not likely) a write operation after truncation would result in an error.
> </speculation>

AFAIK the file handle offset won't get reset to anything unless O_APPEND
was used to open the file (maybe there are other special cases).  In
either case, the write will _not_ fail due to an offset beyond EOF;
instead a hole is created and the new data gets written after that (see
lseek(2)).

The hole won't use disk space (as shown by df or zfs list), but is
considered part of the file size (as shown by ls).  In other words,
truncating might free disk space, no matter what offsets other
file handles have.
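
The offset behaviour described above can be reproduced with a few
system calls.  This is a minimal sketch, assuming a POSIX system; the
scratch directory, the file name "temp" and the 1 MiB size are
illustrative choices, not details from the thread.  It checks only
sizes and offsets; whether the hole really occupies no disk space
depends on the filesystem and is not asserted here.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()            # hypothetical scratch directory
path = os.path.join(workdir, "temp")

# A writer opens the file WITHOUT O_APPEND and writes 1 MiB.
writer = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
os.write(writer, b"a" * (1 << 20))

# Someone else truncates the file to zero length.
os.truncate(path, 0)
size_after_truncate = os.stat(path).st_size

# The writer's offset is unchanged, so the next write lands 1 MiB in
# and everything before it becomes a hole that reads back as zeros.
offset_before_write = os.lseek(writer, 0, os.SEEK_CUR)
os.write(writer, b"tail")
size_after_write = os.stat(path).st_size
with open(path, "rb") as f:
    hole_start = f.read(16)

# With O_APPEND each write goes to the current end of file instead.
appender = os.open(path, os.O_WRONLY | os.O_APPEND)
os.truncate(path, 0)
os.write(appender, b"tail")
size_with_append = os.stat(path).st_size

os.close(writer)
os.close(appender)
os.remove(path)
os.rmdir(workdir)
```

Comparing st_size with st_blocks * 512 from os.stat() on the same file
is one way to see how much of the apparent size is actually allocated.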

However, I don't see the point here.  If the OP knows the file, he may as
well delete it to free disk space.  If he doesn't, or it's inaccessible
(deleted but referenced), truncating isn't an option.


Markus


From owner-freebsd-fs@FreeBSD.ORG  Fri Oct  1 00:49:03 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 838831065694;
	Fri,  1 Oct 2010 00:49:03 +0000 (UTC)
	(envelope-from linimon@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 5A23B8FC15;
	Fri,  1 Oct 2010 00:49:03 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o910n3G6094328;
	Fri, 1 Oct 2010 00:49:03 GMT
	(envelope-from linimon@freefall.freebsd.org)
Received: (from linimon@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o910n38N094324;
	Fri, 1 Oct 2010 00:49:03 GMT (envelope-from linimon)
Date: Fri, 1 Oct 2010 00:49:03 GMT
Message-Id: <201010010049.o910n38N094324@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: linimon@FreeBSD.org
Cc: 
Subject: Re: kern/151082: [zfs] [patch] sappend-flaged files on ZFS not
	working correctly
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2010 00:49:03 -0000

Old Synopsis: [patch] sappend-flaged files on ZFS not working correctly
New Synopsis: [zfs] [patch] sappend-flaged files on ZFS not working correctly

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Fri Oct 1 00:48:32 UTC 2010
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=151082

From owner-freebsd-fs@FreeBSD.ORG  Fri Oct  1 00:55:05 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A72091065679
	for <freebsd-fs@freebsd.org>; Fri,  1 Oct 2010 00:55:05 +0000 (UTC)
	(envelope-from fbsd@dannysplace.net)
Received: from mailgw.dannysplace.net (mailgw.dannysplace.net [204.109.56.184])
	by mx1.freebsd.org (Postfix) with ESMTP id 6C5508FC13
	for <freebsd-fs@freebsd.org>; Fri,  1 Oct 2010 00:55:05 +0000 (UTC)
Received: from [203.206.171.212] (helo=[192.168.10.10])
	by mailgw.dannysplace.net with esmtpsa (TLSv1:CAMELLIA256-SHA:256)
	(Exim 4.72 (FreeBSD)) (envelope-from <fbsd@dannysplace.net>)
	id 1P1TuQ-00064T-Gp; Fri, 01 Oct 2010 10:55:36 +1000
Message-ID: <4CA5315F.70105@dannysplace.net>
Date: Fri, 01 Oct 2010 10:54:55 +1000
From: Danny Carroll <fbsd@dannysplace.net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB;
	rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Markus Gebert <markus.gebert@hostpoint.ch>
References: <4CA45444.6070002@dannysplace.net>	<201009301438.o8UEckoY019473@lurza.secnetix.de>	<20100930144845.GA19926@icarus.home.lan>	<4CA4B12B.7050307@icyb.net.ua>	<AANLkTinHoxX4MfVCEB2rrdcS1ubwQp+c37uP2BcP2Crm@mail.gmail.com>
	<AANLkTimsSpP4nCE18H+QJCS1iKqw-wmoUdCc1OdU1tM2@mail.gmail.com>
	<4CA50BF1.60503@dannysplace.net>
	<8762A442-5027-48E4-B51F-73F29658CA2F@hostpoint.ch>
In-Reply-To: <8762A442-5027-48E4-B51F-73F29658CA2F@hostpoint.ch>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Authenticated-User: danny
X-Authenticator: plain
X-Exim-Version: 4.72 (build at 12-Jul-2010 18:31:29)
X-Date: 2010-10-01 10:55:34
X-Connected-IP: 203.206.171.212:65454
X-Message-Linecount: 22
X-Body-Linecount: 7
X-Message-Size: 1336
X-Body-Size: 359
X-Received-Count: 1
X-Recipient-Count: 2
X-Local-Recipient-Count: 2
X-Local-Recipient-Defer-Count: 0
X-Local-Recipient-Fail-Count: 0
X-SA-Exim-Connect-IP: 203.206.171.212
X-SA-Exim-Mail-From: fbsd@dannysplace.net
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	damka.dannysplace.net
X-Spam-Level: 
X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00
	autolearn=ham version=3.3.1
X-SA-Exim-Version: 4.2
X-SA-Exim-Scanned: Yes (on mailgw.dannysplace.net)
Cc: freebsd-fs@freebsd.org
Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly
 not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: fbsd@dannysplace.net
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2010 00:55:05 -0000

On 1/10/2010 9:31 AM, Markus Gebert wrote:
> However I don't see the point here. If the OP knows the file, he may as well delete it to free disk space. If he doesn't, or it's inaccessible (deleted but referenced), truncating isn't an option.

Yeah.   I was just thinking about what might happen in certain
situations.   Definitely not relevant to the OP.

-D

From owner-freebsd-fs@FreeBSD.ORG  Fri Oct  1 04:25:42 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 63A051065673
	for <freebsd-fs@freebsd.org>; Fri,  1 Oct 2010 04:25:42 +0000 (UTC)
	(envelope-from gpalmer@freebsd.org)
Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1])
	by mx1.freebsd.org (Postfix) with ESMTP id 315D78FC13
	for <freebsd-fs@freebsd.org>; Fri,  1 Oct 2010 04:25:42 +0000 (UTC)
Received: from gjp by noop.in-addr.com with local (Exim 4.54 (FreeBSD))
	id 1P1XBk-0009QM-CQ; Fri, 01 Oct 2010 00:25:40 -0400
Date: Fri, 1 Oct 2010 00:25:40 -0400
From: Gary Palmer <gpalmer@freebsd.org>
To: Alexander Leidinger <Alexander@Leidinger.net>
Message-ID: <20101001042540.GA48601@in-addr.com>
References: <AANLkTimRmBi=th1oia5ZuKcEtLR+YjK04KNYeZhu931A@mail.gmail.com>
	<20100929192534.GA97031@icarus.home.lan>
	<AANLkTi=q6adZv57mwNZVivOwLsfXBjVHki7tzP6-jD0G@mail.gmail.com>
	<AANLkTikqX1Y3qbcdr-2six+Pwv61k-Exwh142w5FFqbS@mail.gmail.com>
	<20100929221549.GA343@icarus.home.lan>
	<20100930103647.62193lbkp9yqx5k4@webmail.leidinger.net>
	<4CA45444.6070002@dannysplace.net>
	<AANLkTi=x5irhAM8uhiZJLztE230=Q9CAMDeja=Bo4fVL@mail.gmail.com>
	<20100930163406.330767vpzidjygow@webmail.leidinger.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20100930163406.330767vpzidjygow@webmail.leidinger.net>
Cc: freebsd-fs@freebsd.org
Subject: Re: Strange ZFS problem,
	filesystem claims to be full when clearly not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2010 04:25:42 -0000

On Thu, Sep 30, 2010 at 04:34:06PM +0200, Alexander Leidinger wrote:
> Quoting Torbjorn Kristoffersen <torbjoern@gmail.com> (from Thu, 30 Sep  
> 2010 15:28:25 +0200):
> 
> >That could very well be.  Interestingly, dtrace is not installed and
> >doesn't even load.  When I do
> >kldload dtraceall it says:
> >
> >    kldload: can't load dtraceall: Exec format error
> >
> >Perhaps I should recompile the kernel on this server, and build in
> >Dtrace into the kernel.  Perhaps I should first update to
> >FreeBSD-STABLE, as it is more cutting edge?
> >
> >Actually, I'll first do a complete backup of this jail, remove the zfs
> >filesystem, then re-create it, put the files back, and see what
> >happens.  The unfortunate thing is that I will be ruining a chance to
> >find out what really happened.
> 
> I would give lsof a try first. Installing it from ports or packages is  
> not as much time consuming as updating the server, and may pinpoint  
> the problem.
> 
> Bye,
> Alexander.

It might be worth running

ktrace -C

as root.  I do not believe ktrace output files show up in lsof or fstat.
It seems at least theoretically possible that a ktrace output file has
been deleted so it no longer shows up in ls/du but the trace is ongoing.

Regards,

Gary

From owner-freebsd-fs@FreeBSD.ORG  Fri Oct  1 07:03:54 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5D7A71065670
	for <freebsd-fs@FreeBSD.ORG>; Fri,  1 Oct 2010 07:03:54 +0000 (UTC)
	(envelope-from olli@lurza.secnetix.de)
Received: from lurza.secnetix.de (lurza.secnetix.de [IPv6:2a01:170:102f::2])
	by mx1.freebsd.org (Postfix) with ESMTP id D154A8FC13
	for <freebsd-fs@FreeBSD.ORG>; Fri,  1 Oct 2010 07:03:53 +0000 (UTC)
Received: from lurza.secnetix.de (localhost [127.0.0.1])
	by lurza.secnetix.de (8.14.3/8.14.3) with ESMTP id o9173blT060480;
	Fri, 1 Oct 2010 09:03:52 +0200 (CEST)
	(envelope-from oliver.fromme@secnetix.de)
Received: (from olli@localhost)
	by lurza.secnetix.de (8.14.3/8.14.3/Submit) id o9173bgq060479;
	Fri, 1 Oct 2010 09:03:37 +0200 (CEST) (envelope-from olli)
Date: Fri, 1 Oct 2010 09:03:37 +0200 (CEST)
Message-Id: <201010010703.o9173bgq060479@lurza.secnetix.de>
From: Oliver Fromme <olli@lurza.secnetix.de>
To: freebsd-fs@FreeBSD.ORG, torbjoern@gmail.com
In-Reply-To: <AANLkTikrf1V80gawDnAShqFW79zwkVzTKZEdKt_N9kgV@mail.gmail.com>
X-Newsgroups: list.freebsd-fs
User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX)
	(FreeBSD/6.4-PRERELEASE-20080904 (i386))
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.3.5
	(lurza.secnetix.de [127.0.0.1]);
	Fri, 01 Oct 2010 09:03:52 +0200 (CEST)
Cc: 
Subject: Re: Strange ZFS problem,
	filesystem claims to be full when clearly  not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: freebsd-fs@FreeBSD.ORG, torbjoern@gmail.com
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2010 07:03:54 -0000

Torbjorn Kristoffersen wrote:
 > Here's some news, I finally found a file in a user's .spamassassin directory.
 > 
 > $ ls -l .spamassassin/
 > total 39877936
 > -rw-------  1 gg  gg      76546048 Sep 30 01:13 auto-whitelist
 > -rw-------  1 gg  gg            48 Sep 30 01:51 bayes.lock
 > -rw-------  1 gg  gg      20840448 Sep 30 01:13 bayes_seen
 > ----------  1 gg  gg  552902721536 Sep 30 01:52 temp
 > -rw-------  1 gg  gg          1573 Sep 30 07:51 user_prefs
 > 
 > 
 > Now that is an incredibly huge (and invalid) file! Something like
 > 514GB, far more than the size of this ZFS filesystem.
 > I removed it, and there was no visible effect in df.  Some funny
 > business must be happening with spamassassin though,
 > otherwise this strange file would not be so huge.

Probably a so-called sparse file, i.e. a file with "holes"
that don't actually occupy disk space.  "ls -ls" will print
the number of blocks actually allocated to the file on disk.

I once wrote a script that calculates the "sparseness" of
files.  It's designed for UFS/UFS2.  The output will be
inaccurate for ZFS, but it should still give a rough number.

http://www.secnetix.de/olli/scripts/sparsecheck
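
For illustration only (this is not the sparsecheck script above), here is a
minimal C sketch of the same idea: create a file that is mostly a hole, then
compare its apparent size with what is allocated. The function names are
made up for this example, and it assumes st_blocks counts 512-byte units, as
stat(2) documents on FreeBSD. On ZFS, compression can also lower the
allocated figure, so treat the ratio as a rough hint.

```c
#define _XOPEN_SOURCE 700	/* expose POSIX/XSI declarations under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a file whose apparent size is `size` bytes but which has only
 * one byte written, at the very end.  Everything before it is a hole.
 * Returns 0 on success, -1 on error.
 */
int
make_sparse_file(const char *path, off_t size)
{
	int fd;

	assert(path != NULL && size > 0);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);
	/* Seek past the end, then write a single NUL byte. */
	if (lseek(fd, size - 1, SEEK_SET) == (off_t)-1 ||
	    write(fd, "", 1) != 1) {
		close(fd);
		return (-1);
	}
	return (close(fd));
}

/*
 * Return allocated bytes divided by apparent size, or -1.0 on error.
 * Assumption: st_blocks is in 512-byte units.
 */
double
allocated_ratio(const char *path)
{
	struct stat sb;

	assert(path != NULL);
	if (stat(path, &sb) != 0)
		return (-1.0);
	if (sb.st_size == 0)
		return (1.0);	/* empty file: nothing to be sparse */
	return ((double)sb.st_blocks * 512.0 / (double)sb.st_size);
}
```

A ratio far below 1.0 suggests a sparse (or well-compressed) file; this is
the same comparison "ls -ls" lets you do by eye.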

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht
München, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

"I invented Ctrl-Alt-Delete, but Bill Gates made it famous."
        -- David Bradley, original IBM PC design team

From owner-freebsd-fs@FreeBSD.ORG  Fri Oct  1 07:41:14 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 355CF106564A
	for <freebsd-fs@freebsd.org>; Fri,  1 Oct 2010 07:41:14 +0000 (UTC)
	(envelope-from torbjoern@gmail.com)
Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com
	[209.85.214.54])
	by mx1.freebsd.org (Postfix) with ESMTP id B062A8FC08
	for <freebsd-fs@freebsd.org>; Fri,  1 Oct 2010 07:41:13 +0000 (UTC)
Received: by bwz15 with SMTP id 15so2640939bwz.13
	for <freebsd-fs@freebsd.org>; Fri, 01 Oct 2010 00:41:12 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:content-type
	:content-transfer-encoding;
	bh=X1kVRmhfrQMJ8KswUQoqdhcXu7cM4wnwl8DOTdyub8E=;
	b=lZtVhXuuz3VaTS794CkwxCriJwHMcCkMbNPJvYrKL8hmqF/Q23il7RjVR1ohyHoV5t
	aOcEzztHxASJ4u5QmaFIbQwqvxP8DPYWiI0MU+Wv1mzkoHVWbjrLEncRN9Dktem+Zh31
	vWBFgewU42bO5PtjaR4py0NvElFIPa8MDMxyA=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:content-type:content-transfer-encoding;
	b=D+lAiC1yjQsADksCVv/EvuAaYjw8Pwb6IljB535XOzv7MQuPnR8HXFMe8Nsvivcz/T
	HUbHHR6EsEA/VOA40ZMeQ3IhqIZBnCHbFTMAmhKr/p1RASUBImMUmeHl122M8DKHFLPd
	I85xoTITBqS87AA09o8DwO3U0/EoqCy0OuqUg=
MIME-Version: 1.0
Received: by 10.204.117.13 with SMTP id o13mr3722797bkq.48.1285918872481; Fri,
	01 Oct 2010 00:41:12 -0700 (PDT)
Received: by 10.204.71.138 with HTTP; Fri, 1 Oct 2010 00:41:12 -0700 (PDT)
In-Reply-To: <20101001042540.GA48601@in-addr.com>
References: <AANLkTimRmBi=th1oia5ZuKcEtLR+YjK04KNYeZhu931A@mail.gmail.com>
	<20100929192534.GA97031@icarus.home.lan>
	<AANLkTi=q6adZv57mwNZVivOwLsfXBjVHki7tzP6-jD0G@mail.gmail.com>
	<AANLkTikqX1Y3qbcdr-2six+Pwv61k-Exwh142w5FFqbS@mail.gmail.com>
	<20100929221549.GA343@icarus.home.lan>
	<20100930103647.62193lbkp9yqx5k4@webmail.leidinger.net>
	<4CA45444.6070002@dannysplace.net>
	<AANLkTi=x5irhAM8uhiZJLztE230=Q9CAMDeja=Bo4fVL@mail.gmail.com>
	<20100930163406.330767vpzidjygow@webmail.leidinger.net>
	<20101001042540.GA48601@in-addr.com>
Date: Fri, 1 Oct 2010 09:41:12 +0200
Message-ID: <AANLkTi=q0R4MdD+ho6LRzS_+XK1pRJyBPA3gnJGSnqHx@mail.gmail.com>
From: Torbjorn Kristoffersen <torbjoern@gmail.com>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly
 not full
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2010 07:41:14 -0000

On Fri, Oct 1, 2010 at 6:25 AM, Gary Palmer <gpalmer@freebsd.org> wrote:
> On Thu, Sep 30, 2010 at 04:34:06PM +0200, Alexander Leidinger wrote:
>> Quoting Torbjorn Kristoffersen <torbjoern@gmail.com> (from Thu, 30 Sep
>> 2010 15:28:25 +0200):
>>
>> >That could very well be.  Interestingly, dtrace is not installed and
>> >doesn't even load.  When I do
>> >kldload dtraceall it says:
>> >
>> >    kldload: can't load dtraceall: Exec format error
>> >
>> >Perhaps I should recompile the kernel on this server, and build in
>> >Dtrace into the kernel.  Perhaps I should first update to
>> >FreeBSD-STABLE, as it is more cutting edge?
>> >
>> >Actually, I'll first do a complete backup of this jail, remove the zfs
>> >filesystem, then re-create it, put the files back, and see what
>> >happens.  The unfortunate thing is that I will be ruining a chance to
>> >find out what really happened.
>>
>> I would give lsof a try first. Installing it from ports or packages is
>> not as much time consuming as updating the server, and may pinpoint
>> the problem.
>>
>> Bye,
>> Alexander.
>
> It might be worth running
>
> ktrace -C
>
> as root.  I do not believe ktrace output files show up in lsof or fstat.
> It seems at least theoretically possible that a ktrace output file has
> been deleted so it no longer shows up in ls/du but the trace is ongoing.
>

Very unlikely, as I'm the only admin outside the jails.  But I did it
regardless; you never know if I started a ktrace in my sleep :-)

From owner-freebsd-fs@FreeBSD.ORG  Fri Oct  1 16:42:09 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C31831065674
	for <fs@freebsd.org>; Fri,  1 Oct 2010 16:42:09 +0000 (UTC)
	(envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id EB4AE8FC19
	for <fs@freebsd.org>; Fri,  1 Oct 2010 16:42:08 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA15026;
	Fri, 01 Oct 2010 19:42:06 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4CA60F5D.2070308@icyb.net.ua>
Date: Fri, 01 Oct 2010 19:42:05 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Kostik Belousov <kostikbel@gmail.com>
References: <4CA1D06C.9050305@digiware.nl>
	<20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan>
	<4CA1DDE9.8090107@icyb.net.ua>
	<20100928132355.GA63149@icarus.home.lan>
	<4CA1EF69.4040402@icyb.net.ua>
	<FE116FEC-714D-4BF5-86D8-E29BFA713C69@wanderview.com>
	<4CA21809.7090504@icyb.net.ua>
	<71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com>
	<4CA22337.2010900@icyb.net.ua>
	<20100928181327.GS43070@deviant.kiev.zoral.com.ua>
In-Reply-To: <20100928181327.GS43070@deviant.kiev.zoral.com.ua>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=KOI8-U
Content-Transfer-Encoding: 7bit
Cc: fs@freebsd.org
Subject: Re: Still getting kmem exhausted panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2010 16:42:09 -0000

on 28/09/2010 21:13 Kostik Belousov said the following:
> On Tue, Sep 28, 2010 at 08:17:43PM +0300, Andriy Gapon wrote:
>> ARC is a ZFS private cache.
>> ZFS doesn't use unified buffer/page cache.
>> So ARC is not directly affected by pagedaemon.
>> But this is not exactly VFS layer thing.
> As a pure speculation, unbacked by any code reading or understanding
> of the principles: can ARC be changed to use some custom vm pager
> instead of managing memory on its own? As I understand it, ARC
> uses wired kernel mappings right now.

Yes, ARC uses malloc(9) and/or uma(9).

> If it starts using managed pages backed by a new pager, then pagedaemon
> might make actual decisions on the cache shrink by putting and reclaiming
> pages. Does ARC have some `active' count for the caching unit? It might be
> translated to the active count for the page etc.

Well, I'm not sure I'd like pagedaemon directly poking ARC state.
ARC seems to be a little bit more sophisticated than pagedaemon.
One could argue that ARC is a distinctive and important feature of ZFS.
My understanding is that the state of an ARC buffer is determined by the
"group" to which it belongs (MRU, MFU, ghost variations of those) and its
last access time.  With each new access a buffer can be moved to a different
group.  When a buffer replacement is needed, that state plays a role in
deciding which buffers to re-use.  Ditto when the ARC size needs to be reduced.
It is also worth noting that data in ARC does not always have an associated
vnode, at least that's my understanding.

I had another crazy idea that is the opposite of yours.
ARC keeps using wired pages, but there is a special pager of some kind and ARC's
pages get inserted into the vnode's object when needed.  pagedaemon wouldn't
manage those pages as it does normal ones, but instead would "hint" to ARC what
is active and what is not, and then ARC would do its "smart thing".
But I think in this case we would need some way to steal pages from the object
when ARC decides that it really needs to shed some pages.
This wouldn't solve the problem of double management of memory, but at least it
could try to solve the problem of double caching.

On the other hand, perhaps some people would find ZFS without ARC but with
integration into VM useful, if that variant retains most of the features of ZFS
and provides some benefits in terms of resource usage and/or performance.

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct  2 11:50:51 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C71101065670
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 11:50:51 +0000 (UTC)
	(envelope-from phanquochien@gmail.com)
Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50])
	by mx1.freebsd.org (Postfix) with ESMTP id 5F7568FC14
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 11:50:50 +0000 (UTC)
Received: by wwb17 with SMTP id 17so5047175wwb.31
	for <freebsd-fs@freebsd.org>; Sat, 02 Oct 2010 04:50:50 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:date:message-id
	:subject:from:to:content-type;
	bh=C4m5CcyKTR8XaMNdlynb9jertKH+g9OsrDeKRPVeRwo=;
	b=Z6GuUHBg9mQADzlj/Zg2nmkM14xgy0kYwEF48jsN/CqOauIn/xMuuXmAkFSiojU5p3
	mjyMvlfK/a0hhu1L1ZY+YKaPixw1tT2q2LmEpFDS3QKLOXZR1ZNBdNP/5qij1bmmZ9Z1
	P3ClXPosFzTQzblYXFoYDpl2PgG8mK3Npcjgc=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:date:message-id:subject:from:to:content-type;
	b=j+SL6QOEXqQFEYy2vy2/5p0EVcosTC4+s712YZenVcklfvaYYOYFH9RcoKT8FmQm7Z
	FPMgtBejuI61Tg+Dv1fmU1EMva+52vdOvcl+zWSv4cg4NRNySK1L39IDJfXs0YgSWKIv
	/4v814ZWChnJBaJ5tpxQCh1ScNqP5w27M2qDo=
MIME-Version: 1.0
Received: by 10.227.134.144 with SMTP id j16mr5843309wbt.50.1286018939947;
	Sat, 02 Oct 2010 04:28:59 -0700 (PDT)
Received: by 10.227.142.194 with HTTP; Sat, 2 Oct 2010 04:28:59 -0700 (PDT)
Date: Sat, 2 Oct 2010 18:28:59 +0700
Message-ID: <AANLkTi=eib81J1zUT_HmPcXOBFiLDFmyA8Q-KZV5HikE@mail.gmail.com>
From: Phan Quoc Hien <phanquochien@gmail.com>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: Data loss when hard shutdown!
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2010 11:50:51 -0000

Hello everybody.
I'm new to FreeBSD. When I hard shut down my FreeBSD box, I lost some
files. I was using UFS2. How can I prevent that, or recover my files?
Thanks!

-- 
Best regards,
Mr.Hien
E-mail: phanquochien@gmail.com
Website: www.mrhien.info

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct  2 12:15:12 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 14BC3106564A
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 12:15:12 +0000 (UTC)
	(envelope-from ronald-freebsd8@klop.yi.org)
Received: from smtp-out3.tiscali.nl (smtp-out3.tiscali.nl [195.241.79.178])
	by mx1.freebsd.org (Postfix) with ESMTP id C9DF28FC14
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 12:15:11 +0000 (UTC)
Received: from [212.123.145.58] (helo=sjakie.klop.ws)
	by smtp-out3.tiscali.nl with esmtp (Exim)
	(envelope-from <ronald-freebsd8@klop.yi.org>) id 1P20no-0006rF-TI
	for freebsd-fs@freebsd.org; Sat, 02 Oct 2010 14:02:56 +0200
Received: from 212-123-145-58.ip.telfort.nl (localhost [127.0.0.1])
	by sjakie.klop.ws (Postfix) with ESMTP id BE8D1424F
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 14:02:54 +0200 (CEST)
Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes
To: freebsd-fs@freebsd.org
References: <AANLkTi=eib81J1zUT_HmPcXOBFiLDFmyA8Q-KZV5HikE@mail.gmail.com>
Date: Sat, 02 Oct 2010 14:02:54 +0200
MIME-Version: 1.0
From: "Ronald Klop" <ronald-freebsd8@klop.yi.org>
Message-ID: <op.vjx6e4o08527sy@212-123-145-58.ip.telfort.nl>
In-Reply-To: <AANLkTi=eib81J1zUT_HmPcXOBFiLDFmyA8Q-KZV5HikE@mail.gmail.com>
User-Agent: Opera Mail/10.62 (FreeBSD)
Content-Transfer-Encoding: quoted-printable
Subject: Re: Data loss when hard shutdown!
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2010 12:15:12 -0000

On Sat, 02 Oct 2010 13:28:59 +0200, Phan Quoc Hien
<phanquochien@gmail.com> wrote:

> Hello everybody.
> I'm new to FreeBSD. When I hard shut down my FreeBSD box, I lost some
> files. I was using UFS2. How can I prevent that, or recover my files?
> Thanks!
>

By hard shutdown, do you mean pulling the power plug?

UFS2 (like most other filesystems on other operating systems) guarantees
consistency of metadata (filenames, directory structures, etc.) after a
crash. However, it is possible to lose the last X seconds of unwritten
data. That can be the complete contents of a new file.

If it is really important, you can mount your filesystem with the 'sync'
option (see 'man mount'), in which case it will become slow, but more
up-to-date.
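
A program that cares about one particular file can also ask for just that
file to be flushed, without mounting everything 'sync'.  Below is a minimal
C sketch of the usual write, fsync(2), rename(2) sequence.  The function
name and the temporary-file argument are made up for this example, and
whether a given editor does this is not something I can promise.

```c
#define _XOPEN_SOURCE 700	/* expose POSIX declarations under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `len` bytes to `tmp`, flush them with fsync(2), then rename
 * `tmp` over `path`.  `tmp` must be on the same filesystem as `path`.
 * Returns 0 on success, -1 on error (and removes `tmp`).
 */
int
write_durably(const char *path, const char *tmp, const void *buf, size_t len)
{
	const char *p = buf;
	ssize_t n;
	int fd;

	assert(path != NULL && tmp != NULL && buf != NULL);
	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return (-1);
	while (len > 0) {
		n = write(fd, p, len);	/* may write less than asked */
		if (n < 0)
			goto fail;
		p += n;
		len -= (size_t)n;
	}
	if (fsync(fd) != 0)		/* data must be flushed before the rename */
		goto fail;
	if (close(fd) != 0 || rename(tmp, path) != 0) {
		unlink(tmp);
		return (-1);
	}
	return (0);
fail:
	close(fd);
	unlink(tmp);
	return (-1);
}
```

The intent is that after a crash the name refers to either the old contents
or the complete new contents, not an empty file.  For the rename itself to
be durable, an fsync(2) on the containing directory may also be needed; that
step is left out here to keep the sketch short.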

Ronald.

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct  2 12:20:06 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7F6891065674
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 12:20:06 +0000 (UTC)
	(envelope-from phanquochien@gmail.com)
Received: from mail-ww0-f42.google.com (mail-ww0-f42.google.com [74.125.82.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 3E2728FC08
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 12:20:00 +0000 (UTC)
Received: by wwi18 with SMTP id 18so64542wwi.1
	for <freebsd-fs@freebsd.org>; Sat, 02 Oct 2010 05:19:37 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:cc:content-type;
	bh=xrXqo/x4RIwnxLrxzvMxRnWP/uQFyIGPSj972XIgB3I=;
	b=YPWAiM74Wp6mTiZ7Op++r1IV9pTDmnXqRWFi9bnC5QaBWT6PWvOj63sqHAyoyWJoXL
	5G8mmH0Mv9O/Hq9p9ZMRpl622rULPfPSlw1OiIJviTX7xteYNNUz8NBH90EuPb8K957q
	4s9DpMWeXOu6qu2xsGztpbrTFUd2TeH79nCwE=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=m/WgWQgtTPVmLOidFS1Zdmh8jdZzvW6FNauVrU3cjK2T1e2q0U94t2v2fgNnZRDfkx
	u+8ZmNTIH32D5xE58jXqlFh/oIlR0AhNYiuJDGy+ch1YBlzXiEU0DsykO4msMZnU14wv
	3UhcnLj103JjCli+G1vFZLjfhB+DV4k3vnNOk=
MIME-Version: 1.0
Received: by 10.227.137.15 with SMTP id u15mr5885391wbt.129.1286021952810;
	Sat, 02 Oct 2010 05:19:12 -0700 (PDT)
Received: by 10.227.142.194 with HTTP; Sat, 2 Oct 2010 05:19:12 -0700 (PDT)
In-Reply-To: <op.vjx6e4o08527sy@212-123-145-58.ip.telfort.nl>
References: <AANLkTi=eib81J1zUT_HmPcXOBFiLDFmyA8Q-KZV5HikE@mail.gmail.com>
	<op.vjx6e4o08527sy@212-123-145-58.ip.telfort.nl>
Date: Sat, 2 Oct 2010 19:19:12 +0700
Message-ID: <AANLkTik53kP7K-EcHy=uO2QdAJe5r7MvJxmbsCQ_b0Ye@mail.gmail.com>
From: Phan Quoc Hien <phanquochien@gmail.com>
To: Ronald Klop <ronald-freebsd8@klop.yi.org>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org
Subject: Re: Data loss when hard shutdown!
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2010 12:20:06 -0000

Thanks for your response!
Yes, I pulled the power plug.
I edited rc.conf, saved it, and then pulled the power plug. When the system
booted the next time, rc.conf was a blank file!


On Sat, Oct 2, 2010 at 7:02 PM, Ronald Klop <ronald-freebsd8@klop.yi.org>wrote:

> On Sat, 02 Oct 2010 13:28:59 +0200, Phan Quoc Hien <phanquochien@gmail.com>
> wrote:
>
>> Hello everybody.
>> I'm new to FreeBSD. When I hard shut down my FreeBSD box, I lost some
>> files. I was using UFS2. How can I prevent that, or recover my files?
>> Thanks!
>>
>>
> By hard shutdown, do you mean pulling the power plug?
>
> UFS2 (like most other filesystems on other operating systems) guarantees
> consistency of metadata (filenames, directory structures, etc.) after a
> crash. However, it is possible to lose the last X seconds of unwritten data.
> That can be the complete contents of a new file.
>
> If it is really important, you can mount your filesystem with the 'sync'
> option (see 'man mount'), in which case it will become slow, but more
> up-to-date.
>
> Ronald.
>


-- 
Best regards,
Mr.Hien
E-mail: phanquochien@gmail.com
Website: www.mrhien.info

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct  2 12:20:58 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BDFF21065673
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 12:20:58 +0000 (UTC)
	(envelope-from to.my.trociny@gmail.com)
Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com
	[209.85.161.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 4EDC58FC14
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 12:20:57 +0000 (UTC)
Received: by fxm9 with SMTP id 9so3272503fxm.13
	for <freebsd-fs@freebsd.org>; Sat, 02 Oct 2010 05:20:57 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:from:to:subject:date
	:message-id:user-agent:mime-version:content-type;
	bh=Rv1UxpXhIk1+YoVWCmSgiyy9MRK4ZWbyawBKD4LvuZI=;
	b=tl2rbxgHDdd/24IxeYDJ4/Bku5tPPxca00PwJ1iahS9Bpxg7cTZ93BFIKKnqgtGEGi
	DQGKczEr8tf3jz7c/Vpqnmwdeb4OhsfCHXWsDJ2iRnpEUYv0i9DAXMR3rKfLRu+sYXqb
	A0RMIlnZL/zOcHBcIEEQtcNYvL/nhNXBuJp2U=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=from:to:subject:date:message-id:user-agent:mime-version
	:content-type;
	b=WINb/0ytegrnguDI7dQwlEiK+F2C5SUX1kpuhpF8pTzx3ZwmQERQGWkwqcwY1CYOoO
	6ZA/pipsf76UUi6S9HVbv0/vRRrqlgwzqEMkoQpGI+HR9xcnVKkeWA3fqWrf8rxpwmUa
	qd2Y1tecVaNyWDkAT/FJmRDk47cSbhWTPZfX0=
Received: by 10.223.125.70 with SMTP id x6mr6497403far.85.1286022056865;
	Sat, 02 Oct 2010 05:20:56 -0700 (PDT)
Received: from localhost ([95.69.162.97])
	by mx.google.com with ESMTPS id a6sm1165582faa.20.2010.10.02.05.20.55
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Sat, 02 Oct 2010 05:20:56 -0700 (PDT)
From: Mikolaj Golub <to.my.trociny@gmail.com>
To: freebsd-fs@freebsd.org
Date: Sat, 02 Oct 2010 15:20:58 +0300
Message-ID: <86hbh44wgl.fsf@kopusha.home.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Subject: hastd: assertion (res->hr_event != NULL) fails in secondary on
	split-brain
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2010 12:20:58 -0000

--=-=-=

Hi,

After recent changes in hastd (I think r213006: Fix descriptor leaks), if a
split-brain occurs, hastd will abort in child_cleanup() on the assertion
(res->hr_event != NULL).

Oct  2 17:24:17 lolek hastd[39334]: [storage] (init) Role changed to secondary.
Oct  2 17:24:17 lolek hastd[39334]: Accepting connection to tcp4://0.0.0.0:8457.
Oct  2 17:24:17 lolek hastd[39334]: Connection from tcp4://172.20.68.12:17367 to tcp4://172.20.68.11:8457.
Oct  2 17:24:17 lolek hastd[39334]: tcp4://172.20.68.12:17367: resource=storage
Oct  2 17:24:17 lolek hastd[39334]: [storage] (secondary) Initial connection from tcp4://172.20.68.12:17367.
Oct  2 17:24:17 lolek hastd[39334]: [storage] (secondary) Incoming connection from tcp4://172.20.68.12:17367 configured.
Oct  2 17:24:17 lolek hastd[39334]: Accepting connection to tcp4://0.0.0.0:8457.
Oct  2 17:24:17 lolek hastd[39334]: Connection from tcp4://172.20.68.12:13769 to tcp4://172.20.68.11:8457.
Oct  2 17:24:17 lolek hastd[39334]: tcp4://172.20.68.12:13769: resource=storage
Oct  2 17:24:17 lolek hastd[39334]: [storage] (secondary) Outgoing connection to tcp4://172.20.68.12:13769 configured.
Oct  2 17:24:17 lolek hastd[39339]: [storage] (secondary) Obtained info about /dev/ad4.
Oct  2 17:24:17 lolek hastd[39339]: [storage] (secondary) Locked /dev/ad4.
Oct  2 17:24:17 lolek hastd[39339]: [storage] (secondary) Split-brain detected, exiting.
Oct  2 17:24:17 lolek hastd[39334]: Unable to receive event header: Socket is not connected.
Oct  2 17:24:28 lolek hastd[39334]: Accepting connection to tcp4://0.0.0.0:8457.
Oct  2 17:24:28 lolek hastd[39334]: Connection from tcp4://172.20.68.12:59760 to tcp4://172.20.68.11:8457.
Oct  2 17:24:28 lolek hastd[39334]: tcp4://172.20.68.12:59760: resource=storage
Oct  2 17:24:28 lolek hastd[39334]: [storage] (secondary) Initial connection from tcp4://172.20.68.12:59760.
Oct  2 17:24:28 lolek hastd[39334]: [storage] (secondary) Worker process exists (pid=39339), stopping it.
Oct  2 17:24:28 lolek hastd[39334]: [storage] (secondary) Worker process exited ungracefully (pid=39339, exitcode=78).
Oct  2 17:24:28 lolek kernel: pid 39334 (hastd), uid 0: exited on signal 6 (core dumped)

(gdb) bt
#0  0x28348d87 in kill () from /lib/libc.so.7
#1  0x280e1017 in raise () from /lib/libthr.so.3
#2  0x2834787a in abort () from /lib/libc.so.7
#3  0x2832fc86 in __assert () from /lib/libc.so.7
#4  0x0805f300 in proto_close (conn=0x0) at /usr/src/sbin/hastd/proto.c:287
#5  0x0804c445 in child_cleanup (res=0x284eb500) at /usr/src/sbin/hastd/control.c:61
#6  0x0804fc6d in listen_accept () at /usr/src/sbin/hastd/hastd.c:526
#7  0x0805059a in main_loop () at /usr/src/sbin/hastd/hastd.c:673
#8  0x08050a7f in main (argc=0, argv=0xbfbfed80) at /usr/src/sbin/hastd/hastd.c:784
(gdb) fr 5
#5  0x0804c445 in child_cleanup (res=0x284eb500) at /usr/src/sbin/hastd/control.c:61
61              proto_close(res->hr_event);
(gdb) list
56      child_cleanup(struct hast_resource *res)
57      {
58
59              proto_close(res->hr_ctrl);
60              res->hr_ctrl = NULL;
61              proto_close(res->hr_event);
62              res->hr_event = NULL;
63              res->hr_workerpid = 0;
64      }
65

So we have a double close of res->hr_event. It is first closed when the
parent detects in main_loop() that the worker has exited, and closed again
when a new connection from the primary arrives and the parent cleans up after
the previously terminated child before starting a new one.

The straightforward fix is to check res->hr_event before closing, like in the
patch below.
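
To show the guard in isolation, here is a minimal standalone sketch. The
struct layouts, the proto_close_calls counter and this proto_close() are
simplified stand-ins I made up for illustration, not the real hastd
definitions; only the shape of child_cleanup() follows the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for illustration; not the real hastd definitions. */
struct proto_conn {
	int pc_dummy;
};

struct hast_resource {
	struct proto_conn *hr_ctrl;
	struct proto_conn *hr_event;
	int hr_workerpid;
};

/* Counts calls so a caller can check how many connections were closed. */
static int proto_close_calls;

/* As in the backtrace, closing a NULL connection trips an assertion. */
static void
proto_close(struct proto_conn *conn)
{

	assert(conn != NULL);
	proto_close_calls++;
	free(conn);
}

/* Same shape as the patched child_cleanup(): hr_event may already be NULL. */
static void
child_cleanup(struct hast_resource *res)
{

	proto_close(res->hr_ctrl);
	res->hr_ctrl = NULL;
	if (res->hr_event != NULL) {
		proto_close(res->hr_event);
		res->hr_event = NULL;
	}
	res->hr_workerpid = 0;
}
```

Calling child_cleanup() on a resource whose hr_event is already NULL but
whose hr_ctrl is still open now closes only the control connection instead
of aborting.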

-- 
Mikolaj Golub


--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline; filename=control.c.patch

Index: sbin/hastd/control.c
===================================================================
--- sbin/hastd/control.c	(revision 213357)
+++ sbin/hastd/control.c	(working copy)
@@ -58,8 +58,10 @@ child_cleanup(struct hast_resource *res)
 
 	proto_close(res->hr_ctrl);
 	res->hr_ctrl = NULL;
-	proto_close(res->hr_event);
-	res->hr_event = NULL;
+	if (res->hr_event != NULL) {
+		proto_close(res->hr_event);
+		res->hr_event = NULL;
+	}
 	res->hr_workerpid = 0;
 }
 

--=-=-=--

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct  2 12:25:52 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AC7D4106566C
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 12:25:52 +0000 (UTC)
	(envelope-from bruce@cran.org.uk)
Received: from muon.cran.org.uk (unknown [IPv6:2a01:348:0:15:5d59:5c40:0:1])
	by mx1.freebsd.org (Postfix) with ESMTP id 417838FC08
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 12:25:52 +0000 (UTC)
Received: from muon.cran.org.uk (localhost [127.0.0.1])
	by muon.cran.org.uk (Postfix) with ESMTP id 58CB1E7F74;
	Sat,  2 Oct 2010 13:25:51 +0100 (BST)
Received: from unknown (client-82-31-11-222.midd.adsl.virginmedia.com
	[82.31.11.222])
	(using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits))
	(No client certificate requested)
	by muon.cran.org.uk (Postfix) with ESMTPSA;
	Sat,  2 Oct 2010 13:25:50 +0100 (BST)
Date: Sat, 2 Oct 2010 13:25:48 +0100
From: Bruce Cran <bruce@cran.org.uk>
To: Phan Quoc Hien <phanquochien@gmail.com>
Message-ID: <20101002132548.00002898@unknown>
In-Reply-To: <AANLkTik53kP7K-EcHy=uO2QdAJe5r7MvJxmbsCQ_b0Ye@mail.gmail.com>
References: <AANLkTi=eib81J1zUT_HmPcXOBFiLDFmyA8Q-KZV5HikE@mail.gmail.com>
	<op.vjx6e4o08527sy@212-123-145-58.ip.telfort.nl>
	<AANLkTik53kP7K-EcHy=uO2QdAJe5r7MvJxmbsCQ_b0Ye@mail.gmail.com>
X-Mailer: Claws Mail 3.7.6 (GTK+ 2.16.6; i586-pc-mingw32msvc)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org, Ronald Klop <ronald-freebsd8@klop.yi.org>
Subject: Re: Data loss when hard shutdown!
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2010 12:25:52 -0000

On Sat, 2 Oct 2010 19:19:12 +0700
Phan Quoc Hien <phanquochien@gmail.com> wrote:

> Thanks for your respond.!
> Yes. I pulled the power plug .
> I edited rc.conf and save it then pulling the power plug. And system
> boot next time rc.conf is a blank file...!

This is an issue when using SoftUpdates - data isn't written to disk
immediately, and empty files are produced when power is lost. See
http://www.freebsd.org/doc/en/books/faq/disks.html#SAFE-SOFTUPDATES for
details. 
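
For what it's worth, an application that cannot afford to lose a file it
has just written can ask for the data to be flushed with fsync(2) before it
reports success. A minimal sketch follows; write_text_durably() is a
hypothetical helper name, not an existing API. It covers only the file's
contents: a newly created directory entry and the drive's own write cache
are separate concerns.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write text to path and ask the kernel to flush it to the device
 * before returning.  Returns 0 on success, -1 on any error.
 */
static int
write_text_durably(const char *path, const char *text)
{
	size_t len, off;
	ssize_t n;
	int fd;

	assert(path != NULL && text != NULL);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return (-1);
	len = strlen(text);
	/* write(2) may be partial, so loop until everything is written. */
	for (off = 0; off < len; off += (size_t)n) {
		n = write(fd, text + off, len - off);
		if (n <= 0) {
			close(fd);
			return (-1);
		}
	}
	/* Without this call the data may sit in memory for some time. */
	if (fsync(fd) < 0) {
		close(fd);
		return (-1);
	}
	return (close(fd));
}
```

Most editors do not do this when saving, which is why a file edited just
before the power is cut can come back empty.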

-- 
Bruce Cran

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct  2 12:42:11 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5621C106566B
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 12:42:11 +0000 (UTC)
	(envelope-from ronald-freebsd8@klop.yi.org)
Received: from smtp-out3.tiscali.nl (smtp-out3.tiscali.nl [195.241.79.178])
	by mx1.freebsd.org (Postfix) with ESMTP id 152F98FC0C
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 12:42:10 +0000 (UTC)
Received: from [212.123.145.58] (helo=sjakie.klop.ws)
	by smtp-out3.tiscali.nl with esmtp (Exim)
	(envelope-from <ronald-freebsd8@klop.yi.org>) id 1P21Pl-0002sg-VN
	for freebsd-fs@freebsd.org; Sat, 02 Oct 2010 14:42:10 +0200
Received: from 212-123-145-58.ip.telfort.nl (localhost [127.0.0.1])
	by sjakie.klop.ws (Postfix) with ESMTP id 33ACF42A2
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 14:42:08 +0200 (CEST)
Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes
To: freebsd-fs@freebsd.org
References: <AANLkTi=eib81J1zUT_HmPcXOBFiLDFmyA8Q-KZV5HikE@mail.gmail.com>
	<op.vjx6e4o08527sy@212-123-145-58.ip.telfort.nl>
	<AANLkTik53kP7K-EcHy=uO2QdAJe5r7MvJxmbsCQ_b0Ye@mail.gmail.com>
Date: Sat, 02 Oct 2010 14:42:08 +0200
MIME-Version: 1.0
From: "Ronald Klop" <ronald-freebsd8@klop.yi.org>
Message-ID: <op.vjx78i0i8527sy@212-123-145-58.ip.telfort.nl>
In-Reply-To: <AANLkTik53kP7K-EcHy=uO2QdAJe5r7MvJxmbsCQ_b0Ye@mail.gmail.com>
User-Agent: Opera Mail/10.62 (FreeBSD)
Content-Transfer-Encoding: quoted-printable
Subject: Re: Data loss when hard shutdown!
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2010 12:42:11 -0000

On Sat, 02 Oct 2010 14:19:12 +0200, Phan Quoc Hien
<phanquochien@gmail.com> wrote:

> Thanks for your respond.!
> Yes. I pulled the power plug .
> I edited rc.conf and save it then pulling the power plug. And system boot
> next time rc.conf is a blank file...!

Don't pull the power plug if you don't have to. The command to reboot is
'shutdown -r now', and then all pending data will be saved safely.

Ronald.

> On Sat, Oct 2, 2010 at 7:02 PM, Ronald Klop
> <ronald-freebsd8@klop.yi.org>wrote:
>
>> On Sat, 02 Oct 2010 13:28:59 +0200, Phan Quoc Hien
>> <phanquochien@gmail.com>
>> wrote:
>>
>>  hello everybody.
>>> I'm new to freebsd, When I hard shutdown my freebsd box..it caused lost
>>> some
>>> file. I used UFS2. How can prevent that? or recovery my file?
>>> Thanks!
>>>
>>>
>> By hard shutdown you mean pulling the power plug?
>>
>> UFS2 (and most other filesystems on other operating systems) guarantees
>> consistency of metadata (filenames, directory structures, etc.) after a
>> crash. However, it is possible to lose the last X seconds of unwritten
>> data.
>> That can be the complete contents of a new file.
>>
>> If it is really important, you can mount your filesystem 'sync' (see 'man
>> mount'), in which case it will become slow, but more up-to-date.
>>
>> Ronald.
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>
>
>

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct  2 12:49:52 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 302D3106564A
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 12:49:52 +0000 (UTC)
	(envelope-from phanquochien@gmail.com)
Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com
	[74.125.82.182])
	by mx1.freebsd.org (Postfix) with ESMTP id B5F108FC1C
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 12:49:51 +0000 (UTC)
Received: by wyb29 with SMTP id 29so2592705wyb.13
	for <freebsd-fs@freebsd.org>; Sat, 02 Oct 2010 05:49:50 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:cc:content-type;
	bh=Sp6yWP1DBXm2eVs+n/kL43YSqRwOnvJGaR0qUDwB2bY=;
	b=kCFdxcNSmKuiahoHV7qdE2LVW8jhcrJmmYvxCHjlEplikTCqD7fbfh9hIkY61ULlXj
	+rhB2o2SF76hllIYLXE/4XZtyzjL88goLUB974Tv5QWwjzU5uXVwEFcDTMwaJ5FPvEXd
	ZseBe7YrhwhQEsuCSyZz71pOtIaC5D/bXhdVY=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=hFzU0IWWkddHoWuajKO345EioUEZEWom2T0IBHcrSg2i/IWXRqAM8OI6Dh18K7uFZA
	ptsCaVUUf+iqcylRpHdvYa6hCIdkASXWumcj9MhVfPIYhrZdv8xOehLkugo1voLJVI3Y
	DQnIb5N+s7/jVf/hGLWxXW5K4yh0lA9QMA0Ao=
MIME-Version: 1.0
Received: by 10.227.151.195 with SMTP id d3mr5646668wbw.170.1286023790106;
	Sat, 02 Oct 2010 05:49:50 -0700 (PDT)
Received: by 10.227.142.194 with HTTP; Sat, 2 Oct 2010 05:49:50 -0700 (PDT)
In-Reply-To: <20101002132548.00002898@unknown>
References: <AANLkTi=eib81J1zUT_HmPcXOBFiLDFmyA8Q-KZV5HikE@mail.gmail.com>
	<op.vjx6e4o08527sy@212-123-145-58.ip.telfort.nl>
	<AANLkTik53kP7K-EcHy=uO2QdAJe5r7MvJxmbsCQ_b0Ye@mail.gmail.com>
	<20101002132548.00002898@unknown>
Date: Sat, 2 Oct 2010 19:49:50 +0700
Message-ID: <AANLkTina-12oxTxaBgzeQqF7n4GroDd76hiMprky6_r=@mail.gmail.com>
From: Phan Quoc Hien <phanquochien@gmail.com>
To: Bruce Cran <bruce@cran.org.uk>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org, Ronald Klop <ronald-freebsd8@klop.yi.org>
Subject: Re: Data loss when hard shutdown!
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2010 12:49:52 -0000

Thanks for your response. I have checked my fstab file. I didn't see any
SoftUpdates option for my / partition.

On Sat, Oct 2, 2010 at 7:25 PM, Bruce Cran <bruce@cran.org.uk> wrote:

> On Sat, 2 Oct 2010 19:19:12 +0700
> Phan Quoc Hien <phanquochien@gmail.com> wrote:
>
> > Thanks for your respond.!
> > Yes. I pulled the power plug .
> > I edited rc.conf and save it then pulling the power plug. And system
> > boot next time rc.conf is a blank file...!
>
> This is an issue when using SoftUpdates - data isn't written to disk
> immediately, and empty files are produced when power is lost. See
> http://www.freebsd.org/doc/en/books/faq/disks.html#SAFE-SOFTUPDATES for
> details.
>
> --
> Bruce Cran
>


-- 
Best regards,
Mr.Hien
E-mail: phanquochien@gmail.com
Website: www.mrhien.info

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct  2 12:59:05 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5AE7C106564A
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 12:59:05 +0000 (UTC)
	(envelope-from ronald-freebsd8@klop.yi.org)
Received: from smtp-out1.tiscali.nl (smtp-out1.tiscali.nl [195.241.79.176])
	by mx1.freebsd.org (Postfix) with ESMTP id 185AD8FC1B
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 12:59:05 +0000 (UTC)
Received: from [212.123.145.58] (helo=sjakie.klop.ws)
	by smtp-out1.tiscali.nl with esmtp (Exim)
	(envelope-from <ronald-freebsd8@klop.yi.org>) id 1P21g8-000445-0X
	for freebsd-fs@freebsd.org; Sat, 02 Oct 2010 14:59:04 +0200
Received: from 212-123-145-58.ip.telfort.nl (localhost [127.0.0.1])
	by sjakie.klop.ws (Postfix) with ESMTP id 5918C42BA
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 14:59:02 +0200 (CEST)
Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes
To: freebsd-fs@freebsd.org
References: <AANLkTi=eib81J1zUT_HmPcXOBFiLDFmyA8Q-KZV5HikE@mail.gmail.com>
	<op.vjx6e4o08527sy@212-123-145-58.ip.telfort.nl>
	<AANLkTik53kP7K-EcHy=uO2QdAJe5r7MvJxmbsCQ_b0Ye@mail.gmail.com>
	<20101002132548.00002898@unknown>
	<AANLkTina-12oxTxaBgzeQqF7n4GroDd76hiMprky6_r=@mail.gmail.com>
Date: Sat, 02 Oct 2010 14:59:02 +0200
MIME-Version: 1.0
From: "Ronald Klop" <ronald-freebsd8@klop.yi.org>
Message-ID: <op.vjx80ows8527sy@212-123-145-58.ip.telfort.nl>
In-Reply-To: <AANLkTina-12oxTxaBgzeQqF7n4GroDd76hiMprky6_r=@mail.gmail.com>
User-Agent: Opera Mail/10.62 (FreeBSD)
Content-Transfer-Encoding: quoted-printable
Subject: Re: Data loss when hard shutdown!
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2010 12:59:05 -0000

On Sat, 02 Oct 2010 14:49:50 +0200, Phan Quoc Hien
<phanquochien@gmail.com> wrote:

> Thank for your respond. I have checked my fstab file. I didn't see any
> option about SoftUpdates for my / partition.

When you give the command 'mount' you will see several lines like this.

/dev/ad8s1d on /var (ufs, local, soft-updates)

Softupdates can be enabled or disabled with the tunefs command. See 'man
tunefs'.

My advice is not to pull the power plug after changing critical files, but
to reboot cleanly. Then there is no problem 99% of the time, and your
computer stays fast as well.
Or is there a reason you prefer the power plug?

Ronald.


> On Sat, Oct 2, 2010 at 7:25 PM, Bruce Cran <bruce@cran.org.uk> wrote:
>
>> On Sat, 2 Oct 2010 19:19:12 +0700
>> Phan Quoc Hien <phanquochien@gmail.com> wrote:
>>
>> > Thanks for your respond.!
>> > Yes. I pulled the power plug .
>> > I edited rc.conf and save it then pulling the power plug. And system
>> > boot next time rc.conf is a blank file...!
>>
>> This is an issue when using SoftUpdates - data isn't written to disk
>> immediately, and empty files are produced when power is lost. See
>> http://www.freebsd.org/doc/en/books/faq/disks.html#SAFE-SOFTUPDATES for
>> details.
>>
>> --
>> Bruce Cran
>>
>
>

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct  2 13:07:30 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A8A8C1065673
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 13:07:30 +0000 (UTC)
	(envelope-from phanquochien@gmail.com)
Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com
	[74.125.82.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 378888FC15
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 13:07:30 +0000 (UTC)
Received: by wyb29 with SMTP id 29so2602553wyb.13
	for <freebsd-fs@freebsd.org>; Sat, 02 Oct 2010 06:07:29 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:cc:content-type;
	bh=q4wMwwrkNN5vOWJMKRGD4qXBwlNKI28ZA9LH/fEWSJE=;
	b=JNjciMuiroaAAV7NmyFlA7NhzbG1WYWNeC5+eTfAChkQs5ipts6KdYHaRzd5xtYQSE
	khYQpyO6ztlrisc0SitPhvgk6tFRl1/Q27gsccDqtCQ6myYYS8iySZcCF9WiFCw8Fuf4
	5HTF16WoicI/R3P6dWoSvc3aAi/UJdZ7WMFvg=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=xbpDz3SSbZY1Gc0f25E3Fkm5s0OqRpMv4bj6h61gtfAwsnUF+aiJDVVpmHr62GRRFt
	fU0yiUVxbEQ3DZ0cA0dAom7WfWxyoID5g6qwG8hx2uqSW4sUXv1oX02PjILcWBNNEb99
	p2f9zei54e5gf8RRYRqOqlM7DZBLzU0PrJ1js=
MIME-Version: 1.0
Received: by 10.227.134.136 with SMTP id j8mr5463231wbt.206.1286024848461;
	Sat, 02 Oct 2010 06:07:28 -0700 (PDT)
Received: by 10.227.142.194 with HTTP; Sat, 2 Oct 2010 06:07:28 -0700 (PDT)
In-Reply-To: <op.vjx80ows8527sy@212-123-145-58.ip.telfort.nl>
References: <AANLkTi=eib81J1zUT_HmPcXOBFiLDFmyA8Q-KZV5HikE@mail.gmail.com>
	<op.vjx6e4o08527sy@212-123-145-58.ip.telfort.nl>
	<AANLkTik53kP7K-EcHy=uO2QdAJe5r7MvJxmbsCQ_b0Ye@mail.gmail.com>
	<20101002132548.00002898@unknown>
	<AANLkTina-12oxTxaBgzeQqF7n4GroDd76hiMprky6_r=@mail.gmail.com>
	<op.vjx80ows8527sy@212-123-145-58.ip.telfort.nl>
Date: Sat, 2 Oct 2010 20:07:28 +0700
Message-ID: <AANLkTi=Urr3B4Rxgm3OqffiSSDAeZpaJnSUX5sKSB9zW@mail.gmail.com>
From: Phan Quoc Hien <phanquochien@gmail.com>
To: Ronald Klop <ronald-freebsd8@klop.yi.org>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org
Subject: Re: Data loss when hard shutdown!
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2010 13:07:30 -0000

The reason is power supply problems! Thanks again for your response.
Have a nice day.

On Sat, Oct 2, 2010 at 7:59 PM, Ronald Klop <ronald-freebsd8@klop.yi.org>wrote:

> On Sat, 02 Oct 2010 14:49:50 +0200, Phan Quoc Hien <phanquochien@gmail.com>
> wrote:
>
>  Thank for your respond. I have checked my fstab file. I didn't see any
>> option about SoftUpdates for my / partition.
>>
>
> When you give the command 'mount' you will see several lines like this.
>
> /dev/ad8s1d on /var (ufs, local, soft-updates)
>
> Softupdates can be enabled/disabled with the command tunefs. See 'man
> tunefs'.
>
> My advice is to not pull the power plug after changing critical files, but
> to reboot cleanly. Than there is no problem for 99% of the time and your
> computer is fast also.
> Or is there a reason for you to prefer the power plug?
>
> Ronald.
>
>
>
>  On Sat, Oct 2, 2010 at 7:25 PM, Bruce Cran <bruce@cran.org.uk> wrote:
>>
>>  On Sat, 2 Oct 2010 19:19:12 +0700
>>> Phan Quoc Hien <phanquochien@gmail.com> wrote:
>>>
>>> > Thanks for your respond.!
>>> > Yes. I pulled the power plug .
>>> > I edited rc.conf and save it then pulling the power plug. And system
>>> > boot next time rc.conf is a blank file...!
>>>
>>> This is an issue when using SoftUpdates - data isn't written to disk
>>> immediately, and empty files are produced when power is lost. See
>>> http://www.freebsd.org/doc/en/books/faq/disks.html#SAFE-SOFTUPDATES for
>>> details.
>>>
>>> --
>>> Bruce Cran
>>>
>>>
>>
>>  _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>


-- 
Best regards,
Mr.Hien
E-mail: phanquochien@gmail.com
Website: www.mrhien.info

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct  2 13:16:09 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EEB2C106566B
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 13:16:09 +0000 (UTC)
	(envelope-from numisemis@gmail.com)
Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id B247E8FC08
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 13:16:09 +0000 (UTC)
Received: by iwn34 with SMTP id 34so6046160iwn.13
	for <freebsd-fs@freebsd.org>; Sat, 02 Oct 2010 06:16:09 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:cc:content-type;
	bh=lLxnl89tClF5S9RxUC9baCzcYJT+jq5mA/iSpxygqOs=;
	b=E64Ba3MoVsPY+Z/QZI3T77PeYLbta8OJTuJkVlk/Y7PkkvqyKFKUckWnp7zP8AR7Uv
	jPXuV5ZEzSHZaln4zwBc6h03pgJCJQzd/W/R7JJULevy9cmkPaROLwZy7R+UR0Itk6eI
	9BAHQPfYcy9Nur3iH2ac9EjxHyVojdXESbIa0=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=RjrX914t7WManA2wU4olPzq4fcTiT6HZ1QiZDDoC0t1z5nZSBIrkVvfAW8KRsCSD03
	7sICVa7FPohh2FeX2s66bLfjsgqGohdKOt3nsoc6lJm/ZUOTOXH6aZQww87+VpqCDNdj
	nAAOwvnl9jeGY6akxYmsOZaMT6kxHvJCpiVX0=
MIME-Version: 1.0
Received: by 10.231.167.130 with SMTP id q2mr7200070iby.163.1286025028464;
	Sat, 02 Oct 2010 06:10:28 -0700 (PDT)
Received: by 10.231.139.90 with HTTP; Sat, 2 Oct 2010 06:10:28 -0700 (PDT)
In-Reply-To: <AANLkTi=Urr3B4Rxgm3OqffiSSDAeZpaJnSUX5sKSB9zW@mail.gmail.com>
References: <AANLkTi=eib81J1zUT_HmPcXOBFiLDFmyA8Q-KZV5HikE@mail.gmail.com>
	<op.vjx6e4o08527sy@212-123-145-58.ip.telfort.nl>
	<AANLkTik53kP7K-EcHy=uO2QdAJe5r7MvJxmbsCQ_b0Ye@mail.gmail.com>
	<20101002132548.00002898@unknown>
	<AANLkTina-12oxTxaBgzeQqF7n4GroDd76hiMprky6_r=@mail.gmail.com>
	<op.vjx80ows8527sy@212-123-145-58.ip.telfort.nl>
	<AANLkTi=Urr3B4Rxgm3OqffiSSDAeZpaJnSUX5sKSB9zW@mail.gmail.com>
Date: Sat, 2 Oct 2010 15:10:28 +0200
Message-ID: <AANLkTi=Pf5yK5c-AW9CyexBKh=zDgaKdTQzWauhVZaRS@mail.gmail.com>
From: =?UTF-8?Q?=C5=A0imun_Mikecin?= <numisemis@gmail.com>
To: Phan Quoc Hien <phanquochien@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-fs@freebsd.org, Ronald Klop <ronald-freebsd8@klop.yi.org>
Subject: Re: Data loss when hard shutdown!
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2010 13:16:10 -0000

2010/10/2, Phan Quoc Hien <phanquochien@gmail.com>:
> The reason is power supply problems! Thank for your respond again.
> Have a nice day.

In that case, ZFS should be your friend! :-)

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct  2 13:26:50 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A85F5106566B
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 13:26:50 +0000 (UTC)
	(envelope-from phanquochien@gmail.com)
Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com
	[74.125.82.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 3A8DC8FC0A
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 13:26:49 +0000 (UTC)
Received: by wyb29 with SMTP id 29so2613223wyb.13
	for <freebsd-fs@freebsd.org>; Sat, 02 Oct 2010 06:26:49 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:cc:content-type;
	bh=K9QIAzQVBH0O+RdfAfMw50mR3K0i5fWijkIdbMPQLrE=;
	b=tzPLZgi8Yvv2ewEoQ1R6FeNIT02bIt3XMlGSaMtbmot3I1yQfbivu40zNj+9p9NPbj
	TzhaUdG1nw/cyqA6MwGBiOFdyhgGZFSJ3o9ygto4A16McHxZ6aSxbIwv1tIUvCRUo5TE
	wZ0SbMzxGLZYrKUCKqN5GuWOievTksmSfwrAs=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=g8mqE+c7ZqycrLSVj0t8iTlRt4r6VQjCKZ4rXlfv8ZagcAs9GuEEcya/d+Xe2cAMI7
	FFgmb+jW+xtuXy4eRkFXPYznx2QDB3pZcUa6WiOMNcsQET41n1cntsvs7VNNX7GW+YVS
	YPThQ65+ol5ZuZdOqwFfKTg+95pFaRl/dYISQ=
MIME-Version: 1.0
Received: by 10.227.151.195 with SMTP id d3mr5677070wbw.170.1286026008289;
	Sat, 02 Oct 2010 06:26:48 -0700 (PDT)
Received: by 10.227.142.194 with HTTP; Sat, 2 Oct 2010 06:26:48 -0700 (PDT)
In-Reply-To: <op.vjx80ows8527sy@212-123-145-58.ip.telfort.nl>
References: <AANLkTi=eib81J1zUT_HmPcXOBFiLDFmyA8Q-KZV5HikE@mail.gmail.com>
	<op.vjx6e4o08527sy@212-123-145-58.ip.telfort.nl>
	<AANLkTik53kP7K-EcHy=uO2QdAJe5r7MvJxmbsCQ_b0Ye@mail.gmail.com>
	<20101002132548.00002898@unknown>
	<AANLkTina-12oxTxaBgzeQqF7n4GroDd76hiMprky6_r=@mail.gmail.com>
	<op.vjx80ows8527sy@212-123-145-58.ip.telfort.nl>
Date: Sat, 2 Oct 2010 20:26:48 +0700
Message-ID: <AANLkTikL=wXPG2JnNdBX0AxXiGHCMyXDz0_sjsDOziTF@mail.gmail.com>
From: Phan Quoc Hien <phanquochien@gmail.com>
To: Ronald Klop <ronald-freebsd8@klop.yi.org>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org
Subject: Re: Data loss when hard shutdown!
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2010 13:26:50 -0000

On Sat, Oct 2, 2010 at 7:59 PM, Ronald Klop <ronald-freebsd8@klop.yi.org>wrote:

> On Sat, 02 Oct 2010 14:49:50 +0200, Phan Quoc Hien <phanquochien@gmail.com>
> wrote:
>
>  Thank for your respond. I have checked my fstab file. I didn't see any
>> option about SoftUpdates for my / partition.
>>
>
> When you give the command 'mount' you will see several lines like this.
>
> /dev/ad8s1d on /var (ufs, local, soft-updates)
>
> Softupdates can be enabled/disabled with the command tunefs. See 'man
> tunefs'.
>
When I run the mount command, it shows:
$ mount
/dev/ad0s1a on / (ufs, local)
devfs on /dev (devfs, local, multilabel)
-- 
Best regards,
Mr.Hien
E-mail: phanquochien@gmail.com
Website: www.mrhien.info

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct  2 14:21:47 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2F6C7106566B
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 14:21:47 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta02.emeryville.ca.mail.comcast.net
	(qmta02.emeryville.ca.mail.comcast.net [76.96.30.24])
	by mx1.freebsd.org (Postfix) with ESMTP id 11DCD8FC19
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 14:21:46 +0000 (UTC)
Received: from omta23.emeryville.ca.mail.comcast.net ([76.96.30.90])
	by qmta02.emeryville.ca.mail.comcast.net with comcast
	id Dq5S1f0031wfjNsA2qMmGY; Sat, 02 Oct 2010 14:21:46 +0000
Received: from koitsu.dyndns.org ([98.248.41.155])
	by omta23.emeryville.ca.mail.comcast.net with comcast
	id DqMl1f00A3LrwQ28jqMlXu; Sat, 02 Oct 2010 14:21:46 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 713489B418; Sat,  2 Oct 2010 07:21:45 -0700 (PDT)
Date: Sat, 2 Oct 2010 07:21:45 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Phan Quoc Hien <phanquochien@gmail.com>
Message-ID: <20101002142145.GA70541@icarus.home.lan>
References: <AANLkTi=eib81J1zUT_HmPcXOBFiLDFmyA8Q-KZV5HikE@mail.gmail.com>
	<op.vjx6e4o08527sy@212-123-145-58.ip.telfort.nl>
	<AANLkTik53kP7K-EcHy=uO2QdAJe5r7MvJxmbsCQ_b0Ye@mail.gmail.com>
	<20101002132548.00002898@unknown>
	<AANLkTina-12oxTxaBgzeQqF7n4GroDd76hiMprky6_r=@mail.gmail.com>
	<op.vjx80ows8527sy@212-123-145-58.ip.telfort.nl>
	<AANLkTikL=wXPG2JnNdBX0AxXiGHCMyXDz0_sjsDOziTF@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <AANLkTikL=wXPG2JnNdBX0AxXiGHCMyXDz0_sjsDOziTF@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org, Ronald Klop <ronald-freebsd8@klop.yi.org>
Subject: Re: Data loss when hard shutdown!
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2010 14:21:47 -0000

On Sat, Oct 02, 2010 at 08:26:48PM +0700, Phan Quoc Hien wrote:
> On Sat, Oct 2, 2010 at 7:59 PM, Ronald Klop <ronald-freebsd8@klop.yi.org>wrote:
> 
> > On Sat, 02 Oct 2010 14:49:50 +0200, Phan Quoc Hien <phanquochien@gmail.com>
> > wrote:
> >
> >  Thank for your respond. I have checked my fstab file. I didn't see any
> >> option about SoftUpdates for my / partition.
> >>
> >
> > When you give the command 'mount' you will see several lines like this.
> >
> > /dev/ad8s1d on /var (ufs, local, soft-updates)
> >
> > Softupdates can be enabled/disabled with the command tunefs. See 'man
> > tunefs'.
> >
> > When I run mount command. it shown:
> $ mount
> /dev/ad0s1a on / (ufs, local)
> devfs on /dev (devfs, local, multilabel)

I didn't see anyone mention this to you in the thread, but:

By default in FreeBSD (during the installation phase), softupdates are
explicitly **not** applied to the root filesystem.  This is intentional,
but the reason for it I do not know.  I imagine it's justified though.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Sat Oct  2 14:30:45 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4162D1065693
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 14:30:45 +0000 (UTC)
	(envelope-from bruce@cran.org.uk)
Received: from muon.cran.org.uk (unknown [IPv6:2a01:348:0:15:5d59:5c40:0:1])
	by mx1.freebsd.org (Postfix) with ESMTP id C8F838FC08
	for <freebsd-fs@freebsd.org>; Sat,  2 Oct 2010 14:30:44 +0000 (UTC)
Received: from muon.cran.org.uk (localhost [127.0.0.1])
	by muon.cran.org.uk (Postfix) with ESMTP id DBED0E616D;
	Sat,  2 Oct 2010 15:30:43 +0100 (BST)
Received: from unknown (client-82-31-11-222.midd.adsl.virginmedia.com
	[82.31.11.222])
	(using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits))
	(No client certificate requested)
	by muon.cran.org.uk (Postfix) with ESMTPSA;
	Sat,  2 Oct 2010 15:30:42 +0100 (BST)
Date: Sat, 2 Oct 2010 15:30:40 +0100
From: Bruce Cran <bruce@cran.org.uk>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Message-ID: <20101002153040.00001993@unknown>
In-Reply-To: <20101002142145.GA70541@icarus.home.lan>
References: <AANLkTi=eib81J1zUT_HmPcXOBFiLDFmyA8Q-KZV5HikE@mail.gmail.com>
	<op.vjx6e4o08527sy@212-123-145-58.ip.telfort.nl>
	<AANLkTik53kP7K-EcHy=uO2QdAJe5r7MvJxmbsCQ_b0Ye@mail.gmail.com>
	<20101002132548.00002898@unknown>
	<AANLkTina-12oxTxaBgzeQqF7n4GroDd76hiMprky6_r=@mail.gmail.com>
	<op.vjx80ows8527sy@212-123-145-58.ip.telfort.nl>
	<AANLkTikL=wXPG2JnNdBX0AxXiGHCMyXDz0_sjsDOziTF@mail.gmail.com>
	<20101002142145.GA70541@icarus.home.lan>
X-Mailer: Claws Mail 3.7.6 (GTK+ 2.16.6; i586-pc-mingw32msvc)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org, Klop <ronald-freebsd8@klop.yi.org>, Ronald
Subject: Re: Data loss when hard shutdown!
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2010 14:30:45 -0000

On Sat, 2 Oct 2010 07:21:45 -0700
Jeremy Chadwick <freebsd@jdc.parodius.com> wrote:

> By default in FreeBSD (during the installation phase), softupdates are
> explicitly **not** applied to the root filesystem.  This is
> intentional, but the reason for it I do not know.  I imagine it's
> justified though.

See http://www.freebsd.org/doc/en/books/faq/disks.html#SAFE-SOFTUPDATES

-- 
Bruce Cran

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct  2 16:26:06 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D5E14106564A;
	Sat,  2 Oct 2010 16:26:06 +0000 (UTC)
	(envelope-from to.my.trociny@gmail.com)
Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com
	[209.85.161.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 3872D8FC17;
	Sat,  2 Oct 2010 16:26:05 +0000 (UTC)
Received: by fxm9 with SMTP id 9so3354844fxm.13
	for <multiple recipients>; Sat, 02 Oct 2010 09:26:05 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:from:to:cc:subject:references
	:x-comment-to:date:in-reply-to:message-id:user-agent:mime-version
	:content-type; bh=xcK9desYQpRHqcDeASxNdUusoVhCZM3HTGqO7r6gcLY=;
	b=q2MR3fEFZeIwGGqzSgMLaObtJkxNMGVrf4r8pv9vxQz17f7sLxS9NYwCq1mWHrxuEK
	0TrzyBVJMpD1M6jR9pu7AeY7YNIarFElYCoLaNsCrc+/pGfiRuFa0mOEd+7Z+mShXo1t
	uKmC26r3OO/KaI0zd5Fqv7kVYoXo1iW4cX0Ew=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=from:to:cc:subject:references:x-comment-to:date:in-reply-to
	:message-id:user-agent:mime-version:content-type;
	b=fUwvsbBB0ZEzqYWOIgbeZdCavU6uQd/chjWIXabeFgvwl/LSN9mFfKEL/8HiKvJ6OP
	ePWYHTNreUCYBXpetIruK5j8yH7R61gUQIUDDus0eNci4vQ+OB+y9uiy6vU8+cvf8Zuw
	Z2z/4pmbz/ZAqxEtrfSbYZR0wm6L4MYNzQ+bA=
Received: by 10.223.103.84 with SMTP id j20mr6907188fao.35.1286036765051;
	Sat, 02 Oct 2010 09:26:05 -0700 (PDT)
Received: from localhost ([95.69.162.97])
	by mx.google.com with ESMTPS id h12sm1250305faa.13.2010.10.02.09.26.03
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Sat, 02 Oct 2010 09:26:04 -0700 (PDT)
From: Mikolaj Golub <to.my.trociny@gmail.com>
To: freebsd-fs@freebsd.org
References: <86hbh44wgl.fsf@kopusha.home.net>
X-Comment-To: Mikolaj Golub
Date: Sat, 02 Oct 2010 19:26:05 +0300
In-Reply-To: <86hbh44wgl.fsf@kopusha.home.net> (Mikolaj Golub's message of
	"Sat, 02 Oct 2010 15:20:58 +0300")
Message-ID: <86aamw4l42.fsf@kopusha.home.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Cc: pjd@freebsd.org
Subject: Re: hastd: assertion (res->hr_event != NULL) fails in secondary on
	split-brain
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2010 16:26:07 -0000

--=-=-=


On Sat, 02 Oct 2010 15:20:58 +0300 Mikolaj Golub wrote:

 MG> After recent changes in hastd (I think r213006: Fix descriptor leaks) if
 MG> split-brain occurs hastd will abort in child_cleanup() on assertion
 MG> (res->hr_event != NULL).
 ...
 MG> So we have double close of res->hr_event. The first time it is closed when
 MG> parent detects that worker exited in main_loop(), and the second time when a
 MG> new connection from primary comes and the parent does cleanup after previously
 MG> terminated child before starting new one.

 MG> The straightforward fix is to check res->hr_event before closing, like in the
 MG> patch below.

 MG> -- 
 MG> Mikolaj Golub

 MG> Index: sbin/hastd/control.c
 MG> ===================================================================
 MG> --- sbin/hastd/control.c        (revision 213357)
 MG> +++ sbin/hastd/control.c        (working copy)
 MG> @@ -58,8 +58,10 @@ child_cleanup(struct hast_resource *res)
 MG>  
 MG>          proto_close(res->hr_ctrl);
 MG>          res->hr_ctrl = NULL;
 MG> -        proto_close(res->hr_event);
 MG> -        res->hr_event = NULL;
 MG> +        if (res->hr_event != NULL) {
 MG> +                proto_close(res->hr_event);
 MG> +                res->hr_event = NULL;
 MG> +        }
 MG>          res->hr_workerpid = 0;
 MG>  }
 MG>  

With this fix applied, another issue shows up. On split-brain, `hastctl
status' on the secondary returns "[ERROR] Error 32 received from hastd" most
of the time; only occasionally does it print the status.

lolek# hastctl status storage
[ERROR] Error 32 received from hastd.
lolek# hastctl status storage
[ERROR] Error 32 received from hastd.
lolek# hastctl status storage
storage:
  role: secondary
  provname: storage
  localpath: /dev/ad4
  extentsize: 2097152
  keepdirty: 0
  remoteaddr: tcp4://bolek
  replication: memsync
  status: complete
  dirty: 0 bytes
lolek# hastctl status storage
[ERROR] Error 32 received from hastd.

This is because hastd clears res->hr_workerpid only when a new connection from
the primary arrives. Meanwhile, control_status() checks res->hr_workerpid and,
if it is nonzero, tries to get the info from the worker; if the worker is no
longer running, the request fails with error 32 (EPIPE, broken pipe).
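
To make the failure mode concrete, here is a minimal standalone sketch (not
hastd code). It assumes the control channel behaves like a Unix socketpair;
struct worker, worker_start(), worker_query() and query_after_exit() are
hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the parent's bookkeeping, not struct hast_resource. */
struct worker {
	pid_t	pid;	/* 0 means "no worker running" */
	int	ctrl;	/* parent's end of the control socket */
};

/* Fork a worker that exits immediately, like the secondary on split-brain. */
static int
worker_start(struct worker *w)
{
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return (-1);
	w->pid = fork();
	if (w->pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	if (w->pid == 0)
		_exit(1);	/* child: exiting closes its descriptors */
	close(sv[1]);
	w->ctrl = sv[0];
	return (0);
}

/*
 * Status request: a nonzero pid is taken to mean "worker is alive", so the
 * parent writes to it. Returns 0 on success or an errno value.
 */
static int
worker_query(const struct worker *w)
{

	assert(w != NULL);
	if (w->pid == 0)
		return (0);	/* no worker: answer from the parent's own state */
	if (write(w->ctrl, "status", 6) < 0)
		return (errno);
	return (0);
}

/* Reap the worker, optionally clear the pid, then ask for status. */
static int
query_after_exit(int clear_pid)
{
	struct worker w;
	int error;

	signal(SIGPIPE, SIG_IGN);	/* get EPIPE from write() instead of dying */
	if (worker_start(&w) < 0)
		return (-1);
	(void)waitpid(w.pid, NULL, 0);	/* the worker is gone from here on */
	if (clear_pid)
		w.pid = 0;
	error = worker_query(&w);
	close(w.ctrl);
	return (error);
}
```

With the pid left stale, query_after_exit(0) returns EPIPE (32 on FreeBSD);
with the pid cleared as soon as the exit is noticed, query_after_exit(1)
returns 0.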

So it looks like it is better not just to close res->hr_event in main_loop()
but to do the full child cleanup there, as soon as the worker's exit is
detected.
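
As a rough standalone sketch of that idea (again not hastd code): one helper
that signals the worker, reaps it, and always resets the parent's
bookkeeping, so that it can be called from every place that notices a dead
worker. struct worker, worker_cleanup(), worker_kill() and worker_spawn() are
hypothetical names for the illustration; the exit status is reported only
when waitpid() actually filled it in.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the parent's bookkeeping, not struct hast_resource. */
struct worker {
	pid_t	pid;	/* 0 means "no worker running" */
	int	event;	/* parent's end of the event pipe, -1 if closed */
};

/* Safe to call more than once: each resource is released only if still held. */
static void
worker_cleanup(struct worker *w)
{

	if (w->event >= 0) {
		close(w->event);
		w->event = -1;
	}
	w->pid = 0;
}

/*
 * Stop and reap the worker, then clean up unconditionally.
 * Returns 1 if *statusp holds a valid wait status, 0 otherwise.
 */
static int
worker_kill(struct worker *w, int *statusp)
{
	int reaped = 0;

	assert(w != NULL && statusp != NULL);
	if (w->pid == 0)
		return (0);	/* nothing to do */
	if (kill(w->pid, SIGINT) < 0)
		perror("kill");		/* can only report the problem */
	else if (waitpid(w->pid, statusp, 0) != w->pid)
		perror("waitpid");	/* status is not valid here */
	else
		reaped = 1;
	worker_cleanup(w);
	return (reaped);
}

/* Test helper: fork a worker that exits right away. */
static int
worker_spawn(struct worker *w)
{
	int fds[2];

	w->pid = 0;
	w->event = -1;
	if (pipe(fds) < 0)
		return (-1);
	w->pid = fork();
	if (w->pid < 0) {
		close(fds[0]);
		close(fds[1]);
		w->pid = 0;
		return (-1);
	}
	if (w->pid == 0)
		_exit(0);
	close(fds[1]);
	w->event = fds[0];
	return (0);
}
```

Calling worker_kill() a second time on the same worker is a no-op, which is
the property that avoids both the double close and the stale pid.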

What do you think about the attached patch?

-- 
Mikolaj Golub


--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=hast.child_kill.patch

Index: sbin/hastd/hastd.c
===================================================================
--- sbin/hastd/hastd.c	(revision 213357)
+++ sbin/hastd/hastd.c	(working copy)
@@ -94,22 +94,6 @@ g_gate_load(void)
 }
 
 static void
-child_exit_log(unsigned int pid, int status)
-{
-
-	if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
-		pjdlog_debug(1, "Worker process exited gracefully (pid=%u).",
-		    pid);
-	} else if (WIFSIGNALED(status)) {
-		pjdlog_error("Worker process killed (pid=%u, signal=%d).",
-		    pid, WTERMSIG(status));
-	} else {
-		pjdlog_error("Worker process exited ungracefully (pid=%u, exitcode=%d).",
-		    pid, WIFEXITED(status) ? WEXITSTATUS(status) : -1);
-	}
-}
-
-static void
 child_exit(void)
 {
 	struct hast_resource *res;
@@ -388,8 +372,6 @@ listen_accept(void)
 	const unsigned char *token;
 	char laddr[256], raddr[256];
 	size_t size;
-	pid_t pid;
-	int status;
 
 	proto_local_address(cfg->hc_listenconn, laddr, sizeof(laddr));
 	pjdlog_debug(1, "Accepting connection to %s.", laddr);
@@ -504,26 +486,7 @@ listen_accept(void)
 			    "Worker process exists (pid=%u), stopping it.",
 			    (unsigned int)res->hr_workerpid);
 			/* Stop child process. */
-			if (kill(res->hr_workerpid, SIGINT) < 0) {
-				pjdlog_errno(LOG_ERR,
-				    "Unable to stop worker process (pid=%u)",
-				    (unsigned int)res->hr_workerpid);
-				/*
-				 * Other than logging the problem we
-				 * ignore it - nothing smart to do.
-				 */
-			}
-			/* Wait for it to exit. */
-			else if ((pid = waitpid(res->hr_workerpid,
-			    &status, 0)) != res->hr_workerpid) {
-				/* We can only log the problem. */
-				pjdlog_errno(LOG_ERR,
-				    "Waiting for worker process (pid=%u) failed",
-				    (unsigned int)res->hr_workerpid);
-			} else {
-				child_exit_log(res->hr_workerpid, status);
-			}
-			child_cleanup(res);
+			child_kill(res);
 		} else if (res->hr_remotein != NULL) {
 			char oaddr[256];
 
@@ -678,8 +641,8 @@ main_loop(void)
 				if (event_recv(res) == 0)
 					continue;
 				/* The worker process exited? */
-				proto_close(res->hr_event);
-				res->hr_event = NULL;
+				if (res->hr_workerpid != 0)
+					child_kill(res);
 			}
 		}
 	}
Index: sbin/hastd/control.c
===================================================================
--- sbin/hastd/control.c	(revision 213357)
+++ sbin/hastd/control.c	(working copy)
@@ -63,6 +63,51 @@ child_cleanup(struct hast_resource *res)
 	res->hr_workerpid = 0;
 }
 
+void
+child_exit_log(unsigned int pid, int status)
+{
+
+	if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
+		pjdlog_debug(1, "Worker process exited gracefully (pid=%u).",
+		    pid);
+	} else if (WIFSIGNALED(status)) {
+		pjdlog_error("Worker process killed (pid=%u, signal=%d).",
+		    pid, WTERMSIG(status));
+	} else {
+		pjdlog_error("Worker process exited ungracefully (pid=%u, exitcode=%d).",
+		    pid, WIFEXITED(status) ? WEXITSTATUS(status) : -1);
+	}
+}
+
+void
+child_kill(struct hast_resource *res)
+{
+	pid_t pid;
+	int status;
+
+	assert(res->hr_workerpid != 0);
+
+	if (kill(res->hr_workerpid, SIGINT) < 0) {
+		pjdlog_errno(LOG_ERR,
+			     "Unable to stop worker process (pid=%u)",
+			     (unsigned int)res->hr_workerpid);
+		/*
+		 * Other than logging the problem we
+		 * ignore it - nothing smart to do.
+		 */
+	}
+	/* Wait for it to exit. */
+	else if ((pid = waitpid(res->hr_workerpid,
+				&status, 0)) != res->hr_workerpid) {
+		/* We can only log the problem. */
+		pjdlog_errno(LOG_ERR,
+			     "Waiting for worker process (pid=%u) failed",
+			     (unsigned int)res->hr_workerpid);
+	} else
+		child_exit_log(res->hr_workerpid, status);
+	child_cleanup(res);
+}
+
 static void
 control_set_role_common(struct hastd_config *cfg, struct nv *nvout,
     uint8_t role, struct hast_resource *res, const char *name, unsigned int no)
@@ -107,22 +152,8 @@ control_set_role_common(struct hastd_config *cfg,
 	 * If previous role was primary or secondary we have to kill process
 	 * doing that work.
 	 */
-	if (res->hr_workerpid != 0) {
-		if (kill(res->hr_workerpid, SIGTERM) < 0) {
-			pjdlog_errno(LOG_WARNING,
-			    "Unable to kill worker process %u",
-			    (unsigned int)res->hr_workerpid);
-		} else if (waitpid(res->hr_workerpid, NULL, 0) !=
-		    res->hr_workerpid) {
-			pjdlog_errno(LOG_WARNING,
-			    "Error while waiting for worker process %u",
-			    (unsigned int)res->hr_workerpid);
-		} else {
-			pjdlog_debug(1, "Worker process %u stopped.",
-			    (unsigned int)res->hr_workerpid);
-		}
-		child_cleanup(res);
-	}
+	if (res->hr_workerpid != 0)
+		child_kill(res);
 
 	/* Start worker process if we are changing to primary. */
 	if (role == HAST_ROLE_PRIMARY)
Index: sbin/hastd/control.h
===================================================================
--- sbin/hastd/control.h	(revision 213357)
+++ sbin/hastd/control.h	(working copy)
@@ -39,6 +39,8 @@ struct hastd_config;
 struct hast_resource;
 
 void child_cleanup(struct hast_resource *res);
+void child_exit_log(unsigned int pid, int status);
+void child_kill(struct hast_resource *res);
 
 void control_set_role(struct hast_resource *res, uint8_t role);
 

--=-=-=--