From owner-freebsd-fs@FreeBSD.ORG  Sun Oct  7 01:29:35 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 00E1016A417
	for <freebsd-fs@freebsd.org>; Sun,  7 Oct 2007 01:29:35 +0000 (UTC)
	(envelope-from joao.barros@gmail.com)
Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.191])
	by mx1.freebsd.org (Postfix) with ESMTP id 80D7013C447
	for <freebsd-fs@freebsd.org>; Sun,  7 Oct 2007 01:29:34 +0000 (UTC)
	(envelope-from joao.barros@gmail.com)
Received: by nf-out-0910.google.com with SMTP id b2so716730nfb
	for <freebsd-fs@freebsd.org>; Sat, 06 Oct 2007 18:29:33 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta;
	h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	bh=GZLnOFG+79X7Ik1yDNaXcL1PDi9YRrslRdcXuAlxR0w=;
	b=hHglU8fIOlpo6cnvlAvK2CQtgoN5WuvTHx/QzaBBWTnbZMpi9Nbg7aGimgSsXi8OliYeyml9/vP1eNZ/pFN9wccMwnLmTvSj8oGWli+rQ0mjUJ5qskTdKxAP2OUqnyO79cRm15tJysQM1+KxnIKCa3X0fVuLFcP03pPkzgdk4gY=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta;
	h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	b=GA4JXBSugEBqvU6nxxKfES7jrVP7Qa63Do3Jfz3EC/+b8KX2hoAlEjmfK+ToOVvl+QI6X5lmXiTxflFg4CMrQuGH46SbhPLvM5dfzIurOZwSrTcmJSY+pRLuITKkSjgssHW054MCLgZ5tLtLpgkzMTzxmTJ0jjbPB7teoDqtYlE=
Received: by 10.78.180.18 with SMTP id c18mr9816666huf.1191719043101;
	Sat, 06 Oct 2007 18:04:03 -0700 (PDT)
Received: by 10.78.163.2 with HTTP; Sat, 6 Oct 2007 18:04:03 -0700 (PDT)
Message-ID: <70e8236f0710061804o62ea85c6k1d8a5e7d3600ef15@mail.gmail.com>
Date: Sun, 7 Oct 2007 02:04:03 +0100
From: "Joao Barros" <joao.barros@gmail.com>
To: "Pawel Jakub Dawidek" <pjd@freebsd.org>
In-Reply-To: <20071005000046.GC92272@garage.freebsd.pl>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20071005000046.GC92272@garage.freebsd.pl>
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 07 Oct 2007 01:29:35 -0000

On 10/5/07, Pawel Jakub Dawidek <pjd@freebsd.org> wrote:
> Hi.
>
> We're about to branch RELENG_7 and I'd like to start discussion with
> folks that experience 'kmem_map too small' panic with the latest HEAD.
>
> I'm trying hard to reproduce it and I can't, so I need to gather more
> info how you are able to provoke this panic.
>
> What I did was to rsync 200 FreeBSD src trees from one directory to
> another on the same ZFS file system. It worked fine.
>
> The system I'm using is i386 and the only tuning I did is bigger
> kmem_map. From my /boot/loader.conf:
>
> vm.kmem_size=629145600
> vm.kmem_size_max=629145600
>
> The machine is dual core Pentium D 3GHz with 1GB of RAM. My pool is:
>
> lcf:root:/tank/0# zpool status
>   pool: tank
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           ad4       ONLINE       0     0     0
>           ad5       ONLINE       0     0     0
>           ad6       ONLINE       0     0     0
>           ad7       ONLINE       0     0     0
>
> errors: No known data errors
>
> If you can still see those panics, please let me know as soon as possible
> and try to describe what your workload looks like, how to reproduce it,
> etc. I'd really like ZFS to be rock-stable for 7.0 even on i386.
>

i386 with 1GB here. I used to get this when chown'ing some thousand
files recursively via ssh.
Last time I got this was unraring 2GB files from n x 95MB rars via NFS.

My system:

xeon# zpool status
  pool: r4x320
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        r4x320      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad0s1d  ONLINE       0     0     0
            ad1s1d  ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0

errors: No known data errors

xeon# cat /boot/loader.conf
zfs_load="YES"
vfs.root.mountfrom="zfs:r4x320"
vfs.zfs.prefetch_disable=1 (I have this to improve video play)

xeon# sysctl vm | grep kmem
vm.kmem_size_scale: 3
vm.kmem_size_max: 335544320
vm.kmem_size_min: 0
vm.kmem_size: 335544320
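
For reference, the byte counts above are easier to read in MiB. A minimal
sketch (Python is used only for illustration; the helper name parse_sysctl
is hypothetical and not part of any FreeBSD tool):

```python
# Sketch: convert the byte counts printed by `sysctl vm | grep kmem` to MiB.
# The text below is copied from the output above; parse_sysctl is a
# hypothetical helper written for this example.
SYSCTL_OUTPUT = """\
vm.kmem_size_scale: 3
vm.kmem_size_max: 335544320
vm.kmem_size_min: 0
vm.kmem_size: 335544320
"""

MIB = 1024 * 1024


def parse_sysctl(text):
    """Return {name: int} for lines of the form 'name: value'."""
    values = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        if value.strip():
            values[name.strip()] = int(value.strip())
    return values


values = parse_sysctl(SYSCTL_OUTPUT)
kmem_mib = values["vm.kmem_size"] // MIB
print(f"vm.kmem_size = {kmem_mib} MiB")
```

On this machine that works out to 320 MiB for both vm.kmem_size and
vm.kmem_size_max.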

xeon# sysctl -a | grep vnodes
kern.maxvnodes: 52242
kern.minvnodes: 17414
vfs.freevnodes: 7797
vfs.wantfreevnodes: 17414
vfs.numvnodes: 8230

I usually set kern.maxvnodes to 50000 manually and everything is ok
but I see that I forgot to on my last reboot and haven't seen any
problems yet:
xeon# uptime
 1:56AM  up 4 days, 15:06

off-topic: it's lovely not having to have your 874GB fs with millions
of files checked after a crash or power failure :-D

-- 
Joao Barros

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct  7 18:13:02 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 20B9616A41B
	for <freebsd-fs@freebsd.org>; Sun,  7 Oct 2007 18:13:02 +0000 (UTC)
	(envelope-from amdmi3@amdmi3.ru)
Received: from cp65.agava.net (cp65.agava.net [89.108.66.215])
	by mx1.freebsd.org (Postfix) with ESMTP id BFF7113C4B6
	for <freebsd-fs@freebsd.org>; Sun,  7 Oct 2007 18:13:01 +0000 (UTC)
	(envelope-from amdmi3@amdmi3.ru)
Received: from [213.148.20.85] (helo=nexii.panopticon)
	by cp65.agava.net with esmtpsa (TLSv1:AES256-SHA:256)
	(Exim 4.44 (FreeBSD))
	id 1Ieace-0003rO-1Q; Sun, 07 Oct 2007 22:13:00 +0400
Received: from hades.panopticon (hades.panopticon [192.168.0.2])
	by nexii.panopticon (Postfix) with ESMTP id 8C3A11703D;
	Sun,  7 Oct 2007 22:14:13 +0400 (MSD)
Received: by hades.panopticon (Postfix, from userid 1000)
	id 3D95F40CB; Sun,  7 Oct 2007 22:14:29 +0400 (MSD)
Date: Sun, 7 Oct 2007 22:14:29 +0400
From: Dmitry Marakasov <amdmi3@amdmi3.ru>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20071007181429.GB1082@hades.panopticon>
Mail-Followup-To: Bruce Evans <brde@optusnet.com.au>,
	freebsd-fs@freebsd.org
References: <20071005004820.GA29814@hades.panopticon>
	<20071006080406.S689@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
In-Reply-To: <20071006080406.S689@besplex.bde.org>
User-Agent: Mutt/1.5.16 (2007-06-09)
X-AntiAbuse: This header was added to track abuse,
	please include it with any abuse report
X-AntiAbuse: Primary Hostname - cp65.agava.net
X-AntiAbuse: Original Domain - freebsd.org
X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [26 6]
X-AntiAbuse: Sender Address Domain - amdmi3.ru
X-Source: 
X-Source-Args: 
X-Source-Dir: 
Cc: freebsd-fs@freebsd.org
Subject: Re: Very slow writes on flash + msdosfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 07 Oct 2007 18:13:02 -0000

* Bruce Evans (brde@optusnet.com.au) wrote:

> Try 512 bytes/cluster for real slowness.
Yep, I've tried. I had to use 32k clusters to get 8 MB/s.

>> So where is the problem? Why's there no caching and why's there 1 sector
>> writes?
> Old versions of msdosfs don't implement clustering.
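
To illustrate why the size of each write matters so much here, a rough
sketch follows. It assumes (this is an assumption, not something measured
in this thread) that without clustering every write of write_size bytes
becomes a separate device request; the 1 MiB file size is hypothetical.

```python
# Sketch: number of device writes needed for one file when each write
# carries at most `write_size` bytes (assumed: no clustering, one request
# per write). FILE_SIZE is a hypothetical example value.
def writes_needed(file_size, write_size):
    """Ceiling division: how many writes of write_size cover file_size."""
    return -(-file_size // write_size)


FILE_SIZE = 1024 * 1024  # hypothetical 1 MiB file

for write_size in (512, 32 * 1024):
    count = writes_needed(FILE_SIZE, write_size)
    print(f"{write_size:6d} bytes per write -> {count} writes")
```

Under that assumption, 512-byte writes need 64 times as many requests as
32k writes for the same data.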

>> PS. I use 6.1, has the situation changed in -CURRENT?
> Yes.
>
> However, clustering won't help much for small files, due to BSD's
> fundamental design error of per-vnode buffering.  With 340 files in a
>
> Async mounts would reduce the minimum number of writes per file to
> about 1 (for the data block).  msdosfs doesn't implement them yet.
Hm, true. I just found out that UFS is slow as well. Though it can write
large files at almost 11 MB/s, small files are even slower. Async mount
helps a lot, too bad msdosfs can't do it. So if this problem is
fundamental, and msdosfs can't work async, isn't there some GEOM class
that does simple memory caching? I don't care what happens to data on
flash if the power goes down in the middle of writing, but I'm just too
jealous of how well Linux works with the very same flash (I have a 4GB
Linux box at work, and whatever you copy to flash - movies, or tons
of small files - cp finishes in a moment; all data is actually copied to
memory, and then flushed on umount or sync - I guess at the maximum
speed possible).

-- 
Best regards,
  Dmitry Marakasov               mailto:amdmi3@amdmi3.ru

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct  7 19:06:38 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DD84916A418
	for <freebsd-fs@freebsd.org>; Sun,  7 Oct 2007 19:06:38 +0000 (UTC)
	(envelope-from softsearch@gmail.com)
Received: from mu-out-0910.google.com (mu-out-0910.google.com [209.85.134.190])
	by mx1.freebsd.org (Postfix) with ESMTP id 6835C13C459
	for <freebsd-fs@freebsd.org>; Sun,  7 Oct 2007 19:06:33 +0000 (UTC)
	(envelope-from softsearch@gmail.com)
Received: by mu-out-0910.google.com with SMTP id w9so1343693mue
	for <freebsd-fs@freebsd.org>; Sun, 07 Oct 2007 12:06:32 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta;
	h=domainkey-signature:received:received:date:from:reply-to:organization:x-priority:message-id:to:subject:mime-version:content-type:content-transfer-encoding;
	bh=gGp44Wkxa6i6vNbPO6c9E5WnkEltfzh0hlf/cxLygTE=;
	b=MG3tu2/kX5mBkU/Wzp2XdtsB7+okHQXjd9q3b9gz27x6yuGMgamijpOEzvx8PMgvFiz/bdKK54StKIcpRfXcXdrsyzAcbFLrlQ+En0MHqy3FSQO+sPG5Khp4aadylDjeoC4e4R0jhygy8lCsUnad2a/95YgjCBM1G5S0u2jPnL0=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta;
	h=received:date:from:reply-to:organization:x-priority:message-id:to:subject:mime-version:content-type:content-transfer-encoding;
	b=Hx6MqSNsHpjd0AAEFJkdBJFzHTUSNdtGmrf07Tn+km5oWJsm1GKWEsJyY9aNAeMMhuw3UXu8Ljteudj7xpmkHaFF8B+sdNwDFs2ALaHZ63E+KVhJcl8CTRAa3ciJPnDu5BwRu4rDuGPeavM/NftBsMHxIi57Q4y6HJ+UR6LYQdY=
Received: by 10.82.146.14 with SMTP id t14mr20218121bud.1191783991807;
	Sun, 07 Oct 2007 12:06:31 -0700 (PDT)
Received: from ?81.200.123.169? ( [81.200.123.169])
	by mx.google.com with ESMTPS id f3sm6384025nfh.2007.10.07.12.06.30
	(version=SSLv3 cipher=OTHER); Sun, 07 Oct 2007 12:06:31 -0700 (PDT)
Date: Sun, 7 Oct 2007 23:05:52 +0400
From: Michael Monashev <softsearch@gmail.com>
Organization: SoftSearch.ru
X-Priority: 3 (Normal)
Message-ID: <19347792.20071007230552@gmail.com>
To: freebsd-fs@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Subject: ZFS stripe question
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Michael Monashev <softsearch@gmail.com>
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 07 Oct 2007 19:06:38 -0000

Hi!

For example, I have two identical disks: da0 (500 GB) and da2 (500 GB).
If I make a ZFS stripe across them, I get 2x read speed.

Now suppose one of the disks has two slices: da0s1 (200 GB) and
da0s2 (300 GB), and I make a ZFS stripe across da0s2 and da2.
What average read speed would I get? 2x or ((300+500)/500)x?
What stripe size would I get? 800 GB or 600 GB?
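
The candidate figures in the question work out as follows. This sketch only
reproduces the arithmetic; it does not settle which figure ZFS actually
delivers, and the variable names are made up for the example.

```python
# Sketch: the two candidate stripe sizes and the candidate read-speed
# ratio from the question above. Names are hypothetical.
DA0S2_GB = 300  # slice on the first disk
DA2_GB = 500    # whole second disk

sum_gb = DA0S2_GB + DA2_GB                   # every member used in full
twice_smallest_gb = 2 * min(DA0S2_GB, DA2_GB)  # limited by the smaller member
speed_ratio = sum_gb / DA2_GB                # the ((300+500)/500)x figure

print(f"sum of members:     {sum_gb} GB")
print(f"2 x smallest:       {twice_smallest_gb} GB")
print(f"(300+500)/500:      {speed_ratio}x")
```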

Sorry for my English.

-- 
 Michael


From owner-freebsd-fs@FreeBSD.ORG  Sun Oct  7 20:07:48 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3A29716A417
	for <freebsd-fs@FreeBSD.org>; Sun,  7 Oct 2007 20:07:48 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au
	[211.29.132.189])
	by mx1.freebsd.org (Postfix) with ESMTP id C992213C494
	for <freebsd-fs@FreeBSD.org>; Sun,  7 Oct 2007 20:07:47 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from c220-239-235-248.carlnfd3.nsw.optusnet.com.au
	(c220-239-235-248.carlnfd3.nsw.optusnet.com.au [220.239.235.248])
	by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	l97K7bZA028273
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 8 Oct 2007 06:07:41 +1000
Date: Mon, 8 Oct 2007 06:07:36 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Dmitry Marakasov <amdmi3@amdmi3.ru>
In-Reply-To: <20071007181429.GB1082@hades.panopticon>
Message-ID: <20071008051733.T29782@delplex.bde.org>
References: <20071005004820.GA29814@hades.panopticon>
	<20071006080406.S689@besplex.bde.org>
	<20071007181429.GB1082@hades.panopticon>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Very slow writes on flash + msdosfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 07 Oct 2007 20:07:48 -0000

On Sun, 7 Oct 2007, Dmitry Marakasov wrote:

> * Bruce Evans (brde@optusnet.com.au) wrote:
>> However, clustering won't help much for small files, due to BSD's
>> fundamental design error of per-vnode buffering.  With 340 files in a
>>
>> Async mounts would reduce the minimum number of writes per file to
>> about 1 (for the data block).  msdosfs doesn't implement them yet.
> Hm, true. I just found out that UFS is slow as well. Though it can write
> large files at almost 11 MB/s, small files are even slower. Async mount
> helps a lot, too bad msdosfs can't do it. So if this problem is
> fundamental, and msdosfs can't work async,

msdosfs can work async.  My version does.

> isn't there some GEOM class
> that does simple memory caching?

I think there is, but I wouldn't use it for political reasons.

> I don't care what happens to data on
> flash if the power goes down in the middle of writing, but I'm just too
> jealous of how well Linux works with the very same flash (I have a 4GB
> Linux box at work, and whatever you copy to flash - movies, or tons
> of small files - cp finishes in a moment; all data is actually copied to
> memory, and then flushed on umount or sync - I guess at the maximum
> speed possible).

Linux might still take a long time for the unmount/sync.

Linux's block devices are useful for avoiding the corresponding slowness
for newfs  -- there is no need to bloat all utilities with buffering;
you just run them on the block device.  They are also useful for working
around the corresponding slowness for reading of small file systems
on slow media (maybe not flash drives, but CD/DVD) on machines with
large RAM -- preread the entire file system into the buffer cache via
a block device.

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct  7 21:02:05 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DA81916A417
	for <freebsd-fs@FreeBSD.org>; Sun,  7 Oct 2007 21:02:05 +0000 (UTC)
	(envelope-from phk@critter.freebsd.dk)
Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222])
	by mx1.freebsd.org (Postfix) with ESMTP id 951EF13C448
	for <freebsd-fs@FreeBSD.org>; Sun,  7 Oct 2007 21:02:05 +0000 (UTC)
	(envelope-from phk@critter.freebsd.dk)
Received: from critter.freebsd.dk (unknown [192.168.61.3])
	by phk.freebsd.dk (Postfix) with ESMTP id E9B0917105;
	Sun,  7 Oct 2007 20:37:50 +0000 (UTC)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.14.1/8.14.1) with ESMTP id l97KbmHb020194;
	Sun, 7 Oct 2007 20:37:49 GMT (envelope-from phk@critter.freebsd.dk)
To: Bruce Evans <brde@optusnet.com.au>
From: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
In-Reply-To: Your message of "Mon, 08 Oct 2007 06:07:36 +1000."
	<20071008051733.T29782@delplex.bde.org> 
Date: Sun, 07 Oct 2007 20:37:48 +0000
Message-ID: <20193.1191789468@critter.freebsd.dk>
Sender: phk@critter.freebsd.dk
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Very slow writes on flash + msdosfs 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 07 Oct 2007 21:02:05 -0000


One of the reasons that writes to flash can be extremely slow is
that the built in wear-leveling gets overwhelmed.

There is a specification somewhere, that explains to camera
manufacturers how exactly they should perform the writes to
flash media to get maximum writing speed.

Unfortunately, many flash producers think that is the only thing you
can use flash devices for, and their wear-leveling supports only
this write mode.

M-Systems had a patent on monitoring the free cluster map of the
fat filesystem from the wear-leveling code, but I don't know how
wide-spread that has become yet.  Sandisk bought M-Systems some
years ago, so I bet they have it.

There exist ATA/whatever commands to tell a flash device that a
given range of sectors can be reclaimed by the wear-leveling code,
but we do not issue these when we delete files.

As a result, the wear-leveling ready-pool is rapidly depleted, forcing
all sector writes to perform a block evacuation, erase and rewrite.

The BIO_DELETE request was intended to give us support for this,
unfortunately, flash vendors are not at all willing to officially
support the interface and thus it never truly got implemented.

The slow write speed also indicates that you are wearing your
flash devices out in 1% of the time they should last.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-fs@FreeBSD.ORG  Mon Oct  8 11:08:20 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2100416A41B
	for <freebsd-fs@FreeBSD.org>; Mon,  8 Oct 2007 11:08:20 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 04EF313C4B6
	for <freebsd-fs@FreeBSD.org>; Mon,  8 Oct 2007 11:08:20 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.1/8.14.1) with ESMTP id l98B8J1F083265
	for <freebsd-fs@FreeBSD.org>; Mon, 8 Oct 2007 11:08:19 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.1/8.14.1/Submit) id l98B8IkI083261
	for freebsd-fs@FreeBSD.org; Mon, 8 Oct 2007 11:08:18 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 8 Oct 2007 11:08:18 GMT
Message-Id: <200710081108.l98B8IkI083261@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-fs@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to you
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Oct 2007 11:08:20 -0000

Current FreeBSD problem reports
Critical problems
Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o kern/114856  fs         [ntfs] [patch] Bug in NTFS allows bogus file modes.
o kern/116170  fs         Kernel panic when mounting /tmp

4 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/114847  fs         [ntfs] [patch] dirmask support for NTFS ala MSDOSFS

1 problem total.


From owner-freebsd-fs@FreeBSD.ORG  Mon Oct  8 12:15:40 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 15ACB16A41B;
	Mon,  8 Oct 2007 12:15:40 +0000 (UTC)
	(envelope-from pjd@garage.freebsd.pl)
Received: from mail.garage.freebsd.pl (arm132.internetdsl.tpnet.pl
	[83.17.198.132])
	by mx1.freebsd.org (Postfix) with ESMTP id A67C213C448;
	Mon,  8 Oct 2007 12:15:39 +0000 (UTC)
	(envelope-from pjd@garage.freebsd.pl)
Received: by mail.garage.freebsd.pl (Postfix, from userid 65534)
	id 42C3C45E90; Mon,  8 Oct 2007 14:15:38 +0200 (CEST)
Received: from localhost (pjd.wheel.pl [10.0.1.1])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.garage.freebsd.pl (Postfix) with ESMTP id CE2BA45684;
	Mon,  8 Oct 2007 14:15:31 +0200 (CEST)
Date: Mon, 8 Oct 2007 14:15:23 +0200
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: freebsd-fs@FreeBSD.org
Message-ID: <20071008121523.GM2327@garage.freebsd.pl>
References: <20071005000046.GC92272@garage.freebsd.pl>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="Tnj+unmjHTqEM5y0"
Content-Disposition: inline
In-Reply-To: <20071005000046.GC92272@garage.freebsd.pl>
User-Agent: Mutt/1.4.2.3i
X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc
X-OS: FreeBSD 7.0-CURRENT i386
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on 
	mail.garage.freebsd.pl
X-Spam-Level: 
X-Spam-Status: No, score=-5.9 required=3.0 tests=ALL_TRUSTED,BAYES_00 
	autolearn=ham version=3.0.4
Cc: freebsd-current@FreeBSD.org
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Oct 2007 12:15:40 -0000


--Tnj+unmjHTqEM5y0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Here are some updates:

I was able to reproduce the panic by rsyncing big files and running the
bonnie++ test suggested in this thread.

Can you guys retry with this patch:

	http://people.freebsd.org/~pjd/patches/vm_kern.c.2.patch

It's a hack, yes, but it mitigates the problem quite well. I'm
looking for a solution that can be used for 7.0 before we find a better
fix.

BTW. To use ZFS you _must_ increase vm.kmem_size/vm.kmem_size_max.
If you have the problem discussed here and you're using standard values,
please retry with vm.kmem_size/vm.kmem_size_max set to at least 600MB in
/boot/loader.conf.
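
As a rough sketch of what that means in practice: the byte value for 600MB
can be computed and written out as the two loader.conf lines. This assumes
"600MB" means 600 * 1024 * 1024 bytes, which matches the 629145600 quoted
earlier in this thread; the function name is hypothetical.

```python
# Sketch: build the two /boot/loader.conf lines for a given kmem size.
# Assumes 1 MB = 1024 * 1024 bytes; loader_conf_lines is a hypothetical
# helper, not an existing tool.
MIB = 1024 * 1024


def loader_conf_lines(kmem_mb):
    """Return the vm.kmem_size and vm.kmem_size_max settings as text lines."""
    size = kmem_mb * MIB
    return [f"vm.kmem_size={size}", f"vm.kmem_size_max={size}"]


for line in loader_conf_lines(600):
    print(line)
```

The printed lines would go into /boot/loader.conf and take effect on the
next boot.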

I'm not sure if it's not too late to ask re@ about increasing the
default kmem size at least on amd64. ~300MB we have there is silly
small.

--=20
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

--Tnj+unmjHTqEM5y0
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFHCh9bForvXbEpPzQRAlLFAKD2Xo69Xid2JKshQ3ATi1m6MM/CrgCdFmN8
t4Ayssew0tklY0lh+iOSEFA=
=sRS8
-----END PGP SIGNATURE-----

--Tnj+unmjHTqEM5y0--

From owner-freebsd-fs@FreeBSD.ORG  Mon Oct  8 12:42:37 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 36ABB16A420;
	Mon,  8 Oct 2007 12:42:37 +0000 (UTC)
	(envelope-from master-list@arcor.de)
Received: from mail.i88.de (i88.de [88.198.25.141])
	by mx1.freebsd.org (Postfix) with ESMTP id DF45F13C494;
	Mon,  8 Oct 2007 12:42:36 +0000 (UTC)
	(envelope-from master-list@arcor.de)
Received: from localhost (localhost [127.0.0.1])
	by mail.i88.de (Postfix) with ESMTP id D5CB48450C;
	Mon,  8 Oct 2007 14:26:42 +0200 (CEST)
Received: from mail.i88.de ([127.0.0.1])
	by localhost (serv.i88.de [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id q12ieydvO5fI; Mon,  8 Oct 2007 14:26:34 +0200 (CEST)
Received: from [192.168.10.13] (p549D6E58.dip.t-dialin.net [84.157.110.88])
	by mail.i88.de (Postfix) with ESMTP id CC0DA84507;
	Mon,  8 Oct 2007 14:26:33 +0200 (CEST)
Message-ID: <470A21F8.8030506@arcor.de>
Date: Mon, 08 Oct 2007 14:26:32 +0200
From: Micha Mutschler <master-list@arcor.de>
User-Agent: Thunderbird 1.5 (X11/20051201)
MIME-Version: 1.0
To: freebsd-fs@freebsd.org,  freebsd-current@freebsd.org
References: <20071005000046.GC92272@garage.freebsd.pl>
In-Reply-To: <20071005000046.GC92272@garage.freebsd.pl>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Oct 2007 12:42:37 -0000

Pawel Jakub Dawidek wrote:

> If you can still see those panics, please let me know as soon as possible
> and try to describe what your workload looks like, how to reproduce it,
> etc. I'd really like ZFS to be rock-stable for 7.0 even on i386.
> 

Hi!

I was doing a ports rsync from a zfs to /usr/ports (ufs). That worked
fine. The crash happened when I started to copy (in parallel with the
rsync) a huge file from an nfs server to the zfs.

For 512 MB RAM I've set:

kern.maxvnodes="25000"
vfs.zfs.prefetch_disable="1"
vfs.zfs.arc_max="52428800"
vm.kmem_size_max="402653184"
vfs.zfs.zil_disable="1"

[root@filer ~]# zpool status
  pool: xport
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        xport       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad9     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad11    ONLINE       0     0     0

errors: No known data errors
[root@filer ~]#
[root@filer ~]# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
xport        33.8G   514G  33.6G  /xport
xport/ports   212M   514G   212M  /xport/ports
xport/t      26.9K   514G  26.9K  /xport/t
[root@filer ~]#


(I'm using 200709)
Regards,

Micha Mutschler


From owner-freebsd-fs@FreeBSD.ORG  Mon Oct  8 13:30:51 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D62E416A417;
	Mon,  8 Oct 2007 13:30:51 +0000 (UTC) (envelope-from des@des.no)
Received: from tim.des.no (tim.des.no [194.63.250.121])
	by mx1.freebsd.org (Postfix) with ESMTP id 8AA6113C45B;
	Mon,  8 Oct 2007 13:30:51 +0000 (UTC) (envelope-from des@des.no)
Received: from tim.des.no (localhost [127.0.0.1])
	by spam.des.no (Postfix) with ESMTP id 4A0DB20C9;
	Mon,  8 Oct 2007 15:30:42 +0200 (CEST)
X-Spam-Tests: AWL
X-Spam-Learn: disabled
X-Spam-Score: -0.0/3.0
X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on tim.des.no
Received: from ds4.des.no (des.no [80.203.243.180])
	by smtp.des.no (Postfix) with ESMTP id B9A0120C8;
	Mon,  8 Oct 2007 15:30:41 +0200 (CEST)
Received: by ds4.des.no (Postfix, from userid 1001)
	id 8395284481; Mon,  8 Oct 2007 15:30:41 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
References: <20071005000046.GC92272@garage.freebsd.pl>
	<20071008121523.GM2327@garage.freebsd.pl>
Date: Mon, 08 Oct 2007 15:30:41 +0200
Message-ID: <86bqb97mym.fsf@ds4.des.no>
User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Oct 2007 13:30:51 -0000

Pawel Jakub Dawidek <pjd@FreeBSD.org> writes:
> I'm not sure if it's not too late to ask re@ about increasing the
> default kmem size at least on amd64. ~300MB we have there is silly
> small.

Speaking of which, I tried setting vm.kmem_size to 2G on a C2D system
with 4 GB RAM, but it simply panics:

OK set vm.kmem_size=2G
OK boot -s
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-CURRENT #8: Tue Sep 25 13:31:41 CEST 2007
    des@ds4.des.no:/usr/obj/usr/src/sys/ds4
kmem_suballoc: bad status return of 3.
panic: kmem_suballoc
cpuid = 0
KDB: enter: panic
[thread pid 0 tid 0 ]
Stopped at      kdb_enter+0x31: popq    %rbp
db> where
Tracing pid 0 tid 0 td 0xffffffff805af4a0
kdb_enter() at kdb_enter+0x31
panic() at panic+0x166
kmem_suballoc() at kmem_suballoc+0xc3
kmeminit() at kmeminit+0x16e
mi_startup() at mi_startup+0x59
btext() at btext+0x2c
db> reset

with vm.kmem_size unset and vm.kmem_size_max=2G, it boots fine:

Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (2400.01-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x6f6  Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
usable memory = 4286398464 (4087 MB)
avail memory  = 4130979840 (3939 MB)
ACPI APIC Table: <GBT    GBTUACPI>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0: Changing APIC ID to 2
ioapic0 <Version 2.0> irqs 0-23 on motherboard
[...]

DES
-- 
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-fs@FreeBSD.ORG  Mon Oct  8 13:39:03 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1EF9216A417;
	Mon,  8 Oct 2007 13:39:03 +0000 (UTC)
	(envelope-from pjd@garage.freebsd.pl)
Received: from mail.garage.freebsd.pl (arm132.internetdsl.tpnet.pl
	[83.17.198.132])
	by mx1.freebsd.org (Postfix) with ESMTP id D810B13C47E;
	Mon,  8 Oct 2007 13:39:01 +0000 (UTC)
	(envelope-from pjd@garage.freebsd.pl)
Received: by mail.garage.freebsd.pl (Postfix, from userid 65534)
	id 39C0145EA7; Mon,  8 Oct 2007 15:39:00 +0200 (CEST)
Received: from localhost (pjd.wheel.pl [10.0.1.1])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.garage.freebsd.pl (Postfix) with ESMTP id DC6B145683;
	Mon,  8 Oct 2007 15:38:54 +0200 (CEST)
Date: Mon, 8 Oct 2007 15:38:46 +0200
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Dag-Erling Sm??rgrav <des@des.no>
Message-ID: <20071008133846.GP2327@garage.freebsd.pl>
References: <20071005000046.GC92272@garage.freebsd.pl>
	<20071008121523.GM2327@garage.freebsd.pl>
	<86bqb97mym.fsf@ds4.des.no>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="yEbVe0JFHWhrOjGA"
Content-Disposition: inline
In-Reply-To: <86bqb97mym.fsf@ds4.des.no>
User-Agent: Mutt/1.4.2.3i
X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc
X-OS: FreeBSD 7.0-CURRENT i386
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on 
	mail.garage.freebsd.pl
X-Spam-Level: 
X-Spam-Status: No, score=-5.9 required=3.0 tests=ALL_TRUSTED,BAYES_00 
	autolearn=ham version=3.0.4
Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Oct 2007 13:39:03 -0000


--yEbVe0JFHWhrOjGA
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Oct 08, 2007 at 03:30:41PM +0200, Dag-Erling Smørgrav wrote:
> Pawel Jakub Dawidek <pjd@FreeBSD.org> writes:
> > I'm not sure if it's not too late to ask re@ about increasing the
> > default kmem size at least on amd64. ~300MB we have there is silly
> > small.
> 
> Speaking of which, I tried setting vm.kmem_size to 2G on a C2D system
> with 4 GB RAM, but it simply panics:
> 
> OK set vm.kmem_size=2G
> OK boot -s
> GDB: no debug ports present
> KDB: debugger backends: ddb
> KDB: current backend: ddb
> Copyright (c) 1992-2007 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 7.0-CURRENT #8: Tue Sep 25 13:31:41 CEST 2007
>     des@ds4.des.no:/usr/obj/usr/src/sys/ds4
> kmem_suballoc: bad status return of 3.
> panic: kmem_suballoc
> cpuid = 0
> KDB: enter: panic
> [thread pid 0 tid 0 ]
> Stopped at      kdb_enter+0x31: popq    %rbp
> db> where
> Tracing pid 0 tid 0 td 0xffffffff805af4a0
> kdb_enter() at kdb_enter+0x31
> panic() at panic+0x166
> kmem_suballoc() at kmem_suballoc+0xc3
> kmeminit() at kmeminit+0x16e
> mi_startup() at mi_startup+0x59
> btext() at btext+0x2c
> db> reset
> 
> with vm.kmem_size unset and vm.kmem_size_max=2G, it boots fine:
> 
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (2400.01-MHz K8-class CPU)
>   Origin = "GenuineIntel"  Id = 0x6f6  Stepping = 6
>   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>   Features2=0xe3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
>   AMD Features=0x20100800<SYSCALL,NX,LM>
>   AMD Features2=0x1<LAHF>
>   Cores per package: 2
> usable memory = 4286398464 (4087 MB)
> avail memory  = 4130979840 (3939 MB)
> ACPI APIC Table: <GBT    GBTUACPI>
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
>  cpu0 (BSP): APIC ID:  0
>  cpu1 (AP): APIC ID:  1
> ioapic0: Changing APIC ID to 2
> ioapic0 <Version 2.0> irqs 0-23 on motherboard
> [...]

For i386 one has to add, e.g., 'options KVA_PAGES=512' to the kernel
config to be able to define kmem larger than ~700MB. I guess you're
running amd64; maybe there is a similar requirement?
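
To put numbers on the KVA_PAGES setting, here is a minimal sketch. It
assumes that on i386 each KVA "page" covers 4MB (2MB with PAE) and that
the non-PAE default is 256; i386_kva_bytes is a hypothetical helper, not
anything in the tree.

```python
MB = 2**20

def i386_kva_bytes(kva_pages, pae=False):
    """Kernel virtual address space implied by a KVA_PAGES value.

    Assumption: one KVA page spans 4MB without PAE and 2MB with PAE.
    """
    page_span = 2 * MB if pae else 4 * MB
    return kva_pages * page_span

# Assumed default (256) versus the value suggested above (512).
print(i386_kva_bytes(256) // MB)  # 1024
print(i386_kva_bytes(512) // MB)  # 2048
```

Under those assumptions the default leaves 1GB of KVA for everything in
the kernel, which would explain why kmem tops out well below that.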

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

--yEbVe0JFHWhrOjGA
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFHCjLmForvXbEpPzQRAkoHAKDf/geY/dDi93PiMUUQVReTNb+AhwCgonvy
erlqcJb/7HloqTc2UVOSnYs=
=lU5S
-----END PGP SIGNATURE-----

--yEbVe0JFHWhrOjGA--

From owner-freebsd-fs@FreeBSD.ORG  Mon Oct  8 13:44:18 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 83D0516A418;
	Mon,  8 Oct 2007 13:44:18 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from raven.bwct.de (raven.bwct.de [85.159.14.73])
	by mx1.freebsd.org (Postfix) with ESMTP id E7DE213C467;
	Mon,  8 Oct 2007 13:44:17 +0000 (UTC)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely5.cicely.de ([10.1.1.7])
	by raven.bwct.de (8.13.4/8.13.4) with ESMTP id l98DiEUh016370;
	Mon, 8 Oct 2007 15:44:15 +0200 (CEST)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14])
	by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id l98Di8fZ077782
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 8 Oct 2007 15:44:08 +0200 (CEST)
	(envelope-from ticso@cicely12.cicely.de)
Received: from cicely12.cicely.de (localhost [127.0.0.1])
	by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id l98Di882006881;
	Mon, 8 Oct 2007 15:44:08 +0200 (CEST)
	(envelope-from ticso@cicely12.cicely.de)
Received: (from ticso@localhost)
	by cicely12.cicely.de (8.13.4/8.13.3/Submit) id l98Di8rY006880;
	Mon, 8 Oct 2007 15:44:08 +0200 (CEST) (envelope-from ticso)
Date: Mon, 8 Oct 2007 15:44:07 +0200
From: Bernd Walter <ticso@cicely12.cicely.de>
To: Dag-Erling =?iso-8859-1?Q?Sm=F8rgrav?= <des@des.no>
Message-ID: <20071008134407.GB67153@cicely12.cicely.de>
References: <20071005000046.GC92272@garage.freebsd.pl>
	<20071008121523.GM2327@garage.freebsd.pl>
	<86bqb97mym.fsf@ds4.des.no>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <86bqb97mym.fsf@ds4.des.no>
X-Operating-System: FreeBSD cicely12.cicely.de 5.4-STABLE alpha
User-Agent: Mutt/1.5.9i
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8,
	BAYES_00=-2.599 autolearn=ham version=3.1.7
X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on cicely12.cicely.de
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org,
	Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: ticso@cicely.de
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Oct 2007 13:44:18 -0000

On Mon, Oct 08, 2007 at 03:30:41PM +0200, Dag-Erling Smørgrav wrote:
> Pawel Jakub Dawidek <pjd@FreeBSD.org> writes:
> > I'm not sure if it's not too late to ask re@ about increasing the
> > default kmem size at least on amd64. ~300MB we have there is silly
> > small.
> 
> Speaking of which, I tried setting vm.kmem_size to 2G on a C2D system
> with 4 GB RAM, but it simply panics:
> 
> OK set vm.kmem_size=2G

This sounds like there's a signed 32-bit limit somewhere.
Wild guess:
Maybe it is a loader parsing limitation rather than a kernel one.
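
To illustrate the suspicion (a sketch only, not the loader's or the
kernel's actual parsing code): 2G is 2^31 bytes, one more than a signed
32-bit int can hold, so it wraps to a negative number. parse_size and
as_int32 are hypothetical helpers, and ctypes.c_int32 merely stands in
for a C int.

```python
import ctypes

INT32_MAX = 2**31 - 1

def parse_size(text):
    """Hypothetical helper: turn a size such as '2G' into a byte count."""
    suffixes = {"K": 2**10, "M": 2**20, "G": 2**30}
    unit = text[-1].upper()
    if unit in suffixes:
        return int(text[:-1]) * suffixes[unit]
    return int(text)

def as_int32(value):
    """Store value in a signed 32-bit int.

    ctypes does no overflow checking, so out-of-range values wrap the
    way a truncating conversion typically does in C.
    """
    return ctypes.c_int32(value).value

requested = parse_size("2G")
print(requested, requested > INT32_MAX, as_int32(requested))
```

A value of 1G fits and survives unchanged; 2G does not.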

> OK boot -s
> GDB: no debug ports present
> KDB: debugger backends: ddb
> KDB: current backend: ddb
> Copyright (c) 1992-2007 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 7.0-CURRENT #8: Tue Sep 25 13:31:41 CEST 2007
>     des@ds4.des.no:/usr/obj/usr/src/sys/ds4
> kmem_suballoc: bad status return of 3.
> panic: kmem_suballoc
> cpuid = 0
> KDB: enter: panic
> [thread pid 0 tid 0 ]
> Stopped at      kdb_enter+0x31: popq    %rbp
> db> where
> Tracing pid 0 tid 0 td 0xffffffff805af4a0
> kdb_enter() at kdb_enter+0x31
> panic() at panic+0x166
> kmem_suballoc() at kmem_suballoc+0xc3
> kmeminit() at kmeminit+0x16e
> mi_startup() at mi_startup+0x59
> btext() at btext+0x2c
> db> reset
> 
> with vm.kmem_size unset and vm.kmem_size_max=2G, it boots fine:

What is the value of vm.kmem_size_max after booting?
Maybe an erroneously parsed value for the max isn't that critical.

-- 
B.Walter                http://www.bwct.de      http://www.fizon.de
bernd@bwct.de           info@bwct.de            support@fizon.de

From owner-freebsd-fs@FreeBSD.ORG  Mon Oct  8 14:42:39 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D9DFF16A468;
	Mon,  8 Oct 2007 14:42:39 +0000 (UTC) (envelope-from des@des.no)
Received: from tim.des.no (tim.des.no [194.63.250.121])
	by mx1.freebsd.org (Postfix) with ESMTP id 8CCEB13C480;
	Mon,  8 Oct 2007 14:42:39 +0000 (UTC) (envelope-from des@des.no)
Received: from tim.des.no (localhost [127.0.0.1])
	by spam.des.no (Postfix) with ESMTP id 4AAFD20BC;
	Mon,  8 Oct 2007 16:42:30 +0200 (CEST)
X-Spam-Tests: AWL
X-Spam-Learn: disabled
X-Spam-Score: -0.0/3.0
X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on tim.des.no
Received: from ds4.des.no (des.no [80.203.243.180])
	by smtp.des.no (Postfix) with ESMTP id 32E6020BB;
	Mon,  8 Oct 2007 16:42:30 +0200 (CEST)
Received: by ds4.des.no (Postfix, from userid 1001)
	id 2179684486; Mon,  8 Oct 2007 16:42:30 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: ticso@cicely.de
References: <20071005000046.GC92272@garage.freebsd.pl>
	<20071008121523.GM2327@garage.freebsd.pl> <86bqb97mym.fsf@ds4.des.no>
	<20071008134407.GB67153@cicely12.cicely.de>
Date: Mon, 08 Oct 2007 16:42:30 +0200
In-Reply-To: <20071008134407.GB67153@cicely12.cicely.de> (Bernd Walter's
	message of "Mon\, 8 Oct 2007 15\:44\:07 +0200")
Message-ID: <864ph17jmx.fsf@ds4.des.no>
User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org,
	Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Oct 2007 14:42:40 -0000

Bernd Walter <ticso@cicely12.cicely.de> writes:
> On Mon, Oct 08, 2007 at 03:30:41PM +0200, Dag-Erling Smørgrav wrote:
> > Speaking of which, I tried setting vm.kmem_size to 2G on a C2D system
> > with 4 GB RAM, but it simply panics:
> > 
> > OK set vm.kmem_size=2G
> This sounds like there's a signed 32 bit limit somewhere.

Yes, this is really stupid - vm.kmem_size is an int.  Thanks for setting
me on the right track.

DES
-- 
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-fs@FreeBSD.ORG  Mon Oct  8 15:29:02 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6CEFF16A41A;
	Mon,  8 Oct 2007 15:29:02 +0000 (UTC) (envelope-from des@des.no)
Received: from tim.des.no (tim.des.no [194.63.250.121])
	by mx1.freebsd.org (Postfix) with ESMTP id 22C0713C448;
	Mon,  8 Oct 2007 15:29:02 +0000 (UTC) (envelope-from des@des.no)
Received: from tim.des.no (localhost [127.0.0.1])
	by spam.des.no (Postfix) with ESMTP id 857F820BF;
	Mon,  8 Oct 2007 17:28:52 +0200 (CEST)
X-Spam-Tests: AWL
X-Spam-Learn: disabled
X-Spam-Score: -0.0/3.0
X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on tim.des.no
Received: from ds4.des.no (des.no [80.203.243.180])
	by smtp.des.no (Postfix) with ESMTP id 0182420BA;
	Mon,  8 Oct 2007 17:28:51 +0200 (CEST)
Received: by ds4.des.no (Postfix, from userid 1001)
	id C181F8442C; Mon,  8 Oct 2007 17:28:51 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: ticso@cicely.de
References: <20071005000046.GC92272@garage.freebsd.pl>
	<20071008121523.GM2327@garage.freebsd.pl> <86bqb97mym.fsf@ds4.des.no>
	<20071008134407.GB67153@cicely12.cicely.de>
	<864ph17jmx.fsf@ds4.des.no>
Date: Mon, 08 Oct 2007 17:28:51 +0200
In-Reply-To: <864ph17jmx.fsf@ds4.des.no> ("Dag-Erling =?utf-8?Q?Sm=C3=B8rg?=
	=?utf-8?Q?rav=22's?= message of
	"Mon\, 08 Oct 2007 16\:42\:30 +0200")
Message-ID: <86ve9hr5fw.fsf@ds4.des.no>
User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org,
	Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Oct 2007 15:29:02 -0000

Dag-Erling Smørgrav <des@des.no> writes:
> Bernd Walter <ticso@cicely12.cicely.de> writes:
> > On Mon, Oct 08, 2007 at 03:30:41PM +0200, Dag-Erling Smørgrav wrote:
> > > Speaking of which, I tried setting vm.kmem_size to 2G on a C2D system
> > > with 4 GB RAM, but it simply panics:
> > > 
> > > OK set vm.kmem_size=2G
> > This sounds like there's a signed 32 bit limit somewhere.
> Yes, this is really stupid - vm.kmem_size is an int.  Thanks for setting
> me on the right track.

Actually, somebody pointed out to me that due to the memory model used,
amd64 can only have 2 GB KVA, and that includes other things besides
kmem, so even with the tunable bug fixed, the best I can hope for is
probably 1.5 GB.
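
As a rough sketch of where the 2 GB figure comes from: assuming the
amd64 kernel is linked at KERNBASE = 0xffffffff80000000 (an assumption
here, though consistent with the td address in the trace earlier in the
thread), the span from there to the top of the 64-bit address space is
exactly 2 GB. The 512 MB set aside for "other things" is an illustrative
guess, not a measured value.

```python
GIB = 2**30

KERNBASE = 0xffffffff80000000   # assumed amd64 kernel base address
ADDRESS_SPACE_TOP = 2**64       # one past the highest 64-bit address

def kva_bytes(base=KERNBASE, top=ADDRESS_SPACE_TOP):
    """Kernel virtual address space available above base."""
    return top - base

def kmem_ceiling(kva, other_users):
    """What is left for kmem once other KVA consumers are subtracted."""
    return kva - other_users

kva = kva_bytes()
print(kva / GIB)                           # 2.0
print(kmem_ceiling(kva, GIB // 2) / GIB)   # 1.5 with the assumed 512 MB
```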

DES
-- 
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-fs@FreeBSD.ORG  Mon Oct  8 16:27:43 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 994F516A417;
	Mon,  8 Oct 2007 16:27:43 +0000 (UTC)
	(envelope-from cb@severious.net)
Received: from ion.gank.org (ion.gank.org [69.55.238.164])
	by mx1.freebsd.org (Postfix) with ESMTP id 81EA313C45D;
	Mon,  8 Oct 2007 16:27:43 +0000 (UTC)
	(envelope-from cb@severious.net)
Received: by ion.gank.org (Postfix, from userid 1001)
	id 32799115D7; Mon,  8 Oct 2007 11:27:43 -0500 (CDT)
Date: Mon, 8 Oct 2007 11:27:42 -0500
From: Craig Boston <cb@severious.net>
To: Bakul Shah <bakul@bitblocks.com>
Message-ID: <20071008162730.GA98555@nowhere>
Mail-Followup-To: Craig Boston <cb@severious.net>,
	Bakul Shah <bakul@bitblocks.com>,
	Pawel Jakub Dawidek <pjd@FreeBSD.org>, freebsd-fs@FreeBSD.org,
	freebsd-current@FreeBSD.org
References: <20071005180119.GE98210@garage.freebsd.pl>
	<20071006174614.E1D575B52@mail.bitblocks.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20071006174614.E1D575B52@mail.bitblocks.com>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org,
	Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Oct 2007 16:27:43 -0000

On Sat, Oct 06, 2007 at 10:46:14AM -0700, Bakul Shah wrote:
> There are two differences: R1.11 arc_reclaim_needed() returns 1 when 80%
> of kmem is used, while R1.10 does so at 50% of kmem.

I'll bet it's this change that provokes the problem, as even when
manually tuning the ARC size to 1/2 kmem_size or lower I still sometimes
get panics.  Probably what's happening is kmem usage gets high from
other things, and when there's a sudden spike zfs can't react fast
enough and shrink the ARC.  At 50% it acts more conservatively so
there's more memory available for burst usage.
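
A minimal sketch of what the threshold change means for burst headroom.
This is not the real arc_reclaim_needed(); the strict comparison is my
assumption, and the 300MB kmem_size is just the approximate amd64
default mentioned earlier in the thread.

```python
MB = 2**20

def reclaim_needed(kmem_used, kmem_size, threshold_pct):
    """True once kmem usage exceeds threshold_pct percent of kmem_size."""
    return kmem_used * 100 > kmem_size * threshold_pct

def headroom(kmem_size, threshold_pct):
    """Bytes still free at the moment reclaim first triggers."""
    return kmem_size * (100 - threshold_pct) // 100

kmem_size = 300 * MB  # illustrative value only
for pct in (50, 80):
    print(pct, headroom(kmem_size, pct) // MB)
```

With these numbers the 50% threshold leaves 150MB for a sudden spike,
while the 80% threshold leaves only 60MB.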

I noticed that some of the time, on my mostly-stable system, the panic
happens when the nvidia driver is trying to allocate a 128K chunk.
vmstat -m only shows nvidia at ~12MB total though, so I think it just
gets hit because it malloc/frees large blocks more than most subsystems.

The machine with 512MB of memory doesn't run X at all, but it's by far the least
stable of the bunch.  Unfortunately it doesn't seem to want to create
crash dumps for some reason.

> Still, fiddling with limits to make the panic go away seems
> to somehow miss the point as I always worry it will show up
> under other conditions.  Maybe there is a way to ensure that
> kmem_map is never too small, or maybe zfs can reserve a few
> resources for its own use so that it can get out of a tight
> spot?

I don't think having ZFS reserve resources would help, as at least for
me the kmem_map panic doesn't always happen within ZFS code.  It's just
that the increased kernel memory demands from ZFS are causing it to run
out at times.

I still think the best course is to have ZFS's cache use VM objects like
the buffer cache does, but I know this is a very nontrivial thing to do.

Craig

From owner-freebsd-fs@FreeBSD.ORG  Mon Oct  8 16:33:51 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1B1E616A417;
	Mon,  8 Oct 2007 16:33:51 +0000 (UTC)
	(envelope-from cb@severious.net)
Received: from ion.gank.org (ion.gank.org [69.55.238.164])
	by mx1.freebsd.org (Postfix) with ESMTP id 0597F13C45A;
	Mon,  8 Oct 2007 16:33:50 +0000 (UTC)
	(envelope-from cb@severious.net)
Received: by ion.gank.org (Postfix, from userid 1001)
	id AF63611794; Mon,  8 Oct 2007 11:33:50 -0500 (CDT)
Date: Mon, 8 Oct 2007 11:33:49 -0500
From: Craig Boston <cb@severious.net>
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Message-ID: <20071008163349.GB98555@nowhere>
Mail-Followup-To: Craig Boston <cb@severious.net>,
	Pawel Jakub Dawidek <pjd@FreeBSD.org>, freebsd-fs@FreeBSD.org,
	freebsd-current@FreeBSD.org
References: <20071005000046.GC92272@garage.freebsd.pl>
	<20071008121523.GM2327@garage.freebsd.pl>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20071008121523.GM2327@garage.freebsd.pl>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Oct 2007 16:33:51 -0000

On Mon, Oct 08, 2007 at 02:15:23PM +0200, Pawel Jakub Dawidek wrote:
> It's a hack, yes, but it mitigates the problem quite well. I'm
> looking for a solution that can be used for 7.0 before we find a better
> fix.

Um, tsleep()ing when M_NOWAIT is set?  Yes, I'd call that quite a hack
:)

Sorry to spam the thread again, but one thing I noticed is that zfs does
an awful lot of allocations of various sizes:

$ uptime
11:30AM  up 2 days, 18:06, 11 users, load averages: 0.05, 0.12, 0.15
(was idle most of the weekend)

$ vmstat -m | grep solaris
      solaris 83176 341456K       - 53216607  16,32,64,128,256,512,1024,2048,4096

I'm not completely up-to-date on what algorithm the kernel allocator is
using these days, but is it possible that kernel memory is getting
fragmented by all of those allocations?
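
For scale, here is a quick sketch that pulls the numbers out of that
vmstat line, assuming the usual column order (Type, InUse, MemUse,
HighUse, Requests, Sizes). parse_vmstat_m and uptime_seconds are
hypothetical helpers written for this message.

```python
LINE = "solaris 83176 341456K       - 53216607  16,32,64,128,256,512,1024,2048,4096"

def parse_vmstat_m(line):
    """Split one 'vmstat -m' row into named fields (assumed column order)."""
    fields = line.split()
    return {
        "type": fields[0],
        "in_use": int(fields[1]),
        "mem_bytes": int(fields[2].rstrip("K")) * 1024,
        "requests": int(fields[4]),
        "sizes": [int(s) for s in fields[5].split(",")],
    }

def uptime_seconds(days, hours, minutes):
    """Convert an uptime reading to seconds."""
    return days * 86400 + hours * 3600 + minutes * 60

stats = parse_vmstat_m(LINE)
avg_bytes = stats["mem_bytes"] / stats["in_use"]
rate = stats["requests"] / uptime_seconds(2, 18, 6)
print(round(avg_bytes), round(rate))
```

That works out to roughly 4.1KB per live allocation and a bit over 220
requests per second averaged over the uptime. It says nothing about
fragmentation by itself, only that the allocator is kept busy.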

Craig

From owner-freebsd-fs@FreeBSD.ORG  Mon Oct  8 17:18:13 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A4E8916A419
	for <freebsd-fs@freebsd.org>; Mon,  8 Oct 2007 17:18:13 +0000 (UTC)
	(envelope-from kometen@gmail.com)
Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.190])
	by mx1.freebsd.org (Postfix) with ESMTP id 2DB5713C467
	for <freebsd-fs@freebsd.org>; Mon,  8 Oct 2007 17:18:11 +0000 (UTC)
	(envelope-from kometen@gmail.com)
Received: by nf-out-0910.google.com with SMTP id b2so1023373nfb
	for <freebsd-fs@freebsd.org>; Mon, 08 Oct 2007 10:18:11 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta;
	h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	bh=DFqOBvECxVKQzhVij+f+RDRd6rcieximN8lGL0C9N68=;
	b=bsZUEtcC0S7M8kabIsDAUYNj5imp2zaMa7TaqpxmoT9kuKu2NYd6u5RSIqjOgMACliUJvrhopTJlB+A5b+4CccuDclT4fiIb9JaFXxPI+Vbhr/pfEYnGMMXOZhs3vT7WTV5fD8YoOd01DhYZWHcbJ/eR++OWTiDlvgI7jOeIAEI=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta;
	h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	b=lCyhIjEDU0m7klQn/8Ox33YeGONoFXyXgAi+GZWl4brhbHPkm/QFrLzEo9zVd7higDOa4xthhj7HAa77kvm7rmgBstyXvzghbhR0I/220fY0Wt+9D8dcJa0bRzjJTS3oqcYXC/fuiC2raT3x5TyXIt6ZD2UM9khp9A/ixECTM6I=
Received: by 10.78.193.5 with SMTP id q5mr10781511huf.1191863890340;
	Mon, 08 Oct 2007 10:18:10 -0700 (PDT)
Received: by 10.78.146.10 with HTTP; Mon, 8 Oct 2007 10:18:10 -0700 (PDT)
Message-ID: <b41c75520710081018wabadea0g32517eb99665c29a@mail.gmail.com>
Date: Mon, 8 Oct 2007 19:18:10 +0200
From: "Claus Guttesen" <kometen@gmail.com>
To: "Pawel Jakub Dawidek" <pjd@freebsd.org>
In-Reply-To: <20071008121523.GM2327@garage.freebsd.pl>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20071005000046.GC92272@garage.freebsd.pl>
	<20071008121523.GM2327@garage.freebsd.pl>
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Oct 2007 17:18:13 -0000

> I was able to reproduce the panic by rsyncing big files and trying
> bonnie++ test suggested in this thread.
>
> Can you guys retry with this patch:
>
>         http://people.freebsd.org/~pjd/patches/vm_kern.c.2.patch
>
> It's a hack, yes, but it mitigates the problem quite well. I'm
> looking for a solution that can be used for 7.0 before we find a better
> fix.

Congrats Pawel! You made my server survive my rsync of 90 GB. :-)

This is on the same src as the build that required a reboot, apart from
your patch. So this fix does alleviate 'kmem_map too small' in my case.

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare

From owner-freebsd-fs@FreeBSD.ORG  Mon Oct  8 18:14:49 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7466B16A41A;
	Mon,  8 Oct 2007 18:14:49 +0000 (UTC)
	(envelope-from cb@severious.net)
Received: from ion.gank.org (ion.gank.org [69.55.238.164])
	by mx1.freebsd.org (Postfix) with ESMTP id 6307713C457;
	Mon,  8 Oct 2007 18:14:49 +0000 (UTC)
	(envelope-from cb@severious.net)
Received: by ion.gank.org (Postfix, from userid 1001)
	id 05316110F2; Mon,  8 Oct 2007 13:14:49 -0500 (CDT)
Date: Mon, 8 Oct 2007 13:14:47 -0500
From: Craig Boston <cb@severious.net>
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>, freebsd-fs@FreeBSD.org,
	freebsd-current@FreeBSD.org
Message-ID: <20071008181447.GD98555@nowhere>
Mail-Followup-To: Craig Boston <cb@severious.net>,
	Pawel Jakub Dawidek <pjd@FreeBSD.org>, freebsd-fs@FreeBSD.org,
	freebsd-current@FreeBSD.org
References: <20071005000046.GC92272@garage.freebsd.pl>
	<20071008121523.GM2327@garage.freebsd.pl>
	<20071008163349.GB98555@nowhere>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20071008163349.GB98555@nowhere>
User-Agent: Mutt/1.4.2.3i
Cc: 
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Oct 2007 18:14:49 -0000

On Mon, Oct 08, 2007 at 11:33:49AM -0500, Craig Boston wrote:
> Um, tsleep()ing when M_NOWAIT is set?  Yes, I'd call that quite a hack
> :)

Oops, my binary logic is inverted today.  Please ignore.

Craig

From owner-freebsd-fs@FreeBSD.ORG  Mon Oct  8 20:28:36 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 90C3F16A417;
	Mon,  8 Oct 2007 20:28:36 +0000 (UTC)
	(envelope-from gb@isis.u-strasbg.fr)
Received: from mailhost.u-strasbg.fr (mailhost.u-strasbg.fr
	[IPv6:2001:660:2402::156])
	by mx1.freebsd.org (Postfix) with ESMTP id 2CF6913C474;
	Mon,  8 Oct 2007 20:28:36 +0000 (UTC)
	(envelope-from gb@isis.u-strasbg.fr)
Received: from 6nq.u-strasbg.fr (mojito.u-strasbg.fr
	[IPv6:2001:660:4701:1002::3])
	by mailhost.u-strasbg.fr (8.13.8/jtpda-5.5pre1) with ESMTP id
	l98KSTS7010471 ; Mon, 8 Oct 2007 22:28:29 +0200 (CEST)
Received: by 6nq.u-strasbg.fr (Postfix, from userid 1001)
	id 1EB8D8184; Mon,  8 Oct 2007 22:27:44 +0200 (CEST)
Date: Mon, 8 Oct 2007 22:27:44 +0200
From: Guy Brand <gb@isis.u-strasbg.fr>
To: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Message-ID: <20071008202743.GA1555@isis.u-strasbg.fr>
References: <20071005000046.GC92272@garage.freebsd.pl>
	<20071008121523.GM2327@garage.freebsd.pl>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
In-Reply-To: <20071008121523.GM2327@garage.freebsd.pl>
x-gpg-fingerprint: B423 4924 012E 52F3 BA9E  547F CC8C 0BC5 9C0E B1CA
x-gpg-key: 9C0EB1CA
User-Agent: Mutt/1.5.16 (2007-06-09)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0
	(mailhost.u-strasbg.fr [IPv6:2001:660:2402::156]);
	Mon, 08 Oct 2007 22:28:29 +0200 (CEST)
X-Virus-Scanned: ClamAV 0.88.7/4507/Mon Oct 8 20:42:59 2007 on mr6.u-strasbg.fr
X-Virus-Status: Clean
X-Spam-Status: No, score=0.1 required=5.0 tests=AWL,NO_RELAYS
	autolearn=disabled version=3.1.8
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on mr6.u-strasbg.fr
Cc: 
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Oct 2007 20:28:36 -0000

Pawel Jakub Dawidek (pjd@freebsd.org) on 08/10/2007 at 14:15 wrote:

> Can you guys retry with this patch:
> 
> 	http://people.freebsd.org/~pjd/patches/vm_kern.c.2.patch
> 
> It's a hack, yes, but allows to mitigate the problem quite well. I'm
> looking for a solution that can be used for 7.0 before we find a better
> fix.
> 
> BTW. To use ZFS you _must_ increase vm.kmem_size/vm.kmem_size_max.
> If you have the problem discussed here and you're using standard values,
> please retry with vm.kmem_size/vm.kmem_size_max set to at least 600MB in
> /boot/loader.conf.

  Hi,

  
  On a Lenovo X61s from yesterday (FreeBSD 7.0-CURRENT #1: Sun Oct  7
  21:55:14 CEST 2007) with

	hw.model: Intel(R) Core(TM)2 Duo CPU     L7500  @ 1.60GHz
	hw.physmem: 2091008000
	hw.machine_arch: i386
	hw.realmem: 2104164352

  and a single zpool containing the ad4s2 slice (all filesystems except
  /), I tuned vm.kmem_size and ran these two loops indefinitely:

    - vt0: rsync /usr/src /tmp/. && rm -rf /tmp/src
    - vt1: dd if=/dev/zero of=/tmp/file bs=1k count=1000000

  The kernel panicked until vm.kmem_size/vm.kmem_size_max was set to
  671088640, at which point both loops ran for 3 hours. Then the
  system hung: rsync and dd appeared to have stopped executing, which
  I confirmed from another terminal (load of 0). Any attempt to read
  or write the pool (touch/ls) hung the terminal. I could still log
  in on another terminal, but reboot/shutdown failed.

  I applied your patch and re-ran the tests. After eight hours, the
  kernel is still up and usable. Thanks, Pawel.

> I'm not sure if it's not too late to ask re@ about increasing the
> default kmem size at least on amd64. ~300MB we have there is silly
> small.

  The default vm.kmem_size value is 335544320 on my laptop; a portsnap
  fetch + extract from scratch panics with "kmem_map too small".
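
  For reference, the byte values quoted in this thread are whole
  multiples of a megabyte (2^20 bytes): 335544320 is 320MB, 629145600
  is 600MB and 671088640 is 640MB. A minimal sketch of the conversion
  follows; the helper name mb_to_bytes is made up for illustration and
  is not part of any FreeBSD interface.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a size in MB (2^20 bytes) into the
 * plain byte count used for vm.kmem_size in this thread.
 */
static uint64_t
mb_to_bytes(uint64_t mb)
{
	return (mb * 1024 * 1024);
}
```

  Calling mb_to_bytes(640) gives the value that stopped the panics on
  this laptop; mb_to_bytes(600) gives the suggested minimum.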

-- 
  bug


From owner-freebsd-fs@FreeBSD.ORG  Tue Oct  9 13:27:01 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0EBC716A417
	for <freebsd-fs@freebsd.org>; Tue,  9 Oct 2007 13:27:01 +0000 (UTC)
	(envelope-from bg@sics.se)
Received: from letter.sics.se (letter.sics.se [193.10.64.6])
	by mx1.freebsd.org (Postfix) with ESMTP id A59BD13C455
	for <freebsd-fs@freebsd.org>; Tue,  9 Oct 2007 13:27:00 +0000 (UTC)
	(envelope-from bg@sics.se)
Received: from sics.se (ibook.sics.se [193.10.66.104])
	by letter.sics.se (Postfix) with ESMTP id 3005D400D0;
	Tue,  9 Oct 2007 15:02:00 +0200 (CEST)
Date: Tue, 9 Oct 2007 15:01:49 +0200
From: Bjorn Gronvall <bg@sics.se>
To: freebsd-fs@freebsd.org
Message-ID: <20071009150149.337279ce@ibook.sics.se>
Organization: SICS.SE
X-Mailer: Claws Mail 2.9.1 (GTK+ 2.10.6; i386-portbld-freebsd6.2)
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Cc: 
Subject: NFS server does not cluster writes
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 09 Oct 2007 13:27:01 -0000

Hi,

The current NFS server clusters reads but never writes, which leads to
poor sequential-write performance. The attached patch makes the
following changes:

1/ Rearrange the code so that the same code can be used to detect both
   sequential reads and writes.

2/ Merge in updates from vfs_vnops.c::sequential_heuristic.

3/ Use double hashing in order to avoid hash-clustering in the
   nfsheur table.

4/ Pack nfsheur table more efficiently.

5/ Tolerate a small amount of RPC reordering (initially suggested
   by Ellard and Seltzer).

6/ Back off from sequential access gradually rather than immediately
   switching to random access.
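
To illustrate item 3, here is a stand-alone sketch of the probe
sequence that double hashing produces. It is not the patch code: the
names TABLE_SIZE, MAXSTEP and probe_sequence are made up, and the key
stands for the vnode pointer already divided by sizeof(struct vnode).
The table size and step mask are the values used in the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE	1031	/* prime, same value as in the patch */
#define MAXSTEP		0x3ff	/* step mask; the step is 1..1024 */

/*
 * Fill 'probes' with the first 'n' slots visited for 'key'.
 * Both the start slot and the stride derive from the key, so two
 * keys that share a start slot usually part ways on the next probe.
 */
static void
probe_sequence(uintptr_t key, unsigned *probes, size_t n)
{
	unsigned hi = (unsigned)(key % TABLE_SIZE);
	unsigned step = (unsigned)(key & MAXSTEP) + 1;	/* never zero */
	size_t i;

	for (i = 0; i < n; i++) {
		probes[i] = hi;
		hi += step;
		/* step < TABLE_SIZE, so one subtraction is enough */
		if (hi >= TABLE_SIZE)
			hi -= TABLE_SIZE;
	}
}
```

Keys 5 and 1036 both start at slot 5, but then probe slots 11 and 18
respectively. With linear probing (step fixed at 1) they would keep
competing for the same run of slots. Because the table size is prime
and the step is smaller than it, a full cycle visits every slot.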

These changes have been tested on a low-performance ATA disk (with
write caching disabled) and sped up large sequential writes by a
factor of four. I would be interested in numbers from more typical
server configurations if somebody has the time to try it out.
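
Items 5 and 6 can be seen in isolation in the following user-space
sketch, which mirrors the seqcount update from the patch below without
the hash table. The names seqstate, seq_update, BLKSIZE and SEQMAX are
invented here; BLKSIZE and SEQMAX are stand-ins for BKVASIZE and
IO_SEQMAX, and their values are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

#define BLKSIZE	16384	/* stand-in for BKVASIZE (assumed value) */
#define SEQMAX	127	/* stand-in for IO_SEQMAX (assumed value) */

struct seqstate {
	int64_t	nextoff;	/* offset expected from the next request */
	int	seqcount;	/* larger means "more sequential" */
};

static void
seq_update(struct seqstate *s, int64_t off, int64_t len)
{
	int nblocks = (int)((len + BLKSIZE - 1) / BLKSIZE);
	int64_t window = 4 * (len > BLKSIZE ? len : BLKSIZE);

	if ((off == 0 && s->seqcount > 0) || off == s->nextoff) {
		/* Sequential: grow by the request size in blocks. */
		s->seqcount += nblocks;
		if (s->seqcount > SEQMAX)
			s->seqcount = SEQMAX;
	} else if (llabs((long long)(off - s->nextoff)) <= window) {
		/* Close to the expected offset: probably reordered. */
	} else {
		/* Far away: back off instead of resetting. */
		s->seqcount /= 4;
		if (nblocks > 1 && s->seqcount < nblocks)
			s->seqcount = nblocks;
	}
	s->nextoff = off + len;	/* the patch records this after the I/O */
}
```

Starting from a seqcount of 40, one sequential 16k request raises it
to 41, a request two blocks ahead of the expected offset leaves it
alone, and a distant request drops it to 10. The old code would have
reset it to 1 on the first out-of-order request.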

Cheers,
/b

-- 
  _     _                                           ,_______________.
Bjorn Gronvall (Björn Grönvall)                    /_______________/|
Swedish Institute of Computer Science              |               ||
PO Box 1263, S-164 29 Kista, Sweden                | Schroedingers ||
Email: bg@sics.se, Phone +46 -8 633 15 25          |      Cat      |/
Cellular +46 -70 768 06 35, Fax +46 -8 751 72 30   '---------------'


--- nfs_serv.c.orig	2007-10-09 12:03:00.000000000 +0200
+++ nfs_serv.c	2007-10-09 13:50:02.000000000 +0200
@@ -106,18 +106,98 @@
 
 #define MAX_COMMIT_COUNT	(1024 * 1024)
 
-#define NUM_HEURISTIC		1017
+#define NUM_HEURISTIC		1031 /* Must be prime! */
+#define HASH_MAXSTEP		0x3ff
 #define NHUSE_INIT		64
 #define NHUSE_INC		16
 #define NHUSE_MAX		2048
+CTASSERT(NUM_HEURISTIC > (HASH_MAXSTEP + 1));
 
 static struct nfsheur {
+	off_t nh_nextoff;	/* next offset for sequential detection */
 	struct vnode *nh_vp;	/* vp to match (unreferenced pointer) */
-	off_t nh_nextr;		/* next offset for sequential detection */
-	int nh_use;		/* use count for selection */
-	int nh_seqcount;	/* heuristic */
+	uint16_t nh_use;	/* use count for selection */
+	uint16_t nh_seqcount;	/* in units of BKVASIZE bytes */
 } nfsheur[NUM_HEURISTIC];
 
+/*
+ * Sequential heuristic - detect sequential operation
+ */
+static
+struct nfsheur *
+sequential_heuristic(const struct uio *uio, struct vnode *vp)
+{
+	struct nfsheur *nh;
+	unsigned hi, step;	/* Double hashing */
+	int try = 32;		/* A bit large? */
+	int nblocks;
+
+	/*
+	 * Locate best candidate
+	 */
+
+	hi =   ((unsigned)vp / sizeof(struct vnode)) % NUM_HEURISTIC;
+	step = ((unsigned)vp / sizeof(struct vnode)) & HASH_MAXSTEP;
+	step++;			/* Step must not be zero. */
+	nh = &nfsheur[hi];
+
+	while (try--) {
+		if (nfsheur[hi].nh_vp == vp) {
+			nh = &nfsheur[hi];
+			break;
+		}
+		if (nfsheur[hi].nh_use > 0)
+			--nfsheur[hi].nh_use;
+		hi = hi + step;
+		if (hi >= NUM_HEURISTIC)
+			hi -= NUM_HEURISTIC;
+		if (nfsheur[hi].nh_use < nh->nh_use)
+			nh = &nfsheur[hi];
+	}
+
+	if (nh->nh_vp != vp) {
+		nh->nh_vp = vp;
+		nh->nh_nextoff = uio->uio_offset;
+		nh->nh_use = NHUSE_INIT;
+		if (uio->uio_offset == 0)
+			nh->nh_seqcount = 4;
+		else
+			nh->nh_seqcount = 1;
+	}
+
+	nh->nh_use += NHUSE_INC;
+	if (nh->nh_use > NHUSE_MAX)
+		nh->nh_use = NHUSE_MAX;
+
+	/*
+	 * Calculate heuristic
+	 */
+
+	/*
+	 * XXX we assume that the filesystem block size is
+	 * the default.  Not true, but still gives us a pretty
+	 * good indicator of how sequential the read operations
+	 * are.
+	 */
+	nblocks = (uio->uio_resid + BKVASIZE - 1) / BKVASIZE;
+	if ((uio->uio_offset == 0 && nh->nh_seqcount > 0) ||
+	    uio->uio_offset == nh->nh_nextoff) {
+		nh->nh_seqcount += nblocks;
+		if (nh->nh_seqcount > IO_SEQMAX)
+			nh->nh_seqcount = IO_SEQMAX;
+	} else if (qabs(uio->uio_offset - nh->nh_nextoff) <=
+		   4*imax(BKVASIZE, uio->uio_resid)) {
+		/* Probably reordered RPC, do nothing. */
+	} else {
+		nh->nh_seqcount /= 4;
+		/* RPCs larger than 1 block should cluster IO. */
+		if (nblocks > 1 && nh->nh_seqcount < nblocks)
+			nh->nh_seqcount = nblocks;
+	}
+
+	return (nh);
+}
+
 /* Global vars */
 
 int nfsrvw_procrastinate = NFS_GATHERDELAY * 1000;
@@ -855,61 +935,6 @@
 	else
 		cnt = reqlen;
 
-	/*
-	 * Calculate seqcount for heuristic
-	 */
-
-	{
-		int hi;
-		int try = 32;
-
-		/*
-		 * Locate best candidate
-		 */
-
-		hi = ((int)(vm_offset_t)vp / sizeof(struct vnode)) % NUM_HEURISTIC;
-		nh = &nfsheur[hi];
-
-		while (try--) {
-			if (nfsheur[hi].nh_vp == vp) {
-				nh = &nfsheur[hi];
-				break;
-			}
-			if (nfsheur[hi].nh_use > 0)
-				--nfsheur[hi].nh_use;
-			hi = (hi + 1) % NUM_HEURISTIC;
-			if (nfsheur[hi].nh_use < nh->nh_use)
-				nh = &nfsheur[hi];
-		}
-
-		if (nh->nh_vp != vp) {
-			nh->nh_vp = vp;
-			nh->nh_nextr = off;
-			nh->nh_use = NHUSE_INIT;
-			if (off == 0)
-				nh->nh_seqcount = 4;
-			else
-				nh->nh_seqcount = 1;
-		}
-
-		/*
-		 * Calculate heuristic
-		 */
-
-		if ((off == 0 && nh->nh_seqcount > 0) || off == nh->nh_nextr) {
-			if (++nh->nh_seqcount > IO_SEQMAX)
-				nh->nh_seqcount = IO_SEQMAX;
-		} else if (nh->nh_seqcount > 1) {
-			nh->nh_seqcount = 1;
-		} else {
-			nh->nh_seqcount = 0;
-		}
-		nh->nh_use += NHUSE_INC;
-		if (nh->nh_use > NHUSE_MAX)
-			nh->nh_use = NHUSE_MAX;
-		ioflag |= nh->nh_seqcount << IO_SEQSHIFT;
-        }
-
 	nfsm_reply(NFSX_POSTOPORFATTR(v3) + 3 * NFSX_UNSIGNED+nfsm_rndup(cnt));
 	if (v3) {
 		tl = nfsm_build(u_int32_t *, NFSX_V3FATTR + 4 * NFSX_UNSIGNED);
@@ -967,9 +992,11 @@
 		uiop->uio_resid = len;
 		uiop->uio_rw = UIO_READ;
 		uiop->uio_segflg = UIO_SYSSPACE;
+		nh = sequential_heuristic(uiop, vp);
+		ioflag |= nh->nh_seqcount << IO_SEQSHIFT;
 		error = VOP_READ(vp, uiop, IO_NODELOCKED | ioflag, cred);
 		off = uiop->uio_offset;
-		nh->nh_nextr = off;
+		nh->nh_nextoff = off;
 		FREE((caddr_t)iv2, M_TEMP);
 		if (error || (getret = VOP_GETATTR(vp, vap, cred, td))) {
 			if (!error)
@@ -1037,12 +1064,14 @@
 	nfsfh_t nfh;
 	fhandle_t *fhp;
 	struct uio io, *uiop = &io;
+	struct nfsheur *nh;
 	off_t off;
 	struct mount *mntp = NULL;
 	int tvfslocked;
 	int vfslocked;
 
 	nfsdbprintf(("%s %d\n", __FILE__, __LINE__));
+	bwillwrite();
 	vfslocked = 0;
 	if (mrep == NULL) {
 		*mrq = NULL;
@@ -1175,9 +1204,12 @@
 	    uiop->uio_segflg = UIO_SYSSPACE;
 	    uiop->uio_td = NULL;
 	    uiop->uio_offset = off;
+	    nh = sequential_heuristic(uiop, vp);
+	    ioflags |= nh->nh_seqcount << IO_SEQSHIFT;
 	    error = VOP_WRITE(vp, uiop, ioflags, cred);
 	    /* XXXRW: unlocked write. */
 	    nfsrvstats.srvvop_writes++;
+	    nh->nh_nextoff = uiop->uio_offset;
 	    FREE((caddr_t)iv, M_TEMP);
 	}
 	aftat_ret = VOP_GETATTR(vp, vap, cred, td);

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct  9 19:27:43 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 62FE516A418;
	Tue,  9 Oct 2007 19:27:43 +0000 (UTC)
	(envelope-from darrenr@freebsd.org)
Received: from out1.smtp.messagingengine.com (out1.smtp.messagingengine.com
	[66.111.4.25])
	by mx1.freebsd.org (Postfix) with ESMTP id 1459013C44B;
	Tue,  9 Oct 2007 19:27:42 +0000 (UTC)
	(envelope-from darrenr@freebsd.org)
Received: from compute1.internal (compute1.internal [10.202.2.41])
	by out1.messagingengine.com (Postfix) with ESMTP id 9AE0F30E20;
	Tue,  9 Oct 2007 15:27:42 -0400 (EDT)
Received: from heartbeat1.messagingengine.com ([10.202.2.160])
	by compute1.internal (MEProxy); Tue, 09 Oct 2007 15:27:42 -0400
X-Sasl-enc: zfUDSWbegn5Na/9yFsg6Mj5G7ulbSSGa03FhJ4Wq7PGk 1191958062
Received: from [192.168.1.235] (64-142-85-108.dsl.dynamic.sonic.net
	[64.142.85.108])
	by mail.messagingengine.com (Postfix) with ESMTP id D5C505637;
	Tue,  9 Oct 2007 15:27:41 -0400 (EDT)
Message-ID: <470BD649.9050505@freebsd.org>
Date: Tue, 09 Oct 2007 12:28:09 -0700
From: Darren Reed <darrenr@freebsd.org>
User-Agent: Thunderbird 2.0.0.0 (Windows/20070326)
MIME-Version: 1.0
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
References: <20071005000046.GC92272@garage.freebsd.pl>	<20071008121523.GM2327@garage.freebsd.pl>	<86bqb97mym.fsf@ds4.des.no>
	<20071008133846.GP2327@garage.freebsd.pl>
In-Reply-To: <20071008133846.GP2327@garage.freebsd.pl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@FreeBSD.org, Dag-Erling Smørgrav <des@des.no>,
	freebsd-current@FreeBSD.org
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 09 Oct 2007 19:27:43 -0000

Pawel Jakub Dawidek wrote:
> ...
>
> For i386 one has to set, eg. 'options KVA_PAGES=512' to the kernel
> config to be able to define kmem larger than ~700MB. I guess you're
> running amd64, maybe there is similar requirement?
>   

Given how much RAM PCs have these days, why isn't this a default for 
GENERIC?

Or why isn't it at least a tunable rather than an option?

Darren


From owner-freebsd-fs@FreeBSD.ORG  Tue Oct  9 19:31:12 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 819EB16A469;
	Tue,  9 Oct 2007 19:31:12 +0000 (UTC)
	(envelope-from pjd@garage.freebsd.pl)
Received: from mail.garage.freebsd.pl (arm132.internetdsl.tpnet.pl
	[83.17.198.132])
	by mx1.freebsd.org (Postfix) with ESMTP id D4EEF13C4B5;
	Tue,  9 Oct 2007 19:31:11 +0000 (UTC)
	(envelope-from pjd@garage.freebsd.pl)
Received: by mail.garage.freebsd.pl (Postfix, from userid 65534)
	id EF4BC45EE5; Tue,  9 Oct 2007 21:31:08 +0200 (CEST)
Received: from localhost (154.81.datacomsa.pl [195.34.81.154])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.garage.freebsd.pl (Postfix) with ESMTP id 26CB545E93;
	Tue,  9 Oct 2007 21:31:04 +0200 (CEST)
Date: Tue, 9 Oct 2007 21:30:52 +0200
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Darren Reed <darrenr@freebsd.org>
Message-ID: <20071009193051.GA13519@garage.freebsd.pl>
References: <20071005000046.GC92272@garage.freebsd.pl>
	<20071008121523.GM2327@garage.freebsd.pl>
	<86bqb97mym.fsf@ds4.des.no>
	<20071008133846.GP2327@garage.freebsd.pl>
	<470BD649.9050505@freebsd.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="Qxx1br4bt0+wmkIi"
Content-Disposition: inline
In-Reply-To: <470BD649.9050505@freebsd.org>
User-Agent: Mutt/1.4.2.3i
X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc
X-OS: FreeBSD 7.0-CURRENT i386
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on 
	mail.garage.freebsd.pl
X-Spam-Level: 
X-Spam-Status: No, score=-2.6 required=3.0 tests=BAYES_00 autolearn=ham 
	version=3.0.4
Cc: freebsd-fs@FreeBSD.org, Dag-Erling Smørgrav <des@des.no>,
	freebsd-current@FreeBSD.org
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 09 Oct 2007 19:31:12 -0000


--Qxx1br4bt0+wmkIi
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Oct 09, 2007 at 12:28:09PM -0700, Darren Reed wrote:
> Pawel Jakub Dawidek wrote:
> >...
> >
> >For i386 one has to set, eg. 'options KVA_PAGES=3D512' to the kernel
> >config to be able to define kmem larger than ~700MB. I guess you're
> >running amd64, maybe there is similar requirement?
> > =20
>=20
> Given how much RAM PCs have these days, why isn't this a default for=20
> GENERIC?
>=20
> Or why isn't it at least a tunable rather than an option?

This may be a good reason - today's PCs have a lot of RAM and KVA_PAGES
splits address space between userland and kernel - the more address
space for the kernel, the less address space for the userland.
KVA_PAGES=3D512 splits 4GB address space in half, so userland processes
can address at most 2GB of memory.
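
The arithmetic behind that split can be sketched as follows. The
sketch assumes a non-PAE i386 kernel, where each KVA_PAGES unit
covers 4MB of address space; the macro and function names are made
up for illustration.

```c
#include <stdint.h>

/* Assumption: non-PAE i386, one KVA_PAGES unit maps 4MB. */
#define KVA_UNIT_BYTES		(4ULL * 1024 * 1024)
#define ADDR_SPACE_BYTES	(1ULL << 32)	/* 4GB total */

/* Address space reserved for the kernel. */
static uint64_t
kernel_va_bytes(unsigned kva_pages)
{
	return (kva_pages * KVA_UNIT_BYTES);
}

/* Address space left for each userland process. */
static uint64_t
user_va_bytes(unsigned kva_pages)
{
	return (ADDR_SPACE_BYTES - kernel_va_bytes(kva_pages));
}
```

A value of 512 gives 2GB to each side; under the same assumption a
value of 256 would leave 3GB for userland and 1GB for the kernel.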

--=20
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

--Qxx1br4bt0+wmkIi
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFHC9brForvXbEpPzQRAq24AKDQ+OUGZ9dwkEXYyuSvKmTG7m31ogCgws3O
TnfpmvNtci4dBdOmDiKPbAQ=
=xv+V
-----END PGP SIGNATURE-----

--Qxx1br4bt0+wmkIi--

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct  9 19:40:55 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 19C9D16A418;
	Tue,  9 Oct 2007 19:40:55 +0000 (UTC)
	(envelope-from darrenr@freebsd.org)
Received: from out1.smtp.messagingengine.com (out1.smtp.messagingengine.com
	[66.111.4.25])
	by mx1.freebsd.org (Postfix) with ESMTP id DF96513C48D;
	Tue,  9 Oct 2007 19:40:54 +0000 (UTC)
	(envelope-from darrenr@freebsd.org)
Received: from compute1.internal (compute1.internal [10.202.2.41])
	by out1.messagingengine.com (Postfix) with ESMTP id 80B5B321E4;
	Tue,  9 Oct 2007 15:40:54 -0400 (EDT)
Received: from heartbeat1.messagingengine.com ([10.202.2.160])
	by compute1.internal (MEProxy); Tue, 09 Oct 2007 15:40:54 -0400
X-Sasl-enc: vkEOe1rImmQifY0I5ICdQWIvM18CYX1cZtL+e23qBSPq 1191958854
Received: from [192.168.1.235] (64-142-85-108.dsl.dynamic.sonic.net
	[64.142.85.108])
	by mail.messagingengine.com (Postfix) with ESMTP id 99A495639;
	Tue,  9 Oct 2007 15:40:53 -0400 (EDT)
Message-ID: <470BD961.4000407@freebsd.org>
Date: Tue, 09 Oct 2007 12:41:21 -0700
From: Darren Reed <darrenr@freebsd.org>
User-Agent: Thunderbird 2.0.0.0 (Windows/20070326)
MIME-Version: 1.0
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
References: <20071005000046.GC92272@garage.freebsd.pl>
	<20071008121523.GM2327@garage.freebsd.pl>
In-Reply-To: <20071008121523.GM2327@garage.freebsd.pl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 09 Oct 2007 19:40:55 -0000

Pawel Jakub Dawidek wrote:
> Here are some updates:
>
> I was able to reproduce the panic by rsyncing big files and trying
> bonnie++ test suggested in this thread.
>
> Can you guys retry with this patch:
>
> 	http://people.freebsd.org/~pjd/patches/vm_kern.c.2.patch
>   

So, I have a question...
What happens if the "for (i = 0..)" is changed to "while(1)" and
the "panic" is subsequently removed?


It appears that the code changes the meaning of "WAIT" to "wait
for 4 seconds", then panics if that doesn't work.  Previously, "WAIT" was
not waiting at all... which could be described as a bug!

If I recall correctly, ZFS caches writes and does them in spurts, and
those spurts are spaced more than 4 seconds apart.  (For the
curious, do "zpool status" and observe the gap in time between
write activity.)

If you start a large amount of I/O, it is possible that all the KVA will
be used up and ZFS will not get a chance to flush its buffers before
the 4s timer here expires.  Does that sound plausible?

Would doubling the 8 to (say) 16 be beneficial here, to at least make
the waiting span one ZFS flush out to disk?

Darren


From owner-freebsd-fs@FreeBSD.ORG  Tue Oct  9 21:01:04 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E89EB16A41B;
	Tue,  9 Oct 2007 21:01:04 +0000 (UTC)
	(envelope-from pjd@garage.freebsd.pl)
Received: from mail.garage.freebsd.pl (arm132.internetdsl.tpnet.pl
	[83.17.198.132])
	by mx1.freebsd.org (Postfix) with ESMTP id 379FC13C4DB;
	Tue,  9 Oct 2007 21:01:03 +0000 (UTC)
	(envelope-from pjd@garage.freebsd.pl)
Received: by mail.garage.freebsd.pl (Postfix, from userid 65534)
	id 7497C45E90; Tue,  9 Oct 2007 23:01:02 +0200 (CEST)
Received: from localhost (154.81.datacomsa.pl [195.34.81.154])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.garage.freebsd.pl (Postfix) with ESMTP id 6612A45684;
	Tue,  9 Oct 2007 23:00:56 +0200 (CEST)
Date: Tue, 9 Oct 2007 23:00:43 +0200
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Darren Reed <darrenr@freebsd.org>
Message-ID: <20071009210043.GC13519@garage.freebsd.pl>
References: <20071005000046.GC92272@garage.freebsd.pl>
	<20071008121523.GM2327@garage.freebsd.pl>
	<470BD961.4000407@freebsd.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="eRtJSFbw+EEWtPj3"
Content-Disposition: inline
In-Reply-To: <470BD961.4000407@freebsd.org>
User-Agent: Mutt/1.4.2.3i
X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc
X-OS: FreeBSD 7.0-CURRENT i386
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on 
	mail.garage.freebsd.pl
X-Spam-Level: 
X-Spam-Status: No, score=-2.6 required=3.0 tests=BAYES_00 autolearn=ham 
	version=3.0.4
Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 09 Oct 2007 21:01:05 -0000


--eRtJSFbw+EEWtPj3
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Oct 09, 2007 at 12:41:21PM -0700, Darren Reed wrote:
> Pawel Jakub Dawidek wrote:
> >Here are some updates:
> >
> >I was able to reproduce the panic by rsyncing big files and trying
> >bonnie++ test suggested in this thread.
> >
> >Can you guys retry with this patch:
> >
> >	http://people.freebsd.org/~pjd/patches/vm_kern.c.2.patch
> > =20
>=20
> So, I have a question...
> What happens if the "for (i =3D 0..)" is changed to "while(1)" and
> the "panic" is subsequently removed?

I think it should stay to give the user a hint about what's going on
instead of hanging there forever.

> It appears like the code changes the meaning of "WAIT" to "wait
> for 4 seconds" then panic if it won't work.  Previously, "WAIT" was
> not waiting at all...which could be described as a bug!

It's actually 7 seconds:)

> If I recall correctly, ZFS caches writes and does them in spurts and
> that those spurts are spaced out more than 4 seconds.  (For the
> curious, do "zpool status" and observe the gap in time between
> write activity.)
>=20
> If you start a large amount of I/O, it is possible that all the KVA will
> be used up and ZFS will not get a chance to flush its buffers before
> the 4s timer here expires.  Does that sound plausible?

It depends on whether the problem we see is caused by caching/delaying
writes or just by caching data for faster reads. In the latter case the
cache can simply be thrown away, which is much faster than waiting for
buffers to be flushed in the former case. ZFS flushes buffers every 5
seconds by default or when there is too much data, so 7 seconds sounds
reasonable.

> Would doubling the 8 to (say) 16 be beneficial here, to at least make
> the waiting span one ZFS flush out to disk?

Note that this is visible to the user as an almost complete system hang,
I think. 16 would make it wait for 30 seconds.
I do agree that waiting even 30 seconds in some extremely rare situations
is better than panicking, but I'd first see if 8 fixes the problem.

In my test kernel I added a debug printf to see when 'i' is larger than
0; every value larger than 0 would have meant a panic with the old
kernel.  I never observed 'i' larger than 1.
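
The two figures above (7 seconds for 8 iterations, 30 seconds for 16)
are consistent with a wait that grows linearly, where the wait in
iteration 'i' lasts 'i' quarter-seconds. That schedule is inferred
from the numbers only, not read from the patch, and the function
names below are made up. A sketch of the arithmetic:

```c
/*
 * Assumed schedule: iteration i waits i quarter-seconds before the
 * allocation is retried, for i in 0 .. attempts - 1.
 */
static unsigned
total_wait_quarters(unsigned attempts)
{
	unsigned i, total = 0;

	for (i = 0; i < attempts; i++)
		total += i;
	return (total);
}

/* Worst-case wait in whole seconds before giving up. */
static unsigned
total_wait_seconds(unsigned attempts)
{
	return (total_wait_quarters(attempts) / 4);
}
```

Under that assumption 8 attempts add up to 28 quarter-seconds and 16
attempts to 120, which matches the 7 and 30 seconds quoted above.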

--=20
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

--eRtJSFbw+EEWtPj3
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFHC+v7ForvXbEpPzQRAiNlAKCcYhVYuqetJSW65l+JNEnnnVKB7ACdFRx5
xjaHLr4pLF4OEct/3Jzx/Wk=
=3bSE
-----END PGP SIGNATURE-----

--eRtJSFbw+EEWtPj3--

From owner-freebsd-fs@FreeBSD.ORG  Thu Oct 11 01:03:23 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id ACA5A16A419;
	Thu, 11 Oct 2007 01:03:23 +0000 (UTC)
	(envelope-from avatar@mmlab.cse.yzu.edu.tw)
Received: from www.mmlab.cse.yzu.edu.tw (www.mmlab.cse.yzu.edu.tw
	[140.138.150.166])
	by mx1.freebsd.org (Postfix) with ESMTP id 6FFBC13C43E;
	Thu, 11 Oct 2007 01:03:23 +0000 (UTC)
	(envelope-from avatar@mmlab.cse.yzu.edu.tw)
Received: by www.mmlab.cse.yzu.edu.tw (qmail, from userid 1000)
	id 313078C9B01; Thu, 11 Oct 2007 08:43:42 +0800 (CST)
Received: from localhost (localhost [127.0.0.1])
	by www.mmlab.cse.yzu.edu.tw (qmail) with ESMTP id D811B8C9B00;
	Thu, 11 Oct 2007 08:43:42 +0800 (CST)
Date: Thu, 11 Oct 2007 08:43:42 +0800 (CST)
From: Tai-hwa Liang <avatar@mmlab.cse.yzu.edu.tw>
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
In-Reply-To: <20071008121523.GM2327@garage.freebsd.pl>
Message-ID: <0710110840301.59863@www.mmlab.cse.yzu.edu.tw>
References: <20071005000046.GC92272@garage.freebsd.pl>
	<20071008121523.GM2327@garage.freebsd.pl>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org
Subject: Re: ZFS kmem_map too small.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 11 Oct 2007 01:03:23 -0000

On Mon, 8 Oct 2007, Pawel Jakub Dawidek wrote:
> Here are some updates:
>
> I was able to reproduce the panic by rsyncing big files and trying
> bonnie++ test suggested in this thread.
>
> Can you guys retry with this patch:
>
> 	http://people.freebsd.org/~pjd/patches/vm_kern.c.2.patch
>
> It's a hack, yes, but allows to mitigate the problem quite well. I'm
> looking for a solution that can be used for 7.0 before we find a better
> fix.
>
> BTW. To use ZFS you _must_ increase vm.kmem_size/vm.kmem_size_max.
> If you have the problem discussed here and you're using standard values,
> please retry with vm.kmem_size/vm.kmem_size_max set to at least 600MB in
> /boot/loader.conf.
>
> I'm not sure if it's not too late to ask re@ about increasing the
> default kmem size at least on amd64. ~300MB we have there is silly
> small.

   The latest patch does keep the system alive longer than before;
however, it eventually panicked with a different message.

- Testing case:

 	while true; do
 		bonnie++ -s 2048 -c 50 -x 20
 	done

- ZFS related settings:

 	vfs.root.mountfrom="zfs:universe"
 	vm.kmem_size_max="629145600"
 	vm.kmem_size="629145600"

- zpool status:
   pool: universe
  state: ONLINE
  scrub: none requested
config:

         NAME        STATE     READ WRITE CKSUM
         universe    ONLINE       0     0     0
           ad0s3d    ONLINE       0     0     0

errors: No known data errors

- panic messages:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x3916545d
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0xc04d33c0
stack pointer           = 0x28:0xf7b0e930
frame pointer           = 0x28:0xf7b0e944
code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3477 (bonnie++)
[thread pid 3477 tid 100159 ]
Stopped at	malloc_type_zone_allocated+0x70:	addb %cl,0x758bf45d(%ebx)
db> wh
Tracing pid 3477 tid 100159 td 0xc4258a50
malloc_type_zone_allocated(c1076d20,0,2,1,0,...) at
malloc_type_zone_allocated+0x70
malloc(40,c0824060,2,f7b0e9fc,c080df70,...) at malloc+0x69
zfs_kmem_alloc(40,2,c0607d3e,c3d66228,f7b0e998,...) at zfs_kmem_alloc+0x20
zfs_range_lock(c90da0ec,22f2da,0,1,0,...) at zfs_range_lock+0x20
zfs_freebsd_write(f7b0ebc4,0,0,0,c06be120,...) at zfs_freebsd_write+0x24b
VOP_WRITE_APV(c08257c0,f7b0ebc4,c4258a50,c068d7f8,242,...) at VOP_WRITE_APV+0xb6
vn_write(c53bd0d8,f7b0ec60,c7835e00,0,c4258a50,...) at vn_write+0x247
dofilewrite(f7b0ec60,ffffffff,ffffffff,0,c53bd0d8,...) at dofilewrite+0x97
kern_writev(c4258a50,3,f7b0ec60,bfbfdf8b,1,...) at kern_writev+0x58
write(c4258a50,f7b0ecfc,c,f7b0eca4,c065d061,...) at write+0x4f
syscall(f7b0ed38) at syscall+0x319
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (4, FreeBSD ELF32, write), eip = 0x282779a3, esp = 0xbfbfdf3c, ebp = 0xbfbfdf58 ---
db>

-- 
Cheers,

Tai-hwa Liang