From owner-freebsd-stable@FreeBSD.ORG Tue Nov 18 18:43:01 2008
From: Lorenzo Perone <lopez.on.the.lists@yellowspace.net>
To: Chao Shin
Cc: d@delphij.net, Pawel Jakub Dawidek, FreeBSD Stable
Date: Tue, 18 Nov 2008 19:42:56 +0100
Subject: Re: ZFS crashes on heavy threaded environment
Message-Id: <7BA53082-577E-4DF2-8E2A-025942C11C0A@yellowspace.net>
References: <491CE71F.2020208@delphij.net> <491CE835.4050504@delphij.net> <20081117155835.GC2101@garage.freebsd.pl>

For what it's worth, I have similar problems on a comparable system
(amd64/8GB, 7.1-PRERELEASE #3: Sun Nov 16 13:39:43), which I wouldn't
call heavily threaded yet (there is only one mysql51 running, plus
courier-mta/imap, with at most 15 users right now). Perhaps worth a
note: Bjoern's multi-IP jail patches are applied on this system.

The setup is such that one ZFS filesystem is mounted into a jail
handling only mail (and for that: just the root of the mail files),
and a script on the main host rotates snapshots hourly (making a new
one and destroying the oldest); a rough sketch of such a script is
included below.

After about 8-24 hours of production:

- mysqld is stuck in the sbwait state;
- messages start filling up with "kernel: vm_thread_new: kstack
  allocation failed";
- almost any attempt to fork a process fails with "Cannot allocate
  memory".

No panic so far, at least since I introduced
vfs.zfs.prefetch_disable="1". Before that, I experienced several
panics upon shutdown.

If I still have an open shell, I can send around some -TERMs and
-KILLs and halfway get back control; after that, if I run "zfs
umount -a", kernel memory usage drops drastically and I can resume
the services. However, not for long: after about 1-2 hours of
production it starts complaining again in the messages about kstack
allocation failures, and soon thereafter it all repeats. Only
rebooting gives back another 12-24 hours of operation.

What I've tracked down so far:

- zfs destroy'ing old snapshots definitely makes those failures pop
  up earlier;
- I've been collecting some data shortly around the memory problems,
  which I post below.

Since this is a production machine (I know, I shouldn't - but hey,
you made us lick blood and now we ended up wanting more! So, yes, I
confirm, you definitely _are_ evil! ;)), I'm almost ready to move it
back to UFS.
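For illustration only, here's a minimal sketch of the kind of hourly
rotation script I mean (the dataset name, snapshot label and retention
count are placeholders made up for this example, not taken from the
actual script):

#!/bin/sh
# Sketch of an hourly snapshot rotation: take one new snapshot and
# destroy the oldest ones beyond a fixed retention count.
# DATASET and KEEP are placeholder values, not the real configuration.
DATASET="hkpool/mail"
KEEP=24

# Take a new snapshot labelled with the current date and hour.
zfs snapshot "${DATASET}@hourly-$(date +%Y%m%d%H)"

# List the hourly snapshots oldest-first and destroy everything
# beyond the newest KEEP of them.
SNAPS=$(zfs list -H -t snapshot -o name -s creation | grep "^${DATASET}@hourly-")
COUNT=$(echo "${SNAPS}" | grep -c .)
EXCESS=$((COUNT - KEEP))
if [ "${EXCESS}" -gt 0 ]; then
    echo "${SNAPS}" | head -n "${EXCESS}" | while read snap; do
        zfs destroy "${snap}"
    done
fi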
But if it can be useful for debugging, I would be willing to set up a
zabbix agent or the like to track whichever values could be useful
over a day or two. If, on the other hand, these bugs (leaks, or
whatever they are) are likely to be solved by the recent commit, I'll
just move back to UFS until it is ported to -STABLE.

Here follows some data about memory usage (strangely, I never saw
this even halfway reach 1.5 GB, but it's really almost voodoo to me,
so I leave the analysis up to others):

TEXT=`kldstat | tr a-f A-F | awk 'BEGIN {print "ibase=16"}; NR > 1 {print $4}' | bc | awk '{a+=$1}; END {print a}'`
DATA=`vmstat -m | sed 's/K//' | awk '{a+=$3}; END {print a*1024}'`
TOTAL=`echo $DATA $TEXT | awk '{print $1+$2}'`

TEXT=13102280, 12.4953 MB
DATA=470022144, 448.248 MB
TOTAL=483124424, 460.743 MB

vmstat -m | grep vnodes
kern.maxvnodes: 100000
kern.minvnodes: 25000
vfs.freevnodes: 2380
vfs.wantfreevnodes: 25000
vfs.numvnodes: 43982

As said, the box has 8 GB of RAM and the loader.conf below, and at
the time of the lockups there were about 5 GB of free userland memory
available.

My loader.conf:

vm.kmem_size="1536M"
vm.kmem_size_max="1536M"
vfs.zfs.arc_min="512M"
vfs.zfs.arc_max="768M"
vfs.zfs.prefetch_disable="1"

As for the filesystem, I only changed the recordsize and the
mountpoint; the rest is default:

[horkheimer:lopez] root# zfs get all hkpool/mail
NAME         PROPERTY       VALUE                  SOURCE
hkpool/mail  type           filesystem             -
hkpool/mail  creation       Fri Oct 31 13:28 2008  -
hkpool/mail  used           5.50G                  -
hkpool/mail  available      386G                   -
hkpool/mail  referenced     4.33G                  -
hkpool/mail  compressratio  1.05x                  -
hkpool/mail  mounted        yes                    -
hkpool/mail  quota          none                   default
hkpool/mail  reservation    none                   default
hkpool/mail  recordsize     4K                     local
hkpool/mail  mountpoint     /jails/mail/mail       local
hkpool/mail  sharenfs       off                    default
hkpool/mail  checksum       on                     default
hkpool/mail  compression    on                     local
hkpool/mail  atime          on                     default
hkpool/mail  devices        on                     default
hkpool/mail  exec           on                     default
hkpool/mail  setuid         on                     default
hkpool/mail  readonly       off                    default
hkpool/mail  jailed         off                    local
hkpool/mail  snapdir        hidden                 default
hkpool/mail  aclmode        groupmask              default
hkpool/mail  aclinherit     secure                 default
hkpool/mail  canmount       on                     default
hkpool/mail  shareiscsi     off                    default
hkpool/mail  xattr          off                    temporary
hkpool/mail  copies         1                      default

The pool is using a partition on a hardware RAID1:

[horkheimer:lopez] root# zpool status
  pool: hkpool
 state: ONLINE
 scrub: none requested
config:

        NAME      STATE     READ WRITE CKSUM
        hkpool    ONLINE       0     0     0
          da0s1f  ONLINE       0     0     0

Regards, and thanks a lot for bringing us ZFS,

Lorenzo

On 18.11.2008, at 10:20, Chao Shin wrote:

> On Mon, 17 Nov 2008 23:58:35 +0800, Pawel Jakub Dawidek wrote:
>
>> On Thu, Nov 13, 2008 at 06:53:41PM -0800, Xin LI wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Xin LI wrote:
>>> > Hi, Pawel,
>>> >
>>> > We can still reproduce the ZFS crash (threading + heavy I/O
>>> > load) on a fresh 7.1-STABLE build, in a few minutes:
>>> >
>>> > /usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -r 4k -s 30g -i 0 -i 1 -i 2 -i 8 -+p 70 -C
>>> >
>>> > I have included a backtrace from my colleague, who has his
>>> > hands on the test environment. Should more information be
>>> > necessary, please let us know; we are happy to help with this.
>>>
>>> A further datapoint.
>>> The system used to run with an untuned loader.conf, and my
>>> colleague just reported that with the following loader.conf the
>>> problem can be triggered sooner:
>>>
>>> vm.kmem_size_max=838860800
>>> vm.kmem_size_scale="2"
>>>
>>> The system is running FreeBSD/amd64 7.1-PRERELEASE with a GENERIC
>>> kernel and 8GB of RAM.
>>
>> With the new ZFS I get:
>>
>> Memory allocation failed:: Cannot allocate memory
>>
>> Is this expected?
>>
>
> First of all, congratulations on your work, well done!
>
> I used this command on a FreeBSD 7.1-PRERELEASE amd64 box with 8GB
> of memory and didn't get output like that, but a kernel panic.
> Maybe you should lower the thread count and file size, for example:
>
> /usr/local/bin/iozone -M -e -+u -T -t 64 -S 4096 -L 64 -r 4k -s 2g -i 0 -i 1 -i 2 -i 8 -+p 70 -C
>
> Actually, we used this command in July to test an 8-CURRENT with
> the ZFS v12 patch, and there were no more panics. So we hope ZFS
> v13 can be MFCed as soon as possible, because we really need it now.
> --
> The Power to Serve