From owner-freebsd-stable@FreeBSD.ORG Tue Nov 18 18:43:01 2008
From: Lorenzo Perone <lopez.on.the.lists@yellowspace.net>
To: Chao Shin
Cc: d@delphij.net, Pawel Jakub Dawidek, FreeBSD Stable
Date: Tue, 18 Nov 2008 19:42:56 +0100
Subject: Re: ZFS crashes on heavy threaded environment
Message-Id: <7BA53082-577E-4DF2-8E2A-025942C11C0A@yellowspace.net>
References: <491CE71F.2020208@delphij.net> <491CE835.4050504@delphij.net> <20081117155835.GC2101@garage.freebsd.pl>

For what it's worth, I have similar problems on a comparable system
(amd64/8GB, 7.1-PRERELEASE #3: Sun Nov 16 13:39:43), which I wouldn't
call heavily threaded yet (there is only one mysql51 running, plus
courier-mta/imap, with at most 15 users right now). Perhaps worth a
note: Bjoern's multi-IP jail patches are applied on this system.

The setup is such that one ZFS filesystem is mounted into a jail
handling only mail (and for that: just the root of the mail files),
and a script on the main host rotates snapshots hourly (making a new
one and destroying the oldest); a rough sketch of such a script is
included below.

After about 8-24 hours of production:

- mysqld is stuck in the sbwait state;
- messages start filling up with "kernel: vm_thread_new: kstack
  allocation failed";
- almost any attempt to fork a process fails with "Cannot allocate
  memory".

No panic so far, at least since I introduced
vfs.zfs.prefetch_disable="1". Before that, I experienced several
panics upon shutdown.

If I still have an open shell, I can send around some -TERMs and
-KILLs and halfway get back control; after that, if I run "zfs
umount -a", kernel memory usage drops drastically and I can resume
the services. However, not for long: after about 1-2 hours of
production it starts complaining again in the messages about kstack
allocation failures, and soon thereafter it all repeats. Only
rebooting gives back another 12-24 hours of operation.

What I've tracked down so far:

- zfs destroy'ing old snapshots definitely makes those failures pop
  up earlier;
- I've been collecting some data shortly around the memory problems,
  which I post below.

Since this is a production machine (I know, I shouldn't - but hey,
you made us lick blood and now we ended up wanting more! So, yes, I
confirm, you definitely _are_ evil! ;)), I'm almost ready to move it
back to UFS.
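For illustration only, here's a minimal sketch of the kind of hourly
rotation script I mean (the dataset name, snapshot label and retention
count are placeholders made up for this example, not taken from the
actual script):

#!/bin/sh
# Sketch of an hourly snapshot rotation: take one new snapshot and
# destroy the oldest ones beyond a fixed retention count.
# DATASET and KEEP are placeholder values, not the real configuration.
DATASET="hkpool/mail"
KEEP=24

# Take a new snapshot labelled with the current date and hour.
zfs snapshot "${DATASET}@hourly-$(date +%Y%m%d%H)"

# List the hourly snapshots oldest-first and destroy everything
# beyond the newest KEEP of them.
SNAPS=$(zfs list -H -t snapshot -o name -s creation | grep "^${DATASET}@hourly-")
COUNT=$(echo "${SNAPS}" | grep -c .)
EXCESS=$((COUNT - KEEP))
if [ "${EXCESS}" -gt 0 ]; then
    echo "${SNAPS}" | head -n "${EXCESS}" | while read snap; do
        zfs destroy "${snap}"
    done
fi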
But if it can be useful for debugging, I would be willing to set up a
zabbix agent or the like to track whichever values could be useful
over a day or two. If, on the other hand, these bugs (leaks, or
whatever they are) are likely to be solved by the recent commit, I'll
just move back to UFS until it is ported to -STABLE.

Here follows some data about memory usage (strangely, I never saw
this even halfway reach 1.5 GB, but it's really almost voodoo to me,
so I leave the analysis up to others):

TEXT=`kldstat | tr a-f A-F | awk 'BEGIN {print "ibase=16"}; NR > 1 {print $4}' | bc | awk '{a+=$1}; END {print a}'`
DATA=`vmstat -m | sed 's/K//' | awk '{a+=$3}; END {print a*1024}'`
TOTAL=`echo $DATA $TEXT | awk '{print $1+$2}'`

TEXT=13102280, 12.4953 MB
DATA=470022144, 448.248 MB
TOTAL=483124424, 460.743 MB

vmstat -m | grep vnodes
kern.maxvnodes: 100000
kern.minvnodes: 25000
vfs.freevnodes: 2380
vfs.wantfreevnodes: 25000
vfs.numvnodes: 43982

As said, the box has 8 GB of RAM and the loader.conf below, and at
the time of the lockups there were about 5 GB of free userland memory
available.

My loader.conf:

vm.kmem_size="1536M"
vm.kmem_size_max="1536M"
vfs.zfs.arc_min="512M"
vfs.zfs.arc_max="768M"
vfs.zfs.prefetch_disable="1"

As for the filesystem, I only changed the recordsize and the
mountpoint; the rest is default:

[horkheimer:lopez] root# zfs get all hkpool/mail
NAME         PROPERTY       VALUE                  SOURCE
hkpool/mail  type           filesystem             -
hkpool/mail  creation       Fri Oct 31 13:28 2008  -
hkpool/mail  used           5.50G                  -
hkpool/mail  available      386G                   -
hkpool/mail  referenced     4.33G                  -
hkpool/mail  compressratio  1.05x                  -
hkpool/mail  mounted        yes                    -
hkpool/mail  quota          none                   default
hkpool/mail  reservation    none                   default
hkpool/mail  recordsize     4K                     local
hkpool/mail  mountpoint     /jails/mail/mail       local
hkpool/mail  sharenfs       off                    default
hkpool/mail  checksum       on                     default
hkpool/mail  compression    on                     local
hkpool/mail  atime          on                     default
hkpool/mail  devices        on                     default
hkpool/mail  exec           on                     default
hkpool/mail  setuid         on                     default
hkpool/mail  readonly       off                    default
hkpool/mail  jailed         off                    local
hkpool/mail  snapdir        hidden                 default
hkpool/mail  aclmode        groupmask              default
hkpool/mail  aclinherit     secure                 default
hkpool/mail  canmount       on                     default
hkpool/mail  shareiscsi     off                    default
hkpool/mail  xattr          off                    temporary
hkpool/mail  copies         1                      default

The pool is using a partition on a hardware RAID1:

[horkheimer:lopez] root# zpool status
  pool: hkpool
 state: ONLINE
 scrub: none requested
config:

        NAME      STATE     READ WRITE CKSUM
        hkpool    ONLINE       0     0     0
          da0s1f  ONLINE       0     0     0

Regards, and thanks a lot for bringing us ZFS,

Lorenzo

On 18.11.2008, at 10:20, Chao Shin wrote:

> On Mon, 17 Nov 2008 23:58:35 +0800, Pawel Jakub Dawidek wrote:
>
>> On Thu, Nov 13, 2008 at 06:53:41PM -0800, Xin LI wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Xin LI wrote:
>>> > Hi, Pawel,
>>> >
>>> > We can still reproduce the ZFS crash (threading + heavy I/O
>>> > load) on a fresh 7.1-STABLE build, in a few minutes:
>>> >
>>> > /usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -r 4k -s 30g -i 0 -i 1 -i 2 -i 8 -+p 70 -C
>>> >
>>> > I have included a backtrace from my colleague, who has his
>>> > hands on the test environment. Should more information be
>>> > necessary, please let us know; we are happy to help with this.
>>>
>>> A further datapoint.
>>> The system used to run with an untuned loader.conf, and my
>>> colleague just reported that with the following loader.conf the
>>> problem can be triggered sooner:
>>>
>>> vm.kmem_size_max=838860800
>>> vm.kmem_size_scale="2"
>>>
>>> The system is running FreeBSD/amd64 7.1-PRERELEASE with a GENERIC
>>> kernel and 8GB of RAM.
>>
>> With the new ZFS I get:
>>
>> Memory allocation failed:: Cannot allocate memory
>>
>> Is this expected?
>>
>
> First of all, congratulations on your work, well done!
>
> I used this command on a FreeBSD 7.1-PRERELEASE amd64 box with 8GB
> of memory and didn't get output like that, but a kernel panic.
> Maybe you should lower the thread count and file size, for example:
>
> /usr/local/bin/iozone -M -e -+u -T -t 64 -S 4096 -L 64 -r 4k -s 2g -i 0 -i 1 -i 2 -i 8 -+p 70 -C
>
> Actually, we used this command in July to test an 8-CURRENT with
> the ZFS v12 patch, and there were no more panics. So we hope ZFS
> v13 can be MFCed as soon as possible, because we really need it now.
> --
> The Power to Serve