From owner-freebsd-fs@FreeBSD.ORG Sun Oct 10 02:34:29 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9CA471065672; Sun, 10 Oct 2010 02:34:29 +0000 (UTC) (envelope-from pieter@degoeje.nl) Received: from smtp.utwente.nl (smtp1.utsp.utwente.nl [130.89.2.8]) by mx1.freebsd.org (Postfix) with ESMTP id EF7DE8FC14; Sun, 10 Oct 2010 02:34:28 +0000 (UTC) Received: from nox-laptop.student.utwente.nl (nox-laptop.student.utwente.nl [130.89.160.140]) by smtp.utwente.nl (8.12.10/SuSE Linux 0.7) with ESMTP id o9A1Wsx8015918; Sun, 10 Oct 2010 03:32:55 +0200 From: Pieter de Goeje To: freebsd-stable@freebsd.org Date: Sun, 10 Oct 2010 03:32:54 +0200 User-Agent: KMail/1.9.10 References: <169A4F62-0509-4AE9-A4A5-F9CADD08140D@irbisnet.com> In-Reply-To: <169A4F62-0509-4AE9-A4A5-F9CADD08140D@irbisnet.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <201010100332.54731.pieter@degoeje.nl> X-UTwente-MailScanner-Information: Scanned by MailScanner. Contact icts.servicedesk@utwente.nl for more information. X-UTwente-MailScanner: Found to be clean X-UTwente-MailScanner-From: pieter@degoeje.nl X-Spam-Status: No Cc: "freebsd-fs@freebsd.org" , Andriy Gapon Subject: Re: Serious zfs slowdown when mixed with another file system (ufs/msdosfs/etc.). X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Oct 2010 02:34:29 -0000 On Saturday 09 October 2010 16:55:35 Andriy Bakay wrote: > Do you know any more convenient way (except make buildword, etc.) to > upgrade/update several boxes to STABLE on regular basis? Something like > freebsd-update or maybe some process, tips, tricks, etc? > > Thanks. 
Here's how I do it:

1) Build server: make buildworld && make buildkernel
2) Other servers: export / via NFS

Repeat for each other server on build server:
mount boxN:/ /mnt
make installkernel DESTDIR=/mnt -DNO_FSCHG
make installworld DESTDIR=/mnt -DNO_FSCHG
umount /mnt

Note that I use a single filesystem for / and /usr. Obviously if those are
separate filesystems more NFS exports and mount commands are necessary.
Before the first run all immutable flags need to be removed from the target
box, otherwise the install will fail (i.e. chflags -R noschg /).

>
> On 2010-10-08, at 6:11, Pete French wrote:
> >> Ok. But how stable (production ready) the FreeBSD-8-STABLE is? What is
> >> your opinion?
> >
> > I am running 8-STABLE from 27th September on all our production
> > machines (from webservers to database servers to the company mail
> > server) and it is fine. I am going to update again over the next
> > few days, as there are some ZFS fixes in that I want - and which
> > may benefit you too - so I will be able to report back next
> > week as to how a more recent version behaves.
> >
> > In general though, I have never had problems running STABLE on
> > production systems over the years. Of course what I do is to test it
> > on a single machine before rolling it out (a leaf in a webfarm
> > so if it goes down it won't affect the business) but it is usually
> > fine. Keep an eye on -STABLE mailing list though, as that is where
> > problems arise. I watch that, and also the daily commits, either here
> >
> > http://www.freshbsd.org/?branch=RELENG_8&project=freebsd&committer=&module=&q=
> >
> > or here
> >
> > http://www.secnetix.de/olli/FreeBSD/svnews/?p=stable/8
> >
> > Just to see what's going into the tree relative to what's being discussed.
> > It only takes a few minutes a day to monitor the mailing lists and the
> > commits, and the result is that we've been running STABLE for a very
> > long time (close to a decade I suspect) with great success.
> >
> > -pete.
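The per-box steps at the top of this message (mount, installkernel,
installworld, umount) could be wrapped in a small script along these lines.
This is only a sketch under assumptions: the run, install_box and install_all
helper names and the DRY_RUN switch are made up for illustration and are not
part of the procedure described above. It assumes it is started from the
source tree on the build server after buildworld and buildkernel have
finished, and that each target exports / over NFS.

```sh
#!/bin/sh
# Sketch: push an already-built kernel and world to other boxes over NFS.
# Helper names and DRY_RUN are illustrative assumptions, not from the mail.

MNT="${MNT:-/mnt}"

# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# install_box HOST: mount HOST:/ on $MNT, install kernel then world, unmount.
install_box() {
    host="$1"
    run mount "${host}:/" "$MNT" || return 1
    run make installkernel DESTDIR="$MNT" -DNO_FSCHG &&
        run make installworld DESTDIR="$MNT" -DNO_FSCHG
    status=$?
    # Always try to unmount, even if one of the installs failed.
    run umount "$MNT"
    return "$status"
}

# install_all HOST...: handle each box in turn and report failures.
install_all() {
    for box in "$@"; do
        install_box "$box" || echo "install failed on $box" >&2
    done
}

# Example (prints the commands instead of running them):
#   DRY_RUN=1; install_all box1 box2
```

Running it with DRY_RUN=1 first shows the exact commands before anything is
touched on the target. The immutable-flag caveat mentioned above still
applies before the first real run.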
> > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" -- Pieter de Goeje From owner-freebsd-fs@FreeBSD.ORG Sun Oct 10 08:16:29 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D03961065673; Sun, 10 Oct 2010 08:16:29 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id E53B58FC18; Sun, 10 Oct 2010 08:16:28 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA05350; Sun, 10 Oct 2010 11:16:24 +0300 (EEST) (envelope-from avg@icyb.net.ua) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1P4r4y-000HSH-JH; Sun, 10 Oct 2010 11:16:24 +0300 Message-ID: <4CB17658.5080808@icyb.net.ua> Date: Sun, 10 Oct 2010 11:16:24 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Andriy Bakay References: <169A4F62-0509-4AE9-A4A5-F9CADD08140D@irbisnet.com> In-Reply-To: <169A4F62-0509-4AE9-A4A5-F9CADD08140D@irbisnet.com> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: "freebsd-fs@freebsd.org" , "freebsd-stable@freebsd.org" Subject: Re: Serious zfs slowdown when mixed with another file system (ufs/msdosfs/etc.). 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Oct 2010 08:16:29 -0000 on 09/10/2010 17:55 Andriy Bakay said the following: > Do you know any more convenient way (except make buildword, etc.) to > upgrade/update several boxes to STABLE on regular basis? Something like > freebsd-update or maybe some process, tips, tricks, etc? More convenient? :-) Sorry, I always use "make buildword, etc", works 100% for me and I find it very convenient. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sun Oct 10 08:30:22 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1F10C106566C for ; Sun, 10 Oct 2010 08:30:22 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 687D78FC08 for ; Sun, 10 Oct 2010 08:30:20 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA05460; Sun, 10 Oct 2010 11:29:55 +0300 (EEST) (envelope-from avg@icyb.net.ua) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1P4rI3-000HSz-DN; Sun, 10 Oct 2010 11:29:55 +0300 Message-ID: <4CB17983.3020907@icyb.net.ua> Date: Sun, 10 Oct 2010 11:29:55 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Kai Gallasch References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: Locked 
up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Oct 2010 08:30:22 -0000 on 09/10/2010 16:37 Kai Gallasch said the following: > I must repeat. I offer my help if someone wants to dig into the locking problem. Kai, I would like to look into this. Can you provide shell access to a system that exhibits the behavior? Or even better serial console for remote debugging, if possible. Also, can you first try with the very latest stable/8 or even head? I have recently MFC-ed a few improvements to ZFS code. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sun Oct 10 09:48:16 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8508F106564A; Sun, 10 Oct 2010 09:48:16 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 8F9C18FC18; Sun, 10 Oct 2010 09:48:15 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id MAA06095; Sun, 10 Oct 2010 12:47:51 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1P4sVT-000HW7-1L; Sun, 10 Oct 2010 12:47:51 +0300 Message-ID: <4CB18BC6.70305@freebsd.org> Date: Sun, 10 Oct 2010 12:47:50 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Kai Gallasch References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <201010061732.o96HW2Vi005945@higson.cam.lispworks.com> <4CAF45A8.3020401@icyb.net.ua> In-Reply-To: 
<4CAF45A8.3020401@icyb.net.ua> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, Konstantin Belousov Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Oct 2010 09:48:16 -0000 on 08/10/2010 19:24 Andriy Gapon said the following: > on 06/10/2010 21:51 Kai Gallasch said the following: >> >> Am 06.10.2010 um 19:32 schrieb Martin Simmons: >> >>>>>>>> On Wed, 6 Oct 2010 14:28:31 +0200, Kai Gallasch said: >>>> >>>> How can I debug this and get further information? >>> >>> procstat -k -k $pid will generate a backtrace (or replace $pid by -a for all >>> processes). >> >> procstat for process 12111 (state: zfs) >> sonnenkraft:~ # procstat -k -k 12111 >> PID TID COMM TDNAME KSTACK >> 12111 102385 httpd - mi_switch+0x21b sleepq_switch+0x123 sleepq_wait+0x4d __lockmgr_args+0x7ae vop_stdlock+0x39 VOP_LOCK1_APV+0x9b _vn_lock+0x57 vget+0x7b cache_lookup+0x4e0 vfs_cache_lookup+0xc0 VOP_LOOKUP_APV+0xb7 lookup+0x3d3 namei+0x457 vn_open_cred+0x1e3 kern_openat+0x181 syscall+0x102 Xfast_syscall+0xe2 >> >> procstat for process 24731 (state: zfsmrb) >> # procstat -k -k 24731 >> PID TID COMM TDNAME KSTACK >> 24731 102273 httpd - mi_switch+0x21b sleepq_switch+0x123 sleepq_wait+0x4d _sleep+0x369 zfs_freebsd_read+0x2a6 VOP_READ_APV+0xaf vnode_pager_generic_getpages+0x3ea VOP_GETPAGES_APV+0xb5 vnode_pager_getpages+0x8c vm_fault+0x685 trap_pfault+0x128 trap+0x52c calltrap+0x8 Hm, I think that we actually shouldn't see a stack like that. vm_fault sets VPO_BUSY on a page before calling vnode_pager_generic_getpages, so the thread gets stuck forever in zfs mappedread. It seems like the page that was seen as invalid in vm_fault becomes valid while call flow reaches mappedread. 
>> In my original post I wrote that only apache httpd processes would lock up.. >> This is wrong. Several other non-httpd processes also got stuck in state zfs or zfsmrb. > > Interesting. > It's possible that TID 102385 might be waiting on a vnode lock held by TID 102273. > But TID 102273 seems to be waiting on a vnode's page lock. > It would be very interesting to learn what process has that page busy, for how > long and why. > Perhaps there is a code path that busies a page, but never un-busies it... > -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sun Oct 10 12:16:29 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 519BB106564A; Sun, 10 Oct 2010 12:16:29 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id BB1478FC1D; Sun, 10 Oct 2010 12:16:28 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o9ACFWHq025034 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 10 Oct 2010 15:15:32 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id o9ACFWut059551; Sun, 10 Oct 2010 15:15:32 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o9ACFWrN059550; Sun, 10 Oct 2010 15:15:32 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sun, 10 Oct 2010 15:15:32 +0300 From: Kostik Belousov To: Andriy Gapon Message-ID: <20101010121532.GG2392@deviant.kiev.zoral.com.ua> References: 
<39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <201010061732.o96HW2Vi005945@higson.cam.lispworks.com> <4CAF45A8.3020401@icyb.net.ua> <4CB18BC6.70305@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="URkQCorwCiZbgSAY" Content-Disposition: inline In-Reply-To: <4CB18BC6.70305@freebsd.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Oct 2010 12:16:29 -0000 --URkQCorwCiZbgSAY Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Oct 10, 2010 at 12:47:50PM +0300, Andriy Gapon wrote: > on 08/10/2010 19:24 Andriy Gapon said the following: > > on 06/10/2010 21:51 Kai Gallasch said the following: > >> > >> Am 06.10.2010 um 19:32 schrieb Martin Simmons: > >> > >>>>>>>> On Wed, 6 Oct 2010 14:28:31 +0200, Kai Gallasch said: > >>>> > >>>> How can I debug this and get further information? > >>> > >>> procstat -k -k $pid will generate a backtrace (or replace $pid by -a = for all > >>> processes). 
> >>
> >> procstat for process 12111 (state: zfs)
> >> sonnenkraft:~ # procstat -k -k 12111
> >>   PID    TID COMM             TDNAME           KSTACK
> >> 12111 102385 httpd            -                mi_switch+0x21b sleepq_switch+0x123 sleepq_wait+0x4d __lockmgr_args+0x7ae vop_stdlock+0x39 VOP_LOCK1_APV+0x9b _vn_lock+0x57 vget+0x7b cache_lookup+0x4e0 vfs_cache_lookup+0xc0 VOP_LOOKUP_APV+0xb7 lookup+0x3d3 namei+0x457 vn_open_cred+0x1e3 kern_openat+0x181 syscall+0x102 Xfast_syscall+0xe2
> >>
> >> procstat for process 24731 (state: zfsmrb)
> >> # procstat -k -k 24731
> >>   PID    TID COMM             TDNAME           KSTACK
> >> 24731 102273 httpd            -                mi_switch+0x21b sleepq_switch+0x123 sleepq_wait+0x4d _sleep+0x369 zfs_freebsd_read+0x2a6 VOP_READ_APV+0xaf vnode_pager_generic_getpages+0x3ea VOP_GETPAGES_APV+0xb5 vnode_pager_getpages+0x8c vm_fault+0x685 trap_pfault+0x128 trap+0x52c calltrap+0x8
>
> Hm, I think that we actually shouldn't see a stack like that.
> vm_fault sets VPO_BUSY on a page before calling vnode_pager_generic_getpages, so
> the thread gets stuck forever in zfs mappedread.
> It seems like the page that was seen as invalid in vm_fault becomes valid while
> call flow reaches mappedread.

The vnode is share-locked, and vm object lock is dropped and reacquired
several times until control reaches zfs_mappedread. This indeed allows
a window during which page might be read by other thread.

There are two possible routes to solve the issue:
1. Provide zfs-specific VOP_GETPAGES().
2. Use my vm6 patch. Sigh.

>
> >> In my original post I wrote that only apache httpd processes would lock up..
> >> This is wrong. Several other non-httpd processes also got stuck in state zfs or zfsmrb.
> >
> > Interesting.
> > It's possible that TID 102385 might be waiting on a vnode lock held by TID 102273.
> > But TID 102273 seems to be waiting on a vnode's page lock.
> > It would be very interesting to learn what process has that page busy, for how
> > long and why.
> > Perhaps there is a code path that busies a page, but never un-busies it...
> >
>
>
> --
> Andriy Gapon

--URkQCorwCiZbgSAY
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (FreeBSD)

iEYEARECAAYFAkyxrmQACgkQC3+MBN1Mb4hV9ACeLIfbAZYd14eJsqFc1G2qTUhP
AVIAnA8z9BMl1sb5RFLOKZOwAengP7gD
=7NAB
-----END PGP SIGNATURE-----

--URkQCorwCiZbgSAY--

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 10 12:27:50 2010
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C7154106564A for ; Sun, 10 Oct 2010 12:27:50 +0000 (UTC) (envelope-from avg@freebsd.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 113FC8FC0A for ; Sun, 10 Oct 2010 12:27:49 +0000 (UTC)
Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA07568; Sun, 10 Oct 2010 15:27:24 +0300 (EEST) (envelope-from avg@freebsd.org)
Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1P4uzs-000Hcq-Dv; Sun, 10 Oct 2010 15:27:24 +0300
Message-ID: <4CB1B12B.3010107@freebsd.org>
Date: Sun, 10 Oct 2010 15:27:23 +0300
From: Andriy Gapon
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Kostik Belousov
References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <201010061732.o96HW2Vi005945@higson.cam.lispworks.com> <4CAF45A8.3020401@icyb.net.ua> <4CB18BC6.70305@freebsd.org> <20101010121532.GG2392@deviant.kiev.zoral.com.ua>
In-Reply-To: <20101010121532.GG2392@deviant.kiev.zoral.com.ua>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: Locked up processes
after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Oct 2010 12:27:50 -0000 on 10/10/2010 15:15 Kostik Belousov said the following: > On Sun, Oct 10, 2010 at 12:47:50PM +0300, Andriy Gapon wrote: >> Hm, I think that we actually shouldn't see a stack like that. >> vm_fault sets VPO_BUSY on a page before calling vnode_pager_generic_getpages, so >> the thread gets stuck forever in zfs mappedread. >> It seems like the page that was seen as invalid in vm_fault becomes valid while >> call flow reaches mappedread. > The vnode is share-locked, and vm object lock is dropped and reacquired > several times until control reaches zfs_mappedread. This indeed allows > a window during which page might be read by other thread. But wouldn't a page still stay protected by VPO_BUSY all that time? I mean that the page shouldn't be read in and marked valid by other thread while it's flagged with VPO_BUSY. And, AFAICS, vm_fault has the page busy for the whole duration. > There are two possible routes to solve the issue: > 1. Provide zfs-specific VOP_GETPAGES(). > 2. Use my vm6 patch. Sigh. 
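For anyone collecting more data on this, the procstat -k -k output quoted
earlier in the thread can be narrowed down to the threads whose kernel stack
mentions a given function. A rough sketch follows; the stuck_in helper is
hypothetical (it is not part of procstat), and it only does a substring match
on the printed stack, so asking for vn_lock would also match _vn_lock.

```sh
#!/bin/sh
# stuck_in FUNC: read `procstat -k -k` style output on stdin and print
# "PID TID COMM" for every thread whose kernel stack mentions FUNC+offset.
# Illustrative helper only; assumes COMM contains no spaces.
stuck_in() {
    awk -v sym="$1" '
        $1 == "PID" { next }                         # skip header lines
        index($0, sym "+") > 0 { print $1, $2, $3 }  # stack frame present
    '
}

# On the affected FreeBSD box, for all processes:
#   procstat -k -k -a | stuck_in zfs_freebsd_read
```

Fed the two stacks quoted in this thread, asking for zfs_freebsd_read would
list only the thread of PID 24731, which is the one sleeping in the ZFS read
path.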
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sun Oct 10 12:51:18 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1CF981065675; Sun, 10 Oct 2010 12:51:18 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [64.81.247.49]) by mx1.freebsd.org (Postfix) with ESMTP id F33478FC13; Sun, 10 Oct 2010 12:51:17 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id o9ACpHVE043246; Sun, 10 Oct 2010 05:51:17 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201010101251.o9ACpHVE043246@chez.mckusick.com> To: Ivan Voras In-reply-to: Date: Sun, 10 Oct 2010 05:51:17 -0700 From: Kirk McKusick Cc: freebsd-fs@freebsd.org Subject: Re: Increasing ufs.dirhash_maxmem by default X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Oct 2010 12:51:19 -0000 > To: freebsd-fs@freebsd.org > From: Ivan Voras > Date: Sat, 09 Oct 2010 21:51:47 +0200 > Subject: Increasing ufs.dirhash_maxmem by default > > hi, > > Several people have worked on dirhash in the past so I'm posting here > instead of individually pinging them. > > The default dirhash_maxmem is currently set as 2 MB, which while may be > sufficient some time ago it certainly isn't now. I've had to increase it > on practically all non-trivial servers and even high-end desktops, and > there are occasional reports on the lists that suggest it's a fairly > common thing. > > What I'd like to do is either: > > 1) Simply increase the default to e.g. 32 MB (trivial change) or > 2) Make it a function of hibufspace (e.g. 1/32th of it, capped at 64 MB) > which is itself autotuned. This would happen in ufsdirhash_init(). 
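The second option above is simple arithmetic, and a quick way to see what it
would yield on a given machine is sketched here. This is not the kernel code
that would live in ufsdirhash_init(); the dirhash_maxmem_for function name is
made up for illustration, and the 1/32 fraction and 64 MB cap are just the
example figures from the proposal. The cap argument is optional, so the
uncapped value can be compared as well.

```sh
#!/bin/sh
# dirhash_maxmem_for HIBUFSPACE [CAP]: proposed dirhash memory limit in bytes.
# Computes HIBUFSPACE/32 and, if CAP is given and non-zero, clamps to CAP.
# Hypothetical helper for experimenting with the numbers only.
dirhash_maxmem_for() {
    hibuf="$1"
    cap="${2:-0}"
    val=$((hibuf / 32))
    if [ "$cap" -gt 0 ] && [ "$val" -gt "$cap" ]; then
        val="$cap"
    fi
    echo "$val"
}

# Example with an assumed hibufspace of 4 GiB and the 64 MiB cap:
#   dirhash_maxmem_for 4294967296 67108864
# On a live system the input could be taken from `sysctl -n vfs.hibufspace`.
```

With the assumed 4 GiB input the cap is what decides the result; without it
the same input gives 128 MiB.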
>
> The current incarnation of dirhash has a vm_lowmem handler so it doesn't
> look like it could starve a system if overtuned.
>
> Ideas? Objections?

I am a strong proponent of auto tuning. Otherwise, one is constantly
needing to fix defaults as we are discussing here. Your suggestion #2
above seems reasonable except that I would not put an upper limit
on it as that just gets us back to the previous problem after a
few years. Given that dirhash has a vm_lowmem handler, and we are
only considering a small percentage of the memory, I do not think
that an upper bound is really needed.

~Kirk

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 10 15:34:50 2010
Return-Path:
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0582C106566B for ; Sun, 10 Oct 2010 15:34:50 +0000 (UTC) (envelope-from wjw@digiware.nl)
Received: from mail.digiware.nl (mail.ip6.digiware.nl [IPv6:2001:4cb8:1:106::2]) by mx1.freebsd.org (Postfix) with ESMTP id 930218FC14 for ; Sun, 10 Oct 2010 15:34:49 +0000 (UTC)
Received: from localhost (localhost.digiware.nl [127.0.0.1]) by mail.digiware.nl (Postfix) with ESMTP id 06EB1153434 for ; Sun, 10 Oct 2010 17:34:48 +0200 (CEST)
X-Virus-Scanned: amavisd-new at digiware.nl
Received: from mail.digiware.nl ([127.0.0.1]) by localhost (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id jeOCiQkun6z8 for ; Sun, 10 Oct 2010 17:34:46 +0200 (CEST)
Received: from [127.0.0.1] (opteron [192.168.10.67]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.digiware.nl (Postfix) with ESMTPSA id 08B9F153433 for ; Sun, 10 Oct 2010 17:34:46 +0200 (CEST)
Message-ID: <4CB1DD0F.6000209@digiware.nl>
Date: Sun, 10 Oct 2010 17:34:39 +0200
From: Willem Jan Withagen
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc:
Subject: ZFS freeze/livelock
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 10 Oct 2010 15:34:50 -0000

Hi,

Just had my FreeBSD freeze on me with what I would think is sort of
a livelock....

While I was receiving zfs snapshots on my data pool.

Top and systat just kept running,
but anything getting near a shell (and perhaps disk-io) ended up in:

root@zfs.digiware.nl# gpart create -s gpt da6
load: 0.00 cmd: csh 12393 [zfsvfs->z_teardown_inactive_lock] 26.12r 0.00u 0.00s 0% 2480k
load: 0.10 cmd: csh 12393 [zfsvfs->z_teardown_inactive_lock] 96.01r 0.00u 0.00s 0% 2480k

Trying to execute shutdown -r now had no effect whatsoever.
Neither did the three-finger salute.
(Well at least not in 60 sec I was willing to wait.)

Only way out of this situation was hard-reset. And I do have to
admit I like ZFS for the speed it recovers after unexpected reboot.

Too bad there was no alt-ctrl-backspace escape to debugger compiled
in. I'll do that with the next kernel, just in case.

So the only data point I can give is the ^T output above.
--WjW From owner-freebsd-fs@FreeBSD.ORG Sun Oct 10 19:34:18 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 826C9106567A for ; Sun, 10 Oct 2010 19:34:18 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta10.westchester.pa.mail.comcast.net (qmta10.westchester.pa.mail.comcast.net [76.96.62.17]) by mx1.freebsd.org (Postfix) with ESMTP id 302F68FC13 for ; Sun, 10 Oct 2010 19:34:17 +0000 (UTC) Received: from omta21.westchester.pa.mail.comcast.net ([76.96.62.72]) by qmta10.westchester.pa.mail.comcast.net with comcast id H7Nj1f0021ZXKqc5A7aH5H; Sun, 10 Oct 2010 19:34:17 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta21.westchester.pa.mail.comcast.net with comcast id H7aG1f00S3LrwQ23h7aH6A; Sun, 10 Oct 2010 19:34:17 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 484DD9B418; Sun, 10 Oct 2010 12:34:15 -0700 (PDT) Date: Sun, 10 Oct 2010 12:34:15 -0700 From: Jeremy Chadwick To: Willem Jan Withagen Message-ID: <20101010193415.GA93540@icarus.home.lan> References: <4CB1DD0F.6000209@digiware.nl> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4CB1DD0F.6000209@digiware.nl> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS freeze/livelock X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Oct 2010 19:34:18 -0000 On Sun, Oct 10, 2010 at 05:34:39PM +0200, Willem Jan Withagen wrote: > Just had my FreeBSD freeze on me with what I would think is sort of > an livelock.... > > While I was receiving zfs snapshots on my data pool. 
>
> Top and systat just kept running,
> but anything getting near a shell (and perhaps disk-io) ended up in:
>
> root@zfs.digiware.nl# gpart create -s gpt da6
> load: 0.00 cmd: csh 12393 [zfsvfs->z_teardown_inactive_lock] 26.12r
> 0.00u 0.00s 0% 2480k
> load: 0.10 cmd: csh 12393 [zfsvfs->z_teardown_inactive_lock] 96.01r
> 0.00u 0.00s 0% 2480k
>
> Trying to execute shutdown -r now had no effect whatsoever.
> Neither did the three-finger salute.
> (Well at least not in 60 sec I was willing to wait.)
>
> Only way out of this situation was hard-reset. And I do have to
> admit I like ZFS for the speed it recovers after unexpected reboot.
>
> Too bad there was no alt-ctrl-backspace escape to debugger compiled
> in. I'll do that with the next kernel, just in case.
>
> So the only data point I can give is the ^T output above.

We don't know what FreeBSD version you're using (specifically uname -a
output, since build date matters), but if it's RELENG_8 with ZFS v15,
you might check out this thread (be sure to read Kai's and my diagnoses):

http://lists.freebsd.org/pipermail/freebsd-fs/2010-October/009687.html

I'm in the process of moving all of my machines, including my home
server, over to gmirror. (Home machine started showing signs of serious
ZFS performance degradation; mutt doing a stat() on 24 files and
directories total taking literally 0.4 seconds on a dual-core machine.
Makes no sense, doesn't happen with UFS2, I'm done.)

--
| Jeremy Chadwick jdc@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977.
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sun Oct 10 20:00:33 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E0C02106564A for ; Sun, 10 Oct 2010 20:00:33 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from mail.digiware.nl (mail.ip6.digiware.nl [IPv6:2001:4cb8:1:106::2]) by mx1.freebsd.org (Postfix) with ESMTP id 7ACCB8FC12 for ; Sun, 10 Oct 2010 20:00:33 +0000 (UTC) Received: from localhost (localhost.digiware.nl [127.0.0.1]) by mail.digiware.nl (Postfix) with ESMTP id D057A153437; Sun, 10 Oct 2010 22:00:31 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from mail.digiware.nl ([127.0.0.1]) by localhost (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Otr5-Z+b5Tqg; Sun, 10 Oct 2010 22:00:20 +0200 (CEST) Received: from [192.168.10.215] (unknown [192.168.10.215]) by mail.digiware.nl (Postfix) with ESMTP id 5E725153433; Sun, 10 Oct 2010 22:00:20 +0200 (CEST) References: <4CB1DD0F.6000209@digiware.nl> <20101010193415.GA93540@icarus.home.lan> Message-Id: From: Willem Jan Withagen To: Jeremy Chadwick In-Reply-To: <20101010193415.GA93540@icarus.home.lan> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable X-Mailer: iPad Mail (7B405) Mime-Version: 1.0 (iPad Mail 7B405) Date: Sun, 10 Oct 2010 22:06:05 +0200 Cc: "freebsd-fs@freebsd.org" Subject: Re: ZFS freeze/livelock X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Oct 2010 20:00:34 -0000 Op 10 okt. 2010 om 21:34 heeft Jeremy Chadwick = het volgende geschreven: > On Sun, Oct 10, 2010 at 05:34:39PM +0200, Willem Jan Withagen wrote: >> Just had my FreeBSD freeze on me with what I would think is sort of >> an livelock.... 
>>
>> While I was receiving zfs snapshots on my data pool.
>>
>> Top and systat just kept running,
>> but anything getting near a shell (and perhaps disk-io) ended up in:
>>
>> root@zfs.digiware.nl# gpart create -s gpt da6
>> load: 0.00 cmd: csh 12393 [zfsvfs->z_teardown_inactive_lock] 26.12r
>> 0.00u 0.00s 0% 2480k
>> load: 0.10 cmd: csh 12393 [zfsvfs->z_teardown_inactive_lock] 96.01r
>> 0.00u 0.00s 0% 2480k
>>
>> Trying to execute shutdown -r now had no effect whatsoever.
>> Neither did the three-finger salute.
>> (Well at least not in 60 sec I was willing to wait.)
>>
>> Only way out of this situation was hard-reset. And I do have to
>> admit I like ZFS for the speed it recovers after unexpected reboot.
>>
>> Too bad there was no alt-ctrl-backspace escape to debugger compiled
>> in. I'll do that with the next kernel, just in case.
>>
>> So the only data point I can give is the ^T output above.
>
> We don't know what FreeBSD version you're using (specifically uname -a
> output, since build date matters), but if it's RELENG_8 with ZFS v15,
> you might check out this thread (be sure to read Kai's and my diagnoses

Sorry about that.
I'm running Stable on this box, as of last Tuesday, so that's v15, but the
disks are still at v14.

>
> http://lists.freebsd.org/pipermail/freebsd-fs/2010-October/009687.html

I'll check it out.

> I'm in the process of moving all of my machines, including my home
> server, over to gmirror. (Home machine started showing signs of serious
> ZFS performance degradation; mutt doing a stat() on 24 files and
> directories total taking literally 0.4 seconds on a dual-core machine.
> Makes no sense, doesn't happen with UFS2, I'm done.)

Well, all new things require time, hard work and diligent testing. It is
no different than any new serious component added. Be it the migration
to real multiprocessor, or Giant removal.
Both went through phases of better and not so good stability.
Had we all given up, FreeBSD would not be in its current state.

So I understand your feelings, but then I'm not running it on super essential servers, and I'm not giving up so easily.

--WjW

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 10 20:57:42 2010
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B62B5106564A for ; Sun, 10 Oct 2010 20:57:42 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org)
Received: from fep23.mx.upcmail.net (fep23.mx.upcmail.net [62.179.121.43]) by mx1.freebsd.org (Postfix) with ESMTP id EEB078FC14 for ; Sun, 10 Oct 2010 20:57:41 +0000 (UTC)
Received: from edge04.upcmail.net ([192.168.13.239]) by viefep15-int.chello.at (InterMail vM.8.01.02.02 201-2260-120-106-20100312) with ESMTP id <20101010203921.CBQB1472.viefep15-int.chello.at@edge04.upcmail.net>; Sun, 10 Oct 2010 22:39:21 +0200
Received: from pinky ([213.46.23.80]) by edge04.upcmail.net with edge id H8fK1f00m1jgp3H048fLiF; Sun, 10 Oct 2010 22:39:21 +0200
X-SourceIP: 213.46.23.80
Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes
To: "Willem Jan Withagen" , "Jeremy Chadwick"
References: <4CB1DD0F.6000209@digiware.nl> <20101010193415.GA93540@icarus.home.lan>
Date: Sun, 10 Oct 2010 22:39:19 +0200
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: "Ronald Klop"
Message-ID:
In-Reply-To: <20101010193415.GA93540@icarus.home.lan>
User-Agent: Opera Mail/10.62 (Win32)
X-Cloudmark-Analysis: v=1.1 cv=O+FWVpunvrlG1gSnSO6WiIQ7o0MJ4laHqrEcUJ8XjIg= c=1 sm=0 a=bgpUlknNv7MA:10 a=kj9zAlcOel0A:10 a=QycZ5dHgAAAA:8 a=6I5d2MoRAAAA:8 a=5leI9aT5W7OPK-t9VcYA:9 a=jYG_2NK6JWUZdG3Nez0A:7 a=rG5oTdzxozAb1bwx_sDAbIEyCJ8A:4 a=CjuIK1q_8ugA:10 a=LEW0jtIvgjIA:10 a=HpAAvcLHHh0Zw7uRqdWCyQ==:117
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS freeze/livelock
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Oct 2010 20:57:42 -0000 On Sun, 10 Oct 2010 21:34:15 +0200, Jeremy Chadwick wrote: > On Sun, Oct 10, 2010 at 05:34:39PM +0200, Willem Jan Withagen wrote: >> Just had my FreeBSD freeze on me with what I would think is sort of >> an livelock.... >> >> While I was receiving zfs snapshots on my data pool. >> >> Top and systat just kept running, >> but anything getting near a shell (and perhaps disk-io) ended up in: >> >> root@zfs.digiware.nl# gpart create -s gpt da6 >> load: 0.00 cmd: csh 12393 [zfsvfs->z_teardown_inactive_lock] 26.12r >> 0.00u 0.00s 0% 2480k >> load: 0.10 cmd: csh 12393 [zfsvfs->z_teardown_inactive_lock] 96.01r >> 0.00u 0.00s 0% 2480k >> >> Trying to execute to execute shutdown -r now had no effect what so ever. >> Neither did the three-finger salute. >> (Well at least not in 60 sec I was willing to wait.) >> >> Only way out of this situation was hard-reset. And I do have to >> admit I like ZFS for the speed it recovers after unexpected reboot. >> >> To bad there was no alt-ctrl-backspace escape to debugger compiled >> in. I'll do that with the next kernel, just in case. >> >> So the only data point I can give is the ^T output above. > > We don't know what FreeBSD version you're using (specifically uname -a > output, since build date matters), but if it's RELENG_8 with ZFS v15, > you might check out this thread (be sure to read Kai and I's diagnoses): > > http://lists.freebsd.org/pipermail/freebsd-fs/2010-October/009687.html > > I'm in the process of moving all of my machines, including my home > server, over to gmirror. (Home machine started showing signs of serious > ZFS performance degredation; mutt doing a stat() on 24 files and > directories total taking literally 0.4 seconds on a dual-core machine. > Makes no sense, doesn't happen with UFS2, I'm done.) > Sorry to hear it didn't work out for you this time. 
But if you are running very important things on very fresh code, you should have a testing stage, or keep a failover to older versions available, or be able to restore from backup, and so on.
At my company we roll out new minor version updates of MySQL now and then, but we always make sure we have an old version available. Our customers are more important than running the latest versions.
Home machines are different with regard to having plenty of backup machines, but is it possible to give a developer a temporary account to debug this? That would help the project move forward.

Ronald.

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 10 21:57:10 2010
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3751A1065672 for ; Sun, 10 Oct 2010 21:57:10 +0000 (UTC) (envelope-from andriy@irbisnet.com)
Received: from smtp102.rog.mail.re2.yahoo.com (smtp102.rog.mail.re2.yahoo.com [206.190.36.80]) by mx1.freebsd.org (Postfix) with SMTP id B4B028FC0C for ; Sun, 10 Oct 2010 21:57:09 +0000 (UTC)
Received: (qmail 16583 invoked from network); 10 Oct 2010 21:57:08 -0000
Received: from smtp.irbisnet.com (andriy@99.235.226.221 with login) by smtp102.rog.mail.re2.yahoo.com with SMTP; 10 Oct 2010 14:57:08 -0700 PDT
X-Yahoo-SMTP: dz9sigaswBA5kWoYWVTZrGHmIs2vaKgG1w--
X-YMail-OSG: Q6398G0VM1nbpdaXT4eqs65NIzULnUbp7u_i33WpuVDc3x_ s7wVNVSqlaB.CprQuRsPiQvZ4HkLHBb1HgJuZuTdrkSKzDMneCzIUfQJ0TFQ vMfLVJBFve12Gh1Ez7jpclDDX8UAhaPtgDA_5Vv1X5_1HDh6QqYkGNDzRUWQ sa6G.KkksOC9lgCOOvH7GPFcfn12Qx6pimBoZL_sGwH.TsabGsV5NsyIxTUk pLhudkgQ2MuET9_g9_lmjnMtTcfaHV5z51DPKcLLOCpcfxOjuuxhhkA2A6wf 9Er1Ur5h3alAP7d8Um2U4p485Gk6ByMfINMSi1Ilf2dnmmU2Z80jUXVDIN1B ko2syaHo8WS3ACuCevM9YwYirezGHSrHwh6j5kGsg4soAI5gK.M1iE_EPLFQ EPgCBBPcLQA--
X-Yahoo-Newman-Property: ymail-3
Received: from genie.irbisnet.lan (genie.irbisnet.lan [192.168.0.2]) by smtp.irbisnet.com (Postfix) with ESMTPSA id A1E26A628; Sun, 10 Oct 2010 17:57:07 -0400
(EDT)
Message-Id:
From: Andriy Bakay
To: Andriy Gapon
In-Reply-To: <4CB17658.5080808@icyb.net.ua>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (Apple Message framework v936)
Date: Sun, 10 Oct 2010 17:57:04 -0400
References: <169A4F62-0509-4AE9-A4A5-F9CADD08140D@irbisnet.com> <4CB17658.5080808@icyb.net.ua>
X-Mailer: Apple Mail (2.936)
Cc: "freebsd-fs@freebsd.org" , "freebsd-stable@freebsd.org"
Subject: Re: Serious zfs slowdown when mixed with another file system (ufs/msdosfs/etc.).
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 10 Oct 2010 21:57:10 -0000

On 10-Oct-10, at 4:16 AM, Andriy Gapon wrote:

> More convenient? :-)
> Sorry, I always use "make buildword, etc", works 100% for me and I
> find it very
> convenient.
>
> --
> Andriy Gapon

I mean something like freebsd-update for FreeBSD-STABLE monthly snapshots. You are right, updating from sources works perfectly, and I used it before freebsd-update came along. But I found freebsd-update way more convenient.

Anyway, thank you all for all your tips.
--
Andriy

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 10 22:00:30 2010
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DFA77106564A for ; Sun, 10 Oct 2010 22:00:30 +0000 (UTC) (envelope-from gallasch@free.de)
Received: from smtp.free.de (smtp.free.de [91.204.6.103]) by mx1.freebsd.org (Postfix) with ESMTP id 46EB58FC13 for ; Sun, 10 Oct 2010 22:00:29 +0000 (UTC)
Received: (qmail 99491 invoked from network); 11 Oct 2010 00:00:28 +0200
Received: from smtp.free.de (HELO orwell.free.de) ([91.204.4.103]) (envelope-sender ) by smtp.free.de (qmail-ldap-1.03) with AES128-SHA encrypted SMTP for ; 11 Oct 2010 00:00:28 +0200
References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <4CB17983.3020907@icyb.net.ua>
In-Reply-To: <4CB17983.3020907@icyb.net.ua>
Mime-Version: 1.0 (Apple Message framework v1081)
Content-Type: text/plain; charset=us-ascii
Message-Id:
Content-Transfer-Encoding: quoted-printable
From: Kai Gallasch
Date: Mon, 11 Oct 2010 00:00:28 +0200
To: Andriy Gapon
X-Mailer: Apple Mail (2.1081)
Cc: freebsd-fs@freebsd.org
Subject: Re: Locked up processes after upgrade to ZFS v15
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 10 Oct 2010 22:00:31 -0000

On 10.10.2010 at 10:29, Andriy Gapon wrote:

> on 09/10/2010 16:37 Kai Gallasch said the following:
>> I must repeat. I offer my help if someone wants to dig into the locking problem.
>
> Kai,
>
> I would like to look into this.
> Can you provide shell access to a system that exhibits the behavior?
> Or even better serial console for remote debugging, if possible.
Andriy, glad to hear this :) I can give you root access and serial console access to a secondary server that showed the same lockups as the big server. I'll mail you the details.

> Also, can you first try with the very latest stable/8 or even head?

Yes, I can do so if that's what you want, but wouldn't you lose the chance to identify the real reason for the lockups?

If a new kernel with your changes to the ZFS sources makes the locking situation disappear, could we be sure that the real reason for the stuck, unkillable processes was fixed?

My proposal: I'll do my best to trigger the known bug on the current kernel and let you know. You have a look at it, and when you are done gathering data I'll build a new kernel and see if the problem persists.

> I have recently MFC-ed a few improvements to ZFS code.

The server you will connect to runs 8.1-STABLE from Mon Oct 4 01:19:41 CEST 2010.

So your commits were after Oct 4th?

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/?sortby=date#dirlist

Or is this the wrong place in the src tree?
Kai.

From owner-freebsd-fs@FreeBSD.ORG Mon Oct 11 10:42:52 2010
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 64FA0106564A for ; Mon, 11 Oct 2010 10:42:52 +0000 (UTC) (envelope-from gallasch@free.de)
Received: from smtp.free.de (smtp.free.de [91.204.6.103]) by mx1.freebsd.org (Postfix) with ESMTP id 2355E8FC12 for ; Mon, 11 Oct 2010 10:42:50 +0000 (UTC)
Received: (qmail 82500 invoked from network); 11 Oct 2010 12:42:49 +0200
Received: from smtp.free.de (HELO orwell.free.de) ([91.204.4.103]) (envelope-sender ) by smtp.free.de (qmail-ldap-1.03) with AES128-SHA encrypted SMTP for ; 11 Oct 2010 12:42:49 +0200
From: Kai Gallasch
Content-Type: text/plain; charset=us-ascii
Message-Id:
Date: Mon, 11 Oct 2010 12:42:49 +0200
To: freebsd-fs@freebsd.org
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Apple Message framework v1081)
X-Mailer: Apple Mail (2.1081)
Subject: UFSJ project - some questions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 11 Oct 2010 10:42:52 -0000

Hi.

Some days ago I read about Jeff Roberson's UFSJ (UFS + Journaling) project, which aims to bring UFS journaling to FreeBSD.

I read through all the old status reports and also http://jeffr-tech.livejournal.com/ but cannot find current status information about the project.

On http://svn.freebsd.org/viewvc/base/projects/suj/ I found the svn repository of the suj project. It seems an 8-STABLE branch is also available there.

I'm very interested in testing suj on 8-STABLE and providing feedback.

- What is the current status of the project?
- Is 8-STABLE really tracked and maintained? (The last commits are 2 months old.)
- How can I get http://svn.freebsd.org/viewvc/base/projects/suj/8/ into an 8-STABLE source tree?
svn checkout -r210042 svn://svn.freebsd.org/base/projects/suj/8/ /usr/src ??

Regards,
Kai Gallasch.

From owner-freebsd-fs@FreeBSD.ORG Mon Oct 11 11:06:54 2010
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D7DD410656AE for ; Mon, 11 Oct 2010 11:06:54 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 5E6078FC23 for ; Mon, 11 Oct 2010 11:06:54 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o9BB6sJ7037545 for ; Mon, 11 Oct 2010 11:06:54 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o9BB6r4Q037543 for freebsd-fs@FreeBSD.org; Mon, 11 Oct 2010 11:06:53 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 11 Oct 2010 11:06:53 GMT
Message-Id: <201010111106.o9BB6r4Q037543@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Cc:
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 11 Oct 2010 11:06:54 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/151082  fs         [zfs] [patch] sappend-flaged files on ZFS not working
o kern/150910  fs         [nfs] wsize=16384 on udp nfs mount unusable
o kern/150796  fs         [panic] [suj] [ufs] [softupdates] Panic on portbuild
o kern/150503  fs         [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs         [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs         [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs         [nfs] mountd/nfsd became confused; refused to reload n
o kern/150207  fs         zpool(1): zpool import -d /dev tries to open weird dev
o kern/149855  fs         [gvinum] growfs causes fsck to report errors in Filesy
o kern/149495  fs         [zfs] chflags sappend on zfs not working right
o kern/149173  fs         [patch] [zfs] make OpenSolaris installa
o kern/149022  fs         [hang] File system operations hangs with suspfs state
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o bin/148296   fs         [zfs] [loader] [patch] Very slow probe in /usr/src/sys
o kern/148204  fs         [nfs] UDP NFS causes overload
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147790  fs         [zfs] zfs set acl(mode|inherit) fails on existing zfs
o kern/147560  fs         [zfs] [boot] Booting 8.1-PRERELEASE raidz system take
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/147292  fs         [nfs] [patch] readahead missing in nfs client options
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
o kern/146375  fs         [nfs] [patch] Typos in macro variables names in sys/fs
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an
o bin/145309   fs         bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
o kern/144458  fs         [nfs] [patch] nfsd fails as a kld
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs         [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143345  fs         [ext2fs] [patch] extfs minor header cleanups to better
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142924  fs         [ext2fs] [patch] Small cleanup for the inode struct in
o kern/142914  fs         [zfs] ZFS performance degradation over time
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142401  fs         [ntfs] [patch] Minor updates to NTFS from NetBSD
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140134  fs         [msdosfs] write and fsck destroy filesystem integrity
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
o bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs         [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs         [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/139363  fs         [nfs] diskless root nfs mount from non FreeBSD server
o kern/138790  fs         [zfs] ZFS ceases caching when mem demand is high
o kern/138662  fs         [panic] ffs_blkfree: freeing free block
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
o kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135667  fs         [lor] LORs causing ufs filesystem corruption on XEN Do
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133614  fs         [panic] panic: ffs_truncate: read-only filesystem
o kern/133174  fs         [msdosfs] [patch] msdosfs must support utf-encoded int
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor" on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs         [nullfs] Error by check nullfs
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/129059  fs         [zfs] [patch] ZFS bootloader whitelistable via WITHOUT
f kern/128829  fs         smbd(8) causes periodic panic on 7-RELEASE
o kern/127787  fs         [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270   fs         fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs         [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
p kern/124621  fs         [ext3] [patch] Cannot mount ext2fs partition
f bin/124424   fs         [zfs] zfs(8): zfs list -r shows strange snapshots' siz
o kern/123939  fs         [msdosfs] corrupts new files
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121779   fs         [ufs] snapinfo(8) (and related tools?) only work for t
o bin/121366   fs         [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
f kern/120991  fs         [panic] [ffs] [snapshot] System crashes when manipulat
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
f kern/119735  fs         [zfs] geli + ZFS + samba starting on boot panics 7.0-B
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs         [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs         [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs         lack of fsck_cd9660 prevents mounting iso images with
p kern/116608  fs         [msdosfs] [patch] msdosfs fails to check mount options
o kern/116583  fs         [ffs] [hang] System freezes for short time when using
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
f kern/115645  fs         [ffs] [snapshots] [panic] lockmgr: thread 0xc4c00d80,
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs         [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat
o kern/109010  fs         [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs         [ufs] left-over fsck_snapshot after unfinished backgro
o kern/106030  fs         [ufs] [panic] panic in ufs from geom when a dead disk
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear
o kern/101324  fs         [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs         [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs         [request] newfs(8) has no option to clear the first 12
o kern/97377   fs         [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs         [iso9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs         [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs         fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs         [smbfs] smbfs may cause double unlock
o bin/94635    fs         snapinfo(8)/libufs only works for disk-backed filesyst
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
f kern/91568   fs         [ufs] [panic] writing to UFS/softupdates DVD media in
o kern/91134   fs         [smbfs] [patch] Preserve access and modification time
a kern/90815   fs         [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs         [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs         [panic] ffs_blkfree: freeing free frag on AMD 64
o kern/88266   fs         [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o bin/87966    fs         [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs         [smbfs] System reboot while umount smbfs.
o kern/86587   fs         [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs         fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/85326   fs         [smbfs] [panic] saving a file via samba to an overquot
o kern/84589   fs         [2TB] 5.4-STABLE unresponsive during background fsck 2
o kern/80088   fs         [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs         Background-fsck checks one filesystem twice and omits
o kern/73484   fs         [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs         [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs         [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs         fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs         [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs         [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs         [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs         [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs         [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs         [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs         [smbfs] System reboot with dead smb mount and umount
o kern/33464   fs         [ufs] soft update inconsistencies after system crash
o bin/27687    fs         fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs         [2TB] 32bit NFS servers export wrong negative values t

208 problems total.
From owner-freebsd-fs@FreeBSD.ORG Mon Oct 11 15:15:12 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 07D74106564A for ; Mon, 11 Oct 2010 15:15:12 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta09.emeryville.ca.mail.comcast.net (qmta09.emeryville.ca.mail.comcast.net [76.96.30.96]) by mx1.freebsd.org (Postfix) with ESMTP id E0C0F8FC1C for ; Mon, 11 Oct 2010 15:15:11 +0000 (UTC) Received: from omta08.emeryville.ca.mail.comcast.net ([76.96.30.12]) by qmta09.emeryville.ca.mail.comcast.net with comcast id HRXt1f0070FhH24A9TFBSG; Mon, 11 Oct 2010 15:15:11 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta08.emeryville.ca.mail.comcast.net with comcast id HTF91f00B3LrwQ28UTF9mY; Mon, 11 Oct 2010 15:15:10 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 07FFC9B418; Mon, 11 Oct 2010 08:15:09 -0700 (PDT) Date: Mon, 11 Oct 2010 08:15:09 -0700 From: Jeremy Chadwick To: Andriy Gapon Message-ID: <20101011151508.GA10917@icarus.home.lan> References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <4CB17983.3020907@icyb.net.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4CB17983.3020907@icyb.net.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Oct 2010 15:15:12 -0000 On Sun, Oct 10, 2010 at 11:29:55AM +0300, Andriy Gapon wrote: > on 09/10/2010 16:37 Kai Gallasch said the following: > > I must repeat. I offer my help if someone wants to dig into the locking problem. > > I would like to look into this. 
> Can you provide shell access to a system that exhibits the behavior? > Or even better serial console for remote debugging, if possible. > > Also, can you first try with the very latest stable/8 or even head? > I have recently MFC-ed a few improvements to ZFS code. Andriy, If you need or want a secondary test box (with serial console access and kernel debugger, just not remote gdb/kgdb), I have one available which I can build (would take me only part of the morning) mimicking production. Let me know if you want/need a secondary test bed. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Mon Oct 11 15:25:47 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DB304106564A for ; Mon, 11 Oct 2010 15:25:47 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 2FE4F8FC12 for ; Mon, 11 Oct 2010 15:25:46 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA27617; Mon, 11 Oct 2010 18:25:42 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CB32C75.2060000@icyb.net.ua> Date: Mon, 11 Oct 2010 18:25:41 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeremy Chadwick References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> In-Reply-To: <20101011151508.GA10917@icarus.home.lan> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; 
charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Oct 2010 15:25:47 -0000

on 11/10/2010 18:15 Jeremy Chadwick said the following:
> On Sun, Oct 10, 2010 at 11:29:55AM +0300, Andriy Gapon wrote:
>> on 09/10/2010 16:37 Kai Gallasch said the following:
>>> I must repeat. I offer my help if someone wants to dig into the locking problem.
>>
>> I would like to look into this.
>> Can you provide shell access to a system that exhibits the behavior?
>> Or even better serial console for remote debugging, if possible.
>>
>> Also, can you first try with the very latest stable/8 or even head?
>> I have recently MFC-ed a few improvements to ZFS code.
>
> Andriy,
>
> If you need or want a secondary test box (with serial console access and
> kernel debugger, just not remote gdb/kgdb), I have one available which I
> can build (would take me only part of the morning) mimicking production.
> Let me know if you want/need a secondary test bed.

Jeremy,

we are in the process of debugging this issue with Kai and I think that we are
onto something. I would like to ask you for some additional testing.

What kind of workload do you run?
Do you have anything using sendfile(2)? E.g. Apache with EnableSendfile enabled
(it might be enabled by default, without explicit options).
Can you try to disable sendfile(2) use, reboot and see how the system behaves?

If you still experience the same problem after doing the above, then I'd like to
ask you for shell access with root privileges; or establishing communication via
IM and running some commands for me.

Thank you!
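For anyone following the thread who wants to see what a given Apache
configuration file says about sendfile before changing it, a minimal sketch
might look like the following. The function name sendfile_setting is
hypothetical and not from the thread; it only reads the one file it is given
and does not follow Include directives, so treat the result as a hint rather
than the effective server setting.

```shell
# Hypothetical helper: print the value of the last uncommented
# EnableSendfile directive in one config file, or "unset" if none is found.
# Apache directive names are case-insensitive, hence tolower().
sendfile_setting() {
    awk 'tolower($1) == "enablesendfile" { v = $2 }
         END { print (v == "" ? "unset" : v) }' "$1"
}
```

It would be called as sendfile_setting /path/to/httpd.conf (the path is an
assumption and depends on how Apache was installed). Adding the line
"EnableSendfile Off" to the configuration and restarting Apache is the usual
way to stop it from using sendfile(2).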
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Mon Oct 11 15:40:24 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 321631065673 for ; Mon, 11 Oct 2010 15:40:24 +0000 (UTC) (envelope-from sfourman@gmail.com) Received: from mail-qy0-f175.google.com (mail-qy0-f175.google.com [209.85.216.175]) by mx1.freebsd.org (Postfix) with ESMTP id 6346F8FC0C for ; Mon, 11 Oct 2010 15:40:21 +0000 (UTC) Received: by qyk30 with SMTP id 30so438017qyk.13 for ; Mon, 11 Oct 2010 08:40:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=fHmw2MPTusZkBBvAjiNij6RjUYAw1hWQBfAkaM0T2H4=; b=Ub7qwlT7PnKZJ/l2Kk1fQSqQy3M7hrub7VGAmNG7q4uOuhO11FBpsP2Ii816ivn5PE dm9iXHHN59Ny60sSKEOAFAjQHm1DkjW+8bimHD3qcHJmXOh8gfj1KU8UEXqCgi9u2T5h 44pJteakZh2XmMo1BAEcg80ktDM5pqsq8up6U= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=gxmFFn3dYW6lIyfpxDyDUSXDr6h35NSPfgsEdLqfv9Z0nK6HUDK3qj8L88DViwKcti fT64Li44udDhchzTPbPFZ15wXxaW+R4+oJgiA2A9gcI/C3jOVHnP9Bx1Qcyz/nRF5vf9 mkcdNKGcici9+B8Olc8hxJothQ+I5Z8Ur943o= MIME-Version: 1.0 Received: by 10.224.28.77 with SMTP id l13mr3729270qac.198.1286811611269; Mon, 11 Oct 2010 08:40:11 -0700 (PDT) Received: by 10.229.227.131 with HTTP; Mon, 11 Oct 2010 08:40:11 -0700 (PDT) In-Reply-To: <4CB32C75.2060000@icyb.net.ua> References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> Date: Mon, 11 Oct 2010 10:40:11 -0500 Message-ID: From: "Sam Fourman Jr." 
To: Andriy Gapon Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Oct 2010 15:40:24 -0000

> Jeremy,
>
> we are in the process of debugging this issue with Kai and I think that we are
> onto something. I would like to ask you for some additional testing.
>
> What kind of workload do you run?
> Do you have anything using sendfile(2)? E.g. Apache with EnableSendfile enabled
> (it might be enabled by default, without explicit options).
> Can you try to disable sendfile(2) use, reboot and see how the system behaves?
>
> If you still experience the same problem after doing the above, then I'd like to
> ask you for shell access with root privileges; or establishing communication via
> IM and running some commands for me.
>
> Thank you!
> --
> Andriy Gapon
> _______________________________________________


Andriy,

I am not sure if this is the same issue, but
I have a similar problem running FreeBSD 9-CURRENT amd64 (sources as
of 1 hour ago)
a zfs lockup appears very frequently, it is a NFS server it locks up
several times per day.

I have been having these issues since before the ZFSv15 patch was
committed to HEAD a few months ago.
let me know if you need details, I have no problems giving you root
ssh on this machine... but it does not have a serial port.
so I am not sure how useful it would be.

--
Sam Fourman Jr.
Fourman Networks http://www.fourmannetworks.com From owner-freebsd-fs@FreeBSD.ORG Mon Oct 11 15:45:56 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 937A8106566C for ; Mon, 11 Oct 2010 15:45:56 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id D6AE58FC15 for ; Mon, 11 Oct 2010 15:45:55 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA28003; Mon, 11 Oct 2010 18:45:53 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CB33130.5050400@icyb.net.ua> Date: Mon, 11 Oct 2010 18:45:52 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: "Sam Fourman Jr." References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Oct 2010 15:45:56 -0000 on 11/10/2010 18:40 Sam Fourman Jr. said the following: > I am not sure if this is the same issue, but > I have a similar problem running FreeBSD 9-CURRENT amd64 (sources as > of 1 hour ago) > a zfs lockup appears very frequently, it is a NFS server it locks up > several times per day. 
> > I have been having these issues since before the ZFSv15 patch was > committed to HEAD a few months ago. > let me know if you need details, I have no problems giving you root > ssh on this machine... but it does not have a serial port. > so I am not sure how useful it would be. Sam, the problem would be the same if you see processes/threads stuck in zfsmrb wait state forever. $ ps axHwwl | fgrep zfsmrb -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Mon Oct 11 15:46:50 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C7F1C1065670 for ; Mon, 11 Oct 2010 15:46:50 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from QMTA11.westchester.pa.mail.comcast.net (qmta11.westchester.pa.mail.comcast.net [76.96.59.211]) by mx1.freebsd.org (Postfix) with ESMTP id 70CA68FC1B for ; Mon, 11 Oct 2010 15:46:49 +0000 (UTC) Received: from omta15.westchester.pa.mail.comcast.net ([76.96.62.87]) by QMTA11.westchester.pa.mail.comcast.net with comcast id HNhl1f0011swQuc5BTmqXb; Mon, 11 Oct 2010 15:46:50 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta15.westchester.pa.mail.comcast.net with comcast id HTmp1f0033LrwQ23bTmpxj; Mon, 11 Oct 2010 15:46:50 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id CCDB29B418; Mon, 11 Oct 2010 08:46:47 -0700 (PDT) Date: Mon, 11 Oct 2010 08:46:47 -0700 From: Jeremy Chadwick To: "Sam Fourman Jr." 
Message-ID: <20101011154647.GA11532@icarus.home.lan> References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, Andriy Gapon Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Oct 2010 15:46:50 -0000

On Mon, Oct 11, 2010 at 10:40:11AM -0500, Sam Fourman Jr. wrote:
> > Jeremy,
> >
> > we are in the process of debugging this issue with Kai and I think that we are
> > onto something. I would like to ask you for some additional testing.
> >
> > What kind of workload do you run?
> > Do you have anything using sendfile(2)? E.g. Apache with EnableSendfile enabled
> > (it might be enabled by default, without explicit options).
> > Can you try to disable sendfile(2) use, reboot and see how the system behaves?
> >
> > If you still experience the same problem after doing the above, then I'd like to
> > ask you for shell access with root privileges; or establishing communication via
> > IM and running some commands for me.
> >
> > Thank you!
> > --
> > Andriy Gapon
> > _______________________________________________
>
>
> Andriy,
>
> I am not sure if this is the same issue, but
> I have a similar problem running FreeBSD 9-CURRENT amd64 (sources as
> of 1 hour ago)
> a zfs lockup appears very frequently, it is a NFS server it locks up
> several times per day.
>
> I have been having these issues since before the ZFSv15 patch was
> committed to HEAD a few months ago.
> let me know if you need details, I have no problems giving you root > ssh on this machine... but it does not have a serial port. > so I am not sure how useful it would be. Are the processes which livelock (get stuck and cannot be killed, even with kill -9) stuck in states "zfs" or "zfsmrb"? You can check with "ps -axl" or top. If so, it's probably the same problem. If not, it may be a different problem. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Mon Oct 11 16:09:49 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6AE071065670 for ; Mon, 11 Oct 2010 16:09:49 +0000 (UTC) (envelope-from mwlucas@bewilderbeast.blackhelicopters.org) Received: from bewilderbeast.blackhelicopters.org (bewilderbeast.blackhelicopters.org [198.22.63.8]) by mx1.freebsd.org (Postfix) with ESMTP id 15C128FC13 for ; Mon, 11 Oct 2010 16:09:48 +0000 (UTC) Received: from bewilderbeast.blackhelicopters.org (localhost [127.0.0.1]) by bewilderbeast.blackhelicopters.org (8.14.4/8.14.4) with ESMTP id o9BFUpTr015721 for ; Mon, 11 Oct 2010 11:30:51 -0400 (EDT) (envelope-from mwlucas@bewilderbeast.blackhelicopters.org) Received: (from mwlucas@localhost) by bewilderbeast.blackhelicopters.org (8.14.4/8.14.4/Submit) id o9BFUp2O015720 for fs@freebsd.org; Mon, 11 Oct 2010 11:30:51 -0400 (EDT) (envelope-from mwlucas) Date: Mon, 11 Oct 2010 11:30:51 -0400 From: "Michael W. 
Lucas" To: fs@freebsd.org Message-ID: <20101011153051.GA15699@bewilderbeast.blackhelicopters.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.3i X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.5 (bewilderbeast.blackhelicopters.org [127.0.0.1]); Mon, 11 Oct 2010 11:30:51 -0400 (EDT) Cc: Subject: hast crash X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Oct 2010 16:09:49 -0000

Hi,

I upgraded my HAST cluster to 8.1-stable on 6 October 2010, and am now
experiencing crashes in hastd. hastd debug output is showing:

...
[DEBUG][2] [mirror] (secondary) recv: (0x8013ecc40) Got request header: WRITE(11752701952, 131072).
[DEBUG][2] [mirror] (secondary) recv: (0x8013ecc40) Moving request to the disk queue.
[DEBUG][2] [mirror] (secondary) disk: (0x8013ecc40) Got request: WRITE(11752701952, 131072).
[DEBUG][2] [mirror] (secondary) recv: Taking free request.
[DEBUG][2] [mirror] (secondary) recv: (0x8013ecbf0) Got request.
[ERROR] [mirror] (secondary) Unable to receive request header: RPC version wrong.
[DEBUG][1] Unable to receive event header: Socket is not connected.
[DEBUG][1] Accepting connection to tcp4://0.0.0.0:8457.
[INFO] Connection from tcp4://192.168.0.1:21493 to tcp4://192.168.0.2:8457.
[DEBUG][2] tcp4://192.168.0.1:21493: resource=mirror
[DEBUG][1] [mirror] (secondary) Initial connection from tcp4://192.168.0.1:21493.
[DEBUG][1] [mirror] (secondary) Worker process exists (pid=8826), stopping it.
[ERROR] [mirror] (secondary) Worker process exited ungracefully (pid=8826, exitcode=75).
Assertion failed: (conn != NULL), function proto_close, file /usr/src/sbin/hastd/proto.c, line 287.
Abort (core dumped)

Both machines are running on VMWare ESXi. The second machine is a clone
of the first.

Any thoughts, folks?
Thanks, ==ml -- Michael W. Lucas mwlucas@BlackHelicopters.org http://www.MichaelWLucas.com/, http://blather.MichaelWLucas.com/ New book available: Network Flow Analysis http://www.networkflowanalysis.com/ From owner-freebsd-fs@FreeBSD.ORG Mon Oct 11 17:48:19 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 44BBF1065670 for ; Mon, 11 Oct 2010 17:48:19 +0000 (UTC) (envelope-from sfourman@gmail.com) Received: from mail-gx0-f182.google.com (mail-gx0-f182.google.com [209.85.161.182]) by mx1.freebsd.org (Postfix) with ESMTP id DB9EE8FC16 for ; Mon, 11 Oct 2010 17:48:18 +0000 (UTC) Received: by gxk4 with SMTP id 4so255404gxk.13 for ; Mon, 11 Oct 2010 10:48:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=1AdEJ9CZUV7seSEomWpKppQ5PpJHDb/E+wdlQCokCkk=; b=OLYQWOh2r8FmQnPXDiNCYZwwB0uSU1jWa2DeTckU2b2NvbUj44SqpGu9tJfz8Uspst 2IRRl/CVQZllYYQvhGNGWaCkWUKLH6p7y9Pi6bHi3UJ1HFBTeON5BGBYBYz5NYte7GsU 1cxq3AzaW1iy9bKf0Eo/lzbRWGn8Oysn8Rhic= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=kYYYrhZ8zh4uahEcY7iYTCf9eNM+wyh1PoA7yq0Eu+2evyoLa0okcwOIt/N90F5+Db MZiTjPwHI3h9LsZ7rL8qZZ6lSYlApIXNqvd6dPSSPstg5YBnY6x/iONwMCDcQfZW/E/i SYsOV6oDnZBDOmRb9OYKPU4GpaM7yah5/5okY= MIME-Version: 1.0 Received: by 10.42.191.16 with SMTP id dk16mr1878354icb.506.1286819297824; Mon, 11 Oct 2010 10:48:17 -0700 (PDT) Received: by 10.231.167.65 with HTTP; Mon, 11 Oct 2010 10:48:17 -0700 (PDT) In-Reply-To: <20101011154647.GA11532@icarus.home.lan> References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <4CB17983.3020907@icyb.net.ua> 
<20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> <20101011154647.GA11532@icarus.home.lan> Date: Mon, 11 Oct 2010 12:48:17 -0500 Message-ID: From: "Sam Fourman Jr." To: Jeremy Chadwick Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, Andriy Gapon Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Oct 2010 17:48:19 -0000

On Mon, Oct 11, 2010 at 10:46 AM, Jeremy Chadwick wrote:
> On Mon, Oct 11, 2010 at 10:40:11AM -0500, Sam Fourman Jr. wrote:
>> > Jeremy,
>> >
>> > we are in the process of debugging this issue with Kai and I think that we are
>> > onto something. I would like to ask you for some additional testing.
>> >
>> > What kind of workload do you run?
>> > Do you have anything using sendfile(2)? E.g. Apache with EnableSendfile enabled
>> > (it might be enabled by default, without explicit options).
>> > Can you try to disable sendfile(2) use, reboot and see how the system behaves?
>> >
>> > If you still experience the same problem after doing the above, then I'd like to
>> > ask you for shell access with root privileges; or establishing communication via
>> > IM and running some commands for me.
>> >
>> > Thank you!
>> > --
>> > Andriy Gapon
>> > _______________________________________________
>>
>>
>> Andriy,
>>
>> I am not sure if this is the same issue, but
>> I have a similar problem running FreeBSD 9-CURRENT amd64 (sources as
>> of 1 hour ago)
>> a zfs lockup appears very frequently, it is a NFS server it locks up
>> several times per day.
>>
>> I have been having these issues since before the ZFSv15 patch was
>> committed to HEAD a few months ago.
>> let me know if you need details, I have no problems giving you root
>> ssh on this machine...
but it does not have a serial port.
>> so I am not sure how useful it would be.
>
> Are the processes which livelock (get stuck and cannot be killed, even
> with kill -9) stuck in states "zfs" or "zfsmrb"? You can check with "ps
> -axl" or top.
>
> If so, it's probably the same problem. If not, it may be a different
> problem.
>
> --

ok so in ~3 hours I received yet another zfs lockup
I was in the middle of a scrub, and zpool status -v will not return

FNFS#
FNFS# zpool status -v
  pool: Network
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 2h1m, 14.52% done, 11h52m to go
config:

        NAME        STATE     READ WRITE CKSUM
        Network     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da8     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da9     ONLINE       0     0     0
            da10    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da11    ONLINE       0     0     0
            da12    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /Network/tv/Sam-tv/Lost/Lost.S05E08.HDTV.XviD-XOR.avi
        /Network/jail/mirror/usr/local/man/man3/XChangeDeviceProperty.3.gz
        /Network/jail/mirror/usr/local/man/man3/XForceScreenSaver.3.gz

  pool: zFNFS
 state: ONLINE
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        zFNFS          ONLINE       0     0     0
          raidz2       ONLINE       0     0     0
            gpt/disk0  ONLINE       0     0     0
            gpt/disk1  ONLINE       0     0     0
            gpt/disk2  ONLINE       0     0     0
            gpt/disk3  ONLINE       0     0     0

errors: No known data errors
FNFS#
FNFS#
FNFS# zpool status -v
FNFS# ps axHwwl
UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
0 0 0 0 -16 0 0 2608 sched DLs ?? 1:31.59 [kernel]
0 0 0 0 8 0 0 2608 - DLs ?? 0:00.00 [kernel]
0 0 0 0 8 0 0 2608 - DLs ?? 0:00.00 [kernel]
0 0 0 0 8 0 0 2608 - DLs ?? 0:00.00 [kernel]
0 0 0 0 8 0 0 2608 - DLs ?? 0:00.00 [kernel]
0 0 0 0 8 0 0 2608 - DLs ??
0:00.00 [kern= el] 0 0 0 0 8 0 0 2608 - DLs ?? 0:00.01 [kern= el] 0 0 0 0 -68 0 0 2608 - DLs ?? 0:01.39 [kern= el] 0 0 0 0 -84 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -84 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -84 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -84 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 deadlk DLs ?? 0:00.23 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.03 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.01 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.02 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.07 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.07 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.10 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.10 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.10 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.07 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.06 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.06 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.06 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.06 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.07 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.06 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.06 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 
0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.15 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.04 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 6:04.82 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 6:04.74 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 6:04.78 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.62 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.60 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.62 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.05 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.05 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.05 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.05 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.05 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.14 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.14 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.14 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.14 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.14 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.14 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.14 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.14 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.05 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 
0:00.05 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.05 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.06 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.05 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -16 0 0 2608 - DLs ?? 0:00.02 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.18 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 
0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 0 0 0 -8 0 0 2608 - DLs ?? 0:00.00 [kern= el] 0 1 0 0 44 0 7296 436 wait ILs ?? 0:00.02 /sbin/init -- 0 2 0 0 -8 0 0 16 - DL ?? 0:00.44 [g_ev= ent] 0 3 0 0 -8 0 0 16 - DL ?? 0:50.05 [g_up= ] 0 4 0 0 -8 0 0 16 - DL ?? 
1:25.38 [g_do= wn] 0 5 0 0 -8 0 0 128 arc_re DL ?? 0:01.49 [zfsk= ern] 0 5 0 0 -8 0 0 128 l2arc_ DL ?? 0:00.04 [zfsk= ern] 0 5 0 0 -8 0 0 128 tx->tx DL ?? 0:00.00 [zfsk= ern] 0 5 0 0 -8 0 0 128 tx->tx DL ?? 0:00.07 [zfsk= ern] 0 5 0 0 -8 0 0 128 tx->tx DL ?? 0:00.00 [zfsk= ern] 0 5 0 0 -8 0 0 128 spa->s DL ?? 5:04.17 [zfsk= ern] 0 6 0 0 -16 0 0 16 waitin DL ?? 0:00.00 [sctp_iterator] 0 7 0 0 76 0 0 16 ccb_sc DL ?? 0:05.10 [xpt_= thrd] 0 8 0 0 -16 0 0 16 psleep DL ?? 0:00.01 [pagedaemon] 0 9 0 0 -16 0 0 16 psleep DL ?? 0:00.00 [vmda= emon] 0 10 0 0 -16 0 0 16 audit_ DL ?? 0:00.00 [audi= t] 0 11 0 0 171 0 0 64 - RL ?? 197:54.91 [idle= ] 0 11 0 0 171 0 0 64 - RL ?? 195:59.68 [idle= ] 0 11 0 0 171 0 0 64 - RL ?? 196:58.20 [idle= ] 0 11 0 0 171 0 0 64 - RL ?? 170:17.14 [idle= ] 0 12 0 0 -44 0 0 336 - WL ?? 0:00.00 [intr= ] 0 12 0 0 -32 0 0 336 - WL ?? 0:07.07 [intr= ] 0 12 0 0 -32 0 0 336 - WL ?? 0:00.89 [intr= ] 0 12 0 0 -32 0 0 336 - WL ?? 0:00.42 [intr= ] 0 12 0 0 -32 0 0 336 - WL ?? 0:00.17 [intr= ] 0 12 0 0 -36 0 0 336 - WL ?? 0:00.00 [intr= ] 0 12 0 0 -24 0 0 336 - WL ?? 0:00.00 [intr= ] 0 12 0 0 -24 0 0 336 - WL ?? 0:00.00 [intr= ] 0 12 0 0 -40 0 0 336 - WL ?? 8:54.70 [intr= ] 0 12 0 0 -28 0 0 336 - WL ?? 0:00.04 [intr= ] 0 12 0 0 -52 0 0 336 - WL ?? 0:00.00 [intr= ] 0 12 0 0 -64 0 0 336 - WL ?? 0:00.00 [intr= ] 0 12 0 0 -64 0 0 336 - WL ?? 31:08.03 [intr= ] 0 12 0 0 -64 0 0 336 - WL ?? 0:00.00 [intr= ] 0 12 0 0 -64 0 0 336 - WL ?? 0:00.00 [intr= ] 0 12 0 0 -64 0 0 336 - WL ?? 0:00.00 [intr= ] 0 12 0 0 -64 0 0 336 - WL ?? 0:00.00 [intr= ] 0 12 0 0 -64 0 0 336 - WL ?? 0:00.00 [intr= ] 0 12 0 0 -60 0 0 336 - WL ?? 0:00.00 [intr= ] 0 12 0 0 -48 0 0 336 - WL ?? 0:00.00 [intr= ] 0 12 0 0 -60 0 0 336 - WL ?? 0:00.00 [intr= ] 0 13 0 0 44 0 0 16 - DL ?? 0:06.17 [yarr= ow] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -68 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.07 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 
0:00.00 [usb] 0 14 0 0 -68 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.08 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -68 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.09 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -68 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.08 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -68 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.09 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -68 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.09 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -68 0 0 448 - DL ?? 0:00.00 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.06 [usb] 0 14 0 0 -64 0 0 448 - DL ?? 0:00.00 [usb] 0 15 0 0 44 0 0 16 pgzero DL ?? 0:00.00 [page= zero] 0 16 0 0 -16 0 0 16 psleep DL ?? 0:00.04 [bufdaemon] 0 17 0 0 44 0 0 16 vlruwt DL ?? 0:00.27 [vnlr= u] 0 18 0 0 44 0 0 16 zio->i DL ?? 0:04.80 [sync= er] 0 19 0 0 -16 0 0 16 sdflus DL ?? 0:00.10 [softdepflush] 0 20 0 0 -16 0 0 16 flowcl DL ?? 0:00.04 [flowcleaner] 0 158 1 0 76 0 2892 824 pause Is ?? 0:00.00 adjkerntz -i 0 743 1 0 44 0 7296 744 select Is ?? 0:00.00 /sbin= /devd 0 938 1 0 44 0 11284 1352 select Ss ?? 0:00.03 /usr/sbin/syslogd -s 0 962 1 0 44 0 12212 1624 select Ss ?? 0:00.02 /usr/sbin/rpcbind 0 1043 1 0 76 0 10056 1288 pause Is ?? 0:00.00 nfsuserd: master (nfsuserd) 0 1044 1043 0 44 0 10056 1300 select S ?? 0:00.01 nfsuserd: slave (nfsuserd) 0 1045 1043 0 44 0 10056 1300 select S ?? 0:00.01 nfsuserd: slave (nfsuserd) 0 1046 1043 0 44 0 10056 1300 select S ?? 0:00.01 nfsuserd: slave (nfsuserd) 0 1047 1043 0 44 0 10056 1300 select S ?? 0:00.01 nfsuserd: slave (nfsuserd) 0 1066 1 0 44 0 11280 1924 select Is ?? 
0:00.06 /usr/sbin/mountd -e -r -l /etc/exports /etc/zfs/exports 0 1074 1 0 50 0 10052 1564 select Is ?? 0:00.02 nfsd: master (nfsd) 0 1081 1074 0 44 0 10052 1080 rpcsvc S ?? 0:00.03 nfsd: server (nfsd) 0 1081 1074 0 44 0 10052 1080 rpcsvc S ?? 0:00.03 nfsd: server (nfsd) 0 1081 1074 0 44 0 10052 1080 rpcsvc S ?? 0:00.03 nfsd: server (nfsd) 0 1081 1074 0 44 0 10052 1080 rpcsvc S ?? 0:00.03 nfsd: server (nfsd) 0 1082 1 0 44 0 273396 1452 select Ss ?? 0:00.01 /usr/sbin/rpc.statd 0 1089 1 0 76 0 12340 1424 rpcsvc Ss ?? 0:00.01 /usr/sbin/rpc.lockd 0 1249 1 0 44 0 30548 3060 select Is ?? 0:00.00 /usr/sbin/sshd 0 1257 1 0 44 0 16408 3392 select Ss ?? 0:00.12 sendmail: accepting connections (sendmail) 25 1263 1 0 44 0 16408 3240 pause Is ?? 0:00.00 sendmail: Queue runner@00:30:00 for /var/spool/clientmqueue (sendmail) 0 1270 1 0 44 0 12212 1412 nanslp Is ?? 0:00.02 /usr/sbin/cron -s 0 1469 1 0 44 0 6888 1208 select SsJ ?? 0:00.02 /usr/sbin/syslogd -s 70 1594 1 0 44 0 60780 5612 select SsJ ?? 0:00.14 /usr/local/bin/postgres -D /usr/local/pgsql/data 0 1618 1 0 76 0 26004 3036 select IsJ ?? 0:00.00 /usr/sbin/sshd 0 1625 1 0 44 0 11976 3360 select SsJ ?? 0:00.14 sendmail: accepting connections (sendmail) 25 1631 1 0 44 0 11976 3068 pause IsJ ?? 0:00.00 sendmail: Queue runner@00:30:00 for /var/spool/clientmqueue (sendmail) 0 1638 1 0 44 0 7944 1276 nanslp IsJ ?? 0:00.02 /usr/sbin/cron -s 0 1801 1 0 44 0 6888 1200 select SsJ ?? 0:00.02 /usr/sbin/syslogd -s 1002 1867 1 0 76 0 28376 1852 accept IsJ ?? 0:00.00 /usr/local/bin/svnserve -d --listen-port=3D3690 --listen-host 0.0.0.0 -r /usr/local/svn 70 1916 1 0 44 0 53492 5412 select SsJ ?? 0:00.14 /usr/local/bin/postgres -D /usr/local/pgsql/data 0 1948 1 0 76 0 26004 3056 select IsJ ?? 0:00.00 /usr/sbin/sshd 0 1955 1 0 44 0 11976 3096 select SsJ ?? 0:00.14 sendmail: accepting connections (sendmail) 25 1961 1 0 44 0 11976 2980 pause IsJ ?? 
0:00.00 sendmail: Queue runner@00:30:00 for /var/spool/clientmqueue (sendmail) 0 1967 1 0 44 0 7944 1264 nanslp IsJ ?? 0:00.02 /usr/sbin/cron -s 0 2118 1 0 44 0 6888 1200 select IsJ ?? 0:00.02 /usr/sbin/syslogd -s 1002 2237 1 0 76 0 6704 1980 select IsJ ?? 0:00.00 /usr/local/sbin/cvsupd -e -C 8 -l @daemon -b /usr/local/etc/cvsup -s sup.client 0 2257 1 0 44 0 11976 2856 select SsJ ?? 0:00.14 sendmail: accepting connections (sendmail) 25 2263 1 0 44 0 11976 2796 pause IsJ ?? 0:00.00 sendmail: Queue runner@00:30:00 for /var/spool/clientmqueue (sendmail) 0 2269 1 0 44 0 7944 1256 nanslp IsJ ?? 0:00.02 /usr/sbin/cron -s 0 2375 1 0 76 0 13268 1544 select Is ?? 0:00.00 /usr/sbin/inetd -wW -C 60 70 2419 1594 0 44 0 60780 5680 select SsJ ?? 0:01.35 postgres: writer process (postgres) 70 2420 1594 0 44 0 60780 5664 select SsJ ?? 0:00.73 postgres: wal writer process (postgres) 70 2421 1594 0 44 0 60780 5688 zio->i DsJ ?? 0:00.15 postgres: autovacuum launcher process (postgres) 70 2422 1594 0 44 0 23788 3704 select SsJ ?? 0:00.12 postgres: stats collector process (postgres) 70 2425 1916 0 44 0 53492 5520 select SsJ ?? 0:01.19 postgres: writer process (postgres) 70 2426 1916 0 44 0 53492 5512 select SsJ ?? 0:00.68 postgres: wal writer process (postgres) 70 2427 1916 0 44 0 53492 5492 zio->i DsJ ?? 0:00.16 postgres: autovacuum launcher process (postgres) 70 2428 1916 0 44 0 24900 3624 select SsJ ?? 0:00.12 postgres: stats collector process (postgres) 0 2621 1249 0 44 0 42480 4228 select Is ?? 0:00.03 sshd: root@pts/0 (sshd) 0 3711 1967 0 44 0 7944 1332 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3712 3711 0 44 0 7940 952 zio->i DsJ ?? 0:00.00 /usr/libexec/atrun 0 3713 2269 0 44 0 7944 1328 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3714 3713 0 44 0 7940 936 zio->i DsJ ?? 0:00.00 /usr/libexec/atrun 0 3717 1638 0 44 0 7944 1336 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3718 3717 0 44 0 7940 952 zio->i DsJ ?? 
0:00.00 /usr/libexec/atrun 0 3719 1967 0 44 0 7944 1332 ppwait DJ ?? 0:00.00 cron: running job (cron) 2 3720 3719 0 44 0 7944 1332 zio->i DVsJ ?? 0:00.00 cron: running job (cron) 0 3721 2269 0 44 0 7944 1328 ppwait DJ ?? 0:00.00 cron: running job (cron) 2 3722 3721 0 44 0 7944 1328 zio->i DVsJ ?? 0:00.00 cron: running job (cron) 0 3735 1638 0 44 0 7944 1336 ppwait DJ ?? 0:00.00 cron: running job (cron) 2 3736 3735 0 44 0 7944 1336 zio->i DVsJ ?? 0:00.00 cron: running job (cron) 0 3737 1967 0 44 0 7944 1332 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3738 3737 0 44 0 4884 840 zfs DsJ ?? 0:00.00 /usr/libexec/atrun 0 3739 2269 0 44 0 7944 1328 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3740 3739 0 44 0 4884 824 zfs DsJ ?? 0:00.00 /usr/libexec/atrun 0 3743 1638 0 44 0 7944 1336 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3744 3743 0 44 0 4884 840 zfs DsJ ?? 0:00.00 /usr/libexec/atrun 0 3745 1967 0 44 0 7944 1332 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3746 3745 0 44 0 4884 840 zfs DsJ ?? 0:00.00 /usr/libexec/atrun 0 3747 2269 0 44 0 7944 1328 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3748 3747 0 44 0 4884 824 zfs DsJ ?? 0:00.00 /usr/libexec/atrun 0 3751 1638 0 44 0 7944 1336 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3752 3751 0 44 0 4884 840 zfs DsJ ?? 0:00.00 /usr/libexec/atrun 0 3753 1967 0 44 0 7944 1332 ppwait DJ ?? 0:00.00 cron: running job (cron) 2 3754 3753 0 44 0 7944 1332 db->db DVsJ ?? 0:00.00 cron: running job (cron) 0 3755 2269 0 44 0 7944 1328 ppwait DJ ?? 0:00.00 cron: running job (cron) 2 3756 3755 0 44 0 7944 1328 db->db DVsJ ?? 0:00.00 cron: running job (cron) 0 3769 1638 0 44 0 7944 1336 ppwait DJ ?? 0:00.00 cron: running job (cron) 2 3770 3769 0 44 0 7944 1336 db->db DVsJ ?? 0:00.00 cron: running job (cron) 0 3771 1967 0 44 0 7944 1332 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3772 3771 0 44 0 4884 840 zfs DsJ ?? 0:00.00 /usr/libexec/atrun 0 3773 2269 0 44 0 7944 1328 piperd IJ ?? 
0:00.00 cron: running job (cron) 1 3774 3773 0 44 0 4884 824 zfs DsJ ?? 0:00.00 /usr/libexec/atrun 0 3777 1638 0 44 0 7944 1336 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3778 3777 0 44 0 4884 840 zfs DsJ ?? 0:00.00 /usr/libexec/atrun 0 3779 1967 0 44 0 7944 1332 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3780 3779 0 44 0 4884 840 zfs DsJ ?? 0:00.00 /usr/libexec/atrun 0 3781 2269 0 44 0 7944 1328 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3782 3781 0 44 0 4884 824 zfs DsJ ?? 0:00.00 /usr/libexec/atrun 0 3785 1638 0 44 0 7944 1336 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3786 3785 0 44 0 4884 840 zfs DsJ ?? 0:00.00 /usr/libexec/atrun 0 3788 1249 0 44 0 42480 4344 select Ss ?? 0:00.02 sshd: root@pts/1 (sshd) 0 3796 1967 0 44 0 7944 1332 ppwait DJ ?? 0:00.00 cron: running job (cron) 2 3797 3796 0 44 0 7944 1332 db->db DVsJ ?? 0:00.00 cron: running job (cron) 0 3798 2269 0 44 0 7944 1328 ppwait DJ ?? 0:00.00 cron: running job (cron) 2 3799 3798 0 44 0 7944 1328 db->db DVsJ ?? 0:00.00 cron: running job (cron) 0 3812 1638 0 44 0 7944 1336 ppwait DJ ?? 0:00.00 cron: running job (cron) 2 3813 3812 0 44 0 7944 1336 db->db DVsJ ?? 0:00.00 cron: running job (cron) 0 3818 1967 0 44 0 7944 1332 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3819 3818 0 44 0 4884 840 zfs DsJ ?? 0:00.00 /usr/libexec/atrun 0 3820 2269 0 44 0 7944 1328 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3821 3820 0 44 0 4884 824 zfs DsJ ?? 0:00.00 /usr/libexec/atrun 0 3824 1638 0 44 0 7944 1336 piperd IJ ?? 0:00.00 cron: running job (cron) 1 3825 3824 0 44 0 4884 840 zfs DsJ ?? 
0:00.00 /usr/libexec/atrun
0 2409 1 0 76 0 11152 1160 ttyin Is+ v0 0:00.00 /usr/libexec/getty Pc ttyv0
0 2410 1 0 76 0 11152 1160 ttyin Is+ v1 0:00.00 /usr/libexec/getty Pc ttyv1
0 2411 1 0 76 0 11152 1160 ttyin Is+ v2 0:00.00 /usr/libexec/getty Pc ttyv2
0 2412 1 0 76 0 11152 1160 ttyin Is+ v3 0:00.00 /usr/libexec/getty Pc ttyv3
0 2413 1 0 76 0 11152 1160 ttyin Is+ v4 0:00.00 /usr/libexec/getty Pc ttyv4
0 2414 1 0 76 0 11152 1160 ttyin Is+ v5 0:00.00 /usr/libexec/getty Pc ttyv5
0 2415 1 0 76 0 11152 1160 ttyin Is+ v6 0:00.00 /usr/libexec/getty Pc ttyv6
0 2416 1 0 76 0 11152 1160 ttyin Is+ v7 0:00.00 /usr/libexec/getty Pc ttyv7
0 2624 2621 0 44 0 14556 2700 pause Is 0 0:00.03 -csh (csh)
0 3787 2624 0 44 0 19948 1600 zio->i D+ 0 0:00.00 zpool status -v
0 3792 3788 0 44 0 14556 2724 pause Ss 1 0:00.02 -csh (csh)
0 3827 3792 0 44 0 12272 1604 - R+ 1 0:00.00 ps axHwwl
FNFS#

--
Sam Fourman Jr.
Fourman Networks
http://www.fourmannetworks.com

From owner-freebsd-fs@FreeBSD.ORG Mon Oct 11 18:09:06 2010
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7A218106566B for ; Mon, 11 Oct 2010 18:09:06 +0000 (UTC) (envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id C10C78FC1B for ; Mon, 11 Oct 2010 18:09:05 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id VAA00356; Mon, 11 Oct 2010 21:09:01 +0300 (EEST) (envelope-from avg@icyb.net.ua)
Message-ID: <4CB352BC.9020808@icyb.net.ua>
Date: Mon, 11 Oct 2010 21:09:00 +0300
From: Andriy Gapon
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: "Sam Fourman Jr."
References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> <20101011154647.GA11532@icarus.home.lan> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Oct 2010 18:09:07 -0000 on 11/10/2010 20:48 Sam Fourman Jr. said the following: > ok so in ~3 hours I recived yet another zfs lockup > > I was in the middle of a scrub and zpool status -v, will not return It looks like something totally different. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Mon Oct 11 18:13:44 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F1B931065863; Mon, 11 Oct 2010 18:13:43 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 815FB8FC12; Mon, 11 Oct 2010 18:13:43 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 27CEC46B0C; Mon, 11 Oct 2010 14:13:43 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 34DC58A009; Mon, 11 Oct 2010 14:13:42 -0400 (EDT) From: John Baldwin To: freebsd-fs@freebsd.org Date: Mon, 11 Oct 2010 11:50:00 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; ) References: <20101004123725.65d09b9e.daichi@ongs.co.jp> <20101005153926.88b4c1e1.daichi@freebsd.org> 
In-Reply-To: <20101005153926.88b4c1e1.daichi@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201010111150.00785.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Mon, 11 Oct 2010 14:13:42 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.96.3 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-1.9 required=4.2 tests=BAYES_00 autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on bigwig.baldwin.cx Cc: Daichi GOTO , freebsd-current@freebsd.org, Garrett Cooper Subject: Re: fcntl always fails to delete lock file, and PID is always -6464 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Oct 2010 18:13:44 -0000 On Tuesday, October 05, 2010 2:39:26 am Daichi GOTO wrote: > Next step discussion engaged from this research I guess. > > Should we do change FreeBSD's fcntl(2) to return correct l_pid > when called with F_SETLK? Or keep current behavior?? > I want to hear other developers ideas and suggetions. POSIX doesn't say that F_SETLK returns a valid l_pid, so I think FreeBSD's current behavior is fine. 
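To make the POSIX point above concrete, here is a minimal sketch in C. It assumes the usual reading of the fcntl(2) interface: l_pid is filled in by F_GETLK when a conflicting lock exists, while F_SETLK only sets or clears a lock, so l_pid should not be inspected after it. The helper names (try_write_lock, write_lock_holder, demo_lock_query) and the /tmp scratch path are made up for this illustration and do not come from the thread.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to take a whole-file write lock without blocking.
 * Returns 0 on success, -1 on failure (errno is set by fcntl).
 * l_pid is deliberately not read here: F_SETLK does not promise it. */
int try_write_lock(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/* Ask which process would block a whole-file write lock.
 * Returns the holder's PID, 0 if nothing conflicts, -1 on error. */
pid_t write_lock_holder(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    /* F_GETLK rewrites l_type to F_UNLCK when no lock conflicts. */
    return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
}

/* Small self-check on a scratch file (hypothetical path).
 * Returns 0 if the lock was taken and no conflict is reported. */
int demo_lock_query(void)
{
    char path[] = "/tmp/lockdemo.XXXXXX";
    int fd = mkstemp(path);
    int rc = -1;

    if (fd == -1)
        return -1;
    unlink(path);
    /* A process never conflicts with its own record locks. */
    if (try_write_lock(fd) == 0 && write_lock_holder(fd) == 0)
        rc = 0;
    close(fd);
    return rc;
}
```

A program that wants to report who holds a lock file would call write_lock_holder() from a second process; calling it from the process that already holds the lock reports no conflict, as demo_lock_query() shows.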
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Mon Oct 11 18:37:10 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 544561065679 for ; Mon, 11 Oct 2010 18:37:10 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta02.westchester.pa.mail.comcast.net (qmta02.westchester.pa.mail.comcast.net [76.96.62.24]) by mx1.freebsd.org (Postfix) with ESMTP id F2BFD8FC1E for ; Mon, 11 Oct 2010 18:37:09 +0000 (UTC) Received: from omta13.westchester.pa.mail.comcast.net ([76.96.62.52]) by qmta02.westchester.pa.mail.comcast.net with comcast id HQKA1f00117dt5G52WdAXW; Mon, 11 Oct 2010 18:37:10 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta13.westchester.pa.mail.comcast.net with comcast id HWd81f00Q3LrwQ23ZWd9RC; Mon, 11 Oct 2010 18:37:10 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 6D5339B418; Mon, 11 Oct 2010 11:37:07 -0700 (PDT) Date: Mon, 11 Oct 2010 11:37:07 -0700 From: Jeremy Chadwick To: Andriy Gapon Message-ID: <20101011183707.GA13925@icarus.home.lan> References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4CB32C75.2060000@icyb.net.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Oct 2010 18:37:10 -0000 On Mon, Oct 11, 2010 at 06:25:41PM +0300, Andriy Gapon wrote: > on 11/10/2010 18:15 Jeremy Chadwick said the following: > > On Sun, Oct 10, 2010 at 11:29:55AM +0300, Andriy Gapon 
wrote:
> >> on 09/10/2010 16:37 Kai Gallasch said the following:
> >>> I must repeat. I offer my help if someone wants to dig into the locking problem.
> >>
> >> I would like to look into this.
> >> Can you provide shell access to a system that exhibits the behavior?
> >> Or even better serial console for remote debugging, if possible.
> >>
> >> Also, can you first try with the very latest stable/8 or even head?
> >> I have recently MFC-ed a few improvements to ZFS code.
> >
> > Andriy,
> >
> > If you need or want a secondary test box (with serial console access and
> > kernel debugger, just not remote gdb/kgdb), I have one available which I
> > can build (would take me only part of the morning) mimicking production.
> > Let me know if you want/need a secondary test bed.
>
> Jeremy,
>
> we are in the process of debugging this issue with Kai and I think that we are
> onto something. I would like to ask you for some additional testing.

No problem. I'm in the process of setting up my testbed box now, and will be putting RELENG_8 on it + all the ports/configuration details that we use in production.

Now to talk about the system which was seeing the zfs/zfsmrb problem...

That system is production and has to remain usable/up. That machine has had ZFS removed from it entirely and now uses gmirror. Sorry, just one of those things where service uptime has priority.

> What kind of workload do you run?
> Do you have anything using sendfile(2)? E.g. Apache with EnableSendfile enabled
> (it might be by default, without explicit options).
The aforementioned system is a multi-role box:

- Primary front-end webserver
  -- Apache 2.2.16 with ITK MPM (prefork)
  -- Apache is built with threading disabled
  -- PHP 5.3.3 is used
- Primary DNS server (master, not slave)
  -- Using base system BIND, nothing crazy in the configuration
- Primary mail server
  -- Using postfix
- Primary shell and FTP server
  -- Using base system OpenSSH and base system ftpd

Hardware-wise, the system is:

- SYS: Supermicro SuperServer 5015M-T+B
  http://www.supermicro.com/products/system/1U/5015/SYS-5015M-T_.cfm
- CPU: Intel Core 2 Duo E6420 (2.13GHz, 4MB cache, 1066FSB)
- RAM: 8GB ECC (two 2GB pairs)
- Disk ada0: 320GB, SATA300: OS disk and partial ZFS disk (mirror)
- Disk ada1: 320GB, SATA300: ZFS disk (mirror)

Filesystem layout:

ada0s1a =   1GB = UFS2    = /
ada0s1b =  16GB = swap
ada0s1d =  16GB = UFS2+SU = /var
ada0s1e =   4GB = UFS2+SU = /tmp
ada0s1f =   8GB = UFS2+SU = /usr
ada0s1g = 275GB = ZFS pool (mirror)
ada1    = 320GB = ZFS pool (mirror)

The ZFS mirror was therefore ~275GB in size (since ada0s1g was the smaller of the two devices in the pool). Only two ZFS filesystems were created: /home and /var/mail. All filesystem settings were default (no compression, etc.).

The original ZFS pool and filesystems were created on ZFS v14 and upgraded using "zpool upgrade" and "zfs upgrade" after a fresh RELENG_8 OS was installed. The OS was completely reinstalled (not in-place upgraded), having previously run RELENG_7 (uptime: 221 days).

I'd say 80% of the I/O happening on the box was induced by either httpd or postfix/SpamAssassin.

Regarding Apache: We do use sendfile. Here are the two settings in our httpd.conf which we use that may be relevant to ZFS:

  EnableMMAP on
  EnableSendfile on

> Can you try to disable sendfile(2) use, reboot and see how system behaves?
> If you still experience the same problem after doing the above, then I'd like to
> ask you for shell access with root privileges; or establishing communication via
> IM and running some commands for me.

Well, the system is using gmirror + UFS2 (without SU) now, so I can't disable sendfile to see how it behaves. I'll let you do that on the testbed box once I get it up + configured. You'll have root-level access (both via serial console and SSH) to it, and you can reboot it as you please. If the testbed box explodes or bursts into flame, no biggie -- it's there for you to bang on. :-)

The testbed box does have significantly different hardware and much less RAM (only 1GB), so it's not identical in capability. However, given the nature of the problem, I would think reproducing it would be easy, especially since Kai's box is significantly different from mine and we both saw the same problem.

I'll let you know here in the thread when things are available, and then we can communicate privately (get your SSH public key, etc.).

--
| Jeremy Chadwick jdc@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977.
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Mon Oct 11 20:39:51 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 35A8B1065674 for ; Mon, 11 Oct 2010 20:39:51 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id B7E5D8FC0C for ; Mon, 11 Oct 2010 20:39:50 +0000 (UTC) Received: by fxm12 with SMTP id 12so1073797fxm.13 for ; Mon, 11 Oct 2010 13:39:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:from:to:cc:subject:references :x-comment-to:date:in-reply-to:message-id:user-agent:mime-version :content-type; bh=PBWwzfUI+/yH4DgjpjBLtGlStyTT+bjElnH4PFEdHbc=; b=U3RMHGr26d3HS/vFwIQJNsUOhm9nAfsOM89Mmv36tL70i359KedVE1dDSxZtLX/9oc Ds+uecMS/wSFTdZbPjz1dd7QRCP0wbJY+av4pSpIqea7dsrFXeDs08Zv+FFQ2o3Ogqcp taEtEWpLtga9xh+lAAG2s/8jugN/7eWU6kiz4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:to:cc:subject:references:x-comment-to:date:in-reply-to :message-id:user-agent:mime-version:content-type; b=cMCf0qMsNeVgF0Bzr4JeOrPfBnI1TzmLgPpoId2R8L/1PTd9vaXdeL0AKR4ZGucXoQ CPtQNvs+gnA/9vk2Bo2zuT87PNq1R8bPbkRuZwcYBLvaCkVXz3pK+40hF5N/vVWLg8gd UWYDjXwMnvNvucmP67qdfH1w3dZnOzGJWc1rY= Received: by 10.223.121.132 with SMTP id h4mr2143911far.2.1286827864557; Mon, 11 Oct 2010 13:11:04 -0700 (PDT) Received: from localhost ([95.69.174.185]) by mx.google.com with ESMTPS id k4sm2357651faa.32.2010.10.11.13.11.02 (version=TLSv1/SSLv3 cipher=RC4-MD5); Mon, 11 Oct 2010 13:11:03 -0700 (PDT) From: Mikolaj Golub To: "Michael W. Lucas" References: <20101011153051.GA15699@bewilderbeast.blackhelicopters.org> X-Comment-To: Michael W. Lucas Date: Mon, 11 Oct 2010 23:11:00 +0300 In-Reply-To: <20101011153051.GA15699@bewilderbeast.blackhelicopters.org> (Michael W. 
Lucas's message of "Mon, 11 Oct 2010 11:30:51 -0400")
Message-ID: <868w24cwx7.fsf@kopusha.home.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: fs@freebsd.org
Subject: Re: hast crash
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 11 Oct 2010 20:39:51 -0000

On Mon, 11 Oct 2010 11:30:51 -0400 Michael W. Lucas wrote:

MWL> Hi,
MWL> I upgraded my HAST cluster to 8.1-stable on 6 October 2010, and am now
MWL> experiencing crashes in hastd. hastd debug output is showing:
MWL> ...
MWL> [DEBUG][2] [mirror] (secondary) recv: (0x8013ecc40) Got request header: WRITE(11752701952, 131072).
MWL> [DEBUG][2] [mirror] (secondary) recv: (0x8013ecc40) Moving request to the disk queue.
MWL> [DEBUG][2] [mirror] (secondary) disk: (0x8013ecc40) Got request: WRITE(11752701952, 131072).
MWL> [DEBUG][2] [mirror] (secondary) recv: Taking free request.
MWL> [DEBUG][2] [mirror] (secondary) recv: (0x8013ecbf0) Got request.
MWL> [ERROR] [mirror] (secondary) Unable to receive request header: RPC version wrong.
MWL> [DEBUG][1] Unable to receive event header: Socket is not connected.
MWL> [DEBUG][1] Accepting connection to tcp4://0.0.0.0:8457.
MWL> [INFO] Connection from tcp4://192.168.0.1:21493 to tcp4://192.168.0.2:8457.
MWL> [DEBUG][2] tcp4://192.168.0.1:21493: resource=mirror
MWL> [DEBUG][1] [mirror] (secondary) Initial connection from tcp4://192.168.0.1:21493.
MWL> [DEBUG][1] [mirror] (secondary) Worker process exists (pid=8826), stopping it.
MWL> [ERROR] [mirror] (secondary) Worker process exited ungracefully (pid=8826, exitcode=75).
MWL> Assertion failed: (conn != NULL), function proto_close, file /usr/src/sbin/hastd/proto.c, line 287.

This assertion has been fixed in r213579.

MWL> Abort (core dumped)
MWL> Both machines are running on VMWare ESXi.
The second machine is a MWL> clone of the first. MWL> Any thoughts, folks? I would recommend upgrading sbin/hastd to current or wait a couple of days for MFC :-). -- Mikolaj Golub From owner-freebsd-fs@FreeBSD.ORG Mon Oct 11 20:43:41 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C95B81065675 for ; Mon, 11 Oct 2010 20:43:41 +0000 (UTC) (envelope-from tobfr108@gmail.com) Received: from mail-ew0-f54.google.com (mail-ew0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 5A1098FC08 for ; Mon, 11 Oct 2010 20:43:40 +0000 (UTC) Received: by ewy27 with SMTP id 27so2010241ewy.13 for ; Mon, 11 Oct 2010 13:43:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:from:content-type :content-transfer-encoding:subject:date:message-id:to:mime-version :x-mailer; bh=Z7h0b9e75hk7exDbc4902sS7ZfvWH7iQWUmzdGo5nUM=; b=elIt6VSWY/CN2hbTk+Tgqgfci4pwkyCVqcdhVkRQ8DQGlpMfK7IiNlIUJ4QEEhqsYr yhhgo1V9RJvNj8bpnkniY3LCE+MUBdWrv7lyk3s+YgQ4QlMWRmMkEIs9MvFwnlarPifo AxSV2fvH/gVHlS9r/YzYtTk50SwOi+dPHU6UA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:content-type:content-transfer-encoding:subject:date:message-id :to:mime-version:x-mailer; b=eGgsqhmWabvJQ5Wx3Q3MQqNbdwFFYA+acxwER/2viYY60fMiv8OvfUizInk5CZzoCV yFDdnHS3SO0e8bRMOHBJa//1ylhnQh84FY4Rpj1e9j6c04S+XH11Rj4EnhUYXVOYHZWr nuEz1+yoC2VApri1cOXWoR5DZmHz8YDF2jL7I= Received: by 10.213.80.140 with SMTP id t12mr2013530ebk.27.1286828326105; Mon, 11 Oct 2010 13:18:46 -0700 (PDT) Received: from [10.0.1.191] (h213n1-gl-a-a31.ias.bredband.telia.com [90.231.139.213]) by mx.google.com with ESMTPS id x54sm1452188eeh.17.2010.10.11.13.18.41 (version=TLSv1/SSLv3 cipher=RC4-MD5); Mon, 11 Oct 2010 13:18:42 -0700 (PDT) From: Tobias Fredriksson Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 
quoted-printable
Date: Mon, 11 Oct 2010 22:18:39 +0200
Message-Id: <6AE65535-DCCE-46A5-BBB9-358FEB34C18C@gmail.com>
To: freebsd-fs@freebsd.org
Mime-Version: 1.0 (Apple Message framework v1081)
X-Mailer: Apple Mail (2.1081)
Subject: Growing large UFS over 16TB?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 11 Oct 2010 20:43:41 -0000

Hi,

So I have this UFS filesystem; unfortunately it was created with a plain newfs /dev/da1, with no regard to setting up GPT or anything like that.

It started out as a 6x2TB system raid, then we expanded with 2x2TB. Growing this filesystem was not possible with the current tool; however, with the patch from http://masq.tychl.net/growfs.patch it worked fine.

Now that I'm trying to grow it from 12TB to closer to 20TB, it fails after about 15 minutes. It displays all the super-block backups all the way to "39061769312", then it sits there reading from the volume for about 5-10 minutes. The following error message is then output:

growfs: rdfs: attempting to read negative block number: Inappropriate ioctl for device

I understand the reason for this: it's trying to read a block and the integer just wrapped around. Nice.

The relevant lines from growfs.c are

static void
rdfs(ufs2_daddr_t bno, size_t size, void *bf, int fsi)
{
[...]
	if (bno < 0) {
		err(32, "rdfs: attempting to read negative block number");
	}
[...]

Just for fun I commented the if part out and recompiled.

growfs: rdfs: read error: -4889807711788704476: Input/output error

The only place that ufs2_daddr_t is defined is in /usr/include/ufs/ufs/dinode.h

typedef int64_t ufs2_daddr_t;

So again for fun I changed this to u_int64_t. I also uncommented that if part in growfs.c.

This produced the same message as the last one, but not the negative block number error.
growfs: rdfs: read error: -4889807711788704476: Input/output error

This leads me to believe that I'm at least doing something partially right. The next part that growfs.c is doing in rdfs is

	n = read(fsi, bf, size);
	if (n != (ssize_t)size) {
		err(34, "rdfs: read error: %jd", (intmax_t)bno);
	}

So I changed "rdfs(ufs2_daddr_t bno, size_t size, void *bf, int fsi)" to "rdfs(ufs2_daddr_t bno, size_t size, void *bf, u_int64_t fsi)".

However this changed nothing. Same output.

Since it's failing at the if statement, then of course the problem is in ssize_t not being big enough. As such I of course changed the values for this in /usr/include/machine/_types.h and I also checked out _limit.h in the same dir.

_types.h

typedef __int64_t __ssize_t;
to
typedef __uint64_t __ssize_t;

_limit.h

#define __SSIZE_MAX __LONG_MAX /* max value for a ssize_t */
#define __SIZE_T_MAX __ULONG_MAX /* max value for a size_t */
to
#define __SSIZE_MAX __ULLONG_MAX /* max value for a ssize_t */
#define __SIZE_T_MAX __ULLONG_MAX /* max value for a size_t */

However this failed to make any sort of change. I reverted all of the later changes since nothing helped.

So I'm turning to the fs gurus. At the moment I have no way of moving the data off and creating this properly. As such, changing to another fs is not an option right now either.

So if anybody has any suggestions on what to do to temporarily fix the issue until we can move the data off the raid and rebuild it properly, please let me know.

[root@stor1 /usr/src/sbin/growfs]# ./growfs /dev/da1
We strongly recommend you to make a backup before growing the Filesystem
Did you backup your data (Yes/No) ? Yes
new file systemsize is: 9765570560 frags
Warning: 136832 sector(s) cannot be allocated.
growfs: 19073314.0MB (39062145408 sectors) block size 16384, fragment size 2048
	using 103818 cylinder groups of 183.72MB, 11758 blks, 23552 inodes.
[...]
growfs: rdfs: read error: -4889807711788704476: Input/output error

Cheers

From owner-freebsd-fs@FreeBSD.ORG Mon Oct 11 21:52:52 2010
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B00E4106566B for ; Mon, 11 Oct 2010 21:52:52 +0000 (UTC) (envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id F2FB08FC28 for ; Mon, 11 Oct 2010 21:52:51 +0000 (UTC)
Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id AAA03064; Tue, 12 Oct 2010 00:52:17 +0300 (EEST) (envelope-from avg@icyb.net.ua)
Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1P5QI4-000LIR-Q3; Tue, 12 Oct 2010 00:52:16 +0300
Message-ID: <4CB3870F.7070107@icyb.net.ua>
Date: Tue, 12 Oct 2010 00:52:15 +0300
From: Andriy Gapon
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4
MIME-Version: 1.0
To: Jeremy Chadwick , Kai Gallasch , Kostik Belousov
References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> <20101011183707.GA13925@icarus.home.lan>
In-Reply-To: <20101011183707.GA13925@icarus.home.lan>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: Locked up processes after upgrade to ZFS v15
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 11 Oct 2010 21:52:52 -0000

on 11/10/2010 21:37 Jeremy Chadwick said the
following: > EnableMMAP on > EnableSendfile on Yes, it is it. Jeremy, Kai, could you please try to test this patch? http://people.freebsd.org/~avg/zfs-mappedread-sendfile.diff Kostik, could you please review it? Thanks! -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 03:21:58 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E2533106564A; Tue, 12 Oct 2010 03:21:58 +0000 (UTC) (envelope-from daichi@ongs.co.jp) Received: from natial.ongs.co.jp (natial.ongs.co.jp [202.216.246.90]) by mx1.freebsd.org (Postfix) with ESMTP id A04478FC12; Tue, 12 Oct 2010 03:21:58 +0000 (UTC) Received: from [192.168.2.3] (blexe.ongs.co.jp [202.216.246.93]) by natial.ongs.co.jp (Postfix) with ESMTPSA id DA2A112543B; Tue, 12 Oct 2010 12:21:56 +0900 (JST) References: <20101004123725.65d09b9e.daichi@ongs.co.jp> <20101005153926.88b4c1e1.daichi@freebsd.org> <201010111150.00785.jhb@freebsd.org> Message-Id: <91E542A6-331C-4164-84FE-187916338B60@ongs.co.jp> From: Daichi GOTO To: John Baldwin In-Reply-To: <201010111150.00785.jhb@freebsd.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Mailer: iPad Mail (7B500) Mime-Version: 1.0 (iPad Mail 7B500) Date: Tue, 12 Oct 2010 12:22:18 +0200 Cc: "freebsd-fs@freebsd.org" , Daichi GOTO , "freebsd-current@freebsd.org" , Garrett Cooper Subject: Re: fcntl always fails to delete lock file, and PID is always -6464 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 03:21:59 -0000 Sent from my iPad On Oct 11, 2010, at 5:50 PM, John Baldwin wrote: > On Tuesday, October 05, 2010 2:39:26 am Daichi GOTO wrote: >> Next step discussion engaged from this research I guess. 
>>
>> Should we change FreeBSD's fcntl(2) to return the correct l_pid
>> when called with F_SETLK? Or keep the current behavior?
>> I want to hear other developers' ideas and suggestions.
>
> POSIX doesn't say that F_SETLK returns a valid l_pid, so I think FreeBSD's
> current behavior is fine.

Yes, I agree. POSIX says so.

> --
> John Baldwin
From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 10:07:11 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 522D0106564A for ; Tue, 12 Oct 2010 10:07:11 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta01.emeryville.ca.mail.comcast.net (qmta01.emeryville.ca.mail.comcast.net [76.96.30.16]) by mx1.freebsd.org (Postfix) with ESMTP id 32C7C8FC14 for ; Tue, 12 Oct 2010 10:07:10 +0000 (UTC) Received: from omta17.emeryville.ca.mail.comcast.net ([76.96.30.73]) by qmta01.emeryville.ca.mail.comcast.net with comcast id Hm4f1f0011afHeLA1m7AEA; Tue, 12 Oct 2010 10:07:10 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta17.emeryville.ca.mail.comcast.net with comcast id Hm791f0053LrwQ28dm79JR; Tue, 12 Oct 2010 10:07:10 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 1370C9B418; Tue, 12 Oct 2010 03:07:09 -0700 (PDT) Date: Tue, 12 Oct 2010 03:07:09 -0700 From: Jeremy Chadwick To: Andriy Gapon Message-ID: <20101012100709.GA29861@icarus.home.lan> References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> <20101011183707.GA13925@icarus.home.lan> <4CB3870F.7070107@icyb.net.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4CB3870F.7070107@icyb.net.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere:
freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 10:07:11 -0000 On Tue, Oct 12, 2010 at 12:52:15AM +0300, Andriy Gapon wrote: > on 11/10/2010 21:37 Jeremy Chadwick said the following: > > EnableMMAP on > > EnableSendfile on > > Yes, it is it. > > Jeremy, Kai, > could you please try to test this patch? > http://people.freebsd.org/~avg/zfs-mappedread-sendfile.diff > > Kostik, > could you please review it? Andriy, I've been trying to reproduce this problem on my testbed box without much luck so far. The box differs severely -- the biggest differences being the testbed runs i386 (due to CPU), only has 1GB RAM, and is single-core. I don't have an amd64 testbed system on hand right now. I've been trying to reproduce it by enabling Sendfile and MMAP in Apache on the system, putting up some very large files on an Apache-accessible ZFS filesystem, and using something like "wget -r" to download everything. I've been watching "netstat -m" to monitor the number of sendfile requests. There have been a couple cases where I've seen processes go into "zfs" state, but I have yet to see any lock up. Is there something amd64-specific to the problem at hand, or maybe some VM feature which isn't getting triggered on i386? Or do you know of a reliable way to reproduce the issue at this point? -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 10:10:50 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8EC08106564A for ; Tue, 12 Oct 2010 10:10:50 +0000 (UTC) (envelope-from torbjoern@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 1B7638FC13 for ; Tue, 12 Oct 2010 10:10:49 +0000 (UTC) Received: by bwz16 with SMTP id 16so1447499bwz.13 for ; Tue, 12 Oct 2010 03:10:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:content-type :content-transfer-encoding; bh=6JKgssC9PWDudBP2/GcQoN4u8oRuTdkmGhRtmdA4Mlk=; b=uQ1aXrHd8AUlFhd4y4bMzAQo//4Kb5JPTCcLn/69/JQennGQwMH6WBZGUBGE62WyIf tqEZYhbtYu3KmIGkRr1t36+JIwTnxhpxu9mBZdZirqxWooEg/WSW+mdIA3de0m3tpMlU 26USPb3gDDjsRqX2qttLLwukdgGbpzYmzoIEU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; b=m93XUH0dnC3BJHSidqv+1nYrrGMyFCnp6XP2L9CLIaPtNZnsatsK5vTVIwX2VNdL5K fBcNtU4pz2606CaHbRqEm+jL5xShMQF4FhwM5vcomZ9+vGDq0VPDkj1RewEXMbXm9ZGH LC3gTXZFxiZR20168OVQJvF6ZBCz553UYvEdY= MIME-Version: 1.0 Received: by 10.204.77.137 with SMTP id g9mr5902981bkk.189.1286878247837; Tue, 12 Oct 2010 03:10:47 -0700 (PDT) Received: by 10.204.71.138 with HTTP; Tue, 12 Oct 2010 03:10:47 -0700 (PDT) In-Reply-To: <6AE65535-DCCE-46A5-BBB9-358FEB34C18C@gmail.com> References: <6AE65535-DCCE-46A5-BBB9-358FEB34C18C@gmail.com> Date: Tue, 12 Oct 2010 12:10:47 +0200 Message-ID: From: Torbjorn Kristoffersen To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Subject: Re: Growing large UFS over 16TB? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 10:10:50 -0000
On Mon, Oct 11, 2010 at 10:18 PM, Tobias Fredriksson wrote:
> Hi,
>
> So I have this UFS filesystem; unfortunately it was created with a plain newfs /dev/da1, with no regard to installing GPT or anything like that.
>
> It started out as a 6x2TB system RAID, then we expanded with 2x2TB.
>
> Growing this filesystem was not possible with the current tool; however, with the patch from http://masq.tychl.net/growfs.patch it worked fine.
>
> Now that I'm trying to grow it from 12TB to closer to 20TB, it fails after about 15 minutes.
> It displays all the super-block backups all the way to "39061769312", then it sits there reading from the volume for about 5-10 minutes.
>
> The following error message is then output:
> growfs: rdfs: attempting to read negative block number: Inappropriate ioctl for device
>
> I understand the reason for this: it's trying to read a block and the integer just wrapped around. Nice.
> The relevant lines from growfs.c are
>
> static void
> rdfs(ufs2_daddr_t bno, size_t size, void *bf, int fsi)
> {
>         [...]
>         if (bno < 0) {
>                 err(32, "rdfs: attempting to read negative block number");
>         }
>         [...]
>
> Just for fun I commented the if part out and recompiled.
> growfs: rdfs: read error: -4889807711788704476: Input/output error
>
> The only place that ufs2_daddr_t is defined is in /usr/include/ufs/ufs/dinode.h
> typedef int64_t ufs2_daddr_t;
>
> So again for fun I changed this to u_int64_t. I also removed the comments on that if part in growfs.c
>
> This caused the same message as last time to be repeated, but not the negative block number one.
> growfs: rdfs: read error: -4889807711788704476: Input/output error
>
> This leads me to believe that I'm at least doing something partially right.
>
> The next part that growfs.c is doing in rdfs is
>         n = read(fsi, bf, size);
>         if (n != (ssize_t)size) {
>                 err(34, "rdfs: read error: %jd", (intmax_t)bno);
>         }
>
> So I changed
> "rdfs(ufs2_daddr_t bno, size_t size, void *bf, int fsi)"
> to
> "rdfs(ufs2_daddr_t bno, size_t size, void *bf, u_int64_t fsi)"
> However this changed nothing. Same output.
> Since it's failing at the if statement, then of course the problem is in ssize_t not being big enough.
> As such I of course changed the values for this in /usr/include/machine/_types.h and I also checked out _limit.h in the same dir.
> _types.h
> typedef __int64_t __ssize_t;
> to
> typedef __uint64_t __ssize_t;
>
> _limit.h
> #define __SSIZE_MAX __LONG_MAX /* max value for a ssize_t */
> #define __SIZE_T_MAX __ULONG_MAX /* max value for a size_t */
> to
> #define __SSIZE_MAX __ULLONG_MAX /* max value for a ssize_t */
> #define __SIZE_T_MAX __ULLONG_MAX /* max value for a size_t */
>
> However this failed to make any sort of change. I reverted all of the later changes since nothing helped.
>
> So I'm turning to the fs gurus. At the moment I have no way of moving the data off and creating this properly. As such, changing to another fs is not an option right now either.
>
> So if anybody has any suggestions on what to do to temporarily fix the issue until we can move the data off the RAID and rebuild it properly, please let me know.
>
> [root@stor1 /usr/src/sbin/growfs]# ./growfs /dev/da1
> We strongly recommend you to make a backup before growing the Filesystem
>
>  Did you backup your data (Yes/No) ? Yes
> new file systemsize is: 9765570560 frags
> Warning: 136832 sector(s) cannot be allocated.
> growfs: 19073314.0MB (39062145408 sectors) block size 16384, fragment size 2048
>         using 103818 cylinder groups of 183.72MB, 11758 blks, 23552 inodes.
> [...]
> growfs: rdfs: read error: -4889807711788704476: Input/output error
>

Unfortunately I can't answer your question, but have you considered using ZFS?
From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 10:36:24 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F26E01065670 for ; Tue, 12 Oct 2010 10:36:24 +0000 (UTC) (envelope-from tobfr108@gmail.com) Received: from mail-ew0-f54.google.com (mail-ew0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 44BEA8FC08 for ; Tue, 12 Oct 2010 10:36:23 +0000 (UTC) Received: by ewy21 with SMTP id 21so10750ewy.13 for ; Tue, 12 Oct 2010 03:36:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:subject:mime-version :content-type:from:in-reply-to:date:cc:content-transfer-encoding :message-id:references:to:x-mailer; bh=g1T1+QkzAUW1+lQZTa1s0HyiMyy640mW6lXOBPGzeko=; b=eDtO7AwATSrpu/26nwV1Khw8X/ijN3EAQ/Fz41UnayQSnJEIs4jPAiMInZTqWtnsE2 KzoJBdfrCOaH4Hv2Z3cEWFCzT4js3fEEkheTLMYBgAHqW1IdV2i2WMJR8VO5Ok5x3wcZ YN9whRD07ld4yzCWBEMf/R42lfo18b28h0p5Y= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; b=TnPsjGZl4f0Lv6yNoPWMejyOnV9N9HSqzQhjo5EFpK7qKqHhkoNcp50rkrtih1CVq7 FoqYBatqJyU+/N2yVgxec49WYjOsyn2XN9HKG62O7RI1J4idGP1oLDcRvnCRuVwcmDHC ipUgTMwe6ptgYU0aQuSt8JiWvLDu8d5Uvk860= Received: by 10.14.119.7 with SMTP id m7mr4001296eeh.39.1286879783177; Tue, 12 Oct 2010 03:36:23 -0700 (PDT) Received: from [192.168.3.110]
(gn62-116-241-3.business.gavlenet.com [62.116.241.3]) by mx.google.com with ESMTPS id v59sm12641561eeh.4.2010.10.12.03.36.21 (version=TLSv1/SSLv3 cipher=RC4-MD5); Tue, 12 Oct 2010 03:36:21 -0700 (PDT) Mime-Version: 1.0 (Apple Message framework v1081) Content-Type: text/plain; charset=us-ascii From: Tobias Fredriksson In-Reply-To: Date: Tue, 12 Oct 2010 12:36:20 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: References: <6AE65535-DCCE-46A5-BBB9-358FEB34C18C@gmail.com> To: Torbjorn Kristoffersen X-Mailer: Apple Mail (2.1081) Cc: freebsd-fs@freebsd.org Subject: Re: Growing large UFS over 16TB? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 10:36:25 -0000
Yes. Unfortunately I can't move the data from UFS to ZFS, as I have no other array available with enough space.

System information, if that helps:
FreeBSD stor1.vmlocal.lan 7.0-RELEASE amd64
8GB RAM
3ware 9650SE-24M8

On 12 Oct 2010, at 12:09, Torbjorn Kristoffersen wrote:

> On Mon, Oct 11, 2010 at 10:18 PM, Tobias Fredriksson wrote:
>> Hi,
>>
>> So I have this UFS filesystem; unfortunately it was created with a plain newfs /dev/da1, with no regard to installing GPT or anything like that.
>>
>> It started out as a 6x2TB system RAID, then we expanded with 2x2TB.
>>
>> Growing this filesystem was not possible with the current tool; however, with the patch from http://masq.tychl.net/growfs.patch it worked fine.
>>
>> Now that I'm trying to grow it from 12TB to closer to 20TB, it fails after about 15 minutes.
>> It displays all the super-block backups all the way to "39061769312", then it sits there reading from the volume for about 5-10 minutes.
>>
>> The following error message is then output:
>> growfs: rdfs: attempting to read negative block number: Inappropriate ioctl for device
>>
>> I understand the reason for this: it's trying to read a block and the integer just wrapped around. Nice.
>> The relevant lines from growfs.c are
>>
>> static void
>> rdfs(ufs2_daddr_t bno, size_t size, void *bf, int fsi)
>> {
>>         [...]
>>         if (bno < 0) {
>>                 err(32, "rdfs: attempting to read negative block number");
>>         }
>>         [...]
>>
>> Just for fun I commented the if part out and recompiled.
>> growfs: rdfs: read error: -4889807711788704476: Input/output error
>>
>> The only place that ufs2_daddr_t is defined is in /usr/include/ufs/ufs/dinode.h
>> typedef int64_t ufs2_daddr_t;
>>
>> So again for fun I changed this to u_int64_t. I also removed the comments on that if part in growfs.c
>>
>> This caused the same message as last time to be repeated, but not the negative block number one.
>> growfs: rdfs: read error: -4889807711788704476: Input/output error
>>
>> This leads me to believe that I'm at least doing something partially right.
>>
>> The next part that growfs.c is doing in rdfs is
>>         n = read(fsi, bf, size);
>>         if (n != (ssize_t)size) {
>>                 err(34, "rdfs: read error: %jd", (intmax_t)bno);
>>         }
>>
>> So I changed
>> "rdfs(ufs2_daddr_t bno, size_t size, void *bf, int fsi)"
>> to
>> "rdfs(ufs2_daddr_t bno, size_t size, void *bf, u_int64_t fsi)"
>> However this changed nothing. Same output.
>> Since it's failing at the if statement, then of course the problem is in ssize_t not being big enough.
>> As such I of course changed the values for this in /usr/include/machine/_types.h and I also checked out _limit.h in the same dir.
>> _types.h
>> typedef __int64_t __ssize_t;
>> to
>> typedef __uint64_t __ssize_t;
>>
>> _limit.h
>> #define __SSIZE_MAX __LONG_MAX /* max value for a ssize_t */
>> #define __SIZE_T_MAX __ULONG_MAX /* max value for a size_t */
>> to
>> #define __SSIZE_MAX __ULLONG_MAX /* max value for a ssize_t */
>> #define __SIZE_T_MAX __ULLONG_MAX /* max value for a size_t */
>>
>> However this failed to make any sort of change. I reverted all of the later changes since nothing helped.
>>
>> So I'm turning to the fs gurus. At the moment I have no way of moving the data off and creating this properly. As such, changing to another fs is not an option right now either.
>>
>> So if anybody has any suggestions on what to do to temporarily fix the issue until we can move the data off the RAID and rebuild it properly, please let me know.
>>
>> [root@stor1 /usr/src/sbin/growfs]# ./growfs /dev/da1
>> We strongly recommend you to make a backup before growing the Filesystem
>>
>> Did you backup your data (Yes/No) ? Yes
>> new file systemsize is: 9765570560 frags
>> Warning: 136832 sector(s) cannot be allocated.
>> growfs: 19073314.0MB (39062145408 sectors) block size 16384, fragment size 2048
>>         using 103818 cylinder groups of 183.72MB, 11758 blks, 23552 inodes.
>> [...]
>> growfs: rdfs: read error: -4889807711788704476: Input/output error
>>
>
> Unfortunately I can't answer your question, but have you considered using ZFS?
From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 10:42:55 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 05D8F106566B for ; Tue, 12 Oct 2010 10:42:55 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta14.emeryville.ca.mail.comcast.net (qmta14.emeryville.ca.mail.comcast.net [76.96.27.212]) by mx1.freebsd.org (Postfix) with ESMTP id DADEF8FC0C for ; Tue, 12 Oct 2010 10:42:54 +0000 (UTC) Received: from omta14.emeryville.ca.mail.comcast.net ([76.96.30.60]) by qmta14.emeryville.ca.mail.comcast.net with comcast id Hmgc1f0021HpZEsAEmiuc9; Tue, 12 Oct 2010 10:42:54 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta14.emeryville.ca.mail.comcast.net with comcast id Hmit1f0013LrwQ28amitXA; Tue, 12 Oct 2010 10:42:54 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 2F8F09B418; Tue, 12 Oct 2010 03:42:53 -0700 (PDT) Date: Tue, 12 Oct 2010 03:42:53 -0700 From: Jeremy Chadwick To: Andriy Gapon Message-ID: <20101012104253.GA30501@icarus.home.lan> References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> <20101011183707.GA13925@icarus.home.lan> <4CB3870F.7070107@icyb.net.ua> <20101012100709.GA29861@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20101012100709.GA29861@icarus.home.lan> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 10:42:55 -0000 On Tue, Oct 12, 2010 at 03:07:09AM -0700, Jeremy Chadwick wrote: > On Tue, Oct 
12, 2010 at 12:52:15AM +0300, Andriy Gapon wrote: > > on 11/10/2010 21:37 Jeremy Chadwick said the following: > > > EnableMMAP on > > > EnableSendfile on > > > > Yes, it is it. > > > > Jeremy, Kai, > > could you please try to test this patch? > > http://people.freebsd.org/~avg/zfs-mappedread-sendfile.diff > > > > Kostik, > > could you please review it? > > Andriy, > > I've been trying to reproduce this problem on my testbed box without > much luck so far. The box differs severely -- the biggest differences > being the testbed runs i386 (due to CPU), only has 1GB RAM, and is > single-core. I don't have an amd64 testbed system on hand right now. > > I've been trying to reproduce it by enabling Sendfile and MMAP in Apache > on the system, putting up some very large files on an Apache-accessible > ZFS filesystem, and using something like "wget -r" to download > everything. I've been watching "netstat -m" to monitor the number of > sendfile requests. > > There have been a couple cases where I've seen processes go into "zfs" > state, but I have yet to see any lock up. > > Is there something amd64-specific to the problem at hand, or maybe some > VM feature which isn't getting triggered on i386? Or do you know of a > reliable way to reproduce the issue at this point? An additional question/point of interest: The testbed (i386) box I'm using is built from RELENG_8 sources dated October 11th around 20:00 PDT. I went looking through RELENG_8 commits and I found this committed approximately 25 hours ago: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c#rev1.24.2.8 The testbed/i386 box therefore has the above commit. Could this commit be the fix for the problem? In the meantime, I'm going to try rolling back my RELENG_8 src-all on the testbed/i386 box to October 8th and then re-try my tests to see if the problem happens. 
-- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 11:07:04 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8E7C5106566C for ; Tue, 12 Oct 2010 11:07:04 +0000 (UTC) (envelope-from bu7cher@yandex.ru) Received: from forward3.mail.yandex.net (forward3.mail.yandex.net [77.88.46.8]) by mx1.freebsd.org (Postfix) with ESMTP id 39BED8FC15 for ; Tue, 12 Oct 2010 11:07:04 +0000 (UTC) Received: from smtp4.mail.yandex.net (smtp4.mail.yandex.net [77.88.46.104]) by forward3.mail.yandex.net (Yandex) with ESMTP id 74FD556D8C7F; Tue, 12 Oct 2010 14:51:32 +0400 (MSD) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex.ru; s=mail; t=1286880692; bh=oWi0EmvH3HS/M2ZkJzVgAQSC3AwNge+6uiVTiGRN8aA=; h=Message-ID:Date:From:MIME-Version:To:CC:Subject:References: In-Reply-To:Content-Type; b=myXsDiT+UzqstsGXeMIJtZ+z4DXHOVhD8wU/T4vNJfdTlLcTAM5WEbOvDlJVZW7ji ywzOlj3VFVxFl+PKo5II1hnl9fxo5Qu8E1ABRBow8M8a5EdeefgpkY3Z4itv7FsC9o u/vG43fvObrV6eDsDs00LQxqc4Lpcs0KuBnb4XzE= Received: from [127.0.0.1] (mail.kirov.so-cdu.ru [77.72.136.145]) by smtp4.mail.yandex.net (Yandex) with ESMTPSA id 4184C1280A5; Tue, 12 Oct 2010 14:51:32 +0400 (MSD) Message-ID: <4CB43DA6.9010907@yandex.ru> Date: Tue, 12 Oct 2010 14:51:18 +0400 From: "Andrey V. 
Elsukov" User-Agent: Mozilla Thunderbird 1.5 (FreeBSD/20051231) MIME-Version: 1.0 To: Tobias Fredriksson References: <6AE65535-DCCE-46A5-BBB9-358FEB34C18C@gmail.com> In-Reply-To: <6AE65535-DCCE-46A5-BBB9-358FEB34C18C@gmail.com> X-Enigmail-Version: 1.1.1 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enigB80D6213E4F798D3B9756DAE" X-Yandex-TimeMark: 1286880692 X-Yandex-Spam: 1 X-Yandex-Front: smtp4.mail.yandex.net Cc: freebsd-fs@freebsd.org Subject: Re: Growing large UFS over 16TB? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 11:07:04 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enigB80D6213E4F798D3B9756DAE Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: quoted-printable
On 12.10.2010 0:18, Tobias Fredriksson wrote:
> I understand the reason for this, its trying to read a block and the integer just wrapped around.
> Nice. The relevant lines from growfs.c are
>
> static void rdfs(ufs2_daddr_t bno, size_t size, void *bf, int fsi) { [...] if (bno < 0) { err(32,
> "rdfs: attempting to read negative block number"); } [...]
>
> Just for fun I commented the if part out and recompiled. growfs: rdfs: read error:
> -4889807711788704476: Input/output error

It seems that 20T is not big enough to overflow int64_t. I think the problem may be somewhere in the code that rdfs is called from.
You can try adding an abort(3) call to the if part and recompiling with DEBUG_FLAGS=-g.
When this if statement is triggered again, growfs will drop a core file and you can inspect it with gdb. But I don't know how dangerous this is for your FS.

--
WBR, Andrey V.
Elsukov --------------enigB80D6213E4F798D3B9756DAE Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (MingW32) iQEcBAEBAgAGBQJMtD2rAAoJEAHF6gQQyKF6wpgH/jOUDTI/byhncLUNY+KS1Vg8 TSE61JTC+4auf1AxXurT04uYtBPE5KArbdU51cL9/YxIPdtazd9bIeEsrJC0Yj7j VXwkYQ/+k11LMWUOtuq/fHB1C2ZZ5bIalU6QCIUEhZbukGqZnaRAhVkM/0qvZFzt Rg1iTX5g0mXzGtImWWGSHGNcW57/nTmDUZuUG0icfeCsXTvH9ezosclsfUIet0Bh z6RBLV8MvfSAvprieluGgnHc+AWRl/RZkweOVBH+KDD4kMos6vBpAqscqp871d+H MEoPpcesdUP/SOAOVB3lqld8Vpl4rzdtFs+6XmM8fmWuRaZmUMdgwuyFfiO7zKw= =DPpW -----END PGP SIGNATURE----- --------------enigB80D6213E4F798D3B9756DAE-- From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 11:12:33 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BFCA4106566C for ; Tue, 12 Oct 2010 11:12:33 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 0DBD68FC16 for ; Tue, 12 Oct 2010 11:12:32 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA16326; Tue, 12 Oct 2010 14:12:29 +0300 (EEST) (envelope-from avg@icyb.net.ua) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1P5cmT-000OOg-8C; Tue, 12 Oct 2010 14:12:29 +0300 Message-ID: <4CB4429C.9040109@icyb.net.ua> Date: Tue, 12 Oct 2010 14:12:28 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeremy Chadwick References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> 
<4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> <20101011183707.GA13925@icarus.home.lan> <4CB3870F.7070107@icyb.net.ua> <20101012100709.GA29861@icarus.home.lan> In-Reply-To: <20101012100709.GA29861@icarus.home.lan> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 11:12:33 -0000 on 12/10/2010 13:07 Jeremy Chadwick said the following: > I've been trying to reproduce this problem on my testbed box without > much luck so far. The box differs severely -- the biggest differences > being the testbed runs i386 (due to CPU), only has 1GB RAM, and is > single-core. I don't have an amd64 testbed system on hand right now. > > I've been trying to reproduce it by enabling Sendfile and MMAP in Apache > on the system, putting up some very large files on an Apache-accessible > ZFS filesystem, and using something like "wget -r" to download > everything. I've been watching "netstat -m" to monitor the number of > sendfile requests. > > There have been a couple cases where I've seen processes go into "zfs" > state, but I have yet to see any lock up. > > Is there something amd64-specific to the problem at hand, or maybe some > VM feature which isn't getting triggered on i386? Or do you know of a > reliable way to reproduce the issue at this point? I don't have an easy way to reproduce it. The theory is that you should sendfile a file with size which is not multiple of page size (4K) and then you should mmap and read the same file; the last step should lock up. Perhaps, tools/regression/sockets/sendfile/sendfile.c with the following patch would reproduce it? 
http://people.freebsd.org/~avg/sendfile.diff -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 11:13:25 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C0B2310656A4 for ; Tue, 12 Oct 2010 11:13:25 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 0B3048FC1C for ; Tue, 12 Oct 2010 11:13:24 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA16339; Tue, 12 Oct 2010 14:13:22 +0300 (EEST) (envelope-from avg@icyb.net.ua) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1P5cnJ-000OOl-Oz; Tue, 12 Oct 2010 14:13:21 +0300 Message-ID: <4CB442D1.8040802@icyb.net.ua> Date: Tue, 12 Oct 2010 14:13:21 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeremy Chadwick References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> <20101011183707.GA13925@icarus.home.lan> <4CB3870F.7070107@icyb.net.ua> <20101012100709.GA29861@icarus.home.lan> <20101012104253.GA30501@icarus.home.lan> In-Reply-To: <20101012104253.GA30501@icarus.home.lan> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 
11:13:25 -0000 on 12/10/2010 13:42 Jeremy Chadwick said the following: > The testbed (i386) box I'm using is built from RELENG_8 sources dated > October 11th around 20:00 PDT. > > I went looking through RELENG_8 commits and I found this committed > approximately 25 hours ago: > > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c#rev1.24.2.8 > > The testbed/i386 box therefore has the above commit. Could this > commit be the fix for the problem? Very improbable. > In the meantime, I'm going to try rolling back my RELENG_8 src-all on > the testbed/i386 box to October 8th and then re-try my tests to see if > the problem happens. > -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 13:02:47 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5EA04106564A for ; Tue, 12 Oct 2010 13:02:47 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta05.emeryville.ca.mail.comcast.net (qmta05.emeryville.ca.mail.comcast.net [76.96.30.48]) by mx1.freebsd.org (Postfix) with ESMTP id 3EDBA8FC14 for ; Tue, 12 Oct 2010 13:02:47 +0000 (UTC) Received: from omta20.emeryville.ca.mail.comcast.net ([76.96.30.87]) by qmta05.emeryville.ca.mail.comcast.net with comcast id Ho8V1f0051smiN4A5p2m4K; Tue, 12 Oct 2010 13:02:46 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta20.emeryville.ca.mail.comcast.net with comcast id Hp2l1f00A3LrwQ28gp2mYk; Tue, 12 Oct 2010 13:02:46 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 82E309B425; Tue, 12 Oct 2010 06:02:45 -0700 (PDT) Date: Tue, 12 Oct 2010 06:02:45 -0700 From: Jeremy Chadwick To: Andriy Gapon Message-ID: <20101012130245.GA32584@icarus.home.lan> References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> 
<4CB32C75.2060000@icyb.net.ua> <20101011183707.GA13925@icarus.home.lan> <4CB3870F.7070107@icyb.net.ua> <20101012100709.GA29861@icarus.home.lan> <4CB4429C.9040109@icyb.net.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4CB4429C.9040109@icyb.net.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 13:02:47 -0000 On Tue, Oct 12, 2010 at 02:12:28PM +0300, Andriy Gapon wrote: > on 12/10/2010 13:07 Jeremy Chadwick said the following: > > I've been trying to reproduce this problem on my testbed box without > > much luck so far. The box differs severely -- the biggest differences > > being the testbed runs i386 (due to CPU), only has 1GB RAM, and is > > single-core. I don't have an amd64 testbed system on hand right now. > > > > I've been trying to reproduce it by enabling Sendfile and MMAP in Apache > > on the system, putting up some very large files on an Apache-accessible > > ZFS filesystem, and using something like "wget -r" to download > > everything. I've been watching "netstat -m" to monitor the number of > > sendfile requests. > > > > There have been a couple cases where I've seen processes go into "zfs" > > state, but I have yet to see any lock up. > > > > Is there something amd64-specific to the problem at hand, or maybe some > > VM feature which isn't getting triggered on i386? Or do you know of a > > reliable way to reproduce the issue at this point? > > I don't have an easy way to reproduce it. > The theory is that you should sendfile a file with size which is not multiple of > page size (4K) and then you should mmap and read the same file; the last step > should lock up. 
> Perhaps, tools/regression/sockets/sendfile/sendfile.c with the following patch
> would reproduce it?
> http://people.freebsd.org/~avg/sendfile.diff

This patch only works on HEAD.  I downloaded the HEAD version of
sendfile.c from here:

http://www.freebsd.org/cgi/cvsweb.cgi/src/tools/regression/sockets/sendfile/sendfile.c?rev=1.7;content-type=text%2Fplain

And the HEAD Makefile as well (since libmd linking is needed):

http://www.freebsd.org/cgi/cvsweb.cgi/src/tools/regression/sockets/sendfile/Makefile?rev=1.6;content-type=text%2Fplain

And then applied your patch.  However, the result doesn't induce a
lock-up.  Bummer.

testbox# ./sendfile
1..11
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
ok 10
ok 11
mmap test
testbox#

Other stuff I tried:

- Verified getpagesize() returns 4096 (PAE isn't enabled on this box;
  I'm assuming PAE results in 2MByte pages is why I mention it)
- Enabling the #if 0'd code
- Adjusting TEST_EXTRA a bit (200, 1000, and 3819; just numbers I
  pulled out of thin air)

Alternately, I can try building an amd64 testbed box, but it'll be a
virtual machine under VMware, which I try to avoid using as a testbed
for low-level changes (VM, etc.).

--
| Jeremy Chadwick jdc@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977.
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 13:28:30 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A627A106566C for ; Tue, 12 Oct 2010 13:28:30 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id EAC168FC14 for ; Tue, 12 Oct 2010 13:28:29 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA19436; Tue, 12 Oct 2010 16:28:26 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CB4627A.1010807@icyb.net.ua> Date: Tue, 12 Oct 2010 16:28:26 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeremy Chadwick References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> <20101011183707.GA13925@icarus.home.lan> <4CB3870F.7070107@icyb.net.ua> <20101012100709.GA29861@icarus.home.lan> <4CB4429C.9040109@icyb.net.ua> <20101012130245.GA32584@icarus.home.lan> In-Reply-To: <20101012130245.GA32584@icarus.home.lan> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 13:28:30 -0000 on 12/10/2010 16:02 Jeremy Chadwick said the following: > On Tue, Oct 12, 2010 at 02:12:28PM +0300, Andriy Gapon wrote: >> I don't have an easy way to reproduce 
it. >> The theory is that you should sendfile a file with size which is not multiple of >> page size (4K) and then you should mmap and read the same file; the last step >> should lock up. >> Perhaps, tools/regression/sockets/sendfile/sendfile.c with the following patch >> would reproduce it? >> http://people.freebsd.org/~avg/sendfile.diff > > This patch only works on HEAD. I downloaded the HEAD version of > sendfile.c from here: > > http://www.freebsd.org/cgi/cvsweb.cgi/src/tools/regression/sockets/sendfile/sendfile.c?rev=1.7;content-type=text%2Fplain > > And the HEAD Makefile as well (since libmd linking is needed): > > http://www.freebsd.org/cgi/cvsweb.cgi/src/tools/regression/sockets/sendfile/Makefile?rev=1.6;content-type=text%2Fplain > > And then applied your patch. However, the result doesn't induce a > lock-up. Bummer. Oh, I am stupid - it should be a different process doing mmap, not the same one. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 13:45:31 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DC8311065674; Tue, 12 Oct 2010 13:45:31 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from mail.vx.sk (mail.vx.sk [IPv6:2a01:4f8:100:1043::3]) by mx1.freebsd.org (Postfix) with ESMTP id 6AC0F8FC0A; Tue, 12 Oct 2010 13:45:31 +0000 (UTC) Received: from core.vx.sk (localhost [127.0.0.1]) by mail.vx.sk (Postfix) with ESMTP id BD67B119F44; Tue, 12 Oct 2010 15:45:30 +0200 (CEST) X-Virus-Scanned: amavisd-new at mail.vx.sk Received: from mail.vx.sk ([127.0.0.1]) by core.vx.sk (mail.vx.sk [127.0.0.1]) (amavisd-new, port 10024) with LMTP id GqdnJXd4-Du3; Tue, 12 Oct 2010 15:45:28 +0200 (CEST) Received: from [10.0.3.3] (188-167-67-67.dynamic.chello.sk [188.167.67.67]) by mail.vx.sk (Postfix) with ESMTPSA id 7EEBB119F3C; Tue, 12 Oct 2010 15:45:28 +0200 (CEST) Message-ID: <4CB46679.2010604@FreeBSD.org> Date: Tue, 12 Oct 2010 15:45:29 
+0200 From: Martin Matuska User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; sk; rv:1.8.1.23) Gecko/20090812 Lightning/0.9 Thunderbird/2.0.0.23 Mnenhy/0.7.5.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <20101009143439.GA63604@icarus.home.lan> In-Reply-To: <20101009143439.GA63604@icarus.home.lan> X-Enigmail-Version: 1.1.1 Content-Type: multipart/mixed; boundary="------------060506090408050400070707" Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 13:45:31 -0000

This is a multi-part message in MIME format.
--------------060506090408050400070707
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

One of my managed servers has also run into this problem with git and
smbd, backtraces are attached.

--------------060506090408050400070707
Content-Type: text/plain; name="bt.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="bt.txt"

git (state: zfsmrb)

Tracing pid 12378 tid 100168 td 0xffffff000842c7c0
sched_switch() at sched_switch+0x152
mi_switch() at mi_switch+0x219
sleepq_switch() at sleepq_switch+0xfa
sleepq_wait() at sleepq_wait+0x46
_sleep() at _sleep+0x256
zfs_freebsd_read() at zfs_freebsd_read+0x26c
VOP_READ_APV() at VOP_READ_APV+0xb6
vnode_pager_generic_getpages() at vnode_pager_generic_getpages+0x3c7
VOP_GETPAGES_APV() at VOP_GETPAGES_APV+0xb5
vnode_pager_getpages() at vnode_pager_getpages+0x81
vm_fault() at vm_fault+0xb42
trap_pfault() at trap_pfault+0x103
trap() at trap+0x519
calltrap() at calltrap+0x8
--- trap 0xc, rip = 0x800b3c7f9, rsp = 0x7fffffffd2b0, rbp = 0x7fffffffd3f0 ---

smbd (state: zfs) (pids: 15605, 15553, 15500, 15448, ...)
Tracing pid 15605 tid 100398 td 0xffffff014f9cc7c0
sched_switch() at sched_switch+0x152
mi_switch() at mi_switch+0x219
sleepq_switch() at sleepq_switch+0xfa
sleepq_wait() at sleepq_wait+0x46
__lockmgr_args() at __lockmgr_args+0x74f
vop_stdlock() at vop_stdlock+0x39
VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
_vn_lock() at _vn_lock+0x5d
extattr_get_vp() at extattr_get_vp+0x59
extattr_get_file() at extattr_get_file+0x126
syscall() at syscall+0x102
Xfast_syscall() at Xfast_syscall+0xe2
--- syscall (357, FreeBSD ELF64, extattr_get_file), rip = 0x80222fd2c, rsp = 0x7fffffffd8a8, rbp = 0x1501b84 ---

--------------060506090408050400070707--

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 14:13:02 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 56B86106566C for ; Tue, 12 Oct 2010 14:13:02 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 9CB7F8FC12 for ; Tue, 12 Oct 2010 14:13:01 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA20244; Tue, 12 Oct 2010 17:12:58 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CB46CE9.20905@icyb.net.ua> Date: Tue, 12 Oct 2010 17:12:57 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeremy Chadwick References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <20101009111241.GA58948@icarus.home.lan> <4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> <20101011183707.GA13925@icarus.home.lan> <4CB3870F.7070107@icyb.net.ua> <20101012100709.GA29861@icarus.home.lan> <4CB4429C.9040109@icyb.net.ua> <20101012130245.GA32584@icarus.home.lan> In-Reply-To:
<20101012130245.GA32584@icarus.home.lan> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 14:13:02 -0000

on 12/10/2010 16:02 Jeremy Chadwick said the following:
> And then applied your patch.  However, the result doesn't induce a
> lock-up.  Bummer.
>
> testbox# ./sendfile
> 1..11
> ok 1
> ok 2
> ok 3
> ok 4
> ok 5
> ok 6
> ok 7
> ok 8
> ok 9
> ok 10
> ok 11
> mmap test
> testbox#
>
> Other stuff I tried:
>
> - Verified getpagesize() returns 4096 (PAE isn't enabled on this box;
>   I'm assuming PAE results in 2MByte pages is why I mention it)
> - Enabling the #if 0'd code
> - Adjusting TEST_EXTRA a bit (200, 1000, and 3819; just numbers I
>   pulled out of thin air)
>

Do you have DTrace support on the test system?
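
For reference, the reproduction theory from earlier in this thread (sendfile a
file whose size is not a multiple of the page size, then mmap and read the same
file from a different process) might be sketched roughly as below. This is only
an illustrative Python sketch, not the patched
tools/regression/sockets/sendfile/sendfile.c; the function name
sendfile_then_mmap, its parameters, and the temporary-file handling are
assumptions made for the example. To exercise the suspected problem, the
"directory" argument would presumably have to point at a ZFS filesystem; on an
unaffected system the function simply returns.

```python
import mmap
import os
import socket
import tempfile
import threading


def sendfile_then_mmap(directory=None, pages=4, extra_bytes=200):
    """Hypothetical helper: sendfile a file whose size is NOT a multiple of
    the page size, then mmap and read it from a different process.

    Returns (file_size, bytes_sent, bytes_read_by_child).
    """
    size = pages * mmap.PAGESIZE + extra_bytes  # deliberately not page-aligned
    with tempfile.NamedTemporaryFile(dir=directory, delete=False) as f:
        f.write(os.urandom(size))
        path = f.name
    try:
        # Step 1: push the file through a socket with sendfile.
        tx, rx = socket.socketpair()

        def drain():
            # Consume the data so the sender never blocks on a full buffer.
            while rx.recv(65536):
                pass

        reader = threading.Thread(target=drain)
        reader.start()
        with open(path, "rb") as src:
            sent = tx.sendfile(src)  # uses os.sendfile where available
        tx.close()
        reader.join()
        rx.close()

        # Step 2: a *different* process maps the file and reads every byte.
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                os.close(r)
                with open(path, "rb") as src, \
                        mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as m:
                    os.write(w, str(len(m[:])).encode())
                status = 0
            finally:
                os._exit(status)
        os.close(w)
        reported = os.read(r, 64)
        os.close(r)
        os.waitpid(pid, 0)
        return size, sent, int(reported)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(sendfile_then_mmap())
```

If the theory holds, the child process would be the one expected to hang on an
affected system, so running this under a timeout would be prudent.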
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 14:36:01 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9B83C1065696 for ; Tue, 12 Oct 2010 14:36:01 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta15.emeryville.ca.mail.comcast.net (qmta15.emeryville.ca.mail.comcast.net [76.96.27.228]) by mx1.freebsd.org (Postfix) with ESMTP id 7C1228FC16 for ; Tue, 12 Oct 2010 14:36:01 +0000 (UTC) Received: from omta12.emeryville.ca.mail.comcast.net ([76.96.30.44]) by qmta15.emeryville.ca.mail.comcast.net with comcast id HoqQ1f00D0x6nqcAFqc0hF; Tue, 12 Oct 2010 14:36:00 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta12.emeryville.ca.mail.comcast.net with comcast id Hqbz1f00L3LrwQ28Yqc0Hg; Tue, 12 Oct 2010 14:36:00 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 912A99B425; Tue, 12 Oct 2010 07:35:59 -0700 (PDT) Date: Tue, 12 Oct 2010 07:35:59 -0700 From: Jeremy Chadwick To: Andriy Gapon Message-ID: <20101012143559.GA34396@icarus.home.lan> References: <4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> <20101011183707.GA13925@icarus.home.lan> <4CB3870F.7070107@icyb.net.ua> <20101012100709.GA29861@icarus.home.lan> <4CB4429C.9040109@icyb.net.ua> <20101012130245.GA32584@icarus.home.lan> <4CB46CE9.20905@icyb.net.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4CB46CE9.20905@icyb.net.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 14:36:01 -0000 On Tue, Oct 12, 2010 at 05:12:57PM +0300, Andriy Gapon wrote: 
> on 12/10/2010 16:02 Jeremy Chadwick said the following:
> > And then applied your patch.  However, the result doesn't induce a
> > lock-up.  Bummer.
> >
> > testbox# ./sendfile
> > 1..11
> > ok 1
> > ok 2
> > ok 3
> > ok 4
> > ok 5
> > ok 6
> > ok 7
> > ok 8
> > ok 9
> > ok 10
> > ok 11
> > mmap test
> > testbox#
> >
> > Other stuff I tried:
> >
> > - Verified getpagesize() returns 4096 (PAE isn't enabled on this box;
> >   I'm assuming PAE results in 2MByte pages is why I mention it)
> > - Enabling the #if 0'd code
> > - Adjusting TEST_EXTRA a bit (200, 1000, and 3819; just numbers I
> >   pulled out of thin air)
> >
>
> Do you have DTrace support on the test system?

Nope, but since it's a single-CPU system I can probably enable it
safely (I thought I remember something about CTF not working right on
SMP systems...).  Give me a little while to figure out how to get it
working, assuming you're willing to step me through what I need to do to
simulate the lock condition.  :-)

--
| Jeremy Chadwick jdc@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977.
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 14:40:25 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 648111065698 for ; Tue, 12 Oct 2010 14:40:25 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id A99A18FC27 for ; Tue, 12 Oct 2010 14:40:24 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA20779; Tue, 12 Oct 2010 17:40:21 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CB47355.1050109@icyb.net.ua> Date: Tue, 12 Oct 2010 17:40:21 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeremy Chadwick References: <4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> <20101011183707.GA13925@icarus.home.lan> <4CB3870F.7070107@icyb.net.ua> <20101012100709.GA29861@icarus.home.lan> <4CB4429C.9040109@icyb.net.ua> <20101012130245.GA32584@icarus.home.lan> <4CB46CE9.20905@icyb.net.ua> <20101012143559.GA34396@icarus.home.lan> In-Reply-To: <20101012143559.GA34396@icarus.home.lan> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 14:40:25 -0000 on 12/10/2010 17:35 Jeremy Chadwick said the following: > Nope, but since it's a single-CPU system I can probably enable it > safely (I thought I remember something about CTF not 
working right on > SMP systems...). Give me a little while to figure out how to get it Never heard about such issue and never hit them on my two-core systems. > working, assuming you're willing to step me through what I need to do to > simulate the lock condition. :-) Enabling DTrace is really easy: http://wiki.freebsd.org/DTrace I would appreciate if you could enable it on your test system. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 14:41:45 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B3E0E1065673 for ; Tue, 12 Oct 2010 14:41:45 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 047838FC12 for ; Tue, 12 Oct 2010 14:41:44 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA20797; Tue, 12 Oct 2010 17:41:42 +0300 (EEST) (envelope-from avg@freebsd.org) Message-ID: <4CB473A5.8070302@freebsd.org> Date: Tue, 12 Oct 2010 17:41:41 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeremy Chadwick References: <4CB17983.3020907@icyb.net.ua> <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> <20101011183707.GA13925@icarus.home.lan> <4CB3870F.7070107@icyb.net.ua> <20101012100709.GA29861@icarus.home.lan> <4CB4429C.9040109@icyb.net.ua> <20101012130245.GA32584@icarus.home.lan> <4CB46CE9.20905@icyb.net.ua> <20101012143559.GA34396@icarus.home.lan> <4CB47355.1050109@icyb.net.ua> In-Reply-To: <4CB47355.1050109@icyb.net.ua> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade 
to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 14:41:45 -0000

on 12/10/2010 17:40 Andriy Gapon said the following:
> Enabling DTrace is really easy:
> http://wiki.freebsd.org/DTrace
>
> I would appreciate if you could enable it on your test system.

Userland DTrace support is not needed (points 6 and 7).

--
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 15:18:54 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D261F1065670 for ; Tue, 12 Oct 2010 15:18:54 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta09.westchester.pa.mail.comcast.net (qmta09.westchester.pa.mail.comcast.net [76.96.62.96]) by mx1.freebsd.org (Postfix) with ESMTP id 795A88FC12 for ; Tue, 12 Oct 2010 15:18:53 +0000 (UTC) Received: from omta13.westchester.pa.mail.comcast.net ([76.96.62.52]) by qmta09.westchester.pa.mail.comcast.net with comcast id Hmnm1f00417dt5G59rJuTW; Tue, 12 Oct 2010 15:18:54 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta13.westchester.pa.mail.comcast.net with comcast id HrJt1f00B3LrwQ23ZrJthF; Tue, 12 Oct 2010 15:18:54 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 0EAB79B425; Tue, 12 Oct 2010 08:18:52 -0700 (PDT) Date: Tue, 12 Oct 2010 08:18:52 -0700 From: Jeremy Chadwick To: Andriy Gapon Message-ID: <20101012151852.GA35014@icarus.home.lan> References: <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> <20101011183707.GA13925@icarus.home.lan> <4CB3870F.7070107@icyb.net.ua> <20101012100709.GA29861@icarus.home.lan> <4CB4429C.9040109@icyb.net.ua> <20101012130245.GA32584@icarus.home.lan> <4CB46CE9.20905@icyb.net.ua> <20101012143559.GA34396@icarus.home.lan> <4CB47355.1050109@icyb.net.ua> MIME-Version:
1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4CB47355.1050109@icyb.net.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 15:18:55 -0000

On Tue, Oct 12, 2010 at 05:40:21PM +0300, Andriy Gapon wrote:
> on 12/10/2010 17:35 Jeremy Chadwick said the following:
> > Nope, but since it's a single-CPU system I can probably enable it
> > safely (I thought I remember something about CTF not working right on
> > SMP systems...).  Give me a little while to figure out how to get it
>
> Never heard about such issue and never hit them on my two-core systems.

What I'm remembering (two issues) is below, but both apply to RELENG_7.
Sounds like the issue may have gotten addressed in RELENG_8.

http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/045093.html
http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/045180.html

> > working, assuming you're willing to step me through what I need to do to
> > simulate the lock condition.  :-)
>
> Enabling DTrace is really easy:
> http://wiki.freebsd.org/DTrace
>
> I would appreciate if you could enable it on your test system.

Got it -- just finished and is currently running/working.  I also
installed ports/sysutils/DTraceToolkit and shells/ksh93 "just in case".

testbox# dtrace -l | head
   ID   PROVIDER            MODULE                          FUNCTION NAME
    1     dtrace                                                     BEGIN
    2     dtrace                                                     END
    3     dtrace                                                     ERROR
    4   dtmalloc                                                 fbt malloc
    5   dtmalloc                                                 fbt free
    6   dtmalloc                                              cyclic malloc
    7   dtmalloc                                              cyclic free
    8   dtmalloc                                          zones_data malloc
    9   dtmalloc                                          zones_data free

I can provide you root-level access to the box as well as serial console
if you'd prefer to do the debugging yourself, otherwise step me through
what's needed and I'll be happy to act as remote hands.
This should prove educational in another way, as I use Solaris regularly at work but have never sat down to tinker with DTrace (just basic mdb and adb). Fun fun fun. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Oct 12 15:27:00 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A7EA01065740 for ; Tue, 12 Oct 2010 15:27:00 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id ECE7F8FC14 for ; Tue, 12 Oct 2010 15:26:59 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA21523; Tue, 12 Oct 2010 18:26:55 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CB47E3F.3050002@icyb.net.ua> Date: Tue, 12 Oct 2010 18:26:55 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeremy Chadwick References: <20101011151508.GA10917@icarus.home.lan> <4CB32C75.2060000@icyb.net.ua> <20101011183707.GA13925@icarus.home.lan> <4CB3870F.7070107@icyb.net.ua> <20101012100709.GA29861@icarus.home.lan> <4CB4429C.9040109@icyb.net.ua> <20101012130245.GA32584@icarus.home.lan> <4CB46CE9.20905@icyb.net.ua> <20101012143559.GA34396@icarus.home.lan> <4CB47355.1050109@icyb.net.ua> <20101012151852.GA35014@icarus.home.lan> In-Reply-To: <20101012151852.GA35014@icarus.home.lan> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: 
freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2010 15:27:00 -0000

on 12/10/2010 18:18 Jeremy Chadwick said the following:
> Got it -- just finished and is currently running/working.  I also
> installed ports/sysutils/DTraceToolkit and shells/ksh93 "just in case".
>
> testbox# dtrace -l | head
>    ID   PROVIDER            MODULE                          FUNCTION NAME
>     1     dtrace                                                     BEGIN
>     2     dtrace                                                     END
>     3     dtrace                                                     ERROR
>     4   dtmalloc                                                 fbt malloc
>     5   dtmalloc                                                 fbt free
>     6   dtmalloc                                              cyclic malloc
>     7   dtmalloc                                              cyclic free
>     8   dtmalloc                                          zones_data malloc
>     9   dtmalloc                                          zones_data free
>
> I can provide you root-level access to the box as well as serial console
> if you'd prefer to do the debugging yourself, otherwise step me through
> what's needed and I'll be happy to act as remote hands.

Great!
Let's start now :)

I would like you to run the following script with "dtrace -s