From owner-freebsd-fs@FreeBSD.ORG Sun Sep 26 10:39:30 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A13781065670 for ; Sun, 26 Sep 2010 10:39:30 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id 4E8918FC21 for ; Sun, 26 Sep 2010 10:39:24 +0000 (UTC) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id E827D45C98; Sun, 26 Sep 2010 12:39:21 +0200 (CEST) Received: from localhost (chello089077043238.chello.pl [89.77.43.238]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 1D10945C99; Sun, 26 Sep 2010 12:39:17 +0200 (CEST) Date: Sun, 26 Sep 2010 12:38:56 +0200 From: Pawel Jakub Dawidek To: Mikolaj Golub Message-ID: <20100926103856.GI47356@garage.freebsd.pl> References: <86mxr7x0ih.fsf@kopusha.home.net> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="1X+6QtwRodzgDPAC" Content-Disposition: inline In-Reply-To: <86mxr7x0ih.fsf@kopusha.home.net> User-Agent: Mutt/1.4.2.3i X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 9.0-CURRENT amd64 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-0.6 required=4.5 tests=BAYES_00,RCVD_IN_SORBS_DUL autolearn=no version=3.0.4 Cc: freebsd-fs@freebsd.org Subject: Re: hastd: memory leaks if fork() fails X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Sep 2010 10:39:30 -0000 --1X+6QtwRodzgDPAC Content-Type: text/plain; charset=us-ascii Content-Disposition: inline 
Content-Transfer-Encoding: quoted-printable

On Fri, Sep 24, 2010 at 06:51:02AM +0300, Mikolaj Golub wrote:
> Hi,
>
> It is a rather unlikely situation, but anyway :-)
>
> If fork() fails in hook_execv(), hastd leaks some bytes referenced by hp.
> See the attached patch.

Committed, thanks!

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

--1X+6QtwRodzgDPAC
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)

iEYEARECAAYFAkyfIsAACgkQForvXbEpPzS4lACeMynn016Q0+1nGpUtgA6j1sKe
aUUAn3C+MQo2jk8fafYvIy64F9SR1xZ2
=BDnp
-----END PGP SIGNATURE-----

--1X+6QtwRodzgDPAC--

From owner-freebsd-fs@FreeBSD.ORG Sun Sep 26 21:19:35 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5461E106566C; Sun, 26 Sep 2010 21:19:35 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 2B5088FC14; Sun, 26 Sep 2010 21:19:35 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o8QLJZNJ019219; Sun, 26 Sep 2010 21:19:35 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o8QLJZ9W019215; Sun, 26 Sep 2010 21:19:35 GMT (envelope-from linimon) Date: Sun, 26 Sep 2010 21:19:35 GMT Message-Id: <201009262119.o8QLJZ9W019215@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/150910: [nfs] wsize=16384 on udp nfs mount unusable X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Sep 2010 21:19:35 -0000

Old Synopsis: wsize=16384 on udp nfs mount unusable
New Synopsis: [nfs] wsize=16384 on udp nfs mount unusable

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Sun Sep 26 21:18:58 UTC 2010
Responsible-Changed-Why: reclassify.

http://www.freebsd.org/cgi/query-pr.cgi?pr=150910

From owner-freebsd-fs@FreeBSD.ORG Mon Sep 27 00:05:26 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 59C4F106566C for ; Mon, 27 Sep 2010 00:05:26 +0000 (UTC) (envelope-from fbsd@dannysplace.net) Received: from mailgw.dannysplace.net (mailgw.dannysplace.net [204.109.56.184]) by mx1.freebsd.org (Postfix) with ESMTP id 21EEF8FC14 for ; Mon, 27 Sep 2010 00:05:25 +0000 (UTC) Received: from [203.206.171.212] (helo=[192.168.10.10]) by mailgw.dannysplace.net with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.72 (FreeBSD)) (envelope-from ) id 1P01E6-00035b-No; Mon, 27 Sep 2010 10:05:53 +1000 Message-ID: <4C9FDFBC.8030406@dannysplace.net> Date: Mon, 27 Sep 2010 10:05:16 +1000 From: Danny Carroll User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Martin Simmons References: <4C9AC1F6.90305@dannysplace.net> <201009231340.o8NDeNl5017806@higson.cam.lispworks.com> In-Reply-To: <201009231340.o8NDeNl5017806@higson.cam.lispworks.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Authenticated-User: danny X-Authenticator: plain X-Exim-Version: 4.72 (build at 12-Jul-2010 18:31:29) X-Date: 2010-09-27 10:05:51 X-Connected-IP: 203.206.171.212:55109 X-Message-Linecount: 32 X-Body-Linecount: 17 X-Message-Size: 1299 X-Body-Size: 571 X-Received-Count: 1 X-Recipient-Count: 2 X-Local-Recipient-Count: 2 
X-Local-Recipient-Defer-Count: 0 X-Local-Recipient-Fail-Count: 0 X-SA-Exim-Connect-IP: 203.206.171.212 X-SA-Exim-Mail-From: fbsd@dannysplace.net X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on damka.dannysplace.net X-Spam-Level: X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.3.1 X-SA-Exim-Version: 4.2 X-SA-Exim-Scanned: Yes (on mailgw.dannysplace.net) Cc: freebsd-fs@freebsd.org Subject: Re: Devices disappeared after drive shuffle - or - how to recover and mount a slice with UFS partitions. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: fbsd@dannysplace.net List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Sep 2010 00:05:26 -0000

On 23/09/2010 11:40 PM, Martin Simmons wrote:
>>>>>> On Thu, 23 Sep 2010 12:56:54 +1000, Danny Carroll said:
>> My only real question is: why did the devices fail to be created in
>> /dev from the original disk?
> See if the partitions are listed in the output of:
>
> sysctl -b kern.geom.conftxt
>
> If not, then it looks like a kernel/geom problem.
>

Thanks for the tip. Unfortunately I've already wiped the slices and
re-partitioned with gpart. Gpart seems to be OK.

I'm not too worried about it all. I just thought it was interesting
enough to share.
-D From owner-freebsd-fs@FreeBSD.ORG Mon Sep 27 01:37:43 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0770B106564A for ; Mon, 27 Sep 2010 01:37:43 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id B8B168FC18 for ; Mon, 27 Sep 2010 01:37:42 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEAJKSn0yDaFvO/2dsb2JhbACDG6ALtVWRSYEigy50BIo6 X-IronPort-AV: E=Sophos;i="4.57,240,1283745600"; d="scan'208";a="95180231" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 26 Sep 2010 21:37:41 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id D52E6B3F36; Sun, 26 Sep 2010 21:37:41 -0400 (EDT) Date: Sun, 26 Sep 2010 21:37:41 -0400 (EDT) From: Rick Macklem To: "Sam Fourman Jr." 
Message-ID: <1735156067.135783.1285551461797.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [24.65.230.102] X-Mailer: Zimbra 6.0.7_GA_2476.RHEL4 (ZimbraWebClient - SAF3 (Mac)/6.0.7_GA_2473.RHEL4_64) Cc: freebsd-fs@freebsd.org Subject: Re: NFSRoot pxe available disk space X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Sep 2010 01:37:43 -0000

> I am running FreeBSD 9 amd64 via pxe NFSRoot
> I get a negative value for my available space on /
>
> my FreeBSD NFS server is also running FreeBSD 9 built from today's SVN
> sources
> this did not happen in FreeBSD 8.1
>
>
> Sam# uname -a
> FreeBSD Sam.PuffyBSD.Com 9.0-CURRENT FreeBSD 9.0-CURRENT #2: Thu Sep 23 18:24:25 CDT 2010     root@FNFS.PuffyBSD.Com:/usr/obj/usr/src/sys/WORKSTATION  amd64
>
>
> Sam# df -h
> Filesystem                                        Size   Used  Avail Capacity  Mounted on
> 192.168.8.10:/Network/pxe/FreeBSD_AMD64_CURRENT  -807G    28G  -834G    -3%    /
> devfs                                             1.0K   1.0K     0B   100%    /dev
> tmpfs                                             218M   4.0K   218M     0%    /tmp
> linprocfs                                         4.0K   4.0K     0B   100%    /compat/linux/proc
> 192.168.8.10:/Network/distfiles                   1.2T    26G   1.2T     2%    /usr/ports/distfiles
> 192.168.8.10:/Network/tv                          1.5T   373G   1.2T    24%    /Network/tv
> 192.168.8.10:/Network/iso                         1.5T   277G   1.2T    19%    /Network/iso
> 192.168.8.10:/Network/wow                         1.2T    35G   1.2T     3%    /Network/wow
> 192.168.8.10:/Network/music                       1.3T   106G   1.2T     8%    /Network/music
> 192.168.8.10:/Network/public                      1.3T   148G   1.2T    11%    /Network/public
> 192.168.8.10:/Network/pxe/FreeBSD_i386_8_1        1.2T   4.4G   1.2T     0%    /compat/FreeBSD-i386
> 192.168.8.10:/Network/home/sfourman               1.2T    33G   1.2T     3%    /usr/home/sfourman
>
>

I did send out a heads up when I committed it. If you replace the kernel
but not pxeboot (with one built from recent sources), the default reverts
to NFSv2.
Either add "nfsv3" as an option on the "/" line of /etc/fstab in the
exported root fs on the NFS server, or replace "pxeboot" with one built
from recent sources. I think this is probably what is causing it.

rick
ps: NFSv2 only used 32-bit numbers for the sizes, so an overflow can
    easily happen.

From owner-freebsd-fs@FreeBSD.ORG Mon Sep 27 11:06:54 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9AA6C106564A for ; Mon, 27 Sep 2010 11:06:54 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 6571F8FC1D for ; Mon, 27 Sep 2010 11:06:54 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o8RB6sOK023458 for ; Mon, 27 Sep 2010 11:06:54 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o8RB6r2Y023454 for freebsd-fs@FreeBSD.org; Mon, 27 Sep 2010 11:06:53 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 27 Sep 2010 11:06:53 GMT Message-Id: <201009271106.o8RB6r2Y023454@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Sep 2010 11:06:54 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.  Description
--------------------------------------------------------------------------------
o kern/150910  fs     [nfs] wsize=16384 on udp nfs mount unusable
o kern/150796  fs     [panic] [suj] [ufs] [softupdates] Panic on portbuild
o kern/150503  fs     [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs     [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs     [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs     [nfs] mountd/nfsd became confused; refused to reload n
o kern/150207  fs     zpool(1): zpool import -d /dev tries to open weird dev
o kern/149855  fs     [gvinum] growfs causes fsck to report errors in Filesy
o kern/149495  fs     [zfs] chflags sappend on zfs not working right
o kern/149173  fs     [patch] [zfs] make OpenSolaris installa
o kern/149022  fs     [hang] File system operations hangs with suspfs state
o kern/149015  fs     [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs     [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013  fs     [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs     [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs     [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs     [zfs] ZFS hanging forever on 8.1-PRERELEASE
o bin/148296   fs     [zfs] [loader] [patch] Very slow probe in /usr/src/sys
o kern/148204  fs     [nfs] UDP NFS causes overload
o kern/148138  fs     [zfs] zfs raidz pool commands freeze
o kern/147903  fs     [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs     [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147790  fs     [zfs] zfs set acl(mode|inherit) fails on existing zfs
o kern/147420  fs     [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/147292  fs     [nfs] [patch] readahead missing in nfs client options
o kern/146941  fs     [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs     [zfs] zpool import hangs with checksum errors
o kern/146708  fs     [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs     [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs     [nfs] FreeBSD 8 NFS Client Connection to Server
o kern/146375  fs     [nfs] [patch] Typos in macro variables names in sys/fs
s kern/145712  fs     [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs     [xfs] [panic] Kernel panics shortly after mounting an
o bin/145309   fs     bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs     [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246  fs     [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs     [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs     [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs     [nfs] nfsd performs abysmally under load
o kern/144929  fs     [ufs] [lor] vfs_bio.c + ufs_dirhash.c
o kern/144458  fs     [nfs] [patch] nfsd fails as a kld
p kern/144447  fs     [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs     [panic] Kernel panic on online filesystem optimization
s kern/144415  fs     [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs     [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825  fs     [nfs] [panic] Kernel panic on NFS client
o kern/143345  fs     [ext2fs] [patch] extfs minor header cleanups to better
o kern/143212  fs     [nfs] NFSv4 client strange work ...
o kern/143184  fs     [zfs] [lor] zfs/bufwait LOR
o kern/142924  fs     [ext2fs] [patch] Small cleanup for the inode struct in
o kern/142914  fs     [zfs] ZFS performance degradation over time
o kern/142878  fs     [zfs] [vfs] lock order reversal
o kern/142597  fs     [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs     [zfs] [lor] allproc/zfs LOR
o kern/142466  fs     Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142401  fs     [ntfs] [patch] Minor updates to NTFS from NetBSD
o kern/142306  fs     [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068  fs     [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs     [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs     [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs     [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs     [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs     [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs     [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs     [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs     [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs     [zfs] snapshot crash
o kern/140134  fs     [msdosfs] write and fsck destroy filesystem integrity
o kern/140068  fs     [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs     [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs     [zfs] vfs.numvnodes leak on busy zfs
o bin/139651   fs     [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs     [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs     [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs     [smbfs] [panic] smb mount causes system crash if remot
o kern/139363  fs     [nfs] diskless root nfs mount from non FreeBSD server
o kern/138790  fs     [zfs] ZFS ceases caching when mem demand is high
o kern/138662  fs     [panic] ffs_blkfree: freeing free block
o kern/138421  fs     [ufs] [patch] remove UFS label limitations
o kern/138202  fs     mount_msdosfs(1) see only 2Gb
f kern/137037  fs     [zfs] [hang] zfs rollback on root causes FreeBSD to fr
o kern/136968  fs     [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs     [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs     [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs     [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs     [nfs] [patch] NFS exports atomic and on-the-fly atomic
o kern/136470  fs     [nfs] Cannot mount / in read-only, over NFS
o kern/135667  fs     [lor] LORs causing ufs filesystem corruption on XEN Do
o kern/135546  fs     [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs     [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs     [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs     [zfs] Hot spares are rather cold...
o kern/133676  fs     [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133614  fs     [panic] panic: ffs_truncate: read-only filesystem
o kern/133174  fs     [msdosfs] [patch] msdosfs must support utf-encoded int
f kern/133150  fs     [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w
o kern/132960  fs     [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs     reboot causes filesystem corruption (failure to sync b
o kern/132331  fs     [ufs] [lor] LOR ufs and syncer
o kern/132237  fs     [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs     [panic] File System Hard Crashes
o kern/131441  fs     [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs     [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs     [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs     makefs: error "Bad file descriptor" on the mount poin
o kern/130920  fs     [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs     [nullfs] Error by check nullfs
o kern/129760  fs     [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs     [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231  fs     [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs     [panic] non-userfriendly panic when trying to mount(8)
o kern/129059  fs     [zfs] [patch] ZFS bootloader whitelistable via WITHOUT
f kern/128829  fs     smbd(8) causes periodic panic on 7-RELEASE
o kern/127787  fs     [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270   fs     fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs     [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs     [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs     [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs     [zfs] [request] SHA256 acceleration in ZFS
p kern/124621  fs     [ext3] [patch] Cannot mount ext2fs partition
f bin/124424   fs     [zfs] zfs(8): zfs list -r shows strange snapshots' siz
o kern/123939  fs     [msdosfs] corrupts new files
o kern/122380  fs     [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs     [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs     [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121779   fs     [ufs] snapinfo(8) (and related tools?) only work for t
o bin/121366   fs     [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072   fs     [smbfs] mount_smbfs(8) cannot normally convert the cha
f kern/120991  fs     [panic] [fs] [snapshot] System crashes when manipulati
o kern/120483  fs     [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs     [ntfs] [patch] Sync style changes between NetBSD and F
f kern/119735  fs     [zfs] geli + ZFS + samba starting on boot panics 7.0-B
o kern/118912  fs     [2tb] disk sizing/geometry problem with large array
o kern/118713  fs     [minidump] [patch] Display media size required for a k
o bin/118249   fs     mv(1): moving a directory changes its mtime
o kern/118107  fs     [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs     [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs     [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs     [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs     [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs     [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs     lack of fsck_cd9660 prevents mounting iso images with
p kern/116608  fs     [msdosfs] [patch] msdosfs fails to check mount options
o kern/116583  fs     [ffs] [hang] System freezes for short time when using
o kern/116170  fs     [panic] Kernel panic when mounting /tmp
o kern/115645  fs     [snapshots] [panic] lockmgr: thread 0xc4c00d80, not ex
o bin/115361   fs     [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs     [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs     [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs     [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs     [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs     [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs     [patch] [request] mount(8): add support for relative p
o bin/113049   fs     [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs     [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs     [msdosfs] Long Names of files are incorrectly created
o kern/111782  fs     [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs     [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs     [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat
o kern/109010  fs     [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs     [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs     [ufs] left-over fsck_snapshot after unfinished backgro
o kern/106030  fs     [ufs] [panic] panic in ufs from geom when a dead disk
o kern/104406  fs     [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs     [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs     [ntfs] Directories in NTFS mounted disc images appear
o kern/101324  fs     [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs     [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs     [request] newfs(8) has no option to clear the first 12
o kern/97377   fs     [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs     [iso9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs     [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs     fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs     [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs     [smbfs] smbfs may cause double unlock
o bin/94635    fs     snapinfo(8)/libufs only works for disk-backed filesyst
o kern/93942   fs     [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs     [ffs] [hang] Filling a filesystem while creating a sna
f kern/91568   fs     [ufs] [panic] writing to UFS/softupdates DVD media in
o kern/91134   fs     [smbfs] [patch] Preserve access and modification time
a kern/90815   fs     [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs     [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs     [panic] ffs_blkfree: freeing free frag on AMD 64
o kern/88266   fs     [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o bin/87966    fs     [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs     [smbfs] System reboot while umount smbfs.
o kern/86587   fs     [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs     fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/85326   fs     [smbfs] [panic] saving a file via samba to an overquot
o kern/84589   fs     [2TB] 5.4-STABLE unresponsive during background fsck 2
o kern/80088   fs     [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs     Background-fsck checks one filesystem twice and omits
o kern/73484   fs     [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs     [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs     [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs     fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs     [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs     [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs     [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs     [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs     [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs     [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs     [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs     [smbfs] System reboot with dead smb mount and umount
o kern/33464   fs     [ufs] soft update inconsistencies after system crash
o bin/27687    fs     fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs     [2TB] 32bit NFS servers export wrong negative values t

207 problems total.
From owner-freebsd-fs@FreeBSD.ORG Mon Sep 27 14:24:49 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F18F6106564A for ; Mon, 27 Sep 2010 14:24:49 +0000 (UTC) (envelope-from cal@linu.gs) Received: from mail.adm.hostpoint.ch (mail.adm.hostpoint.ch [217.26.48.124]) by mx1.freebsd.org (Postfix) with ESMTP id B2CF28FC14 for ; Mon, 27 Sep 2010 14:24:49 +0000 (UTC) Received: from [77.109.131.203] (port=39421 helo=aare.localnet) by mail.adm.hostpoint.ch with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.69 (FreeBSD)) (envelope-from ) id 1P0EdL-0006oU-3B for freebsd-fs@freebsd.org; Mon, 27 Sep 2010 16:24:48 +0200 From: Michael Naef To: "freebsd-fs" Date: Mon, 27 Sep 2010 16:24:42 +0200 User-Agent: KMail/1.13.5 (Linux/2.6.34-gentoo-r1; KDE/4.4.5; i686; ; ) References: <201009231938.09548.cal@linu.gs> <66757A1E-E445-4AAD-8F57-382D85BFD579@hostpoint.ch> In-Reply-To: <66757A1E-E445-4AAD-8F57-382D85BFD579@hostpoint.ch> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201009271624.46655.cal@linu.gs> Subject: Re: Strange behaviour with sappend flag set on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Sep 2010 14:24:50 -0000

Hi all

On Friday 24 September 2010 01:15:55 Markus Gebert wrote:
> CURRENT and STABLE-8 seem to be affected too. The following patch
> seems to fix it (at least Michi's test case works fine with
> it):
>
> ----
> diff -ru ../src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c ./sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> --- ../src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c    2010-05-19 08:49:52.000000000 +0200
> +++ ./sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c    2010-09-23 23:24:43.549846948 +0200
> @@ -709,7 +709,7 @@
>           */
>          pflags = zp->z_phys->zp_flags;
>          if ((pflags & (ZFS_IMMUTABLE | ZFS_READONLY)) ||
> -            ((pflags & ZFS_APPENDONLY) && !(ioflag & FAPPEND) &&
> +            ((pflags & ZFS_APPENDONLY) && !(ioflag & IO_APPEND) &&
>              (uio->uio_loffset < zp->z_phys->zp_size))) {
>                  ZFS_EXIT(zfsvfs);
>                  return (EPERM);
> ----
>
> Can someone commit this if the patch is ok? Or should I (or
> Michi) open a PR?

What's the next step? Is anybody willing and able to commit the patch,
or should/must I open a PR? (I do have a patch for bash which solves the
most urgent problem, but I need a decision.)
cheers and thanks, Michi From owner-freebsd-fs@FreeBSD.ORG Mon Sep 27 16:12:38 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7D038106564A for ; Mon, 27 Sep 2010 16:12:38 +0000 (UTC) (envelope-from gljennjohn@googlemail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 0429A8FC14 for ; Mon, 27 Sep 2010 16:12:37 +0000 (UTC) Received: by fxm9 with SMTP id 9so3746236fxm.13 for ; Mon, 27 Sep 2010 09:12:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=domainkey-signature:received:received:date:from:to:cc:subject :message-id:in-reply-to:references:reply-to:x-mailer:mime-version :content-type:content-transfer-encoding; bh=2fKP+WSswqpLH4tDTFPmeuAveSr1kuZ2qfc/AGE3GVw=; b=dYGLrL4i2ByURrP5aKgCvocADISbCpTISQuS4n5iE5ybPDdmTtTrUCeRy0k/JRU/6G Av6FXjj+i1hQwv93SRQ7p5P6vHOiloD5BKNsMUbJRGpwZ1leGN/BaACi8ZWuyi3sVGpk xwHCiNaMf5yMzoJnPxWNXiL8hBX+PEuSVDRD0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma; h=date:from:to:cc:subject:message-id:in-reply-to:references:reply-to :x-mailer:mime-version:content-type:content-transfer-encoding; b=wVurB1pNVV8K9bcbX4Yr4wDHA/yMnVLT2C5sVXS+j4R2PZm6D5OKy00ij9swtou9mo /CTY89a6uYd6tSd7AzhNWl9ETpVodGBAwmYTeLlG9JZAnFWyoGYb8WDRZuJemgMrvGzl +JCs/jLenkHaplfVzK8c5q5StIdrjsvCjsrxs= Received: by 10.223.120.72 with SMTP id c8mr3500278far.65.1285603956860; Mon, 27 Sep 2010 09:12:36 -0700 (PDT) Received: from ernst.jennejohn.org (p578E3B24.dip.t-dialin.net [87.142.59.36]) by mx.google.com with ESMTPS id r8sm2507997faq.34.2010.09.27.09.12.35 (version=TLSv1/SSLv3 cipher=RC4-MD5); Mon, 27 Sep 2010 09:12:35 -0700 (PDT) Date: Mon, 27 Sep 2010 18:12:33 +0200 From: Gary Jennejohn To: Michael Naef Message-ID: <20100927181233.0e8c2869@ernst.jennejohn.org> In-Reply-To: <201009271624.46655.cal@linu.gs> 
References: <201009231938.09548.cal@linu.gs> <66757A1E-E445-4AAD-8F57-382D85BFD579@hostpoint.ch> <201009271624.46655.cal@linu.gs> X-Mailer: Claws Mail 3.7.6 (GTK+ 2.18.7; amd64-portbld-freebsd9.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: freebsd-fs Subject: Re: Strange behaviour with sappend flag set on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: gljennjohn@googlemail.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Sep 2010 16:12:38 -0000

On Mon, 27 Sep 2010 16:24:42 +0200
Michael Naef wrote:

> Hi all
>
> On Friday 24 September 2010 01:15:55 Markus Gebert wrote:
>
> > CURRENT and STABLE-8 seem to be affected too. The following patch
> > seems to fix it (at least Michi's test case works fine with
> > it):
> >
> > ----
> > diff -ru ../src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c ./sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> > --- ../src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c    2010-05-19 08:49:52.000000000 +0200
> > +++ ./sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c    2010-09-23 23:24:43.549846948 +0200
> > @@ -709,7 +709,7 @@
> >           */
> >          pflags = zp->z_phys->zp_flags;
> >          if ((pflags & (ZFS_IMMUTABLE | ZFS_READONLY)) ||
> > -            ((pflags & ZFS_APPENDONLY) && !(ioflag & FAPPEND) &&
> > +            ((pflags & ZFS_APPENDONLY) && !(ioflag & IO_APPEND) &&
> >              (uio->uio_loffset < zp->z_phys->zp_size))) {
> >                  ZFS_EXIT(zfsvfs);
> >                  return (EPERM);
> > ----
> >
> > Can someone commit this if the patch is ok? Or should I (or
> > Michi) open a PR?
>
> What's the next step? Is anybody willing and able to commit the patch,
> or should/must I open a PR? (I do have a patch for bash which solves the
> most urgent problem, but I need a decision.)
> Sending a PR is always a good idea - it ends up in the tracking system and doesn't get lost in the mailing-list noise. -- Gary Jennejohn From owner-freebsd-fs@FreeBSD.ORG Mon Sep 27 16:28:38 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CED3D1065673 for ; Mon, 27 Sep 2010 16:28:38 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 0AF9C8FC08 for ; Mon, 27 Sep 2010 16:28:37 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA13048; Mon, 27 Sep 2010 19:28:31 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CA0C62E.4080809@icyb.net.ua> Date: Mon, 27 Sep 2010 19:28:30 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: gljennjohn@googlemail.com References: <201009231938.09548.cal@linu.gs> <66757A1E-E445-4AAD-8F57-382D85BFD579@hostpoint.ch> <201009271624.46655.cal@linu.gs> <20100927181233.0e8c2869@ernst.jennejohn.org> In-Reply-To: <20100927181233.0e8c2869@ernst.jennejohn.org> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs Subject: Re: Strange behaviour with sappend flag set on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Sep 2010 16:28:38 -0000 on 27/09/2010 19:12 Gary Jennejohn said the following: > > Sending a PR is always a good idea - it ends up in the tracking > system and doesn't get lost in the mailing-list noise. 
Yeah, it just gets lost in the PR database instead :) While the thread is active there is some hope that someone would get hooked. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Mon Sep 27 21:23:11 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 96212106566B for ; Mon, 27 Sep 2010 21:23:11 +0000 (UTC) (envelope-from jhellenthal@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 53C008FC22 for ; Mon, 27 Sep 2010 21:23:11 +0000 (UTC) Received: by iwn34 with SMTP id 34so6388153iwn.13 for ; Mon, 27 Sep 2010 14:23:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:date:from :user-agent:mime-version:to:cc:subject:references:in-reply-to :x-enigmail-version:content-type:content-transfer-encoding; bh=EA9NdXItbk6sWNSzraOx0Y1AfiVVwznnxFf7JqcqZM0=; b=uOf5zcH4WsmULiD5E+nt6ZqEhgvhh+3o8PEUnatIGcAvAzrvJ21mh05x0ZRrm8zTnQ UrZ3SALquDUsu5fHhdsk67ZMsKDXhp9F2t5GvKmvLUsW1mONfz7aRNTTi6m2gDcXtzRF DNzifbXCcEpZbwO1nmTczMOb7o0T9X1VpmV28= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:content-type :content-transfer-encoding; b=FmlEMClIUAsFqkQgUofuwNkgTD+U86jeZmHWn7XI+M6ilAElue45nBdWCjctLV7SpH 8LNgKsMbl92CK/hNdIrDHinUoGSI6tGWHBv99SSDaQ3uk+hdKb3EkL/SytAqYw9dWOSt LWc1mXiWXnnBpY47YXOxuaXaP2gPKJKwnoB0s= Received: by 10.231.33.203 with SMTP id i11mr8914079ibd.8.1285622590414; Mon, 27 Sep 2010 14:23:10 -0700 (PDT) Received: from centel.dataix.local (adsl-99-181-158-110.dsl.klmzmi.sbcglobal.net [99.181.158.110]) by mx.google.com with ESMTPS id i6sm6655851iba.20.2010.09.27.14.23.08 (version=SSLv3 cipher=RC4-MD5); Mon, 27 Sep 2010 14:23:08 -0700 (PDT) Sender: "J. 
Hellenthal" Message-ID: <4CA10B3A.10401@DataIX.net> Date: Mon, 27 Sep 2010 17:23:06 -0400 From: jhell User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.9.2.9) Gecko/20100917 Lightning/1.0b1 Thunderbird MIME-Version: 1.0 To: FreeBSD Filesystems References: <201009231938.09548.cal@linu.gs> <66757A1E-E445-4AAD-8F57-382D85BFD579@hostpoint.ch> <201009271624.46655.cal@linu.gs> <20100927181233.0e8c2869@ernst.jennejohn.org> <4CA0C62E.4080809@icyb.net.ua> In-Reply-To: <4CA0C62E.4080809@icyb.net.ua> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Andriy Gapon Subject: Re: Strange behaviour with sappend flag set on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Sep 2010 21:23:11 -0000 On 09/27/2010 12:28, Andriy Gapon wrote: > on 27/09/2010 19:12 Gary Jennejohn said the following: >> >> Sending a PR is always a good idea - it ends up in the tracking >> system and doesn't get lost in the mailing-list noise. > > Yeah, it just gets lost in the PR database instead :) > While the thread is active there is some hope that someone would get hooked. > Yeah! ;) On the same note, hopes and wishes high: it would be really nice if all the userland flags worked, specifically arch, opaque, uappend, uchg and uunlink. But like I said, this is just a wish, and more a convenience than anything else. If I had to put something at the top of the list, it would be opaque and then uchg.
Regards, Hi Andriy, -- jhell,v From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 02:38:15 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E21DD106566B for ; Tue, 28 Sep 2010 02:38:15 +0000 (UTC) (envelope-from bsdunix44@gmail.com) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by mx1.freebsd.org (Postfix) with ESMTP id 9B9E38FC0C for ; Tue, 28 Sep 2010 02:38:15 +0000 (UTC) Received: by yxn35 with SMTP id 35so2200909yxn.13 for ; Mon, 27 Sep 2010 19:38:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:from:to :content-type:content-transfer-encoding:mime-version:subject:date :x-mailer; bh=yjWxlVVcQdNyYIImg2i8flXoTwAmx9ObTJSDPT6IMJ8=; b=dkndVuG5ykpA3orIu/+nkdaBYJZWjoyTSQ40i7qlI7JcTbgU5DnsSwUNYl1va87VKL eBO7wMJiUf9Rr83whdlmdJozKuiuaQRwzwmCFWTvYlnKxyaGS0jobQdN/aNTM0BXD+m9 aIF0aVUN78pDEK11NTrMESDBpuvPWgwotZvHM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:from:to:content-type:content-transfer-encoding :mime-version:subject:date:x-mailer; b=ga6zFZikABErEfuTkuiLbua12xacL2F5L7lm6j3mGy58Rr9M/gBugZarr+pYTZn0FQ JJH0i92RpqwiwSweInhBaLH6QWhmKViiLxiF6GakLsmF3jOcFPEmBYvezX4Flw7LG+ms vXlXX1pxG1U4tCnNZrj3msSw+PMcrWJb6rUV8= Received: by 10.150.12.9 with SMTP id 9mr10106824ybl.213.1285641492895; Mon, 27 Sep 2010 19:38:12 -0700 (PDT) Received: from [192.168.1.4] (ip98-164-15-137.ks.ks.cox.net [98.164.15.137]) by mx.google.com with ESMTPS id u42sm9966157yba.12.2010.09.27.19.38.11 (version=TLSv1/SSLv3 cipher=RC4-MD5); Mon, 27 Sep 2010 19:38:11 -0700 (PDT) Message-Id: <246F9240-10FE-4BD6-B72B-D374F9BB1FC9@gmail.com> From: Chris Watson To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v936) Date: 
Mon, 27 Sep 2010 21:38:04 -0500 X-Mailer: Apple Mail (2.936) Subject: zdb and zpool status inconsistency question... X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 02:38:16 -0000

Apologies if this is common knowledge, but I am confused by the output of zdb and zpool status. Running:

priyanka# zdb data
    version=14
    name='data'
    state=0
    txg=23
    pool_guid=7697236283104447800
    hostid=1421614680
    hostname='priyanka.open-systems.net'
    vdev_tree
        type='root'
        id=0
        guid=7697236283104447800
        children[0]
            type='mirror'
            id=0
            guid=13989036133163076272
            metaslab_array=26
            metaslab_shift=33
            ashift=9
            asize=1000199946240
            is_log=0
            children[0]
                type='disk'
                id=0
                guid=15173803910329500054
                path='/dev/ada2'
                whole_disk=0
            children[1]
                type='disk'
                id=1
                guid=17277025077506889808
                path='/dev/ada3'
                whole_disk=0
        children[1]
            type='mirror'
            id=1
            guid=5773672864445772603
            metaslab_array=23
            metaslab_shift=33
            ashift=9
            asize=1000199946240
            is_log=0
            children[0]
                type='disk'
                id=0
                guid=2441189965306101196
                path='/dev/ada4'
                whole_disk=0
            children[1]
                type='disk'
                id=1
                guid=6210476332908709518
                path='/dev/ada5'
                whole_disk=0
Uberblock

    magic = 0000000000bab10c
    version = 14
    txg = 11387
    guid_sum = 13222208345635842403
    timestamp = 1285637267 UTC = Mon Sep 27 20:27:47 2010

Dataset mos [META], ID 0, cr_txg 4, 1.06M, 44 objects
Dataset data/Aperture [ZPL], ID 31, cr_txg 37, 47.9G, 17596 objects
Dataset data [ZPL], ID 16, cr_txg 1, 19.0K, 5 objects
[...]
                         capacity   operations   bandwidth   ---- errors ----
description              used avail  read write  read write  read write cksum
data                    47.9G 1.77T   292     0 32.6M     0     0     0     2
  mirror                23.9G  904G   146     0 16.3M     0     0     0     6
    /dev/ada2                         141     0 16.5M     0     0     0     6
    /dev/ada3                         141     0 16.5M     0     0     0     6
  mirror                23.9G  904G   146     0 16.3M     0     0     0     2
    /dev/ada4                         141     0 16.5M     0     0     0     2
    /dev/ada5                         141     0 16.5M     0     0     0     2
priyanka#

That output shows cksum errors of 2, 6, 6, 6, 2, 2, 2 respectively.
Whereas running:

priyanka# zpool status -v data
  pool: data
 state: ONLINE
 scrub: scrub completed after 0h6m with 0 errors on Mon Sep 27 20:22:10 2010
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0

errors: No known data errors
priyanka#

So here are the two things I don't understand:

1) Why does zdb report cksum errors while zpool status does not?

2) Assuming zdb is correct, shouldn't zdb report 8 cksum errors for the pool "data" instead of 2, since the first mirror has 6 and the second mirror has 2?

The zdb man page is pretty sparse, and I know it's not meant to be run by the average Joe. I'm just trying to learn ZFS as thoroughly as I can, so while I have a test system I am trying many configurations and options to learn how it works and why. The above inconsistency confused me. Again, apologies if this is covered elsewhere. Thanks for any schooling about the above!
Chris From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 11:24:38 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 622EE1065670; Tue, 28 Sep 2010 11:24:38 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from mail.digiware.nl (mail.ip6.digiware.nl [IPv6:2001:4cb8:1:106::2]) by mx1.freebsd.org (Postfix) with ESMTP id F003C8FC22; Tue, 28 Sep 2010 11:24:37 +0000 (UTC) Received: from localhost (localhost.digiware.nl [127.0.0.1]) by mail.digiware.nl (Postfix) with ESMTP id F0615153434; Tue, 28 Sep 2010 13:24:36 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from mail.digiware.nl ([127.0.0.1]) by localhost (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id d7i5zrXYXYnO; Tue, 28 Sep 2010 13:24:31 +0200 (CEST) Received: from [127.0.0.1] (opteron [192.168.10.67]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.digiware.nl (Postfix) with ESMTPSA id 08986153433; Tue, 28 Sep 2010 13:24:31 +0200 (CEST) Message-ID: <4CA1D06C.9050305@digiware.nl> Date: Tue, 28 Sep 2010 13:24:28 +0200 From: Willem Jan Withagen User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: stable@freebsd.org, fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 11:24:38 -0000 Hi, This is with stable as of yesterday, but with an untuned ZFS box I was still able to generate a kmem exhausted panic. Hard panic, just 3 lines. The box contains 12 GB of memory and runs on a 6-core (with HT) Xeon.
It has 6x 2 TB WD Caviar Black drives in raidz2 with 2x 512 MB mirrored log devices. The box died while rsyncing 5.8 TB from its partner system (that was the only activity on the box). So the obvious conclusion would be that auto-tuning for ZFS on 8.1-STABLE is not quite there yet, and that we still need tuning advice even for 8.1 to prevent a hard panic. At the moment I am trying to transfer the data with 'zfs send | rsh zfs receive' instead, which seems to run at about 40Mb/sec and is a lot faster than rsync. --WjW From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 12:04:05 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 124591065673 for ; Tue, 28 Sep 2010 12:04:05 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta01.westchester.pa.mail.comcast.net (qmta01.westchester.pa.mail.comcast.net [76.96.62.16]) by mx1.freebsd.org (Postfix) with ESMTP id B4A3C8FC0C for ; Tue, 28 Sep 2010 12:04:04 +0000 (UTC) Received: from omta10.westchester.pa.mail.comcast.net ([76.96.62.28]) by qmta01.westchester.pa.mail.comcast.net with comcast id CAax1f0060cZkys51BqqWX; Tue, 28 Sep 2010 11:50:50 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta10.westchester.pa.mail.comcast.net with comcast id CBqo1f00P3LrwQ23WBqp1Z; Tue, 28 Sep 2010 11:50:50 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 5DA489B418; Tue, 28 Sep 2010 04:50:47 -0700 (PDT) Date: Tue, 28 Sep 2010 04:50:47 -0700 From: Jeremy Chadwick To: Willem Jan Withagen Message-ID: <20100928115047.GA62142@icarus.home.lan> References: <4CA1D06C.9050305@digiware.nl> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4CA1D06C.9050305@digiware.nl> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id:
Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 12:04:05 -0000

On Tue, Sep 28, 2010 at 01:24:28PM +0200, Willem Jan Withagen wrote:
> This is with stable as of yesterday,but with an un-tunned ZFS box I
> was still able to generate a kmem exhausted panic.
> Hard panic, just 3 lines.
>
> The box contains 12Gb memory, runs on a 6 core (with HT) xeon.
> 6* 2T WD black caviar in raidz2 with 2*512Mb mirrored log.
>
> The box died while rsyncing 5.8T from its partnering system.
> (that was the only activity on the box)

It would help if you could provide output from the following commands
(even after the box has rebooted):

$ sysctl -a | egrep ^vm.kmem
$ sysctl -a | egrep ^vfs.zfs.arc
$ sysctl kstat.zfs.misc.arcstats

> So the obvious would to conclude that auto-tuning for ZFS on
> 8.1-Stable is not yet quite there.
>
> So I guess that we still need tuning advice even for 8.1.
> And thus prevent a hard panic.

Andriy Gapon provides this general recommendation:

http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/059114.html

The advice I've given for RELENG_8 (as of the time of this writing),
8.1-STABLE, and 8.1-RELEASE, is that for amd64 you'll need to tune:

vm.kmem_size
vfs.zfs.arc_max

An example machine: amd64, with 4GB physical RAM installed (3916MB
available for use, verified via dmesg) uses these values:

vm.kmem_size="4096M"
vfs.zfs.arc_max="3584M"

Another example machine: amd64, with 8GB physical RAM installed (7875MB
available for use) uses these values:

vm.kmem_size="8192M"
vfs.zfs.arc_max="6144M"

I believe the trick -- Andriy, please correct me if I'm wrong -- is the
tuning of vfs.zfs.arc_max, which is now a hard limit rather than a "high
watermark".

However, I believe there have been occasional reports of exhaustion
panics despite both of these being set[1]. Those reports are being
investigated on an individual basis.
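The two example machines can be checked with a little arithmetic, under the thread's working assumption that vfs.zfs.arc_max has to fit inside vm.kmem_size with room left over for the kernel's other allocations. Below is a minimal C sketch that converts loader.conf-style sizes to bytes and reports that headroom. parse_size() and kmem_headroom() are hypothetical helpers, and treating the K/M/G/T suffixes as 1024-based multiples is an assumption about how the loader reads them:

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Convert a loader.conf-style size such as "4096M" or "6G" to bytes.
 * An optional K, M, G or T suffix (any case) is treated as a power of 1024.
 * Returns 0 for malformed input.
 */
uint64_t parse_size(const char *s)
{
    char *end;
    unsigned long long v = strtoull(s, &end, 10);

    if (end == s)
        return 0;                       /* no digits at all */
    switch (toupper((unsigned char)*end)) {
    case '\0':
        return v;                       /* plain byte count */
    case 'T': v <<= 10;                 /* fall through */
    case 'G': v <<= 10;                 /* fall through */
    case 'M': v <<= 10;                 /* fall through */
    case 'K': v <<= 10;
        end++;                          /* consume the suffix */
        break;
    default:
        return 0;                       /* unknown suffix */
    }
    return *end == '\0' ? v : 0;        /* reject trailing junk */
}

/* Bytes of vm.kmem_size left once the ARC reaches arc_max (negative if it cannot fit). */
int64_t kmem_headroom(const char *kmem_size, const char *arc_max)
{
    return (int64_t)parse_size(kmem_size) - (int64_t)parse_size(arc_max);
}
```

For the 4GB machine, kmem_headroom("4096M", "3584M") is 512 MiB; for the 8GB machine, kmem_headroom("8192M", "6144M") is 2 GiB. The sketch only does the arithmetic; it says nothing about how much headroom is actually enough.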
I set some other ZFS-related parameters as well (disabling prefetch, adjusting txg.timeout, etc.), but those shouldn't be necessary to gain stability at this point in time. I can't provide tuning advice for i386. [1]: http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/059109.html -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 12:37:28 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D35BC106564A for ; Tue, 28 Sep 2010 12:37:28 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 1E7FF8FC0C for ; Tue, 28 Sep 2010 12:37:27 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA03464; Tue, 28 Sep 2010 15:22:02 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CA1DDE9.8090107@icyb.net.ua> Date: Tue, 28 Sep 2010 15:22:01 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeremy Chadwick References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> In-Reply-To: <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 12:37:28 -0000 on 28/09/2010 14:50 Jeremy Chadwick said the following: > I believe the trick -- Andriy, please correct me if I'm wrong -- is the Wouldn't hurt to CC me, so that I could do it :-) > tuning of vfs.zfs.arc_max, which is now a hard limit rather than a "high > watermark". Not sure what you mean here. What is hard limit, what is high watermark, what is the difference and when is "now"? :-) I believe that "the trick" is to set vm.kmem_size high enough, either using this tunable or vm.kmem_size_scale. > However, I believe there have been occasional reports of exhaustion > panics despite both of these being set[1]. Those reports are being > investigated on an individual basis. I don't believe that the report that you quote actually demonstrates what you say it does. Two quotes from it: "During these panics no tuning or /boot/loader.conf values where present." "Only after hitting this behaviour yesterday i created boot/loader.conf" > > [1]: http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/059109.html > -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 13:25:39 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 927601065673; Tue, 28 Sep 2010 13:25:39 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from mail.digiware.nl (mail.ip6.digiware.nl [IPv6:2001:4cb8:1:106::2]) by mx1.freebsd.org (Postfix) with ESMTP id DF9AC8FC17; Tue, 28 Sep 2010 13:25:38 +0000 (UTC) Received: from localhost (localhost.digiware.nl [127.0.0.1]) by mail.digiware.nl (Postfix) with ESMTP id 47227153434; Tue, 28 Sep 2010 15:25:37 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from mail.digiware.nl ([127.0.0.1]) by localhost (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Cri7K4Ph43q9; Tue, 28 Sep 2010 15:25:33 +0200 (CEST)
Received: from [127.0.0.1] (unknown [192.168.254.10]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.digiware.nl (Postfix) with ESMTPSA id 63A24153433; Tue, 28 Sep 2010 15:25:33 +0200 (CEST) Message-ID: <4CA1ECCC.4070801@digiware.nl> Date: Tue, 28 Sep 2010 15:25:32 +0200 From: Willem Jan Withagen User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeremy Chadwick References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142@icarus.home.lan> In-Reply-To: <20100928115047.GA62142@icarus.home.lan> X-Enigmail-Version: 1.1.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: stable@freebsd.org, "avg@icyb.net.ua >> Andriy Gapon" , fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 13:25:39 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 28-9-2010 13:50, Jeremy Chadwick wrote: > On Tue, Sep 28, 2010 at 01:24:28PM +0200, Willem Jan Withagen wrote: >> This is with stable as of yesterday,but with an un-tunned ZFS box I >> was still able to generate a kmem exhausted panic. >> Hard panic, just 3 lines. >> >> The box contains 12Gb memory, runs on a 6 core (with HT) xeon. >> 6* 2T WD black caviar in raidz2 with 2*512Mb mirrored log. >> >> The box died while rsyncing 5.8T from its partnering system. >> (that was the only activity on the box) > > It would help if you could provide output from the following commands > (even after the box has rebooted): It is currently in the process of a zfs receive of that same 5.8T.
> $ sysctl -a | egrep ^vm.kmem
> $ sysctl -a | egrep ^vfs.zfs.arc
> $ sysctl kstat.zfs.misc.arcstats
>

> sysctl -a | egrep ^vm.kmem
vm.kmem_size_scale: 3
vm.kmem_size_max: 329853485875
vm.kmem_size_min: 0
vm.kmem_size: 4156850176

> sysctl -a | egrep ^vfs.zfs.arc
vfs.zfs.arc_meta_limit: 770777088
vfs.zfs.arc_meta_used: 33449648
vfs.zfs.arc_min: 385388544
vfs.zfs.arc_max: 3083108352

> sysctl kstat.zfs.misc.arcstats
kstat.zfs.misc.arcstats.hits: 3119873
kstat.zfs.misc.arcstats.misses: 98710
kstat.zfs.misc.arcstats.demand_data_hits: 3043947
kstat.zfs.misc.arcstats.demand_data_misses: 3699
kstat.zfs.misc.arcstats.demand_metadata_hits: 67981
kstat.zfs.misc.arcstats.demand_metadata_misses: 90005
kstat.zfs.misc.arcstats.prefetch_data_hits: 121
kstat.zfs.misc.arcstats.prefetch_data_misses: 48
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 7824
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 4958
kstat.zfs.misc.arcstats.mru_hits: 34828
kstat.zfs.misc.arcstats.mru_ghost_hits: 21736
kstat.zfs.misc.arcstats.mfu_hits: 3077133
kstat.zfs.misc.arcstats.mfu_ghost_hits: 47605
kstat.zfs.misc.arcstats.allocated: 5507025
kstat.zfs.misc.arcstats.deleted: 5349715
kstat.zfs.misc.arcstats.stolen: 4468221
kstat.zfs.misc.arcstats.recycle_miss: 83995
kstat.zfs.misc.arcstats.mutex_miss: 231
kstat.zfs.misc.arcstats.evict_skip: 130461
kstat.zfs.misc.arcstats.evict_l2_cached: 0
kstat.zfs.misc.arcstats.evict_l2_eligible: 592200836608
kstat.zfs.misc.arcstats.evict_l2_ineligible: 11000092160
kstat.zfs.misc.arcstats.hash_elements: 20585
kstat.zfs.misc.arcstats.hash_elements_max: 150543
kstat.zfs.misc.arcstats.hash_collisions: 761847
kstat.zfs.misc.arcstats.hash_chains: 780
kstat.zfs.misc.arcstats.hash_chain_max: 6
kstat.zfs.misc.arcstats.p: 2266075295
kstat.zfs.misc.arcstats.c: 2410082200
kstat.zfs.misc.arcstats.c_min: 385388544
kstat.zfs.misc.arcstats.c_max: 3083108352
kstat.zfs.misc.arcstats.size: 2410286720
kstat.zfs.misc.arcstats.hdr_size: 7565040
kstat.zfs.misc.arcstats.data_size: 2394099200
kstat.zfs.misc.arcstats.other_size: 8622480
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_read_bytes: 0
kstat.zfs.misc.arcstats.l2_write_bytes: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 0
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 0
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 0
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0
kstat.zfs.misc.arcstats.l2_write_in_l2: 0
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 0
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 85908
kstat.zfs.misc.arcstats.l2_write_full: 0
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 0
kstat.zfs.misc.arcstats.l2_write_pios: 0
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 0

>> So the obvious would to conclude that auto-tuning for ZFS on
>> 8.1-Stable is not yet quite there.
>>
>> So I guess that we still need tuning advice even for 8.1.
>> And thus prevent a hard panic.
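Several relationships can be read straight off the sysctl numbers in this message: vfs.zfs.arc_max is exactly vm.kmem_size minus 1 GiB, arc_min is arc_max/8, arc_meta_limit is arc_max/4, the ARC size equals hdr_size + data_size + other_size, and the overall hit ratio is about 96.9%. That the ARC limits track vm.kmem_size this way is consistent with Andriy's point that vm.kmem_size is the knob that matters, but the small C sketch below only checks the arithmetic for this one snapshot, not the auto-tuning code itself. The struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the sysctl output in this message. */
struct arc_snapshot {
    uint64_t kmem_size;       /* vm.kmem_size */
    uint64_t arc_max;         /* vfs.zfs.arc_max */
    uint64_t arc_min;         /* vfs.zfs.arc_min */
    uint64_t arc_meta_limit;  /* vfs.zfs.arc_meta_limit */
    uint64_t hits, misses;    /* kstat.zfs.misc.arcstats.{hits,misses} */
    uint64_t size;            /* kstat.zfs.misc.arcstats.size */
    uint64_t hdr_size, data_size, other_size;
};

const struct arc_snapshot wjw_box = {
    .kmem_size      = UINT64_C(4156850176),
    .arc_max        = UINT64_C(3083108352),
    .arc_min        = UINT64_C(385388544),
    .arc_meta_limit = UINT64_C(770777088),
    .hits           = UINT64_C(3119873),
    .misses         = UINT64_C(98710),
    .size           = UINT64_C(2410286720),
    .hdr_size       = UINT64_C(7565040),
    .data_size      = UINT64_C(2394099200),
    .other_size     = UINT64_C(8622480),
};

/* Share of ARC lookups that hit, in tenths of a percent (rounded down). */
uint64_t hit_ratio_permille(uint64_t hits, uint64_t misses)
{
    uint64_t total = hits + misses;
    return total != 0 ? hits * 1000 / total : 0;
}

/* Current ARC size relative to arc_max, in whole percent (rounded down). */
uint64_t arc_fill_percent(const struct arc_snapshot *s)
{
    return s->arc_max != 0 ? s->size * 100 / s->arc_max : 0;
}
```

For this box, hit_ratio_permille() gives 969 (96.9%) and arc_fill_percent() gives 78, i.e. the ARC sat at roughly 78% of arc_max when the snapshot was taken.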
>
> Andriy Gapon provides this general recommendation:
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/059114.html
>
> The advice I've given for RELENG_8 (as of the time of this writing),
> 8.1-STABLE, and 8.1-RELEASE, is that for amd64 you'll need to tune:

Well, advice seems to vary, and the latest I understood was that
8.1-STABLE did not need any tuning. (The other system, with a much older
kernel, is tuned according to what most people here are suggesting.)
And I was sure led to believe that ever since 8.0 these panics were no
longer among us...

>
> vm.kmem_size
> vfs.zfs.arc_max

real memory  = 12889096192 (12292 MB)
avail memory = 12408684544 (11833 MB)

So that prompts vm.kmem_size=18G.

From the other post:
> As to arc_max/arc_min, set them based your needs according to general
> ZFS recommendations.

I'm seriously at a loss as to what the general recommendations would be.
The other box has 8G; its loader.conf:

vm.kmem_size="14G"        # 2* phys RAM size for ZFS perf.
vm.kmem_size_scale="1"
vfs.zfs.arc_min="1G"
vfs.zfs.arc_max="6G"

So I'd select something like 11G for arc_max on a box with 12G mem.

> I believe the trick -- Andriy, please correct me if I'm wrong -- is the
> tuning of vfs.zfs.arc_max, which is now a hard limit rather than a "high
> watermark".

> I can't provide tuning advice for i386.

This is amd64.
- --WjW -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (MingW32) iQEcBAEBAgAGBQJMoezMAAoJEP4k4K6R6rBhEScIAI/rZH5/VTmASMGyEYu4NZHU SSFo3TOSOkYPEJicd8/NgM7w7D3xgMA0Xse0fu3tQOsjX940Z6fUKvnM7LCX2OJK vvkW0LpGuKbv/9sFFvkklodjkArtRzzoptLtiCVsaYsoieRqnmYMpBxU9WFYCY2I HoRx1nMbArg2HvKPzeZjf9knnQaU6YOR/PUiFBo6YuHkDJ40noqRElewbPEiOVZz zqnUh90ZDFVdHMYNuZegOKtfSVCA1AifHR3e7+zn8jSco/+svESd7tBIxmHZWQ8u BA1AKyYVTHs+wKsTw2J7u1v8yg74HxJNyVqwPRP048Z8onoPlGgtnFCTWbl2ICU= =KiyH -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 13:36:47 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1C76E106564A; Tue, 28 Sep 2010 13:36:47 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 27D078FC0C; Tue, 28 Sep 2010 13:36:45 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA04596; Tue, 28 Sep 2010 16:36:41 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CA1EF69.4040402@icyb.net.ua> Date: Tue, 28 Sep 2010 16:36:41 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeremy Chadwick References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> <4CA1DDE9.8090107@icyb.net.ua> <20100928132355.GA63149@icarus.home.lan> In-Reply-To: <20100928132355.GA63149@icarus.home.lan> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 13:36:47 -0000 on 28/09/2010 16:23 Jeremy Chadwick said the following: > On Tue, Sep 28, 2010 at 03:22:01PM +0300, Andriy Gapon wrote: >> on 28/09/2010 14:50 Jeremy Chadwick said the following: >>> I believe the trick -- Andriy, please correct me if I'm wrong -- is the >> >> Wouldn't hurt to CC me, so that I could do it :-) >> >>> tuning of vfs.zfs.arc_max, which is now a hard limit rather than a "high >>> watermark". >> >> Not sure what you mean here. >> What is hard limit, what is high watermark, what is the difference and when is >> "now"? :-) > > There was some speculation on the part of users a while back which lead > to this understanding. Folks were seeing actual ARC usage higher than > what vfs.zfs.arc_max was set to (automatically or administratively). I > believe it started here: > > http://www.mailinglistarchive.com/freebsd-current@freebsd.org/msg28884.html > > With the "high-water mark" statements being here: > > http://www.mailinglistarchive.com/freebsd-current@freebsd.org/msg28887.html > http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2010-04/msg00129.html > > The term implies that there is not an explicitly hard limit on the ARC > utilisation/growth. As stated in the unix.derkeiler.com URL above, this > behaviour was in fact changed. Why/when/how? I had to go digging up > the commits -- this took me some time. Here they are, labelled r197816, > for RELENG_8 and RELENG_7 respectively. 
These were both committed on > 2010/01/08 UTC: > > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.22.2.2 > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.15.2.6 > > In HEAD/CURRENT (yet to be MFC'd), it looks like above code got removed > on 2010/09/17 UTC, citing they should be "enforced by actual > calculations of delta": > > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.46 > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.45 > > So what's this "delta" code piece that's mentioned? That appears to be > have been committed to RELENG_8 on 2010/05/24 UTC (thus, between the > above two dates): > > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.22.2.4 > > (Side note: the "delta stuff" was never committed to RELENG_7 -- and > that's fine. I'm pointing this out not out of retaliation or insult, > but because people will almost certainly Google, find this post, and > wonder if their 7.x machines might be affected.) > > This situation with the ARC, and all its changes over time, is one of > the reasons why I rant aggressively about the need for more > communication transparency (re: what the changes actually affect). Most > SAs and users don't follow commits. Well, no time for me to dig through all that history. arc_max should be a hard limit and it is now. If it ever wasn't then it was a bug. Besides, "high watermark" is still an ambiguous term, for you it "implies" that it is not a hard limit, but for me it "implies" exactly a hard limit. Additionally, going from "non-hard limit" to a "hard limit" on ARC size should improve things memory-wise, not vice versa, right? :) P.S. 
All that I said above is a hint that this is a pointless branch of the thread :) -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 13:37:10 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5E5B610657C7 for ; Tue, 28 Sep 2010 13:37:10 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta04.emeryville.ca.mail.comcast.net (qmta04.emeryville.ca.mail.comcast.net [76.96.30.40]) by mx1.freebsd.org (Postfix) with ESMTP id 445A68FC1D for ; Tue, 28 Sep 2010 13:37:10 +0000 (UTC) Received: from omta06.emeryville.ca.mail.comcast.net ([76.96.30.51]) by qmta04.emeryville.ca.mail.comcast.net with comcast id CC4a1f00416AWCUA4DPxtk; Tue, 28 Sep 2010 13:23:57 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta06.emeryville.ca.mail.comcast.net with comcast id CDPw1f0013LrwQ28SDPwRH; Tue, 28 Sep 2010 13:23:57 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id EC1849B418; Tue, 28 Sep 2010 06:23:55 -0700 (PDT) Date: Tue, 28 Sep 2010 06:23:55 -0700 From: Jeremy Chadwick To: Andriy Gapon Message-ID: <20100928132355.GA63149@icarus.home.lan> References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> <4CA1DDE9.8090107@icyb.net.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4CA1DDE9.8090107@icyb.net.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 13:37:10 -0000 On Tue, Sep 28, 2010 at 03:22:01PM +0300, Andriy Gapon wrote: > on 28/09/2010 14:50 Jeremy Chadwick said the following: > > I believe the trick -- Andriy, 
please correct me if I'm wrong -- is the > > Wouldn't hurt to CC me, so that I could do it :-) > > > tuning of vfs.zfs.arc_max, which is now a hard limit rather than a "high > > watermark". > > Not sure what you mean here. > What is hard limit, what is high watermark, what is the difference and when is > "now"? :-) There was some speculation on the part of users a while back which led to this understanding. Folks were seeing actual ARC usage higher than what vfs.zfs.arc_max was set to (automatically or administratively). I believe it started here: http://www.mailinglistarchive.com/freebsd-current@freebsd.org/msg28884.html With the "high-water mark" statements being here: http://www.mailinglistarchive.com/freebsd-current@freebsd.org/msg28887.html http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2010-04/msg00129.html The term implies that there is not an explicit hard limit on the ARC utilisation/growth. As stated in the unix.derkeiler.com URL above, this behaviour was in fact changed. Why/when/how? I had to go digging up the commits -- this took me some time. Here they are, labelled r197816, for RELENG_8 and RELENG_7 respectively. These were both committed on 2010/01/08 UTC: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.22.2.2 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.15.2.6 In HEAD/CURRENT (yet to be MFC'd), it looks like the above code was removed on 2010/09/17 UTC, citing they should be "enforced by actual calculations of delta": http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.46 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.45 So what's this "delta" code piece that's mentioned?
That appears to have been committed to RELENG_8 on 2010/05/24 UTC (thus, between the above two dates): http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.22.2.4 (Side note: the "delta stuff" was never committed to RELENG_7 -- and that's fine. I'm pointing this out not out of retaliation or insult, but because people will almost certainly Google, find this post, and wonder if their 7.x machines might be affected.) This situation with the ARC, and all its changes over time, is one of the reasons why I rant aggressively about the need for more communication transparency (re: what the changes actually affect). Most SAs and users don't follow commits. > I believe that "the trick" is to set vm.kmem_size high enough, either using this > tunable or vm.kmem_size_scale. Thanks for the clarification. I just wish I knew how vm.kmem_size_scale fits into the picture (meaning what it does, etc.). The sysctl description isn't very helpful. Again, my lack of VM knowledge... > > However, I believe there have been occasional reports of exhaustion > > panics despite both of these being set[1]. Those reports are being > > investigated on an individual basis. > > I don't believe that the report that you quote actually demonstrates what you say > it does. > Two quotes from it: > "During these panics no tuning or /boot/loader.conf values where present." > "Only after hitting this behaviour yesterday i created boot/loader.conf" > > > [1]: http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/059109.html You're right -- the report I'm quoting is not the one I thought it was. I'll see if I can dig up the correct mail/report. It could be that I'm thinking of something quite old (pre-ARC-changes (see above paragraphs)). I can barely keep track of all the changes going on.
-- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 13:39:10 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 430CF10656B2; Tue, 28 Sep 2010 13:39:10 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 5AD548FC1E; Tue, 28 Sep 2010 13:39:09 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA04620; Tue, 28 Sep 2010 16:39:06 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CA1EFF9.1050802@icyb.net.ua> Date: Tue, 28 Sep 2010 16:39:05 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeremy Chadwick References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> <4CA1DDE9.8090107@icyb.net.ua> <20100928132355.GA63149@icarus.home.lan> In-Reply-To: <20100928132355.GA63149@icarus.home.lan> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 13:39:10 -0000 on 28/09/2010 16:23 Jeremy Chadwick said the following: > On Tue, Sep 28, 2010 at 03:22:01PM +0300, Andriy Gapon wrote: >> I believe that "the trick" is to set vm.kmem_size high enough, 
either using this >> tunable or vm.kmem_size_scale. > > Thanks for the clarification. I just wish I knew how vm.kmem_size_scale > fits into the picture (meaning what it does, etc.). The sysctl > description isn't very helpful. Again, my lack of VM knowledge... > Roughly, vm.kmem_size would get set to the physical memory size divided by vm.kmem_size_scale. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 13:46:34 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 04837106566C; Tue, 28 Sep 2010 13:46:34 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 10D8B8FC08; Tue, 28 Sep 2010 13:46:32 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA04711; Tue, 28 Sep 2010 16:46:29 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CA1F1B4.1020700@icyb.net.ua> Date: Tue, 28 Sep 2010 16:46:28 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Willem Jan Withagen References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142@icarus.home.lan> <4CA1ECCC.4070801@digiware.nl> In-Reply-To: <4CA1ECCC.4070801@digiware.nl> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 13:46:34 -0000 on 28/09/2010 16:25 Willem Jan Withagen said the following: > Well, advice seems to vary, and the latest I understood was
that > 8.1-stable did not need any tuning. (The other system with a much older > kernel is tuned according to what most here are suggesting) > And I was sure led to believe that ever since 8.0 panics were no longer > among us...... Well, now you have demonstrated yourself that it is not always so. >> vm.kmem_size >> vfs.zfs.arc_max > > real memory = 12889096192 (12292 MB) > avail memory = 12408684544 (11833 MB) > > So that prompts vm.kmem_size=18G. > > From the other post: >> As to arc_max/arc_min, set them based on your needs according to general >> ZFS recommendations. > > I'm seriously at a loss as to what the general recommendations would be. Have you asked Mr. Google? :) - http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide Search for "Memory and Dynamic Reconfiguration Recommendation" - http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache Short version - decide how much memory you need for everything else but ZFS ARC. If the autotuned value suits you, then you don't need to change anything. > The other box has 8G > loader.conf: > vm.kmem_size="14G" # 2* phys RAM size for ZFS perf. > vm.kmem_size_scale="1" No need to set both of the above. vm.kmem_size overrides vm.kmem_size_scale. > vfs.zfs.arc_min="1G" > vfs.zfs.arc_max="6G" > > So I'd select something like 11G for arc_max on a box with 12G mem.
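Andriy's two answers in this exchange (vm.kmem_size is roughly physical memory divided by vm.kmem_size_scale, and an explicit vm.kmem_size overrides the scale) can be sketched as a small resolution rule. This is a hypothetical helper, not FreeBSD code: it assumes loader.conf-style string values, understands only a trailing G suffix, takes the platform's default scale as a caller-supplied parameter because the thread does not state it, and ignores any clamping the kernel may apply afterwards.

```python
from typing import Mapping

MIB = 1024 ** 2
GIB = 1024 ** 3


def parse_size(value: str) -> int:
    """Parse a loader.conf-style size such as '"14G"' or a plain byte count.

    Hypothetical, minimal parser: only a trailing G suffix is understood.
    """
    value = value.strip().strip('"')
    if value[-1:].upper() == "G":
        return int(value[:-1]) * GIB
    return int(value)


def effective_kmem_size(tunables: Mapping[str, str], physmem: int,
                        default_scale: int) -> int:
    """Roughly resolve vm.kmem_size from loader.conf-style tunables.

    An explicit vm.kmem_size wins; otherwise the result is physical memory
    divided by vm.kmem_size_scale (or default_scale if the scale is unset).
    Kernel-side clamping is deliberately not modelled.
    """
    if "vm.kmem_size" in tunables:
        return parse_size(tunables["vm.kmem_size"])
    scale = int(tunables.get("vm.kmem_size_scale", str(default_scale)).strip('"'))
    if scale < 1:
        raise ValueError("vm.kmem_size_scale must be >= 1")
    return physmem // scale


# Willem's 8G box sets both tunables; vm.kmem_size overrides the scale.
other_box = {"vm.kmem_size": '"14G"', "vm.kmem_size_scale": '"1"'}
print(effective_kmem_size(other_box, 8 * GIB, default_scale=1) // GIB)  # 14

# The 12G box (real memory = 12889096192 bytes) with a scale of 1.
print(effective_kmem_size({"vm.kmem_size_scale": "1"}, 12889096192, 1) // MIB)  # 12292
```

This is also why setting both tunables, as on the 8G box, is redundant: the scale is only consulted when vm.kmem_size is absent.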
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 14:02:33 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0DD92106567A; Tue, 28 Sep 2010 14:02:33 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from mail.digiware.nl (mail.ip6.digiware.nl [IPv6:2001:4cb8:1:106::2]) by mx1.freebsd.org (Postfix) with ESMTP id 721E38FC15; Tue, 28 Sep 2010 14:02:32 +0000 (UTC) Received: from localhost (localhost.digiware.nl [127.0.0.1]) by mail.digiware.nl (Postfix) with ESMTP id C4C7515346A; Tue, 28 Sep 2010 16:02:31 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from mail.digiware.nl ([127.0.0.1]) by localhost (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id WAt0f0YQumbi; Tue, 28 Sep 2010 16:02:25 +0200 (CEST) Received: from [127.0.0.1] (unknown [192.168.254.10]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.digiware.nl (Postfix) with ESMTPSA id 9B7D215346C; Tue, 28 Sep 2010 16:02:25 +0200 (CEST) Message-ID: <4CA1F570.6000602@digiware.nl> Date: Tue, 28 Sep 2010 16:02:24 +0200 From: Willem Jan Withagen User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Andriy Gapon References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142@icarus.home.lan> <4CA1ECCC.4070801@digiware.nl> <4CA1F1B4.1020700@icyb.net.ua> In-Reply-To: <4CA1F1B4.1020700@icyb.net.ua> X-Enigmail-Version: 1.1.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 14:02:33 -0000 
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 28-9-2010 15:46, Andriy Gapon wrote: > on 28/09/2010 16:25 Willem Jan Withagen said the following: >> Well, advice seems to vary, and the latest I understood was that >> 8.1-stable did not need any tuning. (The other system with a much >> older kernel is tuned according to what most here are suggesting) And I >> was sure led to believe that ever since 8.0 panics were no longer >> among us...... > > Well, now you have demonstrated yourself that it is not always so. I thought I should share the knowledge. ;) That is not a bad thing for those (starting to) use ZFS. I do not read commits, but do read a lot of FreeBSD groups. And for me there is still a shroud of black art over ZFS. Just glad that my main fileserver doesn't crash. (knock on wood). >>> vm.kmem_size vfs.zfs.arc_max >> >> real memory = 12889096192 (12292 MB) avail memory = 12408684544 >> (11833 MB) >> >> So that prompts vm.kmem_size=18G. >> >> From the other post: >>> As to arc_max/arc_min, set them based on your needs according to >>> general ZFS recommendations. >> >> I'm seriously at a loss as to what the general recommendations would be. > > Have you asked Mr. Google? :) - > http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide > > Search for "Memory and Dynamic Reconfiguration Recommendation" > - > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache > > Short version - decide how much memory you need for everything else > but ZFS ARC. > If the autotuned value suits you, then you don't need to change > anything. I do have (read) this document, but still that doesn't really give you guidelines for tuning on FreeBSD. It is a fileserver without any serious other apps. I was using "auto-tuned", and that crashed my box. That is what started this whole thread.
- --WjW -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (MingW32) iQEcBAEBAgAGBQJMofVwAAoJEP4k4K6R6rBhFaUH/3wahrGWO71+xBhHi/ayNoaf DfbOWMD262XfualJudPRgoji7xb9lGaRmd4emv7QBcDjqzmcsiyIeXskT5IYKj7P DvJDULIH66iKQrRZeIBouMXMhLfiLjjT85Lj1hE8fuGg8NAOv97dnUwvVIwC0/Ai yzeeEHYivCYbRmzBhISlAWjdpSXk7xVs6gZnaLUUp953+Uv/8KmNLeG+laoWn+Hn wdKHUG3kR0g/XwJIMc5dZzYvs2kdDPh47uLythoYGC0yaLCwtxLHqEGIPtb/Gypy nIIWxOGtueJo2HjpS0+HlX/pTRW8tfYzXTzKgFKDd90t9fDt2p18BPSexuJSLVc= =hSAg -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 14:07:34 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D1CBA106566C; Tue, 28 Sep 2010 14:07:34 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id C6EA28FC1A; Tue, 28 Sep 2010 14:07:33 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA05060; Tue, 28 Sep 2010 17:07:29 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CA1F6A0.20109@icyb.net.ua> Date: Tue, 28 Sep 2010 17:07:28 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Willem Jan Withagen References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142@icarus.home.lan> <4CA1ECCC.4070801@digiware.nl> <4CA1F1B4.1020700@icyb.net.ua> <4CA1F570.6000602@digiware.nl> In-Reply-To: <4CA1F570.6000602@digiware.nl> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 14:07:34 -0000 on 28/09/2010 17:02 Willem Jan Withagen said the following: > I do have (read) this document, but still that doesn't really give you > guidelines for tuning on FreeBSD. It is a fileserver without any serious > other apps. > I was using "auto-tuned", and that crashed my box. That is what started > this whole thread. Well, as I've said, in my opinion FreeBSD-specific tuning ends at setting kmem size. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 14:09:06 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0B4DE106564A; Tue, 28 Sep 2010 14:09:06 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from mail.digiware.nl (mail.ip6.digiware.nl [IPv6:2001:4cb8:1:106::2]) by mx1.freebsd.org (Postfix) with ESMTP id 7A4AC8FC25; Tue, 28 Sep 2010 14:09:05 +0000 (UTC) Received: from localhost (localhost.digiware.nl [127.0.0.1]) by mail.digiware.nl (Postfix) with ESMTP id D8026153434; Tue, 28 Sep 2010 16:09:04 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from mail.digiware.nl ([127.0.0.1]) by localhost (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NQjP-WbBQvEU; Tue, 28 Sep 2010 16:09:02 +0200 (CEST) Received: from [127.0.0.1] (unknown [192.168.254.10]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.digiware.nl (Postfix) with ESMTPSA id 1AE54153433; Tue, 28 Sep 2010 16:09:02 +0200 (CEST) Message-ID: <4CA1F6FD.5090807@digiware.nl> Date: Tue, 28 Sep 2010 16:09:01 +0200 From: Willem Jan Withagen User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Andriy Gapon References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142@icarus.home.lan> <4CA1ECCC.4070801@digiware.nl> 
<4CA1F1B4.1020700@icyb.net.ua> <4CA1F570.6000602@digiware.nl> <4CA1F6A0.20109@icyb.net.ua> In-Reply-To: <4CA1F6A0.20109@icyb.net.ua> X-Enigmail-Version: 1.1.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 14:09:06 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 28-9-2010 16:07, Andriy Gapon wrote: > on 28/09/2010 17:02 Willem Jan Withagen said the following: >> I do have (read) this document, but still that doesn't really give you >> guidelines for tuning on FreeBSD. It is a fileserver without any serious >> other apps. >> I was using "auto-tuned", and that crashed my box. That is what started >> this whole thread. > > Well, as I've said, in my opinion FreeBSD-specific tuning ends at setting kmem size. > I consider that a useful statement. 
- --WjW -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (MingW32) iQEcBAEBAgAGBQJMofb9AAoJEP4k4K6R6rBhqaUH/iFd1GG/pGLEKY+savwCRQDA iitWtiBnUVfscP3Cfy81Mrg0m3SNik+lgRD2ywC03jsE+6sJbExuw52G46RjpExc EleJZTW74KvbLHBnVQd+gWUoULKfGx4sZSBuYlkFpANhbrucpYmyPftbpFzmpD7N IOeeY6H7iOa4vnb03DLYY0iErL+ak8NtiSKqYTLYqDA/UWqVfOsvdcRbywrMIOoV JoaoD+65ZQpFYkugiFr7/BtcxXA9GJNpsUI+vIADbDgr77XmhKfu0ky4/Ci5f/L9 8YbEzhobOtRBTjX4/JAl60ZC2ToPwyZ8F4Al7Kj8r7FJnpnhddw7XlVXqEouJxQ= =X2gD -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 14:25:44 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C0A471065695; Tue, 28 Sep 2010 14:25:44 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id CC02B8FC13; Tue, 28 Sep 2010 14:25:43 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA05445; Tue, 28 Sep 2010 17:25:39 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CA1FAE3.9090200@icyb.net.ua> Date: Tue, 28 Sep 2010 17:25:39 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Willem Jan Withagen References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142@icarus.home.lan> <4CA1ECCC.4070801@digiware.nl> <4CA1F1B4.1020700@icyb.net.ua> <4CA1F570.6000602@digiware.nl> <4CA1F6A0.20109@icyb.net.ua> <4CA1F6FD.5090807@digiware.nl> In-Reply-To: <4CA1F6FD.5090807@digiware.nl> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list 
List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 14:25:44 -0000 on 28/09/2010 17:09 Willem Jan Withagen said the following: > On 28-9-2010 16:07, Andriy Gapon wrote: >> on 28/09/2010 17:02 Willem Jan Withagen said the following: >>> I do have (read) this document, but still that doesn't really give you >>> guidelines for tuning on FreeBSD. It is a fileserver without any serious >>> other apps. >>> I was using "auto-tuned", and that crashed my box. That is what started >>> this whole thread. > >> Well, as I've said, in my opinion FreeBSD-specific tuning ends at setting kmem size. > > > I consider that a useful statement. Hm, it looks like I've just given bad advice. It seems that the auto-tuned arc_max is based on kmem size. So if you use a kmem size that is larger than the available physical memory, then you had better limit arc_max to the available memory minus 1GB or so, if the autotuned value is larger than that. I think this needs to be fixed in the code.
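Andriy's corrected advice (cap arc_max at the available memory minus about 1GB whenever the autotuned value is larger) amounts to a simple min(). The sketch below takes the autotuned value as an input rather than computing it, since the thread only says it is derived from kmem size; the function name and the 16G example value are hypothetical.

```python
GIB = 1024 ** 3


def capped_arc_max(physmem: int, autotuned_arc_max: int,
                   reserve: int = GIB) -> int:
    """Limit arc_max to physical memory minus a reserve ("1GB or so").

    autotuned_arc_max is supplied by the caller: the kernel derives it from
    vm.kmem_size, and this sketch does not reproduce that calculation.
    """
    ceiling = physmem - reserve
    if ceiling <= 0:
        raise ValueError("reserve must be smaller than physical memory")
    # Only ever lower the autotuned value; never raise it.
    return min(autotuned_arc_max, ceiling)


# 12G of RAM with kmem sized above physical memory: a hypothetical 16G
# autotuned arc_max is capped at 11G.
print(capped_arc_max(12 * GIB, 16 * GIB) // GIB)  # 11
```

When the autotuned value already fits below the ceiling, the function returns it unchanged, matching the "if the autotuned value is larger than that" condition.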
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 14:30:28 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BFA661065675; Tue, 28 Sep 2010 14:30:28 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from mail.digiware.nl (mail.ip6.digiware.nl [IPv6:2001:4cb8:1:106::2]) by mx1.freebsd.org (Postfix) with ESMTP id 395C28FC13; Tue, 28 Sep 2010 14:30:28 +0000 (UTC) Received: from localhost (localhost.digiware.nl [127.0.0.1]) by mail.digiware.nl (Postfix) with ESMTP id 09F23153433; Tue, 28 Sep 2010 16:30:27 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from mail.digiware.nl ([127.0.0.1]) by localhost (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id I07McwiaIwDQ; Tue, 28 Sep 2010 16:30:24 +0200 (CEST) Received: from [127.0.0.1] (unknown [192.168.254.10]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.digiware.nl (Postfix) with ESMTPSA id 87AB3153435; Tue, 28 Sep 2010 16:30:23 +0200 (CEST) Message-ID: <4CA1FBFE.3020107@digiware.nl> Date: Tue, 28 Sep 2010 16:30:22 +0200 From: Willem Jan Withagen User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Andriy Gapon References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142@icarus.home.lan> <4CA1ECCC.4070801@digiware.nl> <4CA1F1B4.1020700@icyb.net.ua> <4CA1F570.6000602@digiware.nl> <4CA1F6A0.20109@icyb.net.ua> <4CA1F6FD.5090807@digiware.nl> <4CA1FAE3.9090200@icyb.net.ua> In-Reply-To: <4CA1FAE3.9090200@icyb.net.ua> X-Enigmail-Version: 1.1.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 14:30:28 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 28-9-2010 16:25, Andriy Gapon wrote: > on 28/09/2010 17:09 Willem Jan Withagen said the following: >> On 28-9-2010 16:07, Andriy Gapon wrote: >>> on 28/09/2010 17:02 Willem Jan Withagen said the following: >>>> I do have (read) this document, but still that doesn't really give you >>>> guidelines for tuning on FreeBSD. It is a fileserver without any serious >>>> other apps. >>>> I was using "auto-tuned", and that crashed my box. That is what started >>>> this whole thread. >> >>> Well, as I've said, in my opinion FreeBSD-specific tuning ends at setting kmem size. >> >> >> I consider that a useful statement. > > Hm, it looks like I've just given bad advice. > It seems that the auto-tuned arc_max is based on kmem size. > So if you use a kmem size that is larger than the available physical memory, then you > had better limit arc_max to the available memory minus 1GB or so, if the autotuned > value is larger than that. > > I think this needs to be fixed in the code.
So in my case (no other serious apps) with 12G phys mem: vm.kmem_size=17G vfs.zfs.arc_max=11G - --WjW -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (MingW32) iQEcBAEBAgAGBQJMofv9AAoJEP4k4K6R6rBhrksH/0L7EP9oSi4hhITZTB0uIk8q 0IEKnc2ltnPUSFJXS9wP1r9iLzNFJJXGqrO1ZvZUFcJeXXwSzSjhD+zbd237yf/r f5nQ7yBNPd7MxZlZjDkIXB9ZJYuE1u0KMfuQSxptzOWB7oin8MpXHa1YdX6CVE7A 3+hSykteHFFqs8qwUSzoUs47r0dW2WxXE2qAEurelL6VFn++K86d32F5WNv/SX4u aN43r+/CgrjiJVNrxG+gchoicEnIaI90jepkjzpEMp8M85VF4skIZbflZrSSNheY Wzi4LD2h8dFf/La+9EB5AYkMgRcTvXcgNkppIsZ94nf7oSyYNZFuxLYC3ilQetY= =WYzV -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 14:32:19 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BE0AE106564A; Tue, 28 Sep 2010 14:32:19 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id AE7CA8FC14; Tue, 28 Sep 2010 14:32:18 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA05575; Tue, 28 Sep 2010 17:32:13 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CA1FC6D.1060000@icyb.net.ua> Date: Tue, 28 Sep 2010 17:32:13 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Willem Jan Withagen References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142@icarus.home.lan> <4CA1ECCC.4070801@digiware.nl> <4CA1F1B4.1020700@icyb.net.ua> <4CA1F570.6000602@digiware.nl> <4CA1F6A0.20109@icyb.net.ua> <4CA1F6FD.5090807@digiware.nl> <4CA1FAE3.9090200@icyb.net.ua> <4CA1FBFE.3020107@digiware.nl> In-Reply-To: <4CA1FBFE.3020107@digiware.nl> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: 
stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 14:32:19 -0000 on 28/09/2010 17:30 Willem Jan Withagen said the following: > So in my case (no other serious apps) with 12G phys mem: > > vm.kmem_size=17G > vfs.zfs.arc_max=11G > Should be good. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 16:24:48 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A33D21065674; Tue, 28 Sep 2010 16:24:48 +0000 (UTC) (envelope-from ben@wanderview.com) Received: from mail.wanderview.com (mail.wanderview.com [66.92.166.102]) by mx1.freebsd.org (Postfix) with ESMTP id 297328FC1B; Tue, 28 Sep 2010 16:24:47 +0000 (UTC) Received: from xykon.in.wanderview.com (xykon.in.wanderview.com [10.76.10.152]) (authenticated bits=0) by mail.wanderview.com (8.14.4/8.14.4) with ESMTP id o8SFo5c9027002 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Tue, 28 Sep 2010 15:50:06 GMT (envelope-from ben@wanderview.com) Mime-Version: 1.0 (Apple Message framework v1081) Content-Type: text/plain; charset=us-ascii From: Ben Kelly In-Reply-To: <4CA1EF69.4040402@icyb.net.ua> Date: Tue, 28 Sep 2010 11:50:05 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> <4CA1DDE9.8090107@icyb.net.ua> <20100928132355.GA63149@icarus.home.lan> <4CA1EF69.4040402@icyb.net.ua> To: Andriy Gapon X-Mailer: Apple Mail (2.1081) X-Spam-Score: -1.01 () ALL_TRUSTED,T_RP_MATCHES_RCVD X-Scanned-By: MIMEDefang 2.67 on 10.76.20.1 Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: 
freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 16:24:48 -0000 On Sep 28, 2010, at 9:36 AM, Andriy Gapon wrote: > Well, no time for me to dig through all that history. > arc_max should be a hard limit and it is now. If it ever wasn't then it was a bug. I believe the size of the arc could exceed the limit if your working set was larger than arc_max. The arc can't (couldn't then, anyway) evict data that is still referenced. A contributing factor at the time was that the page daemon did not take into account back pressure from the arc when deciding which pages to move from active to inactive, etc. So data was more likely to be referenced and therefore forced to remain in the arc. I'm not sure if this is still the current state. I seem to remember some changesets mentioning arc back pressure at some point, but I don't know the details.
- Ben From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 16:30:16 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DC0DE1065694; Tue, 28 Sep 2010 16:30:16 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id E12CE8FC17; Tue, 28 Sep 2010 16:30:15 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA07743; Tue, 28 Sep 2010 19:30:02 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CA21809.7090504@icyb.net.ua> Date: Tue, 28 Sep 2010 19:30:01 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Ben Kelly References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> <4CA1DDE9.8090107@icyb.net.ua> <20100928132355.GA63149@icarus.home.lan> <4CA1EF69.4040402@icyb.net.ua> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 16:30:17 -0000 on 28/09/2010 18:50 Ben Kelly said the following: > > On Sep 28, 2010, at 9:36 AM, Andriy Gapon wrote: >> Well, no time for me to dig through all that history. arc_max should be a >> hard limit and it is now. If it ever wasn't then it was a bug. > > I believe the size of the arc could exceed the limit if your working set was > larger than arc_max.
The arc can't (couldn't then, anyway) evict data that is > still referenced. I think that you are correct and I was wrong. ARC would still allocate a new buffer even if it's at or above arc_max and can not re-use any existing buffer. But I think that this is more likely to happen with "tiny" ARC size. I have a hard time imagining a workload at which gigabytes of data would be simultaneously and continuously used (see below for definition of "used"). > A contributing factor at the time was that the page daemon did not take into > account back pressure from the arc when deciding which pages to move from > active to inactive, etc. So data was more likely to be referenced and > therefore forced to remain in the arc. I don't think that this is what happened and I don't think that pagedaemon has anything to do with the discussed issue. I think that ARC buffers exist independently of pagedaemon and page cache. I think that they are held only while I/O is happening to or from them. > I'm not sure if this is still the current state. I seem to remember some > changesets mentioning arc back pressure at some point, but I don't know the > details. I think that backpressure has nothing to do with it. If ZFS truly does I/O with all existing buffers and it needs a new buffer, then the choices are limited: either block and wait, or go over the limit. Apparently ZFS designers went with the latter option. But as I've said, for non-tiny ARC sizes it's hard to imagine such an amount of parallel I/O that would tie up all ARC buffers. Given the adaptive nature of ARC I still see it happening, but only when ARC size is near its minimum, not when it is at maximum. It seems that kstat.zfs.misc.arcstats.recycle_miss is a counter of allocations when ARC refused to grow and no existing buffer could be recycled, but this is not the same as going above ARC maximum size. BTW, such an allocation over the limit could be considered a form of memory pressure from ARC on the rest of the system.
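A minimal sketch of the trade-off described above, assuming a cache that never blocks: when it needs a new buffer it grows while under its limit, otherwise recycles unreferenced memory, and only when everything is referenced does it grow past the limit. All names (struct toy_arc, toy_arc_get_buf(), over_limit_allocs) are hypothetical; this is not the code in arc_get_data_buf(), and over_limit_allocs is not the same counter as kstat.zfs.misc.arcstats.recycle_miss.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the cache state that matters here. */
struct toy_arc {
	size_t size;                     /* bytes currently held */
	size_t limit;                    /* target maximum, think arc_max */
	size_t evictable;                /* bytes in unreferenced buffers */
	unsigned long over_limit_allocs; /* allocations that went past limit */
};

enum toy_result { TOY_GREW, TOY_RECYCLED, TOY_GREW_OVER_LIMIT };

/*
 * Get a buffer of len bytes without ever blocking: grow while under the
 * limit, otherwise recycle unreferenced memory, and if everything is
 * referenced, grow past the limit anyway (the option ZFS apparently took).
 */
static enum toy_result
toy_arc_get_buf(struct toy_arc *arc, size_t len)
{
	assert(arc != NULL && len > 0);

	if (arc->size + len <= arc->limit) {
		arc->size += len;
		return TOY_GREW;
	}
	if (arc->evictable >= len) {
		/* Reuse an unreferenced buffer: total size does not change. */
		arc->evictable -= len;
		return TOY_RECYCLED;
	}
	/* The alternative here would be to block until something is released. */
	arc->size += len;
	arc->over_limit_allocs++;
	return TOY_GREW_OVER_LIMIT;
}
```

Starting empty with a limit of 8, two 4-byte requests grow the cache to 8; with 4 evictable bytes a third request is recycled; a fourth, with nothing evictable, pushes the size to 12.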
P.S. The code is in arc_get_data_buf(). -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 16:46:44 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2144C106566C; Tue, 28 Sep 2010 16:46:44 +0000 (UTC) (envelope-from ben@wanderview.com) Received: from mail.wanderview.com (mail.wanderview.com [66.92.166.102]) by mx1.freebsd.org (Postfix) with ESMTP id 9C4638FC1E; Tue, 28 Sep 2010 16:46:43 +0000 (UTC) Received: from xykon.in.wanderview.com (xykon.in.wanderview.com [10.76.10.152]) (authenticated bits=0) by mail.wanderview.com (8.14.4/8.14.4) with ESMTP id o8SGkc6j027489 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Tue, 28 Sep 2010 16:46:39 GMT (envelope-from ben@wanderview.com) Mime-Version: 1.0 (Apple Message framework v1081) Content-Type: text/plain; charset=us-ascii From: Ben Kelly In-Reply-To: <4CA21809.7090504@icyb.net.ua> Date: Tue, 28 Sep 2010 12:46:39 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: <71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com> References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> <4CA1DDE9.8090107@icyb.net.ua> <20100928132355.GA63149@icarus.home.lan> <4CA1EF69.4040402@icyb.net.ua> <4CA21809.7090504@icyb.net.ua> To: Andriy Gapon X-Mailer: Apple Mail (2.1081) X-Spam-Score: -1.01 () ALL_TRUSTED,T_RP_MATCHES_RCVD X-Scanned-By: MIMEDefang 2.67 on 10.76.20.1 Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 16:46:44 -0000 On Sep 28, 2010, at 12:30 PM, Andriy Gapon wrote: > on 28/09/2010 18:50 Ben Kelly said the following: >> >> On Sep 28, 2010, at 9:36 AM, Andriy Gapon wrote:
>>> Well, no time for me to dig through all that history. arc_max should be a >>> hard limit and it is now. If it ever wasn't then it was a bug. >> >> I believe the size of the arc could exceed the limit if your working set was >> larger than arc_max. The arc can't (couldn't then, anyway) evict data that is >> still referenced. > > I think that you are correct and I was wrong. > ARC would still allocate a new buffer even if it's at or above arc_max and can not > re-use any existing buffer. > But I think that this is more likely to happen with "tiny" ARC size. I have a hard > time imagining a workload at which gigabytes of data would be simultaneously and > continuously used (see below for definition of "used"). > >> A contributing factor at the time was that the page daemon did not take into >> account back pressure from the arc when deciding which pages to move from >> active to inactive, etc. So data was more likely to be referenced and >> therefore forced to remain in the arc. > > I don't think that this is what happened and I don't think that pagedaemon has > anything to do with the discussed issue. > I think that ARC buffers exist independently of pagedaemon and page cache. > I think that they are held only during time when I/O is happening to or from them. Hmm. My server is currently idle with no I/O happening: kstat.zfs.misc.arcstats.c: 25165824 kstat.zfs.misc.arcstats.c_max: 46137344 kstat.zfs.misc.arcstats.size: 91863156 If what you say is true, this shouldn't happen, should it? This system is an i386 machine with kmem max at 800M and arc set to 40M. This is running head from April 6, 2010, so it is a bit old, though. At one point I had patches running on my system that triggered the pagedaemon based on arc load and it did allow me to keep my arc below the max. Or at least I thought it did. In any case, I've never really been able to wrap my head around the VFS layer and how it interacts with zfs.
So I'm more than willing to believe I'm confused. Any insights are greatly appreciated. Thanks! - Ben From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 17:17:58 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 536271065695; Tue, 28 Sep 2010 17:17:58 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 667268FC08; Tue, 28 Sep 2010 17:17:56 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA08291; Tue, 28 Sep 2010 20:17:44 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CA22337.2010900@icyb.net.ua> Date: Tue, 28 Sep 2010 20:17:43 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Ben Kelly References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> <4CA1DDE9.8090107@icyb.net.ua> <20100928132355.GA63149@icarus.home.lan> <4CA1EF69.4040402@icyb.net.ua> <4CA21809.7090504@icyb.net.ua> <71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com> In-Reply-To: <71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 17:17:58 -0000 on 28/09/2010 19:46 Ben Kelly said the following: > Hmm.
My server is currently idle with no I/O happening: > > kstat.zfs.misc.arcstats.c: 25165824 > kstat.zfs.misc.arcstats.c_max: 46137344 > kstat.zfs.misc.arcstats.size: 91863156 > > If what you say is true, this shouldn't happen, should it? This system is an i386 machine with kmem max at 800M and arc set to 40M. This is running head from April 6, 2010, so it is a bit old, though. Well, your system is a bit old indeed. And the branch is unknown, so I can't really see what sources you have. And I am not sure if I'll be able to say anything about those sources. As to the numbers - yes, with current code I'd expect arcstats.size to go down to arcstats.c when there is no I/O. arc_reclaim_thread should do that. > At one point I had patches running on my system that triggered the pagedaemon based on arc load and it did allow me to keep my arc below the max. Or at least I thought it did. > > In any case, I've never really been able to wrap my head around the VFS layer and how it interacts with zfs. So I'm more than willing to believe I'm confused. Any insights are greatly appreciated. ARC is a ZFS private cache. ZFS doesn't use unified buffer/page cache. So ARC is not directly affected by pagedaemon. But this is not exactly VFS layer thing. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 17:24:36 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 303E31065673; Tue, 28 Sep 2010 17:24:36 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 744F88FC14; Tue, 28 Sep 2010 17:24:33 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA08370; Tue, 28 Sep 2010 20:24:21 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CA224C5.8000202@icyb.net.ua> Date: Tue, 28 Sep 2010 20:24:21 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Ben Kelly References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> <4CA1DDE9.8090107@icyb.net.ua> <20100928132355.GA63149@icarus.home.lan> <4CA1EF69.4040402@icyb.net.ua> <4CA21809.7090504@icyb.net.ua> <71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com> <4CA22337.2010900@icyb.net.ua> In-Reply-To: <4CA22337.2010900@icyb.net.ua> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 17:24:36 -0000 on 28/09/2010 20:17 Andriy Gapon said the following: > on 28/09/2010 19:46 Ben Kelly said the following: >> If what you say is true, this shouldn't happen, should it? This system is an i386 machine with kmem max at 800M and arc set to 40M. 
This is running head from April 6, 2010, so it is a bit old, though. > > Well, your system is a bit old indeed. > And the branch is unknown, so I can't really see what sources you have. Apologies, missed "head" in your description of the system. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 18:37:20 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3DD2F106566B for ; Tue, 28 Sep 2010 18:37:20 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id A79368FC16 for ; Tue, 28 Sep 2010 18:37:19 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o8SIDRLF015692 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 28 Sep 2010 21:13:27 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id o8SIDRIr087366; Tue, 28 Sep 2010 21:13:27 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o8SIDRM3087365; Tue, 28 Sep 2010 21:13:27 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 28 Sep 2010 21:13:27 +0300 From: Kostik Belousov To: Andriy Gapon Message-ID: <20100928181327.GS43070@deviant.kiev.zoral.com.ua> References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> <4CA1DDE9.8090107@icyb.net.ua> <20100928132355.GA63149@icarus.home.lan> <4CA1EF69.4040402@icyb.net.ua> <4CA21809.7090504@icyb.net.ua> <71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com> 
<4CA22337.2010900@icyb.net.ua> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="v6YRErhBvjoBjrPV" Content-Disposition: inline In-Reply-To: <4CA22337.2010900@icyb.net.ua> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-2.7 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_05, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 18:37:20 -0000 --v6YRErhBvjoBjrPV Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Tue, Sep 28, 2010 at 08:17:43PM +0300, Andriy Gapon wrote: > ARC is a ZFS private cache. > ZFS doesn't use unified buffer/page cache. > So ARC is not directly affected by pagedaemon. > But this is not exactly VFS layer thing. As pure speculation, unbacked by any code reading or understanding of the principles: can ARC be changed to use some custom vm pager instead of managing memory on its own? As I understand it, ARC uses wired kernel mappings right now. If it starts using managed pages backed by a new pager, then pagedaemon might make actual decisions on the cache shrink by putting and reclaiming pages. Does ARC have some `active' count for the caching unit? It might be translated to the active count for the page etc. Did I say that this is Pure Speculation? Seems so, right at the beginning.
--v6YRErhBvjoBjrPV Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (FreeBSD) iEYEARECAAYFAkyiMEcACgkQC3+MBN1Mb4iyIwCghq2eRbNL1kxbdsWjRcijVT3e WH4An0aCYQpyzr3sawdW5TTcA6Lzjtpc =wO8m -----END PGP SIGNATURE----- --v6YRErhBvjoBjrPV-- From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 18:40:07 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E64A9106566C; Tue, 28 Sep 2010 18:40:06 +0000 (UTC) (envelope-from ben@wanderview.com) Received: from mail.wanderview.com (mail.wanderview.com [66.92.166.102]) by mx1.freebsd.org (Postfix) with ESMTP id 9EB6F8FC1D; Tue, 28 Sep 2010 18:40:06 +0000 (UTC) Received: from xykon.in.wanderview.com (xykon.in.wanderview.com [10.76.10.152]) (authenticated bits=0) by mail.wanderview.com (8.14.4/8.14.4) with ESMTP id o8SIdx1R028419 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Tue, 28 Sep 2010 18:40:00 GMT (envelope-from ben@wanderview.com) Mime-Version: 1.0 (Apple Message framework v1081) Content-Type: text/plain; charset=us-ascii From: Ben Kelly In-Reply-To: <4CA22337.2010900@icyb.net.ua> Date: Tue, 28 Sep 2010 14:40:00 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> <4CA1DDE9.8090107@icyb.net.ua> <20100928132355.GA63149@icarus.home.lan> <4CA1EF69.4040402@icyb.net.ua> <4CA21809.7090504@icyb.net.ua> <71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com> <4CA22337.2010900@icyb.net.ua> To: Andriy Gapon X-Mailer: Apple Mail (2.1081) X-Spam-Score: -1.01 () ALL_TRUSTED,T_RP_MATCHES_RCVD X-Scanned-By: MIMEDefang 2.67 on 10.76.20.1 Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 18:40:07 -0000 On Sep 28, 2010, at 1:17 PM, Andriy Gapon wrote: > on 28/09/2010 19:46 Ben Kelly said the following: >> Hmm. My server is currently idle with no I/O happening: >> >> kstat.zfs.misc.arcstats.c: 25165824 >> kstat.zfs.misc.arcstats.c_max: 46137344 >> kstat.zfs.misc.arcstats.size: 91863156 >> >> If what you say is true, this shouldn't happen, should it? This system is an i386 machine with kmem max at 800M and arc set to 40M. This is running head from April 6, 2010, so it is a bit old, though. > > Well, your system is a bit old indeed. > And the branch is unknown, so I can't really see what sources you have. > And I am not sure if I'll be able to say anything about those sources. Quite old. I've been intending to update, but haven't found the time lately. I'll try to do the upgrade this weekend and see if it changes anything. > As to the numbers - yes, with current code I'd expect arcstats.size to go down to > arcstats.c when there is no I/O. arc_reclaim_thread should do that. That's what I thought as well, but when I debugged it a year or two ago I found that the buffers were still referenced and thus could not be reclaimed. As far as I can remember they needed a vfs/vnops like zfs_vnops_inactive or zfs_vnops_reclaim to be executed in order to free the reference. What is responsible for making those calls? > >> At one point I had patches running on my system that triggered the pagedaemon based on arc load and it did allow me to keep my arc below the max. Or at least I thought it did. >> >> In any case, I've never really been able to wrap my head around the VFS layer and how it interacts with zfs. So I'm more than willing to believe I'm confused. Any insights are greatly appreciated. > > ARC is a ZFS private cache. > ZFS doesn't use unified buffer/page cache.
> So ARC is not directly affected by pagedaemon. > But this is not exactly VFS layer thing. Can you explain the difference in how the vfs/vnode operations are called or used for those two situations? I thought that the buffer cache was used by filesystems to implement these operations, so that the buffer cache was below the vfs/vnops layer. So while zfs implemented its operations in terms of the arc, things like UFS implemented vfs/vnops in terms of the buffer cache. I thought the layers further up the chain like the page daemon did not distinguish that much between these two implementations due to the VFS interface layer. (Although there seems to be a layering violation in that the buffer cache signals directly to the upper page daemon layer to trigger page reclamation.) The old (ancient) patch I tried previously to help reduce the arc working set and allow it to shrink is here: http://www.wanderview.com/svn/public/misc/zfs/zfs_kmem_limit.diff Unfortunately, there are a couple of ideas on fighting fragmentation mixed into that patch. See the part about arc_reclaim_pages(). This patch did seem to allow my arc to stay under the target maximum even when under load that previously caused the system to exceed the maximum. When I update this weekend I'll try a stripped down version of the patch to see if it helps or not with the latest zfs. Thanks for your help in understanding this stuff!
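A toy model of the situation Ben describes, assuming each cached buffer carries a hold count that only a release (standing in for an inactive/reclaim vnode operation) drops. A reclaim pass, in the spirit of what arc_reclaim_thread is expected to do, can evict only unreferenced buffers, so an idle cache can sit above its target until the holds go away. struct toy_buf, toy_reclaim() and toy_release() are hypothetical names, not ZFS interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cached buffer: its size and how many holders it has. */
struct toy_buf {
	size_t   len;
	unsigned refcnt;   /* > 0 while someone (e.g. a vnode) holds it */
	int      evicted;
};

/* Drop one hold, standing in for an inactive/reclaim vnode operation. */
static void
toy_release(struct toy_buf *b)
{
	assert(b->refcnt > 0);
	b->refcnt--;
}

/*
 * One reclaim pass: evict unreferenced buffers until total is at or below
 * target. Referenced buffers are skipped, so an idle cache can remain above
 * target. Returns the new total.
 */
static size_t
toy_reclaim(struct toy_buf *bufs, size_t n, size_t total, size_t target)
{
	for (size_t i = 0; i < n && total > target; i++) {
		if (bufs[i].evicted || bufs[i].refcnt > 0)
			continue;
		bufs[i].evicted = 1;
		total -= bufs[i].len;
	}
	return total;
}
```

With four 10-byte buffers, three of them held, a pass with a target of 15 can only get from 40 down to 30; after two holds are released, a second pass reaches 10.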
- Ben From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 20:19:21 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 86D271065670 for ; Tue, 28 Sep 2010 20:19:21 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au [211.29.132.185]) by mx1.freebsd.org (Postfix) with ESMTP id 24B8C8FC0A for ; Tue, 28 Sep 2010 20:19:20 +0000 (UTC) Received: from besplex.bde.org (c122-107-116-249.carlnfd1.nsw.optusnet.com.au [122.107.116.249]) by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o8SKJAGf002392 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 29 Sep 2010 06:19:18 +1000 Date: Wed, 29 Sep 2010 06:19:10 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans In-Reply-To: <20100929031825.L683@besplex.bde.org> Message-ID: <20100929054826.E797@besplex.bde.org> References: <20100929031825.L683@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: fs@freebsd.org Subject: Re: ext2fs now extremely slow X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 20:19:21 -0000 On Wed, 29 Sep 2010, Bruce Evans wrote: > For benchmarks on ext2fs: > > Under FreeBSD-~5.2 rerun today: > untar: 59.17 real > tar: 19.52 real > > Under -current run today: > untar: 101.16 real > tar: 172.03 real > > So, -current is 8.8 times slower for tar, but only 1.7 times slower for > untar. > ... > dumpe2fs seems to show a bizarre layout: > % ...
> % Group 3: (Blocks 98304-131071) > % Backup superblock at 98304, Group descriptors at 98305-98305 > % Block bitmap at 98306 (+2), Inode bitmap at 98307 (+3) > % Inode table at 98308-98816 (+4) > % 6882 free blocks, 16288 free inodes, 0 directories > % Free blocks: 123207, 123209-123215, 123217-123223, 123225-123231, > 123233-123239, 123241-123247, ... > > The last line was about 15000 characters long, and seems to have the > following > pattern except for the first free block: > > 1 block used (123208) > 7 blocks free (123209-123215) > 1 block used (123216) > 7 blocks free (123217-123223) > 1 block used ... > 7 blocks free ... > > So it seems that only 1 block in every 8 is used, and there is a seek > after every block. This asks for an 8-fold reduction in throughput, > and it seems to have got that and a bit more for reading although not > for writing. Even (or especially) with perfect hardware, it must give > an 8-fold reduction. And it is likely to give more, since it defeats > vfs clustering by making all runs of contiguous blocks have length 1. > > Simple sequential allocation should be used unless the allocation policy > and implementation are very good.
This works a bit better after zapping the 8-fold way:
% Index: ext2_alloc.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v
% retrieving revision 1.2
% diff -u -2 -r1.2 ext2_alloc.c
% --- ext2_alloc.c 1 Sep 2010 05:34:17 -0000 1.2
% +++ ext2_alloc.c 28 Sep 2010 19:12:46 -0000
% @@ -1,2 +1,5 @@
% +int bde_blkpref = 0;
% +int bde_alloc8 = 1;
% +
%  /*-
%   * modified for Lites 1.1
% @@ -542,6 +545,12 @@
%  	   then set the goal to what we thought it should be
%  	*/
% +if (bde_blkpref == 0) {
%  	if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0)
%  		return ip->i_next_alloc_goal;
% +} else if (bde_blkpref == 1) {
% +	if(ip->i_next_alloc_block == lbn)
% +		return ip->i_next_alloc_goal;
% +} else
% +	return 0;
%  
%  	/* now check whether we were provided with an array that basically
% @@ -662,4 +671,5 @@
%  	 * block.
%  	 */
% +if (bde_alloc8 == 0) {
%  	if (bpref)
%  		start = dtogd(fs, bpref) / NBBY;
% @@ -679,4 +689,5 @@
%  		}
%  	}
% +}
%  
%  	bno = ext2_mapsearch(fs, bbp, bpref);
This gives an improvement of:
untar: 101.16 real -> 63.46
tar: 172.03 real -> 50.70
Now -current is only 1.1 times slower for untar and 2.6 times slower for tar. There must be a problem with bpref for things to have been so bad. There is some point to leaving a gap of 7 blocks for expansion, but the gap was left even between blocks in a single file. I don't have a userland program for displaying the layout produced by ext2fs, but I have kernel printfs for it in several foofs_bmaparray() functions. Turning this on for ext2fs gives for 3 files:
% ino 231895: size 99982(25), lbn 0, bn 3704960-3704967, indir 913288-913295, runp 0
% ino 231895: size 99982(25), lbn 1, bn 913200-913287, indir 913288-913295, runp 10
% ino 231895: size 99982(25), lbn 12, bn 913296-913399, indir 913288-913295, runp 12
25 is the number of 4K blocks. These should be allocated contiguously, except for an indirect block in the middle.
(ffs also gets this wrong, by allocating the indirect block far away.) The above and the below show bn's for the lbn 0's all nearby. Then in all cases, the bn for lbn 1 is far away. For lbn1-lbn, the allocation is perfectly contiguous, except for the indirect block in the correct place in the middle.
% ino 231880: size 82877(21), lbn 0, bn 3704848-3704855, indir 912224-912231, runp 0
% ino 231880: size 82877(21), lbn 1, bn 912136-912223, indir 912224-912231, runp 10
% ino 231880: size 82877(21), lbn 12, bn 912232-912303, indir 912224-912231, runp 8
% ino 231881: size 82343(21), lbn 0, bn 3704856-3704863, indir 912392-912399, runp 0
% ino 231881: size 82343(21), lbn 1, bn 912304-912391, indir 912392-912399, runp 10
% ino 231881: size 82343(21), lbn 12, bn 912400-912471, indir 912392-912399, runp 8
Same pattern for all files examined. The last 2 have sequential ino's and were probably created sequentially. Everything is perfectly sequential except for jumping back and forth between lbn0 and lbn1. Perhaps bpref (and/or the 'goal' variable) is working as intended to keep the lbn0's together, but something fails so the bn's for all other lbn's are allocated sequentially starting from the beginning of the disk (912K is much smaller than 3704K). Cylinder groups can't be working right either. I haven't tried the bde_blkpref hack in the above. It should kill bpref completely so that there is no jump between lbn0 and lbn1, and break cylinder group based allocation even better. Setting bde_blkpref to 1 restores the bug that was present in ext2fs in FreeBSD between 1995 and 2010. This bug gave sequential allocation starting at the beginning of the disk in almost all cases, so map searches were slow and early groups filled up before later groups were used at all.
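A toy model of the 1-in-8 layout, assuming the block that the patch above skips when bde_alloc8 is nonzero (its default of 1) searches the group bitmap, starting at the byte that holds the goal (dtogd(fs, bpref) / NBBY), for a byte whose 8 blocks are all free and takes that byte's first block. If each new block's goal is just past the previous one, the partly used byte is always skipped, so every allocation lands 8 blocks further on. toy_alloc() and the 64-block map are hypothetical, and the fallback loop only roughly stands in for ext2_mapsearch().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NBLOCKS 64          /* blocks in the toy group */
#define TOY_NBBY    8           /* bits per bitmap byte, like NBBY */

static int
toy_isset(const unsigned char *map, size_t b)
{
	return (map[b / TOY_NBBY] >> (b % TOY_NBBY)) & 1;
}

static void
toy_set(unsigned char *map, size_t b)
{
	map[b / TOY_NBBY] |= (unsigned char)(1u << (b % TOY_NBBY));
}

/*
 * Hypothetical allocator. With alloc8 != 0 it first looks, from the byte
 * holding goal onward, for a bitmap byte with all 8 blocks free and takes
 * that byte's first block (the assumed "8-fold" search). Otherwise, or if
 * no such byte exists, it takes the first free block at or after goal.
 * Returns the block number, or -1 if nothing is free from goal on.
 */
static long
toy_alloc(unsigned char *map, size_t goal, int alloc8)
{
	assert(goal < TOY_NBLOCKS);
	if (alloc8) {
		for (size_t loc = goal / TOY_NBBY; loc < TOY_NBLOCKS / TOY_NBBY; loc++) {
			if (map[loc] == 0) {
				toy_set(map, loc * TOY_NBBY);
				return (long)(loc * TOY_NBBY);
			}
		}
	}
	for (size_t b = goal; b < TOY_NBLOCKS; b++) {
		if (!toy_isset(map, b)) {
			toy_set(map, b);
			return (long)b;
		}
	}
	return -1;
}
```

Feeding each result back in as the next goal (previous block + 1) gives blocks 0, 8, 16, 24 with alloc8 set and 0, 1, 2, 3 without it: the same 1-used/7-free pattern that dumpe2fs shows, versus plain sequential allocation.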
Bruce From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 20:25:52 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CB35A106566C for ; Tue, 28 Sep 2010 20:25:52 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx08.syd.optusnet.com.au (fallbackmx08.syd.optusnet.com.au [211.29.132.10]) by mx1.freebsd.org (Postfix) with ESMTP id 4EEB18FC17 for ; Tue, 28 Sep 2010 20:25:51 +0000 (UTC) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by fallbackmx08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o8SHq28I030607 for ; Wed, 29 Sep 2010 03:52:02 +1000 Received: from besplex.bde.org (c122-107-116-249.carlnfd1.nsw.optusnet.com.au [122.107.116.249]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o8SHpwQ3002339 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Wed, 29 Sep 2010 03:52:00 +1000 Date: Wed, 29 Sep 2010 03:51:58 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: fs@freebsd.org Message-ID: <20100929031825.L683@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Subject: ext2fs now extremely slow X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 20:25:52 -0000 For benchmarks on ext2fs:
Under FreeBSD-~5.2 rerun today:
untar: 59.17 real
tar: 19.52 real
Under -current run today:
untar: 101.16 real
tar: 172.03 real
So, -current is 8.8 times slower for tar, but only 1.7 times slower for untar. FreeBSD-~5.2 is my version of FreeBSD-5.2-CURRENT-old, which has significant changes in ext2fs that make it a few percent faster (real) and a few percent slower (sys) by using the BSD buffer cache instead of a private cache for inodes.
I committed most of my changes to ext2fs except the ones that made it slower. More details: the untar benchmark copies about 400 MB of sources from a large subset of /usr/src to a freshly mkfs.ext2'd and mounted file system using 2 tars in a pipe (ends up with 488828 1K-blocks used on ext2fs with 4K-blocks). The source is supposed to be cached, so that the untar is almost from memory. The tar benchmark then unmounts the file system, remounts it, and tars up its contents to /dev/zero. This benchmark was originally mainly for finding fs layout problems. In fact, it was originally for figuring out why ext2fs was faster than ffs in 1997 (*). Since the tar part of it is not much affected by caching, its results are much easier to reproduce than those of the untar benchmark. Slowness in it normally means that the fs layout is bad, and that shouldn't happen for a freshly laid out file system. (*) This turned out to be because the ext2fs layout policy was completely broken (essentially sequential, ignoring cylinder groups), but this was actually an optimization for the relatively small file sets tested by the benchmark (even smaller then), when combined with lack of caching in my disk drive -- the drive was very slow for even small seeks, and the broken allocation policy accidentally avoided lots of small seeks, while ffs's fancier policy tends to generate too many of them.
Rawer results with all relevant possible fs parameters:

FreeBSD-~5.2:
%%%
ext2fs-1024-1024:
tarcp /f srcs:             68.85 real   0.35 user   7.15 sys
tar cf /dev/zero srcs:     22.36 real   0.15 user   4.90 sys
ext2fs-1024-1024-as:
tarcp /f srcs:             46.00 real   0.27 user   6.23 sys
tar cf /dev/zero srcs:     22.89 real   0.08 user   4.94 sys
ext2fs-4096-4096:
tarcp /f srcs:             59.17 real   0.22 user   5.89 sys
tar cf /dev/zero srcs:     19.52 real   0.12 user   2.13 sys
ext2fs-4096-4096-as:
tarcp /f srcs:             37.73 real   0.22 user   4.94 sys
tar cf /dev/zero srcs:     19.40 real   0.19 user   2.05 sys
%%%

ext2fs-1024-1024 means ext2fs with 1024-blocks and 1024-frags, and the -as suffix means an async mount, etc. tarcp is 2 tars in a pipe (untar).

FreeBSD-current:
%%%
ext2fs-1024-1024:
tarcp /f srcs:            130.18 real   0.26 user   6.39 sys
tar cf /dev/zero srcs:     73.90 real   0.15 user   2.30 sys
ext2fs-1024-1024-as:
tarcp /f srcs:             98.22 real   0.30 user   6.38 sys
tar cf /dev/zero srcs:     70.36 real   0.13 user   2.29 sys
ext2fs-4096-4096:
tarcp /f srcs:            101.16 real   0.33 user   5.04 sys
tar cf /dev/zero srcs:    172.03 real   0.13 user   1.26 sys
ext2fs-4096-4096-as:
tarcp /f srcs:             78.23 real   0.21 user   5.09 sys
tar cf /dev/zero srcs:    147.87 real   0.15 user   1.23 sys
%%%

The benchmark also prints the i/o counts using mount -v. This is broken in -current, so it is not easy to see if there are too many i/o's. I guess the problem is mainly a bad layout policy, since the efficiency of the tar step doesn't depend much on the layout. Testing under ~5.2 confirms this: for the file system left at the end of the above run, but tarred up by ~5.2 after reboot:
%%%
tar cf /dev/zero srcs:    151.88 real   0.14 user   2.30 sys
%%%
So -current is actually 1.03 times faster, not 8.8 times slower, for tar :-/.
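The speed ratios quoted in this thread are quotients of the "real" times. A small check, assuming the 1.03 figure compares the ~5.2 re-tar (151.88 s) with the last -current run listed, ext2fs-4096-4096-as (147.87 s), and including the patched -current figures (63.46 s and 50.70 s) from the follow-up message. time_ratio() and print_ratios() are hypothetical helpers; user and sys times are ignored.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of two elapsed ("real") times; > 1 means `slower` took longer. */
static double
time_ratio(double slower, double faster)
{
	assert(faster > 0.0);
	return slower / faster;
}

/* Timings in seconds ("real" column) as quoted in this thread. */
static void
print_ratios(void)
{
	static const struct { const char *what; double a, b; } rows[] = {
		{ "tar, -current vs ~5.2",               172.03,  19.52 },
		{ "untar, -current vs ~5.2",             101.16,  59.17 },
		{ "untar, patched -current vs ~5.2",      63.46,  59.17 },
		{ "tar, patched -current vs ~5.2",        50.70,  19.52 },
		{ "tar on same fs, ~5.2 vs -current",    151.88, 147.87 },
	};
	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%-36s %5.2f\n", rows[i].what,
		    time_ratio(rows[i].a, rows[i].b));
}
```

The ratios come out at about 8.81, 1.71, 1.07, 2.60 and 1.03, matching the rounded "8.8", "1.7", "1.1", "2.6" and "1.03" in the text.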
dumpe2fs seems to show a bizarre layout:

% Filesystem volume name:
% Last mounted on:
% Filesystem UUID:          a792ae57-2438-4e78-bad6-4ef939fde0df
% Filesystem magic number:  0xEF53
% Filesystem revision #:    1 (dynamic)
% Filesystem features:      filetype sparse_super
% Default mount options:    (none)
% Filesystem state:         not clean
% Errors behavior:          Continue
% Filesystem OS type:       unknown
% Inode count:              1531072
% Block count:              3058374
% Reserved block count:     152918
% Free blocks:              2888113
% Free inodes:              1498688
% First block:              0
% Block size:               4096
% Fragment size:            4096
% Blocks per group:         32768
% Fragments per group:      32768
% Inodes per group:         16288
% Inode blocks per group:   509
% Filesystem created:       Wed Sep 29 02:16:32 2010
% Last mount time:          n/a
% Last write time:          Wed Sep 29 03:15:24 2010
% Mount count:              0
% Maximum mount count:      28
% Last checked:             Wed Sep 29 02:16:32 2010
% Check interval:           15552000 (6 months)
% Next check after:         Mon Mar 28 03:16:32 2011
% Reserved blocks uid:      0 (user root)
% Reserved blocks gid:      0 (group wheel)
% First inode:              11
% Inode size:               128
% Default directory hash:   tea
% Directory Hash Seed:      036f029e-7924-4a73-91ec-730fd18e832d
%
%
% Group 0: (Blocks 0-32767)
%   Primary superblock at 0, Group descriptors at 1-1
%   Block bitmap at 2 (+2), Inode bitmap at 3 (+3)
%   Inode table at 4-512 (+4)
%   0 free blocks, 16277 free inodes, 2 directories
%   Free blocks:
%   Free inodes: 12-16288
% Group 1: (Blocks 32768-65535)
%   Backup superblock at 32768, Group descriptors at 32769-32769
%   Block bitmap at 32770 (+2), Inode bitmap at 32771 (+3)
%   Inode table at 32772-33280 (+4)
%   0 free blocks, 16288 free inodes, 0 directories
%   Free blocks:
%   Free inodes: 16289-32576
% Group 2: (Blocks 65536-98303)
%   Block bitmap at 65536 (+0), Inode bitmap at 65537 (+1)
%   Inode table at 65538-66046 (+2)
%   32257 free blocks, 16288 free inodes, 0 directories
%   Free blocks: 66047-98303
%   Free inodes: 32577-48864
% Group 3: (Blocks 98304-131071)
%   Backup superblock at 98304, Group descriptors at 98305-98305
%   Block bitmap at 98306 (+2), Inode bitmap at 98307 (+3)
%   Inode table at 98308-98816 (+4)
%   6882 free blocks, 16288 free inodes, 0 directories
%   Free blocks: 123207, 123209-123215, 123217-123223, 123225-123231, 123233-123239, 123241-123247, ...

The last line was about 15000 characters long, and seems to have the following pattern except for the first free block:

  1 block used (123208)
  7 blocks free (123209-123215)
  1 block used (123216)
  7 blocks free (123217-123223)
  1 block used ...
  7 blocks free ...

So it seems that only 1 block in every 8 is used, and there is a seek after every block. This asks for an 8-fold reduction in throughput, and it seems to have got that and a bit more for reading although not for writing. Even (or especially) with perfect hardware, it must give an 8-fold reduction. And it is likely to give more, since it defeats vfs clustering by making all runs of contiguous blocks have length 1. Simple sequential allocation should be used unless the allocation policy and implementation are very good.
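To make the run-length argument concrete, here is a small C model. It is a sketch with hypothetical helper names, and it uses a plain boolean array instead of ext2fs's on-disk bitmap. It places the same number of blocks either sequentially or one per 8-block group, then counts the runs of contiguous used blocks. Clustering can only merge contiguous blocks, so the run count is a rough proxy for the number of separate transfers (and seeks).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: used[i] is true if block i is allocated.
 * Place n blocks starting at block 0, `stride` blocks apart:
 * stride 1 is simple sequential allocation, stride 8 reproduces the
 * "1 used, 7 free" pattern seen in the dumpe2fs output.
 */
void place_blocks(bool *used, size_t nblocks, size_t n, size_t stride)
{
    assert(stride >= 1);
    for (size_t i = 0; i < nblocks; i++)
        used[i] = false;
    for (size_t k = 0; k < n; k++) {
        assert(k * stride < nblocks);
        used[k * stride] = true;
    }
}

/* Count maximal runs of contiguous allocated blocks. */
size_t count_runs(const bool *used, size_t nblocks)
{
    size_t runs = 0;
    for (size_t i = 0; i < nblocks; i++)
        if (used[i] && (i == 0 || !used[i - 1]))
            runs++;
    return runs;
}
```

With 8 blocks on a 64-block map, stride 1 gives one run of length 8, while stride 8 gives 8 runs of length 1. That means up to 8 separate I/Os where one would do, which is the 8-fold penalty described above.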
Bruce From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 21:31:17 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A7A6D1065673; Tue, 28 Sep 2010 21:31:17 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id C35838FC13; Tue, 28 Sep 2010 21:31:15 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id AAA11500; Wed, 29 Sep 2010 00:31:00 +0300 (EEST) (envelope-from avg@icyb.net.ua) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1P0hlL-0000Hk-Qa; Wed, 29 Sep 2010 00:30:59 +0300 Message-ID: <4CA25E92.4060904@icyb.net.ua> Date: Wed, 29 Sep 2010 00:30:58 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Ben Kelly References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> <4CA1DDE9.8090107@icyb.net.ua> <20100928132355.GA63149@icarus.home.lan> <4CA1EF69.4040402@icyb.net.ua> <4CA21809.7090504@icyb.net.ua> <71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com> <4CA22337.2010900@icyb.net.ua> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 21:31:17 -0000 on 28/09/2010 21:40 Ben Kelly said the following: > > On Sep 28, 2010, at 1:17 PM, Andriy Gapon wrote: > >> on 
28/09/2010 19:46 Ben Kelly said the following: >>> Hmm. My server is currently idle with no I/O happening: >>> >>> kstat.zfs.misc.arcstats.c: 25165824 kstat.zfs.misc.arcstats.c_max: >>> 46137344 kstat.zfs.misc.arcstats.size: 91863156 >>> >>> If what you say is true, this shouldn't happen, should it? This system >>> is an i386 machine with kmem max at 800M and arc set to 40M. This is >>> running head from April 6, 2010, so it is a bit old, though. >> >> Well, your system is a bit old indeed. And the branch is unknown, so I >> can't really see what sources you have. And I am not sure if I'll be able >> to say anything about those sources. > > Quite old. I've been intending to update, but haven't found the time lately. > I'll try to do the upgrade this weekend and see if it changes anything. > >> As to the numbers - yes, with current code I'd expect arcstats.size to go >> down to arcstats.c when there is no I/O. arc_reclaim_thread should do >> that. > > Thats what I thought as well, but when I debugged it a year or two ago I > found that the buffers were still referenced and thus could not be reclaimed. > As far as I can remember they needed a vfs/vnops like zfs_vnops_inactive or > zfs_vnops_reclaim to be executed in order to free the reference. What is > responsible for making those calls? It's time we started showing each other places in the code :) I don't think that's how the code works. E.g., look at how zfs_read() calls dmu_read_uio(), which calls dmu_buf_hold_array() and dmu_buf_rele_array() around the uiomove() call. From what I see, dmu_buf_hold_array() calls dmu_buf_hold_array_by_dnode(), which calls dbuf_hold(), which calls arc_buf_add_ref() or arc_buf_alloc(). Conversely, dmu_buf_rele_array() calls dbuf_rele(), which calls arc_buf_remove_ref(). So I am quite sure that ARC buffers are held/referenced only during ongoing I/O to or from them.
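The argument is that an ARC buffer is referenced only for the duration of one I/O: hold, copy, release. Below is a minimal C sketch of that pattern. `struct cbuf`, `buf_hold()`, `buf_rele()`, `buf_evictable()` and `buf_read()` are hypothetical stand-ins, not the ZFS dbuf/ARC API, and `memcpy()` stands in for `uiomove()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for a cached (ARC) buffer. */
struct cbuf {
    int refcnt;       /* number of in-progress I/Os using the buffer */
    size_t len;       /* valid bytes in data[] */
    char data[64];
};

void buf_hold(struct cbuf *b)
{
    b->refcnt++;
}

void buf_rele(struct cbuf *b)
{
    assert(b->refcnt > 0);
    b->refcnt--;
}

/* A reclaim thread may evict only buffers that nobody holds. */
bool buf_evictable(const struct cbuf *b)
{
    return b->refcnt == 0;
}

/* Read path: hold the buffer, copy out (memcpy() stands in for
 * uiomove()), then release it before returning to the caller. */
size_t buf_read(struct cbuf *b, char *dst, size_t len)
{
    buf_hold(b);
    size_t n = len < b->len ? len : b->len;
    memcpy(dst, b->data, n);
    buf_rele(b);
    return n;
}
```

Once `buf_read()` returns, the reference count is back to zero, so in this model nothing on the read path keeps the buffer pinned. Only a caller that holds a buffer across calls would block reclamation.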
Perhaps, on the other hand, you had in mind the life cycle of other things (not ARC buffers) that are accounted against the ARC size (with type ARC_SPACE_OTHER)? For example, the dmu_buf_impl_t-s allocated in dbuf_create(). I have to admit that I haven't investigated the behavior of that part of ARC-assigned memory. It's only a small proportion (~10%) of the whole ARC size on my systems. >>> At one point I had patches running on my system that triggered the >>> pagedaemon based on arc load and it did allow me to keep my arc below the >>> max. Or at least I thought it did. >>> >>> In any case, I've never really been able to wrap my head around the VFS >>> layer and how it interacts with zfs. So I'm more than willing to believe >>> I'm confused. Any insights are greatly appreciated. >> >> ARC is a ZFS private cache. ZFS doesn't use unified buffer/page cache. So >> ARC is not directly affected by pagedaemon. But this is not exactly VFS >> layer thing. > > Can you explain the difference in how the vfs/vnode operations are called or > used for those two situations? They are called exactly the same way. The VFS layer and the code above it are not aware of FS implementation details. > I thought that the buffer cache was used by filesystems to implement these > operations. So that the buffer cache was below the vfs/vnops layer. So The buffer cache works as part of the unified VM, and its buffers use the same pages as the page cache does. > while zfs implemented its operations in terms of the arc, things like UFS > implemented vfs/vnops in terms of the buffer cache. I thought the layers Yes. Filesystems like UFS are "sandwiched" between the buffer cache and the page cache, which work in concert. Also, they don't (have to) implement their own buffer/page caching policies, because it's all managed by the unified VM system. By contrast, ZFS has its own private cache. So, first of all, its data may be cached in two places at once: the page cache and the ARC.
And, because of that, some assumptions of the higher-level code get violated, so ZFS has to jump through hoops to meet those assumptions (e.g. see UIO_NOCOPY). > further up the chain like the page daemon did not distinguish that much > between these two implementation due to the VFS interface layer. (Although Right, but see above. > there seems to be a layering violation in that the buffer cache signals > directly to the upper page daemon layer to trigger page reclamation.) Umm, not sure if that is a fact. > The old (ancient) patch I tried previously to help reduce the arc working set > and allow it to shrink is here: > > http://www.wanderview.com/svn/public/misc/zfs/zfs_kmem_limit.diff > > Unfortunately, there are a couple ideas on fighting fragmentation mixed into > that patch. See the part about arc_reclaim_pages(). This patch did seem to > allow my arc to stay under the target maximum even when under load that > previously caused the system to exceed the maximum. When I update this > weekend I'll try a stripped down version of the patch to see if it helps or > not with the latest zfs. > > Thanks for your help in understanding this stuff! The patch seems good, especially the part about taking into account the kmem fragmentation. But it also seems to be heavily tuned towards "tiny ARC" systems like yours, so I am not sure yet how suitable it is for "mainstream" systems.
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 22:01:27 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F036B106566C; Tue, 28 Sep 2010 22:01:27 +0000 (UTC) (envelope-from ben@wanderview.com) Received: from mail.wanderview.com (mail.wanderview.com [66.92.166.102]) by mx1.freebsd.org (Postfix) with ESMTP id 6EE088FC0A; Tue, 28 Sep 2010 22:01:27 +0000 (UTC) Received: from xykon.in.wanderview.com (xykon.in.wanderview.com [10.76.10.152]) (authenticated bits=0) by mail.wanderview.com (8.14.4/8.14.4) with ESMTP id o8SM1LVX031742 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Tue, 28 Sep 2010 22:01:21 GMT (envelope-from ben@wanderview.com) Mime-Version: 1.0 (Apple Message framework v1081) Content-Type: text/plain; charset=us-ascii From: Ben Kelly In-Reply-To: <4CA25E92.4060904@icyb.net.ua> Date: Tue, 28 Sep 2010 18:01:21 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: <5BD33772-C0EA-48A9-BE9A-C8FBAF0008D7@wanderview.com> References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> <4CA1DDE9.8090107@icyb.net.ua> <20100928132355.GA63149@icarus.home.lan> <4CA1EF69.4040402@icyb.net.ua> <4CA21809.7090504@icyb.net.ua> <71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com> <4CA22337.2010900@icyb.net.ua> <4CA25E92.4060904@icyb.net.ua> To: Andriy Gapon X-Mailer: Apple Mail (2.1081) X-Spam-Score: -1.01 () ALL_TRUSTED,T_RP_MATCHES_RCVD X-Scanned-By: MIMEDefang 2.67 on 10.76.20.1 Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 22:01:28 -0000 On Sep 28, 2010, at 5:30 PM, Andriy Gapon wrote: << snipped lots of good info 
here... probably won't have time to look at it in detail until the weekend >> >> there seems to be a layering violation in that the buffer cache signals >> directly to the upper page daemon layer to trigger page reclamation.) > > Umm, not sure if that is a fact. I was referring to the code in vfs_bio.c that used to twiddle vm_pageout_deficit directly. That seems to have been replaced with a call to vm_page_grab(). >> The old (ancient) patch I tried previously to help reduce the arc working set >> and allow it to shrink is here: >> >> http://www.wanderview.com/svn/public/misc/zfs/zfs_kmem_limit.diff >> >> Unfortunately, there are a couple ideas on fighting fragmentation mixed into >> that patch. See the part about arc_reclaim_pages(). This patch did seem to >> allow my arc to stay under the target maximum even when under load that >> previously caused the system to exceed the maximum. When I update this >> weekend I'll try a stripped down version of the patch to see if it helps or >> not with the latest zfs. >> >> Thanks for your help in understanding this stuff! > > The patch seems good, especially the part about taking into account the kmem > fragmentation. But it also seems to be heavily tuned towards "tiny ARC" systems > like yours, so I am not sure yet how suitable it is for "mainstream" systems. Thanks. Yea, there is a lot of aggressive tuning there. In particular, the slow growth algorithm is somewhat dubious. What I found, though, was that the fragmentation jumped whenever the arc was reduced in size, so it was an attempt to make the size slowly approach peak load without overshooting. A better long term solution would probably be to enhance UMA to support custom slab sizes on a zone-by-zone basis. That way all zfs/arc allocations can use slabs of 128k (at a memory efficiency penalty of course).
I prototyped this with a dumbed down block pool allocator at one point and was able to avoid most, if not all, of the fragmentation. Adding the support to UMA seemed non-trivial, though. Thanks again for the information. I hope to get a chance to look at the code this weekend. - Ben From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 22:22:59 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F401A106566B; Tue, 28 Sep 2010 22:22:58 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 0E61D8FC08; Tue, 28 Sep 2010 22:22:57 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id BAA12308; Wed, 29 Sep 2010 01:22:45 +0300 (EEST) (envelope-from avg@icyb.net.ua) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1P0iZR-0000Lq-6i; Wed, 29 Sep 2010 01:22:45 +0300 Message-ID: <4CA26AB4.3050108@icyb.net.ua> Date: Wed, 29 Sep 2010 01:22:44 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Ben Kelly References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> <4CA1DDE9.8090107@icyb.net.ua> <20100928132355.GA63149@icarus.home.lan> <4CA1EF69.4040402@icyb.net.ua> <4CA21809.7090504@icyb.net.ua> <71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com> <4CA22337.2010900@icyb.net.ua> <4CA25E92.4060904@icyb.net.ua> <5BD33772-C0EA-48A9-BE9A-C8FBAF0008D7@wanderview.com> In-Reply-To: <5BD33772-C0EA-48A9-BE9A-C8FBAF0008D7@wanderview.com> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc:
stable@freebsd.org, fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 22:22:59 -0000 on 29/09/2010 01:01 Ben Kelly said the following: > Thanks. Yea, there is a lot of aggressive tuning there. In particular, the > slow growth algorithm is somewhat dubious. What I found, though, was that > the fragmentation jumped whenever the arc was reduced in size, so it was an > attempt to make the size slowly approach peak load without overshooting. > > A better long term solution would probably be to enhance UMA to support > custom slab sizes on a zone-by-zone basis. That way all zfs/arc allocations > can use slabs of 128k (at a memory efficiency penalty of course). I > prototyped this with a dumbed down block pool allocator at one point and was > able to avoid most, if not all, of the fragmentation. Adding the support to > UMA seemed non-trivial, though. BTW, have you seen my posts about UMA and ZFS on hackers@? I found it advantageous to use UMA for ZFS I/O buffers, but only after reducing the size of the per-CPU caches for the zones with large-sized items. I further modified the code in my local tree to completely disable per-CPU caches for items > 32KB.
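Ben's alternative is to give ZFS/ARC allocations fixed 128k slabs. The C sketch below is a toy userspace pool in that spirit. It is neither UMA's interface nor Ben's prototype. Every slab is 128 KiB and is carved into equal items, and freed items go back onto the pool's own free list rather than to the general allocator, so memory is reused only for items of the same size. The names (`struct pool`, `pool_init()`, `pool_alloc()`, `pool_free()`) are hypothetical. Slabs are never handed back to `malloc()`, which corresponds to the memory-efficiency penalty mentioned in the quoted text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SLAB_SIZE (128u * 1024u)  /* every slab is 128 KiB */

/* A free item stores the free-list link in its own memory. */
struct pool_item {
    struct pool_item *next;
};

/* Hypothetical fixed-size pool, one per item size ("zone"). */
struct pool {
    size_t item_size;
    struct pool_item *free;  /* LIFO free list */
    size_t nslabs;           /* slabs obtained so far (never returned) */
};

void pool_init(struct pool *p, size_t item_size)
{
    /* Items must hold a link and keep the links aligned. */
    assert(item_size >= sizeof(struct pool_item));
    assert(item_size % sizeof(struct pool_item) == 0);
    assert(item_size <= POOL_SLAB_SIZE);
    p->item_size = item_size;
    p->free = NULL;
    p->nslabs = 0;
}

/* Carve a fresh slab into items and push them all on the free list. */
static int pool_grow(struct pool *p)
{
    char *slab = malloc(POOL_SLAB_SIZE);
    if (slab == NULL)
        return -1;
    for (size_t off = 0; off + p->item_size <= POOL_SLAB_SIZE;
         off += p->item_size) {
        struct pool_item *it = (struct pool_item *)(slab + off);
        it->next = p->free;
        p->free = it;
    }
    p->nslabs++;
    return 0;
}

void *pool_alloc(struct pool *p)
{
    if (p->free == NULL && pool_grow(p) != 0)
        return NULL;
    struct pool_item *it = p->free;
    p->free = it->next;
    return it;
}

/* Freed items stay in this pool; memory is reused only for this size. */
void pool_free(struct pool *p, void *ptr)
{
    struct pool_item *it = ptr;
    it->next = p->free;
    p->free = it;
}
```

With 16 KiB items, each slab holds 8 items. The ninth outstanding allocation forces a second slab, and a freed item is handed out again by the next `pool_alloc()` on the same pool.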
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Sep 28 23:15:02 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DC29C106566B for ; Tue, 28 Sep 2010 23:15:01 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by mx1.freebsd.org (Postfix) with ESMTP id 78F678FC2E for ; Tue, 28 Sep 2010 23:15:01 +0000 (UTC) Received: from besplex.bde.org (c122-107-116-249.carlnfd1.nsw.optusnet.com.au [122.107.116.249]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o8SNEvp4006110 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 29 Sep 2010 09:14:58 +1000 Date: Wed, 29 Sep 2010 09:14:57 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans In-Reply-To: <20100929054826.E797@besplex.bde.org> Message-ID: <20100929084801.M948@besplex.bde.org> References: <20100929031825.L683@besplex.bde.org> <20100929054826.E797@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: fs@freebsd.org Subject: Re: ext2fs now extremely slow X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2010 23:15:02 -0000 On Wed, 29 Sep 2010, Bruce Evans wrote: > On Wed, 29 Sep 2010, Bruce Evans wrote: > >> For benchmarks on ext2fs: >> >> Under FreeBSD-~5.2 rerun today: >> untar: 59.17 real >> tar: 19.52 real >> >> Under -current run today: >> untar: 101.16 real >> tar: 172.03 real >> >> So, -current is 8.8 times slower for tar, but only 1.7 times slower for >> untar. >> ... >> So it seems that only 1 block in every 8 is used, and there is a seek >> after every block. 
This asks for an 8-fold reduction in throughput, >> and it seems to have got that and a bit more for reading although not >> for writing. Even (or especially) with perfect hardware, it must give >> an 8-fold reduction. And it is likely to give more, since it defeats >> vfs clustering by making all runs of contiguous blocks have length 1. >> >> Simple sequential allocation should be used unless the allocation policy >> and implementation are very good. > > This work a bit better after zapping the 8-fold way: Things > ... > This gives an improvement of: > > untar: 101.16 real -> 63.46 > tar: 172.03 real -> 50.70 > > Now -current is only 1.1 times slower for untar and 2.6 times slower for > tar. > > There must be a problem with bpref for things to have been so bad. There > is some point to leaving a gap of 7 blocks for expansion, but the gap was > left even between blocks in a single file. > ... > I haven't tried the bde_blkpref hack in the above. It should kill bpref > completely so that there is no jump between lbn0 and lbn1, and break > cylinder group based allocation even better. Setting bde_blkpref to 1 > restores the bug that was present in ext2fs in FreeBSD between 1995 and > 2010. This bug gave seqential allocation starting at the beginning of > the disk in almost all cases, so map searches were slow and early groups > filled up before later groups were used at all. Tried this (patch repeated below), and it gave essentially the same speed as old versions. The main problem seems to be that the `goal' variables aren't initialized. 
After restoring bits verbatim from an old version, things seem to work as expected:

% Index: ext2_alloc.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v
% retrieving revision 1.2
% diff -u -2 -r1.2 ext2_alloc.c
% --- ext2_alloc.c 1 Sep 2010 05:34:17 -0000 1.2
% +++ ext2_alloc.c 28 Sep 2010 21:08:42 -0000
% @@ -1,2 +1,5 @@
% +int bde_blkpref = 0;
% +int bde_alloc8 = 0;
% +
%  /*-
%   * modified for Lites 1.1
% @@ -117,4 +120,8 @@
%                          ext2_alloccg);
%          if (bno > 0) {
% +                /* set next_alloc fields as done in block_getblk */
% +                ip->i_next_alloc_block = lbn;
% +                ip->i_next_alloc_goal = bno;
% +
%                  ip->i_blocks += btodb(fs->e2fs_bsize);
%                  ip->i_flag |= IN_CHANGE | IN_UPDATE;

The only things that changed recently in this block were the 4 deleted lines and 4 lines with tabs corrupted to spaces. Perhaps an editing error.

% @@ -542,6 +549,12 @@
%            then set the goal to what we thought it should be
%          */
% +if (bde_blkpref == 0) {
%          if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0)
%                  return ip->i_next_alloc_goal;
% +} else if (bde_blkpref == 1) {
% +        if(ip->i_next_alloc_block == lbn)
% +                return ip->i_next_alloc_goal;
% +} else
% +        return 0;
%
%          /* now check whether we were provided with an array that basically

Not needed now.

% @@ -662,4 +675,5 @@
%           * block.
%           */
% +if (bde_alloc8 == 0) {
%          if (bpref)
%                  start = dtogd(fs, bpref) / NBBY;
% @@ -679,4 +693,5 @@
%                  }
%          }
% +}
%
%          bno = ext2_mapsearch(fs, bbp, bpref);

The code to skip to the next 8-block boundary should be removed permanently. After fixing the initialization, it doesn't generate holes inside files, but it still generates holes between files. The holes are quite large with 4K-blocks.
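The restored lines amount to remembering where the previous block of a file went, so that the next logical block is preferred right after it. Below is a simplified C sketch of that idea, modeled loosely on the `i_next_alloc_block`/`i_next_alloc_goal` fields in the patch. `struct inode_hint`, `hint_record()` and `hint_blkpref()` are hypothetical, and `fallback` stands for whatever the cylinder-group policy would pick. For brevity, the step that advances the hint by one block is folded into `hint_blkpref()`; the patch excerpt does not show where ext2fs does that step. If `hint_record()` is never called, which corresponds to the missing initialization, every request gets the fallback.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-inode allocation hint, modeled loosely on the
 * i_next_alloc_block / i_next_alloc_goal fields in the patch. */
struct inode_hint {
    int64_t next_alloc_block;  /* logical block of the last allocation */
    int64_t next_alloc_goal;   /* physical block it went to; 0 = no hint */
};

/* Record that logical block lbn was allocated at physical block bno
 * (the step the patch restores after a successful allocation). */
void hint_record(struct inode_hint *h, int64_t lbn, int64_t bno)
{
    h->next_alloc_block = lbn;
    h->next_alloc_goal = bno;
}

/* Preferred physical block for lbn. `fallback` stands in for the
 * cylinder-group based choice used when there is no usable hint. */
int64_t hint_blkpref(struct inode_hint *h, int64_t lbn, int64_t fallback)
{
    /* Sequential write: the next logical block goes right after the last. */
    if (h->next_alloc_goal != 0 && lbn == h->next_alloc_block + 1) {
        h->next_alloc_block++;
        h->next_alloc_goal++;
    }
    if (h->next_alloc_goal != 0 && h->next_alloc_block == lbn)
        return h->next_alloc_goal;
    return fallback;
}
```

After `hint_record(&h, 0, 1000)`, a request for logical block 1 prefers physical block 1001, so a file written sequentially stays contiguous. A non-sequential request still falls back to the group policy.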
Benchmark results with just the initialization of `goal' variables restored:

%%%
ext2fs-1024-1024:
tarcp /f srcs:             78.79 real   0.31 user   4.94 sys
tar cf /dev/zero srcs:     24.62 real   0.19 user   1.82 sys
ext2fs-1024-1024-as:
tarcp /f srcs:             52.07 real   0.26 user   4.95 sys
tar cf /dev/zero srcs:     24.80 real   0.10 user   1.93 sys
ext2fs-4096-4096:
tarcp /f srcs:             74.14 real   0.34 user   3.96 sys
tar cf /dev/zero srcs:     33.82 real   0.10 user   1.19 sys
ext2fs-4096-4096-as:
tarcp /f srcs:             53.54 real   0.36 user   3.87 sys
tar cf /dev/zero srcs:     33.91 real   0.14 user   1.15 sys
%%%

The much larger holes between the files are apparently responsible for the decreased speed with 4K-blocks. 1K-blocks are really too small, so 4K-blocks should be faster.

Benchmark results with the fix and bde_alloc8 = 1:

%%%
ext2fs-1024-1024:
tarcp /f srcs:             71.60 real   0.15 user   2.04 sys
tar cf /dev/zero srcs:     22.34 real   0.05 user   0.79 sys
ext2fs-1024-1024-as:
tarcp /f srcs:             46.03 real   0.14 user   2.02 sys
tar cf /dev/zero srcs:     21.97 real   0.05 user   0.80 sys
ext2fs-4096-4096:
tarcp /f srcs:             59.66 real   0.13 user   1.63 sys
tar cf /dev/zero srcs:     19.88 real   0.07 user   0.46 sys
ext2fs-4096-4096-as:
tarcp /f srcs:             37.30 real   0.12 user   1.60 sys
tar cf /dev/zero srcs:     19.93 real   0.05 user   0.49 sys
%%%

Bruce
From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 00:01:48 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5D560106564A for ; Wed, 29 Sep 2010 00:01:48 +0000 (UTC) (envelope-from scottj75074@yahoo.com) Received: from web110702.mail.gq1.yahoo.com (web110702.mail.gq1.yahoo.com [67.195.13.209]) by mx1.freebsd.org (Postfix) with SMTP id 313388FC15 for ; Wed, 29 Sep 2010 00:01:47 +0000 (UTC) Received: (qmail 32029 invoked by uid 60001); 29 Sep 2010 00:01:47 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1285718507; bh=s/EOty7iIA36f4lTrWCyHwF+ttMCv0QGH65S0rCsXtc=;
h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:MIME-Version:Content-Type; b=Hq1AYeX3c0EfU2w3XnDlHWHf7IQjl0fKVnRCUmNtNAL3iFOJV3jEfBv/xVzeqprZgfFN49R3BqTkda+JtNEnjYr9b0NSJVAxfer1ZVaTvoCNUmXh7AUxAzGd2xX0jVL28s4IFyvmh/j8nQSMPCNQtquuqV22/F6cJ8YFyxsE6oc= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:MIME-Version:Content-Type; b=tW2NDrhQZgDMJtLXORuswSLB90F0IaB8b0EbLrBoFdgclrxpkC98GKvQEXOLbR5uT26nneCZyjIpnIIK68OdPLur336giYRp8D4c9Sw0wGosMGIooSWmulIdy+08XLLwF5WNdNDB2nEj+GbMdnOo4V3xYVKuxzpljqq6ZZghY/E=; Message-ID: <707981.30589.qm@web110702.mail.gq1.yahoo.com> X-YMail-OSG: ta4L74QVM1lbp3oOg83p_MXvzmUNEE.MAGOTy9PqB_HjHYo enSIYlcvnHIN6GZjbERiSkg3KFj1IT5PmxWNc16g1GJtUcslJx0OPL_yhxMJ MlcJYFk0IGIL_FWASOfQGHPuVeJAOtlOMVagA22eJcgPrvJXyFIi0tQ3YGPG 4YixBXNlRrlVy0m6fgnQ23kxHiHf0hcqCB0hVZmn1bt6Ni7jCphPLbBdgJlV wTfae6ItMXy.9O5L2feclM4J8VjsJTJh.OsdqrCYwJ29XvP1T.rGqhBJBGZ9 UJabzBSNbCRYehAh1aGuQb748Y7INisNymzYqYAHI97O3oZ4versM2w-- Received: from [99.189.91.206] by web110702.mail.gq1.yahoo.com via HTTP; Tue, 28 Sep 2010 17:01:47 PDT X-Mailer: YahooMailRC/497 YahooMailWebService/0.8.105.279950 Date: Tue, 28 Sep 2010 17:01:47 -0700 (PDT) From: Scott Johnson To: freebsd-fs@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Subject: zfs+smb checksum errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 00:01:48 -0000 I've been running FBSD 8.0-rel and now 8.1-rel for 6 months or so, with a 4 disk raidz, weekly scrubs. Never saw a checksum error until last week when I got about 5 during operation, followed by another 10 or so on the next scrub. They all occurred when I was accessing the server through SMB from my WinXP desktop. 
I've been doing this for months, transferring files to and from regularly, but this was the first time I'd been doing simultaneous heavy reading & writing. I was running Imgburn on WinXP, creating an iso file from folders. Both source files and destination iso file were on the same SMB share on the FBSD server. After all the checksum errors, there was 1 unrecoverable error, on one of the destination iso files. There are 0 read errors, 0 write errors, and 0 new SMART errors on the 4 disks. The checksum errors were spread roughly equally across the 4 disks. All of which leads me to believe this is a software problem at the filesystem level. What is my next move for diagnosing and eventually resolving this? From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 01:08:24 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 03EDA1065672; Wed, 29 Sep 2010 01:08:24 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-qy0-f182.google.com (mail-qy0-f182.google.com [209.85.216.182]) by mx1.freebsd.org (Postfix) with ESMTP id 8DB6C8FC08; Wed, 29 Sep 2010 01:08:23 +0000 (UTC) Received: by qyk7 with SMTP id 7so434487qyk.13 for ; Tue, 28 Sep 2010 18:08:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type; bh=zs8nNjLa8+S++1leXXRprGA1GllZtA/+P7WblUbSuA4=; b=vPMfki3vezl7Cwbcpf+v5Cr8CJRtXDqiOMRCFpaEqDzzvAebWC8nY12GwUAPfTxp3a 4HpHXruhS/UsJc2pYY4kGaFGWqJ0iA+stJkLKDeE7MumEnkGifSCYc+NTNGDkEkKCmAk kv+E4MaVbFnMRvvc8a01S8YXOEduATm2pW5II= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; b=HXJtpsEkDkLYcIlYB8eyYZdEW7DRWljp2nRTM5YHwVuBWn+KS4WkVebgYZYcXzl3JY 
0bQ6O5dejUCIdy5scgoicAQkrDXfxurR/y6VXMKcsvTqT50vOFFtjBB9xgPtpbD8i8Nk QxdPecCfi8qZZJSvCviiRfiq6B42Wky4m0hmk= MIME-Version: 1.0 Received: by 10.220.63.5 with SMTP id z5mr188074vch.105.1285720710225; Tue, 28 Sep 2010 17:38:30 -0700 (PDT) Sender: artemb@gmail.com Received: by 10.220.176.77 with HTTP; Tue, 28 Sep 2010 17:38:30 -0700 (PDT) In-Reply-To: <4CA26AB4.3050108@icyb.net.ua> References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> <4CA1DDE9.8090107@icyb.net.ua> <20100928132355.GA63149@icarus.home.lan> <4CA1EF69.4040402@icyb.net.ua> <4CA21809.7090504@icyb.net.ua> <71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com> <4CA22337.2010900@icyb.net.ua> <4CA25E92.4060904@icyb.net.ua> <5BD33772-C0EA-48A9-BE9A-C8FBAF0008D7@wanderview.com> <4CA26AB4.3050108@icyb.net.ua> Date: Tue, 28 Sep 2010 17:38:30 -0700 X-Google-Sender-Auth: n2VHTRWuhNlv4iTT-adWLojl3Vg Message-ID: From: Artem Belevich To: Andriy Gapon Content-Type: text/plain; charset=ISO-8859-1 Cc: stable@freebsd.org, fs@freebsd.org, Ben Kelly Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 01:08:24 -0000 On Tue, Sep 28, 2010 at 3:22 PM, Andriy Gapon wrote: > BTW, have you seen my posts about UMA and ZFS on hackers@ ? > I found it advantageous to use UMA for ZFS I/O buffers, but only after reducing > size of per-CPU caches for the zones with large-sized items. > I further modified the code in my local tree to completely disable per-CPU > caches for items > 32KB. Do you have an updated patch disabling per-CPU caches for large items? I've just rebuilt FreeBSD-8 with your uma-2.diff (it needed r209050 from -head to compile) and so far things look good. I'll re-enable UMA for ZFS and see how it flies in a couple of days.
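The UMA change under discussion shrinks per-CPU caches for zones with large items and disables them above 32KB. The sizing rule below is a hypothetical C sketch of such a policy, not UMA's actual bucket logic and not the uma-2.diff patch. Each per-CPU cache holds at most an assumed byte budget (`PCPU_BYTE_BUDGET`, 128 KiB here, chosen only for illustration) and is disabled entirely for items over 32 KiB. As a result, idle cached memory per CPU stays bounded no matter how large the items are.

```c
#include <assert.h>
#include <stddef.h>

#define PCPU_DISABLE_ABOVE (32u * 1024u)   /* no per-CPU cache above 32KB */
#define PCPU_BYTE_BUDGET   (128u * 1024u)  /* assumed per-CPU byte cap */
#define PCPU_MAX_ITEMS     128u            /* assumed cap for tiny items */

/* Number of free items a per-CPU cache may hold for a zone whose items
 * are item_size bytes; 0 means the per-CPU cache is disabled. */
size_t pcpu_cache_items(size_t item_size)
{
    assert(item_size > 0);
    if (item_size > PCPU_DISABLE_ABOVE)
        return 0;
    size_t n = PCPU_BYTE_BUDGET / item_size;
    if (n > PCPU_MAX_ITEMS)
        n = PCPU_MAX_ITEMS;
    return n;
}
```

With these assumed constants, 64-byte items get the full 128-entry cache, 4 KiB items get 32, 32 KiB items get 4, and 128 KiB I/O buffers get none. A fixed-count cache would instead let a large-item zone pin many megabytes per CPU.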
--Artem From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 04:43:21 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D954A1065674 for ; Wed, 29 Sep 2010 04:43:20 +0000 (UTC) (envelope-from sarawgi.aditya@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 64BB68FC19 for ; Wed, 29 Sep 2010 04:43:19 +0000 (UTC) Received: by fxm9 with SMTP id 9so314881fxm.13 for ; Tue, 28 Sep 2010 21:43:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:date:from:to:cc:subject :message-id:references:mime-version:content-type:content-disposition :in-reply-to:user-agent; bh=md0dTwv/f8csoKd+A6bpUnt6BePVtdQ4PC+Xyecld3w=; b=YICrY4xgMZzS6WKobl558pPzNp5ZvvebGLvAeopBzSm2udlpF9QlUiNSG52ag4XOcg cjWMGcS4ASDgdaKYOva2xi3GKrZGTexPHXjvU5b0TcwYRboHF1nzHcbBsYwCiaWfsdRe 1LNh8pFASa7G07ApK6f+//pgAIbwI+4HR8yF8= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; b=hXC4RMPTevUkcX1Ck3dz+Fma+GgHRUV5xNUfV8HFSNlgbS7PWbpLCsSuhSeMJj8LPY kx51drIhHYr8jQDQnZtsopA6kSrJ9v4HQXtqL2+Ov9/wP/5AGXkZ3DwTj5S0OyUz5T/K P1us+kKBkCkzbgqMBnorL6jg8uarDOPUqHFmM= Received: by 10.223.104.17 with SMTP id m17mr1068337fao.22.1285733698510; Tue, 28 Sep 2010 21:14:58 -0700 (PDT) Received: from aditya ([183.87.49.235]) by mx.google.com with ESMTPS id k25sm3548185fac.41.2010.09.28.21.14.54 (version=TLSv1/SSLv3 cipher=RC4-MD5); Tue, 28 Sep 2010 21:14:57 -0700 (PDT) Date: Wed, 29 Sep 2010 09:46:55 +0530 From: Aditya Sarawgi To: Bruce Evans Message-ID: <20100929041650.GA1553@aditya> References: <20100929031825.L683@besplex.bde.org> <20100929054826.E797@besplex.bde.org> <20100929084801.M948@besplex.bde.org> MIME-Version: 1.0 Content-Type: 
text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100929084801.M948@besplex.bde.org> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: fs@freebsd.org Subject: Re: ext2fs now extremely slow X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 04:43:21 -0000 On Wed, Sep 29, 2010 at 09:14:57AM +1000, Bruce Evans wrote: > On Wed, 29 Sep 2010, Bruce Evans wrote: > > > On Wed, 29 Sep 2010, Bruce Evans wrote: > > > >> For benchmarks on ext2fs: > >> > >> Under FreeBSD-~5.2 rerun today: > >> untar: 59.17 real > >> tar: 19.52 real > >> > >> Under -current run today: > >> untar: 101.16 real > >> tar: 172.03 real > >> > >> So, -current is 8.8 times slower for tar, but only 1.7 times slower for > >> untar. > >> ... > >> So it seems that only 1 block in every 8 is used, and there is a seek > >> after every block. This asks for an 8-fold reduction in throughput, > >> and it seems to have got that and a bit more for reading although not > >> for writing. Even (or especially) with perfect hardware, it must give > >> an 8-fold reduction. And it is likely to give more, since it defeats > >> vfs clustering by making all runs of contiguous blocks have length 1. > >> > >> Simple sequential allocation should be used unless the allocation policy > >> and implementation are very good. > > > > This work a bit better after zapping the 8-fold way: > Things > > ... > > This gives an improvement of: > > > > untar: 101.16 real -> 63.46 > > tar: 172.03 real -> 50.70 > > > > Now -current is only 1.1 times slower for untar and 2.6 times slower for > > tar. > > > > There must be a problem with bpref for things to have been so bad. There > > is some point to leaving a gap of 7 blocks for expansion, but the gap was > > left even between blocks in a single file. > > ... > > I haven't tried the bde_blkpref hack in the above. 
It should kill bpref > > completely so that there is no jump between lbn0 and lbn1, and break > > cylinder group based allocation even better. Setting bde_blkpref to 1 > > restores the bug that was present in ext2fs in FreeBSD between 1995 and > > 2010. This bug gave sequential allocation starting at the beginning of > > the disk in almost all cases, so map searches were slow and early groups > > filled up before later groups were used at all. > > Tried this (patch repeated below), and it gave essentially the same > speed as old versions. > > The main problem seems to be that the `goal' variables aren't initialized. > After restoring bits verbatim from an old version, things seem to work as > expected: > > % Index: ext2_alloc.c > % =================================================================== > % RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v > % retrieving revision 1.2 > % diff -u -2 -r1.2 ext2_alloc.c > % --- ext2_alloc.c 1 Sep 2010 05:34:17 -0000 1.2 > % +++ ext2_alloc.c 28 Sep 2010 21:08:42 -0000 > % @@ -1,2 +1,5 @@ > % +int bde_blkpref = 0; > % +int bde_alloc8 = 0; > % + > % /*- > % * modified for Lites 1.1 > % @@ -117,4 +120,8 @@ > % ext2_alloccg); > % if (bno > 0) { > % + /* set next_alloc fields as done in block_getblk */ > % + ip->i_next_alloc_block = lbn; > % + ip->i_next_alloc_goal = bno; > % + > % ip->i_blocks += btodb(fs->e2fs_bsize); > % ip->i_flag |= IN_CHANGE | IN_UPDATE; > > The only things that changed recently in this block were the 4 deleted > lines and 4 lines with tabs corrupted to spaces. Perhaps an editing > error.
> > % @@ -542,6 +549,12 @@ > % then set the goal to what we thought it should be > % */ > % +if (bde_blkpref == 0) { > % if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0) > % return ip->i_next_alloc_goal; > % +} else if (bde_blkpref == 1) { > % + if(ip->i_next_alloc_block == lbn) > % + return ip->i_next_alloc_goal; > % +} else > % + return 0; > % > % /* now check whether we were provided with an array that basically > > Not needed now. > > % @@ -662,4 +675,5 @@ > % * block. > % */ > % +if (bde_alloc8 == 0) { > % if (bpref) > % start = dtogd(fs, bpref) / NBBY; > % @@ -679,4 +693,5 @@ > % } > % } > % +} > % > % bno = ext2_mapsearch(fs, bbp, bpref); > > The code to skip to the next 8-block boundary should be removed permanently. > After fixing the initialization, it doesn't generate holes inside files but > it still generates holes between files. The holes are quite large with > 4K-blocks. > > Benchmark results with just the initialization of `goal' variables restored: > > %%% > ext2fs-1024-1024: > tarcp /f srcs: 78.79 real 0.31 user 4.94 sys > tar cf /dev/zero srcs: 24.62 real 0.19 user 1.82 sys > ext2fs-1024-1024-as: > tarcp /f srcs: 52.07 real 0.26 user 4.95 sys > tar cf /dev/zero srcs: 24.80 real 0.10 user 1.93 sys > ext2fs-4096-4096: > tarcp /f srcs: 74.14 real 0.34 user 3.96 sys > tar cf /dev/zero srcs: 33.82 real 0.10 user 1.19 sys > ext2fs-4096-4096-as: > tarcp /f srcs: 53.54 real 0.36 user 3.87 sys > tar cf /dev/zero srcs: 33.91 real 0.14 user 1.15 sys > %%% > > The much larger holes between the files are apparently responsible for the > decreased speed with 4K-blocks. 1K-blocks are really too small, so 4K-blocks > should be faster. > > Benchmark results with the fix and bde_alloc8 = 1. 
> > ext2fs-1024-1024: > tarcp /f srcs: 71.60 real 0.15 user 2.04 sys > tar cf /dev/zero srcs: 22.34 real 0.05 user 0.79 sys > ext2fs-1024-1024-as: > tarcp /f srcs: 46.03 real 0.14 user 2.02 sys > tar cf /dev/zero srcs: 21.97 real 0.05 user 0.80 sys > ext2fs-4096-4096: > tarcp /f srcs: 59.66 real 0.13 user 1.63 sys > tar cf /dev/zero srcs: 19.88 real 0.07 user 0.46 sys > ext2fs-4096-4096-as: > tarcp /f srcs: 37.30 real 0.12 user 1.60 sys > tar cf /dev/zero srcs: 19.93 real 0.05 user 0.49 sys > > Bruce Hi, I see what you are saying. The gap of 8 blocks between the files is due to the old preallocation, which used to allocate an additional 8 blocks in advance for a particular inode when allocating a block for it. The gap between blocks of the same file shouldn't be there either. Both of these cases should be removed. I will look into this during this week. The slowness is also due to the lack of preallocation in the new code. Thanks Aditya Sarawgi From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 07:25:09 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E4C321065670; Wed, 29 Sep 2010 07:25:09 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id EFA738FC0A; Wed, 29 Sep 2010 07:25:08 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id KAA19592; Wed, 29 Sep 2010 10:24:51 +0300 (EEST) (envelope-from avg@icyb.net.ua) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1P0r23-00037A-0A; Wed, 29 Sep 2010 10:24:51 +0300 Message-ID: <4CA2E9C2.3030806@icyb.net.ua> Date: Wed, 29 Sep 2010 10:24:50 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2
Thunderbird/3.1.4 MIME-Version: 1.0 To: Artem Belevich References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> <4CA1DDE9.8090107@icyb.net.ua> <20100928132355.GA63149@icarus.home.lan> <4CA1EF69.4040402@icyb.net.ua> <4CA21809.7090504@icyb.net.ua> <71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com> <4CA22337.2010900@icyb.net.ua> <4CA25E92.4060904@icyb.net.ua> <5BD33772-C0EA-48A9-BE9A-C8FBAF0008D7@wanderview.com> <4CA26AB4.3050108@icyb.net.ua> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: stable@freebsd.org, fs@freebsd.org, Ben Kelly Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 07:25:10 -0000 on 29/09/2010 03:38 Artem Belevich said the following: > On Tue, Sep 28, 2010 at 3:22 PM, Andriy Gapon wrote: >> BTW, have you seen my posts about UMA and ZFS on hackers@ ? >> I found it advantageous to use UMA for ZFS I/O buffers, but only after reducing >> size of per-CPU caches for the zones with large-sized items. >> I further modified the code in my local tree to completely disable per-CPU >> caches for items > 32KB. > > Do you have updated patch disabling per-cpu caches for large items? > I've just rebuilt FreeBSD-8 with your uma-2.diff (it needed r209050 > from -head to compile) and so far things look good. I'll re-enable UMA > for ZFS and see how it flies in a couple of days. I've just uploaded uma-3.diff. It implements what uma-1.diff did, plus totally skips per-CPU caches for items > 32KB, and also has code from uma-2.diff for flushing per-CPU caches on significant memory shortage. Will appreciate your feedback. Thank you for testing! 
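uma-3.diff is only described in prose here. The following is a userland toy, not UMA code, that sketches the policy as described: zones whose items exceed 32KB never park freed items in the per-CPU cache, and caches can be drained when memory gets short. `struct toy_zone`, `toy_zone_alloc()`, `toy_zone_free()`, `toy_zone_drain()` and the 4-slot bucket are hypothetical; the real patch may be structured quite differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* The 32KB cut-off is from the uma-3.diff description; the rest is a toy. */
#define PCPU_CACHE_MAX_ITEM (32 * 1024)
#define BUCKET_SLOTS 4

/* Hypothetical zone with a single per-CPU bucket of free items. */
struct toy_zone {
	size_t item_size;
	void *bucket[BUCKET_SLOTS];
	int nfree;              /* items currently parked in the bucket */
};

/* Zones with large items bypass the per-CPU cache entirely. */
static bool
toy_zone_cacheable(const struct toy_zone *z)
{
	return (z->item_size <= PCPU_CACHE_MAX_ITEM);
}

/* Bytes held idle in the bucket, unavailable to other consumers. */
static size_t
toy_zone_cached_bytes(const struct toy_zone *z)
{
	return ((size_t)z->nfree * z->item_size);
}

static void *
toy_zone_alloc(struct toy_zone *z)
{
	assert(z->item_size > 0);
	if (z->nfree > 0)
		return (z->bucket[--z->nfree]); /* fast path: reuse a cached item */
	return (malloc(z->item_size));          /* slow path: back-end allocator */
}

static void
toy_zone_free(struct toy_zone *z, void *item)
{
	if (toy_zone_cacheable(z) && z->nfree < BUCKET_SLOTS) {
		z->bucket[z->nfree++] = item;   /* keep it for quick reuse */
		return;
	}
	free(item);                             /* large item or full bucket */
}

/* Flush the bucket, e.g. when the system runs short of memory. */
static void
toy_zone_drain(struct toy_zone *z)
{
	while (z->nfree > 0)
		free(z->bucket[--z->nfree]);
}
```

With 4KB items the toy keeps 16KB parked after four frees; with 128KB items it keeps nothing, so the memory goes straight back to the back-end allocator. The trade-off is that every large allocation then takes the slow path.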
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 09:33:56 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 87E041065674 for ; Wed, 29 Sep 2010 09:33:56 +0000 (UTC) (envelope-from kpielorz_lst@tdx.co.uk) Received: from mail.tdx.com (mail.tdx.com [62.13.128.18]) by mx1.freebsd.org (Postfix) with ESMTP id 2BF758FC18 for ; Wed, 29 Sep 2010 09:33:55 +0000 (UTC) Received: from HexaDeca64.dmpriest.net.uk (HPQuadro64.dmpriest.net.uk [62.13.130.30]) (authenticated bits=0) by mail.tdx.com (8.14.3/8.14.3/Kp) with ESMTP id o8T9L1ww029415 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES256-SHA bits=256 verify=NO) for ; Wed, 29 Sep 2010 10:21:02 +0100 (BST) Date: Wed, 29 Sep 2010 10:20:22 +0100 From: Karl Pielorz To: freebsd-fs@freebsd.org Message-ID: X-Mailer: Mulberry/4.0.8 (Win32) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline Subject: FreeBSD 8.1-R/amd64 - zfs 'hangs' - help tracing? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 09:33:56 -0000 Hi All, I moved my machine from FreeBSD 7.2-S/amd64 to 8.1-R/amd64 about a week ago. Since then I've noticed that ZFS just 'hangs' - e.g. it'll work fine for a few days, then a process will get 'hung up' waiting on ZFS. The machine is a Tyan motherboard (dual Opteron, dual cores, w/10Gb of RAM). 7.2-S & ZFS ran perfectly under it. 
Anything else then that touches the pools, also 'hangs' - in top the original process shows as: " 1927 root 1 44 0 8224K 1544K zio->i 0 0:00 0.00% ls " Anything else that touches the ZFS pools, ends up like: " 2082 root 1 44 0 10284K 2976K zfs 3 0:00 0.00% csh " I saw a while ago a command under 8.1 to get 'more info' for these stuck processes, but can't for the life of me remember it? If someone can give me some pointers to try and track down what's hanging? The drives are spread over two Marvell 88SX6081's. I've tried the mvs driver for that controller, which gave me a bucket load of errors, and data corruption :( Switching back to the standard ATA drivers for that card, I just get hangs :( - nothing is logged on the console, or syslog when this happens. Thanks, -Karl From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 10:24:52 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A120C10656C3 for ; Wed, 29 Sep 2010 10:24:52 +0000 (UTC) (envelope-from martin@lispworks.com) Received: from lwfs1-cam.cam.lispworks.com (mail.lispworks.com [193.34.186.230]) by mx1.freebsd.org (Postfix) with ESMTP id 27AAA8FC1D for ; Wed, 29 Sep 2010 10:24:51 +0000 (UTC) Received: from higson.cam.lispworks.com (IDENT:U2FsdGVkX19XpWCdIITMFBIhEBJ0bWFJ98KLHWFUmcA@higson [192.168.1.7]) by lwfs1-cam.cam.lispworks.com (8.14.3/8.14.3) with ESMTP id o8TAOnui064769; Wed, 29 Sep 2010 11:24:49 +0100 (BST) (envelope-from martin@lispworks.com) Received: from higson.cam.lispworks.com by higson.cam.lispworks.com (8.13.1) id o8TAOn1N013733; Wed, 29 Sep 2010 11:24:49 +0100 Received: (from martin@localhost) by higson.cam.lispworks.com (8.13.1/8.13.1/Submit) id o8TAOnph013730; Wed, 29 Sep 2010 11:24:49 +0100 Date: Wed, 29 Sep 2010 11:24:49 +0100 Message-Id: <201009291024.o8TAOnph013730@higson.cam.lispworks.com> From: Martin Simmons To: freebsd-fs@freebsd.org In-reply-to: (message from Karl 
Pielorz on Wed, 29 Sep 2010 10:20:22 +0100) References: Subject: Re: FreeBSD 8.1-R/amd64 - zfs 'hangs' - help tracing? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 10:24:52 -0000 >>>>> On Wed, 29 Sep 2010 10:20:22 +0100, Karl Pielorz said: > > I saw a while ago a command under 8.1 to get 'more info' for these stuck > processes, but can't for the life of me remember it? Maybe procstat -k -k $pid is what you are looking for (i.e. a kernel backtrace)? Use -a instead of $pid to get all processes. __Martin From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 10:31:12 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A9F37106564A for ; Wed, 29 Sep 2010 10:31:12 +0000 (UTC) (envelope-from kpielorz_lst@tdx.co.uk) Received: from mail.tdx.com (mail.tdx.com [62.13.128.18]) by mx1.freebsd.org (Postfix) with ESMTP id 4B5E38FC14 for ; Wed, 29 Sep 2010 10:31:11 +0000 (UTC) Received: from HexaDeca64.dmpriest.net.uk (HPQuadro64.dmpriest.net.uk [62.13.130.30]) (authenticated bits=0) by mail.tdx.com (8.14.3/8.14.3/Kp) with ESMTP id o8TAV90r035552 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES256-SHA bits=256 verify=NO); Wed, 29 Sep 2010 11:31:10 +0100 (BST) Date: Wed, 29 Sep 2010 11:30:29 +0100 From: Karl Pielorz To: Martin Simmons , freebsd-fs@freebsd.org Message-ID: <8CF1F1F15531907E2F8DC2A2@HexaDeca64.dmpriest.net.uk> In-Reply-To: <201009291024.o8TAOnph013730@higson.cam.lispworks.com> References: <201009291024.o8TAOnph013730@higson.cam.lispworks.com> X-Mailer: Mulberry/4.0.8 (Win32) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline Cc: Subject: Re: FreeBSD 8.1-R/amd64 - zfs 'hangs' - help tracing? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 10:31:12 -0000 --On 29 September 2010 11:24 +0100 Martin Simmons wrote: >> I saw a while ago a command under 8.1 to get 'more info' for these stuck >> processes, but can't for the life of me remember it? > > Maybe procstat -k -k $pid is what you are looking for (i.e. a kernel > backtrace)? Use -a instead of $pid to get all processes. Yup, that's it - thanks! Having run it I get: procstat -k -k 1927 (PID 1927 is the 'ls' that's locked up) PID TID COMM TDNAME KSTACK 1927 100206 ls - mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 zio_wait+0x61 dbuf_read+0x39a dnode_hold_impl+0xe7 dmu_bonus_hold+0x2a zfs_zget+0x227 zfs_dirent_lock+0x4e3 zfs_dirlook+0x69 zfs_lookup+0x1f0 zfs_freebsd_lookup+0x81 vfs_cache_lookup+0xf0 VOP_LOOKUP_APV+0x40 lookup+0x40a namei+0x52b kern_statat_vnhook+0x8f kern_statat+0x15 Which will hopefully mean something more to someone here than it does me at the moment ;) -Karl From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 13:26:06 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 878E710656B9 for ; Wed, 29 Sep 2010 13:26:06 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 0947A8FC17 for ; Wed, 29 Sep 2010 13:26:06 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 99CBA46C12; Wed, 29 Sep 2010 09:26:05 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 236898A04E; Wed, 29 Sep 2010 09:26:04 -0400 (EDT) From: John Baldwin To: freebsd-fs@freebsd.org Date: 
Wed, 29 Sep 2010 09:17:04 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; ) References: <20100929031825.L683@besplex.bde.org> <20100929084801.M948@besplex.bde.org> <20100929041650.GA1553@aditya> In-Reply-To: <20100929041650.GA1553@aditya> MIME-Version: 1.0 Content-Type: Multipart/Mixed; boundary="Boundary-00=_QxzoMs/Iug8+N80" Message-Id: <201009290917.05269.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Wed, 29 Sep 2010 09:26:04 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: Subject: Re: ext2fs now extremely slow X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 13:26:06 -0000 --Boundary-00=_QxzoMs/Iug8+N80 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit On Wednesday, September 29, 2010 12:16:55 am Aditya Sarawgi wrote: > On Wed, Sep 29, 2010 at 09:14:57AM +1000, Bruce Evans wrote: > > On Wed, 29 Sep 2010, Bruce Evans wrote: > > > > > On Wed, 29 Sep 2010, Bruce Evans wrote: > > > > > >> For benchmarks on ext2fs: > > >> > > >> Under FreeBSD-~5.2 rerun today: > > >> untar: 59.17 real > > >> tar: 19.52 real > > >> > > >> Under -current run today: > > >> untar: 101.16 real > > >> tar: 172.03 real > > >> > > >> So, -current is 8.8 times slower for tar, but only 1.7 times slower for > > >> untar. > > >> ... > > >> So it seems that only 1 block in every 8 is used, and there is a seek > > >> after every block. This asks for an 8-fold reduction in throughput, > > >> and it seems to have got that and a bit more for reading although not > > >> for writing. 
Even (or especially) with perfect hardware, it must give > > >> an 8-fold reduction. And it is likely to give more, since it defeats > > >> vfs clustering by making all runs of contiguous blocks have length 1. > > >> > > >> Simple sequential allocation should be used unless the allocation policy > > >> and implementation are very good. > > > > > > This work a bit better after zapping the 8-fold way: > > Things > > > ... > > > This gives an improvement of: > > > > > > untar: 101.16 real -> 63.46 > > > tar: 172.03 real -> 50.70 > > > > > > Now -current is only 1.1 times slower for untar and 2.6 times slower for > > > tar. > > > > > > There must be a problem with bpref for things to have been so bad. There > > > is some point to leaving a gap of 7 blocks for expansion, but the gap was > > > left even between blocks in a single file. > > > ... > > > I haven't tried the bde_blkpref hack in the above. It should kill bpref > > > completely so that there is no jump between lbn0 and lbn1, and break > > > cylinder group based allocation even better. Setting bde_blkpref to 1 > > > restores the bug that was present in ext2fs in FreeBSD between 1995 and > > > 2010. This bug gave sequential allocation starting at the beginning of > > > the disk in almost all cases, so map searches were slow and early groups > > > filled up before later groups were used at all. > > > > Tried this (patch repeated below), and it gave essentially the same > > speed as old versions. > > > > The main problem seems to be that the `goal' variables aren't initialized.
> > After restoring bits verbatim from an old version, things seem to work as > > expected: > > > > % Index: ext2_alloc.c > > % =================================================================== > > % RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v > > % retrieving revision 1.2 > > % diff -u -2 -r1.2 ext2_alloc.c > > % --- ext2_alloc.c 1 Sep 2010 05:34:17 -0000 1.2 > > % +++ ext2_alloc.c 28 Sep 2010 21:08:42 -0000 > > % @@ -1,2 +1,5 @@ > > % +int bde_blkpref = 0; > > % +int bde_alloc8 = 0; > > % + > > % /*- > > % * modified for Lites 1.1 > > % @@ -117,4 +120,8 @@ > > % ext2_alloccg); > > % if (bno > 0) { > > % + /* set next_alloc fields as done in block_getblk */ > > % + ip->i_next_alloc_block = lbn; > > % + ip->i_next_alloc_goal = bno; > > % + > > % ip->i_blocks += btodb(fs->e2fs_bsize); > > % ip->i_flag |= IN_CHANGE | IN_UPDATE; > > > > The only things that changed recently in this block were the 4 deleted > > lines and 4 lines with tabs corrupted to spaces. Perhaps an editing > > error. > > > > % @@ -542,6 +549,12 @@ > > % then set the goal to what we thought it should be > > % */ > > % +if (bde_blkpref == 0) { > > % if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0) > > % return ip->i_next_alloc_goal; > > % +} else if (bde_blkpref == 1) { > > % + if(ip->i_next_alloc_block == lbn) > > % + return ip->i_next_alloc_goal; > > % +} else > > % + return 0; > > % > > % /* now check whether we were provided with an array that basically > > > > Not needed now. > > > > % @@ -662,4 +675,5 @@ > > % * block. > > % */ > > % +if (bde_alloc8 == 0) { > > % if (bpref) > > % start = dtogd(fs, bpref) / NBBY; > > % @@ -679,4 +693,5 @@ > > % } > > % } > > % +} > > % > > % bno = ext2_mapsearch(fs, bbp, bpref); > > > > The code to skip to the next 8-block boundary should be removed permanently. > > After fixing the initialization, it doesn't generate holes inside files but > > it still generates holes between files. The holes are quite large with > > 4K-blocks. 
> > > > Benchmark results with just the initialization of `goal' variables restored: > > > > %%% > > ext2fs-1024-1024: > > tarcp /f srcs: 78.79 real 0.31 user 4.94 sys > > tar cf /dev/zero srcs: 24.62 real 0.19 user 1.82 sys > > ext2fs-1024-1024-as: > > tarcp /f srcs: 52.07 real 0.26 user 4.95 sys > > tar cf /dev/zero srcs: 24.80 real 0.10 user 1.93 sys > > ext2fs-4096-4096: > > tarcp /f srcs: 74.14 real 0.34 user 3.96 sys > > tar cf /dev/zero srcs: 33.82 real 0.10 user 1.19 sys > > ext2fs-4096-4096-as: > > tarcp /f srcs: 53.54 real 0.36 user 3.87 sys > > tar cf /dev/zero srcs: 33.91 real 0.14 user 1.15 sys > > %%% > > > > The much larger holes between the files are apparently responsible for the > > decreased speed with 4K-blocks. 1K-blocks are really too small, so 4K-blocks > > should be faster. > > > > Benchmark results with the fix and bde_alloc8 = 1. > > > > ext2fs-1024-1024: > > tarcp /f srcs: 71.60 real 0.15 user 2.04 sys > > tar cf /dev/zero srcs: 22.34 real 0.05 user 0.79 sys > > ext2fs-1024-1024-as: > > tarcp /f srcs: 46.03 real 0.14 user 2.02 sys > > tar cf /dev/zero srcs: 21.97 real 0.05 user 0.80 sys > > ext2fs-4096-4096: > > tarcp /f srcs: 59.66 real 0.13 user 1.63 sys > > tar cf /dev/zero srcs: 19.88 real 0.07 user 0.46 sys > > ext2fs-4096-4096-as: > > tarcp /f srcs: 37.30 real 0.12 user 1.60 sys > > tar cf /dev/zero srcs: 19.93 real 0.05 user 0.49 sys > > > > Bruce > > Hi, > > I see what you are saying. The gap of 8 blocks between the files > is due to the old preallocation, which used to allocate an additional > 8 blocks in advance for a particular inode when allocating a block > for it. The gap between blocks of the same file shouldn't be there > either. Both of these cases should be removed. I will look into this > during this week. The slowness is also due to the lack of preallocation > in the new code. One of the GSoC students worked on a patch to add preallocation back to ext2fs this summer. Would you be interested in reviewing and/or testing that patch?
(I've attached it). Here is his original e-mail: Hi all, Attached is a patch which implements a preallocation algorithm in ext2fs. I implemented this algorithm during FreeBSD SoC 2010. The patch implements the in-memory ext2/3 block preallocation algorithm based on reservation windows. It uses an RB-tree to index block allocation requests and reserves a number of blocks for each file that has requested a block. When a file requests a block, the allocator tries to find one that is in the same cylinder group as the inode, is not inside any other reservation window in the RB-tree, and is followed by some contiguous free blocks. A data structure (the reservation window) records this block's position and the length of the run of free blocks, and is then inserted into the RB-tree. When the same file requests a block again, the allocator looks up its window in the RB-tree. If it finds one, the next free block in the window is allocated to the file directly; otherwise, it searches for a new block. I have run some benchmarks to test this algorithm; the results are on the wiki page (http://wiki.freebsd.org/SOC2010ZhengLiu). Performance improves when the number of threads is smaller than 4; when the number of threads is greater than 4, it improves only a little. Please test it.
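The window-placement step (`ext2_find_rsv()` in the attached patch) is hard to follow in flattened diff form. Below is a simplified userland sketch of the same first-fit idea, under stated assumptions: existing windows are kept in a sorted array scanned linearly instead of the patch's RB-tree, block numbers are plain integers, and the whole new window must fit before `group_end` (the patch only checks the group of the window's start). `struct rsv_win` and `rsv_find_slot()` are hypothetical names, not part of the patch.

```c
#include <stdint.h>

/* A reserved block range [start, end], mirroring struct ext2_rsv_win. */
struct rsv_win {
	int32_t start;
	int32_t end;
};

/*
 * Hypothetical helper: return the first block >= (goal & ~7) where a new
 * window of `size` blocks fits without overlapping any window in `wins`
 * and without running past `group_end`, or -1 if there is no room.
 * `wins` must be sorted by start and non-overlapping.
 */
static int32_t
rsv_find_slot(const struct rsv_win *wins, int nwins, int32_t goal,
    int32_t size, int32_t group_end)
{
	/* Byte-align the start, as the patch does, so bitmap scans work bytewise. */
	int32_t cur = goal & ~7;

	for (int i = 0; i < nwins; i++) {
		if (wins[i].end < cur)
			continue;               /* window lies entirely before cur */
		if (cur + size <= wins[i].start)
			break;                  /* the gap before this window is large enough */
		cur = wins[i].end + 1;          /* overlap: move past this window */
	}
	if (cur + size - 1 > group_end)
		return (-1);                    /* would spill into the next group */
	return (cur);
}
```

For example, with windows [0,7], [8,15] and [24,31], an 8-block request at goal 0 lands at 16 (the 8-block gap before 24), while a 9-block request skips to 32. The RB-tree in the patch serves to find the first relevant window in O(log n) instead of scanning from the beginning.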
Thanks and best regards, lz -- John Baldwin --Boundary-00=_QxzoMs/Iug8+N80 Content-Type: text/x-patch; charset="UTF-8"; name="ext2fs_prealloc.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="ext2fs_prealloc.patch" diff -urN /usr/src/sys/fs/ext2fs/ext2_alloc.c new/ext2_alloc.c --- /usr/src/sys/fs/ext2fs/ext2_alloc.c 2010-01-14 22:30:54.000000000 +0800 +++ new/ext2_alloc.c 2010-08-19 02:47:29.000000000 +0800 @@ -50,6 +50,9 @@ #include #include #include +#include + +#define phy_blk(cg, fs) (((cg) * (fs->e2fs->e2fs_fpg)) + fs->e2fs->e2fs_first_dblock) static daddr_t ext2_alloccg(struct inode *, int, daddr_t, int); static u_long ext2_dirpref(struct inode *); @@ -59,37 +62,524 @@ int)); static daddr_t ext2_nodealloccg(struct inode *, int, daddr_t, int); static daddr_t ext2_mapsearch(struct m_ext2fs *, char *, daddr_t); + +/* For reservation window */ +static u_long ext2_alloc_blk(struct inode *, int, struct buf *, int32_t, struct ext2_rsv_win *); +static int ext2_alloc_new_rsv(struct inode *, int, struct buf *, int32_t); +static int ext2_bpref_in_rsv(struct ext2_rsv_win *, int32_t); +static int ext2_find_rsv(struct ext2_rsv_win *, struct ext2_rsv_win *, + struct m_ext2fs *, int32_t, int); +static void ext2_remove_rsv_win(struct m_ext2fs *, struct ext2_rsv_win *); +static u_long ext2_rsvalloc(struct m_ext2fs *, struct inode *, + int, struct buf *, int32_t, int); +static daddr_t ext2_search_next_block(struct m_ext2fs *, char *, int, int); +static struct ext2_rsv_win *ext2_search_rsv(struct ext2_rsv_win_tree *, int32_t); + +RB_GENERATE(ext2_rsv_win_tree, ext2_rsv_win, rsv_link, ext2_rsv_win_cmp); + /* * Allocate a block in the file system. * - * A preference may be optionally specified. If a preference is given - * the following hierarchy is used to allocate a block: - * 1) allocate the requested block. - * 2) allocate a rotationally optimal block in the same cylinder. - * 3) allocate a block in the same cylinder group. 
- * 4) quadradically rehash into other cylinder groups, until an - * available block is located. - * If no block preference is given the following hierarchy is used - * to allocate a block: - * 1) allocate a block in the cylinder group that contains the - * inode for the file. - * 2) quadradically rehash into other cylinder groups, until an - * available block is located. - * - * A preference may be optionally specified. If a preference is given - * the following hierarchy is used to allocate a block: - * 1) allocate the requested block. - * 2) allocate a rotationally optimal block in the same cylinder. - * 3) allocate a block in the same cylinder group. - * 4) quadradically rehash into other cylinder groups, until an - * available block is located. - * If no block preference is given the following hierarchy is used - * to allocate a block: - * 1) allocate a block in the cylinder group that contains the - * inode for the file. - * 2) quadradically rehash into other cylinder groups, until an - * available block is located. + * By given preference: + * Check whether inode has a reservation window and preference + * is within it and try to allocate a free block from + * this reservation window. + * If not, traverse RB tree to find a place, which is not in + * any window and insert it to RB tree to try to allocate a + * free block again. + * If it fails, try to allocate a free block in other cylinder + * groups without preference. + */ + +/* + * Allocate a free block. + * + * First check whether reservation window is used. + * If reservation window is used, try to allocate a free + * block from the reservation window. If it fails, traverse + * the bitmap to find a free block. + * If reservation window is not used, try to allocate + * a free block by bpref. If it fails, traverse the bitmap + * to find a free block. 
*/ +static u_long +ext2_alloc_blk(struct inode *ip, int cg, struct buf *bp, + int32_t bpref, struct ext2_rsv_win *rp) +{ + struct m_ext2fs *fs; + struct ext2mount *ump; + int bno, start, end; + char *bbp; + + fs = ip->i_e2fs; + ump = ip->i_ump; + bbp = (char *)bp->b_data; + + if (fs->e2fs_gd[cg].ext2bgd_nbfree == 0) + return (0); + + if (bpref < 0) + bpref = 0; + + /* Check whether a reservation window is used */ + if (rp != NULL) { + /* + * If window's start is not in this cylinder group, + * try to allocate from the beginning, otherwise + * try to allocate from the beginning of the + * window. + */ + if (dtog(fs, rp->rsv_start) < cg) + start = 0; + else + start = rp->rsv_start; + + /* + * If window's end crosses the end of this group, + * set end variable to the end of this group. + * Otherwise, set it to the window's end. + */ + if (dtog(fs, rp->rsv_end) > cg) + end = phy_blk(cg + 1, fs) - 1; + else + end = rp->rsv_end; + + /* If preference block is within the window, try to allocate it. */ + if (start <= bpref && bpref <= end) { + bpref = dtogd(fs, bpref); + if (isclr(bbp, bpref)) { + rp->rsv_alloc_hit++; + bno = bpref; + goto gotit; + } + } else + if (dtog(fs, rp->rsv_start) == cg) + bpref = dtogd(fs, rp->rsv_start); + else + bpref = 0; + } else { + if (dtog(fs, bpref) != cg) + bpref = 0; + if (bpref != 0) { + bpref = dtogd(fs, bpref); + if (isclr(bbp, bpref)) { + bno = bpref; + goto gotit; + } + } + } + + bno = ext2_mapsearch(fs, bbp, bpref); + if (bno < 0) + return (0); + +gotit: + setbit(bbp, (daddr_t)bno); + EXT2_LOCK(ump); + fs->e2fs->e2fs_fbcount--; + fs->e2fs_gd[cg].ext2bgd_nbfree--; + fs->e2fs_fmod = 1; + EXT2_UNLOCK(ump); + bdwrite(bp); + bno = phy_blk(cg, fs) + bno; + return (bno); +} + +/* + * Initialize reservation window per inode.
*/ +void +ext2_init_rsv(struct inode *ip) +{ + struct ext2_rsv_win *rp; + + rp = malloc(sizeof(struct ext2_rsv_win), + M_EXT2NODE, M_WAITOK | M_ZERO); + + /* + * With M_WAITOK, malloc(9) sleeps instead of + * returning NULL, so this check is only defensive. + */ + if (rp == NULL) + return; + + rp->rsv_start = EXT2_RSV_NOT_ALLOCATED; + rp->rsv_end = EXT2_RSV_NOT_ALLOCATED; + + rp->rsv_goal_size = EXT2_RSV_DEFAULT_RESERVE_BLKS; + rp->rsv_alloc_hit = 0; + + ip->i_rsv = rp; +} + +/* + * Discard reservation window. + * + * It is called in the following situations: + * 1. free an inode + * 2. sync an inode + * 3. truncate a file + */ +void +ext2_discard_rsv(struct inode *ip) +{ + struct ext2_rsv_win *rp; + + if (ip->i_rsv == NULL) + return; + + rp = ip->i_rsv; + + /* If reservation window is empty, nothing to do */ + if (rp->rsv_end == EXT2_RSV_NOT_ALLOCATED) + return; + + EXT2_TREE_LOCK(ip->i_e2fs); + ext2_remove_rsv_win(ip->i_e2fs, rp); + EXT2_TREE_UNLOCK(ip->i_e2fs); + rp->rsv_goal_size = EXT2_RSV_DEFAULT_RESERVE_BLKS; +} + +/* + * Remove an ext2_rsv_win structure from RB tree. + */ +static void +ext2_remove_rsv_win(struct m_ext2fs *fs, struct ext2_rsv_win *rp) +{ + RB_REMOVE(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rp); + rp->rsv_start = EXT2_RSV_NOT_ALLOCATED; + rp->rsv_end = EXT2_RSV_NOT_ALLOCATED; + rp->rsv_alloc_hit = 0; +} + +/* + * Check whether bpref is in the reservation window. + */ +static int +ext2_bpref_in_rsv(struct ext2_rsv_win *rp, int32_t bpref) +{ + if (bpref >= 0 && (bpref < rp->rsv_start || bpref > rp->rsv_end)) + return (0); + + return (1); +} + +/* + * Search the RB tree for the window containing the given block, + * or the preceding window if the block is not in any window.
+ */ +static struct ext2_rsv_win * +ext2_search_rsv(struct ext2_rsv_win_tree *root, int32_t start) +{ + struct ext2_rsv_win *prev, *next; + + if (RB_EMPTY(root)) + return (NULL); + + next = RB_ROOT(root); + do { + prev = next; + if (start < next->rsv_start) + next = RB_LEFT(next, rsv_link); + else if (start > next->rsv_end) + next = RB_RIGHT(next, rsv_link); + else + return (next); + } while (next != NULL); + + if (prev->rsv_start > start) { + next = RB_PREV(ext2_rsv_win_tree, root, prev); + if (next != NULL) + prev = next; + } + + return (prev); +} + +/* + * Find room for a reservation window in the range from start + * to the end of this cylinder group. + */ +static int +ext2_find_rsv(struct ext2_rsv_win *search, struct ext2_rsv_win *rp, + struct m_ext2fs *fs, int32_t start, int cg) +{ + struct ext2_rsv_win *rsv, *prev; + int32_t cur; + int size = rp->rsv_goal_size; + + if (search == NULL) { + rp->rsv_start = start & ~7; + rp->rsv_end = rp->rsv_start + size - 1; + rp->rsv_alloc_hit = 0; + + RB_INSERT(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rp); + + return (0); + } + + /* + * Make the start of reservation window byte-aligned + * so that a free block can be found with bit operations + * in the ext2_search_next_block() function. + */ + cur = start & ~7; + rsv = search; + prev = NULL; + + while (1) { + if (cur <= rsv->rsv_end) + cur = rsv->rsv_end + 1; + + if (dtog(fs, cur) != cg) + return (-1); + + prev = rsv; + rsv = RB_NEXT(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rsv); + + if (rsv == NULL) + break; + + if (cur + size <= rsv->rsv_start) + break; + } + + if (prev != rp && rp->rsv_end != EXT2_RSV_NOT_ALLOCATED) + ext2_remove_rsv_win(fs, rp); + + rp->rsv_start = cur; + rp->rsv_end = cur + size - 1; + rp->rsv_alloc_hit = 0; + + if (prev != rp) + RB_INSERT(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rp); + + return (0); +} + +/* + * Find a free block in the range from bpref to + * the end of this cylinder group.
+ */ +static daddr_t +ext2_search_next_block(struct m_ext2fs *fs, char *bbp, int bpref, int cg) +{ + daddr_t bno; + int start, loc, len, map, i; + + start = bpref / NBBY; + len = howmany(fs->e2fs->e2fs_fpg, NBBY) - start; + loc = skpc(0xff, len, &bbp[start]); + if (loc == 0) + return (-1); + + i = start + len - loc; + map = bbp[i]; + bno = i * NBBY; + for (i = 1; i < (1 << NBBY); i <<= 1, bno++) { + if ((map & i) == 0) + return (bno); + } + + return (-1); +} + +/* + * Allocate a new reservation window. + */ +static int +ext2_alloc_new_rsv(struct inode *ip, int cg, struct buf *bp, int32_t bpref) +{ + struct m_ext2fs *fs; + struct ext2_rsv_win *rp, *search; + char *bbp; + int start, size, ret; + + fs = ip->i_e2fs; + rp = ip->i_rsv; + bbp = bp->b_data; + size = rp->rsv_goal_size; + + if (bpref <= 0) + start = phy_blk(cg, fs); + else + start = bpref; + + /* Dynamically increase the size of window */ + if (rp->rsv_end != EXT2_RSV_NOT_ALLOCATED) { + if (rp->rsv_alloc_hit > + ((rp->rsv_end - rp->rsv_start + 1) / 2)) { + size = size * 2; + if (size > EXT2_RSV_MAX_RESERVE_BLKS) + size = EXT2_RSV_MAX_RESERVE_BLKS; + rp->rsv_goal_size = size; + } + } + + EXT2_TREE_LOCK(fs); + + search = ext2_search_rsv(fs->e2fs_rsv_tree, start); + +repeat: + ret = ext2_find_rsv(search, rp, fs, start, cg); + if (ret < 0) { + if (rp->rsv_end != EXT2_RSV_NOT_ALLOCATED) + ext2_remove_rsv_win(fs, rp); + EXT2_TREE_UNLOCK(fs); + return (-1); + } + EXT2_TREE_UNLOCK(fs); + + start = dtogd(fs, rp->rsv_start); + start = ext2_search_next_block(fs, bbp, start, cg); + if (start < 0) { + EXT2_TREE_LOCK(fs); + if (rp->rsv_end != EXT2_RSV_NOT_ALLOCATED) + ext2_remove_rsv_win(fs, rp); + EXT2_TREE_UNLOCK(fs); + return (-1); + } + + start = phy_blk(cg, fs) + start; + if (start >= rp->rsv_start && start <= rp->rsv_end) + return (0); + + search = rp; + EXT2_TREE_LOCK(fs); + goto repeat; +} + +/* + * Allocate a free block from reservation window. 
+ */ +static u_long +ext2_rsvalloc(struct m_ext2fs *fs, struct inode *ip, int cg, + struct buf *bp, int32_t bpref, int size) +{ + struct ext2_rsv_win *rp; + int ret; + + rp = ip->i_rsv; + if (rp == NULL) + return (ext2_alloc_blk(ip, cg, bp, bpref, NULL)); + + if (rp->rsv_end == EXT2_RSV_NOT_ALLOCATED || + !ext2_bpref_in_rsv(rp, bpref)) { + ret = ext2_alloc_new_rsv(ip, cg, bp, bpref); + if (ret < 0) + return (0); + } + + return (ext2_alloc_blk(ip, cg, bp, bpref, rp)); +} + +/* + * Allocate a block using the reservation window in an ext2 file system. + * + * NOTE: This function will replace the ext2_alloc() function. + */ +int +ext2_alloc_rsv(struct inode *ip, int32_t lbn, int32_t bpref, + int size, struct ucred *cred, int32_t *bnp) +{ + struct m_ext2fs *fs; + struct ext2mount *ump; + struct buf *bp; + int32_t bno = 0; + int i, cg, error; + + *bnp = 0; + fs = ip->i_e2fs; + ump = ip->i_ump; + mtx_assert(EXT2_MTX(ump), MA_OWNED); + + if (size == fs->e2fs_bsize && fs->e2fs->e2fs_fbcount == 0) + goto nospace; + if (cred->cr_uid != 0 && + fs->e2fs->e2fs_fbcount < fs->e2fs->e2fs_rbcount) + goto nospace; + + if (bpref >= fs->e2fs->e2fs_bcount) + bpref = 0; + if (bpref == 0) + cg = ino_to_cg(fs, ip->i_number); + else + cg = dtog(fs, bpref); + + /* If cg has some free blocks, then try to allocate a free block from this cg */ + if (fs->e2fs_gd[cg].ext2bgd_nbfree > 0) { + /* Read block bitmap from buffer */ + EXT2_UNLOCK(ump); + error = bread(ip->i_devvp, + fsbtodb(fs, fs->e2fs_gd[cg].ext2bgd_b_bitmap), + (int)fs->e2fs_bsize, NOCRED, &bp); + if (error) { + brelse(bp); + goto ioerror; + } + + EXT2_RSV_LOCK(ip); + /* Try to allocate from reservation window */ + bno = ext2_rsvalloc(fs, ip, cg, bp, bpref, size); + EXT2_RSV_UNLOCK(ip); + if (bno > 0) + goto allocated; + + brelse(bp); + EXT2_LOCK(ump); + } + + /* Otherwise, try to allocate a free block from the remaining groups.
*/ + cg = (cg + 1) % fs->e2fs_gcount; + for (i = 1; i < fs->e2fs_gcount; i++) { + if (fs->e2fs_gd[cg].ext2bgd_nbfree > 0) { + /* Read block bitmap from buffer */ + EXT2_UNLOCK(ump); + error = bread(ip->i_devvp, + fsbtodb(fs, fs->e2fs_gd[cg].ext2bgd_b_bitmap), + (int)fs->e2fs_bsize, NOCRED, &bp); + if (error) { + brelse(bp); + goto ioerror; + } + + EXT2_RSV_LOCK(ip); + bno = ext2_rsvalloc(fs, ip, cg, bp, -1, size); + EXT2_RSV_UNLOCK(ip); + if (bno > 0) + goto allocated; + + brelse(bp); + EXT2_LOCK(ump); + } + + cg++; + if (cg == fs->e2fs_gcount) + cg = 0; + } + +allocated: + if (bno > 0) { + ip->i_next_alloc_block = lbn; + ip->i_next_alloc_goal = bno; + + ip->i_blocks += btodb(fs->e2fs_bsize); + ip->i_flag |= IN_CHANGE | IN_UPDATE; + *bnp = bno; + return (0); + } + +nospace: + EXT2_UNLOCK(ump); + ext2_fserr(fs, cred->cr_uid, "file system full"); + uprintf("\n%s: write failed, file system is full\n", fs->e2fs_fsmnt); + return (ENOSPC); + +ioerror: + ext2_fserr(fs, cred->cr_uid, "file system IO error"); + uprintf("\n%s: write failed, file system IO error\n", fs->e2fs_fsmnt); + return (EIO); +} int ext2_alloc(ip, lbn, bpref, size, cred, bnp) @@ -923,9 +1413,11 @@ start = 0; loc = skpc(0xff, len, &bbp[start]); if (loc == 0) { - printf("start = %d, len = %d, fs = %s\n", - start, len, fs->e2fs_fsmnt); - panic("ext2fs_alloccg: map corrupted"); + /* XXX: just for reservation window */ + return -1; + /*printf("start = %d, len = %d, fs = %s\n",*/ + /*start, len, fs->e2fs_fsmnt);*/ + /*panic("ext2fs_alloccg: map corrupted");*/ /* NOTREACHED */ } } diff -urN /usr/src/sys/fs/ext2fs/ext2_balloc.c new/ext2_balloc.c --- /usr/src/sys/fs/ext2fs/ext2_balloc.c 2010-01-14 22:30:54.000000000 +0800 +++ new/ext2_balloc.c 2010-08-19 02:47:29.000000000 +0800 @@ -49,6 +49,7 @@ #include #include #include +#include /* * Balloc defines the structure of file system storage * by allocating the physical blocks on a device given @@ -78,6 +79,9 @@ fs = ip->i_e2fs; ump = ip->i_ump; + if (ip->i_rsv == 
NULL) + ext2_init_rsv(ip); + /* * check if this is a sequential block allocation. * If so, increment next_alloc fields to allow ext2_blkpref @@ -136,9 +140,9 @@ else nsize = fs->e2fs_bsize; EXT2_LOCK(ump); - error = ext2_alloc(ip, lbn, - ext2_blkpref(ip, lbn, (int)lbn, &ip->i_db[0], 0), - nsize, cred, &newb); + error = ext2_alloc_rsv(ip, lbn, + ext2_blkpref(ip, lbn, (int)lbn, &ip->i_db[0], 0), + nsize, cred, &newb); if (error) return (error); bp = getblk(vp, lbn, nsize, 0, 0, 0); @@ -170,9 +174,9 @@ EXT2_LOCK(ump); pref = ext2_blkpref(ip, lbn, indirs[0].in_off + EXT2_NDIR_BLOCKS, &ip->i_db[0], 0); - if ((error = ext2_alloc(ip, lbn, pref, - (int)fs->e2fs_bsize, cred, &newb))) - return (error); + if ((error = ext2_alloc_rsv(ip, lbn, pref, + (int)fs->e2fs_bsize, cred, &newb))) + return (error); nb = newb; bp = getblk(vp, indirs[1].in_lbn, fs->e2fs_bsize, 0, 0, 0); bp->b_blkno = fsbtodb(fs, newb); @@ -211,7 +215,7 @@ if (pref == 0) pref = ext2_blkpref(ip, lbn, indirs[i].in_off, bap, bp->b_lblkno); - error = ext2_alloc(ip, lbn, pref, (int)fs->e2fs_bsize, cred, &newb); + error = ext2_alloc_rsv(ip, lbn, pref, (int)fs->e2fs_bsize, cred, &newb); if (error) { brelse(bp); return (error); @@ -250,8 +254,8 @@ EXT2_LOCK(ump); pref = ext2_blkpref(ip, lbn, indirs[i].in_off, &bap[0], bp->b_lblkno); - if ((error = ext2_alloc(ip, - lbn, pref, (int)fs->e2fs_bsize, cred, &newb)) != 0) { + if ((error = ext2_alloc_rsv(ip, lbn, pref, + (int)fs->e2fs_bsize, cred, &newb)) != 0) { brelse(bp); return (error); } diff -urN /usr/src/sys/fs/ext2fs/ext2_inode.c new/ext2_inode.c --- /usr/src/sys/fs/ext2fs/ext2_inode.c 2010-01-14 22:30:54.000000000 +0800 +++ new/ext2_inode.c 2010-08-19 02:47:29.000000000 +0800 @@ -52,6 +52,7 @@ #include #include #include +#include static int ext2_indirtrunc(struct inode *, int32_t, int32_t, int32_t, int, long *); @@ -153,6 +154,11 @@ } fs = oip->i_e2fs; osize = oip->i_size; + + EXT2_RSV_LOCK(oip); + ext2_discard_rsv(oip); + EXT2_RSV_UNLOCK(oip); + /* * Lengthen the 
size of the file. We must ensure that the * last byte of the file is allocated. Since the smallest @@ -484,6 +490,10 @@ if (prtactive && vrefcnt(vp) != 0) vprint("ext2_inactive: pushing active", vp); + EXT2_RSV_LOCK(ip); + ext2_discard_rsv(ip); + EXT2_RSV_UNLOCK(ip); + /* * Ignore inodes related to stale file handles. */ @@ -525,11 +535,21 @@ if (prtactive && vrefcnt(vp) != 0) vprint("ufs_reclaim: pushing active", vp); ip = VTOI(vp); + if (ip->i_flag & IN_LAZYMOD) { ip->i_flag |= IN_MODIFIED; ext2_update(vp, 0); } vfs_hash_remove(vp); + + EXT2_RSV_LOCK(ip); + if (ip->i_rsv != NULL) { + free(ip->i_rsv, M_EXT2NODE); + ip->i_rsv = NULL; + } + EXT2_RSV_UNLOCK(ip); + mtx_destroy(&ip->i_rsv_lock); + free(vp->v_data, M_EXT2NODE); vp->v_data = 0; vnode_destroy_vobject(vp); diff -urN /usr/src/sys/fs/ext2fs/ext2_rsv_win.h new/ext2_rsv_win.h --- /usr/src/sys/fs/ext2fs/ext2_rsv_win.h 1970-01-01 08:00:00.000000000 +0800 +++ new/ext2_rsv_win.h 2010-08-19 02:47:29.000000000 +0800 @@ -0,0 +1,78 @@ +/*- + * Copyright (c) 2010, 2010 Zheng Liu + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: src/sys/fs/ext2fs/ext2_rsv_win.h,v 0.1 2010/05/08 12:41:51 lz Exp $ + */ +#ifndef _FS_EXT2FS_EXT2_RSV_WIN_H_ +#define _FS_EXT2FS_EXT2_RSV_WIN_H_ + +#include + +#define EXT2_RSV_DEFAULT_RESERVE_BLKS 8 +#define EXT2_RSV_MAX_RESERVE_BLKS 1024 +#define EXT2_RSV_NOT_ALLOCATED 0 + +#define EXT2_RSV_LOCK(ip) mtx_lock(&(ip)->i_rsv_lock) +#define EXT2_RSV_UNLOCK(ip) mtx_unlock(&(ip)->i_rsv_lock) + +#define EXT2_TREE_LOCK(fs) mtx_lock(&(fs)->e2fs_rsv_lock) +#define EXT2_TREE_UNLOCK(fs) mtx_unlock(&(fs)->e2fs_rsv_lock) + +/* + * Reservation window entry + */ +struct ext2_rsv_win { + RB_ENTRY(ext2_rsv_win) rsv_link; /* RB tree links */ + + int32_t rsv_goal_size; /* Goal size of the next window, in blocks */ + int32_t rsv_alloc_hit; /* Blocks allocated within the window */ + + int32_t rsv_start; /* First block of window */ + int32_t rsv_end; /* Last block of window */ +}; + +RB_HEAD(ext2_rsv_win_tree, ext2_rsv_win); + +static __inline int +ext2_rsv_win_cmp(const struct ext2_rsv_win *a, + const struct ext2_rsv_win *b) +{ + if (a->rsv_start < b->rsv_start) + return (-1); + if (a->rsv_start == b->rsv_start) + return (0); + + return (1); +} +RB_PROTOTYPE(ext2_rsv_win_tree, ext2_rsv_win, rsv_link, ext2_rsv_win_cmp); + +/* Forward declaration */ +struct inode; +/* ext2_alloc.c */ +void ext2_init_rsv(struct inode *ip); +void ext2_discard_rsv(struct inode *ip); +int ext2_alloc_rsv(struct inode *, int32_t, int32_t, int, struct ucred *, int32_t *); + +#endif /*
!_FS_EXT2FS_EXT2_RSV_WIN_H_ */ diff -urN /usr/src/sys/fs/ext2fs/ext2_vfsops.c new/ext2_vfsops.c --- /usr/src/sys/fs/ext2fs/ext2_vfsops.c 2010-01-14 22:30:54.000000000 +0800 +++ new/ext2_vfsops.c 2010-08-19 02:47:29.000000000 +0800 @@ -1,4 +1,4 @@ -/*- +/* * modified for EXT2FS support in Lites 1.1 * * Aug 1995, Godmar Back (gback@cs.utah.edu) @@ -61,6 +61,7 @@ #include #include #include +#include static int ext2_flushfiles(struct mount *mp, int flags, struct thread *td); static int ext2_mountfs(struct vnode *, struct mount *); @@ -95,9 +96,9 @@ static int compute_sb_data(struct vnode * devvp, struct ext2fs * es, struct m_ext2fs * fs); -static const char *ext2_opts[] = { "from", "export", "acls", "noexec", - "noatime", "union", "suiddir", "multilabel", "nosymfollow", - "noclusterr", "noclusterw", "force", NULL }; +static const char *ext2_opts[] = { "acls", "async", "export", "force", + "from", "multilabel", "noatime", "noclusterr", "noclusterw", + "noexec", "nosymfollow", "suiddir", "union", NULL }; /* * VFS Operations. 
@@ -581,6 +582,14 @@ if ((error = compute_sb_data(devvp, ump->um_e2fs->e2fs, ump->um_e2fs))) goto out; + /* Initialize the reservation window index and its lock */ + bzero(&ump->um_e2fs->e2fs_rsv_lock, sizeof(struct mtx)); + mtx_init(&ump->um_e2fs->e2fs_rsv_lock, + "rsv tree lock", NULL, MTX_DEF); + ump->um_e2fs->e2fs_rsv_tree = malloc(sizeof(struct ext2_rsv_win_tree), + M_EXT2MNT, M_WAITOK | M_ZERO); + RB_INIT(ump->um_e2fs->e2fs_rsv_tree); + brelse(bp); bp = NULL; fs = ump->um_e2fs; @@ -680,6 +689,8 @@ g_topology_unlock(); PICKUP_GIANT(); vrele(ump->um_devvp); + free(fs->e2fs_rsv_tree, M_EXT2MNT); + mtx_destroy(&fs->e2fs_rsv_lock); free(fs->e2fs_gd, M_EXT2MNT); free(fs->e2fs_contigdirs, M_EXT2MNT); free(fs->e2fs, M_EXT2MNT); @@ -919,6 +930,10 @@ ip->i_prealloc_count = 0; ip->i_prealloc_block = 0; + bzero(&ip->i_rsv_lock, sizeof(struct mtx)); + mtx_init(&ip->i_rsv_lock, "inode rsv lock", NULL, MTX_DEF); + ip->i_rsv = NULL; + /* * Now we want to make sure that block pointers for unused * blocks are zeroed out - ext2_balloc depends on this diff -urN /usr/src/sys/fs/ext2fs/ext2fs.h new/ext2fs.h --- /usr/src/sys/fs/ext2fs/ext2fs.h 2010-01-14 22:30:54.000000000 +0800 +++ new/ext2fs.h 2010-08-19 02:47:29.000000000 +0800 @@ -38,6 +38,7 @@ #define _FS_EXT2FS_EXT2_FS_H #include +#include /* * Special inode numbers @@ -174,6 +175,9 @@ char e2fs_wasvalid; /* valid at mount time */ off_t e2fs_maxfilesize; struct ext2_gd *e2fs_gd; /* Group Descriptors */ + + struct mtx e2fs_rsv_lock; /* Protects the reservation window RB tree */ + struct ext2_rsv_win_tree *e2fs_rsv_tree; /* Reservation window index */ }; /* diff -urN /usr/src/sys/fs/ext2fs/inode.h new/inode.h --- /usr/src/sys/fs/ext2fs/inode.h 2010-01-14 22:30:54.000000000 +0800 +++ new/inode.h 2010-08-19 02:47:29.000000000 +0800 @@ -100,6 +100,10 @@ int32_t i_gen; /* Generation number. */ u_int32_t i_uid; /* File owner. */ u_int32_t i_gid; /* File group.
*/ + + /* Fields for reservation window */ + struct mtx i_rsv_lock; /* Protects i_rsv */ + struct ext2_rsv_win *i_rsv; /* Reservation window */ }; /* --Boundary-00=_QxzoMs/Iug8+N80-- From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 15:03:53 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F2CF31065674 for ; Wed, 29 Sep 2010 15:03:53 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 450D88FC14 for ; Wed, 29 Sep 2010 15:03:52 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA29301; Wed, 29 Sep 2010 18:03:48 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CA35553.1080804@icyb.net.ua> Date: Wed, 29 Sep 2010 18:03:47 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Karl Pielorz References: <201009291024.o8TAOnph013730@higson.cam.lispworks.com> <8CF1F1F15531907E2F8DC2A2@HexaDeca64.dmpriest.net.uk> In-Reply-To: <8CF1F1F15531907E2F8DC2A2@HexaDeca64.dmpriest.net.uk> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: FreeBSD 8.1-R/amd64 - zfs 'hangs' - help tracing? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 15:03:54 -0000 on 29/09/2010 13:30 Karl Pielorz said the following: > --On 29 September 2010 11:24 +0100 Martin Simmons wrote: > >>> I saw a while ago a command under 8.1 to get 'more info' for these stuck >>> processes, but can't for the life of me remember it? >> >> Maybe procstat -k -k $pid is what you are looking for (i.e. a kernel >> backtrace)? Use -a instead of $pid to get all processes. > > Yup, that's it - thanks! > > Having run it I get: > > procstat -k -k 1927 (PID 1927 is the 'ls' that's locked up) > > PID TID COMM TDNAME KSTACK > 1927 100206 ls - mi_switch+0x16f sleepq_wait+0x42 > _cv_wait+0x111 zio_wait+0x61 dbuf_read+0x39a dnode_hold_impl+0xe7 > dmu_bonus_hold+0x2a zfs_zget+0x227 zfs_dirent_lock+0x4e3 zfs_dirlook+0x69 > zfs_lookup+0x1f0 zfs_freebsd_lookup+0x81 vfs_cache_lookup+0xf0 VOP_LOOKUP_APV+0x40 > lookup+0x40a namei+0x52b kern_statat_vnhook+0x8f kern_statat+0x15 > > > Which will hopefully mean something more to someone here than it does me at the > moment ;) This looks like the process is stuck waiting for I/O completion. Can't tell whether it's an I/O problem, or perhaps the I/O operation has long completed but wakeup from it was lost... 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 15:11:13 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D65371065673 for ; Wed, 29 Sep 2010 15:11:13 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta03.emeryville.ca.mail.comcast.net (qmta03.emeryville.ca.mail.comcast.net [76.96.30.32]) by mx1.freebsd.org (Postfix) with ESMTP id BCBBD8FC13 for ; Wed, 29 Sep 2010 15:11:13 +0000 (UTC) Received: from omta22.emeryville.ca.mail.comcast.net ([76.96.30.89]) by qmta03.emeryville.ca.mail.comcast.net with comcast id CdRT1f0061vN32cA3fBDjw; Wed, 29 Sep 2010 15:11:13 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta22.emeryville.ca.mail.comcast.net with comcast id CfBB1f00R3LrwQ28ifBCu6; Wed, 29 Sep 2010 15:11:13 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 602A39B418; Wed, 29 Sep 2010 08:11:11 -0700 (PDT) Date: Wed, 29 Sep 2010 08:11:11 -0700 From: Jeremy Chadwick To: Karl Pielorz Message-ID: <20100929151111.GA91705@icarus.home.lan> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: FreeBSD 8.1-R/amd64 - zfs 'hangs' - help tracing? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 15:11:13 -0000 On Wed, Sep 29, 2010 at 10:20:22AM +0100, Karl Pielorz wrote: > > Hi All, > > I moved my machine from FreeBSD 7.2-S/amd64 to 8.1-R/amd64 about a > week ago. Since then I've noticed that ZFS just 'hangs' - e.g. it'll > work fine for a few days, then a process will get 'hung up' waiting > on ZFS. > > The machine is a Tyan motherboard (dual Opteron, dual cores, w/10Gb > of RAM). 
7.2-S & ZFS ran perfectly under it. > > Anything else then that touches the pools, also 'hangs' - in top the > original process shows as: > > " > 1927 root 1 44 0 8224K 1544K zio->i 0 0:00 0.00% ls > " > > Anything else that touches the ZFS pools, ends up like: > > " > 2082 root 1 44 0 10284K 2976K zfs 3 0:00 0.00% csh > " > > I saw a while ago a command under 8.1 to get 'more info' for these > stuck processes, but can't for the life of me remember it? > > If someone can give me some pointers to try and track down what's hanging? > > The drives are spread over two Marvell 88SX6081's. I've tried the > mvs driver for that controller, which gave me a bucket load of > errors, and data corruption :( > > Switching back to the standard ATA drivers for that card, I just get > hangs :( - nothing is logged on the console, or syslog when this > happens. Can you provide (with the disks hooked to an ATA/SATA controller and the system not wedged waiting for ZFS I/O) the output from smartctl (you'll probably have to install ports/sysutils/smartmontools) as: smartctl -a /dev/adXX (one command per device) I can review these statistics to see if any of the disks look like they may be misbehaving. It would also help if you could provide dmesg output associated with the ATA/SATA controller which they're hooked to. Thanks! -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 15:19:10 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 07EDA106566C for ; Wed, 29 Sep 2010 15:19:10 +0000 (UTC) (envelope-from kpielorz_lst@tdx.co.uk) Received: from mail.tdx.com (mail.tdx.com [62.13.128.18]) by mx1.freebsd.org (Postfix) with ESMTP id 983FA8FC0C for ; Wed, 29 Sep 2010 15:19:09 +0000 (UTC) Received: from HexaDeca64.dmpriest.net.uk (HPQuadro64.dmpriest.net.uk [62.13.130.30]) (authenticated bits=0) by mail.tdx.com (8.14.3/8.14.3/Kp) with ESMTP id o8TFJ74e061126 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES256-SHA bits=256 verify=NO); Wed, 29 Sep 2010 16:19:08 +0100 (BST) Date: Wed, 29 Sep 2010 16:18:21 +0100 From: Karl Pielorz To: Jeremy Chadwick Message-ID: <24249EBE70346EDE973308F6@HexaDeca64.dmpriest.net.uk> In-Reply-To: <20100929151111.GA91705@icarus.home.lan> References: <20100929151111.GA91705@icarus.home.lan> X-Mailer: Mulberry/4.0.8 (Win32) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline Cc: freebsd-fs@freebsd.org Subject: Re: FreeBSD 8.1-R/amd64 - zfs 'hangs' - help tracing? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 15:19:10 -0000 --On 29 September 2010 08:11 -0700 Jeremy Chadwick wrote: > Can you provide (with the disks hooked to an ATA/SATA controller and > the system not wedged waiting for ZFS I/O) the output from smartctl > (you'll probably have to install ports/sysutils/smartmontools) as: > > smartctl -a /dev/adXX (one command per device) > > I can review these statistics to see if any of the disks look like they > may be misbehaving. 
> > It would also help if you could provide dmesg output associated with the > ATA/SATA controller which they're hooked to. I can do - I'll try to get all the relevant stuff up on a site and send you the URL (i.e. dmesg output, zpool status output, smartctl output etc.) The same system works fine under 7.2-Stable with the same drives, hardware etc. - But under 8.1 anything more than "a little ZFS I/O" (e.g. scrubs, copying lots of files etc.) - and it just hangs at a random point, with no error given - and anything that touches the zpool 'hangs next'. I already run smartmontools - though it's currently disabled (in case that was causing issues). The drives on the whole look OK; I think two have picked up a couple of reallocated blocks (and failed reads) - but it's nothing that's increasing or anything - and like I said, 7.2-S is fine with the same setup... If I break to KDB with the current system, is there anything I can do to get a better look at what's hung up/where? -Karl From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 16:43:07 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 92E6B1065673 for ; Wed, 29 Sep 2010 16:43:07 +0000 (UTC) (envelope-from martin@lispworks.com) Received: from lwfs1-cam.cam.lispworks.com (mail.lispworks.com [193.34.186.230]) by mx1.freebsd.org (Postfix) with ESMTP id 327FD8FC1D for ; Wed, 29 Sep 2010 16:43:06 +0000 (UTC) Received: from higson.cam.lispworks.com (IDENT:U2FsdGVkX1/DDsrOp/fc0hHDkudR8RwKYqZIPzbTCBA@higson [192.168.1.7]) by lwfs1-cam.cam.lispworks.com (8.14.3/8.14.3) with ESMTP id o8TGh4Sg094770; Wed, 29 Sep 2010 17:43:04 +0100 (BST) (envelope-from martin@lispworks.com) Received: from higson.cam.lispworks.com by higson.cam.lispworks.com (8.13.1) id o8TGh41B016652; Wed, 29 Sep 2010 17:43:04 +0100 Received: (from martin@localhost) by higson.cam.lispworks.com (8.13.1/8.13.1/Submit) id o8TGh4hx016649; Wed, 29
Sep 2010 17:43:04 +0100 Date: Wed, 29 Sep 2010 17:43:04 +0100 Message-Id: <201009291643.o8TGh4hx016649@higson.cam.lispworks.com> From: Martin Simmons To: freebsd-fs@freebsd.org In-reply-to: <707981.30589.qm@web110702.mail.gq1.yahoo.com> (message from Scott Johnson on Tue, 28 Sep 2010 17:01:47 -0700 (PDT)) References: <707981.30589.qm@web110702.mail.gq1.yahoo.com> Subject: Re: zfs+smb checksum errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 16:43:07 -0000 >>>>> On Tue, 28 Sep 2010 17:01:47 -0700 (PDT), Scott Johnson said: > > I've been running FBSD 8.0-rel and now 8.1-rel for 6 months or so, with a 4 disk > raidz, weekly scrubs. Never saw a checksum error until last week when I got > about 5 during operation, followed by another 10 or so on the next scrub. > > They all occurred when I was accessing the server through SMB from my WinXP > desktop. I've been doing this for months, transferring files to and from > regularly, but this was the first time I'd been doing simultaneous heavy reading > & writing. > > I was running Imgburn on WinXP, creating an iso file from folders. Both source > files and destination iso file were on the same SMB share on the FBSD server. > > After all the checksum errors, there was 1 unrecoverable error, on one of the > destination iso files. > > There are 0 read errors, 0 write errors, and 0 new SMART errors on the 4 disks. > The checksum errors were spread roughly equally across the 4 disks. All of which > leads me to believe this is a software problem at the filesystem level. Or maybe it is some other hardware problem, such as the disk controller or RAM? 
__Martin From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 18:10:49 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EBA5B106564A for ; Wed, 29 Sep 2010 18:10:49 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 304588FC17 for ; Wed, 29 Sep 2010 18:10:48 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id VAA02414; Wed, 29 Sep 2010 21:10:45 +0300 (EEST) (envelope-from avg@freebsd.org) Message-ID: <4CA38124.60902@freebsd.org> Date: Wed, 29 Sep 2010 21:10:44 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Steven Hartland References: <5DB6E7C798E44D33A05673F4B773405E@multiplay.co.uk><4C8D087B.5040404@freebsd.org><03537796FAB54E02959E2D64FC83004F@multiplay.co.uk><4C8D280F.3040803@freebsd.org><3FBF66BF11AA4CBBA6124CA435A4A31B@multiplay.co.uk><4C8E4212.30000@freebsd.org> <4C90B4C8.90203@freebsd.org> <6DFACB27CA8A4A22898BC81E55C4FD36@multiplay.co.uk> <4C90D3A1.7030008@freebsd.org> <0B1A90A08DFE4ADA9540F9F3846FDF38@multiplay.co.uk> <4C90EDB8.3040709@freebsd.org> <3F29E8CED7B24805B2D93F62A4EC9559@multiplay.co.uk> <4C9126FB.2020707@freebsd.org> <1E0B9C1145784776A773B99FC1139CD5@multiplay.co.uk> <4C987F90.6000006@freebsd.org> <4C98803F.7000901@freebsd.org> <879BF5981D1B4C7290BDF18286BA1EEC@multiplay.co.uk> <4C989201.2 0506@freebsd.org> <4C98A2BA.1080004@freebsd.org> <4C98BFCE.2020202@freebsd.org> In-Reply-To: <4C98BFCE.2020202@freebsd.org> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: zfs very poor performance compared to ufs due to lack of cache? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 18:10:50 -0000 [ping] on 21/09/2010 17:23 Andriy Gapon said the following: > on 21/09/2010 16:53 Steven Hartland said the following: >> That's what I thought you were saying. Is there a test you would suggest to confirm >> either way more accurately? > > Perhaps you can try the test scenario that you described and monitor parameters > suggested by Wiktor in this thread. > > That is, have two large files and set arc max size such that one of them can fit > in ARC readily, but two of them won't fit by a large margin. Make sure that > remaining RAM is large enough to hold both files in page cache. > > 1. sendfile one file, then the other > 2. record kstat.zfs.misc.arcstats values > 3. sendfile the first file again > 4. record kstat.zfs.misc.arcstats values > > If the first file data was re-used from page cache, then you won't see much > change in kstat.zfs.misc.arcstats. If it had to be taken from ARC or from disk, > then either ARC hits or ARC misses will grow noticeably. > > Make sure to not have any parallel activity that could affect kstat.zfs.misc.arcstats. > > I think kstat.zfs.misc.arcstats.hits and kstat.zfs.misc.arcstats.misses are the two > primary indicators in this test.
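A minimal sketch of steps 2 and 4 of the test quoted above, assuming a FreeBSD host where kstat.zfs.misc.arcstats.hits and kstat.zfs.misc.arcstats.misses are 64-bit counters readable with sysctlbyname(3). The struct and function names (arc_snapshot, arc_read, arc_delta) are hypothetical; only the delta arithmetic is portable, so the sysctl reads are compiled only under __FreeBSD__.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* The two ARC counters named in the test procedure. */
struct arc_snapshot {
	uint64_t hits;
	uint64_t misses;
};

#ifdef __FreeBSD__
/* Read one counter by name; assumes it is exported as a 64-bit value. */
static int
read_counter(const char *name, uint64_t *val)
{
	size_t len = sizeof(*val);

	return (sysctlbyname(name, val, &len, NULL, 0));
}

/* Snapshot both counters; returns 0 on success, -1 on error. */
static int
arc_read(struct arc_snapshot *s)
{
	if (read_counter("kstat.zfs.misc.arcstats.hits", &s->hits) != 0)
		return (-1);
	return (read_counter("kstat.zfs.misc.arcstats.misses", &s->misses));
}
#endif

/*
 * ARC lookups (hits plus misses) between two snapshots. A value close
 * to zero suggests the re-read data never went through the ARC.
 */
static uint64_t
arc_delta(const struct arc_snapshot *before, const struct arc_snapshot *after)
{
	assert(after->hits >= before->hits);
	assert(after->misses >= before->misses);
	return ((after->hits - before->hits) +
	    (after->misses - before->misses));
}
```

Take one snapshot at step 2 and another at step 4. An arc_delta() close to zero is consistent with the first file being served from the page cache, while a large delta means the reads went through the ARC or to disk. The counters are system-wide, which is why the test asks for no parallel activity.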
> -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 18:44:28 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 090A310656AA; Wed, 29 Sep 2010 18:44:28 +0000 (UTC) (envelope-from prvs=1888902647=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 5E1388FC0A; Wed, 29 Sep 2010 18:44:27 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 29 Sep 2010 19:33:27 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 29 Sep 2010 19:33:27 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 by mail1.multiplay.co.uk (MDaemon PRO v10.0.4) with ESMTP id md50011327826.msg; Wed, 29 Sep 2010 19:33:26 +0100 X-Authenticated-Sender: Killing@multiplay.co.uk X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1888902647=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <2A903E7281D340BBA77338F714AACB84@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: <5DB6E7C798E44D33A05673F4B773405E@multiplay.co.uk><4C8D087B.5040404@freebsd.org><03537796FAB54E02959E2D64FC83004F@multiplay.co.uk><4C8D280F.3040803@freebsd.org><3FBF66BF11AA4CBBA6124CA435A4A31B@multiplay.co.uk><4C8E4212.30000@freebsd.org> <4C90B4C8.90203@freebsd.org> <6DFACB27CA8A4A22898BC81E55C4FD36@multiplay.co.uk> <4C90D3A1.7030008@freebsd.org> <0B1A90A08DFE4ADA9540F9F3846FDF38@multiplay.co.uk> <4C90EDB8.3040709@freebsd.org> <3F29E8CED7B24805B2D93F62A4EC9559@multiplay.co.uk> <4C9126FB.2020707@freebsd.org> <1E0B9C1145784776A773B99FC1139CD5@multiplay.co.uk> <4C987F90.6000006@freebsd.org> <4C98803F.7000901@freebsd.org> <879BF5981D1B4C7290BDF18286BA1EEC@multiplay.co.uk> <4C989201.2 
0506@freebsd.org> <4C98A2BA.1080004@freebsd.org> <4C98BFCE.2020202@freebsd.org> <4CA38124.60902@freebsd.org> Date: Wed, 29 Sep 2010 19:33:32 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="utf-8"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5994 Cc: freebsd-fs@freebsd.org Subject: Re: zfs very poor performance compared to ufs due to lack of cache? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 18:44:28 -0000 ----- Original Message ----- From: "Andriy Gapon" To: "Steven Hartland" Cc: Sent: Wednesday, September 29, 2010 7:10 PM Subject: Re: zfs very poor performance compared to ufs due to lack of cache? > > [ping] Sorry Andriy not had chance to pick this back up yet, day to day getting in the way. Don't worry not forgotten ;-) Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
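Andriy's test in the [ping] message boils down to comparing two snapshots of kstat.zfs.misc.arcstats.hits and kstat.zfs.misc.arcstats.misses, taken before and after re-sending the first file. A minimal C sketch of that bookkeeping follows. It assumes (not verified here) that both counters are exported as 64-bit unsigned sysctls on FreeBSD; the sysctl read is compiled only on FreeBSD, and the struct, the function names and the `threshold` cutoff are hypothetical, not part of any existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* One snapshot of the two counters the test watches. */
struct arc_sample {
    uint64_t hits;
    uint64_t misses;
};

#ifdef __FreeBSD__
/* Read one counter by name; assumes it is exported as a 64-bit value. */
static int read_u64(const char *name, uint64_t *out)
{
    size_t len = sizeof(*out);

    if (sysctlbyname(name, out, &len, NULL, 0) != 0 || len != sizeof(*out))
        return -1;
    return 0;
}

/* Snapshot kstat.zfs.misc.arcstats.{hits,misses}; 0 on success, -1 on error. */
int arc_sample_take(struct arc_sample *s)
{
    if (read_u64("kstat.zfs.misc.arcstats.hits", &s->hits) != 0)
        return -1;
    return read_u64("kstat.zfs.misc.arcstats.misses", &s->misses);
}
#endif

/*
 * Interpret the test: if hits + misses grew by fewer than `threshold`
 * (a hypothetical cutoff chosen for the file size) between the snapshots,
 * the re-sent file was most likely served from the page cache rather
 * than from the ARC or from disk.
 */
int served_from_page_cache(const struct arc_sample *before,
                           const struct arc_sample *after,
                           uint64_t threshold)
{
    assert(after->hits >= before->hits);
    assert(after->misses >= before->misses);
    return (after->hits - before->hits) +
           (after->misses - before->misses) < threshold;
}
```

Take one snapshot after sending both files and another after re-sending the first one, then pass both to served_from_page_cache(). As Andriy notes, the result is only meaningful when no parallel activity touches the ARC.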
From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 19:13:55 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BC926106564A for ; Wed, 29 Sep 2010 19:13:55 +0000 (UTC) (envelope-from crossd@cs.rpi.edu) Received: from newman.cs.rpi.edu (newman.cs.rpi.edu [128.113.126.12]) by mx1.freebsd.org (Postfix) with ESMTP id 6D36D8FC18 for ; Wed, 29 Sep 2010 19:13:54 +0000 (UTC) X-Hash: SCt|9a86925a1455fd7f633a0196eb4c71511e3200b9|0a17d6733c62c0f80357da99bdbe33ee X-Countries: United States X-SMTP-From: accepted monica.cs.rpi.edu [128.213.56.13] (monica.cs.rpi.edu) {United States} DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=cs.rpi.edu; h=date :from:to:subject:message-id:mime-version:content-type; s= default; i=128.213.56.13@cs.rpi.edu; t=1285786090; x=1286390890; l=1709; bh=IYFtVGeVOpLTMFPRtT2ctEv3dRE=; b=BZD1T3aQGYhUUz1J0uK1 vCBwnCmZYPrY8wvb5npEct+/6wiwrbfapQLTT8sArYq/NYKILJ00I0vpV714qInn oeTc1b7c0HdsarwWAZDGosa3RvxkF7fDJvr7eg2z5F0uWIDCFuS0tWEkUnhkK0Ma LKunmEhZ38Hvs6pleYGdEqU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=cs.rpi.edu; h=date:from:to :subject:message-id:mime-version:content-type; q=dns; s=default; b= Q/OxuGh0WcE0NRIz7VMJf4ZwrTOZNcW5eCQp8wYHKewMgeTUs1LWzYAd/65wpG8t i8c73rwh3QEmDOEc+KsMFWrqBmGqNBkWL2BPL0coSNa/uh+BQHAx0BN6+Kck6Be1 9EdiwxPEu2Dw+jX1JTxMPggZVDqhape//Om3aHiqMn4= X-Spam-Report: Spam Report from newman.cs.rpi.edu (SA:3.2.5): -1.8 ALL_TRUSTED Passed through trusted hosts only via SMTP -0.9 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score: 0.0000] 0.0 AWL AWL: From: address is in the auto white-list X-Spam-Info: -2.7, local; ALL_TRUSTED,AWL,BAYES_00 X-Spam-Scanned-By: newman.cs.rpi.edu using SpamAssassin 3.2.5 X-Virus-Scanned-By: newman.cs.rpi.edu Received: from monica.cs.rpi.edu (root@monica.cs.rpi.edu [128.213.56.13]) by newman.cs.rpi.edu (8.14.3/8.14.3) with ESMTP id o8TIm83O053818 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA 
bits=256 verify=NO) for ; Wed, 29 Sep 2010 14:48:08 -0400 (EDT) (envelope-from crossd@cs.rpi.edu) Received: from monica.cs.rpi.edu (crossd@localhost [127.0.0.1]) by monica.cs.rpi.edu (8.14.3/8.12.6) with ESMTP id o8TIm8xG008130 for ; Wed, 29 Sep 2010 14:48:08 -0400 (EDT) (envelope-from crossd@monica.cs.rpi.edu) Received: (from crossd@localhost) by monica.cs.rpi.edu (8.14.3/8.12.6/Submit) id o8TIlwVn008122; Wed, 29 Sep 2010 14:47:58 -0400 (EDT) (envelope-from crossd) Date: Wed, 29 Sep 2010 14:47:58 -0400 (EDT) From: "David E. Cross" To: fs@freebsd.org Message-ID: <20100929144527.Q7702@monica.cs.rpi.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.67 on 128.113.126.12 Cc: Subject: Unnecessary reads on write load X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 19:13:55 -0000 (redirected from hackers, this is on both 8.0-RELEASE with GENERIC and 8.1-RELEASE with GENERIC on amd64) Tracking down a performance issue where the system apparently needlessly reads on a 100% write load... 
Consider the following C test case (more after the code):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char dir1[4], dir2[4], filename[15], pathname[1024];
    /* must hold the largest write: 130943 + 128 = 131071 bytes */
    unsigned char buffer[131072];
    unsigned int filenum, count, filesize;
    int fd;

    if (argc < 2) {
        fprintf(stderr, "usage: %s count\n", argv[0]);
        return 1;
    }
    arc4random_buf(buffer, sizeof(buffer));
    count = atoi(argv[1]);
    for (; count > 0; count--) {
        filenum = arc4random();
        filesize = arc4random_uniform(130944) + 128;   /* 128 .. 131071 bytes */
        /* 8 hex digits: digits 1-3 name dir1, digits 4-6 name dir2 */
        snprintf(filename, sizeof(filename), "%08x", filenum);
        strncpy(dir1, filename, 3);
        strncpy(dir2, filename + 3, 3);
        dir1[3] = dir2[3] = '\0';
        snprintf(pathname, sizeof(pathname), "%s/%s/%s", dir1, dir2, filename);
        fd = open(pathname, O_CREAT | O_WRONLY, 0644);
        if (fd < 0) {
            /* directories missing: create dir1/dir2, and dir1 first if needed */
            snprintf(pathname, sizeof(pathname), "%s/%s", dir1, dir2);
            if (mkdir(pathname, 0755) < 0) {
                mkdir(dir1, 0755);
                mkdir(pathname, 0755);
            }
            snprintf(pathname, sizeof(pathname), "%s/%s/%s", dir1, dir2, filename);
            fd = open(pathname, O_CREAT | O_WRONLY, 0644);
        }
        if (fd < 0)
            continue;
        write(fd, buffer, filesize);
        close(fd);
    }
    return 0;
}

Running that in an empty directory (it takes one argument, the number of files to create), I see that it spends most of its time in BIORD?! With a debugging kernel I can see that it's all in NAMI cache misses and block fetches... but why? The only directories that exist are ones that it just created, so they should be in the cache, right? Any ideas? Give it a try yourself and tell me what you get. Thank you. -- David E.
Cross From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 19:16:04 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 412BC10656E8 for ; Wed, 29 Sep 2010 19:16:04 +0000 (UTC) (envelope-from torbjoern@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id BD5DD8FC1E for ; Wed, 29 Sep 2010 19:16:03 +0000 (UTC) Received: by bwz15 with SMTP id 15so1072774bwz.13 for ; Wed, 29 Sep 2010 12:16:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:date:message-id :subject:from:to:content-type; bh=XUH5HffCLJhDQyb5xz7dTxdRBBpyL3nKpOelLEq5dIA=; b=P0BtylrISKyE1al/fR6xvkpdAPc2dYJaM5TIIL7Y2ZiaL3yGb/2XeA2YfyK3Ncmc5p g9Q2kdYDGr1pHnbM02HmxDuWX6LzygQ+zQ23Av3hVWUG9jDo2BGl9dbJDMPIPz8KXH4d hdjucdxQ2l1dyKkL4ZNdvZKr9u89w4eUJHfEc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; b=v/fKO7Rpv8ITm+SZGhzCEbt/X5NKX20KxHFY1ZYs+O36o/XViXi95fWfnBRR+72g36 CMZzOD/5nQXM5ZqSfaWShQOMyO+2ElDlUTmXi7hdIA0GZZWej1MlimWJY6macyohHf3m iitFQewETX1BJlJGmR0edKvNn7RyK+uOjZMkE= MIME-Version: 1.0 Received: by 10.204.65.145 with SMTP id j17mr1451833bki.209.1285785998249; Wed, 29 Sep 2010 11:46:38 -0700 (PDT) Received: by 10.204.71.138 with HTTP; Wed, 29 Sep 2010 11:46:38 -0700 (PDT) Date: Wed, 29 Sep 2010 20:46:38 +0200 Message-ID: From: Torbjorn Kristoffersen To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Strange ZFS problem, filesystem claims to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 
19:16:04 -0000

I have a ZFS pool ("tank") called tpool; the server runs a couple of jails, each with its own ZFS filesystem. There is a problem with one of these filesystems. First, its disk usage as shown by ``df -h'':

...
tpool/rb.org 100G 95G 4.6G 95% /jails/rb.org
...

The command ``zfs list'' shows the same:

..
tpool/rb.org 95.4G 4.56G 95.4G /jails/rb.org
..

However, there is a very mysterious problem somewhere. Something inside this jail is eating disk space, but we can't find any directories that are actually taking it. We first suspected either fetchmail or spamassassin, since some of their directories were huge. (These were later deleted, which is why you see that 4.6GB is now available; before that, 0GB was available.)

However, we can't find *any trace* of an actual directory or file that is taking all the space. Take this for instance:

outsidejail# du -sh rb.org
43G rb.org

How can this be? df and zfs show that the entire drive is nearly full, yet I can't find any directory that is actually taking all this space. I've carefully looked through every single directory within the jail trying to find something that's taking all that space, but to no avail.
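One general way for space to be in use while du cannot see it is a file that was deleted while some process still holds it open: du walks directory names, while df and zfs list count allocated space, which is released only when the last descriptor is closed. That is the situation the fstat(1) check suggested in the reply to this message looks for. The sketch below reproduces that state on any POSIX system; make_orphan() and its parameters are hypothetical, and it is an illustration of the mechanism, not a diagnosis of this pool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a file in `dir`, write `size` bytes, unlink it while keeping the
 * descriptor open, and fill *st from fstat(2). Returns the still-open
 * descriptor (caller closes it) or -1 on error.
 */
int make_orphan(const char *dir, size_t size, struct stat *st)
{
    char path[1024];
    char buf[4096];
    int fd;

    snprintf(path, sizeof(path), "%s/orphan.XXXXXX", dir);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    memset(buf, 'x', sizeof(buf));
    for (size_t done = 0; done < size; ) {
        size_t n = size - done < sizeof(buf) ? size - done : sizeof(buf);
        ssize_t w = write(fd, buf, n);
        if (w <= 0) {
            close(fd);
            unlink(path);
            return -1;
        }
        done += (size_t)w;
    }
    fsync(fd);
    unlink(path);   /* the name is gone: du and find can no longer see it */
    if (fstat(fd, st) != 0) {
        close(fd);
        return -1;
    }
    return fd;      /* its space stays allocated until this fd is closed */
}
```

On FreeBSD, `fstat -f /jails/rb.org` lists the files held open on that filesystem; an unlinked-but-open file still appears there by mount point and inode number even though no path to it remains.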
----
My system stats:
# uname -a
FreeBSD grim 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 UTC 2010 root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
# zpool get version tpool
NAME   PROPERTY  VALUE  SOURCE
tpool  version   14     default
# zpool status
  pool: tpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tpool       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad4s1d  ONLINE       0     0     0
            ad6s1d  ONLINE       0     0     0

errors: No known data errors

[ Note that I've also done a scrub recently ]
----
From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 19:20:49 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AD979106566C; Wed, 29 Sep 2010 19:20:49 +0000 (UTC) (envelope-from jamesbrandongooch@gmail.com) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id 160AA8FC2A; Wed, 29 Sep 2010 19:20:48 +0000 (UTC) Received: by wyb32 with SMTP id 32so76104wyb.13 for ; Wed, 29 Sep 2010 12:20:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=0vw3NwTXKwU/zjRZjXzMAHLVGVoBt5XY0IUfjigoMdc=; b=sjddUp9XeYJNKF1LPjQXwR4I2T76eeG9vKd54quNTZYw4EuZyXJ+fKSwzH8YBXpFX2 DNw4WbJqOg/k1Ys7R7xLWGhXMZqte1mKZwIK4H0vZarsExnJrPUJKrVSbH8bzCM8mcjD j9wXs8u6qhv4qy/fIpB2S/ZVBpk9R3hJE6Wzw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=JBpzGXlDBt3wPzkYa5lFkb+R3OPgDWlcpom97pgP12yHo+qC9Uo452jlM2fNEPVwPZ lNdOQApf2no4rwUcHwzCPT32mjYMMWR//ysd4r+DSfD7ESn9w4WQ8yxmJ2WgVHNW1irz fOZVLyVDj6nQn1Xa6IR8mXZpGYhcJZlcJq6CQ= MIME-Version: 1.0 Received: by 10.216.156.21 with SMTP id
l21mr2888856wek.83.1285786216920; Wed, 29 Sep 2010 11:50:16 -0700 (PDT) Received: by 10.216.133.133 with HTTP; Wed, 29 Sep 2010 11:50:16 -0700 (PDT) In-Reply-To: <201009290917.05269.jhb@freebsd.org> References: <20100929031825.L683@besplex.bde.org> <20100929084801.M948@besplex.bde.org> <20100929041650.GA1553@aditya> <201009290917.05269.jhb@freebsd.org> Date: Wed, 29 Sep 2010 13:50:16 -0500 Message-ID: From: Brandon Gooch To: John Baldwin Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: ext2fs now extremely slow X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 19:20:49 -0000 On Wed, Sep 29, 2010 at 8:17 AM, John Baldwin wrote: > On Wednesday, September 29, 2010 12:16:55 am Aditya Sarawgi wrote: >> On Wed, Sep 29, 2010 at 09:14:57AM +1000, Bruce Evans wrote: >> > On Wed, 29 Sep 2010, Bruce Evans wrote: >> > >> > > On Wed, 29 Sep 2010, Bruce Evans wrote: >> > > >> > >> For benchmarks on ext2fs: >> > >> >> > >> Under FreeBSD-~5.2 rerun today: >> > >> untar: 59.17 real >> > >> tar: 19.52 real >> > >> >> > >> Under -current run today: >> > >> untar: 101.16 real >> > >> tar: 172.03 real >> > >> >> > >> So, -current is 8.8 times slower for tar, but only 1.7 times slower for >> > >> untar. >> > >> ... >> > >> So it seems that only 1 block in every 8 is used, and there is a seek >> > >> after every block. This asks for an 8-fold reduction in throughput, >> > >> and it seems to have got that and a bit more for reading although not >> > >> for writing. Even (or especially) with perfect hardware, it must give >> > >> an 8-fold reduction. And it is likely to give more, since it defeats >> > >> vfs clustering by making all runs of contiguous blocks have length 1.
>> > >> >> > >> Simple sequential allocation should be used unless the allocation policy >> > >> and implementation are very good. >> > > >> > > This work a bit better after zapping the 8-fold way: >> > Things >> > > ... >> > > This gives an improvement of: >> > > >> > > untar: 101.16 real -> 63.46 >> > > tar: 172.03 real -> 50.70 >> > > >> > > Now -current is only 1.1 times slower for untar and 2.6 times slower for >> > > tar. >> > > >> > > There must be a problem with bpref for things to have been so bad. There >> > > is some point to leaving a gap of 7 blocks for expansion, but the gap was >> > > left even between blocks in a single file. >> > > ... >> > > I haven't tried the bde_blkpref hack in the above. It should kill bpref >> > > completely so that there is no jump between lbn0 and lbn1, and break >> > > cylinder group based allocation even better. Setting bde_blkpref to 1 >> > > restores the bug that was present in ext2fs in FreeBSD between 1995 and >> > > 2010. This bug gave seqential allocation starting at the beginning of >> > > the disk in almost all cases, so map searches were slow and early groups >> > > filled up before later groups were used at all. >> > >> > Tried this (patch repeated below), and it gave essentially the same >> > speed as old versions. >> > >> > The main problem seems to be that the `goal' variables aren't initialized.
>> > After restoring bits verbatim from an old version, things seem to work as >> > expected: >> > >> > % Index: ext2_alloc.c >> > % =================================================================== >> > % RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v >> > % retrieving revision 1.2 >> > % diff -u -2 -r1.2 ext2_alloc.c >> > % --- ext2_alloc.c 1 Sep 2010 05:34:17 -0000 1.2 >> > % +++ ext2_alloc.c 28 Sep 2010 21:08:42 -0000 >> > % @@ -1,2 +1,5 @@ >> > % +int bde_blkpref = 0; >> > % +int bde_alloc8 = 0; >> > % + >> > % /*- >> > % * modified for Lites 1.1 >> > % @@ -117,4 +120,8 @@ >> > % ext2_alloccg); >> > % if (bno > 0) { >> > % + /* set next_alloc fields as done in block_getblk */ >> > % + ip->i_next_alloc_block = lbn; >> > % + ip->i_next_alloc_goal = bno; >> > % + >> > % ip->i_blocks += btodb(fs->e2fs_bsize); >> > % ip->i_flag |= IN_CHANGE | IN_UPDATE; >> > >> > The only things that changed recently in this block were the 4 deleted >> > lines and 4 lines with tabs corrupted to spaces. Perhaps an editing >> > error.
>> > >> > % @@ -542,6 +549,12 @@ >> > % then set the goal to what we thought it should be >> > % */ >> > % +if (bde_blkpref == 0) { >> > % if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0) >> > % return ip->i_next_alloc_goal; >> > % +} else if (bde_blkpref == 1) { >> > % + if(ip->i_next_alloc_block == lbn) >> > % + return ip->i_next_alloc_goal; >> > % +} else >> > % + return 0; >> > % >> > % /* now check whether we were provided with an array that basically >> > >> > Not needed now. >> > >> > % @@ -662,4 +675,5 @@ >> > % * block. >> > % */ >> > % +if (bde_alloc8 == 0) { >> > % if (bpref) >> > % start = dtogd(fs, bpref) / NBBY; >> > % @@ -679,4 +693,5 @@ >> > % } >> > % } >> > % +} >> > % >> > % bno = ext2_mapsearch(fs, bbp, bpref); >> > >> > The code to skip to the next 8-block boundary should be removed permanently. >> > After fixing the initialization, it doesn't generate holes inside files but >> > it still generates holes between files. The holes are quite large with >> > 4K-blocks.
>> > >> > Benchmark results with just the initialization of `goal' variables restored: >> > >> > %%% >> > ext2fs-1024-1024: >> > tarcp /f srcs: 78.79 real 0.31 user 4.94 sys >> > tar cf /dev/zero srcs: 24.62 real 0.19 user 1.82 sys >> > ext2fs-1024-1024-as: >> > tarcp /f srcs: 52.07 real 0.26 user 4.95 sys >> > tar cf /dev/zero srcs: 24.80 real 0.10 user 1.93 sys >> > ext2fs-4096-4096: >> > tarcp /f srcs: 74.14 real 0.34 user 3.96 sys >> > tar cf /dev/zero srcs: 33.82 real 0.10 user 1.19 sys >> > ext2fs-4096-4096-as: >> > tarcp /f srcs: 53.54 real 0.36 user 3.87 sys >> > tar cf /dev/zero srcs: 33.91 real 0.14 user 1.15 sys >> > %%% >> > >> > The much larger holes between the files are apparently responsible for the >> > decreased speed with 4K-blocks. 1K-blocks are really too small, so 4K-blocks >> > should be faster. >> > >> > Benchmark results with the fix and bde_alloc8 = 1.
>> > >> > ext2fs-1024-1024: >> > tarcp /f srcs: 71.60 real 0.15 user 2.04 sys >> > tar cf /dev/zero srcs: 22.34 real 0.05 user 0.79 sys >> > ext2fs-1024-1024-as: >> > tarcp /f srcs: 46.03 real 0.14 user 2.02 sys >> > tar cf /dev/zero srcs: 21.97 real 0.05 user 0.80 sys >> > ext2fs-4096-4096: >> > tarcp /f srcs: 59.66 real 0.13 user 1.63 sys >> > tar cf /dev/zero srcs: 19.88 real 0.07 user 0.46 sys >> > ext2fs-4096-4096-as: >> > tarcp /f srcs: 37.30 real 0.12 user 1.60 sys >> > tar cf /dev/zero srcs: 19.93 real 0.05 user 0.49 sys >> > >> > Bruce >> >> Hi, >> >> I see what you are saying. The gap of 8 block between the files >> is due to the old preallocation which used to allocate additional >> 8 blocks in advance for a particular inode when allocating a block >> for it. The gap between blocks of the same file shouldn't be there >> too. Both of these cases should be removed. I will look into this >> during this week. The slowness is also due to lack of preallocation >> in the new code. > > One of the GSoC students worked on a patch to add preallocation back to > ext2fs this summer. Would you be interested in reviewing and/or testing > that patch? (I've attached it). Here is his original e-mail: > > > Hi all, > > There is a patch in attachment which implements a preallocation > algorithm in ext2fs. I implement this algorithm during FreeBSD SoC 2010. > > This patch implements the in-memory ext2/3 block preallocation algorithm > from reservation window.
It uses a RB-tree to index block allocation > request and reserve a number of blocks for each file which has requested > to allocate a block. When a file request to allocate a block, it will > find a block to allocate to this file. When it find the block to > allocate, it will try to allocate a block, which is in the same cylinder > group with inode and is not in other reservation window in RB-tree. > Meanwhile there are some contiguous free blocks after this block. It > uses a data structure to store this block's position and the length of > contiguous free blocks. Then it inserts this data structure into > RB-tree. When this file request to allocate a block again, It will find > corresponding data structure in RB-tree. If it can find, the next free > block will be allocated to this file directly. Otherwise, it will search > a new block again. > > I have run some benchmarks to test this algorithm. Please review it in > wiki page (' http://wiki.freebsd.org/SOC2010ZhengLiu'). The performance > is better when the number of threads is smaller than 4. When the number > of threads is greater than 4, the performance can be increased a little. > > Please test it. > > > Thanks and best regards, > > lz > Wow, this is really awesome! What are the chances of this code being committed before a 9.0 release (assuming we have enough user testing)? 
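The reservation-window scheme described in the quoted GSoC e-mail can be illustrated with a small model. This is a sketch under simplifying assumptions, not the patch itself: one flat block array instead of per-group bitmaps, a linear array of windows instead of the RB-tree the patch uses, a fixed WINDOW of 8 blocks, and hypothetical names throughout. When two files allocate alternately, each still gets a contiguous run (file 1 gets blocks 0, 1, ...; file 2 gets blocks 8, 9, ...) instead of interleaved blocks.

```c
#include <assert.h>
#include <string.h>

#define NBLOCKS  64  /* hypothetical number of blocks in the model */
#define WINDOW   8   /* hypothetical reservation window length */
#define MAXFILES 16  /* hypothetical limit on files holding a window */

/* One reservation window per file: blocks [start, start + len). */
struct rsv {
    int owner;  /* file (inode) identifier */
    int start;
    int len;
    int next;   /* next block to hand out from this window */
};

static unsigned char used[NBLOCKS];  /* 1 = block allocated */
static struct rsv rsvs[MAXFILES];
static int nrsv;

void rsv_reset(void)
{
    memset(used, 0, sizeof(used));
    memset(rsvs, 0, sizeof(rsvs));
    nrsv = 0;
}

static struct rsv *rsv_find(int owner)
{
    for (int i = 0; i < nrsv; i++)
        if (rsvs[i].owner == owner)
            return &rsvs[i];
    return NULL;
}

/* Does [start, start + len) overlap a window held by another file? */
static int rsv_conflicts(int start, int len, int owner)
{
    for (int i = 0; i < nrsv; i++)
        if (rsvs[i].owner != owner &&
            start < rsvs[i].start + rsvs[i].len &&
            rsvs[i].start < start + len)
            return 1;
    return 0;
}

/* Allocate one block for `owner`; returns the block number or -1. */
int rsv_alloc(int owner)
{
    struct rsv *r = rsv_find(owner);

    /* Fast path: hand out the next block of the file's current window. */
    if (r != NULL && r->next < r->start + r->len && !used[r->next]) {
        used[r->next] = 1;
        return r->next++;
    }
    /* Slow path: find WINDOW free blocks not reserved by another file. */
    for (int s = 0; s + WINDOW <= NBLOCKS; s++) {
        int ok = !rsv_conflicts(s, WINDOW, owner);
        for (int i = 0; ok && i < WINDOW; i++)
            if (used[s + i])
                ok = 0;
        if (!ok)
            continue;
        if (r == NULL) {
            if (nrsv == MAXFILES)
                return -1;
            r = &rsvs[nrsv++];
            r->owner = owner;
        }
        r->start = s;
        r->len = WINDOW;
        r->next = s + 1;
        used[s] = 1;
        return s;
    }
    return -1;  /* no free window left in this model */
}
```

The linear scans keep the model short; the patch's RB-tree serves the same purpose of finding a file's window and checking for overlaps, but in logarithmic time.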
-Brandon From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 19:25:36 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AFE89106566B for ; Wed, 29 Sep 2010 19:25:36 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta06.emeryville.ca.mail.comcast.net (qmta06.emeryville.ca.mail.comcast.net [76.96.30.56]) by mx1.freebsd.org (Postfix) with ESMTP id 959D78FC1A for ; Wed, 29 Sep 2010 19:25:36 +0000 (UTC) Received: from omta11.emeryville.ca.mail.comcast.net ([76.96.30.36]) by qmta06.emeryville.ca.mail.comcast.net with comcast id CcTG1f0070mlR8UA6jRcVP; Wed, 29 Sep 2010 19:25:36 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta11.emeryville.ca.mail.comcast.net with comcast id CjRa1f00T3LrwQ28XjRbow; Wed, 29 Sep 2010 19:25:35 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id C6B8F9B418; Wed, 29 Sep 2010 12:25:34 -0700 (PDT) Date: Wed, 29 Sep 2010 12:25:34 -0700 From: Jeremy Chadwick To: Torbjorn Kristoffersen Message-ID: <20100929192534.GA97031@icarus.home.lan> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 19:25:36 -0000 On Wed, Sep 29, 2010 at 08:46:38PM +0200, Torbjorn Kristoffersen wrote: > I have a ZFS "tank" called tpool, the server runs a couple of jails (each > with a zfs filesystem). There is a problem with one of these filesystems. > First, its disk usage as shown in ``df -h'': > ... > tpool/rb.org 100G 95G 4.6G 95% /jails/rb.org > ... 
> > The command ``zfs list'' shows the same: > .. > tpool/rb.org 95.4G 4.56G 95.4G /jails/rb.org > .. > > However, there is a very mysterious problem somewhere. > Something inside this jail is eating diskspace, but we can't find any > directories that is actually taking the diskspace. We first suspected either > fetchmail or spamassassin of causing a lot of space to be used, since some > of their directories were huge. (These were later deleted, and which is why > you see that 4.6GB is now available, before that 0GB was available). > > However, we can't find *any trace* of an actual directory or file that is > taking all the spac.e > > Take this for instance: > > outsidejail# du -sh rb.org > 43G rb.org > > How can this be? df and zfs are showing that the entire drive is nearly > full, yet I can't find any directory that is actually taking all this space. > I've carefully looked through every single directory within the jail trying > to find something that's taking all that space, but to no avail. > > ---- > My system stats: > # uname -a > FreeBSD grim 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 UTC > 2010 root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 > # zpool get version tpool > NAME PROPERTY VALUE SOURCE > tpool version 14 default > # zpool status > pool: tpool > state: ONLINE > scrub: none requested > config: > > NAME STATE READ WRITE CKSUM > tpool ONLINE 0 0 0 > mirror ONLINE 0 0 0 > ad4s1d ONLINE 0 0 0 > ad6s1d ONLINE 0 0 0 > > errors: No known data errors > > [ Note that I've also done a scrub recently ] 1) Have you checked using fstat to ensure that no file descriptors remain open on any of your ZFS filesystems (not pools)? 2) Are you using compression on any of your ZFS filesystems? -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 19:30:25 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 27CD01065670 for ; Wed, 29 Sep 2010 19:30:25 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id E9B368FC15 for ; Wed, 29 Sep 2010 19:30:24 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 9782D46B8B; Wed, 29 Sep 2010 15:30:24 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6A22B8A03C; Wed, 29 Sep 2010 15:30:23 -0400 (EDT) From: John Baldwin To: Brandon Gooch Date: Wed, 29 Sep 2010 15:30:13 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; ) References: <20100929031825.L683@besplex.bde.org> <201009290917.05269.jhb@freebsd.org> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201009291530.13434.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Wed, 29 Sep 2010 15:30:23 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: freebsd-fs@freebsd.org Subject: Re: ext2fs now extremely slow X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2010 19:30:25 -0000 On Wednesday, September 29, 2010 2:50:16 pm Brandon Gooch wrote: > On Wed, Sep 29, 
2010 at 8:17 AM, John Baldwin wrote: > > On Wednesday, September 29, 2010 12:16:55 am Aditya Sarawgi wrote: > >> On Wed, Sep 29, 2010 at 09:14:57AM +1000, Bruce Evans wrote: > >> > On Wed, 29 Sep 2010, Bruce Evans wrote: > >> > > >> > > On Wed, 29 Sep 2010, Bruce Evans wrote: > >> > > > >> > >> For benchmarks on ext2fs: > >> > >> > >> > >> Under FreeBSD-~5.2 rerun today: > >> > >> untar: 59.17 real > >> > >> tar: 19.52 real > >> > >> > >> > >> Under -current run today: > >> > >> untar: 101.16 real > >> > >> tar: 172.03 real > >> > >> > >> > >> So, -current is 8.8 times slower for tar, but only 1.7 times slower for > >> > >> untar. > >> > >> ... > >> > >> So it seems that only 1 block in every 8 is used, and there is a seek > >> > >> after every block. This asks for an 8-fold reduction in throughput, > >> > >> and it seems to have got that and a bit more for reading although not > >> > >> for writing. Even (or especially) with perfect hardware, it must give > >> > >> an 8-fold reduction. And it is likely to give more, since it defeats > >> > >> vfs clustering by making all runs of contiguous blocks have length 1. > >> > >> > >> > >> Simple sequential allocation should be used unless the allocation policy > >> > >> and implementation are very good. > >> > > > >> > > This work a bit better after zapping the 8-fold way: > >> > Things > >> > > ... > >> > > This gives an improvement of: > >> > > > >> > > untar: 101.16 real -> 63.46 > >> > > tar: 172.03 real -> 50.70 > >> > > > >> > > Now -current is only 1.1 times slower for untar and 2.6 times slower for > >> > > tar. > >> > > > >> > > There must be a problem with bpref for things to have been so bad. There > >> > > is some point to leaving a gap of 7 blocks for expansion, but the gap was > >> > > left even between blocks in a single file. > >> > > ... > >> > > I haven't tried the bde_blkpref hack in the above. 
It should kill bpref > >> > > completely so that there is no jump between lbn0 and lbn1, and break > >> > > cylinder group based allocation even better. Setting bde_blkpref to 1 > >> > > restores the bug that was present in ext2fs in FreeBSD between 1995 and > >> > > 2010. This bug gave seqential allocation starting at the beginning of > >> > > the disk in almost all cases, so map searches were slow and early groups > >> > > filled up before later groups were used at all. > >> > > >> > Tried this (patch repeated below), and it gave essentially the same > >> > speed as old versions. > >> > > >> > The main problem seems to be that the `goal' variables aren't initialized. > >> > After restoring bits verbatim from an old version, things seem to work as > >> > expected: > >> > > >> > % Index: ext2_alloc.c > >> > % =================================================================== > >> > % RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v > >> > % retrieving revision 1.2 > >> > % diff -u -2 -r1.2 ext2_alloc.c > >> > % --- ext2_alloc.c 1 Sep 2010 05:34:17 -0000 1.2 > >> > % +++ ext2_alloc.c 28 Sep 2010 21:08:42 -0000 > >> > % @@ -1,2 +1,5 @@ > >> > % +int bde_blkpref = 0; > >> > % +int bde_alloc8 = 0; > >> > % + > >> > % /*- > >> > % * modified for Lites 1.1 > >> > % @@ -117,4 +120,8 @@ > >> > % ext2_alloccg); > >> > % if (bno > 0) { > >> > % + /* set next_alloc fields as done in block_getblk */ > >> > % + ip->i_next_alloc_block = lbn; > >> > % + ip->i_next_alloc_goal = bno; > >> > % + > >> > % ip->i_blocks += btodb(fs->e2fs_bsize); > >> > % ip->i_flag |= IN_CHANGE | IN_UPDATE; > >> > > >> > The only things that changed recently in this block were the 4 deleted > >> > lines and 4 lines with tabs corrupted to spaces. Perhaps an editing > >> > error. 
> >> >
> >> > % @@ -542,6 +549,12 @@
> >> > % then set the goal to what we thought it should be
> >> > % */
> >> > % +if (bde_blkpref == 0) {
> >> > % if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0)
> >> > % return ip->i_next_alloc_goal;
> >> > % +} else if (bde_blkpref == 1) {
> >> > % + if(ip->i_next_alloc_block == lbn)
> >> > % + return ip->i_next_alloc_goal;
> >> > % +} else
> >> > % + return 0;
> >> > %
> >> > % /* now check whether we were provided with an array that basically
> >> >
> >> > Not needed now.
> >> >
> >> > % @@ -662,4 +675,5 @@
> >> > % * block.
> >> > % */
> >> > % +if (bde_alloc8 == 0) {
> >> > % if (bpref)
> >> > % start = dtogd(fs, bpref) / NBBY;
> >> > % @@ -679,4 +693,5 @@
> >> > % }
> >> > % }
> >> > % +}
> >> > %
> >> > % bno = ext2_mapsearch(fs, bbp, bpref);
> >> >
> >> > The code to skip to the next 8-block boundary should be removed permanently.
> >> > After fixing the initialization, it doesn't generate holes inside files but
> >> > it still generates holes between files. The holes are quite large with
> >> > 4K-blocks.
> >> >
> >> > Benchmark results with just the initialization of `goal' variables restored:
> >> >
> >> > %%%
> >> > ext2fs-1024-1024:
> >> > tarcp /f srcs:          78.79 real  0.31 user  4.94 sys
> >> > tar cf /dev/zero srcs:  24.62 real  0.19 user  1.82 sys
> >> > ext2fs-1024-1024-as:
> >> > tarcp /f srcs:          52.07 real  0.26 user  4.95 sys
> >> > tar cf /dev/zero srcs:  24.80 real  0.10 user  1.93 sys
> >> > ext2fs-4096-4096:
> >> > tarcp /f srcs:          74.14 real  0.34 user  3.96 sys
> >> > tar cf /dev/zero srcs:  33.82 real  0.10 user  1.19 sys
> >> > ext2fs-4096-4096-as:
> >> > tarcp /f srcs:          53.54 real  0.36 user  3.87 sys
> >> > tar cf /dev/zero srcs:  33.91 real  0.14 user  1.15 sys
> >> > %%%
> >> >
> >> > The much larger holes between the files are apparently responsible for the
> >> > decreased speed with 4K-blocks. 1K-blocks are really too small, so 4K-blocks
> >> > should be faster.
> >> >
> >> > Benchmark results with the fix and bde_alloc8 = 1.
> >> >
> >> > ext2fs-1024-1024:
> >> > tarcp /f srcs:          71.60 real  0.15 user  2.04 sys
> >> > tar cf /dev/zero srcs:  22.34 real  0.05 user  0.79 sys
> >> > ext2fs-1024-1024-as:
> >> > tarcp /f srcs:          46.03 real  0.14 user  2.02 sys
> >> > tar cf /dev/zero srcs:  21.97 real  0.05 user  0.80 sys
> >> > ext2fs-4096-4096:
> >> > tarcp /f srcs:          59.66 real  0.13 user  1.63 sys
> >> > tar cf /dev/zero srcs:  19.88 real  0.07 user  0.46 sys
> >> > ext2fs-4096-4096-as:
> >> > tarcp /f srcs:          37.30 real  0.12 user  1.60 sys
> >> > tar cf /dev/zero srcs:  19.93 real  0.05 user  0.49 sys
> >> >
> >> > Bruce
> >>
> >> Hi,
> >>
> >> I see what you are saying. The gap of 8 blocks between the files
> >> is due to the old preallocation, which used to allocate an additional
> >> 8 blocks in advance for a particular inode when allocating a block
> >> for it. The gap between blocks of the same file shouldn't be there
> >> either. Both of these cases should be removed. I will look into this
> >> during this week. The slowness is also due to lack of preallocation
> >> in the new code.
> >
> > One of the GSoC students worked on a patch to add preallocation back to
> > ext2fs this summer. Would you be interested in reviewing and/or testing
> > that patch? (I've attached it). Here is his original e-mail:
> >
> >
> > Hi all,
> >
> > There is a patch attached which implements a preallocation
> > algorithm in ext2fs. I implemented this algorithm during FreeBSD SoC 2010.
> >
> > This patch implements the in-memory ext2/3 block preallocation algorithm
> > based on reservation windows. It uses an RB-tree to index block allocation
> > requests and reserves a number of blocks for each file which has requested
> > to allocate a block. When a file requests to allocate a block, it will
> > find a block to allocate to this file.
> > When it finds the block to allocate, it will try to allocate a block
> > which is in the same cylinder group as the inode and is not in another
> > reservation window in the RB-tree. Meanwhile, there should be some
> > contiguous free blocks after this block. It uses a data structure to
> > store this block's position and the length of the contiguous free
> > blocks. Then it inserts this data structure into the RB-tree. When this
> > file requests to allocate a block again, it will find the corresponding
> > data structure in the RB-tree. If it can find it, the next free block
> > will be allocated to this file directly. Otherwise, it will search for
> > a new block again.
> >
> > I have run some benchmarks to test this algorithm. Please review them on
> > the wiki page (http://wiki.freebsd.org/SOC2010ZhengLiu). The performance
> > is better when the number of threads is smaller than 4. When the number
> > of threads is greater than 4, the performance can be increased a little.
> >
> > Please test it.
> >
> >
> > Thanks and best regards,
> >
> > lz
> >
>
> Wow, this is really awesome! What are the chances of this code being
> committed before a 9.0 release (assuming we have enough user testing)?

Good if it gets testing and review. He also worked on read-only support for
ext4 (in a second patch). Both patches were posted to this list (fs@)
several weeks ago.
--
John Baldwin

From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 20:52:42 2010
Date: Thu, 30 Sep 2010 01:55:29 +0530
From: Aditya Sarawgi
To: John Baldwin
Message-ID: <20100929202526.GA1564@aditya>
References: <20100929031825.L683@besplex.bde.org> <20100929084801.M948@besplex.bde.org>
 <20100929041650.GA1553@aditya> <201009290917.05269.jhb@freebsd.org>
In-Reply-To: <201009290917.05269.jhb@freebsd.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: ext2fs now extremely slow

[snip]
> > I see what you are saying. The gap of 8 blocks between the files
> > is due to the old preallocation, which used to allocate an additional
> > 8 blocks in advance for a particular inode when allocating a block
> > for it. The gap between blocks of the same file shouldn't be there
> > either. Both of these cases should be removed. I will look into this
> > during this week. The slowness is also due to lack of preallocation
> > in the new code.
>
> One of the GSoC students worked on a patch to add preallocation back to
> ext2fs this summer. Would you be interested in reviewing and/or testing
> that patch? (I've attached it). Here is his original e-mail:
[snip]

Hi John,

I did a review of Zheng Liu's reservation window patch last week and
suggested a few changes to him. Otherwise the code looks awesome. But it
would be great if someone else could review the patch too, and if
everything goes well, we should merge this to HEAD.

For the ext4 part, I still have to review his patches and I am planning
to do it soon. Zheng is planning to have a separate module for ext4, and
that makes sense. We are aiming at bringing ext4 to a usable state for
9-RELEASE (at least read-only).
Cheers
Aditya Sarawgi

From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 22:11:12 2010
References: <20100929192534.GA97031@icarus.home.lan>
Date: Thu, 30 Sep 2010 00:11:09 +0200
From: Torbjorn Kristoffersen
To: freebsd-fs@freebsd.org
Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full

I'm at a complete loss here. I shut down the jail completely, and I am
watching the jail's ZFS filesystem grow as we speak. No process is using
it. It only grows in "df" and "zfs list"; I can't find any files that are
growing. I have to re-set the quota to be higher and higher to accommodate
the space.

On Wed, Sep 29, 2010 at 10:46 PM, Torbjorn Kristoffersen <
torbjoern@gmail.com> wrote:

> Hi Jeremy.
>
> 1) I checked now, and found nothing extraordinary. Just processes that have
> been running for a long while, such as screen, cron, sshd, bash, irssi,
> syslogd, etc.
>
> 2) No compression used on this zfs filesystem (or any of the others).
>
> I completely stopped the jail now, and removed some of the directories
> with the most data in them, but to no avail.
>
>
> On Wed, Sep 29, 2010 at 9:25 PM, Jeremy Chadwick
> wrote:
>
>> On Wed, Sep 29, 2010 at 08:46:38PM +0200, Torbjorn Kristoffersen wrote:
>> > I have a ZFS "tank" called tpool, the server runs a couple of jails (each
>> > with a zfs filesystem). There is a problem with one of these filesystems.
>> > First, its disk usage as shown in ``df -h'':
>> > ...
>> > tpool/rb.org 100G 95G 4.6G 95% /jails/rb.org
>> > ...
>> >
>> > The command ``zfs list'' shows the same:
>> > ..
>> > tpool/rb.org 95.4G 4.56G 95.4G /jails/rb.org
>> > ..
>> >
>> > However, there is a very mysterious problem somewhere.
>> > Something inside this jail is eating disk space, but we can't find any
>> > directories that are actually taking the disk space. We first suspected
>> > either fetchmail or spamassassin of causing a lot of space to be used,
>> > since some of their directories were huge. (These were later deleted,
>> > which is why you see that 4.6GB is now available; before that, 0GB was
>> > available.)
>> >
>> > However, we can't find *any trace* of an actual directory or file that is
>> > taking all the space.
>> >
>> > Take this for instance:
>> >
>> > outsidejail# du -sh rb.org
>> > 43G rb.org
>> >
>> > How can this be? df and zfs are showing that the entire drive is nearly
>> > full, yet I can't find any directory that is actually taking all this
>> > space. I've carefully looked through every single directory within the
>> > jail trying to find something that's taking all that space, but to no
>> > avail.
>> >
>> > ----
>> > My system stats:
>> > # uname -a
>> > FreeBSD grim 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 UTC
>> > 2010 root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
>> > # zpool get version tpool
>> > NAME   PROPERTY  VALUE  SOURCE
>> > tpool  version   14     default
>> > # zpool status
>> >   pool: tpool
>> >  state: ONLINE
>> >  scrub: none requested
>> > config:
>> >
>> >         NAME        STATE     READ WRITE CKSUM
>> >         tpool       ONLINE       0     0     0
>> >           mirror    ONLINE       0     0     0
>> >             ad4s1d  ONLINE       0     0     0
>> >             ad6s1d  ONLINE       0     0     0
>> >
>> > errors: No known data errors
>> >
>> > [ Note that I've also done a scrub recently ]
>>
>> 1) Have you checked using fstat to ensure that no file descriptors
>> remain open on any of your ZFS filesystems (not pools)?
>>
>> 2) Are you using compression on any of your ZFS filesystems?
>>
>> --
>> | Jeremy Chadwick jdc@parodius.com |
>> | Parodius Networking http://www.parodius.com/ |
>> | UNIX Systems Administrator Mountain View, CA, USA |
>> | Making life hard for others since 1977.
PGP: 4BD6C0CB |
>>
>

From owner-freebsd-fs@FreeBSD.ORG Wed Sep 29 22:15:52 2010
Date: Wed, 29 Sep 2010 15:15:49 -0700
From: Jeremy Chadwick
To: Torbjorn Kristoffersen
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek, Andriy Gapon
Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full
Message-ID: <20100929221549.GA343@icarus.home.lan>
References: <20100929192534.GA97031@icarus.home.lan>

On Thu, Sep 30, 2010 at 12:11:09AM +0200, Torbjorn Kristoffersen wrote:
> I'm at a complete loss here. I shut down the jail completely, and I am
> watching the jail's ZFS filesystem grow as we speak. No process is using
> it. It only grows in "df" and "zfs list"; I can't find any files that are
> growing. I have to re-set the quota to be higher and higher to accommodate
> the space.
[snip]

Andriy and Pawel,

Do either of you have ideas as to what could cause the issue Torbjorn's
experiencing? I swear I remember some bug or quirk that got fixed with
regards to free space on ZFS, but as has been proven time and time again
my memory is horrible. His kernel's 8.1-RELEASE dated July 19th.

--
| Jeremy Chadwick jdc@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977.
PGP: 4BD6C0CB |

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 06:36:02 2010
From: Michael Naef
To: "freebsd-fs"
Date: Thu, 30 Sep 2010 08:08:11 +0200
References: <201009231938.09548.cal@linu.gs> <201009271624.46655.cal@linu.gs>
 <20100927181233.0e8c2869@ernst.jennejohn.org>
In-Reply-To: <20100927181233.0e8c2869@ernst.jennejohn.org>
Message-Id: <201009300808.12514.cal@linu.gs>
Subject: Re: Strange behaviour with sappend flag set on ZFS

Hi

> Sending a PR is always a good idea - it ends up in the tracking
> system and doesn't get lost in the mailing-list noise.
It's now (hopefully not) getting lost in the PR system:
http://www.freebsd.org/cgi/query-pr.cgi?pr=151082

Thank you all,
Michi

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 08:09:48 2010
Date: Thu, 30 Sep 2010 09:08:39 +0100
From: Karl Pielorz
To: Jeremy Chadwick
In-Reply-To: <20100929151111.GA91705@icarus.home.lan>
References: <20100929151111.GA91705@icarus.home.lan>
Cc: freebsd-fs@freebsd.org
Subject: Re: FreeBSD 8.1-R/amd64 - zfs 'hangs' - help tracing?

--On 29 September 2010 08:11 -0700 Jeremy Chadwick wrote:

> I can review these statistics to see if any of the disks look like they
> may be misbehaving.

I had to back off to 7.3-R. Unfortunately the machine is the 'everything'
server at home (routing, dhcp, storage, mail etc.) - so it wasn't proving
very popular messing around with it :(

I put 7.3-R back on - everything works as it did.
ZFS re-scrubbed the pools, and I'm good to go.

We have a 'very similar' machine at the office (same controllers etc.) -
I'll see if I can get enough drives together and run that up. If that
fails as well it's a much better platform to debug on :)

-Kp

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 08:37:02 2010
Message-ID: <20100930103647.62193lbkp9yqx5k4@webmail.leidinger.net>
Date: Thu, 30 Sep 2010 10:36:47 +0200
From: Alexander Leidinger
To: Jeremy Chadwick
References: <20100929192534.GA97031@icarus.home.lan> <20100929221549.GA343@icarus.home.lan>
In-Reply-To: <20100929221549.GA343@icarus.home.lan>
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek, Andriy Gapon
Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full

Quoting Jeremy Chadwick (from Wed, 29 Sep 2010 15:15:49 -0700):

> On Thu, Sep 30, 2010 at 12:11:09AM +0200, Torbjorn Kristoffersen wrote:
>> I'm at a complete loss here. I shut down the jail completely, and I am
>> watching the jail's ZFS filesystem grow as we speak. No process is using
>> it. It only grows in "df" and "zfs list"; I can't find any files that are
>> growing. I have to re-set the quota to be higher and higher to accommodate
>> the space.
[snip]
> Andriy and Pawel,
>
> Do either of you have ideas as to what could cause the issue Torbjorn's
> experiencing? I swear I remember some bug or quirk that got fixed with
> regards to free space on ZFS, but as has been proven time and time again
> my memory is horrible. His kernel's 8.1-RELEASE dated July 19th.

IIRC the commit you talk about was by Martin (CCed). I do not know if it
is (already) MFCed.

I'm not sure the bug you talk about is related to what Torbjorn is talking
about. The fact that the free space is going down while the jail is shut
down (and I assume jls does not show his JID anymore, so all of its
processes are really gone) points more to some other process (outside of
the jail) which is filling some (maybe already deleted, so not visible
anymore with du) file.

Bye,
Alexander.
http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 09:11:48 2010
Message-ID: <4CA45444.6070002@dannysplace.net>
Date: Thu, 30 Sep 2010 19:11:32 +1000
From: Danny Carroll
To: freebsd-fs@freebsd.org
References: <20100929192534.GA97031@icarus.home.lan> <20100929221549.GA343@icarus.home.lan>
 <20100930103647.62193lbkp9yqx5k4@webmail.leidinger.net>
In-Reply-To: <20100930103647.62193lbkp9yqx5k4@webmail.leidinger.net>
X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.3.1 X-SA-Exim-Version: 4.2 X-SA-Exim-Scanned: Yes (on mailgw.dannysplace.net) Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: fbsd@dannysplace.net List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2010 09:11:48 -0000 On 30/09/2010 6:36 PM, Alexander Leidinger wrote: > > Quoting Jeremy Chadwick (from Wed, 29 Sep > 2010 15:15:49 -0700): > >> On Thu, Sep 30, 2010 at 12:11:09AM +0200, Torbjorn Kristoffersen wrote: >>> I'm at a complete loss here. I shut down the jail completely, and I am >>> watching the jail's ZFS filesystem grow as we speak. No process is >>> using >>> it. It only grows in "df" and "zfs list", I can't find any files >>> that are >>> growing. I have to re-set the quota to be higher and higher to >>> accommodate >>> the space. >>> >>> On Wed, Sep 29, 2010 at 10:46 PM, Torbjorn Kristoffersen < >>> torbjoern@gmail.com> wrote: >>> >>> > Hi Jeremy. >>> > >>> > 1) I checked now, and found nothing extraordinary. Just processes >>> that have >>> > been running for a long while, such as screen, cron, sshd, bash, >>> irssi, >>> > syslogd, etc. >>> > >>> > 2) No compression used on this zfs filesystem (or any of the others). >>> > >>> > I completely stopped the jail now, and removed some of the >>> directories >>> > with the most data in them, but to no avail. >>> > >>> > >>> > On Wed, Sep 29, 2010 at 9:25 PM, Jeremy Chadwick >>> >> > > wrote: >>> > >>> >> On Wed, Sep 29, 2010 at 08:46:38PM +0200, Torbjorn Kristoffersen >>> wrote: >>> >> > I have a ZFS "tank" called tpool, the server runs a couple of >>> jails >>> >> (each >>> >> > with a zfs filesystem). There is a problem with one of these >>> >> filesystems.
>>> >> > First, its disk usage as shown in ``df -h'': >>> >> > ... >>> >> > tpool/rb.org 100G 95G 4.6G 95% /jails/rb.org >>> >> > ... >>> >> > >>> >> > The command ``zfs list'' shows the same: >>> >> > .. >>> >> > tpool/rb.org 95.4G 4.56G 95.4G /jails/rb.org >>> >> > .. >>> >> > >>> >> > However, there is a very mysterious problem somewhere. >>> >> > Something inside this jail is eating diskspace, but we can't >>> find any >>> >> > directories that are actually taking the diskspace. We first >>> suspected >>> >> either >>> >> > fetchmail or spamassassin of causing a lot of space to be used, >>> since >>> >> some >>> >> > of their directories were huge. (These were later deleted, >>> which is >>> >> why >>> >> > you see that 4.6GB is now available, before that 0GB was >>> available). >>> >> > >>> >> > However, we can't find *any trace* of an actual directory or >>> file that >>> >> is >>> >> > taking all the space. >>> >> > >>> >> > Take this for instance: >>> >> > >>> >> > outsidejail# du -sh rb.org >>> >> > 43G rb.org >>> >> > >>> >> > How can this be? df and zfs are showing that the entire drive >>> is nearly >>> >> > full, yet I can't find any directory that is actually taking >>> all this >>> >> space. >>> >> > I've carefully looked through every single directory within >>> the jail >>> >> trying >>> >> > to find something that's taking all that space, but to no avail.
>>> >> > >>> >> > ---- >>> >> > My system stats: >>> >> > # uname -a >>> >> > FreeBSD grim 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 >>> 02:36:49 UTC >>> >> > 2010 >>> root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 >>> >> > # zpool get version tpool >>> >> > NAME PROPERTY VALUE SOURCE >>> >> > tpool version 14 default >>> >> > # zpool status >>> >> > pool: tpool >>> >> > state: ONLINE >>> >> > scrub: none requested >>> >> > config: >>> >> > >>> >> > NAME STATE READ WRITE CKSUM >>> >> > tpool ONLINE 0 0 0 >>> >> > mirror ONLINE 0 0 0 >>> >> > ad4s1d ONLINE 0 0 0 >>> >> > ad6s1d ONLINE 0 0 0 >>> >> > >>> >> > errors: No known data errors >>> >> > >>> >> > [ Note that I've also done a scrub recently ] >>> >> >>> >> 1) Have you checked using fstat to ensure that no file descriptors >>> >> remain open on any of your ZFS filesystems (not pools)? >>> >> >>> >> 2) Are you using compression on any of your ZFS filesystems? >> >> Andriy and Pawel, >> >> Do either of you have ideas as to what could cause the issue Torbjorn's >> experiencing? I swear I remember some bug or quirk that got fixed with >> regards to free space on ZFS, but as has been proven time and time again >> my memory is horrible. His kernel's 8.1-RELEASE dated July 19th. > > IIRC the commit you talk about was by Martin (CCed). I do not know if > it is (already) MFCed. > > I'm not sure the bug you talk about is related to what Torbjorn is > talking about. The fact that the free space is going down while the > jail is shut down (and I assume jls does not show his JID anymore, so > all of its processes are really gone) points more to some other > process (outside of the jail) which is filling some (maybe already > deleted, so not visible anymore with du) file. > It certainly smells like a process still writing to a file that is unlinked. I wonder if it would show up with lsof.
If dtrace is enabled on that machine then I think it should be easy to see which process is performing write operations. -D From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 09:17:53 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E08631065693 for ; Thu, 30 Sep 2010 09:17:53 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta09.emeryville.ca.mail.comcast.net (qmta09.emeryville.ca.mail.comcast.net [76.96.30.96]) by mx1.freebsd.org (Postfix) with ESMTP id C5FD78FC0A for ; Thu, 30 Sep 2010 09:17:53 +0000 (UTC) Received: from omta12.emeryville.ca.mail.comcast.net ([76.96.30.44]) by qmta09.emeryville.ca.mail.comcast.net with comcast id CxC91f0070x6nqcA9xHtG2; Thu, 30 Sep 2010 09:17:53 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta12.emeryville.ca.mail.comcast.net with comcast id CxHr1f0083LrwQ28YxHsdd; Thu, 30 Sep 2010 09:17:52 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id B9C539B418; Thu, 30 Sep 2010 02:17:51 -0700 (PDT) Date: Thu, 30 Sep 2010 02:17:51 -0700 From: Jeremy Chadwick To: Karl Pielorz Message-ID: <20100930091751.GA13840@icarus.home.lan> References: <20100929151111.GA91705@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: FreeBSD 8.1-R/amd64 - zfs 'hangs' - help tracing? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2010 09:17:54 -0000 On Thu, Sep 30, 2010 at 09:08:39AM +0100, Karl Pielorz wrote: > --On 29 September 2010 08:11 -0700 Jeremy Chadwick > wrote: > > >I can review these statistics to see if any of the disks look like they > >may be misbehaving. 
> > I had to back off to 7.3-R. Unfortunately the machine is the > 'everything' server at home (routing, dhcp, storage, mail etc.) - so > it wasn't proving very popular messing around with it :( > > I put 7.3-R back on - everything works as it did. ZFS re-scrubbed > the pools, and I'm good to go. > > We have a 'very similar' machine at the office (same controllers > etc.) - I'll see if I can get enough drives together and run that > up. If that fails as well it's a much better platform to debug on :) All I'm interested in at this point are the drives in the machine which is having the problem. It doesn't matter if you're running smartctl on 7.3-RELEASE or 8.x -- the disk SMART stats will be the same. So if you can provide them, I can review them and point to issues which might explain disk I/O deadlock. You did mention some bad sectors, and with regards to those, people very often misread the attributes and assume the wrong thing (meaning, "oh looks like the disk found some bad LBAs and so they're fixed" when the situation is actually "the LBAs are bad and not fully remapped" which can cause I/O deadlock if those blocks are read and/or sometimes written to). -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 13:28:27 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3607F1065674 for ; Thu, 30 Sep 2010 13:28:27 +0000 (UTC) (envelope-from torbjoern@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id AC24F8FC1C for ; Thu, 30 Sep 2010 13:28:26 +0000 (UTC) Received: by bwz15 with SMTP id 15so1764988bwz.13 for ; Thu, 30 Sep 2010 06:28:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:content-type :content-transfer-encoding; bh=Ip0w/go5Hd/apJrlZ+IfX/SJ3XmKiOK97viq1nla3lk=; b=uGe3Uqf1tyQOUJkHj7obrWi0PxkF9VOFNn4P7KcPSQMQIfIypcolfsGF6xCIag1YYH hnsYLHdNHiRse0s6ixukUXAjQCjLcO/64xySTj2GYMrhQzRR71UjZjT2X5UW3m9NHydD Nsxh+350mLvW++/9N4p8Bi4gD+YKHwxJEwKlk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; b=baBuh7MX3QnIirYZ7jJ4oPFISvTddUifvnloV4a8zmPepYGtS3MTbo8RJ+muz//3+w S++RwEORWQJnSjGbvGQBxwLqFWgsQjybaPluUpWtjNCPlx9YVm9MimpacMKkHgZmgeSg aejAnTreCig0q/WIV6I77l4533KIJMsxzErEI= MIME-Version: 1.0 Received: by 10.204.126.92 with SMTP id b28mr2686416bks.47.1285853305557; Thu, 30 Sep 2010 06:28:25 -0700 (PDT) Received: by 10.204.71.138 with HTTP; Thu, 30 Sep 2010 06:28:25 -0700 (PDT) In-Reply-To: <4CA45444.6070002@dannysplace.net> References: <20100929192534.GA97031@icarus.home.lan> <20100929221549.GA343@icarus.home.lan> <20100930103647.62193lbkp9yqx5k4@webmail.leidinger.net> <4CA45444.6070002@dannysplace.net> Date: Thu, 30 Sep 2010 15:28:25 +0200 Message-ID: From: Torbjorn Kristoffersen To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 
Content-Transfer-Encoding: quoted-printable Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2010 13:28:27 -0000 On Thu, Sep 30, 2010 at 11:11 AM, Danny Carroll wrote: > > On 30/09/2010 6:36 PM, Alexander Leidinger wrote: > > [...] > It certainly smells like a process still writing to a file that is unlinked. > I wonder if it would show up with lsof. > > If dtrace is enabled on that machine then I think it should be easy to > see which process is performing write operations. > That could very well be. Interestingly, dtrace is not installed and doesn't even load. When I do kldload dtraceall it says: kldload: can't load dtraceall: Exec format error Perhaps I should recompile the kernel on this server and build DTrace into it. Perhaps I should first update to FreeBSD-STABLE, as it is more cutting edge? Actually, I'll first do a complete backup of this jail, remove the zfs filesystem, then re-create it, put the files back, and see what happens. The unfortunate thing is that I will be ruining a chance to find out what really happened.
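The hypothesis quoted in this reply is that a process is still writing to a file that has already been unlinked. That would explain why du(1) reports 43G while df(1) and zfs list report about 95G used. du walks directory entries, and an unlinked file no longer has one, but its blocks are not freed until the last open descriptor is closed. The Python sketch below is not from the thread; the function name is made up for illustration. It reproduces the effect on a scratch file. It assumes a POSIX system, where fstat() on the still-open descriptor reports a link count of zero. st_blocks is counted in 512-byte units, and on some filesystems (ZFS included) the allocated figure may lag until the data is committed.

```python
import os
import tempfile
from typing import Optional, Tuple


# Hypothetical helper for illustration only; not part of any FreeBSD tool.
def space_held_by_unlinked_file(nbytes: int,
                                directory: Optional[str] = None) -> Tuple[int, int, int]:
    """Write nbytes to a file whose name was removed while it was open.

    Returns (st_nlink, st_size, allocated_bytes) as seen through fstat()
    on the still-open descriptor, just before it is closed.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Remove the directory entry right away: du(1) and find(1) can no
        # longer see this file, but the open descriptor keeps it alive.
        os.unlink(path)

        chunk = b"x" * 65536
        written = 0
        while written < nbytes:
            written += os.write(fd, chunk[:nbytes - written])
        os.fsync(fd)  # ask the filesystem to commit the data

        st = os.fstat(fd)
        # Assumption: st_blocks is in 512-byte units (POSIX convention);
        # the value may lag behind st_size on some filesystems.
        return st.st_nlink, st.st_size, st.st_blocks * 512
    finally:
        # Only when the last descriptor is closed can the space be freed.
        os.close(fd)


if __name__ == "__main__":
    nlink, size, allocated = space_held_by_unlinked_file(8 * 1024 * 1024)
    print(f"link count={nlink} size={size} allocated={allocated}")
```

Run on its own, the script should report a link count of 0 and a size of 8388608 bytes, even though no directory entry refers to the file. The space is released only when os.close() runs. In the situation described in the thread, that step would never happen while the writing process stays alive.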
From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 14:34:19 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1CC4D106566C for ; Thu, 30 Sep 2010 14:34:19 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id BFF428FC13 for ; Thu, 30 Sep 2010 14:34:18 +0000 (UTC) Received: from outgoing.leidinger.net (p57B3ABE8.dip.t-dialin.net [87.179.171.232]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id D94E884400A; Thu, 30 Sep 2010 16:34:12 +0200 (CEST) Received: from webmail.leidinger.net (unknown [IPv6:fd73:10c7:2053:1::2:102]) by outgoing.leidinger.net (Postfix) with ESMTP id E51C611F6; Thu, 30 Sep 2010 16:34:09 +0200 (CEST) Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id o8UEY70w088149; Thu, 30 Sep 2010 16:34:07 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Thu, 30 Sep 2010 16:34:06 +0200 Message-ID: <20100930163406.330767vpzidjygow@webmail.leidinger.net> Date: Thu, 30 Sep 2010 16:34:06 +0200 From: Alexander Leidinger To: Torbjorn Kristoffersen References: <20100929192534.GA97031@icarus.home.lan> <20100929221549.GA343@icarus.home.lan> <20100930103647.62193lbkp9yqx5k4@webmail.leidinger.net> <4CA45444.6070002@dannysplace.net> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: quoted-printable User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.4) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: D94E884400A.A8ADB X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, 
SpamAssassin (not cached, score=1.351, required 6, autolearn=disabled, RDNS_NONE 1.27, TW_ZF 0.08) X-EBL-MailScanner-SpamScore: s X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1286462055.53009@R0kNRr5etFsbNPTjpmK0kw X-EBL-Spam-Status: No Cc: freebsd-fs@freebsd.org Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2010 14:34:19 -0000 Quoting Torbjorn Kristoffersen (from Thu, 30 Sep 2010 15:28:25 +0200): > That could very well be. Interestingly, dtrace is not installed and > doesn't even load. When I do > kldload dtraceall it says: > > kldload: can't load dtraceall: Exec format error > > Perhaps I should recompile the kernel on this server and build > DTrace into it. Perhaps I should first update to > FreeBSD-STABLE, as it is more cutting edge? > > Actually, I'll first do a complete backup of this jail, remove the zfs > filesystem, then re-create it, put the files back, and see what > happens. The unfortunate thing is that I will be ruining a chance to > find out what really happened. I would give lsof a try first. Installing it from ports or packages takes much less time than updating the server, and it may pinpoint the problem. Bye, Alexander. -- Things are more like they are today than they ever were before. -- Dwight D.
Eisenhower From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 14:39:08 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0F564106564A for ; Thu, 30 Sep 2010 14:39:08 +0000 (UTC) (envelope-from olli@lurza.secnetix.de) Received: from lurza.secnetix.de (lurza.secnetix.de [IPv6:2a01:170:102f::2]) by mx1.freebsd.org (Postfix) with ESMTP id 829E28FC17 for ; Thu, 30 Sep 2010 14:39:07 +0000 (UTC) Received: from lurza.secnetix.de (localhost [127.0.0.1]) by lurza.secnetix.de (8.14.3/8.14.3) with ESMTP id o8UEck7w019474; Thu, 30 Sep 2010 16:39:01 +0200 (CEST) (envelope-from oliver.fromme@secnetix.de) Received: (from olli@localhost) by lurza.secnetix.de (8.14.3/8.14.3/Submit) id o8UEckoY019473; Thu, 30 Sep 2010 16:38:46 +0200 (CEST) (envelope-from olli) Date: Thu, 30 Sep 2010 16:38:46 +0200 (CEST) Message-Id: <201009301438.o8UEckoY019473@lurza.secnetix.de> From: Oliver Fromme To: freebsd-fs@FreeBSD.ORG, fbsd@dannysplace.net, torbjoern@gmail.com In-Reply-To: <4CA45444.6070002@dannysplace.net> X-Newsgroups: list.freebsd-fs User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) (FreeBSD/6.4-PRERELEASE-20080904 (i386)) MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.3.5 (lurza.secnetix.de [127.0.0.1]); Thu, 30 Sep 2010 16:39:02 +0200 (CEST) Cc: Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: freebsd-fs@FreeBSD.ORG, fbsd@dannysplace.net, torbjoern@gmail.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2010 14:39:08
-0000 Danny Carroll wrote: > [...] > It certainly smells like a process still writing to a file that is unlinked. > I wonder if it would show up with lsof. If it's a file that was unlinked that is still held open by a process, then lsof will definitely list it. The command # lsof +L1 lists all open files with a link count of zero. You can restrict it to a certain file system like this: # lsof +aL1 /var Of course, lsof won't list the file name because the file doesn't have a name anymore. But it lists the process by name, PID and user, the file system and the file size. Best regards Oliver -- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. Commercial register: Registry Court of Munich, HRA 74606. Management: secnetix Verwaltungsgesellsch. mbH, commercial register: Registry Court of Munich, HRB 125758. Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart. FreeBSD services, products and more: http://www.secnetix.de/bsd "IRIX is about as stable as a one-legged drunk with hypothermia in a four-hundred mile per hour wind, balancing on a banana peel on a greased cookie sheet -- when someone throws him an elephant with bad breath and a worse temper."
-- Ralf Hildebrandt From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 14:48:47 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5A89F1065672 for ; Thu, 30 Sep 2010 14:48:47 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta02.emeryville.ca.mail.comcast.net (qmta02.emeryville.ca.mail.comcast.net [76.96.30.24]) by mx1.freebsd.org (Postfix) with ESMTP id 409F48FC0A for ; Thu, 30 Sep 2010 14:48:46 +0000 (UTC) Received: from omta11.emeryville.ca.mail.comcast.net ([76.96.30.36]) by qmta02.emeryville.ca.mail.comcast.net with comcast id CzPr1f0040mlR8UA22omu2; Thu, 30 Sep 2010 14:48:46 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta11.emeryville.ca.mail.comcast.net with comcast id D2ol1f00U3LrwQ28X2olLh; Thu, 30 Sep 2010 14:48:46 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 5A82E9B418; Thu, 30 Sep 2010 07:48:45 -0700 (PDT) Date: Thu, 30 Sep 2010 07:48:45 -0700 From: Jeremy Chadwick To: freebsd-fs@FreeBSD.ORG, fbsd@dannysplace.net, torbjoern@gmail.com Message-ID: <20100930144845.GA19926@icarus.home.lan> References: <4CA45444.6070002@dannysplace.net> <201009301438.o8UEckoY019473@lurza.secnetix.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201009301438.o8UEckoY019473@lurza.secnetix.de> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2010 14:48:47 -0000 On Thu, Sep 30, 2010 at 04:38:46PM +0200, Oliver Fromme wrote: > Danny Carroll wrote: > > [...] > > It certainly smells like a process still writing to a file that is unlinked. 
> > I wonder if it would show up with lsof. > > If it's a file that was unlinked that is still held open by > a process, then lsof will definitely list it. The command > > # lsof +L1 > > lists all open files with a link count of zero. You can > restrict it to a certain file system like this: > > # lsof +aL1 /var > > Of course, lsof won't list the file name because the file > doesn't have a name anymore. But it lists the process by > name, PID and user, the file system and the file size. Can someone explain how use of lsof in this regard is different than use of fstat(1) like I originally mentioned? Does lsof do something more thorough than, or different from, what fstat does? From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 15:48:02 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1847C106564A for ; Thu, 30 Sep 2010 15:48:02 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 60BB18FC13 for ; Thu, 30 Sep 2010 15:48:01 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA24598; Thu, 30 Sep 2010 18:47:56 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CA4B12B.7050307@icyb.net.ua> Date: Thu, 30 Sep 2010 18:47:55 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeremy Chadwick References: <4CA45444.6070002@dannysplace.net> <201009301438.o8UEckoY019473@lurza.secnetix.de> <20100930144845.GA19926@icarus.home.lan> In-Reply-To:
<20100930144845.GA19926@icarus.home.lan> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2010 15:48:02 -0000 on 30/09/2010 17:48 Jeremy Chadwick said the following: > On Thu, Sep 30, 2010 at 04:38:46PM +0200, Oliver Fromme wrote: >> Danny Carroll wrote: >> > [...] >> > It certainly smells like a process still writing to a file that is unlinked. >> > I wonder if it would show up with lsof. >> >> If it's a file that was unlinked that is still held open by >> a process, then lsof will definitely list it. The command >> >> # lsof +L1 >> >> lists all open files with a link count of zero. You can >> restrict it to a certain file system like this: >> >> # lsof +aL1 /var >> >> Of course, lsof won't list the file name because the file >> doesn't have a name anymore. But it lists the process by >> name, PID and user, the file system and the file size. > > Can someone explain how use of lsof in this regard is different than use > of fstat(1) like I originally mentioned? Does lsof do something more > thorough than, or different from, what fstat does? I believe that there is no reason to prefer lsof except for those who spent more time with Linux than with FreeBSD.
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 15:50:03 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9B1E31065672 for ; Thu, 30 Sep 2010 15:50:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 8AD938FC15 for ; Thu, 30 Sep 2010 15:50:03 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o8UFo3dp033266 for ; Thu, 30 Sep 2010 15:50:03 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o8UFo3Ua033265; Thu, 30 Sep 2010 15:50:03 GMT (envelope-from gnats) Date: Thu, 30 Sep 2010 15:50:03 GMT Message-Id: <201009301550.o8UFo3Ua033265@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Mark Atkinson Cc: Subject: Re: kern/133174: [msdosfs] [patch] msdosfs must support utf-encoded international characters in filr names X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Mark Atkinson List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2010 15:50:03 -0000 The following reply was made to PR kern/133174; it has been noted by GNATS. From: Mark Atkinson To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/133174: [msdosfs] [patch] msdosfs must support utf-encoded international characters in filr names Date: Thu, 30 Sep 2010 08:40:52 -0700 The current direct link to the patch is below. I hope to try this patch out soon, as this problem bothers me when I move MP3 files back and forth to my phone over USB and the filenames contain non-ASCII characters.
http://btload.googlegroups.com/web/msdosfs.patch?gda=6OJa5z8AAABTKdAk9D4djfQOfSDW4ZV9vKlhdfRkDKO3uYPnaA-gp-toi5oIt3BJMRGeqGBbbj-ccyFKn-rNKC-d1pM_IdV0 or via the google url shortener: http://goo.gl/CwRn From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 17:08:17 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 552A6106566B for ; Thu, 30 Sep 2010 17:08:17 +0000 (UTC) (envelope-from olli@lurza.secnetix.de) Received: from lurza.secnetix.de (lurza.secnetix.de [IPv6:2a01:170:102f::2]) by mx1.freebsd.org (Postfix) with ESMTP id C0E668FC19 for ; Thu, 30 Sep 2010 17:08:16 +0000 (UTC) Received: from lurza.secnetix.de (localhost [127.0.0.1]) by lurza.secnetix.de (8.14.3/8.14.3) with ESMTP id o8UH807Q026169; Thu, 30 Sep 2010 19:08:15 +0200 (CEST) (envelope-from oliver.fromme@secnetix.de) Received: (from olli@localhost) by lurza.secnetix.de (8.14.3/8.14.3/Submit) id o8UH7xAs026168; Thu, 30 Sep 2010 19:07:59 +0200 (CEST) (envelope-from olli) Date: Thu, 30 Sep 2010 19:07:59 +0200 (CEST) Message-Id: <201009301707.o8UH7xAs026168@lurza.secnetix.de> From: Oliver Fromme To: freebsd-fs@FreeBSD.ORG, Jeremy Chadwick , Andriy Gapon In-Reply-To: <4CA4B12B.7050307@icyb.net.ua> X-Newsgroups: list.freebsd-fs User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) (FreeBSD/6.4-PRERELEASE-20080904 (i386)) MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.3.5 (lurza.secnetix.de [127.0.0.1]); Thu, 30 Sep 2010 19:08:15 +0200 (CEST) Cc: Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: freebsd-fs@FreeBSD.ORG, Jeremy Chadwick , Andriy Gapon List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: 
Thu, 30 Sep 2010 17:08:17 -0000 Andriy Gapon wrote: > on 30/09/2010 17:48 Jeremy Chadwick said the following: > > Can someone explain how use of lsof in this regard is different than use > > of fstat(1) like I originally mentioned? Does lsof do something more > > thorough or differently that what fstat does? > > I believe that there is no reason to prefer lsof except for those who spent more > time with Linux than with FreeBSD. Last time I had a try at fstat(1), it wasn't able to print actual file names, while lsof was able to do it. That's why I generally prefer lsof over fstat(1). For most of my needs fstat(1) is useless if it can't display file names. (I think DragonFly's fstat(1) can do it, FWIW.) Of course, in this particular case it might be irrelevant because the files in question don't have names anymore. On the other hand, I'm not sure how to use fstat(1) to identify files with link count zero ... I'm looking at the manpage, but maybe it's just too late in the evening. What command line would you suggest, exactly? At least it doesn't seem to be as easy as "lsof +L1". Best regards Oliver -- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. Commercial register: Register court Munich, HRA 74606, Management: secnetix Verwaltungsgesellsch. mbH, Commercial register: Register court Munich, HRB 125758, Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart FreeBSD services, products and more: http://www.secnetix.de/bsd It's trivial to make fun of Microsoft products, but it takes a real man to make them work, and a God to make them do anything useful.
From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 17:08:55 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D5A0F1065675 for ; Thu, 30 Sep 2010 17:08:55 +0000 (UTC) (envelope-from torbjoern@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 630E48FC1A for ; Thu, 30 Sep 2010 17:08:54 +0000 (UTC) Received: by bwz15 with SMTP id 15so2045521bwz.13 for ; Thu, 30 Sep 2010 10:08:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:content-type :content-transfer-encoding; bh=Q9C6ztQhJlATbYUMqtzfJH68X3l3NO9+/ezEjA2o9UE=; b=T3mN+4uZQfCf4C9C5FwTYprdKCmQCjNBq0pEFnWjT+kRwCWmEwdCi2nwpUIFanq1c6 tCcSj4f0UApseUDMnPa4L8mSrhmt1v1ypWFu0lpbjHLZ+kVHRF5pX+ygNExRehCXgvjk 4j0nb7T/5+AKWZkOs8PwUurc3DKeVWjINPw7Y= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; b=clGNU903fGvzl0vxsIWw5Ifcb2BKQSYRl7S5HfWXYNmyHScY1d91n+5jegVr2G7Lpk syQmzNKKBRiWFHsyQZ+9HfVRUZFL98GvWAocZjSz8aC8xv1RXCvNEHJEIOjsAbAXnQcc kQ/RwBepbeDi5vcqk6n6szCKLcyr5hyucdpBU= MIME-Version: 1.0 Received: by 10.204.82.167 with SMTP id b39mr3002759bkl.164.1285866530661; Thu, 30 Sep 2010 10:08:50 -0700 (PDT) Received: by 10.204.71.138 with HTTP; Thu, 30 Sep 2010 10:08:50 -0700 (PDT) In-Reply-To: References: <4CA45444.6070002@dannysplace.net> <201009301438.o8UEckoY019473@lurza.secnetix.de> <20100930144845.GA19926@icarus.home.lan> <4CA4B12B.7050307@icyb.net.ua> Date: Thu, 30 Sep 2010 19:08:50 +0200 Message-ID: From: Torbjorn Kristoffersen To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Subject: Fwd: Strange 
ZFS problem, filesystem claims to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2010 17:08:56 -0000 On Thu, Sep 30, 2010 at 5:47 PM, Andriy Gapon wrote: > on 30/09/2010 17:48 Jeremy Chadwick said the following: >> On Thu, Sep 30, 2010 at 04:38:46PM +0200, Oliver Fromme wrote: >>> Danny Carroll wrote: >>> > [...] >>> > It certainly smells like a process still writing to a file that is unlinked. >>> > I wonder if it would show up with lsof. >>> >>> If it's a file that was unlinked that is still held open by >>> a process, then lsof will definitely list it. The command >>> >>> # lsof +L1 >>> >>> lists all open files with a link count of zero. You can >>> restrict it to a certain file system like this: >>> >>> # lsof +aL1 /var >>> >>> Of course, lsof won't list the file name because the file >>> doesn't have a name anymore. But it lists the process by >>> name, PID and user, the file system and the file size. >> >> Can someone explain how use of lsof in this regard is different than use >> of fstat(1) like I originally mentioned? Does lsof do something more >> thorough or differently that what fstat does? > > I believe that there is no reason to prefer lsof except for those who spent more > time with Linux than with FreeBSD. > I tried fstat earlier and now I tried lsof as suggested. Doing lsof +L1 only gave me:
COMMAND  PID  USER   FD   TYPE DEVICE SIZE/OFF NLINK   NODE NAME
mysqld  1030 mysql    4u  VREG   0,99        0     0 800965 / (/dev/mirror/root)
mysqld  1030 mysql    5u  VREG   0,99        0     0 800969 / (/dev/mirror/root)
mysqld  1030 mysql    6u  VREG   0,99        0     0 800970 / (/dev/mirror/root)
....
Basically, it only gives me mysqld which runs outside the jails. Nothing else was listed. I noticed that the filesystem has stopped growing now though, so that may also be the reason why lsof does not show anything anymore. The "du -sh /jails/rb.org" still gives a low usage value. Also, this is the output from df -h (I've since resized the ZFS quota to make the filesystem bigger for this jail):
tpool/rb.org      200G    111G     89G    56%    /jails/rb.org
If the process causing this is gone, or is working correctly (seeing that the fs is no longer growing, I hope), can dead unlinked files still remain, is there a way to purge them? From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 17:23:08 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F13751065695 for ; Thu, 30 Sep 2010 17:23:08 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 462238FC32 for ; Thu, 30 Sep 2010 17:23:07 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA25971; Thu, 30 Sep 2010 20:23:06 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CA4C77A.2030807@icyb.net.ua> Date: Thu, 30 Sep 2010 20:23:06 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.ORG, Jeremy Chadwick , Andriy Gapon , Oliver Fromme References: <201009301707.o8UH7xAs026168@lurza.secnetix.de> In-Reply-To: <201009301707.o8UH7xAs026168@lurza.secnetix.de> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2010 17:23:09 -0000 on 30/09/2010 20:07 Oliver Fromme said the following: > Last time I had a try at fstat(1), it wasn't able to print > actual file names, while lsof was able to do it. That's > why I generally prefer lsof over fstat(1). For most of my > needs fstat(1) is useless if it can't display file names. > (I think DragonFly's fstat(1) can do it, FWIW.) Point taken. However fstat still does print inode numbers. > Of course, in this particular case it might be irrelevant > because the files in questions don't have names anymore. Right. > On the other hand, I'm not sure how to use fstat(1) to > identify files with link count zero ... I'm looking at > the manpage, but maybe it's just too late in the evening. > What command line would you suggest, exactly? At least > it doesn't seem to be as easy as "lsof +L1". Well, I am a believer in the Unix way: each tool does its own small job, and you combine the tools to get a big job done. One tool that does everything with a million obscure options does not appeal to me. But that's me. And in this particular case what you ask is irrelevant: we just need to find all processes that have files open on a particular filesystem.
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 18:55:05 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2DFCE1065695 for ; Thu, 30 Sep 2010 18:55:05 +0000 (UTC) (envelope-from torbjoern@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 9E5AB8FC36 for ; Thu, 30 Sep 2010 18:55:04 +0000 (UTC) Received: by bwz15 with SMTP id 15so2166078bwz.13 for ; Thu, 30 Sep 2010 11:55:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:content-type :content-transfer-encoding; bh=Y6/wjsINFVgE26fthiBfoxxXxPmrPOlO26HFd6ie9/o=; b=xbXJ7YAPmd8MtY7qdKdp+4RX9K5GNLahFChupuEtASuAlUIPiCUEt/6hQkXQLVvR7V tJreIjwehCIRevZBKhxDgqgGU5Njn9+w3i+08LxRT+VC60JkeYbWnm06TS8iyDXeaMb8 OBG0mr/vGVqrfr+VumwpokpMiS3JSi3qhVf1A= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; b=sUqVae0Qh5Y3ODu1e20SFoAuUzUvkOhfDBEuRdWKhdhL8YlVbsJvvt1UPKhQosS/jb xND2IWWl8m6PSY4u3BY5h/X1NcI+1vLP4SIO3DFP6fGRpEhTaroqD7VCPlN0prxBlBix 3KawK+L/cRcaV16n9/infRjI/UyMPco0ZGHmc= MIME-Version: 1.0 Received: by 10.204.85.90 with SMTP id n26mr3155922bkl.109.1285872903016; Thu, 30 Sep 2010 11:55:03 -0700 (PDT) Received: by 10.204.71.138 with HTTP; Thu, 30 Sep 2010 11:55:02 -0700 (PDT) In-Reply-To: <4CA4C77A.2030807@icyb.net.ua> References: <201009301707.o8UH7xAs026168@lurza.secnetix.de> <4CA4C77A.2030807@icyb.net.ua> Date: Thu, 30 Sep 2010 20:55:02 +0200 Message-ID: From: Torbjorn Kristoffersen To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Subject: Re: Strange ZFS problem, filesystem claims 
to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2010 18:55:05 -0000 On Thu, Sep 30, 2010 at 7:23 PM, Andriy Gapon wrote: > on 30/09/2010 20:07 Oliver Fromme said the following: >> Last time I had a try at fstat(1), it wasn't able to print >> actual file names, while lsof was able to do it. That's >> why I generally prefer lsof over fstat(1). For most of my >> needs fstat(1) is useless if it can't display file names. >> (I think DragonFly's fstat(1) can do it, FWIW.) > > Point taken. > However fstat still does print inode numbers. Here's some news, I finally found a file in a user's .spamassassin directory.
$ ls -l .spamassassin/
total 39877936
-rw-------  1 gg  gg      76546048 Sep 30 01:13 auto-whitelist
-rw-------  1 gg  gg            48 Sep 30 01:51 bayes.lock
-rw-------  1 gg  gg      20840448 Sep 30 01:13 bayes_seen
----------  1 gg  gg  552902721536 Sep 30 01:52 temp
-rw-------  1 gg  gg          1573 Sep 30 07:51 user_prefs
Now that is an incredibly huge (and invalid) file! Something like 514GB, far more than the size of this ZFS filesystem. I removed it, and there was no visible effect in df. Some funny business must be happening with spamassassin though, otherwise this strange file would not be so huge. I then checked the entire filesystem for files that show up as very large in 'ls', but no freak-sized files came up.
From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 20:52:14 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D61D71065673; Thu, 30 Sep 2010 20:52:14 +0000 (UTC) (envelope-from arundel@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id AC7DA8FC0A; Thu, 30 Sep 2010 20:52:14 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o8UKqENw055347; Thu, 30 Sep 2010 20:52:14 GMT (envelope-from arundel@freefall.freebsd.org) Received: (from arundel@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o8UKqD9W055342; Thu, 30 Sep 2010 20:52:14 GMT (envelope-from arundel) Date: Thu, 30 Sep 2010 20:52:14 GMT Message-Id: <201009302052.o8UKqD9W055342@freefall.freebsd.org> To: postmaster@uni-bielefeld.de, arundel@FreeBSD.org, freebsd-fs@FreeBSD.org From: arundel@FreeBSD.org Cc: Subject: Re: kern/115645: [ffs] [snapshots] [panic] lockmgr: thread 0xc4c00d80, not exclusive lock holder 0xc4dd7c00 unlocking X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2010 20:52:14 -0000 Synopsis: [ffs] [snapshots] [panic] lockmgr: thread 0xc4c00d80, not exclusive lock holder 0xc4dd7c00 unlocking State-Changed-From-To: open->feedback State-Changed-By: arundel State-Changed-When: Thu Sep 30 20:50:12 UTC 2010 State-Changed-Why: Can you still reproduce this PR with a more recent 6.X or 7.X release? Please note that RELENG_6 went EoL a few weeks ago.
http://www.freebsd.org/cgi/query-pr.cgi?pr=115645 From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 22:15:26 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2712C106566B for ; Thu, 30 Sep 2010 22:15:26 +0000 (UTC) (envelope-from fbsd@dannysplace.net) Received: from mailgw.dannysplace.net (mailgw.dannysplace.net [204.109.56.184]) by mx1.freebsd.org (Postfix) with ESMTP id E626C8FC24 for ; Thu, 30 Sep 2010 22:15:25 +0000 (UTC) Received: from [203.206.171.212] (helo=[192.168.10.10]) by mailgw.dannysplace.net with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.72 (FreeBSD)) (envelope-from ) id 1P1RPs-0001EC-1m for freebsd-fs@freebsd.org; Fri, 01 Oct 2010 08:15:57 +1000 Message-ID: <4CA50BF1.60503@dannysplace.net> Date: Fri, 01 Oct 2010 08:15:13 +1000 From: Danny Carroll User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <4CA45444.6070002@dannysplace.net> <201009301438.o8UEckoY019473@lurza.secnetix.de> <20100930144845.GA19926@icarus.home.lan> <4CA4B12B.7050307@icyb.net.ua> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Authenticated-User: danny X-Authenticator: plain X-Exim-Version: 4.72 (build at 12-Jul-2010 18:31:29) X-Date: 2010-10-01 08:15:52 X-Connected-IP: 203.206.171.212:51791 X-Message-Linecount: 37 X-Body-Linecount: 23 X-Message-Size: 1795 X-Body-Size: 934 X-Received-Count: 1 X-Recipient-Count: 1 X-Local-Recipient-Count: 1 X-Local-Recipient-Defer-Count: 0 X-Local-Recipient-Fail-Count: 0 X-SA-Exim-Connect-IP: 203.206.171.212 X-SA-Exim-Mail-From: fbsd@dannysplace.net X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on damka.dannysplace.net X-Spam-Level: X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.3.1 X-SA-Exim-Version: 4.2 
X-SA-Exim-Scanned: Yes (on mailgw.dannysplace.net) Subject: Re: Fwd: Strange ZFS problem, filesystem claims to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: fbsd@dannysplace.net List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2010 22:15:26 -0000 On 1/10/2010 3:08 AM, Torbjorn Kristoffersen wrote: > > If the process causing this is gone, or is working correctly (seeing > that the fs is no longer growing, I hope), > can dead unlinked files still remain, is there a way to purge them? I can't remember exactly what happens, and it's probably different for each flavour of Unix and *nux. When a file is deleted, the directory entry pointing to its inode is removed. If that was the last link to the inode, the inode is usually freed. But when a process holds an open handle to a file at the time it's deleted, the space is not reclaimed, AFAIK, until *after* the file handle is closed. I wonder what happens if a file handle is open for writing and someone else comes along and truncates the file. Perhaps the seek position of the open handle is reset to 0, or perhaps (not likely) a write operation after the truncation would result in an error.
-D From owner-freebsd-fs@FreeBSD.ORG Thu Sep 30 23:31:11 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0E0A8106566B for ; Thu, 30 Sep 2010 23:31:11 +0000 (UTC) (envelope-from markus.gebert@hostpoint.ch) Received: from mail.adm.hostpoint.ch (mail.adm.hostpoint.ch [217.26.48.124]) by mx1.freebsd.org (Postfix) with ESMTP id BDD8B8FC0A for ; Thu, 30 Sep 2010 23:31:10 +0000 (UTC) Received: from 46-127-29-79.dclient.hispeed.ch ([46.127.29.79]:36814 helo=[172.16.1.3]) by mail.adm.hostpoint.ch with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.69 (FreeBSD)) (envelope-from ) id 1P1Saj-00074T-6Z; Fri, 01 Oct 2010 01:31:09 +0200 Mime-Version: 1.0 (Apple Message framework v1081) Content-Type: text/plain; charset=us-ascii From: Markus Gebert In-Reply-To: <4CA50BF1.60503@dannysplace.net> Date: Fri, 1 Oct 2010 01:31:08 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <8762A442-5027-48E4-B51F-73F29658CA2F@hostpoint.ch> References: <4CA45444.6070002@dannysplace.net> <201009301438.o8UEckoY019473@lurza.secnetix.de> <20100930144845.GA19926@icarus.home.lan> <4CA4B12B.7050307@icyb.net.ua> <4CA50BF1.60503@dannysplace.net> To: fbsd@dannysplace.net X-Mailer: Apple Mail (2.1081) Cc: freebsd-fs@freebsd.org Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2010 23:31:11 -0000 On 01.10.2010, at 00:15, Danny Carroll wrote: >> >> If the process causing this is gone, or is working correctly (seeing >> that the fs is no longer growing, I hope), >> can dead unlinked files still remain, is there a way to purge them? > > I can't remember exactly what happens and it's probably different for > each flavour of unix and *nux.
> If a file is deleted, then the directory entry for the inode is > de-linked. If it's the last link to that inode then usually that inode > is freed. > > But when a process holds a handle to a file when it's deleted, then the > reclaim does not happen AFAIK until *after* the file handle is closed. > > > I wonder what happens when, if a file handle is opened for writing, > someone else comes along and truncates the file. > Perhaps a the seek position of the open handle is reset to 0, or perhaps > (not likely) a write operation after truncation would result in an error. > AFAIK the file handle offset won't get reset to anything unless O_APPEND was used to open the file (maybe there are other special cases). In either case, the write will _not_ fail due to an offset beyond EOF, instead a hole is created and the new data gets written after that. (see man lseek(2)) The hole won't use disk space (as shown by df or zfs list), but is considered part of the file size (as shown by ls). In other words, truncating might free disk space, no matter what offsets other filehandles have. However I don't see the point here. If the OP knows the file, he may as well delete it to free disk space. If he doesn't, or it's inaccessible (deleted but referenced), truncating isn't an option.
Markus From owner-freebsd-fs@FreeBSD.ORG Fri Oct 1 00:49:03 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 838831065694; Fri, 1 Oct 2010 00:49:03 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 5A23B8FC15; Fri, 1 Oct 2010 00:49:03 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o910n3G6094328; Fri, 1 Oct 2010 00:49:03 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o910n38N094324; Fri, 1 Oct 2010 00:49:03 GMT (envelope-from linimon) Date: Fri, 1 Oct 2010 00:49:03 GMT Message-Id: <201010010049.o910n38N094324@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/151082: [zfs] [patch] sappend-flaged files on ZFS not working correctly X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2010 00:49:03 -0000 Old Synopsis: [patch] sappend-flaged files on ZFS not working correctly New Synopsis: [zfs] [patch] sappend-flaged files on ZFS not working correctly Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Fri Oct 1 00:48:32 UTC 2010 Responsible-Changed-Why: Over to maintainer(s). 
http://www.freebsd.org/cgi/query-pr.cgi?pr=151082 From owner-freebsd-fs@FreeBSD.ORG Fri Oct 1 00:55:05 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A72091065679 for ; Fri, 1 Oct 2010 00:55:05 +0000 (UTC) (envelope-from fbsd@dannysplace.net) Received: from mailgw.dannysplace.net (mailgw.dannysplace.net [204.109.56.184]) by mx1.freebsd.org (Postfix) with ESMTP id 6C5508FC13 for ; Fri, 1 Oct 2010 00:55:05 +0000 (UTC) Received: from [203.206.171.212] (helo=[192.168.10.10]) by mailgw.dannysplace.net with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.72 (FreeBSD)) (envelope-from ) id 1P1TuQ-00064T-Gp; Fri, 01 Oct 2010 10:55:36 +1000 Message-ID: <4CA5315F.70105@dannysplace.net> Date: Fri, 01 Oct 2010 10:54:55 +1000 From: Danny Carroll User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2.9) Gecko/20100915 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Markus Gebert References: <4CA45444.6070002@dannysplace.net> <201009301438.o8UEckoY019473@lurza.secnetix.de> <20100930144845.GA19926@icarus.home.lan> <4CA4B12B.7050307@icyb.net.ua> <4CA50BF1.60503@dannysplace.net> <8762A442-5027-48E4-B51F-73F29658CA2F@hostpoint.ch> In-Reply-To: <8762A442-5027-48E4-B51F-73F29658CA2F@hostpoint.ch> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Authenticated-User: danny X-Authenticator: plain X-Exim-Version: 4.72 (build at 12-Jul-2010 18:31:29) X-Date: 2010-10-01 10:55:34 X-Connected-IP: 203.206.171.212:65454 X-Message-Linecount: 22 X-Body-Linecount: 7 X-Message-Size: 1336 X-Body-Size: 359 X-Received-Count: 1 X-Recipient-Count: 2 X-Local-Recipient-Count: 2 X-Local-Recipient-Defer-Count: 0 X-Local-Recipient-Fail-Count: 0 X-SA-Exim-Connect-IP: 203.206.171.212 X-SA-Exim-Mail-From: fbsd@dannysplace.net X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on damka.dannysplace.net X-Spam-Level: X-Spam-Status: No, 
score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.3.1 X-SA-Exim-Version: 4.2 X-SA-Exim-Scanned: Yes (on mailgw.dannysplace.net) Cc: freebsd-fs@freebsd.org Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: fbsd@dannysplace.net List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2010 00:55:05 -0000 On 1/10/2010 9:31 AM, Markus Gebert wrote: > However I don't see the point here. If the OP knows the file, he may as well delete it to free disk space. If he doesn't, or it's inaccessible (deleted but referenced), truncating isn't an option. Yeah. I was just thinking about what might happen in certain situations. Definitely not relevant to the OP. -D From owner-freebsd-fs@FreeBSD.ORG Fri Oct 1 04:25:42 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 63A051065673 for ; Fri, 1 Oct 2010 04:25:42 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1]) by mx1.freebsd.org (Postfix) with ESMTP id 315D78FC13 for ; Fri, 1 Oct 2010 04:25:42 +0000 (UTC) Received: from gjp by noop.in-addr.com with local (Exim 4.54 (FreeBSD)) id 1P1XBk-0009QM-CQ; Fri, 01 Oct 2010 00:25:40 -0400 Date: Fri, 1 Oct 2010 00:25:40 -0400 From: Gary Palmer To: Alexander Leidinger Message-ID: <20101001042540.GA48601@in-addr.com> References: <20100929192534.GA97031@icarus.home.lan> <20100929221549.GA343@icarus.home.lan> <20100930103647.62193lbkp9yqx5k4@webmail.leidinger.net> <4CA45444.6070002@dannysplace.net> <20100930163406.330767vpzidjygow@webmail.leidinger.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: 
<20100930163406.330767vpzidjygow@webmail.leidinger.net> Cc: freebsd-fs@freebsd.org Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2010 04:25:42 -0000 On Thu, Sep 30, 2010 at 04:34:06PM +0200, Alexander Leidinger wrote: > Quoting Torbjorn Kristoffersen (from Thu, 30 Sep > 2010 15:28:25 +0200): > > >That could very well be. Interestingly, dtrace is not installed and > >doesn't even load. When I do > >kldload dtraceall it says: > > > > kldload: can't load dtraceall: Exec format error > > > >Perhaps I should recompile the kernel on this server, and build in > >Dtrace into the kernel. Perhaps I should first update to > >FreeBSD-STABLE, as it is more cutting edge? > > > >Actually, I'll first do a complete backup of this jail, remove the zfs > >filesystem, then re-create it, put the files back, and see what > >happens. The unfortunate thing is that I will be ruining a chance to > >find out what really happened. > > I would give lsof a try first. Installing it from ports or packages is > not as much time consuming as updating the server, and may pinpoint > the problem. > > Bye, > Alexander. It might be worth running ktrace -C as root. I do not believe ktrace output files show up in lsof or fstat. It seems at least theoretically possible that a ktrace output file has been deleted so it no longer shows up in ls/du but the trace is ongoing.
Regards, Gary From owner-freebsd-fs@FreeBSD.ORG Fri Oct 1 07:03:54 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5D7A71065670 for ; Fri, 1 Oct 2010 07:03:54 +0000 (UTC) (envelope-from olli@lurza.secnetix.de) Received: from lurza.secnetix.de (lurza.secnetix.de [IPv6:2a01:170:102f::2]) by mx1.freebsd.org (Postfix) with ESMTP id D154A8FC13 for ; Fri, 1 Oct 2010 07:03:53 +0000 (UTC) Received: from lurza.secnetix.de (localhost [127.0.0.1]) by lurza.secnetix.de (8.14.3/8.14.3) with ESMTP id o9173blT060480; Fri, 1 Oct 2010 09:03:52 +0200 (CEST) (envelope-from oliver.fromme@secnetix.de) Received: (from olli@localhost) by lurza.secnetix.de (8.14.3/8.14.3/Submit) id o9173bgq060479; Fri, 1 Oct 2010 09:03:37 +0200 (CEST) (envelope-from olli) Date: Fri, 1 Oct 2010 09:03:37 +0200 (CEST) Message-Id: <201010010703.o9173bgq060479@lurza.secnetix.de> From: Oliver Fromme To: freebsd-fs@FreeBSD.ORG, torbjoern@gmail.com In-Reply-To: X-Newsgroups: list.freebsd-fs User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) (FreeBSD/6.4-PRERELEASE-20080904 (i386)) MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.3.5 (lurza.secnetix.de [127.0.0.1]); Fri, 01 Oct 2010 09:03:52 +0200 (CEST) Cc: Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: freebsd-fs@FreeBSD.ORG, torbjoern@gmail.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2010 07:03:54 -0000 Torbjorn Kristoffersen wrote: > Here's some news, I finally found a file in a user's .spamassassin directory. 
>
> $ ls -l .spamassassin/
> total 39877936
> -rw-------  1 gg  gg      76546048 Sep 30 01:13 auto-whitelist
> -rw-------  1 gg  gg            48 Sep 30 01:51 bayes.lock
> -rw-------  1 gg  gg      20840448 Sep 30 01:13 bayes_seen
> ----------  1 gg  gg  552902721536 Sep 30 01:52 temp
> -rw-------  1 gg  gg          1573 Sep 30 07:51 user_prefs
>
>
> Now that is an incredibly huge (and invalid) file! Something like > 514GB, far more than the size of this ZFS filesystem. > I removed it, and there was no visible effect in df. Some funny > business must be happening with spamassassin though, > otherwise this strange file would not be so huge. Probably a so-called sparse file, i.e. a file with "holes" that don't actually occupy disk space. "ls -ls" will print the number of blocks actually allocated to the file on disk. I once wrote a script that calculates the "sparseness" of files. It's designed for UFS/UFS2. The output will be inaccurate for ZFS, but it should still give a rough number. http://www.secnetix.de/olli/scripts/sparsecheck Best regards Oliver "I invented Ctrl-Alt-Delete, but Bill Gates made it famous."
-- David Bradley, original IBM PC design team From owner-freebsd-fs@FreeBSD.ORG Fri Oct 1 07:41:14 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 355CF106564A for ; Fri, 1 Oct 2010 07:41:14 +0000 (UTC) (envelope-from torbjoern@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id B062A8FC08 for ; Fri, 1 Oct 2010 07:41:13 +0000 (UTC) Received: by bwz15 with SMTP id 15so2640939bwz.13 for ; Fri, 01 Oct 2010 00:41:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:content-type :content-transfer-encoding; bh=X1kVRmhfrQMJ8KswUQoqdhcXu7cM4wnwl8DOTdyub8E=; b=lZtVhXuuz3VaTS794CkwxCriJwHMcCkMbNPJvYrKL8hmqF/Q23il7RjVR1ohyHoV5t aOcEzztHxASJ4u5QmaFIbQwqvxP8DPYWiI0MU+Wv1mzkoHVWbjrLEncRN9Dktem+Zh31 vWBFgewU42bO5PtjaR4py0NvElFIPa8MDMxyA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; b=D+lAiC1yjQsADksCVv/EvuAaYjw8Pwb6IljB535XOzv7MQuPnR8HXFMe8Nsvivcz/T HUbHHR6EsEA/VOA40ZMeQ3IhqIZBnCHbFTMAmhKr/p1RASUBImMUmeHl122M8DKHFLPd I85xoTITBqS87AA09o8DwO3U0/EoqCy0OuqUg= MIME-Version: 1.0 Received: by 10.204.117.13 with SMTP id o13mr3722797bkq.48.1285918872481; Fri, 01 Oct 2010 00:41:12 -0700 (PDT) Received: by 10.204.71.138 with HTTP; Fri, 1 Oct 2010 00:41:12 -0700 (PDT) In-Reply-To: <20101001042540.GA48601@in-addr.com> References: <20100929192534.GA97031@icarus.home.lan> <20100929221549.GA343@icarus.home.lan> <20100930103647.62193lbkp9yqx5k4@webmail.leidinger.net> <4CA45444.6070002@dannysplace.net> <20100930163406.330767vpzidjygow@webmail.leidinger.net> <20101001042540.GA48601@in-addr.com> Date: Fri, 1 Oct 2010 
09:41:12 +0200 Message-ID: From: Torbjorn Kristoffersen To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Subject: Re: Strange ZFS problem, filesystem claims to be full when clearly not full X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2010 07:41:14 -0000 On Fri, Oct 1, 2010 at 6:25 AM, Gary Palmer wrote: > On Thu, Sep 30, 2010 at 04:34:06PM +0200, Alexander Leidinger wrote: >> Quoting Torbjorn Kristoffersen (from Thu, 30 Sep >> 2010 15:28:25 +0200): >> >> >That could very well be. Interestingly, dtrace is not installed and >> >doesn't even load. When I do >> >kldload dtraceall it says: >> > >> >    kldload: can't load dtraceall: Exec format error >> > >> >Perhaps I should recompile the kernel on this server, and build in >> >Dtrace into the kernel. Perhaps I should first update to >> >FreeBSD-STABLE, as it is more cutting edge? >> > >> >Actually, I'll first do a complete backup of this jail, remove the zfs >> >filesystem, then re-create it, put the files back, and see what >> >happens. The unfortunate thing is that I will be ruining a chance to >> >find out what really happened. >> >> I would give lsof a try first. Installing it from ports or packages is >> not as much time consuming as updating the server, and may pinpoint >> the problem. >> >> Bye, >> Alexander. > > It might be worth running > > ktrace -C > > as root. I do not believe ktrace output files show up in lsof or fstat. > It seems at least theoretically possible that a ktrace output file has > been deleted so it no longer shows up in ls/du but the trace is ongoing. > Very unlikely as I'm the only admin outside the jails. 
But I did it regardless, you never know if I started a ktrace in my sleep :-) From owner-freebsd-fs@FreeBSD.ORG Fri Oct 1 16:42:09 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C31831065674 for ; Fri, 1 Oct 2010 16:42:09 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id EB4AE8FC19 for ; Fri, 1 Oct 2010 16:42:08 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA15026; Fri, 01 Oct 2010 19:42:06 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4CA60F5D.2070308@icyb.net.ua> Date: Fri, 01 Oct 2010 19:42:05 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100920 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Kostik Belousov References: <4CA1D06C.9050305@digiware.nl> <20100928115047.GA62142__15392.0458550148$1285675457$gmane$org@icarus.home.lan> <4CA1DDE9.8090107@icyb.net.ua> <20100928132355.GA63149@icarus.home.lan> <4CA1EF69.4040402@icyb.net.ua> <4CA21809.7090504@icyb.net.ua> <71D54408-4B97-4F7A-BD83-692D8D23461A@wanderview.com> <4CA22337.2010900@icyb.net.ua> <20100928181327.GS43070@deviant.kiev.zoral.com.ua> In-Reply-To: <20100928181327.GS43070@deviant.kiev.zoral.com.ua> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=KOI8-U Content-Transfer-Encoding: 7bit Cc: fs@freebsd.org Subject: Re: Still getting kmem exhausted panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2010 16:42:09 -0000 on 28/09/2010 21:13 Kostik Belousov said the following: > On Tue, Sep 28, 2010 at 08:17:43PM +0300, Andriy Gapon 
wrote: >> ARC is a ZFS private cache. >> ZFS doesn't use unified buffer/page cache. >> So ARC is not directly affected by pagedaemon. >> But this is not exactly VFS layer thing. > As a pure speculation, unbacked by any code reasing or understanding > of the principles. Can ARC be changed to use some custom vm pager > instead of managing memory on its own. As I understand it, ARC > uses wired kernel mappings right now. Yes, ARC uses malloc(9) and/or uma(9). > If it starts using managed pages backed by a new pager, then pagedaemon > might take actual decisions on the cache shrink by putting and reclaiming > pages. Does ARC has some `active' count for the caching unit ? It might be > translated to the active count for the page etc. Well, not sure if I'd like pagedaemon directly poking ARC state. ARC seems to be a little bit more sophisticated than pagedaemon. One could argue that ARC is a distinctive and important feature of ZFS. My understanding is that ARC buffer state is determined by a "group" to which it belongs (MRU, MFU, ghost variations of those) and last access time. With each new access a buffer can be moved to a different group. And when a buffer replacement is needed, then that state plays a role in deciding which buffers to re-use. Ditto when ARC size needs to be reduced. Also worth noting that not always data in ARC has an associated vnode, at least that's my understanding. I had another crazy idea that is opposite to yours. ARC keeps using wired pages, but there is a special pager of some kind and ARC's pages get inserted into vnode's object when needed. pagedaemon wouldn't manage those pages as it does with normal ones, but instead would "hint" to ARC what is active and what is not and then ARC would do its "smart thing". But I think in this case we would need some way to steal pages from object when ARC decides that it really needs to shed some pages. 
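The list movements described above (MRU, MFU and their ghost variants, with a buffer promoted on repeated access) can be modelled as a small state machine. This is a hedged sketch based on the ARC algorithm as published by Megiddo and Modha, not on ZFS's arc.c, which has further states (for example anonymous and L2ARC-only buffers) and timing rules. The names `arc_list`, `arc_model`, `arc_on_access()`, `arc_on_evict()` and `arc_should_evict_mru()` are hypothetical, and the one-step adjustment of the MRU target (the paper's "p") is a simplification.

```c
#include <assert.h>

/* Which list a cached buffer (or its ghost entry) currently sits on. */
enum arc_list { ARC_NONE, ARC_MRU, ARC_MFU, ARC_MRU_GHOST, ARC_MFU_GHOST };

struct arc_model {
    unsigned target_mru; /* "p": adaptive target size of the MRU list */
    unsigned capacity;   /* "c": total cache size, in buffers */
};

/* List a buffer moves to when it is accessed while on `cur`.
 * Ghost hits also shift the MRU target, which is what makes ARC adaptive. */
static enum arc_list arc_on_access(struct arc_model *m, enum arc_list cur)
{
    assert(m->target_mru <= m->capacity);
    switch (cur) {
    case ARC_NONE:
        return ARC_MRU;          /* first access: recently used */
    case ARC_MRU:
    case ARC_MFU:
        return ARC_MFU;          /* repeated access: frequently used */
    case ARC_MRU_GHOST:
        if (m->target_mru < m->capacity)
            m->target_mru++;     /* MRU was evicted too early: grow its share */
        return ARC_MFU;
    case ARC_MFU_GHOST:
        if (m->target_mru > 0)
            m->target_mru--;     /* MFU was evicted too early: shrink MRU share */
        return ARC_MFU;
    }
    return cur;
}

/* On eviction the data is dropped but the identity stays on a ghost list;
 * ghost entries themselves simply fall out of the directory. */
static enum arc_list arc_on_evict(enum arc_list cur)
{
    switch (cur) {
    case ARC_MRU: return ARC_MRU_GHOST;
    case ARC_MFU: return ARC_MFU_GHOST;
    default:      return ARC_NONE;
    }
}

/* Simplified replacement decision: take a victim from MRU only while
 * it holds more buffers than its adaptive target. */
static int arc_should_evict_mru(const struct arc_model *m, unsigned mru_len)
{
    return mru_len > m->target_mru;
}
```

The point of the ghost lists is visible in the sketch: a hit on an already-evicted buffer is the signal that decides which list should shrink next, which is the kind of decision the pagedaemon has no information to make on ARC's behalf.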
This wouldn't solve a problem of double-management of memory, but at least it could try to solve a problem of double-caching. On the other hand, perhaps some people would find useful ZFS without ARC but with integration into VM, if that variant retains most of features of ZFS and provides some benefits in terms of resource usage and/or performance. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sat Oct 2 11:50:51 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C71101065670 for ; Sat, 2 Oct 2010 11:50:51 +0000 (UTC) (envelope-from phanquochien@gmail.com) Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id 5F7568FC14 for ; Sat, 2 Oct 2010 11:50:50 +0000 (UTC) Received: by wwb17 with SMTP id 17so5047175wwb.31 for ; Sat, 02 Oct 2010 04:50:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:date:message-id :subject:from:to:content-type; bh=C4m5CcyKTR8XaMNdlynb9jertKH+g9OsrDeKRPVeRwo=; b=Z6GuUHBg9mQADzlj/Zg2nmkM14xgy0kYwEF48jsN/CqOauIn/xMuuXmAkFSiojU5p3 mjyMvlfK/a0hhu1L1ZY+YKaPixw1tT2q2LmEpFDS3QKLOXZR1ZNBdNP/5qij1bmmZ9Z1 P3ClXPosFzTQzblYXFoYDpl2PgG8mK3Npcjgc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; b=j+SL6QOEXqQFEYy2vy2/5p0EVcosTC4+s712YZenVcklfvaYYOYFH9RcoKT8FmQm7Z FPMgtBejuI61Tg+Dv1fmU1EMva+52vdOvcl+zWSv4cg4NRNySK1L39IDJfXs0YgSWKIv /4v814ZWChnJBaJ5tpxQCh1ScNqP5w27M2qDo= MIME-Version: 1.0 Received: by 10.227.134.144 with SMTP id j16mr5843309wbt.50.1286018939947; Sat, 02 Oct 2010 04:28:59 -0700 (PDT) Received: by 10.227.142.194 with HTTP; Sat, 2 Oct 2010 04:28:59 -0700 (PDT) Date: Sat, 2 Oct 2010 18:28:59 +0700 Message-ID: From: Phan Quoc Hien To: freebsd-fs@freebsd.org Content-Type: text/plain; 
charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Data loss when hard shutdown! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2010 11:50:51 -0000 hello everybody. I'm new to freebsd, When I hard shutdown my freebsd box..it caused lost some file. I used UFS2. How can prevent that? or recovery my file? Thanks! -- Best regards, Mr.Hien E-mail: phanquochien@gmail.com Website: www.mrhien.info From owner-freebsd-fs@FreeBSD.ORG Sat Oct 2 12:15:12 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 14BC3106564A for ; Sat, 2 Oct 2010 12:15:12 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from smtp-out3.tiscali.nl (smtp-out3.tiscali.nl [195.241.79.178]) by mx1.freebsd.org (Postfix) with ESMTP id C9DF28FC14 for ; Sat, 2 Oct 2010 12:15:11 +0000 (UTC) Received: from [212.123.145.58] (helo=sjakie.klop.ws) by smtp-out3.tiscali.nl with esmtp (Exim) (envelope-from ) id 1P20no-0006rF-TI for freebsd-fs@freebsd.org; Sat, 02 Oct 2010 14:02:56 +0200 Received: from 212-123-145-58.ip.telfort.nl (localhost [127.0.0.1]) by sjakie.klop.ws (Postfix) with ESMTP id BE8D1424F for ; Sat, 2 Oct 2010 14:02:54 +0200 (CEST) Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs@freebsd.org References: Date: Sat, 02 Oct 2010 14:02:54 +0200 MIME-Version: 1.0 From: "Ronald Klop" Message-ID: In-Reply-To: User-Agent: Opera Mail/10.62 (FreeBSD) Content-Transfer-Encoding: quoted-printable Subject: Re: Data loss when hard shutdown! 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2010 12:15:12 -0000 On Sat, 02 Oct 2010 13:28:59 +0200, Phan Quoc Hien wrote: > hello everybody. > I'm new to freebsd, When I hard shutdown my freebsd box..it caused lost > some > file. I used UFS2. How can prevent that? or recovery my file? > Thanks! > By hard shutdown you mean pulling the power plug? UFS2 (and most other filesystems on other operating systems) guarantees consistency of metadata (filenames, directory structures, etc.) after a crash. However it is possible to lose the last X seconds of unwritten data. That can be the complete contents of a new file. If it is really important you can mount your filesystem 'sync' (see 'man mount'), in which case it will become slow, but more up-to-date. Ronald. From owner-freebsd-fs@FreeBSD.ORG Sat Oct 2 12:20:06 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7F6891065674 for ; Sat, 2 Oct 2010 12:20:06 +0000 (UTC) (envelope-from phanquochien@gmail.com) Received: from mail-ww0-f42.google.com (mail-ww0-f42.google.com [74.125.82.42]) by mx1.freebsd.org (Postfix) with ESMTP id 3E2728FC08 for ; Sat, 2 Oct 2010 12:20:00 +0000 (UTC) Received: by wwi18 with SMTP id 18so64542wwi.1 for ; Sat, 02 Oct 2010 05:19:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=xrXqo/x4RIwnxLrxzvMxRnWP/uQFyIGPSj972XIgB3I=; b=YPWAiM74Wp6mTiZ7Op++r1IV9pTDmnXqRWFi9bnC5QaBWT6PWvOj63sqHAyoyWJoXL 5G8mmH0Mv9O/Hq9p9ZMRpl622rULPfPSlw1OiIJviTX7xteYNNUz8NBH90EuPb8K957q 4s9DpMWeXOu6qu2xsGztpbrTFUd2TeH79nCwE= DomainKey-Signature: a=rsa-sha1; 
c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=m/WgWQgtTPVmLOidFS1Zdmh8jdZzvW6FNauVrU3cjK2T1e2q0U94t2v2fgNnZRDfkx u+8ZmNTIH32D5xE58jXqlFh/oIlR0AhNYiuJDGy+ch1YBlzXiEU0DsykO4msMZnU14wv 3UhcnLj103JjCli+G1vFZLjfhB+DV4k3vnNOk= MIME-Version: 1.0 Received: by 10.227.137.15 with SMTP id u15mr5885391wbt.129.1286021952810; Sat, 02 Oct 2010 05:19:12 -0700 (PDT) Received: by 10.227.142.194 with HTTP; Sat, 2 Oct 2010 05:19:12 -0700 (PDT) In-Reply-To: References: Date: Sat, 2 Oct 2010 19:19:12 +0700 Message-ID: From: Phan Quoc Hien To: Ronald Klop Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: Data loss when hard shutdown! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2010 12:20:06 -0000 Thanks for your respond.! Yes. I pulled the power plug . I edited rc.conf and save it then pulling the power plug. And system boot next time rc.conf is a blank file...! On Sat, Oct 2, 2010 at 7:02 PM, Ronald Klop wrote: > On Sat, 02 Oct 2010 13:28:59 +0200, Phan Quoc Hien > wrote: > > hello everybody. >> I'm new to freebsd, When I hard shutdown my freebsd box..it caused lost >> some >> file. I used UFS2. How can prevent that? or recovery my file? >> Thanks! >> >> > By hard shutdown you mean pulling the power plug? > > UFS2 (and most other filesystems on other operating systems) guarantee > consistency of metadata (filenames, directory structures, etc.) after a > crash. However it is possible to loose the last X seconds of unwritten data. > That can be the complete contents of a new file. > > If it is really important you can mount your filesystem 'sync' see 'man > mount' in which case it will become slow, but more up-to-date. > > Ronald. 
> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > -- Best regards, Mr.Hien E-mail: phanquochien@gmail.com Website: www.mrhien.info From owner-freebsd-fs@FreeBSD.ORG Sat Oct 2 12:20:58 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BDFF21065673 for ; Sat, 2 Oct 2010 12:20:58 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 4EDC58FC14 for ; Sat, 2 Oct 2010 12:20:57 +0000 (UTC) Received: by fxm9 with SMTP id 9so3272503fxm.13 for ; Sat, 02 Oct 2010 05:20:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:from:to:subject:date :message-id:user-agent:mime-version:content-type; bh=Rv1UxpXhIk1+YoVWCmSgiyy9MRK4ZWbyawBKD4LvuZI=; b=tl2rbxgHDdd/24IxeYDJ4/Bku5tPPxca00PwJ1iahS9Bpxg7cTZ93BFIKKnqgtGEGi DQGKczEr8tf3jz7c/Vpqnmwdeb4OhsfCHXWsDJ2iRnpEUYv0i9DAXMR3rKfLRu+sYXqb A0RMIlnZL/zOcHBcIEEQtcNYvL/nhNXBuJp2U= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:to:subject:date:message-id:user-agent:mime-version :content-type; b=WINb/0ytegrnguDI7dQwlEiK+F2C5SUX1kpuhpF8pTzx3ZwmQERQGWkwqcwY1CYOoO 6ZA/pipsf76UUi6S9HVbv0/vRRrqlgwzqEMkoQpGI+HR9xcnVKkeWA3fqWrf8rxpwmUa qd2Y1tecVaNyWDkAT/FJmRDk47cSbhWTPZfX0= Received: by 10.223.125.70 with SMTP id x6mr6497403far.85.1286022056865; Sat, 02 Oct 2010 05:20:56 -0700 (PDT) Received: from localhost ([95.69.162.97]) by mx.google.com with ESMTPS id a6sm1165582faa.20.2010.10.02.05.20.55 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sat, 02 Oct 2010 05:20:56 -0700 (PDT) From: Mikolaj Golub To: freebsd-fs@freebsd.org Date: Sat, 02 Oct 2010 15:20:58 +0300 
Message-ID: <86hbh44wgl.fsf@kopusha.home.net> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Subject: hastd: assertion (res->hr_event != NULL) fails in secondary on split-brain X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2010 12:20:58 -0000 --=-=-= Hi, After recent changes in hastd (I think r213006: Fix descriptor leaks) if split-brain occurs hastd will abort in child_cleanup() on assertion (res->hr_event != NULL). Oct 2 17:24:17 lolek hastd[39334]: [storage] (init) Role changed to secondary. Oct 2 17:24:17 lolek hastd[39334]: Accepting connection to tcp4://0.0.0.0:8457. Oct 2 17:24:17 lolek hastd[39334]: Connection from tcp4://172.20.68.12:17367 to tcp4://172.20.68.11:8457. Oct 2 17:24:17 lolek hastd[39334]: tcp4://172.20.68.12:17367: resource=storage Oct 2 17:24:17 lolek hastd[39334]: [storage] (secondary) Initial connection from tcp4://172.20.68.12:17367. Oct 2 17:24:17 lolek hastd[39334]: [storage] (secondary) Incoming connection from tcp4://172.20.68.12:17367 configured. Oct 2 17:24:17 lolek hastd[39334]: Accepting connection to tcp4://0.0.0.0:8457. Oct 2 17:24:17 lolek hastd[39334]: Connection from tcp4://172.20.68.12:13769 to tcp4://172.20.68.11:8457. Oct 2 17:24:17 lolek hastd[39334]: tcp4://172.20.68.12:13769: resource=storage Oct 2 17:24:17 lolek hastd[39334]: [storage] (secondary) Outgoing connection to tcp4://172.20.68.12:13769 configured. Oct 2 17:24:17 lolek hastd[39339]: [storage] (secondary) Obtained info about /dev/ad4. Oct 2 17:24:17 lolek hastd[39339]: [storage] (secondary) Locked /dev/ad4. Oct 2 17:24:17 lolek hastd[39339]: [storage] (secondary) Split-brain detected, exiting. Oct 2 17:24:17 lolek hastd[39334]: Unable to receive event header: Socket is not connected. 
Oct 2 17:24:28 lolek hastd[39334]: Accepting connection to tcp4://0.0.0.0:8457. Oct 2 17:24:28 lolek hastd[39334]: Connection from tcp4://172.20.68.12:59760 to tcp4://172.20.68.11:8457. Oct 2 17:24:28 lolek hastd[39334]: tcp4://172.20.68.12:59760: resource=storage Oct 2 17:24:28 lolek hastd[39334]: [storage] (secondary) Initial connection from tcp4://172.20.68.12:59760. Oct 2 17:24:28 lolek hastd[39334]: [storage] (secondary) Worker process exists (pid=39339), stopping it. Oct 2 17:24:28 lolek hastd[39334]: [storage] (secondary) Worker process exited ungracefully (pid=39339, exitcode=78). Oct 2 17:24:28 lolek kernel: pid 39334 (hastd), uid 0: exited on signal 6 (core dumped) (gdb) bt #0 0x28348d87 in kill () from /lib/libc.so.7 #1 0x280e1017 in raise () from /lib/libthr.so.3 #2 0x2834787a in abort () from /lib/libc.so.7 #3 0x2832fc86 in __assert () from /lib/libc.so.7 #4 0x0805f300 in proto_close (conn=0x0) at /usr/src/sbin/hastd/proto.c:287 #5 0x0804c445 in child_cleanup (res=0x284eb500) at /usr/src/sbin/hastd/control.c:61 #6 0x0804fc6d in listen_accept () at /usr/src/sbin/hastd/hastd.c:526 #7 0x0805059a in main_loop () at /usr/src/sbin/hastd/hastd.c:673 #8 0x08050a7f in main (argc=0, argv=0xbfbfed80) at /usr/src/sbin/hastd/hastd.c:784 (gdb) fr 5 #5 0x0804c445 in child_cleanup (res=0x284eb500) at /usr/src/sbin/hastd/control.c:61 61 proto_close(res->hr_event); (gdb) list 56 child_cleanup(struct hast_resource *res) 57 { 58 59 proto_close(res->hr_ctrl); 60 res->hr_ctrl = NULL; 61 proto_close(res->hr_event); 62 res->hr_event = NULL; 63 res->hr_workerpid = 0; 64 } 65 So we have double close of res->hr_event. The first time it is closed when parent detects that worker exited in main_loop(), and the second time when a new connection from primary comes and the parent does cleanup after previously terminated child before starting new one. The straightforward fix is to check res->hr_event before closing, like in the patch below. 
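The failure in the backtrace (proto_close() asserting on a NULL connection when child_cleanup() runs a second time) can be reproduced in miniature. The sketch below is hypothetical: `struct conn`, `conn_close()` and `conn_close_once()` stand in for hastd's connection object and proto_close() and are not hastd's API. `conn_close_once()` applies the same NULL check proposed for child_cleanup() and then clears the pointer, so a repeated cleanup becomes a no-op instead of an abort.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for hastd's connection object. */
struct conn {
    int fd;
};

static int close_count; /* number of real closes, for illustration only */

/* Like proto_close() in the backtrace: a NULL connection is a bug. */
static void conn_close(struct conn *c)
{
    assert(c != NULL);
    close_count++;
    free(c);
}

/* Close *cp only if it is still open, then clear it, so calling this
 * from two cleanup paths (e.g. main_loop() and child_cleanup()) is safe. */
static void conn_close_once(struct conn **cp)
{
    if (*cp != NULL) {
        conn_close(*cp);
        *cp = NULL;
    }
}
```

Folding the check into a helper keeps every cleanup path idempotent without repeating the `if` at each call site; the trade-off is that a genuine "closed twice" logic error is silently tolerated rather than caught by the assertion.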
-- Mikolaj Golub --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=control.c.patch Index: sbin/hastd/control.c =================================================================== --- sbin/hastd/control.c (revision 213357) +++ sbin/hastd/control.c (working copy) @@ -58,8 +58,10 @@ child_cleanup(struct hast_resource *res) proto_close(res->hr_ctrl); res->hr_ctrl = NULL; - proto_close(res->hr_event); - res->hr_event = NULL; + if (res->hr_event != NULL) { + proto_close(res->hr_event); + res->hr_event = NULL; + } res->hr_workerpid = 0; } --=-=-=-- From owner-freebsd-fs@FreeBSD.ORG Sat Oct 2 12:25:52 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AC7D4106566C for ; Sat, 2 Oct 2010 12:25:52 +0000 (UTC) (envelope-from bruce@cran.org.uk) Received: from muon.cran.org.uk (unknown [IPv6:2a01:348:0:15:5d59:5c40:0:1]) by mx1.freebsd.org (Postfix) with ESMTP id 417838FC08 for ; Sat, 2 Oct 2010 12:25:52 +0000 (UTC) Received: from muon.cran.org.uk (localhost [127.0.0.1]) by muon.cran.org.uk (Postfix) with ESMTP id 58CB1E7F74; Sat, 2 Oct 2010 13:25:51 +0100 (BST) Received: from unknown (client-82-31-11-222.midd.adsl.virginmedia.com [82.31.11.222]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by muon.cran.org.uk (Postfix) with ESMTPSA; Sat, 2 Oct 2010 13:25:50 +0100 (BST) Date: Sat, 2 Oct 2010 13:25:48 +0100 From: Bruce Cran To: Phan Quoc Hien Message-ID: <20101002132548.00002898@unknown> In-Reply-To: References: X-Mailer: Claws Mail 3.7.6 (GTK+ 2.16.6; i586-pc-mingw32msvc) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, Ronald Klop Subject: Re: Data loss when hard shutdown! 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2010 12:25:52 -0000 On Sat, 2 Oct 2010 19:19:12 +0700 Phan Quoc Hien wrote: > Thanks for your respond.! > Yes. I pulled the power plug . > I edited rc.conf and save it then pulling the power plug. And system > boot next time rc.conf is a blank file...! This is an issue when using SoftUpdates - data isn't written to disk immediately, and empty files are produced when power is lost. See http://www.freebsd.org/doc/en/books/faq/disks.html#SAFE-SOFTUPDATES for details. -- Bruce Cran From owner-freebsd-fs@FreeBSD.ORG Sat Oct 2 12:42:11 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5621C106566B for ; Sat, 2 Oct 2010 12:42:11 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from smtp-out3.tiscali.nl (smtp-out3.tiscali.nl [195.241.79.178]) by mx1.freebsd.org (Postfix) with ESMTP id 152F98FC0C for ; Sat, 2 Oct 2010 12:42:10 +0000 (UTC) Received: from [212.123.145.58] (helo=sjakie.klop.ws) by smtp-out3.tiscali.nl with esmtp (Exim) (envelope-from ) id 1P21Pl-0002sg-VN for freebsd-fs@freebsd.org; Sat, 02 Oct 2010 14:42:10 +0200 Received: from 212-123-145-58.ip.telfort.nl (localhost [127.0.0.1]) by sjakie.klop.ws (Postfix) with ESMTP id 33ACF42A2 for ; Sat, 2 Oct 2010 14:42:08 +0200 (CEST) Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs@freebsd.org References: Date: Sat, 02 Oct 2010 14:42:08 +0200 MIME-Version: 1.0 From: "Ronald Klop" Message-ID: In-Reply-To: User-Agent: Opera Mail/10.62 (FreeBSD) Content-Transfer-Encoding: quoted-printable Subject: Re: Data loss when hard shutdown! 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2010 12:42:11 -0000 On Sat, 02 Oct 2010 14:19:12 +0200, Phan Quoc Hien wrote: > Thanks for your respond.! > Yes. I pulled the power plug . > I edited rc.conf and save it then pulling the power plug. And system boot > next time rc.conf is a blank file...! Don't pull the power plug if you don't have to. The command to reboot is 'shutdown -r now' and then all pending data will be saved safely. Ronald. > On Sat, Oct 2, 2010 at 7:02 PM, Ronald Klop > wrote: > >> On Sat, 02 Oct 2010 13:28:59 +0200, Phan Quoc Hien >> >> wrote: >> >> hello everybody. >>> I'm new to freebsd, When I hard shutdown my freebsd box..it caused lost >>> some >>> file. I used UFS2. How can prevent that? or recovery my file? >>> Thanks! >>> >>> >> By hard shutdown you mean pulling the power plug? >> >> UFS2 (and most other filesystems on other operating systems) guarantee >> consistency of metadata (filenames, directory structures, etc.) after a >> crash. However it is possible to loose the last X seconds of unwritten >> data. >> That can be the complete contents of a new file. >> >> If it is really important you can mount your filesystem 'sync' see 'man >> mount' in which case it will become slow, but more up-to-date. >> >> Ronald. 
>> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> > > From owner-freebsd-fs@FreeBSD.ORG Sat Oct 2 12:49:52 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 302D3106564A for ; Sat, 2 Oct 2010 12:49:52 +0000 (UTC) (envelope-from phanquochien@gmail.com) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id B5F108FC1C for ; Sat, 2 Oct 2010 12:49:51 +0000 (UTC) Received: by wyb29 with SMTP id 29so2592705wyb.13 for ; Sat, 02 Oct 2010 05:49:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=Sp6yWP1DBXm2eVs+n/kL43YSqRwOnvJGaR0qUDwB2bY=; b=kCFdxcNSmKuiahoHV7qdE2LVW8jhcrJmmYvxCHjlEplikTCqD7fbfh9hIkY61ULlXj +rhB2o2SF76hllIYLXE/4XZtyzjL88goLUB974Tv5QWwjzU5uXVwEFcDTMwaJ5FPvEXd ZseBe7YrhwhQEsuCSyZz71pOtIaC5D/bXhdVY= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=hFzU0IWWkddHoWuajKO345EioUEZEWom2T0IBHcrSg2i/IWXRqAM8OI6Dh18K7uFZA ptsCaVUUf+iqcylRpHdvYa6hCIdkASXWumcj9MhVfPIYhrZdv8xOehLkugo1voLJVI3Y DQnIb5N+s7/jVf/hGLWxXW5K4yh0lA9QMA0Ao= MIME-Version: 1.0 Received: by 10.227.151.195 with SMTP id d3mr5646668wbw.170.1286023790106; Sat, 02 Oct 2010 05:49:50 -0700 (PDT) Received: by 10.227.142.194 with HTTP; Sat, 2 Oct 2010 05:49:50 -0700 (PDT) In-Reply-To: <20101002132548.00002898@unknown> References: <20101002132548.00002898@unknown> Date: Sat, 2 Oct 2010 19:49:50 +0700 Message-ID: From: Phan Quoc Hien To: Bruce Cran Content-Type: text/plain; 
charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org, Ronald Klop Subject: Re: Data loss when hard shutdown! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2010 12:49:52 -0000 Thank for your respond. I have checked my fstab file. I didn't see any option about SoftUpdates for my / partition. On Sat, Oct 2, 2010 at 7:25 PM, Bruce Cran wrote: > On Sat, 2 Oct 2010 19:19:12 +0700 > Phan Quoc Hien wrote: > > > Thanks for your respond.! > > Yes. I pulled the power plug . > > I edited rc.conf and save it then pulling the power plug. And system > > boot next time rc.conf is a blank file...! > > This is an issue when using SoftUpdates - data isn't written to disk > immediately, and empty files are produced when power is lost. See > http://www.freebsd.org/doc/en/books/faq/disks.html#SAFE-SOFTUPDATES for > details. 
> > -- > Bruce Cran > -- Best regards, Mr.Hien E-mail: phanquochien@gmail.com Website: www.mrhien.info From owner-freebsd-fs@FreeBSD.ORG Sat Oct 2 12:59:05 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5AE7C106564A for ; Sat, 2 Oct 2010 12:59:05 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from smtp-out1.tiscali.nl (smtp-out1.tiscali.nl [195.241.79.176]) by mx1.freebsd.org (Postfix) with ESMTP id 185AD8FC1B for ; Sat, 2 Oct 2010 12:59:05 +0000 (UTC) Received: from [212.123.145.58] (helo=sjakie.klop.ws) by smtp-out1.tiscali.nl with esmtp (Exim) (envelope-from ) id 1P21g8-000445-0X for freebsd-fs@freebsd.org; Sat, 02 Oct 2010 14:59:04 +0200 Received: from 212-123-145-58.ip.telfort.nl (localhost [127.0.0.1]) by sjakie.klop.ws (Postfix) with ESMTP id 5918C42BA for ; Sat, 2 Oct 2010 14:59:02 +0200 (CEST) Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs@freebsd.org References: <20101002132548.00002898@unknown> Date: Sat, 02 Oct 2010 14:59:02 +0200 MIME-Version: 1.0 From: "Ronald Klop" Message-ID: In-Reply-To: User-Agent: Opera Mail/10.62 (FreeBSD) Content-Transfer-Encoding: quoted-printable Subject: Re: Data loss when hard shutdown! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2010 12:59:05 -0000 On Sat, 02 Oct 2010 14:49:50 +0200, Phan Quoc Hien wrote: > Thank for your respond. I have checked my fstab file. I didn't see any > option about SoftUpdates for my / partition. When you give the command 'mount' you will see several lines like this. /dev/ad8s1d on /var (ufs, local, soft-updates) Softupdates can be enabled/disabled with the command tunefs. See 'man tunefs'. 
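The "soft-updates" flag that mount(8) prints can also be queried from a program. A minimal sketch, assuming FreeBSD's statfs(2) and the MNT_SOFTDEP bit in `f_flags` from `<sys/mount.h>`; `softdep_enabled()` is a hypothetical helper, and on non-FreeBSD systems it simply reports -1 because that flag does not exist there.

```c
#include <assert.h>
#include <sys/param.h>
#if defined(__FreeBSD__)
#include <sys/mount.h>  /* struct statfs, statfs(2), MNT_SOFTDEP */
#endif

/* Return 1 if the file system holding `path` runs with soft updates,
 * 0 if it does not, and -1 on error or where MNT_SOFTDEP is unavailable. */
static int softdep_enabled(const char *path)
{
    assert(path != NULL);
#if defined(__FreeBSD__) && defined(MNT_SOFTDEP)
    struct statfs sf;

    if (statfs(path, &sf) == -1)
        return -1;
    return (sf.f_flags & MNT_SOFTDEP) != 0;
#else
    (void)path;
    return -1;
#endif
}
```

For example, `softdep_enabled("/")` reports on the root file system, matching what the `mount` output shows for `/`; changing the setting itself is still done with tunefs(8) on an unmounted (or read-only) file system.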
My advice is to not pull the power plug after changing critical files, but to reboot cleanly. Then there is no problem for 99% of the time and your computer is fast also. Or is there a reason for you to prefer the power plug? Ronald. > On Sat, Oct 2, 2010 at 7:25 PM, Bruce Cran wrote: > >> On Sat, 2 Oct 2010 19:19:12 +0700 >> Phan Quoc Hien wrote: >> >> > Thanks for your respond.! >> > Yes. I pulled the power plug . >> > I edited rc.conf and save it then pulling the power plug. And system >> > boot next time rc.conf is a blank file...! >> >> This is an issue when using SoftUpdates - data isn't written to disk >> immediately, and empty files are produced when power is lost. See >> http://www.freebsd.org/doc/en/books/faq/disks.html#SAFE-SOFTUPDATES for >> details. >> >> -- >> Bruce Cran >> > > From owner-freebsd-fs@FreeBSD.ORG Sat Oct 2 13:07:30 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A8A8C1065673 for ; Sat, 2 Oct 2010 13:07:30 +0000 (UTC) (envelope-from phanquochien@gmail.com) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id 378888FC15 for ; Sat, 2 Oct 2010 13:07:30 +0000 (UTC) Received: by wyb29 with SMTP id 29so2602553wyb.13 for ; Sat, 02 Oct 2010 06:07:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=q4wMwwrkNN5vOWJMKRGD4qXBwlNKI28ZA9LH/fEWSJE=; b=JNjciMuiroaAAV7NmyFlA7NhzbG1WYWNeC5+eTfAChkQs5ipts6KdYHaRzd5xtYQSE khYQpyO6ztlrisc0SitPhvgk6tFRl1/Q27gsccDqtCQ6myYYS8iySZcCF9WiFCw8Fuf4 5HTF16WoicI/R3P6dWoSvc3aAi/UJdZ7WMFvg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; 
b=xbpDz3SSbZY1Gc0f25E3Fkm5s0OqRpMv4bj6h61gtfAwsnUF+aiJDVVpmHr62GRRFt fU0yiUVxbEQ3DZ0cA0dAom7WfWxyoID5g6qwG8hx2uqSW4sUXv1oX02PjILcWBNNEb99 p2f9zei54e5gf8RRYRqOqlM7DZBLzU0PrJ1js= MIME-Version: 1.0 Received: by 10.227.134.136 with SMTP id j8mr5463231wbt.206.1286024848461; Sat, 02 Oct 2010 06:07:28 -0700 (PDT) Received: by 10.227.142.194 with HTTP; Sat, 2 Oct 2010 06:07:28 -0700 (PDT) In-Reply-To: References: <20101002132548.00002898@unknown> Date: Sat, 2 Oct 2010 20:07:28 +0700 Message-ID: From: Phan Quoc Hien To: Ronald Klop Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: Data loss when hard shutdown! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2010 13:07:30 -0000 The reason is power supply problems! Thank for your respond again. Have a nice day. On Sat, Oct 2, 2010 at 7:59 PM, Ronald Klop wrote: > On Sat, 02 Oct 2010 14:49:50 +0200, Phan Quoc Hien > wrote: > > Thank for your respond. I have checked my fstab file. I didn't see any >> option about SoftUpdates for my / partition. >> > > When you give the command 'mount' you will see several lines like this. > > /dev/ad8s1d on /var (ufs, local, soft-updates) > > Softupdates can be enabled/disabled with the command tunefs. See 'man > tunefs'. > > My advice is to not pull the power plug after changing critical files, but > to reboot cleanly. Than there is no problem for 99% of the time and your > computer is fast also. > Or is there a reason for you to prefer the power plug? > > Ronald. > > > > On Sat, Oct 2, 2010 at 7:25 PM, Bruce Cran wrote: >> >> On Sat, 2 Oct 2010 19:19:12 +0700 >>> Phan Quoc Hien wrote: >>> >>> > Thanks for your respond.! >>> > Yes. I pulled the power plug . >>> > I edited rc.conf and save it then pulling the power plug. 
And system >>> > boot next time rc.conf is a blank file...! >>> >>> This is an issue when using SoftUpdates - data isn't written to disk >>> immediately, and empty files are produced when power is lost. See >>> http://www.freebsd.org/doc/en/books/faq/disks.html#SAFE-SOFTUPDATES for >>> details. >>> >>> -- >>> Bruce Cran >>> >>> >> >> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > -- Best regards, Mr.Hien E-mail: phanquochien@gmail.com Website: www.mrhien.info From owner-freebsd-fs@FreeBSD.ORG Sat Oct 2 13:16:09 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EEB2C106566B for ; Sat, 2 Oct 2010 13:16:09 +0000 (UTC) (envelope-from numisemis@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id B247E8FC08 for ; Sat, 2 Oct 2010 13:16:09 +0000 (UTC) Received: by iwn34 with SMTP id 34so6046160iwn.13 for ; Sat, 02 Oct 2010 06:16:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=lLxnl89tClF5S9RxUC9baCzcYJT+jq5mA/iSpxygqOs=; b=E64Ba3MoVsPY+Z/QZI3T77PeYLbta8OJTuJkVlk/Y7PkkvqyKFKUckWnp7zP8AR7Uv jPXuV5ZEzSHZaln4zwBc6h03pgJCJQzd/W/R7JJULevy9cmkPaROLwZy7R+UR0Itk6eI 9BAHQPfYcy9Nur3iH2ac9EjxHyVojdXESbIa0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=RjrX914t7WManA2wU4olPzq4fcTiT6HZ1QiZDDoC0t1z5nZSBIrkVvfAW8KRsCSD03 7sICVa7FPohh2FeX2s66bLfjsgqGohdKOt3nsoc6lJm/ZUOTOXH6aZQww87+VpqCDNdj nAAOwvnl9jeGY6akxYmsOZaMT6kxHvJCpiVX0= MIME-Version: 1.0 
Received: by 10.231.167.130 with SMTP id q2mr7200070iby.163.1286025028464; Sat, 02 Oct 2010 06:10:28 -0700 (PDT) Received: by 10.231.139.90 with HTTP; Sat, 2 Oct 2010 06:10:28 -0700 (PDT) In-Reply-To: References: <20101002132548.00002898@unknown> Date: Sat, 2 Oct 2010 15:10:28 +0200 Message-ID: From: =?UTF-8?Q?=C5=A0imun_Mikecin?= To: Phan Quoc Hien Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-fs@freebsd.org, Ronald Klop Subject: Re: Data loss when hard shutdown! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2010 13:16:10 -0000 2010/10/2, Phan Quoc Hien : > The reason is power supply problems! Thank for your respond again. > Have a nice day. In that case, ZFS should be your friend! :-) From owner-freebsd-fs@FreeBSD.ORG Sat Oct 2 13:26:50 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A85F5106566B for ; Sat, 2 Oct 2010 13:26:50 +0000 (UTC) (envelope-from phanquochien@gmail.com) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id 3A8DC8FC0A for ; Sat, 2 Oct 2010 13:26:49 +0000 (UTC) Received: by wyb29 with SMTP id 29so2613223wyb.13 for ; Sat, 02 Oct 2010 06:26:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=K9QIAzQVBH0O+RdfAfMw50mR3K0i5fWijkIdbMPQLrE=; b=tzPLZgi8Yvv2ewEoQ1R6FeNIT02bIt3XMlGSaMtbmot3I1yQfbivu40zNj+9p9NPbj TzhaUdG1nw/cyqA6MwGBiOFdyhgGZFSJ3o9ygto4A16McHxZ6aSxbIwv1tIUvCRUo5TE wZ0SbMzxGLZYrKUCKqN5GuWOievTksmSfwrAs= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; 
h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=g8mqE+c7ZqycrLSVj0t8iTlRt4r6VQjCKZ4rXlfv8ZagcAs9GuEEcya/d+Xe2cAMI7 FFgmb+jW+xtuXy4eRkFXPYznx2QDB3pZcUa6WiOMNcsQET41n1cntsvs7VNNX7GW+YVS YPThQ65+ol5ZuZdOqwFfKTg+95pFaRl/dYISQ= MIME-Version: 1.0 Received: by 10.227.151.195 with SMTP id d3mr5677070wbw.170.1286026008289; Sat, 02 Oct 2010 06:26:48 -0700 (PDT) Received: by 10.227.142.194 with HTTP; Sat, 2 Oct 2010 06:26:48 -0700 (PDT) In-Reply-To: References: <20101002132548.00002898@unknown> Date: Sat, 2 Oct 2010 20:26:48 +0700 Message-ID: From: Phan Quoc Hien To: Ronald Klop Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: Data loss when hard shutdown! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2010 13:26:50 -0000 On Sat, Oct 2, 2010 at 7:59 PM, Ronald Klop wrote: > On Sat, 02 Oct 2010 14:49:50 +0200, Phan Quoc Hien > wrote: > > Thank for your respond. I have checked my fstab file. I didn't see any >> option about SoftUpdates for my / partition. >> > > When you give the command 'mount' you will see several lines like this. > > /dev/ad8s1d on /var (ufs, local, soft-updates) > > Softupdates can be enabled/disabled with the command tunefs. See 'man > tunefs'. > > When I run mount command. 
it shown: $ mount /dev/ad0s1a on / (ufs, local) devfs on /dev (devfs, local, multilabel) -- Best regards, Mr.Hien E-mail: phanquochien@gmail.com Website: www.mrhien.info From owner-freebsd-fs@FreeBSD.ORG Sat Oct 2 14:21:47 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2F6C7106566B for ; Sat, 2 Oct 2010 14:21:47 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta02.emeryville.ca.mail.comcast.net (qmta02.emeryville.ca.mail.comcast.net [76.96.30.24]) by mx1.freebsd.org (Postfix) with ESMTP id 11DCD8FC19 for ; Sat, 2 Oct 2010 14:21:46 +0000 (UTC) Received: from omta23.emeryville.ca.mail.comcast.net ([76.96.30.90]) by qmta02.emeryville.ca.mail.comcast.net with comcast id Dq5S1f0031wfjNsA2qMmGY; Sat, 02 Oct 2010 14:21:46 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta23.emeryville.ca.mail.comcast.net with comcast id DqMl1f00A3LrwQ28jqMlXu; Sat, 02 Oct 2010 14:21:46 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 713489B418; Sat, 2 Oct 2010 07:21:45 -0700 (PDT) Date: Sat, 2 Oct 2010 07:21:45 -0700 From: Jeremy Chadwick To: Phan Quoc Hien Message-ID: <20101002142145.GA70541@icarus.home.lan> References: <20101002132548.00002898@unknown> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, Ronald Klop Subject: Re: Data loss when hard shutdown! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2010 14:21:47 -0000 On Sat, Oct 02, 2010 at 08:26:48PM +0700, Phan Quoc Hien wrote: > On Sat, Oct 2, 2010 at 7:59 PM, Ronald Klop wrote: > > > On Sat, 02 Oct 2010 14:49:50 +0200, Phan Quoc Hien > > wrote: > > > > Thank for your respond. 
I have checked my fstab file. I didn't see any > >> option about SoftUpdates for my / partition. > >> > > > > When you give the command 'mount' you will see several lines like this. > > > > /dev/ad8s1d on /var (ufs, local, soft-updates) > > > > Softupdates can be enabled/disabled with the command tunefs. See 'man > > tunefs'. > > > > When I run mount command. it shown: > $ mount > /dev/ad0s1a on / (ufs, local) > devfs on /dev (devfs, local, multilabel) I didn't see anyone mention this to you in the thread, but: By default in FreeBSD (during the installation phase), softupdates are explicitly **not** applied to the root filesystem. This is intentional, but the reason for it I do not know. I imagine it's justified though. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sat Oct 2 14:30:45 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4162D1065693 for ; Sat, 2 Oct 2010 14:30:45 +0000 (UTC) (envelope-from bruce@cran.org.uk) Received: from muon.cran.org.uk (unknown [IPv6:2a01:348:0:15:5d59:5c40:0:1]) by mx1.freebsd.org (Postfix) with ESMTP id C8F838FC08 for ; Sat, 2 Oct 2010 14:30:44 +0000 (UTC) Received: from muon.cran.org.uk (localhost [127.0.0.1]) by muon.cran.org.uk (Postfix) with ESMTP id DBED0E616D; Sat, 2 Oct 2010 15:30:43 +0100 (BST) Received: from unknown (client-82-31-11-222.midd.adsl.virginmedia.com [82.31.11.222]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by muon.cran.org.uk (Postfix) with ESMTPSA; Sat, 2 Oct 2010 15:30:42 +0100 (BST) Date: Sat, 2 Oct 2010 15:30:40 +0100 From: Bruce Cran To: Jeremy Chadwick Message-ID: <20101002153040.00001993@unknown> In-Reply-To: 
<20101002142145.GA70541@icarus.home.lan> References: <20101002132548.00002898@unknown> <20101002142145.GA70541@icarus.home.lan> X-Mailer: Claws Mail 3.7.6 (GTK+ 2.16.6; i586-pc-mingw32msvc) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, Klop , Ronald Subject: Re: Data loss when hard shutdown! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2010 14:30:45 -0000 On Sat, 2 Oct 2010 07:21:45 -0700 Jeremy Chadwick wrote: > By default in FreeBSD (during the installation phase), softupdates are > explicitly **not** applied to the root filesystem. This is > intentional, but the reason for it I do not know. I imagine it's > justified though. See http://www.freebsd.org/doc/en/books/faq/disks.html#SAFE-SOFTUPDATES -- Bruce Cran From owner-freebsd-fs@FreeBSD.ORG Sat Oct 2 16:26:06 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D5E14106564A; Sat, 2 Oct 2010 16:26:06 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 3872D8FC17; Sat, 2 Oct 2010 16:26:05 +0000 (UTC) Received: by fxm9 with SMTP id 9so3354844fxm.13 for ; Sat, 02 Oct 2010 09:26:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:from:to:cc:subject:references :x-comment-to:date:in-reply-to:message-id:user-agent:mime-version :content-type; bh=xcK9desYQpRHqcDeASxNdUusoVhCZM3HTGqO7r6gcLY=; b=q2MR3fEFZeIwGGqzSgMLaObtJkxNMGVrf4r8pv9vxQz17f7sLxS9NYwCq1mWHrxuEK 0TrzyBVJMpD1M6jR9pu7AeY7YNIarFElYCoLaNsCrc+/pGfiRuFa0mOEd+7Z+mShXo1t uKmC26r3OO/KaI0zd5Fqv7kVYoXo1iW4cX0Ew= 
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:to:cc:subject:references:x-comment-to:date:in-reply-to :message-id:user-agent:mime-version:content-type; b=fUwvsbBB0ZEzqYWOIgbeZdCavU6uQd/chjWIXabeFgvwl/LSN9mFfKEL/8HiKvJ6OP ePWYHTNreUCYBXpetIruK5j8yH7R61gUQIUDDus0eNci4vQ+OB+y9uiy6vU8+cvf8Zuw Z2z/4pmbz/ZAqxEtrfSbYZR0wm6L4MYNzQ+bA= Received: by 10.223.103.84 with SMTP id j20mr6907188fao.35.1286036765051; Sat, 02 Oct 2010 09:26:05 -0700 (PDT) Received: from localhost ([95.69.162.97]) by mx.google.com with ESMTPS id h12sm1250305faa.13.2010.10.02.09.26.03 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sat, 02 Oct 2010 09:26:04 -0700 (PDT) From: Mikolaj Golub To: freebsd-fs@freebsd.org References: <86hbh44wgl.fsf@kopusha.home.net> X-Comment-To: Mikolaj Golub Date: Sat, 02 Oct 2010 19:26:05 +0300 In-Reply-To: <86hbh44wgl.fsf@kopusha.home.net> (Mikolaj Golub's message of "Sat, 02 Oct 2010 15:20:58 +0300") Message-ID: <86aamw4l42.fsf@kopusha.home.net> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Cc: pjd@freebsd.org Subject: Re: hastd: assertion (res->hr_event != NULL) fails in secondary on split-brain X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2010 16:26:07 -0000 --=-=-= On Sat, 02 Oct 2010 15:20:58 +0300 Mikolaj Golub wrote: MG> After recent changes in hastd (I think r213006: Fix descriptor leaks) if MG> split-brain occurs hastd will abort in child_cleanup() on assertion MG> (res->hr_event != NULL). ... MG> So we have double close of res->hr_event. The first time it is closed when MG> parent detects that worker exited in main_loop(), and the second time when a MG> new connection from primary comes and the parent does cleanup after previously MG> terminated child before starting new one. 
MG> The straightforward fix is to check res->hr_event before closing, like in the
MG> patch below.
MG> -- 
MG> Mikolaj Golub
MG> Index: sbin/hastd/control.c
MG> ===================================================================
MG> --- sbin/hastd/control.c	(revision 213357)
MG> +++ sbin/hastd/control.c	(working copy)
MG> @@ -58,8 +58,10 @@ child_cleanup(struct hast_resource *res)
MG>  
MG>  	proto_close(res->hr_ctrl);
MG>  	res->hr_ctrl = NULL;
MG> -	proto_close(res->hr_event);
MG> -	res->hr_event = NULL;
MG> +	if (res->hr_event != NULL) {
MG> +		proto_close(res->hr_event);
MG> +		res->hr_event = NULL;
MG> +	}
MG>  	res->hr_workerpid = 0;
MG>  }
MG>  

Running with this fix, another issue is observed. On split-brain, `hastctl status' on the secondary returns "[ERROR] Error 32 received from hastd" most of the time; only on some runs is the status output returned:

lolek# hastctl status storage
[ERROR] Error 32 received from hastd.
lolek# hastctl status storage
[ERROR] Error 32 received from hastd.
lolek# hastctl status storage
storage:
  role: secondary
  provname: storage
  localpath: /dev/ad4
  extentsize: 2097152
  keepdirty: 0
  remoteaddr: tcp4://bolek
  replication: memsync
  status: complete
  dirty: 0 bytes
lolek# hastctl status storage
[ERROR] Error 32 received from hastd.

This is because hastd clears res->hr_workerpid only when a new connection from the primary comes in. Meanwhile, hastd checks res->hr_workerpid in control_status(), and if it is not zero it tries to get info from the worker, which fails with an error (broken pipe, i.e. EPIPE, errno 32) if the worker is actually not running.

So it looks like it is better not just to close res->hr_event in main_loop() but to do the full child cleanup there, as soon as the worker's exit is detected. What do you think about the attached patch?
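The "Error 32" above is errno 32, EPIPE ("Broken pipe"), on FreeBSD. A minimal sketch of why a query to an exited worker fails that way, assuming (for illustration only) that the parent/worker control channel behaves like a connected AF_UNIX stream socketpair and that the parent ignores SIGPIPE. write_to_exited_peer() is a hypothetical name, not part of hastd:

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical illustration: create a connected AF_UNIX stream pair,
 * close the "worker" end, and return the errno produced by writing to
 * it from the "parent" end (0 if the write unexpectedly succeeded).
 */
static int
write_to_exited_peer(void)
{
	int fds[2], error;
	char byte = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
		return (errno);
	/* Without this, the write would raise SIGPIPE and kill the caller. */
	signal(SIGPIPE, SIG_IGN);
	/* The worker has exited: its end of the channel is closed. */
	close(fds[1]);
	error = 0;
	if (write(fds[0], &byte, sizeof(byte)) < 0)
		error = errno;
	close(fds[0]);
	return (error);
}
```

Calling write_to_exited_peer() should return EPIPE, and EPIPE is 32 on FreeBSD, which matches the "Error 32 received from hastd" lines: as long as hr_workerpid still holds the dead worker's pid, every status query goes to a channel nobody is reading.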
-- 
Mikolaj Golub

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=hast.child_kill.patch

Index: sbin/hastd/hastd.c
===================================================================
--- sbin/hastd/hastd.c	(revision 213357)
+++ sbin/hastd/hastd.c	(working copy)
@@ -94,22 +94,6 @@ g_gate_load(void)
 }
 
 static void
-child_exit_log(unsigned int pid, int status)
-{
-
-	if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
-		pjdlog_debug(1, "Worker process exited gracefully (pid=%u).",
-		    pid);
-	} else if (WIFSIGNALED(status)) {
-		pjdlog_error("Worker process killed (pid=%u, signal=%d).",
-		    pid, WTERMSIG(status));
-	} else {
-		pjdlog_error("Worker process exited ungracefully (pid=%u, exitcode=%d).",
-		    pid, WIFEXITED(status) ? WEXITSTATUS(status) : -1);
-	}
-}
-
-static void
 child_exit(void)
 {
 	struct hast_resource *res;
@@ -388,8 +372,6 @@ listen_accept(void)
 	const unsigned char *token;
 	char laddr[256], raddr[256];
 	size_t size;
-	pid_t pid;
-	int status;
 
 	proto_local_address(cfg->hc_listenconn, laddr, sizeof(laddr));
 	pjdlog_debug(1, "Accepting connection to %s.", laddr);
@@ -504,26 +486,7 @@ listen_accept(void)
 			    "Worker process exists (pid=%u), stopping it.",
 			    (unsigned int)res->hr_workerpid);
 			/* Stop child process. */
-			if (kill(res->hr_workerpid, SIGINT) < 0) {
-				pjdlog_errno(LOG_ERR,
-				    "Unable to stop worker process (pid=%u)",
-				    (unsigned int)res->hr_workerpid);
-				/*
-				 * Other than logging the problem we
-				 * ignore it - nothing smart to do.
-				 */
-			}
-			/* Wait for it to exit. */
-			else if ((pid = waitpid(res->hr_workerpid,
-			    &status, 0)) != res->hr_workerpid) {
-				/* We can only log the problem. */
-				pjdlog_errno(LOG_ERR,
-				    "Waiting for worker process (pid=%u) failed",
-				    (unsigned int)res->hr_workerpid);
-			} else {
-				child_exit_log(res->hr_workerpid, status);
-			}
-			child_cleanup(res);
+			child_kill(res);
 		} else if (res->hr_remotein != NULL) {
 			char oaddr[256];
 
@@ -678,8 +641,8 @@ main_loop(void)
 			if (event_recv(res) == 0)
 				continue;
 			/* The worker process exited? */
-			proto_close(res->hr_event);
-			res->hr_event = NULL;
+			if (res->hr_workerpid != 0)
+				child_kill(res);
 		}
 	}
 }
Index: sbin/hastd/control.c
===================================================================
--- sbin/hastd/control.c	(revision 213357)
+++ sbin/hastd/control.c	(working copy)
@@ -63,6 +63,51 @@ child_cleanup(struct hast_resource *res)
 	res->hr_workerpid = 0;
 }
 
+void
+child_exit_log(unsigned int pid, int status)
+{
+
+	if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
+		pjdlog_debug(1, "Worker process exited gracefully (pid=%u).",
+		    pid);
+	} else if (WIFSIGNALED(status)) {
+		pjdlog_error("Worker process killed (pid=%u, signal=%d).",
+		    pid, WTERMSIG(status));
+	} else {
+		pjdlog_error("Worker process exited ungracefully (pid=%u, exitcode=%d).",
+		    pid, WIFEXITED(status) ? WEXITSTATUS(status) : -1);
+	}
+}
+
+void
+child_kill(struct hast_resource *res)
+{
+	pid_t pid;
+	int status;
+
+	assert(res->hr_workerpid != 0);
+
+	if (kill(res->hr_workerpid, SIGINT) < 0) {
+		pjdlog_errno(LOG_ERR,
+		    "Unable to stop worker process (pid=%u)",
+		    (unsigned int)res->hr_workerpid);
+		/*
+		 * Other than logging the problem we
+		 * ignore it - nothing smart to do.
+		 */
+	}
+	/* Wait for it to exit. */
+	else if ((pid = waitpid(res->hr_workerpid,
+	    &status, 0)) != res->hr_workerpid) {
+		/* We can only log the problem. */
+		pjdlog_errno(LOG_ERR,
+		    "Waiting for worker process (pid=%u) failed",
+		    (unsigned int)res->hr_workerpid);
+	}
+	child_exit_log(res->hr_workerpid, status);
+	child_cleanup(res);
+}
+
 static void
 control_set_role_common(struct hastd_config *cfg, struct nv *nvout,
     uint8_t role, struct hast_resource *res, const char *name, unsigned int no)
@@ -107,22 +152,8 @@ control_set_role_common(struct hastd_config *cfg,
 	 * If previous role was primary or secondary we have to kill process
 	 * doing that work.
 	 */
-	if (res->hr_workerpid != 0) {
-		if (kill(res->hr_workerpid, SIGTERM) < 0) {
-			pjdlog_errno(LOG_WARNING,
-			    "Unable to kill worker process %u",
-			    (unsigned int)res->hr_workerpid);
-		} else if (waitpid(res->hr_workerpid, NULL, 0) !=
-		    res->hr_workerpid) {
-			pjdlog_errno(LOG_WARNING,
-			    "Error while waiting for worker process %u",
-			    (unsigned int)res->hr_workerpid);
-		} else {
-			pjdlog_debug(1, "Worker process %u stopped.",
-			    (unsigned int)res->hr_workerpid);
-		}
-		child_cleanup(res);
-	}
+	if (res->hr_workerpid != 0)
+		child_kill(res);
 
 	/* Start worker process if we are changing to primary. */
 	if (role == HAST_ROLE_PRIMARY)
Index: sbin/hastd/control.h
===================================================================
--- sbin/hastd/control.h	(revision 213357)
+++ sbin/hastd/control.h	(working copy)
@@ -39,6 +39,8 @@ struct hastd_config;
 struct hast_resource;
 
 void child_cleanup(struct hast_resource *res);
+void child_exit_log(unsigned int pid, int status);
+void child_kill(struct hast_resource *res);
 
 void control_set_role(struct hast_resource *res, uint8_t role);
 

--=-=-=--
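In the attached patch, child_kill() calls child_exit_log() after the if/else-if chain, so status is passed along even when kill() or waitpid() failed and never filled it in; the listen_accept() code it replaces only logged status in a final else branch. The following is a minimal standalone sketch of the same stop-and-reap pattern, under these assumptions: stop_and_reap() and spawn_idle_worker() are hypothetical names, stdio stands in for pjdlog, and the EINTR retry is an addition not present in hastd. It decodes status only after waitpid() has returned the child's pid.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Test scaffolding (hypothetical): fork a process that idles until a
 * signal arrives, standing in for a hastd worker.  SIGINT is reset to
 * its default action and unblocked before fork() so the child dies on
 * SIGINT regardless of how the caller's signal state was set up.
 * Returns the child's pid, or -1 if fork() failed.
 */
static pid_t
spawn_idle_worker(void)
{
	sigset_t set;
	pid_t pid;

	signal(SIGINT, SIG_DFL);
	sigemptyset(&set);
	sigaddset(&set, SIGINT);
	sigprocmask(SIG_UNBLOCK, &set, NULL);

	pid = fork();
	if (pid == 0) {
		for (;;)
			pause();
	}
	return (pid);
}

/*
 * Send sig to pid, reap it, and report how it ended.  The exit status
 * is decoded (and stored in *statusp) only after waitpid() returned
 * pid, i.e. only when status is known to be initialized.
 * Returns 0 on success and -1 if either kill() or waitpid() failed.
 */
static int
stop_and_reap(pid_t pid, int sig, int *statusp)
{
	int status;

	assert(pid > 0);

	if (kill(pid, sig) < 0) {
		fprintf(stderr, "Unable to stop worker process (pid=%u): %s\n",
		    (unsigned int)pid, strerror(errno));
		return (-1);
	}
	/* Wait for it to exit; retry if a signal interrupts the wait. */
	while (waitpid(pid, &status, 0) != pid) {
		if (errno == EINTR)
			continue;
		fprintf(stderr,
		    "Waiting for worker process (pid=%u) failed: %s\n",
		    (unsigned int)pid, strerror(errno));
		return (-1);
	}
	if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
		printf("Worker process exited gracefully (pid=%u).\n",
		    (unsigned int)pid);
	} else if (WIFSIGNALED(status)) {
		printf("Worker process killed (pid=%u, signal=%d).\n",
		    (unsigned int)pid, WTERMSIG(status));
	} else {
		printf("Worker process exited ungracefully (pid=%u, exitcode=%d).\n",
		    (unsigned int)pid,
		    WIFEXITED(status) ? WEXITSTATUS(status) : -1);
	}
	if (statusp != NULL)
		*statusp = status;
	return (0);
}
```

For example, spawning an idle worker and calling stop_and_reap(pid, SIGINT, &status) should leave WIFSIGNALED(status) true with WTERMSIG(status) equal to SIGINT. Returning early on failure, instead of falling through to the logging code, makes the uninitialized-status case unreachable; in a real daemon the caller would still release the per-worker resources (what child_cleanup() does in the patch) whether or not reaping succeeded.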