From owner-freebsd-fs@FreeBSD.ORG Sun Feb 19 13:28:42 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CDE5F1065672 for ; Sun, 19 Feb 2012 13:28:42 +0000 (UTC) (envelope-from shuey@fmepnet.org) Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com [209.85.220.182]) by mx1.freebsd.org (Postfix) with ESMTP id 862648FC17 for ; Sun, 19 Feb 2012 13:28:42 +0000 (UTC) Received: by vcmm1 with SMTP id m1so4529345vcm.13 for ; Sun, 19 Feb 2012 05:28:42 -0800 (PST) Received-SPF: pass (google.com: domain of shuey@fmepnet.org designates 10.220.153.201 as permitted sender) client-ip=10.220.153.201; Authentication-Results: mr.google.com; spf=pass (google.com: domain of shuey@fmepnet.org designates 10.220.153.201 as permitted sender) smtp.mail=shuey@fmepnet.org Received: from mr.google.com ([10.220.153.201]) by 10.220.153.201 with SMTP id l9mr9414537vcw.1.1329658122003 (num_hops = 1); Sun, 19 Feb 2012 05:28:42 -0800 (PST) MIME-Version: 1.0 Received: by 10.220.153.201 with SMTP id l9mr7513432vcw.1.1329658121879; Sun, 19 Feb 2012 05:28:41 -0800 (PST) Received: by 10.220.64.141 with HTTP; Sun, 19 Feb 2012 05:28:41 -0800 (PST) X-Originating-IP: [98.223.59.225] In-Reply-To: <1329595563.42839.28.camel@btw.pki2.com> References: <1329595563.42839.28.camel@btw.pki2.com> Date: Sun, 19 Feb 2012 08:28:41 -0500 Message-ID: From: Michael Shuey To: dg17@penx.com Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Gm-Message-State: ALoCoQlCgPRFZRZWo2NR7plZIN8OUDHbfHc18HVr1UeQMd7C7zwh1sfPQP9B0u15lwQAkesyGJfO Cc: freebsd-fs@freebsd.org Subject: Re: ZFS size reduced, 100% full, on fbsd9 upgrade X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Feb 2012 13:28:42 -0000 Okay, today's lesson: When you replace a disk with a bigger drive, and it increases your raidz2's pool capacity, ALWAYS run a zpool scrub before doing anything else. I rebooted back to 8.2p6, ran a (somewhat longer than normal) scrub, rebooted, then booted back to 9.0. Seems fine now, and is finishing its freebsd-update. Weird....but at least it works. On Sat, Feb 18, 2012 at 3:06 PM, Dennis Glatting wrote: > I'm not a ZFS wiz but... > > > On Sat, 2012-02-18 at 10:25 -0500, Michael Shuey wrote: >> I'm upgrading a server from 8.2p6 to 9.0-RELEASE, and I've tried both >> make in the source tree and freebsd-update and I get the same strange >> result. =A0As soon as I boot to the fbsd9 kernel, even booting into >> single-user mode, the pool's size is greatly reduced. =A0All filesystems >> show 100% full (0 bytes free space), nothing can be written to the >> pool (probably a side-effect of being 100% full), and dmesg shows >> several of "Solaris: WARNING: metaslab_free_dva(): bad DVA >> 0:5978620460544" warnings (with different numbers). =A0Switching kernels >> back to the 8.2p6 kernel restores things to normal, but I'd really >> like to finish my fbsd9 upgrade. >> >> The system is a 64-bit Intel box with 4 GB of memory, and 8 disks in a >> raidz2 pool called "pool". =A0It's booted to the 8.2p6 kernel now, and >> scrubbing the pool, but last time I did this (roughly a week ago) it >> was fine. =A0/ is a gmirror, but /usr, /tmp, and /var all come from the >> pool. 
Normally, the pool has 1.2 TB of free space, and is version 15
>> (zfs version 4).  Some disks are WD drives, with 4k native sectors,
>> but some time ago I rebuilt the pool to use a native 4k sector size
>> (ashift=12).
>>
>
> I believe 4GB of memory is the minimum. More is better. When you use the
> minimum of anything, expect dodginess.
>
> You should upgrade your pool -- bug fixes and all that.
>
> Are all the disks 4k sectors? I found that a mix of 512 and 4k works, but
> performance is best when they are all the same. I have also found that 512
> emulation isn't a viable choice when looking at performance (i.e.,
> set for 4k).
>
> Different people have different opinions, but I personally do not use ZFS
> for the OS; rather, I RAID1 the OS. The question you have to ask is
> whether, if /usr goes kablooie, you have the skills to put it back
> together. I do not, so "simple" (i.e., hardware RAID1) for the OS is
> good for me -- it isn't the OS that's being worked in my setups, but
> rather the data areas.
>
>
>> Over time, I've been slowly replacing disks (1 at a time) to increase
>> the free space in the pool.  Also, the system experienced a severe
>> failure recently; the power supply blew, and took out the memory (and
>> presumably the motherboard).  I replaced these last week with known-good
>> board/memory/processor/PS, and it's been running fine since.
>>
>
> Expect mixed results with mixed disks, at least in my experience,
> particularly when it comes to performance.
>
> Is the MB the same? I have had mixed results. I find the Gigabyte boards
> work well, but ASUS boards are dodgy when it comes to high interrupt
> handling. Server boards with ECC memory are the most reliable.
>
>
>> Any suggestions?  Is it possible I've got some nasty pool corruption
>> going on - and if so, how do I go about fixing it?  Any advice would
>> be appreciated.  This is a backup server, so I could rebuild its
>> contents from the primary, but I'd rather fix it if possible (since I
>> want to do a fbsd9 upgrade on the primary next).
>
> I screw around with my setups. What I've found is that rebuilding the pool
> (when I screw it up) is the least troublesome approach.
>
> Recently I found a bad tray on one of my servers. Drove me nuts for two
> weeks. It could be a loose, bad, or crimped cable, but I am not yet in
> a position to open the case. Most of my ZFS weirdnesses have been
> hardware related.
>
> It could be that your blowout impacted your disks or wiring. Do you run
> SMART? I've found that, generally, SMART is goodness, but I presently have
> a question mark when it comes to the Hitachi 4TB disks (I misbehaved on
> that system, so the issue could be my own; on another system, however,
> there weren't any errors).
>
> I have found, when I have multiple identical controllers, that running the
> same firmware across all of them is a good approach; otherwise weirdness
> ensues, and different MBs manifest this problem in different ways. Also,
> make sure your MB's BIOS is recent.
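For reference, a pool with ashift=12 like the one described above was typically
created with the gnop(8) trick of that era. A minimal sketch, using made-up
device names (da0-da3) and pool name "tank" rather than anything from this
thread:

   gnop create -S 4096 /dev/da0        # advertise 4k sectors on one member
   zpool create tank raidz2 da0.nop da1 da2 da3
   zpool export tank
   gnop destroy da0.nop                # ashift=12 was recorded at creation
   zpool import tank
   zdb | grep ashift                   # should report ashift: 12

ZFS picks the largest logical sector size it sees when the vdev is created, so
shimming a single member is enough.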
> > YMMV > > > > From owner-freebsd-fs@FreeBSD.ORG Sun Feb 19 16:55:46 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 548241065672; Sun, 19 Feb 2012 16:55:46 +0000 (UTC) (envelope-from arno@heho.snv.jussieu.fr) Received: from shiva.jussieu.fr (shiva.jussieu.fr [134.157.0.129]) by mx1.freebsd.org (Postfix) with ESMTP id BBE0F8FC0A; Sun, 19 Feb 2012 16:55:45 +0000 (UTC) Received: from heho.snv.jussieu.fr (heho.snv.jussieu.fr [134.157.184.22]) by shiva.jussieu.fr (8.14.4/jtpda-5.4) with ESMTP id q1JGtInM021294 ; Sun, 19 Feb 2012 17:55:31 +0100 (CET) X-Ids: 168 Received: from heho.snv.jussieu.fr (localhost [127.0.0.1]) by heho.snv.jussieu.fr (8.14.3/8.14.3) with ESMTP id q1JGsoLU054604; Sun, 19 Feb 2012 17:54:50 +0100 (CET) (envelope-from arno@heho.snv.jussieu.fr) Received: (from arno@localhost) by heho.snv.jussieu.fr (8.14.3/8.14.3/Submit) id q1JGsoIr054599; Sun, 19 Feb 2012 17:54:50 +0100 (CET) (envelope-from arno) To: Martin Simmons From: "Arno J. Klaassen" References: <201202141820.q1EIK1MP032526@higson.cam.lispworks.com> Date: Sun, 19 Feb 2012 17:54:50 +0100 In-Reply-To: (Arno J. Klaassen's message of "Sat\, 18 Feb 2012 18\:55\:17 +0100") Message-ID: User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Miltered: at jchkmail.jussieu.fr with ID 4F412976.000 by Joe's j-chkmail (http : // j-chkmail dot ensmp dot fr)! X-j-chkmail-Enveloppe: 4F412976.000/134.157.184.22/heho.snv.jussieu.fr/heho.snv.jussieu.fr/ Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Subject: 9-stable: one-device ZFS fails [was: 9-stable : geli + one-disk ZFS fails] X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Feb 2012 16:55:46 -0000 a followup to myself > Hello, > > Martin Simmons writes: > >> Some random ideas: >> >> 1) Can you dd the whole of ada0s3.eli without errors? >> >> 2) If you scrub a few more times, does it find the same number of errors each >> time and are they always in that XNAT.tar file? >> >> 3) Can you try zfs without geli? > > > yeah, and it seems to rule out geli : > > [ splitted original /dev/ada0s3 in equally sized /dev/ada0s3 and > /dev/ada0s4 ] > > geli init /dev/ada0s3 > geli attach /dev/ada0s3 > > zpool create zgeli /dev/ada0s3.eli > > zfs create zgeli/home > zfs create zgeli/home/arno > zfs create zgeli/home/arno/.priv > zfs create zgeli/home/arno/.scito > zfs set copies=2 zgeli/home/arno/.priv > zfs set atime=off zgeli > > > [put some files on it, wait a little : ] > > > [root@cc ~]# zpool status -v > pool: zgeli > state: ONLINE > status: One or more devices has experienced an error resulting in data > corruption. Applications may be affected. > action: Restore the file in question if possible. Otherwise restore the > entire pool from backup. 
> see: http://www.sun.com/msg/ZFS-8000-8A > scan: scrub in progress since Sat Feb 18 17:46:54 2012 > 425M scanned out of 2.49G at 85.0M/s, 0h0m to go > 0 repaired, 16.64% done > config: > > NAME STATE READ WRITE CKSUM > zgeli ONLINE 0 0 1 > ada0s3.eli ONLINE 0 0 2 > > errors: Permanent errors have been detected in the following files: > > /zgeli/home/arno/8.0-CURRENT-200902-amd64-livefs.iso > [root@cc ~]# zpool scrub -s zgeli > [root@cc ~]# > > > [then idem directly on next partition ] > > zpool create zgpart /dev/ada0s4 > > zfs create zgpart/home > zfs create zgpart/home/arno > zfs create zgpart/home/arno/.priv > zfs create zgpart/home/arno/.scito > zfs set copies=2 zgpart/home/arno/.priv > zfs set atime=off zgpart > > [put some files on it, wait a little : ] > > pool: zgpart > state: ONLINE > status: One or more devices has experienced an error resulting in data > corruption. Applications may be affected. > action: Restore the file in question if possible. Otherwise restore the > entire pool from backup. > see: http://www.sun.com/msg/ZFS-8000-8A > scan: scrub repaired 0 in 0h0m with 1 errors on Sat Feb 18 18:04:45 2012 > config: > > NAME STATE READ WRITE CKSUM > zgpart ONLINE 0 0 1 > ada0s4 ONLINE 0 0 2 > > errors: Permanent errors have been detected in the following files: > > /zgpart/home/arno/.scito/ .... > [root@cc ~]# I tested a bit more this afternoon : - zpool create zgpart /dev/ada0s4d => KO - split ada0s4 in two equally sized partitions and then zpool create zgpart mirror /dev/ada0s4d /dev/ada0s4e => works like a charm ..... ( [root@cc /zgpart]# zpool status -v zgpart pool: zgpart state: ONLINE scan: scrub repaired 0 in 0h36m with 0 errors on Sun Feb 19 17:20:34 2012 config: NAME STATE READ WRITE CKSUM zgpart ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 ada0s4d ONLINE 0 0 0 ada0s4e ONLINE 0 0 0 errors: No known data errors ) FYI, best, Arno > > I still do not particuliarly suspect the disk since I cannot reproduce > similar behaviour on UFS. > > That said, this disk is supposed to be 'hybrid-SSD', maybe something > special ZFS doesn't like ??? : > > > ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 > ada0: ATA-8 SATA 2.x device > ada0: Serial Number 5YX0J5YD > ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada0: Command Queueing enabled > ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C) > ada0: Previously was known as ad4 > GEOM: new disk ada0 > > > Please let me know what information to provide more. > > Best, > > Arno > > > > >> 4) Is the slice/partition layout definitely correct? >> >> __Martin >> >> >>>>>>> On Mon, 13 Feb 2012 23:39:06 +0100, Arno J Klaassen said: >>> >>> hello, >>> >>> to eventually gain interest in this issue : >>> >>> I updated to today's -stable, tested with vfs.zfs.debug=1 >>> and vfs.zfs.prefetch_disable=0, no difference. >>> >>> I also tested to read the raw partition : >>> >>> [root@cc /usr/ports]# dd if=/dev/ada0s3 of=/dev/null bs=4096 conv=noerror >>> 103746636+0 records in >>> 103746636+0 records out >>> 424946221056 bytes transferred in 13226.346738 secs (32128768 bytes/sec) >>> [root@cc /usr/ports]# >>> >>> Disk is brand new, looks ok, either my setup is not good or there is >>> a bug somewhere; I can play around with this box for some more time, >>> please feel free to provide me with some hints what to do to be useful >>> for you. >>> >>> Best, >>> >>> Arno >>> >>> >>> "Arno J. 
Klaassen" writes: >>> >>> > Hello, >>> > >>> > >>> > I finally decided to 'play' a bit with ZFS on a notebook, some years >>> > old, but I installed a brand new disk and memtest passes OK. >>> > >>> > I installed base+ports on partition 2, using 'classical' UFS. >>> > >>> > I crypted partition 3 and created a single zpool on it containing >>> > 4 Z-"file-systems" : >>> > >>> > [root@cc ~]# zfs list >>> > NAME USED AVAIL REFER MOUNTPOINT >>> > zfiles 10.7G 377G 152K /zfiles >>> > zfiles/home 10.6G 377G 119M /zfiles/home >>> > zfiles/home/arno 10.5G 377G 2.35G /zfiles/home/arno >>> > zfiles/home/arno/.priv 192K 377G 192K /zfiles/home/arno/.priv >>> > zfiles/home/arno/.scito 8.18G 377G 8.18G /zfiles/home/arno/.scito >>> > >>> > >>> > I export the ZFS's via nfs and rsynced on the other machine some backup >>> > of my current note-book (geli + UFS, (almost) same 9-stable version, no >>> > problem) to the ZFS's. >>> > >>> > >>> > Quite fast, I see on the notebook : >>> > >>> > >>> > [root@cc /usr/temp]# zpool status -v >>> > pool: zfiles >>> > state: ONLINE >>> > status: One or more devices has experienced an error resulting in data >>> > corruption. Applications may be affected. >>> > action: Restore the file in question if possible. Otherwise restore the >>> > entire pool from backup. >>> > see: http://www.sun.com/msg/ZFS-8000-8A >>> > scan: scrub repaired 0 in 0h1m with 11 errors on Sat Feb 11 14:55:34 >>> > 2012 >>> > config: >>> > >>> > NAME STATE READ WRITE CKSUM >>> > zfiles ONLINE 0 0 11 >>> > ada0s3.eli ONLINE 0 0 23 >>> > >>> > errors: Permanent errors have been detected in the following files: >>> > >>> > /zfiles/home/arno/.scito/contrib/XNAT.tar >>> > [root@cc /usr/temp]# md5 /zfiles/home/arno/.scito/contrib/XNAT.tar >>> > md5: /zfiles/home/arno/.scito/contrib/XNAT.tar: Input/output error >>> > [root@cc /usr/temp]# >>> > >>> > >>> > As said, memtest is OK, nothing is logged to the console, UFS on the >>> > same disk works OK (I did some tests copying and comparing random data) >>> > and smartctl as well seems to trust the disk : >>> > >>> > SMART Self-test log structure revision number 1 >>> > Num Test_Description Status Remaining LifeTime(hours) >>> > # 1 Extended offline Completed without error 00% 388 >>> > # 2 Short offline Completed without error 00% 387 >>> > >>> > >>> > Am I doing something wrong and/or let me know what I could provide as >>> > extra info to try to solve this (dmesg.boot at the end of this mail). 
>>> > >>> > Thanx a lot in advance, >>> > >>> > best, Arno >>> > >>> > >>> > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Mon Feb 20 03:43:35 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A682A106564A; Mon, 20 Feb 2012 03:43:35 +0000 (UTC) (envelope-from smckay@internode.on.net) Received: from ipmail04.adl6.internode.on.net (ipmail04.adl6.internode.on.net [150.101.137.141]) by mx1.freebsd.org (Postfix) with ESMTP id 0977E8FC14; Mon, 20 Feb 2012 03:43:34 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Av0EAAq9QU920ALe/2dsb2JhbABDsiyBCIF0AQVWIxABCkY5BBq9e4t9AgQQBgsJNQkDAoNiWIMeBKg2 Received: from ppp118-208-2-222.lns20.bne1.internode.on.net (HELO dungeon.home) ([118.208.2.222]) by ipmail04.adl6.internode.on.net with ESMTP; 20 Feb 2012 13:58:17 +1030 Received: from dungeon.home (localhost [127.0.0.1]) by dungeon.home (8.14.4/8.14.3) with ESMTP id q1K3ROrt009042; Mon, 20 Feb 2012 13:27:24 +1000 (EST) (envelope-from mckay) Message-Id: <201202200327.q1K3ROrt009042@dungeon.home> From: Stephen McKay To: freebsd-fs@freebsd.org References: <201103081425.p28EPQtM002115@dungeon.home> <201107052241.p65MfqVA002215@dungeon.home> In-Reply-To: <201107052241.p65MfqVA002215@dungeon.home> from Stephen McKay at "Wed, 06 Jul 2011 08:41:52 +1000" Date: Mon, 20 Feb 2012 13:27:24 +1000 Sender: smckay@internode.on.net Cc: Stephen McKay Subject: Re: Constant minor ZFS corruption, probably solved X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Feb 2012 03:43:35 -0000 On Wednesday, 6th July 2011, Stephen McKay wrote: >Perhaps you remember me struggling with a small but continuous amount >of corruption on ZFS volumes with a new server we had built at work. >... I've now done enough tests so that I'm 90% >certain what the problem is: Seagate's caching firmware. >... I'm certain that disabling write caching >has given us a stable machine. And I'm 90% certain that it's because >of bugs in Seagate's cache firmware. I hope someone else can replicate >this and settle the issue. I'm following up on an old post of mine to confirm that my write cache disabling workaround is well and truly successful. Eight months later we've seen no further corruption when using Seagate ST2000DL003 disks. The machine (now running 9.0-RELEASE) sees constant moderate to low activity as a file server (about 6TB in use). I did receive a message from one other person suffering from the same problem. It was solved by disabling write caching, so that's two data points. And two data points is a trend, right? :-) His system was running 8.2-stable on an AMD Phenom CPU in a MSI 870-G45 motherboard (AMD SB710 southbridge) so there's very little overlap with our system: just zfs and Seagate green disks. His disks were ST1500DL003 (1.5TB) with firmware CC32 so that more or less means the common points are simply zfs and Seagate CC32 firmware. You already know which one I think is to blame. But then again no avalanche of complaints has been seen either, so it's still somewhat mysterious. 
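(For anyone who wants to reproduce the workaround: on ada(4) disks the usual
switch is the loader tunable below. This is a sketch of the general technique,
not necessarily how the machines above were configured.

   # /boot/loader.conf
   kern.cam.ada.write_cache=0     # have ada(4) turn off the drive's write
                                  # cache when the disk attaches

The cost is slower synchronous writes; ZFS batches and explicitly flushes its
writes anyway, which is why the trade-off tends to be acceptable.)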
Is there some other problem that is just being masked by disabling the cache? Unless there's a sudden surge in reports, we'll never know for certain. So, if you've seen this problem and cured it by disabling the write cache, I'd like to know about it. How's your data? Run a scrub lately? Perhaps now is a good time. ;-) Cheers, Stephen. From owner-freebsd-fs@FreeBSD.ORG Mon Feb 20 11:07:05 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 00887106566B for ; Mon, 20 Feb 2012 11:07:05 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id E2AAD8FC1E for ; Mon, 20 Feb 2012 11:07:04 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1KB74qi090102 for ; Mon, 20 Feb 2012 11:07:04 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q1KB74ml090100 for freebsd-fs@FreeBSD.org; Mon, 20 Feb 2012 11:07:04 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 20 Feb 2012 11:07:04 GMT Message-Id: <201202201107.q1KB74ml090100@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Feb 2012 11:07:05 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/165087 fs [unionfs] lock violation in unionfs o kern/164472 fs [ufs] fsck -B panics on particular data inconsistency o kern/164462 fs [nfs] NFSv4 mounting fails to mount; asks for stronger o kern/164370 fs [zfs] zfs destroy for snapshot fails on i386 and sparc o kern/164261 fs [nullfs] [patch] fix panic with NFS served from NULLFS o kern/164256 fs [zfs] device entry for volume is not created after zfs o kern/164184 fs [ufs] [panic] Kernel panic with ufs_makeinode o kern/163801 fs [md] [request] allow mfsBSD legacy installed in 'swap' o kern/163770 fs [zfs] [hang] LOR between zfs&syncer + vnlru leading to o kern/163501 fs [nfs] NFS exporting a dir and a subdir in that dir to o kern/162944 fs [coda] Coda file system module looks broken in 9.0 o kern/162860 fs [zfs] Cannot share ZFS filesystem to hosts with a hyph o kern/162751 fs [zfs] [panic] kernel panics during file operations o kern/162591 fs [nullfs] cross-filesystem nullfs does not work as expe o kern/162519 fs [zfs] "zpool import" relies on buggy realpath() behavi o kern/162362 fs [snapshots] [panic] ufs with snapshot(s) panics when g o kern/162083 fs [zfs] [panic] zfs unmount -f pool o kern/161968 fs [zfs] [hang] renaming snapshot with -r including a zvo o kern/161897 fs [zfs] [patch] zfs partition probing causing long delay o kern/161864 fs [ufs] removing journaling from UFS partition fails on o bin/161807 fs [patch] add option for explicitly specifying metadata o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is o kern/161533 fs [zfs] [panic] zfs receive panic: system ioctl returnin o kern/161511 fs [unionfs] Filesystem deadlocks when using multiple uni o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_ o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou o kern/161280 fs [zfs] Stack overflow in gptzfsboot o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3 o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic o kern/160860 fs [ufs] Random UFS root filesystem corruption with SU+J o kern/160801 fs [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha o kern/159930 fs [ufs] [panic] kernel core o kern/159663 fs [socket] [nullfs] sockets don't work though nullfs mou o kern/159402 fs [zfs][loader] symlinks cause I/O errors o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by- o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs() o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option o kern/159077 fs [zfs] Can't cd .. with latest zfs version o kern/159048 fs [smbfs] smb mount corrupts large files o kern/159045 fs [zfs] [hang] ZFS scrub freezes system o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk o kern/158802 fs amd(8) ICMP storm and unkillable process. 
o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o f kern/157929 fs [nfs] NFS slow read o kern/157722 fs [geli] unable to newfs a geli encrypted partition o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and o kern/156781 fs [zfs] zfs is losing the snapshot directory, p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs f kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 p kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153520 fs [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/151111 fs [zfs] vnodes leakage during zfs unmount o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris installa o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 fs 
[zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " o kern/147560 fs [zfs] [boot] Booting 8.1-PRERELEASE raidz system take o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an f bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141897 fs [msdosfs] [panic] Kernel panic. 
msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues ( o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139597 fs [patch] [tmpfs] tmpfs initializes va_gen but doesn't u o kern/139564 fs [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... 
o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files f sparc/123566 fs [zfs] zpool import issue: EOVERFLOW o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large 
filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o kern/109024 fs [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat o kern/109010 fs [msdosfs] can't mv directory within fat32 file system o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/51583 fs [nullfs] [patch] allow to work with devices and socket o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 262 problems total. 
From owner-freebsd-fs@FreeBSD.ORG Mon Feb 20 15:08:06 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D8B46106564A for ; Mon, 20 Feb 2012 15:08:06 +0000 (UTC) (envelope-from mikl@d902.iki.rssi.ru) Received: from d902.iki.rssi.ru (d902.iki.rssi.ru [193.232.9.10]) by mx1.freebsd.org (Postfix) with ESMTP id 4178C8FC12 for ; Mon, 20 Feb 2012 15:08:05 +0000 (UTC) Received: from [193.232.9.155] ([193.232.9.155]) by d902.iki.rssi.ru (8.14.2/8.13.1) with ESMTP id q1KESAC6029945 for ; Mon, 20 Feb 2012 18:28:10 +0400 (GMT-4) (envelope-from mikl@d902.iki.rssi.ru) Message-ID: <4F4258DB.3010303@d902.iki.rssi.ru> Date: Mon, 20 Feb 2012 18:29:47 +0400 From: =?UTF-8?B?0KHQtdGA0LPQtdC5INCc0LjQutC70LDRiNC10LLQuNGH?= User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.26) Gecko/20120131 Thunderbird/3.1.18 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: HAST on raid-controller X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Feb 2012 15:08:06 -0000 Hello! I tried to create hast-cluster on my test-servers. They have raid-controlles Adaptec 2820SA, device aacd1. After creating /etc/hast.conf (much the same as in FreeBSD handbook) it isn't working with the message: >hastctl create reserve >[ERROR] [reserve] Unable to open /dev/aacd1: Operation not permitted. Keep it in mind, can HAST work on raid-controllers (or raid-controllers Adaptec)? With best regards, Sergey. From owner-freebsd-fs@FreeBSD.ORG Mon Feb 20 16:57:08 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E9B4A106564A; Mon, 20 Feb 2012 16:57:08 +0000 (UTC) (envelope-from rmacklem@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id C119F8FC0C; Mon, 20 Feb 2012 16:57:08 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1KGv87T025173; Mon, 20 Feb 2012 16:57:08 GMT (envelope-from rmacklem@freefall.freebsd.org) Received: (from rmacklem@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q1KGv8c9025169; Mon, 20 Feb 2012 16:57:08 GMT (envelope-from rmacklem) Date: Mon, 20 Feb 2012 16:57:08 GMT Message-Id: <201202201657.q1KGv8c9025169@freefall.freebsd.org> To: rmacklem@FreeBSD.org, freebsd-fs@FreeBSD.org, rmacklem@FreeBSD.org From: rmacklem@FreeBSD.org Cc: Subject: Re: kern/164462: [nfs] NFSv4 mounting fails to mount; asks for stronger authentication X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Feb 2012 16:57:09 -0000 Synopsis: [nfs] NFSv4 mounting fails to mount; asks for stronger authentication Responsible-Changed-From-To: freebsd-fs->rmacklem Responsible-Changed-By: rmacklem Responsible-Changed-When: Mon Feb 20 16:55:59 UTC 2012 Responsible-Changed-Why: I have asked for feedback on this via email, so I might as well take it. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=164462 From owner-freebsd-fs@FreeBSD.ORG Mon Feb 20 19:20:18 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 990811065722 for ; Mon, 20 Feb 2012 19:20:16 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 0B74F8FC12 for ; Mon, 20 Feb 2012 19:20:16 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1KJKFRY058033 for ; Mon, 20 Feb 2012 19:20:15 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q1KJKFXE058032; Mon, 20 Feb 2012 19:20:15 GMT (envelope-from gnats) Date: Mon, 20 Feb 2012 19:20:15 GMT Message-Id: <201202201920.q1KJKFXE058032@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Mattias Lindgren Cc: Subject: Re: kern/149495: [zfs] chflags sappend on zfs not working right X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Mattias Lindgren List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Feb 2012 19:20:18 -0000 The following reply was made to PR kern/149495; it has been noted by GNATS. From: Mattias Lindgren To: bug-followup@FreeBSD.org, daniel@zhelev.biz Cc: Subject: Re: kern/149495: [zfs] chflags sappend on zfs not working right Date: Mon, 20 Feb 2012 11:49:52 -0700 --e0cb4efe31b482a54904b969c2c3 Content-Type: text/plain; charset=ISO-8859-1 Having similar issues in FreeBSD 9-AMD64 with ZFS v 28 $ mkdir critical $ touch critical/critical.log $ sudo chmod o= critical $ sudo chflags sappnd critical $ sudo chflags sappnd critical/* $ echo "test" > critical/critical.log -bash: critical/critical.log: Operation not permitted $ echo "test" >> critical/critical.log $ grep test critical/critical.log test $ rm -rf critical/critical.log $ ls -l critical/ total 0 Am under the impression that I should not be able to delete files once the sappend flag has been set. Please let me know if you'd like me to do further testing. Thanks, Mattias --e0cb4efe31b482a54904b969c2c3 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Having similar issues in FreeBSD 9-AMD64 with ZFS v 28

--e0cb4efe31b482a54904b969c2c3-- From owner-freebsd-fs@FreeBSD.ORG Tue Feb 21 02:38:04 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 94B86106566B for ; Tue, 21 Feb 2012 02:38:04 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx09.syd.optusnet.com.au (fallbackmx09.syd.optusnet.com.au [211.29.132.242]) by mx1.freebsd.org (Postfix) with ESMTP id 2C9AB8FC14 for ; Tue, 21 Feb 2012 02:38:03 +0000 (UTC) Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au [211.29.132.189]) by fallbackmx09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q1L2RvJx031027 for ; Tue, 21 Feb 2012 13:27:57 +1100 Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au (c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136]) by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q1L2Rrn7006192 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 21 Feb 2012 13:27:54 +1100 Date: Tue, 21 Feb 2012 13:27:53 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Mattias Lindgren In-Reply-To: <201202201920.q1KJKFXE058032@freefall.freebsd.org> Message-ID: <20120221111121.I2928@besplex.bde.org> References: <201202201920.q1KJKFXE058032@freefall.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@FreeBSD.org Subject: Re: kern/149495: [zfs] chflags sappend on zfs not working right X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Feb 2012 02:38:04 -0000 On Mon, 20 Feb 2012, Mattias Lindgren wrote: > Having similar issues in FreeBSD 9-AMD64 with ZFS v 28 > > $ mkdir critical > $ touch critical/critical.log > $ sudo chmod o= critical > > $ sudo chflags sappnd critical > $ sudo chflags sappnd critical/* > > $ echo "test" > critical/critical.log > -bash: critical/critical.log: Operation not permitted > $ echo "test" >> critical/critical.log > $ grep test critical/critical.log > test > $ rm -rf critical/critical.log > $ ls -l critical/ > total 0 > > Am under the impression that I should not be able to delete files once the > sappend flag has been set. It is a bug in 4.4BSD and ffs that [su]append prevents deleting files. Deletion of files should be prevented only by the [su]nounlink flag and the [su]immutable flag, but 4.4BSD didn't have the [su]nounlink flag, and it is insecure to allow unlinking any [su]append file, so 4.4BSD and ffs have the non-orthogonal behaviour of never allowing one to be unlinked, and this wasn't changed when [su]nounlink was added. This bug apparently isn't implemented in zfs. I don't know much about zfs, but zfs_zacces_delete() seems to only test the immutable and nounlink flags. Try adding the append flag there. Nearby bugs: - the [su]append flags have the silly abbreviations [su]appnd in chflags(1). ls -o output to show these flags will be wide anyway, and 1 character is not worth saving. The 1-char difference is just confusing for input. - the [su]nounlink flags have the much worse abbreviations [su]unlnk in chflags(1). Even the non-abbreviated forms [su]unlink are missing their 'no' prefix. So unlink means nounlink, and if you want to unset this, you use no[su]unlink which actually means nonounlink, that is, unlink, that is, unlinking is not restricted by the flag. 
The u prefix also makes uunlink hard to read. unounlink would be better.

The following is hopefully only in ffs (except in my version):

- setting of flags is non-orthogonal. Normal read-modify-write operations
don't work for users, although they work for root. The details of this bug
were changed between 4.4BSD-Lite1 and 4.4BSD-Lite2 and reached FreeBSD in
chflags(2)'s code in 1997 and in chflags(2)'s man page in 2006 (the latter
with grammar errors). This makes chflags(2) very difficult to use. Naive
programs like chflags(1) don't understand this, and just do a simple
read-modify-write operation. This gives weird behaviour which can be worked
around if you understand chflags(2) better than chflags(1) does.

For example: Suppose you have a file with some harmless system flag like
`archive' (this is the only one). This doesn't prevent anyone changing their
flags. But it prevents users changing their flags in the normal way.
"chflags uchg file" will fail because it is turned into a chflags(2) request
to set the existing archive flag as well as the uchg flag. As documented,
the former is not permitted. So to set your uchg flag while preserving the
archive flag (which you can't change either way), you must ask for the
archive flag to be cleared: "chflags noarch,uchg file" after first
determining which system flags are set. Similarly for using chflags(2),
except now you must clear all the system flags that are set, and can do this
more easily by setting all the system flags. Clearing all your flags is
easier: just ask for flags of 0 with chflags (either 1 or 2). However, if
you are root, then you must not request any system flags to be cleared
unless you actually want them cleared, since the request will actually work
for root. However2, since ffs has null support for the archive flag, setting
it for ffs is almost useless and rarely done, so the bug has little effect.

My version also allows user changes if only the sunlink flag is set (why
should preventing unlinking prevent chmod() when it doesn't prevent
truncating the file or filling it with garbage?). But I now think that this
is not a good idea and the change should go the other way, so that uunlink
prevents changes like sunlink does.
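To make the workaround concrete, a small sketch; the file name and flag state
are invented for illustration, assuming a user-owned file on ffs that root has
already marked with the system arch flag:

   $ chflags uchg demo           # chflags(1) reads the old flags and asks for
                                 # arch again, so the request is refused
   $ chflags noarch,uchg demo    # explicitly asking for arch to be cleared
                                 # succeeds; arch stays set (users can't change
                                 # it either way) and uchg is now set
   $ chflags 0 demo              # clearing all of your own flags is the easy case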
Bruce From owner-freebsd-fs@FreeBSD.ORG Tue Feb 21 13:46:37 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D2FBD106564A for ; Tue, 21 Feb 2012 13:46:37 +0000 (UTC) (envelope-from gkontos.mail@gmail.com) Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com [209.85.220.182]) by mx1.freebsd.org (Postfix) with ESMTP id 8640C8FC0A for ; Tue, 21 Feb 2012 13:46:37 +0000 (UTC) Received: by vcmm1 with SMTP id m1so6106840vcm.13 for ; Tue, 21 Feb 2012 05:46:36 -0800 (PST) Received-SPF: pass (google.com: domain of gkontos.mail@gmail.com designates 10.52.91.196 as permitted sender) client-ip=10.52.91.196; Authentication-Results: mr.google.com; spf=pass (google.com: domain of gkontos.mail@gmail.com designates 10.52.91.196 as permitted sender) smtp.mail=gkontos.mail@gmail.com; dkim=pass header.i=gkontos.mail@gmail.com Received: from mr.google.com ([10.52.91.196]) by 10.52.91.196 with SMTP id cg4mr11864730vdb.68.1329831996730 (num_hops = 1); Tue, 21 Feb 2012 05:46:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=wmHUeA3S9/dNVoRyVdhDRIXs8QD4QJXxUV0fMhDbnbo=; b=GYmTaxOAdzsh8xOXEvwHesrpvpSDPIONtOMi5nwIcMe2Pm9Z3AeR5kQeV2S5JjG4oN bfCZiejudOsinjZmH/ylLs27vsv3fOCMrf50eXLZx2HpvBqFN0MmKMjHCm6ruD3jMxX6 Fv6rNdh/XVk6c3SQBs1b1/N2p4gK5zBiU54mA= MIME-Version: 1.0 Received: by 10.52.91.196 with SMTP id cg4mr9600310vdb.68.1329831996670; Tue, 21 Feb 2012 05:46:36 -0800 (PST) Received: by 10.220.38.67 with HTTP; Tue, 21 Feb 2012 05:46:36 -0800 (PST) In-Reply-To: <4F4258DB.3010303@d902.iki.rssi.ru> References: <4F4258DB.3010303@d902.iki.rssi.ru> Date: Tue, 21 Feb 2012 15:46:36 +0200 Message-ID: From: George Kontostanos To: =?UTF-8?B?0KHQtdGA0LPQtdC5INCc0LjQutC70LDRiNC10LLQuNGH?= Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: HAST on raid-controller X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Feb 2012 13:46:37 -0000 On Mon, Feb 20, 2012 at 4:29 PM, =D0=A1=D0=B5=D1=80=D0=B3=D0=B5=D0=B9 =D0= =9C=D0=B8=D0=BA=D0=BB=D0=B0=D1=88=D0=B5=D0=B2=D0=B8=D1=87 wrote: > Hello! > > I tried to create hast-cluster on my test-servers. They have raid-control= les > Adaptec 2820SA, device aacd1. After creating /etc/hast.conf (much the sam= e > as in FreeBSD handbook) it isn't working with the message: > >>hastctl create reserve >>[ERROR] [reserve] Unable to open /dev/aacd1: Operation not permitted. > > Keep it in mind, can HAST work on raid-controllers (or raid-controllers > Adaptec)? > > With best regards, Sergey. > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" This doesn't appear to be a HAST error message. Can you create a FS in aacd= 1? 
--=20 George Kontostanos Aicom telecoms ltd http://www.aisecure.net From owner-freebsd-fs@FreeBSD.ORG Wed Feb 22 18:55:56 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CCF68106566C for ; Wed, 22 Feb 2012 18:55:56 +0000 (UTC) (envelope-from ian@ndwns.net) Received: from smtpauth.rollernet.us (smtpauth.rollernet.us [IPv6:2607:fe70:0:3::d]) by mx1.freebsd.org (Postfix) with ESMTP id AD8348FC1C for ; Wed, 22 Feb 2012 18:55:56 +0000 (UTC) Received: from smtpauth.rollernet.us (localhost [127.0.0.1]) by smtpauth.rollernet.us (Postfix) with ESMTP id EB73859446F for ; Wed, 22 Feb 2012 10:55:34 -0800 (PST) Received: from localhost (c-76-126-116-195.hsd1.ca.comcast.net [76.126.116.195]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by smtpauth.rollernet.us (Postfix) with ESMTPSA for ; Wed, 22 Feb 2012 10:55:34 -0800 (PST) Date: Wed, 22 Feb 2012 10:55:52 -0800 From: Ian Downes To: freebsd-fs@freebsd.org Message-ID: <20120222185552.GA86902@weta.local> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-Rollernet-Abuse: Processed by Roller Network Mail Services. Contact abuse@rollernet.us to report violations. Abuse policy: http://www.rollernet.us/policy X-Rollernet-Submit: Submit ID 5cf5.4f453a26.679b9.0 Subject: ZFS: arc_meta consumes *all* ram X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Feb 2012 18:55:56 -0000 Is vfs.zfs.arc_meta_limit supposed to be a (relatively) hard limit on cached metadata? I've limited the arc size with arc_max but how do I effectively limit the caching of meta data? Suggestions appreciated! details: ZFS is exceeding vfs.zfs.arc_meta_limit on some of my boxes; consuming all available RAM, paging everything out and bringing the system to its knees. $ uname -a FreeBSD local 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Fri Jul 8 00:54:56 UTC 2011 root@8.8.8.8:/usr/obj/usr/src/sys/XENHVM amd64 $ sysctl vfs.zfs | grep arc_meta vfs.zfs.arc_meta_limit: 1610612736 vfs.zfs.arc_meta_used: 12183379056 Note that this is 7-8X over arc_meta_limit and was all the available RAM on the box. This can be reproduced on several boxes (8.2-RELEASE patched to ZFS 5/28 and 9.0-RELEASE) when periodic/security/100.chksetuid runs and does a find over all filesystems. 
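For reference, a minimal sketch of how this knob is normally set and then
checked; the 1.5 GB figure simply restates the value shown above, it is not a
recommendation:

   # /boot/loader.conf
   vfs.zfs.arc_meta_limit="1610612736"    # ~1.5 GB ceiling for cached metadata

   # after boot, compare the target against what is actually held
   sysctl vfs.zfs.arc_meta_limit vfs.zfs.arc_meta_used

At least in these releases the limit behaves as a soft target for the eviction
code rather than a hard cap: metadata that is still referenced (for example by
cached vnodes, of which a recursive find creates a great many) cannot be
evicted, so arc_meta_used can run far past it.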
From owner-freebsd-fs@FreeBSD.ORG Fri Feb 24 11:42:11 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 76853106566B; Fri, 24 Feb 2012 11:42:11 +0000 (UTC) (envelope-from luke-lists@hybrid-logic.co.uk) Received: from hybrid-sites.com (ns225413.hybrid-sites.com [176.31.225.127]) by mx1.freebsd.org (Postfix) with ESMTP id 3DEF98FC0A; Fri, 24 Feb 2012 11:42:10 +0000 (UTC) Received: from [127.0.0.1] (helo=youse) by hybrid-sites.com with esmtp (Exim 4.72 (FreeBSD)) (envelope-from ) id 1S0szG-000EeU-ET; Fri, 24 Feb 2012 11:07:00 +0000 From: Luke Marsden To: "freebsd-stable@freebsd.org" Content-Type: text/plain; charset="UTF-8" Date: Fri, 24 Feb 2012 11:06:52 +0000 Message-ID: <1330081612.13430.39.camel@pow> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Spam-bar: / Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk Subject: Another ZFS ARC memory question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Feb 2012 11:42:11 -0000 Hi all, Just wanted to get your opinion on best practices for ZFS. We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines but have been having trouble with short spikes in application memory usage resulting in huge amounts of swapping, bringing the whole machine to its knees and crashing it hard. I suspect this is because when there is a sudden spike in memory usage the zfs arc reclaim thread is unable to free system memory fast enough. This most recently happened yesterday as you can see from the following munin graphs: E.g. http://hybrid-logic.co.uk/memory-day.png http://hybrid-logic.co.uk/swap-day.png Our response has been to start limiting the ZFS ARC cache to 4GB on our production machines - trading performance for stability is fine with me (and we have L2ARC on SSD so we still get good levels of caching). My questions are: * is this a known problem? * what is the community's advice for production machines running ZFS on FreeBSD, is manually limiting the ARC cache (to ensure that there's enough actually free memory to handle a spike in application memory usage) the best solution to this spike-in-memory-means-crash problem? * has FreeBSD 9.0 / ZFS v28 solved this problem? * rather than setting a hard limit on the ARC cache size, is it possible to adjust the auto-tuning variables to leave more free memory for spiky memory situations? e.g. set the auto-tuning to make arc eat 80% of memory instead of ~95% like it is at present? * could the arc reclaim thread be made to drop ARC pages with higher priority before the system starts swapping out application pages? 
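(For concreteness, the 4 GB cap described above is the loader tunable below,
and the counters next to it are the ones worth graphing alongside the swap
activity when chasing these spikes. A sketch only, with an illustrative value:

   # /boot/loader.conf
   vfs.zfs.arc_max="4294967296"    # cap the ARC at 4 GB

   # runtime numbers to watch next to the munin graphs
   sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max
   sysctl vm.stats.vm.v_free_count

Neither line is a fix for the reclaim-speed issue described above, just the
plumbing for the cap.)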
Thank you for any/all answers, and thank you for making FreeBSD awesome :-) Best Regards, Luke Marsden -- CTO, Hybrid Logic +447791750420 | +1-415-449-1165 | www.hybrid-cluster.com From owner-freebsd-fs@FreeBSD.ORG Fri Feb 24 12:30:14 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D7CB81065678 for ; Fri, 24 Feb 2012 12:30:14 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id A8F668FC0C for ; Fri, 24 Feb 2012 12:30:14 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1OCUEqW055017 for ; Fri, 24 Feb 2012 12:30:14 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q1OCUEN5055014; Fri, 24 Feb 2012 12:30:14 GMT (envelope-from gnats) Date: Fri, 24 Feb 2012 12:30:14 GMT Message-Id: <201202241230.q1OCUEN5055014@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Peter Maloney Cc: Subject: Re: kern/128173: [ext2fs] ls gives "Input/output error" on mounted ext3 filesystem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Peter Maloney List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Feb 2012 12:30:14 -0000 The following reply was made to PR kern/128173; it has been noted by GNATS. From: Peter Maloney To: bug-followup@FreeBSD.org, christope.cap@gmail.com Cc: Subject: Re: kern/128173: [ext2fs] ls gives "Input/output error" on mounted ext3 filesystem Date: Fri, 24 Feb 2012 13:23:59 +0100 I have a similar problem... but not with ls. # md5 biglonguglyfilename.zip MD5 (biglonguglyfilename.zip) = 511fdc3352d9265ffac0d472de7bb994 # md5 differentbiglonguglyfilename.zip md5: differentbiglonguglyfilename.zip: Input/output error These files can be read with no problem in Linux, whether I mount the system as ext3 or ext2. I did not create this file system; it was from a 3rd party. 
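For context, the filesystem is presumably attached in the usual way on the FreeBSD side, something like the following (the device name is a placeholder):

# kldload ext2fs
# mount -t ext2fs -o ro /dev/da1s1 /mnt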
Here is tune2fs output from a Linux machine: # tune2fs -l /dev/sdb1 tune2fs 1.41.14 (22-Dec-2010) Filesystem volume name: [snip] Last mounted on: Filesystem UUID: [snip] Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file Filesystem flags: signed_directory_hash Default mount options: (none) Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 244203520 Block count: 488378000 Reserved block count: 0 Free blocks: 16884957 Free inodes: 244183788 First block: 0 Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 907 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 16384 Inode blocks per group: 512 Filesystem created: Sun Oct 23 17:07:18 2011 Last mount time: Fri Feb 24 13:10:50 2012 Last write time: Fri Feb 24 13:10:50 2012 Mount count: 6 Maximum mount count: 21 Last checked: Sun Oct 23 17:07:18 2011 Check interval: 15552000 (6 months) Next check after: Fri Apr 20 17:07:18 2012 Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 128 Journal inode: 8 Default directory hash: tea Directory Hash Seed: 364481f6-7b5a-4cbb-89d7-7e50c112c884 Journal backup: inode blocks # uname -a FreeBSD smostank2.bc.local 8.2-STABLE-20120204 FreeBSD 8.2-STABLE-20120104 #0: Mon Feb 6 12:10:32 UTC 2012 root@bczfsvm1.bc.local:/usr/obj/usr/src/sys/GENERIC amd64 According to the man page of mkfs.ext4 in FreeBSD: E2fsprogs version 1.42 From owner-freebsd-fs@FreeBSD.ORG Fri Feb 24 12:44:42 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 377D5106566C for ; Fri, 24 Feb 2012 12:44:42 +0000 (UTC) (envelope-from luke-lists@hybrid-logic.co.uk) Received: from hybrid-sites.com (ns225413.hybrid-sites.com [176.31.225.127]) by mx1.freebsd.org (Postfix) with ESMTP id DC7FA8FC15 for ; Fri, 24 Feb 2012 12:44:41 +0000 (UTC) Received: from [127.0.0.1] (helo=youse) by hybrid-sites.com with esmtp (Exim 4.72 (FreeBSD)) (envelope-from ) id 1S0uVm-000InR-Ko; Fri, 24 Feb 2012 12:44:40 +0000 From: Luke Marsden To: Tom Evans In-Reply-To: References: <1330081612.13430.39.camel@pow> Content-Type: text/plain; charset="UTF-8" Date: Fri, 24 Feb 2012 12:44:30 +0000 Message-ID: <1330087470.13430.61.camel@pow> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Spam-bar: / Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk Subject: Re: Another ZFS ARC memory question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Feb 2012 12:44:42 -0000 On Fri, 2012-02-24 at 12:21 +0000, Tom Evans wrote: > On Fri, Feb 24, 2012 at 11:06 AM, Luke Marsden > wrote: > > Hi all, > > > > Just wanted to get your opinion on best practices for ZFS. > > > > We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines > > but have been having trouble with short spikes in application memory > > usage resulting in huge amounts of swapping, bringing the whole machine > > to its knees and crashing it hard. I suspect this is because when there > > is a sudden spike in memory usage the zfs arc reclaim thread is unable > > to free system memory fast enough. 
> > > > This most recently happened yesterday as you can see from the following > > munin graphs: > > > > E.g. http://hybrid-logic.co.uk/memory-day.png > > http://hybrid-logic.co.uk/swap-day.png > > > > Our response has been to start limiting the ZFS ARC cache to 4GB on our > > production machines - trading performance for stability is fine with me > > (and we have L2ARC on SSD so we still get good levels of caching). > > > > My questions are: > > > > * is this a known problem? > > * what is the community's advice for production machines running > > ZFS on FreeBSD, is manually limiting the ARC cache (to ensure > > that there's enough actually free memory to handle a spike in > > application memory usage) the best solution to this > > spike-in-memory-means-crash problem? > > * has FreeBSD 9.0 / ZFS v28 solved this problem? > > * rather than setting a hard limit on the ARC cache size, is it > > possible to adjust the auto-tuning variables to leave more free > > memory for spiky memory situations? e.g. set the auto-tuning to > > make arc eat 80% of memory instead of ~95% like it is at > > present? > > * could the arc reclaim thread be made to drop ARC pages with > > higher priority before the system starts swapping out > > application pages? > > > > Thank you for any/all answers, and thank you for making FreeBSD > > awesome :-) > > It's not a problem, it's a feature! > > By default the ARC will attempt to cache as much as it can - it > assumes the box is a ZFS filer, and doesn't need RAM for applications. > The solution, as you've found out, is to limit how much ARC can take > up. > > In practice, you should be doing this anyway. You should know, or have > an idea, of how much RAM is required for the applications on that box, > and you need to limit ZFS to not eat into that required RAM. Thanks for your reply, Tom! I agree that the ARC cache is a great feature, but for a general purpose filesystem it does seem like a reasonable expectation that filesystem cache will be evicted before application data is swapped, even if the spike in memory usage is rather aggressive. A complete server crash in this scenario is rather unfortunate. My question stands - is this an area which has been improved on in the ZFS v28 / FreeBSD 9.0 / upcoming FreeBSD 8.3 code, or should it be standard practice to guess how much memory the applications running on the server might need and set the arc_max boot.loader tweak appropriately? This is reasonably tricky when providing general purpose web application hosting and so we'll often end up erring on the side of caution and leaving lots of RAM free "just in case". If the latter is indeed the case in the latest stable releases then I would like to update http://wiki.freebsd.org/ZFSTuningGuide which currently states: FreeBSD 7.2+ has improved kernel memory allocation strategy and no tuning may be necessary on systems with more than 2 GB of RAM. Thank you! 
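For reference, the tuning that wiki passage is usually read as covering is the loader.conf kernel-memory sizing rather than the ARC cap itself; roughly, and with placeholder sizes only:

# /boot/loader.conf
# old-style kmem sizing, generally unnecessary on amd64 since 7.2:
# vm.kmem_size="12288M"
# vm.kmem_size_max="12288M"
# the knob that still needs an explicit decision on mixed-use machines:
vfs.zfs.arc_max="16384M"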
Best Regards, Luke Marsden -- CTO, Hybrid Logic +447791750420 | +1-415-449-1165 | www.hybrid-cluster.com From owner-freebsd-fs@FreeBSD.ORG Fri Feb 24 12:51:16 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C1A9F1065670 for ; Fri, 24 Feb 2012 12:51:16 +0000 (UTC) (envelope-from tevans.uk@googlemail.com) Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com [209.85.220.182]) by mx1.freebsd.org (Postfix) with ESMTP id 724B38FC16 for ; Fri, 24 Feb 2012 12:51:16 +0000 (UTC) Received: by vcge1 with SMTP id e1so64238vcg.13 for ; Fri, 24 Feb 2012 04:51:15 -0800 (PST) Received-SPF: pass (google.com: domain of tevans.uk@googlemail.com designates 10.52.27.99 as permitted sender) client-ip=10.52.27.99; Authentication-Results: mr.google.com; spf=pass (google.com: domain of tevans.uk@googlemail.com designates 10.52.27.99 as permitted sender) smtp.mail=tevans.uk@googlemail.com; dkim=pass header.i=tevans.uk@googlemail.com Received: from mr.google.com ([10.52.27.99]) by 10.52.27.99 with SMTP id s3mr1024903vdg.121.1330087875946 (num_hops = 1); Fri, 24 Feb 2012 04:51:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=2NW5fOcjjAb+cWWaIW+p+keCsFgDdnBAusvA+yIidiE=; b=DkROjfiz3did87s/fCHlXls2Z5hXV9+MIKVW77yaoHPtvTyuA9RJYQv3DU/AAfG2IZ aT3+XgZ3KFaFqlif4LhW+R+wNBHFsXETY1r4j4RRKIRGiuEjgr8tjP4LN27q3y9lPDxq MhFcvUQQlJ05ld33c0frrG8ySl/FT6IQSiEb8= MIME-Version: 1.0 Received: by 10.52.27.99 with SMTP id s3mr766254vdg.121.1330086098209; Fri, 24 Feb 2012 04:21:38 -0800 (PST) Received: by 10.52.91.210 with HTTP; Fri, 24 Feb 2012 04:21:38 -0800 (PST) In-Reply-To: <1330081612.13430.39.camel@pow> References: <1330081612.13430.39.camel@pow> Date: Fri, 24 Feb 2012 12:21:38 +0000 Message-ID: From: Tom Evans To: Luke Marsden Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk Subject: Re: Another ZFS ARC memory question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Feb 2012 12:51:16 -0000 On Fri, Feb 24, 2012 at 11:06 AM, Luke Marsden wrote: > Hi all, > > Just wanted to get your opinion on best practices for ZFS. > > We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines > but have been having trouble with short spikes in application memory > usage resulting in huge amounts of swapping, bringing the whole machine > to its knees and crashing it hard. =C2=A0I suspect this is because when t= here > is a sudden spike in memory usage the zfs arc reclaim thread is unable > to free system memory fast enough. > > This most recently happened yesterday as you can see from the following > munin graphs: > > E.g. http://hybrid-logic.co.uk/memory-day.png > =C2=A0 =C2=A0 http://hybrid-logic.co.uk/swap-day.png > > Our response has been to start limiting the ZFS ARC cache to 4GB on our > production machines - trading performance for stability is fine with me > (and we have L2ARC on SSD so we still get good levels of caching). > > My questions are: > > =C2=A0 =C2=A0 =C2=A0* is this a known problem? 
> =C2=A0 =C2=A0 =C2=A0* what is the community's advice for production machi= nes running > =C2=A0 =C2=A0 =C2=A0 =C2=A0ZFS on FreeBSD, is manually limiting the ARC c= ache (to ensure > =C2=A0 =C2=A0 =C2=A0 =C2=A0that there's enough actually free memory to ha= ndle a spike in > =C2=A0 =C2=A0 =C2=A0 =C2=A0application memory usage) the best solution to= this > =C2=A0 =C2=A0 =C2=A0 =C2=A0spike-in-memory-means-crash problem? > =C2=A0 =C2=A0 =C2=A0* has FreeBSD 9.0 / ZFS v28 solved this problem? > =C2=A0 =C2=A0 =C2=A0* rather than setting a hard limit on the ARC cache s= ize, is it > =C2=A0 =C2=A0 =C2=A0 =C2=A0possible to adjust the auto-tuning variables t= o leave more free > =C2=A0 =C2=A0 =C2=A0 =C2=A0memory for spiky memory situations? =C2=A0e.g.= set the auto-tuning to > =C2=A0 =C2=A0 =C2=A0 =C2=A0make arc eat 80% of memory instead of ~95% lik= e it is at > =C2=A0 =C2=A0 =C2=A0 =C2=A0present? > =C2=A0 =C2=A0 =C2=A0* could the arc reclaim thread be made to drop ARC pa= ges with > =C2=A0 =C2=A0 =C2=A0 =C2=A0higher priority before the system starts swapp= ing out > =C2=A0 =C2=A0 =C2=A0 =C2=A0application pages? > > Thank you for any/all answers, and thank you for making FreeBSD > awesome :-) It's not a problem, it's a feature! By default the ARC will attempt to cache as much as it can - it assumes the box is a ZFS filer, and doesn't need RAM for applications. The solution, as you've found out, is to limit how much ARC can take up. In practice, you should be doing this anyway. You should know, or have an idea, of how much RAM is required for the applications on that box, and you need to limit ZFS to not eat into that required RAM. Cheers Tom From owner-freebsd-fs@FreeBSD.ORG Fri Feb 24 12:59:02 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A80A5106566B for ; Fri, 24 Feb 2012 12:59:02 +0000 (UTC) (envelope-from tevans.uk@googlemail.com) Received: from mail-vw0-f54.google.com (mail-vw0-f54.google.com [209.85.212.54]) by mx1.freebsd.org (Postfix) with ESMTP id 515358FC15 for ; Fri, 24 Feb 2012 12:59:02 +0000 (UTC) Received: by vbbfa15 with SMTP id fa15so2199606vbb.13 for ; Fri, 24 Feb 2012 04:59:01 -0800 (PST) Received-SPF: pass (google.com: domain of tevans.uk@googlemail.com designates 10.52.20.201 as permitted sender) client-ip=10.52.20.201; Authentication-Results: mr.google.com; spf=pass (google.com: domain of tevans.uk@googlemail.com designates 10.52.20.201 as permitted sender) smtp.mail=tevans.uk@googlemail.com; dkim=pass header.i=tevans.uk@googlemail.com Received: from mr.google.com ([10.52.20.201]) by 10.52.20.201 with SMTP id p9mr1057169vde.87.1330088341628 (num_hops = 1); Fri, 24 Feb 2012 04:59:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=D/OfY5xw7DHqSwWR+2hY29Gb9PuM+0qjirBb5eVbgZ0=; b=BKI15w/5xse5a1vhqTkF/9fqHeo8mvZgioCyUHWI4A50tKYcKjmBxnWr9HAaLPTBhc 40dpP74xTKY3LGPoGjduTYNbRsz5c6eMOglLzZOl1CbeXPyMZ/EGom2suwvhHlJLkAJ3 faAuReg1fJfOqk/n8ydXEEgZnTVrp5ocQdyR8= MIME-Version: 1.0 Received: by 10.52.20.201 with SMTP id p9mr835212vde.87.1330088341287; Fri, 24 Feb 2012 04:59:01 -0800 (PST) Received: by 10.52.91.210 with HTTP; Fri, 24 Feb 2012 04:59:01 -0800 (PST) In-Reply-To: <1330087470.13430.61.camel@pow> References: <1330081612.13430.39.camel@pow> <1330087470.13430.61.camel@pow> Date: Fri, 24 Feb 2012 
12:59:01 +0000 Message-ID: From: Tom Evans To: Luke Marsden Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk Subject: Re: Another ZFS ARC memory question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Feb 2012 12:59:02 -0000 On Fri, Feb 24, 2012 at 12:44 PM, Luke Marsden wrote: > On Fri, 2012-02-24 at 12:21 +0000, Tom Evans wrote: >> On Fri, Feb 24, 2012 at 11:06 AM, Luke Marsden >> wrote: >> > Hi all, >> > >> > Just wanted to get your opinion on best practices for ZFS. >> > >> > We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines >> > but have been having trouble with short spikes in application memory >> > usage resulting in huge amounts of swapping, bringing the whole machin= e >> > to its knees and crashing it hard. =C2=A0I suspect this is because whe= n there >> > is a sudden spike in memory usage the zfs arc reclaim thread is unable >> > to free system memory fast enough. >> > >> > This most recently happened yesterday as you can see from the followin= g >> > munin graphs: >> > >> > E.g. http://hybrid-logic.co.uk/memory-day.png >> > =C2=A0 =C2=A0 http://hybrid-logic.co.uk/swap-day.png >> > >> > Our response has been to start limiting the ZFS ARC cache to 4GB on ou= r >> > production machines - trading performance for stability is fine with m= e >> > (and we have L2ARC on SSD so we still get good levels of caching). >> > >> > My questions are: >> > >> > =C2=A0 =C2=A0 =C2=A0* is this a known problem? >> > =C2=A0 =C2=A0 =C2=A0* what is the community's advice for production ma= chines running >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0ZFS on FreeBSD, is manually limiting the AR= C cache (to ensure >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0that there's enough actually free memory to= handle a spike in >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0application memory usage) the best solution= to this >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0spike-in-memory-means-crash problem? >> > =C2=A0 =C2=A0 =C2=A0* has FreeBSD 9.0 / ZFS v28 solved this problem? >> > =C2=A0 =C2=A0 =C2=A0* rather than setting a hard limit on the ARC cach= e size, is it >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0possible to adjust the auto-tuning variable= s to leave more free >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0memory for spiky memory situations? =C2=A0e= .g. set the auto-tuning to >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0make arc eat 80% of memory instead of ~95% = like it is at >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0present? >> > =C2=A0 =C2=A0 =C2=A0* could the arc reclaim thread be made to drop ARC= pages with >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0higher priority before the system starts sw= apping out >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0application pages? >> > >> > Thank you for any/all answers, and thank you for making FreeBSD >> > awesome :-) >> >> It's not a problem, it's a feature! >> >> By default the ARC will attempt to cache as much as it can - it >> assumes the box is a ZFS filer, and doesn't need RAM for applications. >> The solution, as you've found out, is to limit how much ARC can take >> up. >> >> In practice, you should be doing this anyway. You should know, or have >> an idea, of how much RAM is required for the applications on that box, >> and you need to limit ZFS to not eat into that required RAM. > > Thanks for your reply, Tom! 
=C2=A0I agree that the ARC cache is a great > feature, but for a general purpose filesystem it does seem like a > reasonable expectation that filesystem cache will be evicted before > application data is swapped, even if the spike in memory usage is rather > aggressive. =C2=A0A complete server crash in this scenario is rather > unfortunate. > > My question stands - is this an area which has been improved on in the > ZFS v28 / FreeBSD 9.0 / upcoming FreeBSD 8.3 code, or should it be > standard practice to guess how much memory the applications running on > the server might need and set the arc_max boot.loader tweak > appropriately? =C2=A0This is reasonably tricky when providing general pur= pose > web application hosting and so we'll often end up erring on the side of > caution and leaving lots of RAM free "just in case". > > If the latter is indeed the case in the latest stable releases then I > would like to update http://wiki.freebsd.org/ZFSTuningGuide which > currently states: > > =C2=A0 =C2=A0 =C2=A0 =C2=A0FreeBSD 7.2+ has improved kernel memory alloca= tion strategy and > =C2=A0 =C2=A0 =C2=A0 =C2=A0no tuning may be necessary on systems with mor= e than 2 GB of > =C2=A0 =C2=A0 =C2=A0 =C2=A0RAM. > > Thank you! > > Best Regards, > Luke Marsden > Hmm. That comment is really talking about that you no longer need to tune vm.kmem_size. I get what you are saying about applications suddenly using a lot of RAM should not cause the server to fall over. Do you know why it fell over? IE, was it a panic, a deadlock, etc. FreeBSD does not cope well when you have used up all RAM and swap (well, what does?), and from your graphs it does look like the ARC is not super massive when you had the problem - around 30-40% of RAM? Cheers Tom From owner-freebsd-fs@FreeBSD.ORG Fri Feb 24 13:42:25 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8739F1065670 for ; Fri, 24 Feb 2012 13:42:25 +0000 (UTC) (envelope-from luke-lists@hybrid-logic.co.uk) Received: from hybrid-sites.com (ns225413.hybrid-sites.com [176.31.225.127]) by mx1.freebsd.org (Postfix) with ESMTP id 2F6D38FC18 for ; Fri, 24 Feb 2012 13:42:24 +0000 (UTC) Received: from [127.0.0.1] (helo=youse) by hybrid-sites.com with esmtp (Exim 4.72 (FreeBSD)) (envelope-from ) id 1S0vPc-0009FP-56; Fri, 24 Feb 2012 13:42:22 +0000 From: Luke Marsden To: Tom Evans In-Reply-To: References: <1330081612.13430.39.camel@pow> <1330087470.13430.61.camel@pow> Content-Type: text/plain; charset="UTF-8" Date: Fri, 24 Feb 2012 13:42:14 +0000 Message-ID: <1330090934.13430.90.camel@pow> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Spam-bar: / Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk Subject: Re: Another ZFS ARC memory question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Feb 2012 13:42:25 -0000 On Fri, 2012-02-24 at 12:59 +0000, Tom Evans wrote: > On Fri, Feb 24, 2012 at 12:44 PM, Luke Marsden > wrote: > > On Fri, 2012-02-24 at 12:21 +0000, Tom Evans wrote: > >> On Fri, Feb 24, 2012 at 11:06 AM, Luke Marsden > >> wrote: > >> > Hi all, > >> > > >> > Just wanted to get your opinion on best practices for ZFS. 
> >> > > >> > We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines > >> > but have been having trouble with short spikes in application memory > >> > usage resulting in huge amounts of swapping, bringing the whole machine > >> > to its knees and crashing it hard. I suspect this is because when there > >> > is a sudden spike in memory usage the zfs arc reclaim thread is unable > >> > to free system memory fast enough. > >> > > >> > This most recently happened yesterday as you can see from the following > >> > munin graphs: > >> > > >> > E.g. http://hybrid-logic.co.uk/memory-day.png > >> > http://hybrid-logic.co.uk/swap-day.png > >> > > >> > Our response has been to start limiting the ZFS ARC cache to 4GB on our > >> > production machines - trading performance for stability is fine with me > >> > (and we have L2ARC on SSD so we still get good levels of caching). > >> > > >> > My questions are: > >> > > >> > * is this a known problem? > >> > * what is the community's advice for production machines running > >> > ZFS on FreeBSD, is manually limiting the ARC cache (to ensure > >> > that there's enough actually free memory to handle a spike in > >> > application memory usage) the best solution to this > >> > spike-in-memory-means-crash problem? > >> > * has FreeBSD 9.0 / ZFS v28 solved this problem? > >> > * rather than setting a hard limit on the ARC cache size, is it > >> > possible to adjust the auto-tuning variables to leave more free > >> > memory for spiky memory situations? e.g. set the auto-tuning to > >> > make arc eat 80% of memory instead of ~95% like it is at > >> > present? > >> > * could the arc reclaim thread be made to drop ARC pages with > >> > higher priority before the system starts swapping out > >> > application pages? > >> > > >> > Thank you for any/all answers, and thank you for making FreeBSD > >> > awesome :-) > >> > >> It's not a problem, it's a feature! > >> > >> By default the ARC will attempt to cache as much as it can - it > >> assumes the box is a ZFS filer, and doesn't need RAM for applications. > >> The solution, as you've found out, is to limit how much ARC can take > >> up. > >> > >> In practice, you should be doing this anyway. You should know, or have > >> an idea, of how much RAM is required for the applications on that box, > >> and you need to limit ZFS to not eat into that required RAM. > > > > Thanks for your reply, Tom! I agree that the ARC cache is a great > > feature, but for a general purpose filesystem it does seem like a > > reasonable expectation that filesystem cache will be evicted before > > application data is swapped, even if the spike in memory usage is rather > > aggressive. A complete server crash in this scenario is rather > > unfortunate. > > > > My question stands - is this an area which has been improved on in the > > ZFS v28 / FreeBSD 9.0 / upcoming FreeBSD 8.3 code, or should it be > > standard practice to guess how much memory the applications running on > > the server might need and set the arc_max boot.loader tweak > > appropriately? This is reasonably tricky when providing general purpose > > web application hosting and so we'll often end up erring on the side of > > caution and leaving lots of RAM free "just in case". 
> > > > If the latter is indeed the case in the latest stable releases then I > > would like to update http://wiki.freebsd.org/ZFSTuningGuide which > > currently states: > > > > FreeBSD 7.2+ has improved kernel memory allocation strategy and > > no tuning may be necessary on systems with more than 2 GB of > > RAM. > > > > Thank you! > > > > Best Regards, > > Luke Marsden > > > > Hmm. That comment is really talking about that you no longer need to > tune vm.kmem_size. http://wiki.freebsd.org/ZFSTuningGuide "No tuning may be necessary" seems to indicate that no changes need to be made to boot.loader. I'm happy to provide a patch for the wiki which makes it clearer that for servers which may experience sudden spikes in application memory usage (i.e. all servers running user-supplied applications), the speed of ARC eviction is insufficient to ensure stability and arc_max should be tuned downwards. > I get what you are saying about applications suddenly using a lot of > RAM should not cause the server to fall over. Do you know why it fell > over? IE, was it a panic, a deadlock, etc. If you look at the http://hybrid-logic.co.uk/swap-day.png graph you can see a huge spike in swap at the point at which the last line of pixels at http://hybrid-logic.co.uk/memory-day.png indicates the sudden increase in memory usage (by 3GB in active memory usage if you look closely). Since the graph stops at that point it indicates that the server became completely unresponsive (e.g. including munin probe requests). I did manage to log in just before it became completely unresponsive, but at that point the incoming requests weren't being serviced fast enough due to the excessive swapping and the server eventually became completely unresponsive (e.g. 'top' output froze and never came back). It continued to respond to pings though and may have eventually recovered if I had disabled inbound network traffic. I don't have any evidence of a panic or deadlock, we just hard rebooted the machine about 15 minutes later after it failed to recover from the swap-storm. > FreeBSD does not cope well when you have used up all RAM and swap > (well, what does?), and from your graphs it does look like the ARC is > not super massive when you had the problem - around 30-40% of RAM? The last munin sample indicates roughly 8.5GB ARC out of 24GB, so yes, 35%. I guess what I'd like is for FreeBSD to detect an emergency out-of-memory condition and aggressively drop much or all of the ARC cache *before* swapping out application memory which causes the system to grind to a halt. Is this a reasonable request, and is there anything I can do to help implement it? If not can we update the wiki to make it clearer that ARC limiting is necessary, even with high RAM boxes, to ensure stability under spiky memory conditions? Thanks! 
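One low-tech way to see this failure mode coming, using only stock tools (the log path and interval are arbitrary):

# log ARC size, free pages and swap use once a minute
$ while :; do
>   date
>   sysctl -n kstat.zfs.misc.arcstats.size vm.stats.vm.v_free_count
>   swapinfo -k
>   sleep 60
> done >> /var/tmp/arc-watch.log

A log like this at least shows whether the ARC had already shrunk by the time the swap spike hit, or whether the reclaim thread never caught up.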
Best Regards, Luke Marsden -- CTO, Hybrid Logic +447791750420 | +1-415-449-1165 | www.hybrid-cluster.com From owner-freebsd-fs@FreeBSD.ORG Fri Feb 24 18:04:00 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 10750106566C for ; Fri, 24 Feb 2012 18:04:00 +0000 (UTC) (envelope-from ian@ndwns.net) Received: from smtpauth.rollernet.us (smtpauth.rollernet.us [IPv6:2607:fe70:0:3::d]) by mx1.freebsd.org (Postfix) with ESMTP id AD7648FC12 for ; Fri, 24 Feb 2012 18:03:59 +0000 (UTC) Received: from smtpauth.rollernet.us (localhost [127.0.0.1]) by smtpauth.rollernet.us (Postfix) with ESMTP id B9040594002; Fri, 24 Feb 2012 10:03:29 -0800 (PST) Received: from localhost (c-76-126-116-195.hsd1.ca.comcast.net [76.126.116.195]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by smtpauth.rollernet.us (Postfix) with ESMTPSA; Fri, 24 Feb 2012 10:03:28 -0800 (PST) Date: Fri, 24 Feb 2012 10:03:46 -0800 From: Ian Downes To: Luke Marsden Message-ID: <20120224180346.GA83845@weta.local> References: <1330081612.13430.39.camel@pow> <1330087470.13430.61.camel@pow> <1330090934.13430.90.camel@pow> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1330090934.13430.90.camel@pow> User-Agent: Mutt/1.5.21 (2010-09-15) X-Rollernet-Abuse: Processed by Roller Network Mail Services. Contact abuse@rollernet.us to report violations. Abuse policy: http://www.rollernet.us/policy X-Rollernet-Submit: Submit ID 5a5e.4f47d0f0.6ea62.0 Cc: Tom Evans , freebsd-fs@freebsd.org, team@hybrid-logic.co.uk Subject: Re: Another ZFS ARC memory question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Feb 2012 18:04:00 -0000 On Fri, Feb 24, 2012 at 01:42:14PM +0000, Luke Marsden wrote: > On Fri, 2012-02-24 at 12:59 +0000, Tom Evans wrote: > > On Fri, Feb 24, 2012 at 12:44 PM, Luke Marsden > > wrote: > > > On Fri, 2012-02-24 at 12:21 +0000, Tom Evans wrote: > > >> On Fri, Feb 24, 2012 at 11:06 AM, Luke Marsden > > >> wrote: > > >> > Hi all, > > >> > > > >> > Just wanted to get your opinion on best practices for ZFS. > > >> > > > >> > We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines > > >> > but have been having trouble with short spikes in application memory > > >> > usage resulting in huge amounts of swapping, bringing the whole machine > > >> > to its knees and crashing it hard. I suspect this is because when there > > >> > is a sudden spike in memory usage the zfs arc reclaim thread is unable > > >> > to free system memory fast enough. > > >> > > > >> > This most recently happened yesterday as you can see from the following > > >> > munin graphs: > > >> > > > >> > E.g. http://hybrid-logic.co.uk/memory-day.png > > >> > http://hybrid-logic.co.uk/swap-day.png > > >> > > > >> > Our response has been to start limiting the ZFS ARC cache to 4GB on our > > >> > production machines - trading performance for stability is fine with me > > >> > (and we have L2ARC on SSD so we still get good levels of caching). > > >> > > > >> > My questions are: > > >> > > > >> > * is this a known problem? 
> > >> > * what is the community's advice for production machines running > > >> > ZFS on FreeBSD, is manually limiting the ARC cache (to ensure > > >> > that there's enough actually free memory to handle a spike in > > >> > application memory usage) the best solution to this > > >> > spike-in-memory-means-crash problem? > > >> > * has FreeBSD 9.0 / ZFS v28 solved this problem? > > >> > * rather than setting a hard limit on the ARC cache size, is it > > >> > possible to adjust the auto-tuning variables to leave more free > > >> > memory for spiky memory situations? e.g. set the auto-tuning to > > >> > make arc eat 80% of memory instead of ~95% like it is at > > >> > present? > > >> > * could the arc reclaim thread be made to drop ARC pages with > > >> > higher priority before the system starts swapping out > > >> > application pages? > > >> > > > >> > Thank you for any/all answers, and thank you for making FreeBSD > > >> > awesome :-) > > >> > > >> It's not a problem, it's a feature! > > >> > > >> By default the ARC will attempt to cache as much as it can - it > > >> assumes the box is a ZFS filer, and doesn't need RAM for applications. > > >> The solution, as you've found out, is to limit how much ARC can take > > >> up. > > >> > > >> In practice, you should be doing this anyway. You should know, or have > > >> an idea, of how much RAM is required for the applications on that box, > > >> and you need to limit ZFS to not eat into that required RAM. > > > > > > Thanks for your reply, Tom! I agree that the ARC cache is a great > > > feature, but for a general purpose filesystem it does seem like a > > > reasonable expectation that filesystem cache will be evicted before > > > application data is swapped, even if the spike in memory usage is rather > > > aggressive. A complete server crash in this scenario is rather > > > unfortunate. > > > > > > My question stands - is this an area which has been improved on in the > > > ZFS v28 / FreeBSD 9.0 / upcoming FreeBSD 8.3 code, or should it be > > > standard practice to guess how much memory the applications running on > > > the server might need and set the arc_max boot.loader tweak > > > appropriately? This is reasonably tricky when providing general purpose > > > web application hosting and so we'll often end up erring on the side of > > > caution and leaving lots of RAM free "just in case". > > > > > > If the latter is indeed the case in the latest stable releases then I > > > would like to update http://wiki.freebsd.org/ZFSTuningGuide which > > > currently states: > > > > > > FreeBSD 7.2+ has improved kernel memory allocation strategy and > > > no tuning may be necessary on systems with more than 2 GB of > > > RAM. > > > > > > Thank you! > > > > > > Best Regards, > > > Luke Marsden > > > > > > > Hmm. That comment is really talking about that you no longer need to > > tune vm.kmem_size. > > http://wiki.freebsd.org/ZFSTuningGuide > > "No tuning may be necessary" seems to indicate that no changes need to > be made to boot.loader. I'm happy to provide a patch for the wiki which > makes it clearer that for servers which may experience sudden spikes in > application memory usage (i.e. all servers running user-supplied > applications), the speed of ARC eviction is insufficient to ensure > stability and arc_max should be tuned downwards. > > > I get what you are saying about applications suddenly using a lot of > > RAM should not cause the server to fall over. Do you know why it fell > > over? IE, was it a panic, a deadlock, etc. 
> > If you look at the http://hybrid-logic.co.uk/swap-day.png graph you can > see a huge spike in swap at the point at which the last line of pixels > at http://hybrid-logic.co.uk/memory-day.png indicates the sudden > increase in memory usage (by 3GB in active memory usage if you look > closely). Since the graph stops at that point it indicates that the > server became completely unresponsive (e.g. including munin probe > requests). I did manage to log in just before it became completely > unresponsive, but at that point the incoming requests weren't being > serviced fast enough due to the excessive swapping and the server > eventually became completely unresponsive (e.g. 'top' output froze and > never came back). It continued to respond to pings though and may have > eventually recovered if I had disabled inbound network traffic. I don't > have any evidence of a panic or deadlock, we just hard rebooted the > machine about 15 minutes later after it failed to recover from the > swap-storm. > > > FreeBSD does not cope well when you have used up all RAM and swap > > (well, what does?), and from your graphs it does look like the ARC is > > not super massive when you had the problem - around 30-40% of RAM? > > The last munin sample indicates roughly 8.5GB ARC out of 24GB, so yes, > 35%. I guess what I'd like is for FreeBSD to detect an emergency > out-of-memory condition and aggressively drop much or all of the ARC > cache *before* swapping out application memory which causes the system > to grind to a halt. > > Is this a reasonable request, and is there anything I can do to help > implement it? > > If not can we update the wiki to make it clearer that ARC limiting is > necessary, even with high RAM boxes, to ensure stability under spiky > memory conditions? > Are you sure that it is the ARC data that is causing the issue? I've got boxes where the ARC *meta* skyrockets and consumes all RAM, greatly exceeding the arc_meta_limit. E.g. on a very unresponsive local box: vfs.zfs.arc_meta_limit: 1610612736 vfs.zfs.arc_meta_used: 12183379056 Setting arc_max helps (and seems to be respected), but I don't know why arc_meta_used exceeds arc_meta_limit. 
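A rough way to reproduce and watch the overshoot, assuming a pool mounted under /pool; the path and the find predicate are placeholders, and 100.chksetuid's own invocation is more involved:

# terminal 1: a metadata-heavy walk, similar in spirit to 100.chksetuid
$ find /pool -x -type f -perm -4000 > /dev/null
# terminal 2: watch the metadata counters drift past the limit
$ while :; do sysctl -n vfs.zfs.arc_meta_used vfs.zfs.arc_meta_limit; sleep 10; done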
Ian From owner-freebsd-fs@FreeBSD.ORG Sat Feb 25 04:37:15 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A0157106566C for ; Sat, 25 Feb 2012 04:37:15 +0000 (UTC) (envelope-from Kamil.Choudhury@anserinae.net) Received: from hrndva-omtalb.mail.rr.com (hrndva-omtalb.mail.rr.com [71.74.56.122]) by mx1.freebsd.org (Postfix) with ESMTP id 604678FC0C for ; Sat, 25 Feb 2012 04:37:15 +0000 (UTC) X-Authority-Analysis: v=2.0 cv=Z7xu7QtA c=1 sm=0 a=qe0RvMpo0P4Rp0DQO452oA==:17 a=IYgu6Z7xpcEA:10 a=egyE7zw0hOcA:10 a=WWGGoYozHbgA:10 a=kj9zAlcOel0A:10 a=xqWC_Br6kY4A:10 a=VDMU8vR1T1jl0vMf1V4A:9 a=CjuIK1q_8ugA:10 a=qe0RvMpo0P4Rp0DQO452oA==:117 X-Cloudmark-Score: 0 X-Originating-IP: 68.173.236.44 Received: from [68.173.236.44] ([68.173.236.44:50651] helo=janus.anserinae.net) by hrndva-oedge02.mail.rr.com (envelope-from ) (ecelerity 2.2.3.46 r()) with ESMTP id B3/B6-04292-A75684F4; Sat, 25 Feb 2012 04:37:14 +0000 Received: from JANUS.anserinae.net ([fe80::192c:4b89:9fe9:dc6d]) by janus.anserinae.net ([fe80::192c:4b89:9fe9:dc6d%11]) with mapi; Fri, 24 Feb 2012 23:37:02 -0500 From: Kamil Choudhury To: "freebsd-fs@freebsd.org" Thread-Topic: Distributed, snapshotting, checksumming filesystems for FreeBSD Thread-Index: Aczzc7P/LgC0HATgSXWQqxJMHVprPA== Date: Sat, 25 Feb 2012 04:37:03 +0000 Message-ID: <3CEE2DA4348D944399A67E308B78D38A1A57CABA@janus.anserinae.net> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Distributed, snapshotting, checksumming filesystems for FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2012 04:37:15 -0000 The dream: a file system spread out over a variable, ever increasing number of hosts, presenting a single unified file system to any client host mounting the file system. From the client's point of view, it is possible to snapshot the directory view that is presented.
The client also has confidence that data written to the file system will be returned exactly as it went in. Now that I think about it, what I seem to be looking for is a network aware ZFS that uses hosts as vdevs. Is there such a thing out there? Kamil From owner-freebsd-fs@FreeBSD.ORG Sat Feb 25 08:42:13 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3B639106564A for ; Sat, 25 Feb 2012 08:42:13 +0000 (UTC) (envelope-from peter@pean.org) Received: from system.jails.se (system.jails.se [IPv6:2001:16d8:cc1e:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id DEF708FC16 for ; Sat, 25 Feb 2012 08:42:12 +0000 (UTC) Received: from localhost (system.jails.se [91.205.63.85]) by system.jails.se (Postfix) with SMTP id 9A3D321BB8B for ; Sat, 25 Feb 2012 09:42:10 +0100 (CET) Received: from [172.25.0.25] (c-1105e155.166-7-64736c14.cust.bredbandsbolaget.se [85.225.5.17]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by system.jails.se (Postfix) with ESMTPSA id CDCB521BB81 for ; Sat, 25 Feb 2012 09:42:09 +0100 (CET) From: =?iso-8859-1?Q?Peter_Ankerst=E5l?= Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Date: Sat, 25 Feb 2012 09:42:08 +0100 Message-Id: <3E3E4094-77E2-490B-9574-5B95ECDED447@pean.org> To: freebsd-fs@freebsd.org Mime-Version: 1.0 (Apple Message framework v1251.1) X-Mailer: Apple Mail (2.1251.1) X-DSPAM-Result: Innocent X-DSPAM-Processed: Sat Feb 25 09:42:10 2012 X-DSPAM-Confidence: 1.0000 X-DSPAM-Probability: 0.0023 X-DSPAM-Signature: 4f489ee226816799614642 X-DSPAM-Factors: 27, D, 0.40000, Received*cipher+AES128, 0.40000, should+use, 0.40000, Mime-Version*Message, 0.40000, Message-Id*490B+9574, 0.40000, in+conflict, 0.40000, disks+not, 0.40000, http+//lists, 0.40000, http+//lists, 0.40000, not+partitions!, 0.40000, Hi+Now, 0.40000, of, 0.40000, But, 0.40000, But, 0.40000, Received*2012, 0.40000, says, 0.40000, Subject*zfs+confusion., 0.40000, And+then, 0.40000, Subject*confusion., 0.40000, Received*client+certificate, 0.40000, he, 0.40000, to+like, 0.40000, X-Mailer*(2.1251.1), 0.40000, And, 0.40000, this+seems, 0.40000, Jason+doesn't, 0.40000, use, 0.40000 Subject: glabel, gpart and zfs confusion. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2012 08:42:13 -0000 Hi, Now I'm really confused. I want in some way to label my drives so the setup is independent of physical setup. But Jason doesn't seem to like glabel at all. :D http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013574.html And then he says that you should use gpart instead http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013578.html But this seems to be in conflict with the common knowledge that zfs should be used on whole disks, not partitions!
Any pointers?=20= From owner-freebsd-fs@FreeBSD.ORG Sat Feb 25 09:04:17 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7478D106566B for ; Sat, 25 Feb 2012 09:04:17 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from mo-p05-ob6.rzone.de (mo-p05-ob6.rzone.de [IPv6:2a01:238:20a:202:53f5::1]) by mx1.freebsd.org (Postfix) with ESMTP id 09A028FC0A for ; Sat, 25 Feb 2012 09:04:16 +0000 (UTC) X-RZG-AUTH: :LWIKdA2leu0bPbLmhzXgqn0MTG6qiKEwQRWfNxSw4HzYIwjsnvdDt2oX8drk23mufkcHTOex6w== X-RZG-CLASS-ID: mo05 Received: from [192.168.179.39] (hmbg-5f766895.pool.mediaWays.net [95.118.104.149]) by post.strato.de (mrclete mo2) (RZmta 27.7 DYNA|AUTH) with (DHE-RSA-AES128-SHA encrypted) ESMTPA id Z0524co1P8I98G for ; Sat, 25 Feb 2012 10:04:03 +0100 (MET) Message-ID: <4F48A402.70009@brockmann-consult.de> Date: Sat, 25 Feb 2012 10:04:02 +0100 From: Peter Maloney User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <3E3E4094-77E2-490B-9574-5B95ECDED447@pean.org> In-Reply-To: <3E3E4094-77E2-490B-9574-5B95ECDED447@pean.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Subject: Re: glabel, gpart and zfs confusion. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2012 09:04:17 -0000 In Solaris, I've read that the IO system is designed such that a some commands (eg. flush of a partition) does not necessarily flush the disk's write cache... like the command can't move up the chain. So if you put zfs on a partition, you can get data loss (eg. transaction rollback required and probably no corruption). In FreeBSD, things are different I am told, without the above limitation. So you can happily put zfs on partitions, and the zfs code can keep your data safe. I haven't had data loss with system panics during sync writes with my ZIL on a partition, so I guess this must be true. People say that glabel is buggy/a hack. But I haven't had any problems myself. So they suggest using gpt to label your disks. I find that sometimes your gpt labels get eaten though, and you end up with gptid in your zpool status output. For labels to get eaten, you need to import the pool elsewhere with -f usually. And maybe this only applies to the root pool in most cases (but I definitely had one other case when it happened to a different pool). There is something you can add to /boot/loader.conf to get rid of the gptids... but I am hesitant to use it... because what happens when you have 2 identical labels and gptid is gone? eg. NAME STATE READ WRITE CKSUM zroot DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 gptid/bcc6c93a-f332-11e0-a5b6-0025900edbca OFFLINE 0 0 0 gptid/4629fb4b-f596-11e0-a5b6-0025900edbca OFFLINE 0 0 0 gpt/root2 ONLINE 0 0 0 gpt/root3 ONLINE 0 0 0 And also if a whole disk goes bad, and you try to replace it with another whole disk that is 1 byte smaller, it won't allow you to do that. So if you use gpart and create a slightly smaller partition, you get the advantage of being able to replace disks with smaller ones later. For new systems, I am using gpt labels. And if the gptid thing appears, I just ignore it. Am 25.02.2012 09:42, schrieb Peter Ankerstål: > Hi, > > Now Im really confused. 
> > I want in some way label my drives so the setup is independent of physical setup. But Jason doesn't > seem to like glabel at all. :D > http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013574.html > > And then he says that you should use gpart instead > http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013578.html > > But this seems to be in conflict with the common knowledge that zfs should > be used on whole disks, not partitions! > > Any pointers? > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Sat Feb 25 13:31:02 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AED63106564A for ; Sat, 25 Feb 2012 13:31:02 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from mo-p05-ob6.rzone.de (mo-p05-ob6.rzone.de [IPv6:2a01:238:20a:202:53f5::1]) by mx1.freebsd.org (Postfix) with ESMTP id EB6C98FC0A for ; Sat, 25 Feb 2012 13:31:01 +0000 (UTC) X-RZG-AUTH: :LWIKdA2leu0bPbLmhzXgqn0MTG6qiKEwQRWfNxSw4HzYIwjsnvdDt2oX8drk23mufkcHTOex6w== X-RZG-CLASS-ID: mo05 Received: from [192.168.179.39] (hmbg-5f766895.pool.mediaWays.net [95.118.104.149]) by smtp.strato.de (klopstock mo30) (RZmta 27.7 DYNA|AUTH) with (DHE-RSA-AES128-SHA encrypted) ESMTPA id n00a43o1PBfvkc for ; Sat, 25 Feb 2012 14:30:56 +0100 (MET) Message-ID: <4F48E28F.9090600@brockmann-consult.de> Date: Sat, 25 Feb 2012 14:30:55 +0100 From: Peter Maloney User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <3E3E4094-77E2-490B-9574-5B95ECDED447@pean.org> <4F48A402.70009@brockmann-consult.de> In-Reply-To: <4F48A402.70009@brockmann-consult.de> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Re: glabel, gpart and zfs confusion. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2012 13:31:02 -0000 And btw. related but not an answer to your question... >From the thread you mentioned:/ />/ # zpool attach tank label/m00-d00 label/m00-d01 />/ cannot use '/dev/label/m00-d01': must be a GEOM provider or regular file />/ />/ # glabel label m00-d01 /dev/da2s3 />/ glabel: Can't store metadata on /dev/da2s3: Invalid argument. />/ />/ # sysctl kern.geom.debugflags=17 />/ kern.geom.debugflags: 0 -> 17 />/ />/ # dd if=/dev/zero of=/dev/da2s3 />/ dd: /dev/da2s3: Invalid argument / My guess is that if you exported the pool, the "Invalid argument" errors would go away. / / Am 25.02.2012 10:04, schrieb Peter Maloney: > In Solaris, I've read that the IO system is designed such that a some > commands (eg. flush of a partition) does not necessarily flush the > disk's write cache... like the command can't move up the chain. So if > you put zfs on a partition, you can get data loss (eg. transaction > rollback required and probably no corruption). > > In FreeBSD, things are different I am told, without the above > limitation. So you can happily put zfs on partitions, and the zfs code > can keep your data safe. 
I haven't had data loss with system panics > during sync writes with my ZIL on a partition, so I guess this must be true. > > People say that glabel is buggy/a hack. But I haven't had any problems > myself. So they suggest using gpt to label your disks. I find that > sometimes your gpt labels get eaten though, and you end up with gptid in > your zpool status output. For labels to get eaten, you need to import > the pool elsewhere with -f usually. And maybe this only applies to the > root pool in most cases (but I definitely had one other case when it > happened to a different pool). There is something you can add to > /boot/loader.conf to get rid of the gptids... but I am hesitant to use > it... because what happens when you have 2 identical labels and gptid is > gone? > > eg. > > NAME STATE READ > WRITE CKSUM > zroot DEGRADED > 0 0 0 > mirror-0 DEGRADED > 0 0 0 > gptid/bcc6c93a-f332-11e0-a5b6-0025900edbca OFFLINE > 0 0 0 > gptid/4629fb4b-f596-11e0-a5b6-0025900edbca OFFLINE > 0 0 0 > gpt/root2 ONLINE > 0 0 0 > gpt/root3 ONLINE > 0 0 0 > > And also if a whole disk goes bad, and you try to replace it with > another whole disk that is 1 byte smaller, it won't allow you to do > that. So if you use gpart and create a slightly smaller partition, you > get the advantage of being able to replace disks with smaller ones later. > > For new systems, I am using gpt labels. And if the gptid thing appears, > I just ignore it. > > > Am 25.02.2012 09:42, schrieb Peter Ankerstål: >> Hi, >> >> Now Im really confused. >> >> I want in some way label my drives so the setup is independent of physical setup. But Jason doesn't >> seem to like glabel at all. :D >> http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013574.html >> >> And then he says that you should use gpart instead >> http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013578.html >> >> But this seems to be in conflict with the common knowledge that zfs should >> be used on whole disks, not partitions! >> >> Any pointers? 
>> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Sat Feb 25 15:24:41 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 32850106564A; Sat, 25 Feb 2012 15:24:41 +0000 (UTC) (envelope-from eadler@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 064CB8FC12; Sat, 25 Feb 2012 15:24:41 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1PFOeod089705; Sat, 25 Feb 2012 15:24:40 GMT (envelope-from eadler@freefall.freebsd.org) Received: (from eadler@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q1PFOeRs089701; Sat, 25 Feb 2012 15:24:40 GMT (envelope-from eadler) Date: Sat, 25 Feb 2012 15:24:40 GMT Message-Id: <201202251524.q1PFOeRs089701@freefall.freebsd.org> To: eadler@FreeBSD.org, eadler@FreeBSD.org, freebsd-fs@FreeBSD.org From: eadler@FreeBSD.org Cc: Subject: Re: kern/165392: Multiple mkdir/rmdir fails with errno 31 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2012 15:24:41 -0000 Synopsis: Multiple mkdir/rmdir fails with errno 31 Responsible-Changed-From-To: eadler->freebsd-fs Responsible-Changed-By: eadler Responsible-Changed-When: Sat Feb 25 15:24:40 UTC 2012 Responsible-Changed-Why: I'm not going to have time to look into this soon enough http://www.freebsd.org/cgi/query-pr.cgi?pr=165392 From owner-freebsd-fs@FreeBSD.ORG Sat Feb 25 15:56:57 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 38C30106566C for ; Sat, 25 Feb 2012 15:56:57 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id ECBE98FC08 for ; Sat, 25 Feb 2012 15:56:56 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id q1PFuqn6015039; Sat, 25 Feb 2012 09:56:52 -0600 (CST) Date: Sat, 25 Feb 2012 09:56:52 -0600 (CST) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Peter Maloney In-Reply-To: <4F48A402.70009@brockmann-consult.de> Message-ID: References: <3E3E4094-77E2-490B-9574-5B95ECDED447@pean.org> <4F48A402.70009@brockmann-consult.de> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Sat, 25 Feb 2012 09:56:52 -0600 (CST) Cc: freebsd-fs@freebsd.org Subject: Re: glabel, gpart and zfs confusion. 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2012 15:56:57 -0000 On Sat, 25 Feb 2012, Peter Maloney wrote: > In Solaris, I've read that the IO system is designed such that a some > commands (eg. flush of a partition) does not necessarily flush the > disk's write cache... like the command can't move up the chain. So if > you put zfs on a partition, you can get data loss (eg. transaction > rollback required and probably no corruption). I wonder where you read that since it seems like bad information? In Solaris, if zfs uses a partition (rather than the whole disk), the disk write cache is not enabled by default due to the possibility that some other partition uses a legacy filesystem like UFS, which could become inconsistent and corrupted if the write cache is enabled. The drawback then becomes that zfs writes are likely to incur more latency. > In FreeBSD, things are different I am told, without the above > limitation. So you can happily put zfs on partitions, and the zfs code > can keep your data safe. I haven't had data loss with system panics > during sync writes with my ZIL on a partition, so I guess this must be true. It seems unlikely that FreeBSD zfs is somehow "safer" than Solaris zfs. Both rely on a disk cache flush request to write buffered data to disk. Synchronous writes necessarily require that the zil (zfs intent log) be flushed to disk before write returns success to the user. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Sat Feb 25 18:30:16 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D6C081065670 for ; Sat, 25 Feb 2012 18:30:16 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id A7CE88FC19 for ; Sat, 25 Feb 2012 18:30:16 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1PIUGLd056189 for ; Sat, 25 Feb 2012 18:30:16 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q1PIUGV5056188; Sat, 25 Feb 2012 18:30:16 GMT (envelope-from gnats) Date: Sat, 25 Feb 2012 18:30:16 GMT Message-Id: <201202251830.q1PIUGV5056188@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Jilles Tjoelker Cc: Subject: Re: kern/165392: Multiple mkdir/rmdir fails with errno 31 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Jilles Tjoelker List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2012 18:30:16 -0000 The following reply was made to PR kern/165392; it has been noted by GNATS. From: Jilles Tjoelker To: bug-followup@FreeBSD.org, vvv@colocall.net Cc: Subject: Re: kern/165392: Multiple mkdir/rmdir fails with errno 31 Date: Sat, 25 Feb 2012 19:27:02 +0100 > [mkdir fails with [EMLINK], but link count < LINK_MAX] I can reproduce this problem with UFS with soft updates (with or without journaling). 
A reproduction without C programs is:

cd empty_dir
mkdir `jot 32766 1`   # the last one will fail (correctly)
rmdir 1
mkdir a               # will erroneously fail

The problem appears to be that the previous rmdir has not yet fully completed. It is still holding onto the link count until the directory is written, which may take up to two minutes. The same problem can occur with other calls that increase the link count, such as link() and rename().

A workaround is to call fsync() on the directory that contained the deleted entries. It will then release its hold on the link count and allow the mkdir or other calls to succeed. If fsync() is only called when [EMLINK] is returned, the performance impact should not be very bad, although it still causes more I/O than necessary.

The book "The Design and Implementation of the FreeBSD Operating System" contains a detailed description of soft updates in section 8.6, Soft Updates. The subsection "File Removal Requirements for Soft Updates" appears particularly relevant to this problem.

A possible solution is to check for the problematic situation (i_effnlink < LINK_MAX && i_nlink >= LINK_MAX) and, if it occurs, synchronously write one or more of the deleted directory entries that pointed to the inode with the link count problem. After that, i_nlink should be less than LINK_MAX and the link count can be checked again (depending on whether locks need to be dropped to do the write, it may or may not be possible for another thread to use up the last link first). For mkdir() and rename(), the directory that contains the deleted entries is obvious (the directory that will contain the new directory), while for link() it can (in the general case) only be found in soft updates data structures. Soft updates must track this because (if the link count became 0) it will not clear the inode before all directory entries that pointed to it have been written.

Simply replacing the i_nlink < LINK_MAX check with i_effnlink < LINK_MAX is unsafe because it can lead to overflow of the 16-bit signed i_nlink field. If the field is made larger, I don't see what would prevent the code from committing a set of changes such that an inode on disk has more than LINK_MAX links for some time (for example, if a file in the new directory is fsynced while the old directory entries are still on the disk). 
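As a rough userland illustration of that workaround (a sketch only: the helper name, the single-retry policy, the fixed-size path buffer, and the assumption that the caller knows which directory held the deleted entries are all mine, not an existing interface), something like the following retries a mkdir() that failed with [EMLINK] after fsync()ing the parent directory:

/*
 * Sketch: retry a mkdir() that failed with EMLINK after fsync()ing the
 * parent directory, which should release the link count still held by
 * soft updates for entries that were removed but not yet written out.
 */
#include <sys/stat.h>

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int
mkdir_with_emlink_retry(const char *parent, const char *name, mode_t mode)
{
	char path[1024];
	int fd, saved;

	snprintf(path, sizeof(path), "%s/%s", parent, name);
	if (mkdir(path, mode) == 0)
		return (0);
	if (errno != EMLINK)
		return (-1);

	/*
	 * mkdir() reported EMLINK even though the effective link count may
	 * be below LINK_MAX; fsync() the parent so the pending removals are
	 * written and the held link count is released.
	 */
	fd = open(parent, O_RDONLY);	/* parent is assumed to be a directory */
	if (fd == -1)
		return (-1);
	if (fsync(fd) == -1) {
		saved = errno;
		(void)close(fd);
		errno = saved;
		return (-1);
	}
	(void)close(fd);

	/* Retry once; if it still fails, report the error to the caller. */
	return (mkdir(path, mode));
}

int
main(void)
{
	if (mkdir_with_emlink_retry(".", "a", 0755) == -1) {
		perror("mkdir");
		return (1);
	}
	return (0);
}
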
-- Jilles Tjoelker From owner-freebsd-fs@FreeBSD.ORG Sat Feb 25 21:43:34 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D77A3106564A for ; Sat, 25 Feb 2012 21:43:34 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from mo-p05-ob6.rzone.de (mo-p05-ob6.rzone.de [IPv6:2a01:238:20a:202:53f5::1]) by mx1.freebsd.org (Postfix) with ESMTP id 3C5898FC1B for ; Sat, 25 Feb 2012 21:43:34 +0000 (UTC) X-RZG-AUTH: :LWIKdA2leu0bPbLmhzXgqn0MTG6qiKEwQRWfNxSw4HzYIwjsnvdDt2oX8drk23mufkcHTOex6w== X-RZG-CLASS-ID: mo05 Received: from [192.168.179.39] (hmbg-5f766895.pool.mediaWays.net [95.118.104.149]) by smtp.strato.de (jimi mo42) (RZmta 27.7 DYNA|AUTH) with (DHE-RSA-AES256-SHA encrypted) ESMTPA id J02071o1PKvObr ; Sat, 25 Feb 2012 22:43:25 +0100 (MET) Message-ID: <4F4955FC.6040308@brockmann-consult.de> Date: Sat, 25 Feb 2012 22:43:24 +0100 From: Peter Maloney User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Bob Friesenhahn References: <3E3E4094-77E2-490B-9574-5B95ECDED447@pean.org> <4F48A402.70009@brockmann-consult.de> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: glabel, gpart and zfs confusion. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2012 21:43:34 -0000 On 25.02.2012 16:56, Bob Friesenhahn wrote: > On Sat, 25 Feb 2012, Peter Maloney wrote: > >> In Solaris, I've read that the IO system is designed such that a some >> commands (eg. flush of a partition) does not necessarily flush the >> disk's write cache... like the command can't move up the chain. So if >> you put zfs on a partition, you can get data loss (eg. transaction >> rollback required and probably no corruption). > > I wonder where you read that since it seems like bad information? In > Solaris, if zfs uses a partition (rather than the whole disk), the > disk write cache is not enabled by default due to the possibility that > some other partition uses a legacy filesystem like UFS, which could > become inconsistent and corrupted if the write cache is enabled. The > drawback then becomes that zfs writes are likely to incur more latency. No idea. I was just trying to point out where this recommendation to keep it separate comes from... but I don't know the details. But what you said makes sense. Still, I am sure that among the random things I read that sounded semi-credible (e.g., by some guy claiming to be a ZFS engineer), it wasn't only about performance; it was more about corruption. (But then again, there are lots of doomsayers saying ZFS will somehow fail you, even though when they explain it, it is usually user error.) And thanks for your criticism; looking back at this document: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide it looks like they just talk about the cache and not corruption, even if I look at very old versions of the page. So what I read before was either quite wrong or just opinion based on, e.g., some bad experience of some tester or admin.