From owner-freebsd-fs@FreeBSD.ORG Sun Feb 19 13:28:42 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CDE5F1065672 for ; Sun, 19 Feb 2012 13:28:42 +0000 (UTC) (envelope-from shuey@fmepnet.org) Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com [209.85.220.182]) by mx1.freebsd.org (Postfix) with ESMTP id 862648FC17 for ; Sun, 19 Feb 2012 13:28:42 +0000 (UTC) Received: by vcmm1 with SMTP id m1so4529345vcm.13 for ; Sun, 19 Feb 2012 05:28:42 -0800 (PST) Received-SPF: pass (google.com: domain of shuey@fmepnet.org designates 10.220.153.201 as permitted sender) client-ip=10.220.153.201; Authentication-Results: mr.google.com; spf=pass (google.com: domain of shuey@fmepnet.org designates 10.220.153.201 as permitted sender) smtp.mail=shuey@fmepnet.org Received: from mr.google.com ([10.220.153.201]) by 10.220.153.201 with SMTP id l9mr9414537vcw.1.1329658122003 (num_hops = 1); Sun, 19 Feb 2012 05:28:42 -0800 (PST) MIME-Version: 1.0 Received: by 10.220.153.201 with SMTP id l9mr7513432vcw.1.1329658121879; Sun, 19 Feb 2012 05:28:41 -0800 (PST) Received: by 10.220.64.141 with HTTP; Sun, 19 Feb 2012 05:28:41 -0800 (PST) X-Originating-IP: [98.223.59.225] In-Reply-To: <1329595563.42839.28.camel@btw.pki2.com> References: <1329595563.42839.28.camel@btw.pki2.com> Date: Sun, 19 Feb 2012 08:28:41 -0500 Message-ID: From: Michael Shuey To: dg17@penx.com Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Gm-Message-State: ALoCoQlCgPRFZRZWo2NR7plZIN8OUDHbfHc18HVr1UeQMd7C7zwh1sfPQP9B0u15lwQAkesyGJfO Cc: freebsd-fs@freebsd.org Subject: Re: ZFS size reduced, 100% full, on fbsd9 upgrade X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Feb 2012 13:28:42 -0000 Okay, today's lesson: When you replace a disk with a bigger drive, and it increases your raidz2's pool capacity, ALWAYS run a zpool scrub before doing anything else. I rebooted back to 8.2p6, ran a (somewhat longer than normal) scrub, rebooted, then booted back to 9.0. Seems fine now, and is finishing its freebsd-update. Weird....but at least it works. On Sat, Feb 18, 2012 at 3:06 PM, Dennis Glatting wrote: > I'm not a ZFS wiz but... > > > On Sat, 2012-02-18 at 10:25 -0500, Michael Shuey wrote: >> I'm upgrading a server from 8.2p6 to 9.0-RELEASE, and I've tried both >> make in the source tree and freebsd-update and I get the same strange >> result. =A0As soon as I boot to the fbsd9 kernel, even booting into >> single-user mode, the pool's size is greatly reduced. =A0All filesystems >> show 100% full (0 bytes free space), nothing can be written to the >> pool (probably a side-effect of being 100% full), and dmesg shows >> several of "Solaris: WARNING: metaslab_free_dva(): bad DVA >> 0:5978620460544" warnings (with different numbers). =A0Switching kernels >> back to the 8.2p6 kernel restores things to normal, but I'd really >> like to finish my fbsd9 upgrade. >> >> The system is a 64-bit Intel box with 4 GB of memory, and 8 disks in a >> raidz2 pool called "pool". =A0It's booted to the 8.2p6 kernel now, and >> scrubbing the pool, but last time I did this (roughly a week ago) it >> was fine. =A0/ is a gmirror, but /usr, /tmp, and /var all come from the >> pool. 
Normally, the pool has 1.2 TB of free space, and is version 15
>> (zfs version 4).  Some disks are WD drives, with 4k native sectors,
>> but some time ago I rebuilt the pool to use a native 4k sector size
>> (ashift=12).
>>
>
> I believe 4GB of memory is the minimum. More is better. When you use the
> minimum of anything, expect dodginess.
>
> You should upgrade your pool -- bug fixes and all that.
>
> Are all the disks 4k sectors? I found that a mix of 512 and 4k works, but
> performance is best when they are all the same. I have also found that 512
> emulation isn't a viable choice when looking at performance (i.e.,
> set for 4k).
>
> Different people have different opinions, but I personally do not use ZFS
> for the OS; rather, I RAID1 the OS. The question you have to ask is
> whether, if /usr goes kablooie, you have the skills to put it back
> together. I do not, so "simple" (i.e., hardware RAID1) for the OS is
> good for me -- it isn't the OS that's being worked in my setups, but
> rather the data areas.
>
>
>> Over time, I've been slowly replacing disks (1 at a time) to increase
>> the free space in the pool.  Also, the system experienced a severe
>> failure recently; the power supply blew, and took out the memory (and
>> presumably the motherboard).  I replaced these last week with known-good
>> board/memory/processor/PS, and it's been running fine since.
>>
>
> Expect mixed results with mixed disks, at least in my experience,
> particularly when it comes to performance.
>
> Is the MB the same? I have had mixed results. I find the Gigabyte boards
> work well, but ASUS boards are dodgy when it comes to high interrupt
> handling. Server boards with ECC memory are the most reliable.
>
>
>> Any suggestions?  Is it possible I've got some nasty pool corruption
>> going on - and if so, how do I go about fixing it?  Any advice would
>> be appreciated.  This is a backup server, so I could rebuild its
>> contents from the primary, but I'd rather fix it if possible (since I
>> want to do a fbsd9 upgrade on the primary next).
>
> I screw around with my setups. What I've found is that rebuilding the pool
> (when I screw it up) is the least troublesome approach.
>
> Recently I found a bad tray on one of my servers. Drove me nuts for two
> weeks. It could be a loose, bad, or crimped cable, but I am not yet in
> a position to open the case. Most of my ZFS weirdnesses have been
> hardware related.
>
> It could be that your blowout impacted your disks or wiring. Do you run
> SMART? I've found that, generally, SMART is goodness, but I presently have
> a question mark when it comes to the Hitachi 4TB disks (I misbehaved on
> that system, so the issue could be my own; on another system, however,
> there weren't any errors).
>
> I have found, when I have multiple identical controllers, that running the
> same firmware across all of them is a good approach; otherwise weirdness
> ensues, and different MBs manifest this problem in different ways. Also,
> make sure your MB's BIOS is recent.
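For reference, a pool with ashift=12 like the one described above was typically
created with the gnop(8) trick of that era. A minimal sketch, using made-up
device names (da0-da3) and pool name "tank" rather than anything from this
thread:

   gnop create -S 4096 /dev/da0        # advertise 4k sectors on one member
   zpool create tank raidz2 da0.nop da1 da2 da3
   zpool export tank
   gnop destroy da0.nop                # ashift=12 was recorded at creation
   zpool import tank
   zdb | grep ashift                   # should report ashift: 12

ZFS picks the largest logical sector size it sees when the vdev is created, so
shimming a single member is enough.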
> > YMMV > > > > From owner-freebsd-fs@FreeBSD.ORG Sun Feb 19 16:55:46 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 548241065672; Sun, 19 Feb 2012 16:55:46 +0000 (UTC) (envelope-from arno@heho.snv.jussieu.fr) Received: from shiva.jussieu.fr (shiva.jussieu.fr [134.157.0.129]) by mx1.freebsd.org (Postfix) with ESMTP id BBE0F8FC0A; Sun, 19 Feb 2012 16:55:45 +0000 (UTC) Received: from heho.snv.jussieu.fr (heho.snv.jussieu.fr [134.157.184.22]) by shiva.jussieu.fr (8.14.4/jtpda-5.4) with ESMTP id q1JGtInM021294 ; Sun, 19 Feb 2012 17:55:31 +0100 (CET) X-Ids: 168 Received: from heho.snv.jussieu.fr (localhost [127.0.0.1]) by heho.snv.jussieu.fr (8.14.3/8.14.3) with ESMTP id q1JGsoLU054604; Sun, 19 Feb 2012 17:54:50 +0100 (CET) (envelope-from arno@heho.snv.jussieu.fr) Received: (from arno@localhost) by heho.snv.jussieu.fr (8.14.3/8.14.3/Submit) id q1JGsoIr054599; Sun, 19 Feb 2012 17:54:50 +0100 (CET) (envelope-from arno) To: Martin Simmons From: "Arno J. Klaassen" References: <201202141820.q1EIK1MP032526@higson.cam.lispworks.com> Date: Sun, 19 Feb 2012 17:54:50 +0100 In-Reply-To: (Arno J. Klaassen's message of "Sat\, 18 Feb 2012 18\:55\:17 +0100") Message-ID: User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Miltered: at jchkmail.jussieu.fr with ID 4F412976.000 by Joe's j-chkmail (http : // j-chkmail dot ensmp dot fr)! X-j-chkmail-Enveloppe: 4F412976.000/134.157.184.22/heho.snv.jussieu.fr/heho.snv.jussieu.fr/ Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Subject: 9-stable: one-device ZFS fails [was: 9-stable : geli + one-disk ZFS fails] X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Feb 2012 16:55:46 -0000 a followup to myself > Hello, > > Martin Simmons writes: > >> Some random ideas: >> >> 1) Can you dd the whole of ada0s3.eli without errors? >> >> 2) If you scrub a few more times, does it find the same number of errors each >> time and are they always in that XNAT.tar file? >> >> 3) Can you try zfs without geli? > > > yeah, and it seems to rule out geli : > > [ splitted original /dev/ada0s3 in equally sized /dev/ada0s3 and > /dev/ada0s4 ] > > geli init /dev/ada0s3 > geli attach /dev/ada0s3 > > zpool create zgeli /dev/ada0s3.eli > > zfs create zgeli/home > zfs create zgeli/home/arno > zfs create zgeli/home/arno/.priv > zfs create zgeli/home/arno/.scito > zfs set copies=2 zgeli/home/arno/.priv > zfs set atime=off zgeli > > > [put some files on it, wait a little : ] > > > [root@cc ~]# zpool status -v > pool: zgeli > state: ONLINE > status: One or more devices has experienced an error resulting in data > corruption. Applications may be affected. > action: Restore the file in question if possible. Otherwise restore the > entire pool from backup. 
> see: http://www.sun.com/msg/ZFS-8000-8A > scan: scrub in progress since Sat Feb 18 17:46:54 2012 > 425M scanned out of 2.49G at 85.0M/s, 0h0m to go > 0 repaired, 16.64% done > config: > > NAME STATE READ WRITE CKSUM > zgeli ONLINE 0 0 1 > ada0s3.eli ONLINE 0 0 2 > > errors: Permanent errors have been detected in the following files: > > /zgeli/home/arno/8.0-CURRENT-200902-amd64-livefs.iso > [root@cc ~]# zpool scrub -s zgeli > [root@cc ~]# > > > [then idem directly on next partition ] > > zpool create zgpart /dev/ada0s4 > > zfs create zgpart/home > zfs create zgpart/home/arno > zfs create zgpart/home/arno/.priv > zfs create zgpart/home/arno/.scito > zfs set copies=2 zgpart/home/arno/.priv > zfs set atime=off zgpart > > [put some files on it, wait a little : ] > > pool: zgpart > state: ONLINE > status: One or more devices has experienced an error resulting in data > corruption. Applications may be affected. > action: Restore the file in question if possible. Otherwise restore the > entire pool from backup. > see: http://www.sun.com/msg/ZFS-8000-8A > scan: scrub repaired 0 in 0h0m with 1 errors on Sat Feb 18 18:04:45 2012 > config: > > NAME STATE READ WRITE CKSUM > zgpart ONLINE 0 0 1 > ada0s4 ONLINE 0 0 2 > > errors: Permanent errors have been detected in the following files: > > /zgpart/home/arno/.scito/ .... > [root@cc ~]# I tested a bit more this afternoon : - zpool create zgpart /dev/ada0s4d => KO - split ada0s4 in two equally sized partitions and then zpool create zgpart mirror /dev/ada0s4d /dev/ada0s4e => works like a charm ..... ( [root@cc /zgpart]# zpool status -v zgpart pool: zgpart state: ONLINE scan: scrub repaired 0 in 0h36m with 0 errors on Sun Feb 19 17:20:34 2012 config: NAME STATE READ WRITE CKSUM zgpart ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 ada0s4d ONLINE 0 0 0 ada0s4e ONLINE 0 0 0 errors: No known data errors ) FYI, best, Arno > > I still do not particuliarly suspect the disk since I cannot reproduce > similar behaviour on UFS. > > That said, this disk is supposed to be 'hybrid-SSD', maybe something > special ZFS doesn't like ??? : > > > ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 > ada0: ATA-8 SATA 2.x device > ada0: Serial Number 5YX0J5YD > ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada0: Command Queueing enabled > ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C) > ada0: Previously was known as ad4 > GEOM: new disk ada0 > > > Please let me know what information to provide more. > > Best, > > Arno > > > > >> 4) Is the slice/partition layout definitely correct? >> >> __Martin >> >> >>>>>>> On Mon, 13 Feb 2012 23:39:06 +0100, Arno J Klaassen said: >>> >>> hello, >>> >>> to eventually gain interest in this issue : >>> >>> I updated to today's -stable, tested with vfs.zfs.debug=1 >>> and vfs.zfs.prefetch_disable=0, no difference. >>> >>> I also tested to read the raw partition : >>> >>> [root@cc /usr/ports]# dd if=/dev/ada0s3 of=/dev/null bs=4096 conv=noerror >>> 103746636+0 records in >>> 103746636+0 records out >>> 424946221056 bytes transferred in 13226.346738 secs (32128768 bytes/sec) >>> [root@cc /usr/ports]# >>> >>> Disk is brand new, looks ok, either my setup is not good or there is >>> a bug somewhere; I can play around with this box for some more time, >>> please feel free to provide me with some hints what to do to be useful >>> for you. >>> >>> Best, >>> >>> Arno >>> >>> >>> "Arno J. 
Klaassen" writes: >>> >>> > Hello, >>> > >>> > >>> > I finally decided to 'play' a bit with ZFS on a notebook, some years >>> > old, but I installed a brand new disk and memtest passes OK. >>> > >>> > I installed base+ports on partition 2, using 'classical' UFS. >>> > >>> > I crypted partition 3 and created a single zpool on it containing >>> > 4 Z-"file-systems" : >>> > >>> > [root@cc ~]# zfs list >>> > NAME USED AVAIL REFER MOUNTPOINT >>> > zfiles 10.7G 377G 152K /zfiles >>> > zfiles/home 10.6G 377G 119M /zfiles/home >>> > zfiles/home/arno 10.5G 377G 2.35G /zfiles/home/arno >>> > zfiles/home/arno/.priv 192K 377G 192K /zfiles/home/arno/.priv >>> > zfiles/home/arno/.scito 8.18G 377G 8.18G /zfiles/home/arno/.scito >>> > >>> > >>> > I export the ZFS's via nfs and rsynced on the other machine some backup >>> > of my current note-book (geli + UFS, (almost) same 9-stable version, no >>> > problem) to the ZFS's. >>> > >>> > >>> > Quite fast, I see on the notebook : >>> > >>> > >>> > [root@cc /usr/temp]# zpool status -v >>> > pool: zfiles >>> > state: ONLINE >>> > status: One or more devices has experienced an error resulting in data >>> > corruption. Applications may be affected. >>> > action: Restore the file in question if possible. Otherwise restore the >>> > entire pool from backup. >>> > see: http://www.sun.com/msg/ZFS-8000-8A >>> > scan: scrub repaired 0 in 0h1m with 11 errors on Sat Feb 11 14:55:34 >>> > 2012 >>> > config: >>> > >>> > NAME STATE READ WRITE CKSUM >>> > zfiles ONLINE 0 0 11 >>> > ada0s3.eli ONLINE 0 0 23 >>> > >>> > errors: Permanent errors have been detected in the following files: >>> > >>> > /zfiles/home/arno/.scito/contrib/XNAT.tar >>> > [root@cc /usr/temp]# md5 /zfiles/home/arno/.scito/contrib/XNAT.tar >>> > md5: /zfiles/home/arno/.scito/contrib/XNAT.tar: Input/output error >>> > [root@cc /usr/temp]# >>> > >>> > >>> > As said, memtest is OK, nothing is logged to the console, UFS on the >>> > same disk works OK (I did some tests copying and comparing random data) >>> > and smartctl as well seems to trust the disk : >>> > >>> > SMART Self-test log structure revision number 1 >>> > Num Test_Description Status Remaining LifeTime(hours) >>> > # 1 Extended offline Completed without error 00% 388 >>> > # 2 Short offline Completed without error 00% 387 >>> > >>> > >>> > Am I doing something wrong and/or let me know what I could provide as >>> > extra info to try to solve this (dmesg.boot at the end of this mail). 
>>> > >>> > Thanx a lot in advance, >>> > >>> > best, Arno >>> > >>> > >>> > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Mon Feb 20 03:43:35 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A682A106564A; Mon, 20 Feb 2012 03:43:35 +0000 (UTC) (envelope-from smckay@internode.on.net) Received: from ipmail04.adl6.internode.on.net (ipmail04.adl6.internode.on.net [150.101.137.141]) by mx1.freebsd.org (Postfix) with ESMTP id 0977E8FC14; Mon, 20 Feb 2012 03:43:34 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Av0EAAq9QU920ALe/2dsb2JhbABDsiyBCIF0AQVWIxABCkY5BBq9e4t9AgQQBgsJNQkDAoNiWIMeBKg2 Received: from ppp118-208-2-222.lns20.bne1.internode.on.net (HELO dungeon.home) ([118.208.2.222]) by ipmail04.adl6.internode.on.net with ESMTP; 20 Feb 2012 13:58:17 +1030 Received: from dungeon.home (localhost [127.0.0.1]) by dungeon.home (8.14.4/8.14.3) with ESMTP id q1K3ROrt009042; Mon, 20 Feb 2012 13:27:24 +1000 (EST) (envelope-from mckay) Message-Id: <201202200327.q1K3ROrt009042@dungeon.home> From: Stephen McKay To: freebsd-fs@freebsd.org References: <201103081425.p28EPQtM002115@dungeon.home> <201107052241.p65MfqVA002215@dungeon.home> In-Reply-To: <201107052241.p65MfqVA002215@dungeon.home> from Stephen McKay at "Wed, 06 Jul 2011 08:41:52 +1000" Date: Mon, 20 Feb 2012 13:27:24 +1000 Sender: smckay@internode.on.net Cc: Stephen McKay Subject: Re: Constant minor ZFS corruption, probably solved X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Feb 2012 03:43:35 -0000 On Wednesday, 6th July 2011, Stephen McKay wrote: >Perhaps you remember me struggling with a small but continuous amount >of corruption on ZFS volumes with a new server we had built at work. >... I've now done enough tests so that I'm 90% >certain what the problem is: Seagate's caching firmware. >... I'm certain that disabling write caching >has given us a stable machine. And I'm 90% certain that it's because >of bugs in Seagate's cache firmware. I hope someone else can replicate >this and settle the issue. I'm following up on an old post of mine to confirm that my write cache disabling workaround is well and truly successful. Eight months later we've seen no further corruption when using Seagate ST2000DL003 disks. The machine (now running 9.0-RELEASE) sees constant moderate to low activity as a file server (about 6TB in use). I did receive a message from one other person suffering from the same problem. It was solved by disabling write caching, so that's two data points. And two data points is a trend, right? :-) His system was running 8.2-stable on an AMD Phenom CPU in a MSI 870-G45 motherboard (AMD SB710 southbridge) so there's very little overlap with our system: just zfs and Seagate green disks. His disks were ST1500DL003 (1.5TB) with firmware CC32 so that more or less means the common points are simply zfs and Seagate CC32 firmware. You already know which one I think is to blame. But then again no avalanche of complaints has been seen either, so it's still somewhat mysterious. 
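(For anyone who wants to reproduce the workaround: on ada(4) disks the usual
switch is the loader tunable below. This is a sketch of the general technique,
not necessarily how the machines above were configured.

   # /boot/loader.conf
   kern.cam.ada.write_cache=0     # have ada(4) turn off the drive's write
                                  # cache when the disk attaches

The cost is slower synchronous writes; ZFS batches and explicitly flushes its
writes anyway, which is why the trade-off tends to be acceptable.)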
Is there some other problem that is just being masked by disabling the cache? Unless there's a sudden surge in reports, we'll never know for certain. So, if you've seen this problem and cured it by disabling the write cache, I'd like to know about it. How's your data? Run a scrub lately? Perhaps now is a good time. ;-) Cheers, Stephen. From owner-freebsd-fs@FreeBSD.ORG Mon Feb 20 11:07:05 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 00887106566B for ; Mon, 20 Feb 2012 11:07:05 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id E2AAD8FC1E for ; Mon, 20 Feb 2012 11:07:04 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1KB74qi090102 for ; Mon, 20 Feb 2012 11:07:04 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q1KB74ml090100 for freebsd-fs@FreeBSD.org; Mon, 20 Feb 2012 11:07:04 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 20 Feb 2012 11:07:04 GMT Message-Id: <201202201107.q1KB74ml090100@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Feb 2012 11:07:05 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/165087 fs [unionfs] lock violation in unionfs o kern/164472 fs [ufs] fsck -B panics on particular data inconsistency o kern/164462 fs [nfs] NFSv4 mounting fails to mount; asks for stronger o kern/164370 fs [zfs] zfs destroy for snapshot fails on i386 and sparc o kern/164261 fs [nullfs] [patch] fix panic with NFS served from NULLFS o kern/164256 fs [zfs] device entry for volume is not created after zfs o kern/164184 fs [ufs] [panic] Kernel panic with ufs_makeinode o kern/163801 fs [md] [request] allow mfsBSD legacy installed in 'swap' o kern/163770 fs [zfs] [hang] LOR between zfs&syncer + vnlru leading to o kern/163501 fs [nfs] NFS exporting a dir and a subdir in that dir to o kern/162944 fs [coda] Coda file system module looks broken in 9.0 o kern/162860 fs [zfs] Cannot share ZFS filesystem to hosts with a hyph o kern/162751 fs [zfs] [panic] kernel panics during file operations o kern/162591 fs [nullfs] cross-filesystem nullfs does not work as expe o kern/162519 fs [zfs] "zpool import" relies on buggy realpath() behavi o kern/162362 fs [snapshots] [panic] ufs with snapshot(s) panics when g o kern/162083 fs [zfs] [panic] zfs unmount -f pool o kern/161968 fs [zfs] [hang] renaming snapshot with -r including a zvo o kern/161897 fs [zfs] [patch] zfs partition probing causing long delay o kern/161864 fs [ufs] removing journaling from UFS partition fails on o bin/161807 fs [patch] add option for explicitly specifying metadata o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is o kern/161533 fs [zfs] [panic] zfs receive panic: system ioctl returnin o kern/161511 fs [unionfs] Filesystem deadlocks when using multiple uni o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_ o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou o kern/161280 fs [zfs] Stack overflow in gptzfsboot o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3 o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic o kern/160860 fs [ufs] Random UFS root filesystem corruption with SU+J o kern/160801 fs [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha o kern/159930 fs [ufs] [panic] kernel core o kern/159663 fs [socket] [nullfs] sockets don't work though nullfs mou o kern/159402 fs [zfs][loader] symlinks cause I/O errors o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by- o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs() o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option o kern/159077 fs [zfs] Can't cd .. with latest zfs version o kern/159048 fs [smbfs] smb mount corrupts large files o kern/159045 fs [zfs] [hang] ZFS scrub freezes system o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk o kern/158802 fs amd(8) ICMP storm and unkillable process. 
o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o f kern/157929 fs [nfs] NFS slow read o kern/157722 fs [geli] unable to newfs a geli encrypted partition o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and o kern/156781 fs [zfs] zfs is losing the snapshot directory, p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs f kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 p kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153520 fs [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/151111 fs [zfs] vnodes leakage during zfs unmount o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris installa o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 fs 
[zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " o kern/147560 fs [zfs] [boot] Booting 8.1-PRERELEASE raidz system take o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an f bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141897 fs [msdosfs] [panic] Kernel panic. 
msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues ( o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139597 fs [patch] [tmpfs] tmpfs initializes va_gen but doesn't u o kern/139564 fs [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... 
o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files f sparc/123566 fs [zfs] zpool import issue: EOVERFLOW o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large 
filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o kern/109024 fs [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat o kern/109010 fs [msdosfs] can't mv directory within fat32 file system o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/51583 fs [nullfs] [patch] allow to work with devices and socket o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 262 problems total. 
From owner-freebsd-fs@FreeBSD.ORG Mon Feb 20 15:08:06 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D8B46106564A for ; Mon, 20 Feb 2012 15:08:06 +0000 (UTC) (envelope-from mikl@d902.iki.rssi.ru) Received: from d902.iki.rssi.ru (d902.iki.rssi.ru [193.232.9.10]) by mx1.freebsd.org (Postfix) with ESMTP id 4178C8FC12 for ; Mon, 20 Feb 2012 15:08:05 +0000 (UTC) Received: from [193.232.9.155] ([193.232.9.155]) by d902.iki.rssi.ru (8.14.2/8.13.1) with ESMTP id q1KESAC6029945 for ; Mon, 20 Feb 2012 18:28:10 +0400 (GMT-4) (envelope-from mikl@d902.iki.rssi.ru) Message-ID: <4F4258DB.3010303@d902.iki.rssi.ru> Date: Mon, 20 Feb 2012 18:29:47 +0400 From: =?UTF-8?B?0KHQtdGA0LPQtdC5INCc0LjQutC70LDRiNC10LLQuNGH?= User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.26) Gecko/20120131 Thunderbird/3.1.18 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: HAST on raid-controller X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Feb 2012 15:08:06 -0000 Hello! I tried to create hast-cluster on my test-servers. They have raid-controlles Adaptec 2820SA, device aacd1. After creating /etc/hast.conf (much the same as in FreeBSD handbook) it isn't working with the message: >hastctl create reserve >[ERROR] [reserve] Unable to open /dev/aacd1: Operation not permitted. Keep it in mind, can HAST work on raid-controllers (or raid-controllers Adaptec)? With best regards, Sergey. From owner-freebsd-fs@FreeBSD.ORG Mon Feb 20 16:57:08 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E9B4A106564A; Mon, 20 Feb 2012 16:57:08 +0000 (UTC) (envelope-from rmacklem@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id C119F8FC0C; Mon, 20 Feb 2012 16:57:08 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1KGv87T025173; Mon, 20 Feb 2012 16:57:08 GMT (envelope-from rmacklem@freefall.freebsd.org) Received: (from rmacklem@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q1KGv8c9025169; Mon, 20 Feb 2012 16:57:08 GMT (envelope-from rmacklem) Date: Mon, 20 Feb 2012 16:57:08 GMT Message-Id: <201202201657.q1KGv8c9025169@freefall.freebsd.org> To: rmacklem@FreeBSD.org, freebsd-fs@FreeBSD.org, rmacklem@FreeBSD.org From: rmacklem@FreeBSD.org Cc: Subject: Re: kern/164462: [nfs] NFSv4 mounting fails to mount; asks for stronger authentication X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Feb 2012 16:57:09 -0000 Synopsis: [nfs] NFSv4 mounting fails to mount; asks for stronger authentication Responsible-Changed-From-To: freebsd-fs->rmacklem Responsible-Changed-By: rmacklem Responsible-Changed-When: Mon Feb 20 16:55:59 UTC 2012 Responsible-Changed-Why: I have asked for feedback on this via email, so I might as well take it. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=164462 From owner-freebsd-fs@FreeBSD.ORG Mon Feb 20 19:20:18 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 990811065722 for ; Mon, 20 Feb 2012 19:20:16 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 0B74F8FC12 for ; Mon, 20 Feb 2012 19:20:16 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1KJKFRY058033 for ; Mon, 20 Feb 2012 19:20:15 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q1KJKFXE058032; Mon, 20 Feb 2012 19:20:15 GMT (envelope-from gnats) Date: Mon, 20 Feb 2012 19:20:15 GMT Message-Id: <201202201920.q1KJKFXE058032@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Mattias Lindgren Cc: Subject: Re: kern/149495: [zfs] chflags sappend on zfs not working right X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Mattias Lindgren List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Feb 2012 19:20:18 -0000 The following reply was made to PR kern/149495; it has been noted by GNATS. From: Mattias Lindgren To: bug-followup@FreeBSD.org, daniel@zhelev.biz Cc: Subject: Re: kern/149495: [zfs] chflags sappend on zfs not working right Date: Mon, 20 Feb 2012 11:49:52 -0700 --e0cb4efe31b482a54904b969c2c3 Content-Type: text/plain; charset=ISO-8859-1 Having similar issues in FreeBSD 9-AMD64 with ZFS v 28 $ mkdir critical $ touch critical/critical.log $ sudo chmod o= critical $ sudo chflags sappnd critical $ sudo chflags sappnd critical/* $ echo "test" > critical/critical.log -bash: critical/critical.log: Operation not permitted $ echo "test" >> critical/critical.log $ grep test critical/critical.log test $ rm -rf critical/critical.log $ ls -l critical/ total 0 Am under the impression that I should not be able to delete files once the sappend flag has been set. Please let me know if you'd like me to do further testing. Thanks, Mattias --e0cb4efe31b482a54904b969c2c3 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Having similar issues in FreeBSD 9-AMD64 with ZFS v 28

--e0cb4efe31b482a54904b969c2c3-- From owner-freebsd-fs@FreeBSD.ORG Tue Feb 21 02:38:04 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 94B86106566B for ; Tue, 21 Feb 2012 02:38:04 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx09.syd.optusnet.com.au (fallbackmx09.syd.optusnet.com.au [211.29.132.242]) by mx1.freebsd.org (Postfix) with ESMTP id 2C9AB8FC14 for ; Tue, 21 Feb 2012 02:38:03 +0000 (UTC) Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au [211.29.132.189]) by fallbackmx09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q1L2RvJx031027 for ; Tue, 21 Feb 2012 13:27:57 +1100 Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au (c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136]) by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q1L2Rrn7006192 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 21 Feb 2012 13:27:54 +1100 Date: Tue, 21 Feb 2012 13:27:53 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Mattias Lindgren In-Reply-To: <201202201920.q1KJKFXE058032@freefall.freebsd.org> Message-ID: <20120221111121.I2928@besplex.bde.org> References: <201202201920.q1KJKFXE058032@freefall.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@FreeBSD.org Subject: Re: kern/149495: [zfs] chflags sappend on zfs not working right X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Feb 2012 02:38:04 -0000 On Mon, 20 Feb 2012, Mattias Lindgren wrote: > Having similar issues in FreeBSD 9-AMD64 with ZFS v 28 > > $ mkdir critical > $ touch critical/critical.log > $ sudo chmod o= critical > > $ sudo chflags sappnd critical > $ sudo chflags sappnd critical/* > > $ echo "test" > critical/critical.log > -bash: critical/critical.log: Operation not permitted > $ echo "test" >> critical/critical.log > $ grep test critical/critical.log > test > $ rm -rf critical/critical.log > $ ls -l critical/ > total 0 > > Am under the impression that I should not be able to delete files once the > sappend flag has been set. It is a bug in 4.4BSD and ffs that [su]append prevents deleting files. Deletion of files should be prevented only by the [su]nounlink flag and the [su]immutable flag, but 4.4BSD didn't have the [su]nounlink flag, and it is insecure to allow unlinking any [su]append file, so 4.4BSD and ffs have the non-orthogonal behaviour of never allowing one to be unlinked, and this wasn't changed when [su]nounlink was added. This bug apparently isn't implemented in zfs. I don't know much about zfs, but zfs_zacces_delete() seems to only test the immutable and nounlink flags. Try adding the append flag there. Nearby bugs: - the [su]append flags have the silly abbreviations [su]appnd in chflags(1). ls -o output to show these flags will be wide anyway, and 1 character is not worth saving. The 1-char difference is just confusing for input. - the [su]nounlink flags have the much worse abbreviations [su]unlnk in chflags(1). Even the non-abbreviated forms [su]unlink are missing their 'no' prefix. So unlink means nounlink, and if you want to unset this, you use no[su]unlink which actually means nonounlink, that is, unlink, that is, unlinking is not restricted by the flag. 
The u prefix also makes uunlink hard to read. unounlink would be better.

The following is hopefully only in ffs (except in my version):

- setting of flags is non-orthogonal. Normal read-modify-write operations
don't work for users, although they work for root. The details of this bug
were changed between 4.4BSD-Lite1 and 4.4BSD-Lite2 and reached FreeBSD in
chflags(2)'s code in 1997 and in chflags(2)'s man page in 2006 (the latter
with grammar errors). This makes chflags(2) very difficult to use. Naive
programs like chflags(1) don't understand this, and just do a simple
read-modify-write operation. This gives weird behaviour which can be worked
around if you understand chflags(2) better than chflags(1) does.

For example: Suppose you have a file with some harmless system flag like
`archive' (this is the only one). This doesn't prevent anyone changing their
flags. But it prevents users changing their flags in the normal way.
"chflags uchg file" will fail because it is turned into a chflags(2) request
to set the existing archive flag as well as the uchg flag. As documented,
the former is not permitted. So to set your uchg flag while preserving the
archive flag (which you can't change either way), you must ask for the
archive flag to be cleared: "chflags noarch,uchg file" after first
determining which system flags are set. Similarly for using chflags(2),
except now you must clear all the system flags that are set, and can do this
more easily by setting all the system flags. Clearing all your flags is
easier: just ask for flags of 0 with chflags (either 1 or 2). However, if
you are root, then you must not request any system flags to be cleared
unless you actually want them cleared, since the request will actually work
for root. However2, since ffs has null support for the archive flag, setting
it for ffs is almost useless and rarely done, so the bug has little effect.

My version also allows user changes if only the sunlink flag is set (why
should preventing unlinking prevent chmod() when it doesn't prevent
truncating the file or filling it with garbage?). But I now think that this
is not a good idea and the change should go the other way, so that uunlink
prevents changes like sunlink does.
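To make the workaround concrete, a small sketch; the file name and flag state
are invented for illustration, assuming a user-owned file on ffs that root has
already marked with the system arch flag:

   $ chflags uchg demo           # chflags(1) reads the old flags and asks for
                                 # arch again, so the request is refused
   $ chflags noarch,uchg demo    # explicitly asking for arch to be cleared
                                 # succeeds; arch stays set (users can't change
                                 # it either way) and uchg is now set
   $ chflags 0 demo              # clearing all of your own flags is the easy case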
Bruce From owner-freebsd-fs@FreeBSD.ORG Tue Feb 21 13:46:37 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D2FBD106564A for ; Tue, 21 Feb 2012 13:46:37 +0000 (UTC) (envelope-from gkontos.mail@gmail.com) Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com [209.85.220.182]) by mx1.freebsd.org (Postfix) with ESMTP id 8640C8FC0A for ; Tue, 21 Feb 2012 13:46:37 +0000 (UTC) Received: by vcmm1 with SMTP id m1so6106840vcm.13 for ; Tue, 21 Feb 2012 05:46:36 -0800 (PST) Received-SPF: pass (google.com: domain of gkontos.mail@gmail.com designates 10.52.91.196 as permitted sender) client-ip=10.52.91.196; Authentication-Results: mr.google.com; spf=pass (google.com: domain of gkontos.mail@gmail.com designates 10.52.91.196 as permitted sender) smtp.mail=gkontos.mail@gmail.com; dkim=pass header.i=gkontos.mail@gmail.com Received: from mr.google.com ([10.52.91.196]) by 10.52.91.196 with SMTP id cg4mr11864730vdb.68.1329831996730 (num_hops = 1); Tue, 21 Feb 2012 05:46:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=wmHUeA3S9/dNVoRyVdhDRIXs8QD4QJXxUV0fMhDbnbo=; b=GYmTaxOAdzsh8xOXEvwHesrpvpSDPIONtOMi5nwIcMe2Pm9Z3AeR5kQeV2S5JjG4oN bfCZiejudOsinjZmH/ylLs27vsv3fOCMrf50eXLZx2HpvBqFN0MmKMjHCm6ruD3jMxX6 Fv6rNdh/XVk6c3SQBs1b1/N2p4gK5zBiU54mA= MIME-Version: 1.0 Received: by 10.52.91.196 with SMTP id cg4mr9600310vdb.68.1329831996670; Tue, 21 Feb 2012 05:46:36 -0800 (PST) Received: by 10.220.38.67 with HTTP; Tue, 21 Feb 2012 05:46:36 -0800 (PST) In-Reply-To: <4F4258DB.3010303@d902.iki.rssi.ru> References: <4F4258DB.3010303@d902.iki.rssi.ru> Date: Tue, 21 Feb 2012 15:46:36 +0200 Message-ID: From: George Kontostanos To: =?UTF-8?B?0KHQtdGA0LPQtdC5INCc0LjQutC70LDRiNC10LLQuNGH?= Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: HAST on raid-controller X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Feb 2012 13:46:37 -0000 On Mon, Feb 20, 2012 at 4:29 PM, =D0=A1=D0=B5=D1=80=D0=B3=D0=B5=D0=B9 =D0= =9C=D0=B8=D0=BA=D0=BB=D0=B0=D1=88=D0=B5=D0=B2=D0=B8=D1=87 wrote: > Hello! > > I tried to create hast-cluster on my test-servers. They have raid-control= les > Adaptec 2820SA, device aacd1. After creating /etc/hast.conf (much the sam= e > as in FreeBSD handbook) it isn't working with the message: > >>hastctl create reserve >>[ERROR] [reserve] Unable to open /dev/aacd1: Operation not permitted. > > Keep it in mind, can HAST work on raid-controllers (or raid-controllers > Adaptec)? > > With best regards, Sergey. > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" This doesn't appear to be a HAST error message. Can you create a FS in aacd= 1? 
--=20 George Kontostanos Aicom telecoms ltd http://www.aisecure.net From owner-freebsd-fs@FreeBSD.ORG Wed Feb 22 18:55:56 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CCF68106566C for ; Wed, 22 Feb 2012 18:55:56 +0000 (UTC) (envelope-from ian@ndwns.net) Received: from smtpauth.rollernet.us (smtpauth.rollernet.us [IPv6:2607:fe70:0:3::d]) by mx1.freebsd.org (Postfix) with ESMTP id AD8348FC1C for ; Wed, 22 Feb 2012 18:55:56 +0000 (UTC) Received: from smtpauth.rollernet.us (localhost [127.0.0.1]) by smtpauth.rollernet.us (Postfix) with ESMTP id EB73859446F for ; Wed, 22 Feb 2012 10:55:34 -0800 (PST) Received: from localhost (c-76-126-116-195.hsd1.ca.comcast.net [76.126.116.195]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by smtpauth.rollernet.us (Postfix) with ESMTPSA for ; Wed, 22 Feb 2012 10:55:34 -0800 (PST) Date: Wed, 22 Feb 2012 10:55:52 -0800 From: Ian Downes To: freebsd-fs@freebsd.org Message-ID: <20120222185552.GA86902@weta.local> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-Rollernet-Abuse: Processed by Roller Network Mail Services. Contact abuse@rollernet.us to report violations. Abuse policy: http://www.rollernet.us/policy X-Rollernet-Submit: Submit ID 5cf5.4f453a26.679b9.0 Subject: ZFS: arc_meta consumes *all* ram X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Feb 2012 18:55:56 -0000 Is vfs.zfs.arc_meta_limit supposed to be a (relatively) hard limit on cached metadata? I've limited the arc size with arc_max but how do I effectively limit the caching of meta data? Suggestions appreciated! details: ZFS is exceeding vfs.zfs.arc_meta_limit on some of my boxes; consuming all available RAM, paging everything out and bringing the system to its knees. $ uname -a FreeBSD local 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Fri Jul 8 00:54:56 UTC 2011 root@8.8.8.8:/usr/obj/usr/src/sys/XENHVM amd64 $ sysctl vfs.zfs | grep arc_meta vfs.zfs.arc_meta_limit: 1610612736 vfs.zfs.arc_meta_used: 12183379056 Note that this is 7-8X over arc_meta_limit and was all the available RAM on the box. This can be reproduced on several boxes (8.2-RELEASE patched to ZFS 5/28 and 9.0-RELEASE) when periodic/security/100.chksetuid runs and does a find over all filesystems. 
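For reference, a minimal sketch of how this knob is normally set and then
checked; the 1.5 GB figure simply restates the value shown above, it is not a
recommendation:

   # /boot/loader.conf
   vfs.zfs.arc_meta_limit="1610612736"    # ~1.5 GB ceiling for cached metadata

   # after boot, compare the target against what is actually held
   sysctl vfs.zfs.arc_meta_limit vfs.zfs.arc_meta_used

At least in these releases the limit behaves as a soft target for the eviction
code rather than a hard cap: metadata that is still referenced (for example by
cached vnodes, of which a recursive find creates a great many) cannot be
evicted, so arc_meta_used can run far past it.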
From owner-freebsd-fs@FreeBSD.ORG Fri Feb 24 11:42:11 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 76853106566B; Fri, 24 Feb 2012 11:42:11 +0000 (UTC) (envelope-from luke-lists@hybrid-logic.co.uk) Received: from hybrid-sites.com (ns225413.hybrid-sites.com [176.31.225.127]) by mx1.freebsd.org (Postfix) with ESMTP id 3DEF98FC0A; Fri, 24 Feb 2012 11:42:10 +0000 (UTC) Received: from [127.0.0.1] (helo=youse) by hybrid-sites.com with esmtp (Exim 4.72 (FreeBSD)) (envelope-from ) id 1S0szG-000EeU-ET; Fri, 24 Feb 2012 11:07:00 +0000 From: Luke Marsden To: "freebsd-stable@freebsd.org" Content-Type: text/plain; charset="UTF-8" Date: Fri, 24 Feb 2012 11:06:52 +0000 Message-ID: <1330081612.13430.39.camel@pow> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Spam-bar: / Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk Subject: Another ZFS ARC memory question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Feb 2012 11:42:11 -0000 Hi all, Just wanted to get your opinion on best practices for ZFS. We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines but have been having trouble with short spikes in application memory usage resulting in huge amounts of swapping, bringing the whole machine to its knees and crashing it hard. I suspect this is because when there is a sudden spike in memory usage the zfs arc reclaim thread is unable to free system memory fast enough. This most recently happened yesterday as you can see from the following munin graphs: E.g. http://hybrid-logic.co.uk/memory-day.png http://hybrid-logic.co.uk/swap-day.png Our response has been to start limiting the ZFS ARC cache to 4GB on our production machines - trading performance for stability is fine with me (and we have L2ARC on SSD so we still get good levels of caching). My questions are: * is this a known problem? * what is the community's advice for production machines running ZFS on FreeBSD, is manually limiting the ARC cache (to ensure that there's enough actually free memory to handle a spike in application memory usage) the best solution to this spike-in-memory-means-crash problem? * has FreeBSD 9.0 / ZFS v28 solved this problem? * rather than setting a hard limit on the ARC cache size, is it possible to adjust the auto-tuning variables to leave more free memory for spiky memory situations? e.g. set the auto-tuning to make arc eat 80% of memory instead of ~95% like it is at present? * could the arc reclaim thread be made to drop ARC pages with higher priority before the system starts swapping out application pages? 
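(For concreteness, the 4 GB cap described above is the loader tunable below,
and the counters next to it are the ones worth graphing alongside the swap
activity when chasing these spikes. A sketch only, with an illustrative value:

   # /boot/loader.conf
   vfs.zfs.arc_max="4294967296"    # cap the ARC at 4 GB

   # runtime numbers to watch next to the munin graphs
   sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max
   sysctl vm.stats.vm.v_free_count

Neither line is a fix for the reclaim-speed issue described above, just the
plumbing for the cap.)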
Thank you for any/all answers, and thank you for making FreeBSD awesome :-) Best Regards, Luke Marsden -- CTO, Hybrid Logic +447791750420 | +1-415-449-1165 | www.hybrid-cluster.com From owner-freebsd-fs@FreeBSD.ORG Fri Feb 24 12:30:14 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D7CB81065678 for ; Fri, 24 Feb 2012 12:30:14 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id A8F668FC0C for ; Fri, 24 Feb 2012 12:30:14 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1OCUEqW055017 for ; Fri, 24 Feb 2012 12:30:14 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q1OCUEN5055014; Fri, 24 Feb 2012 12:30:14 GMT (envelope-from gnats) Date: Fri, 24 Feb 2012 12:30:14 GMT Message-Id: <201202241230.q1OCUEN5055014@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Peter Maloney Cc: Subject: Re: kern/128173: [ext2fs] ls gives "Input/output error" on mounted ext3 filesystem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Peter Maloney List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Feb 2012 12:30:14 -0000 The following reply was made to PR kern/128173; it has been noted by GNATS. From: Peter Maloney To: bug-followup@FreeBSD.org, christope.cap@gmail.com Cc: Subject: Re: kern/128173: [ext2fs] ls gives "Input/output error" on mounted ext3 filesystem Date: Fri, 24 Feb 2012 13:23:59 +0100 I have a similar problem... but not with ls. # md5 biglonguglyfilename.zip MD5 (biglonguglyfilename.zip) = 511fdc3352d9265ffac0d472de7bb994 # md5 differentbiglonguglyfilename.zip md5: differentbiglonguglyfilename.zip: Input/output error These files can be read with no problem in Linux, whether I mount the system as ext3 or ext2. I did not create this file system; it was from a 3rd party. 
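For context, the filesystem is presumably attached in the usual way on the FreeBSD side, something like the following (the device name is a placeholder):

# kldload ext2fs
# mount -t ext2fs -o ro /dev/da1s1 /mnt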
Here is tune2fs output from a Linux machine: # tune2fs -l /dev/sdb1 tune2fs 1.41.14 (22-Dec-2010) Filesystem volume name: [snip] Last mounted on: Filesystem UUID: [snip] Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file Filesystem flags: signed_directory_hash Default mount options: (none) Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 244203520 Block count: 488378000 Reserved block count: 0 Free blocks: 16884957 Free inodes: 244183788 First block: 0 Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 907 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 16384 Inode blocks per group: 512 Filesystem created: Sun Oct 23 17:07:18 2011 Last mount time: Fri Feb 24 13:10:50 2012 Last write time: Fri Feb 24 13:10:50 2012 Mount count: 6 Maximum mount count: 21 Last checked: Sun Oct 23 17:07:18 2011 Check interval: 15552000 (6 months) Next check after: Fri Apr 20 17:07:18 2012 Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 128 Journal inode: 8 Default directory hash: tea Directory Hash Seed: 364481f6-7b5a-4cbb-89d7-7e50c112c884 Journal backup: inode blocks # uname -a FreeBSD smostank2.bc.local 8.2-STABLE-20120204 FreeBSD 8.2-STABLE-20120104 #0: Mon Feb 6 12:10:32 UTC 2012 root@bczfsvm1.bc.local:/usr/obj/usr/src/sys/GENERIC amd64 According to the man page of mkfs.ext4 in FreeBSD: E2fsprogs version 1.42 From owner-freebsd-fs@FreeBSD.ORG Fri Feb 24 12:44:42 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 377D5106566C for ; Fri, 24 Feb 2012 12:44:42 +0000 (UTC) (envelope-from luke-lists@hybrid-logic.co.uk) Received: from hybrid-sites.com (ns225413.hybrid-sites.com [176.31.225.127]) by mx1.freebsd.org (Postfix) with ESMTP id DC7FA8FC15 for ; Fri, 24 Feb 2012 12:44:41 +0000 (UTC) Received: from [127.0.0.1] (helo=youse) by hybrid-sites.com with esmtp (Exim 4.72 (FreeBSD)) (envelope-from ) id 1S0uVm-000InR-Ko; Fri, 24 Feb 2012 12:44:40 +0000 From: Luke Marsden To: Tom Evans In-Reply-To: References: <1330081612.13430.39.camel@pow> Content-Type: text/plain; charset="UTF-8" Date: Fri, 24 Feb 2012 12:44:30 +0000 Message-ID: <1330087470.13430.61.camel@pow> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Spam-bar: / Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk Subject: Re: Another ZFS ARC memory question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Feb 2012 12:44:42 -0000 On Fri, 2012-02-24 at 12:21 +0000, Tom Evans wrote: > On Fri, Feb 24, 2012 at 11:06 AM, Luke Marsden > wrote: > > Hi all, > > > > Just wanted to get your opinion on best practices for ZFS. > > > > We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines > > but have been having trouble with short spikes in application memory > > usage resulting in huge amounts of swapping, bringing the whole machine > > to its knees and crashing it hard. I suspect this is because when there > > is a sudden spike in memory usage the zfs arc reclaim thread is unable > > to free system memory fast enough. 
> > > > This most recently happened yesterday as you can see from the following > > munin graphs: > > > > E.g. http://hybrid-logic.co.uk/memory-day.png > > http://hybrid-logic.co.uk/swap-day.png > > > > Our response has been to start limiting the ZFS ARC cache to 4GB on our > > production machines - trading performance for stability is fine with me > > (and we have L2ARC on SSD so we still get good levels of caching). > > > > My questions are: > > > > * is this a known problem? > > * what is the community's advice for production machines running > > ZFS on FreeBSD, is manually limiting the ARC cache (to ensure > > that there's enough actually free memory to handle a spike in > > application memory usage) the best solution to this > > spike-in-memory-means-crash problem? > > * has FreeBSD 9.0 / ZFS v28 solved this problem? > > * rather than setting a hard limit on the ARC cache size, is it > > possible to adjust the auto-tuning variables to leave more free > > memory for spiky memory situations? e.g. set the auto-tuning to > > make arc eat 80% of memory instead of ~95% like it is at > > present? > > * could the arc reclaim thread be made to drop ARC pages with > > higher priority before the system starts swapping out > > application pages? > > > > Thank you for any/all answers, and thank you for making FreeBSD > > awesome :-) > > It's not a problem, it's a feature! > > By default the ARC will attempt to cache as much as it can - it > assumes the box is a ZFS filer, and doesn't need RAM for applications. > The solution, as you've found out, is to limit how much ARC can take > up. > > In practice, you should be doing this anyway. You should know, or have > an idea, of how much RAM is required for the applications on that box, > and you need to limit ZFS to not eat into that required RAM. Thanks for your reply, Tom! I agree that the ARC cache is a great feature, but for a general purpose filesystem it does seem like a reasonable expectation that filesystem cache will be evicted before application data is swapped, even if the spike in memory usage is rather aggressive. A complete server crash in this scenario is rather unfortunate. My question stands - is this an area which has been improved on in the ZFS v28 / FreeBSD 9.0 / upcoming FreeBSD 8.3 code, or should it be standard practice to guess how much memory the applications running on the server might need and set the arc_max boot.loader tweak appropriately? This is reasonably tricky when providing general purpose web application hosting and so we'll often end up erring on the side of caution and leaving lots of RAM free "just in case". If the latter is indeed the case in the latest stable releases then I would like to update http://wiki.freebsd.org/ZFSTuningGuide which currently states: FreeBSD 7.2+ has improved kernel memory allocation strategy and no tuning may be necessary on systems with more than 2 GB of RAM. Thank you! 
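For reference, the tuning that wiki passage is usually read as covering is the loader.conf kernel-memory sizing rather than the ARC cap itself; roughly, and with placeholder sizes only:

# /boot/loader.conf
# old-style kmem sizing, generally unnecessary on amd64 since 7.2:
# vm.kmem_size="12288M"
# vm.kmem_size_max="12288M"
# the knob that still needs an explicit decision on mixed-use machines:
vfs.zfs.arc_max="16384M"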
Best Regards, Luke Marsden -- CTO, Hybrid Logic +447791750420 | +1-415-449-1165 | www.hybrid-cluster.com From owner-freebsd-fs@FreeBSD.ORG Fri Feb 24 12:51:16 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C1A9F1065670 for ; Fri, 24 Feb 2012 12:51:16 +0000 (UTC) (envelope-from tevans.uk@googlemail.com) Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com [209.85.220.182]) by mx1.freebsd.org (Postfix) with ESMTP id 724B38FC16 for ; Fri, 24 Feb 2012 12:51:16 +0000 (UTC) Received: by vcge1 with SMTP id e1so64238vcg.13 for ; Fri, 24 Feb 2012 04:51:15 -0800 (PST) Received-SPF: pass (google.com: domain of tevans.uk@googlemail.com designates 10.52.27.99 as permitted sender) client-ip=10.52.27.99; Authentication-Results: mr.google.com; spf=pass (google.com: domain of tevans.uk@googlemail.com designates 10.52.27.99 as permitted sender) smtp.mail=tevans.uk@googlemail.com; dkim=pass header.i=tevans.uk@googlemail.com Received: from mr.google.com ([10.52.27.99]) by 10.52.27.99 with SMTP id s3mr1024903vdg.121.1330087875946 (num_hops = 1); Fri, 24 Feb 2012 04:51:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=2NW5fOcjjAb+cWWaIW+p+keCsFgDdnBAusvA+yIidiE=; b=DkROjfiz3did87s/fCHlXls2Z5hXV9+MIKVW77yaoHPtvTyuA9RJYQv3DU/AAfG2IZ aT3+XgZ3KFaFqlif4LhW+R+wNBHFsXETY1r4j4RRKIRGiuEjgr8tjP4LN27q3y9lPDxq MhFcvUQQlJ05ld33c0frrG8ySl/FT6IQSiEb8= MIME-Version: 1.0 Received: by 10.52.27.99 with SMTP id s3mr766254vdg.121.1330086098209; Fri, 24 Feb 2012 04:21:38 -0800 (PST) Received: by 10.52.91.210 with HTTP; Fri, 24 Feb 2012 04:21:38 -0800 (PST) In-Reply-To: <1330081612.13430.39.camel@pow> References: <1330081612.13430.39.camel@pow> Date: Fri, 24 Feb 2012 12:21:38 +0000 Message-ID: From: Tom Evans To: Luke Marsden Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk Subject: Re: Another ZFS ARC memory question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Feb 2012 12:51:16 -0000 On Fri, Feb 24, 2012 at 11:06 AM, Luke Marsden wrote: > Hi all, > > Just wanted to get your opinion on best practices for ZFS. > > We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines > but have been having trouble with short spikes in application memory > usage resulting in huge amounts of swapping, bringing the whole machine > to its knees and crashing it hard. =C2=A0I suspect this is because when t= here > is a sudden spike in memory usage the zfs arc reclaim thread is unable > to free system memory fast enough. > > This most recently happened yesterday as you can see from the following > munin graphs: > > E.g. http://hybrid-logic.co.uk/memory-day.png > =C2=A0 =C2=A0 http://hybrid-logic.co.uk/swap-day.png > > Our response has been to start limiting the ZFS ARC cache to 4GB on our > production machines - trading performance for stability is fine with me > (and we have L2ARC on SSD so we still get good levels of caching). > > My questions are: > > =C2=A0 =C2=A0 =C2=A0* is this a known problem? 
> =C2=A0 =C2=A0 =C2=A0* what is the community's advice for production machi= nes running > =C2=A0 =C2=A0 =C2=A0 =C2=A0ZFS on FreeBSD, is manually limiting the ARC c= ache (to ensure > =C2=A0 =C2=A0 =C2=A0 =C2=A0that there's enough actually free memory to ha= ndle a spike in > =C2=A0 =C2=A0 =C2=A0 =C2=A0application memory usage) the best solution to= this > =C2=A0 =C2=A0 =C2=A0 =C2=A0spike-in-memory-means-crash problem? > =C2=A0 =C2=A0 =C2=A0* has FreeBSD 9.0 / ZFS v28 solved this problem? > =C2=A0 =C2=A0 =C2=A0* rather than setting a hard limit on the ARC cache s= ize, is it > =C2=A0 =C2=A0 =C2=A0 =C2=A0possible to adjust the auto-tuning variables t= o leave more free > =C2=A0 =C2=A0 =C2=A0 =C2=A0memory for spiky memory situations? =C2=A0e.g.= set the auto-tuning to > =C2=A0 =C2=A0 =C2=A0 =C2=A0make arc eat 80% of memory instead of ~95% lik= e it is at > =C2=A0 =C2=A0 =C2=A0 =C2=A0present? > =C2=A0 =C2=A0 =C2=A0* could the arc reclaim thread be made to drop ARC pa= ges with > =C2=A0 =C2=A0 =C2=A0 =C2=A0higher priority before the system starts swapp= ing out > =C2=A0 =C2=A0 =C2=A0 =C2=A0application pages? > > Thank you for any/all answers, and thank you for making FreeBSD > awesome :-) It's not a problem, it's a feature! By default the ARC will attempt to cache as much as it can - it assumes the box is a ZFS filer, and doesn't need RAM for applications. The solution, as you've found out, is to limit how much ARC can take up. In practice, you should be doing this anyway. You should know, or have an idea, of how much RAM is required for the applications on that box, and you need to limit ZFS to not eat into that required RAM. Cheers Tom From owner-freebsd-fs@FreeBSD.ORG Fri Feb 24 12:59:02 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A80A5106566B for ; Fri, 24 Feb 2012 12:59:02 +0000 (UTC) (envelope-from tevans.uk@googlemail.com) Received: from mail-vw0-f54.google.com (mail-vw0-f54.google.com [209.85.212.54]) by mx1.freebsd.org (Postfix) with ESMTP id 515358FC15 for ; Fri, 24 Feb 2012 12:59:02 +0000 (UTC) Received: by vbbfa15 with SMTP id fa15so2199606vbb.13 for ; Fri, 24 Feb 2012 04:59:01 -0800 (PST) Received-SPF: pass (google.com: domain of tevans.uk@googlemail.com designates 10.52.20.201 as permitted sender) client-ip=10.52.20.201; Authentication-Results: mr.google.com; spf=pass (google.com: domain of tevans.uk@googlemail.com designates 10.52.20.201 as permitted sender) smtp.mail=tevans.uk@googlemail.com; dkim=pass header.i=tevans.uk@googlemail.com Received: from mr.google.com ([10.52.20.201]) by 10.52.20.201 with SMTP id p9mr1057169vde.87.1330088341628 (num_hops = 1); Fri, 24 Feb 2012 04:59:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=D/OfY5xw7DHqSwWR+2hY29Gb9PuM+0qjirBb5eVbgZ0=; b=BKI15w/5xse5a1vhqTkF/9fqHeo8mvZgioCyUHWI4A50tKYcKjmBxnWr9HAaLPTBhc 40dpP74xTKY3LGPoGjduTYNbRsz5c6eMOglLzZOl1CbeXPyMZ/EGom2suwvhHlJLkAJ3 faAuReg1fJfOqk/n8ydXEEgZnTVrp5ocQdyR8= MIME-Version: 1.0 Received: by 10.52.20.201 with SMTP id p9mr835212vde.87.1330088341287; Fri, 24 Feb 2012 04:59:01 -0800 (PST) Received: by 10.52.91.210 with HTTP; Fri, 24 Feb 2012 04:59:01 -0800 (PST) In-Reply-To: <1330087470.13430.61.camel@pow> References: <1330081612.13430.39.camel@pow> <1330087470.13430.61.camel@pow> Date: Fri, 24 Feb 2012 
12:59:01 +0000 Message-ID: From: Tom Evans To: Luke Marsden Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk Subject: Re: Another ZFS ARC memory question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Feb 2012 12:59:02 -0000 On Fri, Feb 24, 2012 at 12:44 PM, Luke Marsden wrote: > On Fri, 2012-02-24 at 12:21 +0000, Tom Evans wrote: >> On Fri, Feb 24, 2012 at 11:06 AM, Luke Marsden >> wrote: >> > Hi all, >> > >> > Just wanted to get your opinion on best practices for ZFS. >> > >> > We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines >> > but have been having trouble with short spikes in application memory >> > usage resulting in huge amounts of swapping, bringing the whole machin= e >> > to its knees and crashing it hard. =C2=A0I suspect this is because whe= n there >> > is a sudden spike in memory usage the zfs arc reclaim thread is unable >> > to free system memory fast enough. >> > >> > This most recently happened yesterday as you can see from the followin= g >> > munin graphs: >> > >> > E.g. http://hybrid-logic.co.uk/memory-day.png >> > =C2=A0 =C2=A0 http://hybrid-logic.co.uk/swap-day.png >> > >> > Our response has been to start limiting the ZFS ARC cache to 4GB on ou= r >> > production machines - trading performance for stability is fine with m= e >> > (and we have L2ARC on SSD so we still get good levels of caching). >> > >> > My questions are: >> > >> > =C2=A0 =C2=A0 =C2=A0* is this a known problem? >> > =C2=A0 =C2=A0 =C2=A0* what is the community's advice for production ma= chines running >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0ZFS on FreeBSD, is manually limiting the AR= C cache (to ensure >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0that there's enough actually free memory to= handle a spike in >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0application memory usage) the best solution= to this >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0spike-in-memory-means-crash problem? >> > =C2=A0 =C2=A0 =C2=A0* has FreeBSD 9.0 / ZFS v28 solved this problem? >> > =C2=A0 =C2=A0 =C2=A0* rather than setting a hard limit on the ARC cach= e size, is it >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0possible to adjust the auto-tuning variable= s to leave more free >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0memory for spiky memory situations? =C2=A0e= .g. set the auto-tuning to >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0make arc eat 80% of memory instead of ~95% = like it is at >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0present? >> > =C2=A0 =C2=A0 =C2=A0* could the arc reclaim thread be made to drop ARC= pages with >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0higher priority before the system starts sw= apping out >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0application pages? >> > >> > Thank you for any/all answers, and thank you for making FreeBSD >> > awesome :-) >> >> It's not a problem, it's a feature! >> >> By default the ARC will attempt to cache as much as it can - it >> assumes the box is a ZFS filer, and doesn't need RAM for applications. >> The solution, as you've found out, is to limit how much ARC can take >> up. >> >> In practice, you should be doing this anyway. You should know, or have >> an idea, of how much RAM is required for the applications on that box, >> and you need to limit ZFS to not eat into that required RAM. > > Thanks for your reply, Tom! 
=C2=A0I agree that the ARC cache is a great > feature, but for a general purpose filesystem it does seem like a > reasonable expectation that filesystem cache will be evicted before > application data is swapped, even if the spike in memory usage is rather > aggressive. =C2=A0A complete server crash in this scenario is rather > unfortunate. > > My question stands - is this an area which has been improved on in the > ZFS v28 / FreeBSD 9.0 / upcoming FreeBSD 8.3 code, or should it be > standard practice to guess how much memory the applications running on > the server might need and set the arc_max boot.loader tweak > appropriately? =C2=A0This is reasonably tricky when providing general pur= pose > web application hosting and so we'll often end up erring on the side of > caution and leaving lots of RAM free "just in case". > > If the latter is indeed the case in the latest stable releases then I > would like to update http://wiki.freebsd.org/ZFSTuningGuide which > currently states: > > =C2=A0 =C2=A0 =C2=A0 =C2=A0FreeBSD 7.2+ has improved kernel memory alloca= tion strategy and > =C2=A0 =C2=A0 =C2=A0 =C2=A0no tuning may be necessary on systems with mor= e than 2 GB of > =C2=A0 =C2=A0 =C2=A0 =C2=A0RAM. > > Thank you! > > Best Regards, > Luke Marsden > Hmm. That comment is really talking about that you no longer need to tune vm.kmem_size. I get what you are saying about applications suddenly using a lot of RAM should not cause the server to fall over. Do you know why it fell over? IE, was it a panic, a deadlock, etc. FreeBSD does not cope well when you have used up all RAM and swap (well, what does?), and from your graphs it does look like the ARC is not super massive when you had the problem - around 30-40% of RAM? Cheers Tom From owner-freebsd-fs@FreeBSD.ORG Fri Feb 24 13:42:25 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8739F1065670 for ; Fri, 24 Feb 2012 13:42:25 +0000 (UTC) (envelope-from luke-lists@hybrid-logic.co.uk) Received: from hybrid-sites.com (ns225413.hybrid-sites.com [176.31.225.127]) by mx1.freebsd.org (Postfix) with ESMTP id 2F6D38FC18 for ; Fri, 24 Feb 2012 13:42:24 +0000 (UTC) Received: from [127.0.0.1] (helo=youse) by hybrid-sites.com with esmtp (Exim 4.72 (FreeBSD)) (envelope-from ) id 1S0vPc-0009FP-56; Fri, 24 Feb 2012 13:42:22 +0000 From: Luke Marsden To: Tom Evans In-Reply-To: References: <1330081612.13430.39.camel@pow> <1330087470.13430.61.camel@pow> Content-Type: text/plain; charset="UTF-8" Date: Fri, 24 Feb 2012 13:42:14 +0000 Message-ID: <1330090934.13430.90.camel@pow> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Spam-bar: / Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk Subject: Re: Another ZFS ARC memory question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Feb 2012 13:42:25 -0000 On Fri, 2012-02-24 at 12:59 +0000, Tom Evans wrote: > On Fri, Feb 24, 2012 at 12:44 PM, Luke Marsden > wrote: > > On Fri, 2012-02-24 at 12:21 +0000, Tom Evans wrote: > >> On Fri, Feb 24, 2012 at 11:06 AM, Luke Marsden > >> wrote: > >> > Hi all, > >> > > >> > Just wanted to get your opinion on best practices for ZFS. 
> >> > > >> > We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines > >> > but have been having trouble with short spikes in application memory > >> > usage resulting in huge amounts of swapping, bringing the whole machine > >> > to its knees and crashing it hard. I suspect this is because when there > >> > is a sudden spike in memory usage the zfs arc reclaim thread is unable > >> > to free system memory fast enough. > >> > > >> > This most recently happened yesterday as you can see from the following > >> > munin graphs: > >> > > >> > E.g. http://hybrid-logic.co.uk/memory-day.png > >> > http://hybrid-logic.co.uk/swap-day.png > >> > > >> > Our response has been to start limiting the ZFS ARC cache to 4GB on our > >> > production machines - trading performance for stability is fine with me > >> > (and we have L2ARC on SSD so we still get good levels of caching). > >> > > >> > My questions are: > >> > > >> > * is this a known problem? > >> > * what is the community's advice for production machines running > >> > ZFS on FreeBSD, is manually limiting the ARC cache (to ensure > >> > that there's enough actually free memory to handle a spike in > >> > application memory usage) the best solution to this > >> > spike-in-memory-means-crash problem? > >> > * has FreeBSD 9.0 / ZFS v28 solved this problem? > >> > * rather than setting a hard limit on the ARC cache size, is it > >> > possible to adjust the auto-tuning variables to leave more free > >> > memory for spiky memory situations? e.g. set the auto-tuning to > >> > make arc eat 80% of memory instead of ~95% like it is at > >> > present? > >> > * could the arc reclaim thread be made to drop ARC pages with > >> > higher priority before the system starts swapping out > >> > application pages? > >> > > >> > Thank you for any/all answers, and thank you for making FreeBSD > >> > awesome :-) > >> > >> It's not a problem, it's a feature! > >> > >> By default the ARC will attempt to cache as much as it can - it > >> assumes the box is a ZFS filer, and doesn't need RAM for applications. > >> The solution, as you've found out, is to limit how much ARC can take > >> up. > >> > >> In practice, you should be doing this anyway. You should know, or have > >> an idea, of how much RAM is required for the applications on that box, > >> and you need to limit ZFS to not eat into that required RAM. > > > > Thanks for your reply, Tom! I agree that the ARC cache is a great > > feature, but for a general purpose filesystem it does seem like a > > reasonable expectation that filesystem cache will be evicted before > > application data is swapped, even if the spike in memory usage is rather > > aggressive. A complete server crash in this scenario is rather > > unfortunate. > > > > My question stands - is this an area which has been improved on in the > > ZFS v28 / FreeBSD 9.0 / upcoming FreeBSD 8.3 code, or should it be > > standard practice to guess how much memory the applications running on > > the server might need and set the arc_max boot.loader tweak > > appropriately? This is reasonably tricky when providing general purpose > > web application hosting and so we'll often end up erring on the side of > > caution and leaving lots of RAM free "just in case". 
> > > > If the latter is indeed the case in the latest stable releases then I > > would like to update http://wiki.freebsd.org/ZFSTuningGuide which > > currently states: > > > > FreeBSD 7.2+ has improved kernel memory allocation strategy and > > no tuning may be necessary on systems with more than 2 GB of > > RAM. > > > > Thank you! > > > > Best Regards, > > Luke Marsden > > > > Hmm. That comment is really talking about that you no longer need to > tune vm.kmem_size. http://wiki.freebsd.org/ZFSTuningGuide "No tuning may be necessary" seems to indicate that no changes need to be made to boot.loader. I'm happy to provide a patch for the wiki which makes it clearer that for servers which may experience sudden spikes in application memory usage (i.e. all servers running user-supplied applications), the speed of ARC eviction is insufficient to ensure stability and arc_max should be tuned downwards. > I get what you are saying about applications suddenly using a lot of > RAM should not cause the server to fall over. Do you know why it fell > over? IE, was it a panic, a deadlock, etc. If you look at the http://hybrid-logic.co.uk/swap-day.png graph you can see a huge spike in swap at the point at which the last line of pixels at http://hybrid-logic.co.uk/memory-day.png indicates the sudden increase in memory usage (by 3GB in active memory usage if you look closely). Since the graph stops at that point it indicates that the server became completely unresponsive (e.g. including munin probe requests). I did manage to log in just before it became completely unresponsive, but at that point the incoming requests weren't being serviced fast enough due to the excessive swapping and the server eventually became completely unresponsive (e.g. 'top' output froze and never came back). It continued to respond to pings though and may have eventually recovered if I had disabled inbound network traffic. I don't have any evidence of a panic or deadlock, we just hard rebooted the machine about 15 minutes later after it failed to recover from the swap-storm. > FreeBSD does not cope well when you have used up all RAM and swap > (well, what does?), and from your graphs it does look like the ARC is > not super massive when you had the problem - around 30-40% of RAM? The last munin sample indicates roughly 8.5GB ARC out of 24GB, so yes, 35%. I guess what I'd like is for FreeBSD to detect an emergency out-of-memory condition and aggressively drop much or all of the ARC cache *before* swapping out application memory which causes the system to grind to a halt. Is this a reasonable request, and is there anything I can do to help implement it? If not can we update the wiki to make it clearer that ARC limiting is necessary, even with high RAM boxes, to ensure stability under spiky memory conditions? Thanks! 
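One low-tech way to see this failure mode coming, using only stock tools (the log path and interval are arbitrary):

# log ARC size, free pages and swap use once a minute
$ while :; do
>   date
>   sysctl -n kstat.zfs.misc.arcstats.size vm.stats.vm.v_free_count
>   swapinfo -k
>   sleep 60
> done >> /var/tmp/arc-watch.log

A log like this at least shows whether the ARC had already shrunk by the time the swap spike hit, or whether the reclaim thread never caught up.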
Best Regards, Luke Marsden -- CTO, Hybrid Logic +447791750420 | +1-415-449-1165 | www.hybrid-cluster.com From owner-freebsd-fs@FreeBSD.ORG Fri Feb 24 18:04:00 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 10750106566C for ; Fri, 24 Feb 2012 18:04:00 +0000 (UTC) (envelope-from ian@ndwns.net) Received: from smtpauth.rollernet.us (smtpauth.rollernet.us [IPv6:2607:fe70:0:3::d]) by mx1.freebsd.org (Postfix) with ESMTP id AD7648FC12 for ; Fri, 24 Feb 2012 18:03:59 +0000 (UTC) Received: from smtpauth.rollernet.us (localhost [127.0.0.1]) by smtpauth.rollernet.us (Postfix) with ESMTP id B9040594002; Fri, 24 Feb 2012 10:03:29 -0800 (PST) Received: from localhost (c-76-126-116-195.hsd1.ca.comcast.net [76.126.116.195]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by smtpauth.rollernet.us (Postfix) with ESMTPSA; Fri, 24 Feb 2012 10:03:28 -0800 (PST) Date: Fri, 24 Feb 2012 10:03:46 -0800 From: Ian Downes To: Luke Marsden Message-ID: <20120224180346.GA83845@weta.local> References: <1330081612.13430.39.camel@pow> <1330087470.13430.61.camel@pow> <1330090934.13430.90.camel@pow> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1330090934.13430.90.camel@pow> User-Agent: Mutt/1.5.21 (2010-09-15) X-Rollernet-Abuse: Processed by Roller Network Mail Services. Contact abuse@rollernet.us to report violations. Abuse policy: http://www.rollernet.us/policy X-Rollernet-Submit: Submit ID 5a5e.4f47d0f0.6ea62.0 Cc: Tom Evans , freebsd-fs@freebsd.org, team@hybrid-logic.co.uk Subject: Re: Another ZFS ARC memory question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Feb 2012 18:04:00 -0000 On Fri, Feb 24, 2012 at 01:42:14PM +0000, Luke Marsden wrote: > On Fri, 2012-02-24 at 12:59 +0000, Tom Evans wrote: > > On Fri, Feb 24, 2012 at 12:44 PM, Luke Marsden > > wrote: > > > On Fri, 2012-02-24 at 12:21 +0000, Tom Evans wrote: > > >> On Fri, Feb 24, 2012 at 11:06 AM, Luke Marsden > > >> wrote: > > >> > Hi all, > > >> > > > >> > Just wanted to get your opinion on best practices for ZFS. > > >> > > > >> > We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines > > >> > but have been having trouble with short spikes in application memory > > >> > usage resulting in huge amounts of swapping, bringing the whole machine > > >> > to its knees and crashing it hard. I suspect this is because when there > > >> > is a sudden spike in memory usage the zfs arc reclaim thread is unable > > >> > to free system memory fast enough. > > >> > > > >> > This most recently happened yesterday as you can see from the following > > >> > munin graphs: > > >> > > > >> > E.g. http://hybrid-logic.co.uk/memory-day.png > > >> > http://hybrid-logic.co.uk/swap-day.png > > >> > > > >> > Our response has been to start limiting the ZFS ARC cache to 4GB on our > > >> > production machines - trading performance for stability is fine with me > > >> > (and we have L2ARC on SSD so we still get good levels of caching). > > >> > > > >> > My questions are: > > >> > > > >> > * is this a known problem? 
> > >> > * what is the community's advice for production machines running > > >> > ZFS on FreeBSD, is manually limiting the ARC cache (to ensure > > >> > that there's enough actually free memory to handle a spike in > > >> > application memory usage) the best solution to this > > >> > spike-in-memory-means-crash problem? > > >> > * has FreeBSD 9.0 / ZFS v28 solved this problem? > > >> > * rather than setting a hard limit on the ARC cache size, is it > > >> > possible to adjust the auto-tuning variables to leave more free > > >> > memory for spiky memory situations? e.g. set the auto-tuning to > > >> > make arc eat 80% of memory instead of ~95% like it is at > > >> > present? > > >> > * could the arc reclaim thread be made to drop ARC pages with > > >> > higher priority before the system starts swapping out > > >> > application pages? > > >> > > > >> > Thank you for any/all answers, and thank you for making FreeBSD > > >> > awesome :-) > > >> > > >> It's not a problem, it's a feature! > > >> > > >> By default the ARC will attempt to cache as much as it can - it > > >> assumes the box is a ZFS filer, and doesn't need RAM for applications. > > >> The solution, as you've found out, is to limit how much ARC can take > > >> up. > > >> > > >> In practice, you should be doing this anyway. You should know, or have > > >> an idea, of how much RAM is required for the applications on that box, > > >> and you need to limit ZFS to not eat into that required RAM. > > > > > > Thanks for your reply, Tom! I agree that the ARC cache is a great > > > feature, but for a general purpose filesystem it does seem like a > > > reasonable expectation that filesystem cache will be evicted before > > > application data is swapped, even if the spike in memory usage is rather > > > aggressive. A complete server crash in this scenario is rather > > > unfortunate. > > > > > > My question stands - is this an area which has been improved on in the > > > ZFS v28 / FreeBSD 9.0 / upcoming FreeBSD 8.3 code, or should it be > > > standard practice to guess how much memory the applications running on > > > the server might need and set the arc_max boot.loader tweak > > > appropriately? This is reasonably tricky when providing general purpose > > > web application hosting and so we'll often end up erring on the side of > > > caution and leaving lots of RAM free "just in case". > > > > > > If the latter is indeed the case in the latest stable releases then I > > > would like to update http://wiki.freebsd.org/ZFSTuningGuide which > > > currently states: > > > > > > FreeBSD 7.2+ has improved kernel memory allocation strategy and > > > no tuning may be necessary on systems with more than 2 GB of > > > RAM. > > > > > > Thank you! > > > > > > Best Regards, > > > Luke Marsden > > > > > > > Hmm. That comment is really talking about that you no longer need to > > tune vm.kmem_size. > > http://wiki.freebsd.org/ZFSTuningGuide > > "No tuning may be necessary" seems to indicate that no changes need to > be made to boot.loader. I'm happy to provide a patch for the wiki which > makes it clearer that for servers which may experience sudden spikes in > application memory usage (i.e. all servers running user-supplied > applications), the speed of ARC eviction is insufficient to ensure > stability and arc_max should be tuned downwards. > > > I get what you are saying about applications suddenly using a lot of > > RAM should not cause the server to fall over. Do you know why it fell > > over? IE, was it a panic, a deadlock, etc. 
> > If you look at the http://hybrid-logic.co.uk/swap-day.png graph you can > see a huge spike in swap at the point at which the last line of pixels > at http://hybrid-logic.co.uk/memory-day.png indicates the sudden > increase in memory usage (by 3GB in active memory usage if you look > closely). Since the graph stops at that point it indicates that the > server became completely unresponsive (e.g. including munin probe > requests). I did manage to log in just before it became completely > unresponsive, but at that point the incoming requests weren't being > serviced fast enough due to the excessive swapping and the server > eventually became completely unresponsive (e.g. 'top' output froze and > never came back). It continued to respond to pings though and may have > eventually recovered if I had disabled inbound network traffic. I don't > have any evidence of a panic or deadlock, we just hard rebooted the > machine about 15 minutes later after it failed to recover from the > swap-storm. > > > FreeBSD does not cope well when you have used up all RAM and swap > > (well, what does?), and from your graphs it does look like the ARC is > > not super massive when you had the problem - around 30-40% of RAM? > > The last munin sample indicates roughly 8.5GB ARC out of 24GB, so yes, > 35%. I guess what I'd like is for FreeBSD to detect an emergency > out-of-memory condition and aggressively drop much or all of the ARC > cache *before* swapping out application memory which causes the system > to grind to a halt. > > Is this a reasonable request, and is there anything I can do to help > implement it? > > If not can we update the wiki to make it clearer that ARC limiting is > necessary, even with high RAM boxes, to ensure stability under spiky > memory conditions? > Are you sure that it is the ARC data that is causing the issue? I've got boxes where the ARC *meta* skyrockets and consumes all RAM, greatly exceeding the arc_meta_limit. E.g. on a very unresponsive local box: vfs.zfs.arc_meta_limit: 1610612736 vfs.zfs.arc_meta_used: 12183379056 Setting arc_max helps (and seems to be respected), but I don't know why arc_meta_used exceeds arc_meta_limit. 
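A rough way to reproduce and watch the overshoot, assuming a pool mounted under /pool; the path and the find predicate are placeholders, and 100.chksetuid's own invocation is more involved:

# terminal 1: a metadata-heavy walk, similar in spirit to 100.chksetuid
$ find /pool -x -type f -perm -4000 > /dev/null
# terminal 2: watch the metadata counters drift past the limit
$ while :; do sysctl -n vfs.zfs.arc_meta_used vfs.zfs.arc_meta_limit; sleep 10; done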
Ian From owner-freebsd-fs@FreeBSD.ORG Sat Feb 25 04:37:15 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A0157106566C for ; Sat, 25 Feb 2012 04:37:15 +0000 (UTC) (envelope-from Kamil.Choudhury@anserinae.net) Received: from hrndva-omtalb.mail.rr.com (hrndva-omtalb.mail.rr.com [71.74.56.122]) by mx1.freebsd.org (Postfix) with ESMTP id 604678FC0C for ; Sat, 25 Feb 2012 04:37:15 +0000 (UTC) X-Authority-Analysis: v=2.0 cv=Z7xu7QtA c=1 sm=0 a=qe0RvMpo0P4Rp0DQO452oA==:17 a=IYgu6Z7xpcEA:10 a=egyE7zw0hOcA:10 a=WWGGoYozHbgA:10 a=kj9zAlcOel0A:10 a=xqWC_Br6kY4A:10 a=VDMU8vR1T1jl0vMf1V4A:9 a=CjuIK1q_8ugA:10 a=qe0RvMpo0P4Rp0DQO452oA==:117 X-Cloudmark-Score: 0 X-Originating-IP: 68.173.236.44 Received: from [68.173.236.44] ([68.173.236.44:50651] helo=janus.anserinae.net) by hrndva-oedge02.mail.rr.com (envelope-from ) (ecelerity 2.2.3.46 r()) with ESMTP id B3/B6-04292-A75684F4; Sat, 25 Feb 2012 04:37:14 +0000 Received: from JANUS.anserinae.net ([fe80::192c:4b89:9fe9:dc6d]) by janus.anserinae.net ([fe80::192c:4b89:9fe9:dc6d%11]) with mapi; Fri, 24 Feb 2012 23:37:02 -0500 From: Kamil Choudhury To: "freebsd-fs@freebsd.org" Thread-Topic: Distributed, snapshotting, checksumming filesystems for FreeBSD Thread-Index: Aczzc7P/LgC0HATgSXWQqxJMHVprPA== Date: Sat, 25 Feb 2012 04:37:03 +0000 Message-ID: <3CEE2DA4348D944399A67E308B78D38A1A57CABA@janus.anserinae.net> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Distributed, snapshotting, checksumming filesystems for FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2012 04:37:15 -0000 The dream: a file system spread out over a variable, ever increasing number of hosts, presenting a single unified file system to any client host mounting the file system. From the client's point of view, it is possible to snapshot the directory view that is presented.
The client also has confidence that data written to the file system will be returned exactly as it went in. Now that I think about it, what I seem to be looking for is a network aware ZFS that uses hosts as vdevs. Is there such a thing out there? Kamil From owner-freebsd-fs@FreeBSD.ORG Sat Feb 25 08:42:13 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3B639106564A for ; Sat, 25 Feb 2012 08:42:13 +0000 (UTC) (envelope-from peter@pean.org) Received: from system.jails.se (system.jails.se [IPv6:2001:16d8:cc1e:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id DEF708FC16 for ; Sat, 25 Feb 2012 08:42:12 +0000 (UTC) Received: from localhost (system.jails.se [91.205.63.85]) by system.jails.se (Postfix) with SMTP id 9A3D321BB8B for ; Sat, 25 Feb 2012 09:42:10 +0100 (CET) Received: from [172.25.0.25] (c-1105e155.166-7-64736c14.cust.bredbandsbolaget.se [85.225.5.17]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by system.jails.se (Postfix) with ESMTPSA id CDCB521BB81 for ; Sat, 25 Feb 2012 09:42:09 +0100 (CET) From: =?iso-8859-1?Q?Peter_Ankerst=E5l?= Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Date: Sat, 25 Feb 2012 09:42:08 +0100 Message-Id: <3E3E4094-77E2-490B-9574-5B95ECDED447@pean.org> To: freebsd-fs@freebsd.org Mime-Version: 1.0 (Apple Message framework v1251.1) X-Mailer: Apple Mail (2.1251.1) X-DSPAM-Result: Innocent X-DSPAM-Processed: Sat Feb 25 09:42:10 2012 X-DSPAM-Confidence: 1.0000 X-DSPAM-Probability: 0.0023 X-DSPAM-Signature: 4f489ee226816799614642 X-DSPAM-Factors: 27, D, 0.40000, Received*cipher+AES128, 0.40000, should+use, 0.40000, Mime-Version*Message, 0.40000, Message-Id*490B+9574, 0.40000, in+conflict, 0.40000, disks+not, 0.40000, http+//lists, 0.40000, http+//lists, 0.40000, not+partitions!, 0.40000, Hi+Now, 0.40000, of, 0.40000, But, 0.40000, But, 0.40000, Received*2012, 0.40000, says, 0.40000, Subject*zfs+confusion., 0.40000, And+then, 0.40000, Subject*confusion., 0.40000, Received*client+certificate, 0.40000, he, 0.40000, to+like, 0.40000, X-Mailer*(2.1251.1), 0.40000, And, 0.40000, this+seems, 0.40000, Jason+doesn't, 0.40000, use, 0.40000 Subject: glabel, gpart and zfs confusion. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2012 08:42:13 -0000 Hi, Now I'm really confused. I want in some way to label my drives so the setup is independent of physical setup. But Jason doesn't seem to like glabel at all. :D http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013574.html And then he says that you should use gpart instead http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013578.html But this seems to be in conflict with the common knowledge that zfs should be used on whole disks, not partitions!
Any pointers?=20= From owner-freebsd-fs@FreeBSD.ORG Sat Feb 25 09:04:17 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7478D106566B for ; Sat, 25 Feb 2012 09:04:17 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from mo-p05-ob6.rzone.de (mo-p05-ob6.rzone.de [IPv6:2a01:238:20a:202:53f5::1]) by mx1.freebsd.org (Postfix) with ESMTP id 09A028FC0A for ; Sat, 25 Feb 2012 09:04:16 +0000 (UTC) X-RZG-AUTH: :LWIKdA2leu0bPbLmhzXgqn0MTG6qiKEwQRWfNxSw4HzYIwjsnvdDt2oX8drk23mufkcHTOex6w== X-RZG-CLASS-ID: mo05 Received: from [192.168.179.39] (hmbg-5f766895.pool.mediaWays.net [95.118.104.149]) by post.strato.de (mrclete mo2) (RZmta 27.7 DYNA|AUTH) with (DHE-RSA-AES128-SHA encrypted) ESMTPA id Z0524co1P8I98G for ; Sat, 25 Feb 2012 10:04:03 +0100 (MET) Message-ID: <4F48A402.70009@brockmann-consult.de> Date: Sat, 25 Feb 2012 10:04:02 +0100 From: Peter Maloney User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <3E3E4094-77E2-490B-9574-5B95ECDED447@pean.org> In-Reply-To: <3E3E4094-77E2-490B-9574-5B95ECDED447@pean.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Subject: Re: glabel, gpart and zfs confusion. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2012 09:04:17 -0000 In Solaris, I've read that the IO system is designed such that a some commands (eg. flush of a partition) does not necessarily flush the disk's write cache... like the command can't move up the chain. So if you put zfs on a partition, you can get data loss (eg. transaction rollback required and probably no corruption). In FreeBSD, things are different I am told, without the above limitation. So you can happily put zfs on partitions, and the zfs code can keep your data safe. I haven't had data loss with system panics during sync writes with my ZIL on a partition, so I guess this must be true. People say that glabel is buggy/a hack. But I haven't had any problems myself. So they suggest using gpt to label your disks. I find that sometimes your gpt labels get eaten though, and you end up with gptid in your zpool status output. For labels to get eaten, you need to import the pool elsewhere with -f usually. And maybe this only applies to the root pool in most cases (but I definitely had one other case when it happened to a different pool). There is something you can add to /boot/loader.conf to get rid of the gptids... but I am hesitant to use it... because what happens when you have 2 identical labels and gptid is gone? eg. NAME STATE READ WRITE CKSUM zroot DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 gptid/bcc6c93a-f332-11e0-a5b6-0025900edbca OFFLINE 0 0 0 gptid/4629fb4b-f596-11e0-a5b6-0025900edbca OFFLINE 0 0 0 gpt/root2 ONLINE 0 0 0 gpt/root3 ONLINE 0 0 0 And also if a whole disk goes bad, and you try to replace it with another whole disk that is 1 byte smaller, it won't allow you to do that. So if you use gpart and create a slightly smaller partition, you get the advantage of being able to replace disks with smaller ones later. For new systems, I am using gpt labels. And if the gptid thing appears, I just ignore it. Am 25.02.2012 09:42, schrieb Peter Ankerstål: > Hi, > > Now Im really confused. 
> > I want in some way label my drives so the setup is independent of physical setup. But Jason doesn't > seem to like glabel at all. :D > http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013574.html > > And then he says that you should use gpart instead > http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013578.html > > But this seems to be in conflict with the common knowledge that zfs should > be used on whole disks, not partitions! > > Any pointers? > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Sat Feb 25 13:31:02 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AED63106564A for ; Sat, 25 Feb 2012 13:31:02 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from mo-p05-ob6.rzone.de (mo-p05-ob6.rzone.de [IPv6:2a01:238:20a:202:53f5::1]) by mx1.freebsd.org (Postfix) with ESMTP id EB6C98FC0A for ; Sat, 25 Feb 2012 13:31:01 +0000 (UTC) X-RZG-AUTH: :LWIKdA2leu0bPbLmhzXgqn0MTG6qiKEwQRWfNxSw4HzYIwjsnvdDt2oX8drk23mufkcHTOex6w== X-RZG-CLASS-ID: mo05 Received: from [192.168.179.39] (hmbg-5f766895.pool.mediaWays.net [95.118.104.149]) by smtp.strato.de (klopstock mo30) (RZmta 27.7 DYNA|AUTH) with (DHE-RSA-AES128-SHA encrypted) ESMTPA id n00a43o1PBfvkc for ; Sat, 25 Feb 2012 14:30:56 +0100 (MET) Message-ID: <4F48E28F.9090600@brockmann-consult.de> Date: Sat, 25 Feb 2012 14:30:55 +0100 From: Peter Maloney User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <3E3E4094-77E2-490B-9574-5B95ECDED447@pean.org> <4F48A402.70009@brockmann-consult.de> In-Reply-To: <4F48A402.70009@brockmann-consult.de> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Re: glabel, gpart and zfs confusion. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2012 13:31:02 -0000 And btw. related but not an answer to your question... >From the thread you mentioned:/ />/ # zpool attach tank label/m00-d00 label/m00-d01 />/ cannot use '/dev/label/m00-d01': must be a GEOM provider or regular file />/ />/ # glabel label m00-d01 /dev/da2s3 />/ glabel: Can't store metadata on /dev/da2s3: Invalid argument. />/ />/ # sysctl kern.geom.debugflags=17 />/ kern.geom.debugflags: 0 -> 17 />/ />/ # dd if=/dev/zero of=/dev/da2s3 />/ dd: /dev/da2s3: Invalid argument / My guess is that if you exported the pool, the "Invalid argument" errors would go away. / / Am 25.02.2012 10:04, schrieb Peter Maloney: > In Solaris, I've read that the IO system is designed such that a some > commands (eg. flush of a partition) does not necessarily flush the > disk's write cache... like the command can't move up the chain. So if > you put zfs on a partition, you can get data loss (eg. transaction > rollback required and probably no corruption). > > In FreeBSD, things are different I am told, without the above > limitation. So you can happily put zfs on partitions, and the zfs code > can keep your data safe. 
I haven't had data loss with system panics > during sync writes with my ZIL on a partition, so I guess this must be true. > > People say that glabel is buggy/a hack. But I haven't had any problems > myself. So they suggest using gpt to label your disks. I find that > sometimes your gpt labels get eaten though, and you end up with gptid in > your zpool status output. For labels to get eaten, you need to import > the pool elsewhere with -f usually. And maybe this only applies to the > root pool in most cases (but I definitely had one other case when it > happened to a different pool). There is something you can add to > /boot/loader.conf to get rid of the gptids... but I am hesitant to use > it... because what happens when you have 2 identical labels and gptid is > gone? > > eg. > > NAME STATE READ > WRITE CKSUM > zroot DEGRADED > 0 0 0 > mirror-0 DEGRADED > 0 0 0 > gptid/bcc6c93a-f332-11e0-a5b6-0025900edbca OFFLINE > 0 0 0 > gptid/4629fb4b-f596-11e0-a5b6-0025900edbca OFFLINE > 0 0 0 > gpt/root2 ONLINE > 0 0 0 > gpt/root3 ONLINE > 0 0 0 > > And also if a whole disk goes bad, and you try to replace it with > another whole disk that is 1 byte smaller, it won't allow you to do > that. So if you use gpart and create a slightly smaller partition, you > get the advantage of being able to replace disks with smaller ones later. > > For new systems, I am using gpt labels. And if the gptid thing appears, > I just ignore it. > > > Am 25.02.2012 09:42, schrieb Peter Ankerstål: >> Hi, >> >> Now Im really confused. >> >> I want in some way label my drives so the setup is independent of physical setup. But Jason doesn't >> seem to like glabel at all. :D >> http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013574.html >> >> And then he says that you should use gpart instead >> http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013578.html >> >> But this seems to be in conflict with the common knowledge that zfs should >> be used on whole disks, not partitions! >> >> Any pointers? 
>> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Sat Feb 25 15:24:41 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 32850106564A; Sat, 25 Feb 2012 15:24:41 +0000 (UTC) (envelope-from eadler@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 064CB8FC12; Sat, 25 Feb 2012 15:24:41 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1PFOeod089705; Sat, 25 Feb 2012 15:24:40 GMT (envelope-from eadler@freefall.freebsd.org) Received: (from eadler@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q1PFOeRs089701; Sat, 25 Feb 2012 15:24:40 GMT (envelope-from eadler) Date: Sat, 25 Feb 2012 15:24:40 GMT Message-Id: <201202251524.q1PFOeRs089701@freefall.freebsd.org> To: eadler@FreeBSD.org, eadler@FreeBSD.org, freebsd-fs@FreeBSD.org From: eadler@FreeBSD.org Cc: Subject: Re: kern/165392: Multiple mkdir/rmdir fails with errno 31 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2012 15:24:41 -0000 Synopsis: Multiple mkdir/rmdir fails with errno 31 Responsible-Changed-From-To: eadler->freebsd-fs Responsible-Changed-By: eadler Responsible-Changed-When: Sat Feb 25 15:24:40 UTC 2012 Responsible-Changed-Why: I'm not going to have time to look into this soon enough http://www.freebsd.org/cgi/query-pr.cgi?pr=165392 From owner-freebsd-fs@FreeBSD.ORG Sat Feb 25 15:56:57 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 38C30106566C for ; Sat, 25 Feb 2012 15:56:57 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id ECBE98FC08 for ; Sat, 25 Feb 2012 15:56:56 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id q1PFuqn6015039; Sat, 25 Feb 2012 09:56:52 -0600 (CST) Date: Sat, 25 Feb 2012 09:56:52 -0600 (CST) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Peter Maloney In-Reply-To: <4F48A402.70009@brockmann-consult.de> Message-ID: References: <3E3E4094-77E2-490B-9574-5B95ECDED447@pean.org> <4F48A402.70009@brockmann-consult.de> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Sat, 25 Feb 2012 09:56:52 -0600 (CST) Cc: freebsd-fs@freebsd.org Subject: Re: glabel, gpart and zfs confusion. 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2012 15:56:57 -0000 On Sat, 25 Feb 2012, Peter Maloney wrote: > In Solaris, I've read that the IO system is designed such that a some > commands (eg. flush of a partition) does not necessarily flush the > disk's write cache... like the command can't move up the chain. So if > you put zfs on a partition, you can get data loss (eg. transaction > rollback required and probably no corruption). I wonder where you read that since it seems like bad information? In Solaris, if zfs uses a partition (rather than the whole disk), the disk write cache is not enabled by default due to the possibility that some other partition uses a legacy filesystem like UFS, which could become inconsistent and corrupted if the write cache is enabled. The drawback then becomes that zfs writes are likely to incur more latency. > In FreeBSD, things are different I am told, without the above > limitation. So you can happily put zfs on partitions, and the zfs code > can keep your data safe. I haven't had data loss with system panics > during sync writes with my ZIL on a partition, so I guess this must be true. It seems unlikely that FreeBSD zfs is somehow "safer" than Solaris zfs. Both rely on a disk cache flush request to write buffered data to disk. Synchronous writes necessarily require that the zil (zfs intent log) be flushed to disk before write returns success to the user. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Sat Feb 25 18:30:16 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D6C081065670 for ; Sat, 25 Feb 2012 18:30:16 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id A7CE88FC19 for ; Sat, 25 Feb 2012 18:30:16 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1PIUGLd056189 for ; Sat, 25 Feb 2012 18:30:16 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q1PIUGV5056188; Sat, 25 Feb 2012 18:30:16 GMT (envelope-from gnats) Date: Sat, 25 Feb 2012 18:30:16 GMT Message-Id: <201202251830.q1PIUGV5056188@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Jilles Tjoelker Cc: Subject: Re: kern/165392: Multiple mkdir/rmdir fails with errno 31 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Jilles Tjoelker List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2012 18:30:16 -0000 The following reply was made to PR kern/165392; it has been noted by GNATS. From: Jilles Tjoelker To: bug-followup@FreeBSD.org, vvv@colocall.net Cc: Subject: Re: kern/165392: Multiple mkdir/rmdir fails with errno 31 Date: Sat, 25 Feb 2012 19:27:02 +0100 > [mkdir fails with [EMLINK], but link count < LINK_MAX] I can reproduce this problem with UFS with soft updates (with or without journaling). 
A reproduction without C programs is:

cd empty_dir
mkdir `jot 32766 1`   # the last one will fail (correctly)
rmdir 1
mkdir a               # will erroneously fail

The problem appears to be that the previous rmdir has not yet fully completed. It is still holding onto the link count until the directory is written, which may take up to two minutes. The same problem can occur with other calls that increase the link count, such as link() and rename().

A workaround is to call fsync() on the directory that contained the deleted entries. It will then release its hold on the link count and allow the mkdir or other calls to succeed. If fsync() is only called when [EMLINK] is returned, the performance impact should not be very bad, although it still causes more I/O than necessary.

The book "The Design and Implementation of the FreeBSD Operating System" contains a detailed description of soft updates in section 8.6, Soft Updates. The subsection "File Removal Requirements for Soft Updates" appears particularly relevant to this problem.

A possible solution is to check for the problematic situation (i_effnlink < LINK_MAX && i_nlink >= LINK_MAX) and, if it occurs, synchronously write one or more of the deleted directory entries that pointed to the inode with the link count problem. After that, i_nlink should be less than LINK_MAX and the link count can be checked again (depending on whether locks need to be dropped to do the write, it may or may not be possible for another thread to use up the last link first). For mkdir() and rename(), the directory that contains the deleted entries is obvious (the directory that will contain the new directory), while for link() it can (in the general case) only be found in soft updates data structures. Soft updates must track this because (if the link count became 0) it will not clear the inode before all directory entries that pointed to it have been written.

Simply replacing the i_nlink < LINK_MAX check with i_effnlink < LINK_MAX is unsafe because it can lead to overflow of the 16-bit signed i_nlink field. If the field is made larger, I don't see what would prevent the code from committing a set of changes such that an inode on disk has more than LINK_MAX links for some time (for example, if a file in the new directory is fsynced while the old directory entries are still on the disk). 
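As a rough userland illustration of that workaround (a sketch only: the helper name, the single-retry policy, the fixed-size path buffer, and the assumption that the caller knows which directory held the deleted entries are all mine, not an existing interface), something like the following retries a mkdir() that failed with [EMLINK] after fsync()ing the parent directory:

/*
 * Sketch: retry a mkdir() that failed with EMLINK after fsync()ing the
 * parent directory, which should release the link count still held by
 * soft updates for entries that were removed but not yet written out.
 */
#include <sys/stat.h>

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int
mkdir_with_emlink_retry(const char *parent, const char *name, mode_t mode)
{
	char path[1024];
	int fd, saved;

	snprintf(path, sizeof(path), "%s/%s", parent, name);
	if (mkdir(path, mode) == 0)
		return (0);
	if (errno != EMLINK)
		return (-1);

	/*
	 * mkdir() reported EMLINK even though the effective link count may
	 * be below LINK_MAX; fsync() the parent so the pending removals are
	 * written and the held link count is released.
	 */
	fd = open(parent, O_RDONLY);	/* parent is assumed to be a directory */
	if (fd == -1)
		return (-1);
	if (fsync(fd) == -1) {
		saved = errno;
		(void)close(fd);
		errno = saved;
		return (-1);
	}
	(void)close(fd);

	/* Retry once; if it still fails, report the error to the caller. */
	return (mkdir(path, mode));
}

int
main(void)
{
	if (mkdir_with_emlink_retry(".", "a", 0755) == -1) {
		perror("mkdir");
		return (1);
	}
	return (0);
}
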
-- Jilles Tjoelker From owner-freebsd-fs@FreeBSD.ORG Sat Feb 25 21:43:34 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D77A3106564A for ; Sat, 25 Feb 2012 21:43:34 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from mo-p05-ob6.rzone.de (mo-p05-ob6.rzone.de [IPv6:2a01:238:20a:202:53f5::1]) by mx1.freebsd.org (Postfix) with ESMTP id 3C5898FC1B for ; Sat, 25 Feb 2012 21:43:34 +0000 (UTC) X-RZG-AUTH: :LWIKdA2leu0bPbLmhzXgqn0MTG6qiKEwQRWfNxSw4HzYIwjsnvdDt2oX8drk23mufkcHTOex6w== X-RZG-CLASS-ID: mo05 Received: from [192.168.179.39] (hmbg-5f766895.pool.mediaWays.net [95.118.104.149]) by smtp.strato.de (jimi mo42) (RZmta 27.7 DYNA|AUTH) with (DHE-RSA-AES256-SHA encrypted) ESMTPA id J02071o1PKvObr ; Sat, 25 Feb 2012 22:43:25 +0100 (MET) Message-ID: <4F4955FC.6040308@brockmann-consult.de> Date: Sat, 25 Feb 2012 22:43:24 +0100 From: Peter Maloney User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Bob Friesenhahn References: <3E3E4094-77E2-490B-9574-5B95ECDED447@pean.org> <4F48A402.70009@brockmann-consult.de> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: glabel, gpart and zfs confusion. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2012 21:43:34 -0000 On 25.02.2012 16:56, Bob Friesenhahn wrote: > On Sat, 25 Feb 2012, Peter Maloney wrote: > >> In Solaris, I've read that the IO system is designed such that a some >> commands (eg. flush of a partition) does not necessarily flush the >> disk's write cache... like the command can't move up the chain. So if >> you put zfs on a partition, you can get data loss (eg. transaction >> rollback required and probably no corruption). > > I wonder where you read that since it seems like bad information? In > Solaris, if zfs uses a partition (rather than the whole disk), the > disk write cache is not enabled by default due to the possibility that > some other partition uses a legacy filesystem like UFS, which could > become inconsistent and corrupted if the write cache is enabled. The > drawback then becomes that zfs writes are likely to incur more latency. No idea. I was just trying to point out where this recommendation to keep it separate comes from... but I don't know the details. But what you said makes sense. Still, I am sure that among the random things I read that sounded semi-credible (e.g., by some guy claiming to be a ZFS engineer), it wasn't only about performance; it was more about corruption. (But then again, there are lots of doomsayers saying ZFS will somehow fail you, even though when they explain it, it is usually user error.) And thanks for your criticism; looking back at this document: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide it looks like they just talk about the cache and not corruption, even if I look at very old versions of the page. So what I read before was either quite wrong or just opinion based on, e.g., some bad experience of some tester or admin.