From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 03:00:36 2013
Date: Sat, 13 Apr 2013 23:00:34 -0400 (EDT)
From: Rick Macklem
To: Paul van der Zwan
Cc: freebsd-fs@freebsd.org
Subject: Re: FreeBSD 9.1 NFSv4 client attribute cache not caching ?
Message-ID: <678464111.812434.1365908434250.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <495AEA10-9B8F-4A03-B706-79BF43539482@vanderzwan.org>

Paul van der Zwan wrote:
> On 12 Apr 2013, at 16:28 , Paul van der Zwan wrote:
>
> > I am running a few VirtualBox VMs with 9.1 on my OpenIndiana server
> > and I noticed that make buildworld seems to take much longer
> > when the clients mount /usr/src and /usr/obj over NFS V4 than when
> > they use V3.
> > Unfortunately I have to use V4, as a buildworld on V3 hangs the
> > server completely...
> > I noticed the number of PUTFH/GETATTR/GETFH calls is in the order of
> > a few thousand per second,
> > and if I snoop the traffic I see the same filenames appear over and
> > over again.
> > It looks like the client is not caching anything at all and is
> > issuing a server request every time.
> > I use the default mount options:
> > 192.168.178.24:/data/ports on /usr/ports (nfs, nfsv4acls)
> > 192.168.178.24:/data/src on /usr/src (nfs, nfsv4acls)
> > 192.168.178.24:/data/obj on /usr/obj (nfs, nfsv4acls)
>
> I had a look with dtrace
> $ sudo dtrace -n '::getattr:start { @[stack()]=count();}'
> and it seems the vast majority of the calls to getattr are from open()
> and close() system calls:
>
>               kernel`newnfs_request+0x631
>               kernel`nfscl_request+0x75
>               kernel`nfsrpc_getattr+0xbe
>               kernel`nfs_getattr+0x280
>               kernel`VOP_GETATTR_APV+0x74
>               kernel`nfs_lookup+0x3cc
>               kernel`VOP_LOOKUP_APV+0x74
>               kernel`lookup+0x69e
>               kernel`namei+0x6df
>               kernel`kern_execve+0x47a
>               kernel`sys_execve+0x43
>               kernel`amd64_syscall+0x3bf
>               kernel`0xffffffff80784947
>               26
>
>               kernel`newnfs_request+0x631
>               kernel`nfscl_request+0x75
>               kernel`nfsrpc_getattr+0xbe
>               kernel`nfs_close+0x3e9
>               kernel`VOP_CLOSE_APV+0x74
>               kernel`kern_execve+0x15c5
>               kernel`sys_execve+0x43
>               kernel`amd64_syscall+0x3bf
>               kernel`0xffffffff80784947
>               26
>
>               kernel`newnfs_request+0x631
>               kernel`nfscl_request+0x75
>               kernel`nfsrpc_getattr+0xbe
>               kernel`nfs_getattr+0x280
>               kernel`VOP_GETATTR_APV+0x74
>               kernel`nfs_lookup+0x3cc
>               kernel`VOP_LOOKUP_APV+0x74
>               kernel`lookup+0x69e
>               kernel`namei+0x6df
>               kernel`vn_open_cred+0x330
>               kernel`vn_open+0x1c
>               kernel`kern_openat+0x207
>               kernel`kern_open+0x19
>               kernel`sys_open+0x18
>               kernel`amd64_syscall+0x3bf
>               kernel`0xffffffff80784947
>               2512
>
>               kernel`newnfs_request+0x631
>               kernel`nfscl_request+0x75
>               kernel`nfsrpc_getattr+0xbe
>               kernel`nfs_close+0x3e9
>               kernel`VOP_CLOSE_APV+0x74
>               kernel`vn_close+0xee
>               kernel`vn_closefile+0xff
>               kernel`_fdrop+0x3a
>               kernel`closef+0x332
>               kernel`kern_close+0x183
>               kernel`sys_close+0xb
>               kernel`amd64_syscall+0x3bf
>               kernel`0xffffffff80784947
>               2530
>
> I had a look at the source of nfs_close and could not find a call to
> nfsrpc_getattr, and I am wondering why close would be calling getattr
> anyway.
> If the file is closed, what do we care about its attributes....
>
Here are some random statements w.r.t. NFSv3 vs NFSv4 that might help
with an understanding of what is going on. I do address the specific
case of nfs_close() towards the end. (It is kinda long winded, but I
threw out everything I could think of..)

NFSv3 doesn't have any open/close RPC, but NFSv4 does have Open and
Close operations.

In NFSv3, each RPC is defined and usually includes attributes for files
before and after the operation (implicit getattrs not counted in the RPC
counts reported by nfsstat).

For NFSv4, every RPC is a compound built up of a list of Operations like
Getattr. Since the NFSv4 server doesn't know what the compound is doing,
nfsstat reports the counts of Operations for the NFSv4 server, so the
counts will be much higher than with NFSv3, but they do not reflect the
number of RPCs being done. To get NFSv4 nfsstat output that can be
compared to NFSv3, you need to do the command on the client(s), and even
then it is only roughly the same. (I just realized this should be
documented in man nfsstat.)

For the FreeBSD NFSv4 client, the compounds include Getattr operations
similar to what NFSv3 does. It doesn't do a Getattr on the directory
for Lookup, because that would have made the compound much more complex.
I don't think this will have a significant performance impact, but it
will result in some additional Getattr RPCs.

I suspect the slowness is caused by the extra overhead of doing the
Open/Close operations against the server.
The only way to avoid doing these against the server for NFSv4 is to
enable delegations in both client and server. How to do this is
documented in "man nfsv4". Basically, starting up the nfscbd in the
client and setting:
vfs.nfsd.issue_delegations=1
in the server.

Specifically for nfs_close(), the attributes (modify time) are used for
what is called "close to open consistency". This can be disabled by the
"nocto" mount option, if you don't need it for your build environment.
(You only need it if one client is writing a file and then another
client is reading the same file.)

Both the attribute caching and close to open consistency algorithms
in the client are essentially the same for NFSv3 vs NFSv4.

The NFSv4 Close operation(s) are actually done when the v_usecount for
the vnode goes to 0, since mmap'd files can do I/O on pages after
the close syscall. As such, they are only loosely related to the close
syscall. They are actually closing Windows-style Openlock(s).

You mention that you see the same file over and over in a packet trace.
You don't give specifics, but I'd suggest that you look at both NFSv3
and NFSv4 for this (and file names are in lookups, not getattrs).

I'd suggest you try enabling delegations in both client and server,
plus trying the "nocto" mount option, and see if that helps.

rick

>
> Paul
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 10:01:49 2013
Date: Sun, 14 Apr 2013 12:01:43 +0200
From: mxb
To: "freebsd-fs@freebsd.org"
Subject: Re: ZFS: ZIL device export/import
Message-Id: <9EE9328B-40B1-4510-B404-242D0F2C7697@alumni.chalmers.se>
References: <5A2824CA-2A67-47FA-AB27-20C6EBD2C501@alumni.chalmers.se> <51699B8E.7050003@platinum.linux.pl> <2DE8AD5E-B84C-4D88-A242-EA30EA4A68FD@alumni.chalmers.se>

Well, I'm trying to preclude any undesired effect in the whole setup,
as this is going to production. The SAS link might not be a bottleneck
here and I may be overreacting.

Locally, on a per-HU basis, I have 6 Gbit/s SAS/SATA, with both the
card and the disks (SSD) attached to it. The SAS expander is also
6 Gbit/s, attaching 10k RPM SAS mechanical disks in a JBOD. I use
Intel 520 SSDs and Pulsar SSDs in this setup.

The ZIL resided locally on an Intel SSD (per HU), but will now probably
move to a Pulsar SSD (moved to the JBOD, as those disks have a dual
SAS/SATA link). L2ARC resided on a Pulsar (there was a Pulsar in each
HU, i.e. I have 2x Pulsar).

Looks like I have to re-design the whole setup as far as the ZIL is
concerned.

//mxb

On 13 apr 2013, at 22:51, Ronald Klop wrote:

> I thought the idea of the ZIL is a fast buffer before the write to the
> slow disk. Are you really sure the SAS expander is the bottleneck in
> the system instead of the disks?
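Relocating a dedicated ZIL as described above is a log-vdev operation
rather than a data migration: the pool keeps running while the log
device is removed and re-added, and putting the log on the dual-ported
JBOD disks is what lets the other head import the pool intact. A minimal
sketch, assuming a pool named "tank" and hypothetical GPT labels for the
SSDs (neither name is from this thread):

    # drop the log vdev currently on the local Intel SSD
    zpool remove tank gpt/zil-intel
    # re-add the ZIL as a mirrored log vdev on the dual-ported JBOD SSDs
    zpool add tank log mirror gpt/zil-pulsar0 gpt/zil-pulsar1
    # confirm the new layout
    zpool status tank

Log-device removal needs a pool new enough to support it (pool version
19 or later); "zpool upgrade -v" lists what the pool supports.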
From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 10:10:46 2013
Date: Sun, 14 Apr 2013 12:10:26 +0200
From: Radio młodych bandytów
To: support@lists.pcbsd.org
Cc: freebsd-fs@freebsd.org
Subject: A failed drive causes system to hang
Message-ID: <516A8092.2080002@o2.pl>

Cross-post from freebsd-fs:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=333977+0+archive/2013/freebsd-fs/20130414.freebsd-fs

I have a failing drive in my array. I need to RMA it, but don't have
time, and it fails rarely enough to be yet another annoyance.
The failure is simple: it fails to respond.
When it happens, the only thing I found I can do is switch consoles.
Any command hangs, login on different consoles hangs, apps hang.
I run PC-BSD 9.1.

On the 1st console I see a series of messages like:

(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED

I've seen it happening even when running an installer from a different
drive, while preparing installation (don't remember which step).

I have partial dmesg screenshots from an older failure (21st of
December 2012), transcript below:

Screen1:
(ada0:ahcich0:0:0:0): FLUSHCACHE40. ACB: (ea?) 00 00 00 00 (cut?)
(ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut)
00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 7b(cut)
00
(ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 d0(cut)
00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated

Screen 2:
ahcich0: Timeout on slot 29 port 0
ahcich0: (unreadable, lots of numbers, some text)
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
ahcich0: Timeout on slot 29 port 0
ahcich0: (unreadable, lots of numbers, some text)
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
ahcich0: Timeout on slot 30 port 0
ahcich0: (unreadable, lots of numbers, some text)
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)

Both are from the same event. In general, messages:

(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED.

are the most common.

And one recent, though from a different drive (being a part of the same
array):
fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.19
(ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 82 46 b8 40 25 00 00 00 01 00
(ada1:ata0:0:0:0): CAM status: Command timeout
(ada1:ata0:0:0:0): Retrying command
vboxdrv: fAsync=0 offMin=0x53d offMax=0x52b9
linux: pid 17170 (npviewer.bin): syscall pipe2 not implemented
(ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 87 1a c7 40 1a 00 00 00 01 00
(ada1:ata0:0:0:0): CAM status: Command timeout
(ada1:ata0:0:0:0): Retrying command

A thing pointed out on freebsd-fs is that the driver changed from
ahcich0 to ata0. I haven't done any configuration here myself. Have you
changed some defaults?

--
Twoje radio

From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 10:18:12 2013
Date: Sun, 14 Apr 2013 12:18:07 +0200
From: "Ronald Klop"
To: support@lists.pcbsd.org, Radio młodych bandytów
Cc: freebsd-fs@freebsd.org
Subject: Re: A failed drive causes system to hang
In-Reply-To: <516A8092.2080002@o2.pl>
References: <516A8092.2080002@o2.pl>

On Sun, 14 Apr 2013 12:10:26 +0200, Radio młodych bandytów wrote:

> Cross-post from freebsd-fs:
> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=333977+0+archive/2013/freebsd-fs/20130414.freebsd-fs
>
> I have a failing drive in my array. I need to RMA it, but don't have
> time, and it fails rarely enough to be yet another annoyance.

Maybe offtopic, but you do have time to write long mails, but not to
RMA broken disks? I hope your clients don't read this. :-)

Ronald.

> The failure is simple: it fails to respond.
> When it happens, the only thing I found I can do is switch consoles.
> Any command hangs, login on different consoles hangs, apps hang.
> I run PC-BSD 9.1.
>
> On the 1st console I see a series of messages like:
>
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED
>
> I've seen it happening even when running an installer from a different
> drive, while preparing installation (don't remember which step).
>
> I have partial dmesg screenshots from an older failure (21st of
> December 2012), transcript below:
>
> Screen1:
> (ada0:ahcich0:0:0:0): FLUSHCACHE40. ACB: (ea?) 00 00 00 00 (cut?)
> (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut)
> 00
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 7b(cut)
> 00
> (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 d0(cut)
> 00
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>
> Screen 2:
> ahcich0: Timeout on slot 29 port 0
> ahcich0: (unreadable, lots of numbers, some text)
> (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
> (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
> (aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
> ahcich0: Timeout on slot 29 port 0
> ahcich0: (unreadable, lots of numbers, some text)
> (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
> (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
> (aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
> ahcich0: Timeout on slot 30 port 0
> ahcich0: (unreadable, lots of numbers, some text)
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
>
> Both are from the same event. In general, messages:
>
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED.
>
> are the most common.
>
> And one recent, though from a different drive (being a part of the same
> array):
> fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.19
> (ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 82 46 b8 40 25 00 00 00 01 00
> (ada1:ata0:0:0:0): CAM status: Command timeout
> (ada1:ata0:0:0:0): Retrying command
> vboxdrv: fAsync=0 offMin=0x53d offMax=0x52b9
> linux: pid 17170 (npviewer.bin): syscall pipe2 not implemented
> (ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 87 1a c7 40 1a 00 00 00 01 00
> (ada1:ata0:0:0:0): CAM status: Command timeout
> (ada1:ata0:0:0:0): Retrying command
>
> A thing pointed out on freebsd-fs is that the driver changed from
> ahcich0 to ata0. I haven't done any configuration here myself. Have you
> changed some defaults?
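The ahcich0-versus-ata0 question above is answerable from the running
system without changing any configuration. A minimal sketch, with the
device names taken from the logs in this thread:

    # list each disk under the controller channel it attached to,
    # e.g. ahcich0 (ahci(4), NCQ-capable) vs. ata0 (legacy ata(4))
    camcontrol devlist -v
    # see which driver claimed each channel at boot
    grep -E 'ahci|ata[0-9]' /var/run/dmesg.boot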
From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 10:26:35 2013
Date: Sun, 14 Apr 2013 12:26:15 +0200
From: Radio młodych bandytów
To: Ronald Klop
Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org
Subject: Re: A failed drive causes system to hang
Message-ID: <516A8447.90709@o2.pl>
References: <516A8092.2080002@o2.pl>

On 14/04/2013 12:18, Ronald Klop wrote:
> On Sun, 14 Apr 2013 12:10:26 +0200, Radio młodych bandytów wrote:
>
>> Cross-post from freebsd-fs:
>> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=333977+0+archive/2013/freebsd-fs/20130414.freebsd-fs
>>
>> I have a failing drive in my array. I need to RMA it, but don't have
>> time, and it fails rarely enough to be yet another annoyance.
>
> Maybe offtopic, but you do have time to write long mails, but not to
> RMA broken disks? I hope your clients don't read this. :-)
>
> Ronald.

It's my private desktop and it's a semi-test system. I don't care much
if I lose it.
--
Twoje radio

From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 10:34:53 2013
Date: Sun, 14 Apr 2013 12:34:46 +0200
From: Radio młodych bandytów
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org
Subject: Re: A failed drive causes system to hang
Message-ID: <516A8646.4000101@o2.pl>
In-Reply-To: <20130413000731.GA84309@icarus.home.lan>
References: <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan> <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan> <51688BA6.1000507@o2.pl> <20130413000731.GA84309@icarus.home.lan>

On 13/04/2013 02:07, Jeremy Chadwick wrote:
>
> On Sat, Apr 13, 2013 at 12:33:10AM +0200, Radio młodych bandytów wrote:
>> On 13/04/2013 00:03, Jeremy Chadwick wrote:
>>> On Fri, Apr 12, 2013 at 11:52:31PM +0200, Radio młodych bandytów wrote:
>>>> On 11/04/2013 23:24, Jeremy Chadwick wrote:
>>>>> On Thu, Apr 11, 2013 at 10:47:32PM +0200, Radio młodych bandytów wrote:
>>>>>> Seeing a ZFS thread, I decided to write about a similar problem that
>>>>>> I experience.
>>>>>> I have a failing drive in my array. I need to RMA it, but don't have
>>>>>> time, and it fails rarely enough to be yet another annoyance.
>>>>>> The failure is simple: it fails to respond.
>>>>>> When it happens, the only thing I found I can do is switch consoles.
>>>>>> Any command fails, login fails, apps hang.
>>>>>>
>>>>>> On the 1st console I see a series of messages like:
>>>>>>
>>>>>> (ada0:ahcich0:0:0:0): CAM status: Command timeout
>>>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED
>>>>>>
>>>>>> I use RAIDZ1 and I'd expect that no single failure would cause the
>>>>>> system to fail...
>>>>>
>>>>> You need to provide full output from "dmesg", and you need to define
>>>>> what the word "fails" means (re: "any command fails", "login fails").
>>>> Fails = hangs. When trying to log in, I can type my user name, but
>>>> after I press enter the prompt for the password never appears.
>>>> As to dmesg, tough luck. I have 2 photos on my phone and their
>>>> transcripts are all I can give until the problem reappears (which
>>>> should take up to 2 weeks). Photos are blurry and in many cases I'm
>>>> not sure what exactly is there.
>>>>
>>>> Screen1:
>>>> (ada0:ahcich0:0:0:0): FLUSHCACHE40. ACB: (ea?) 00 00 00 00 (cut?)
>>>> (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut)
>>>> 00
>>>> (ada0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 7b(cut)
>>>> 00
>>>> (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 d0(cut)
>>>> 00
>>>> (ada0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>>
>>>>
>>>> Screen 2:
>>>> ahcich0: Timeout on slot 29 port 0
>>>> ahcich0: (unreadable, lots of numbers, some text)
>>>> (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
>>>> (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
>>>> ahcich0: Timeout on slot 29 port 0
>>>> ahcich0: (unreadable, lots of numbers, some text)
>>>> (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
>>>> (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
>>>> ahcich0: Timeout on slot 30 port 0
>>>> ahcich0: (unreadable, lots of numbers, some text)
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
>>>> (ada0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
>>>>
>>>> Both are from the same event. In general, messages:
>>>>
>>>> (ada0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED.
>>>>
>>>> are the most common.
>>>>
>>>> I've waited for more than 1/2 hour once and the system didn't return
>>>> to a working state; the messages kept flowing and pretty much
>>>> nothing was working. What's interesting, I remember that it happened
>>>> to me even when I was using an installer (the PC-BSD one), before the
>>>> actual installation began, so the disk stored no program data. And I
>>>> *think* there was no ZFS yet anyway.
>>>>
>>>>>
>>>>> I've already demonstrated that loss of a disk in raidz1 (or even 2 disks
>>>>> in raidz2) does not cause ""the system to fail"" on stable/9. However,
>>>>> if you lose enough members or vdevs to cause catastrophic failure, there
>>>>> may be anomalies depending on how your system is set up:
>>>>>
>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-March/016814.html
>>>>>
>>>>> If the pool has failmode=wait, any I/O to that pool will block (wait)
>>>>> indefinitely. This is the default.
>>>>>
>>>>> If the pool has failmode=continue, existing write I/O operations will
>>>>> fail with EIO (I/O error) (and hopefully applications/daemons will
>>>>> handle that gracefully -- if not, that's their fault) but any subsequent
>>>>> I/O (read or write) to that pool will block (wait) indefinitely.
>>>>>
>>>>> If the pool has failmode=panic, the kernel will immediately panic.
>>>>>
>>>>> If the CAM layer is what's wedged, that may be a different issue (and
>>>>> not related to ZFS). I would suggest running stable/9 as many
>>>>> improvements in this regard have been committed recently (some related
>>>>> to CAM, others related to ZFS and its new "deadman" watcher).
>>>>
>>>> Yeah, because of the installer failure, I don't think it's related to ZFS.
>>>> Even if it is, for now I won't set any ZFS properties in the hope it
>>>> repeats and I can get better data.
>>>>>
>>>>> Bottom line: terse output of the problem does not help. Be verbose,
>>>>> provide all output (commands you type, everything!), as well as any
>>>>> physical actions you take.
>>>>>
>>>> Yep. In fact having little data was what made me hesitate to write
>>>> about it; since I did already, I'll do my best to get more info,
>>>> though for now I can only wait for a repetition.
>>>>
>>>>
>>>> On 12/04/2013 00:08, Quartz wrote:
>>>>>> Seeing a ZFS thread, I decided to write about a similar problem that I
>>>>>> experience.
>>>>>
>>>>> I'm assuming you're referring to my "Failed pool causes system to hang"
>>>>> thread. I wonder if there's some common issue with zfs where it locks up
>>>>> if it can't write to disks how it wants to.
>>>>>
>>>>> I'm not sure how similar your problem is to mine. What's your pool setup
>>>>> look like? Redundancy options? Are you booting from a pool? I'd be
>>>>> interested to know if you can just yank the cable to the drive and see
>>>>> if the system recovers.
>>>>>
>>>>> You seem to be worse off than me- I can still login and run at least a
>>>>> couple commands. I'm booting from a straight ufs drive though.
>>>>>
>>>>> ______________________________________
>>>>> it has a certain smooth-brained appeal
>>>>>
>>>> Like I said, I don't think it's ZFS-specific, but just in case...:
>>>> RAIDZ1, root on ZFS. I should reduce the severity of a pool loss before
>>>> pulling cables, so no tests for now.
>>>
>>> Key points:
>>>
>>> 1. We now know why "commands hang" and anything I/O-related blocks
>>> (waits) for you: because your root filesystem is ZFS. If the ZFS layer
>>> is waiting on CAM, and CAM is waiting on your hardware, then those I/O
>>> requests are going to block indefinitely. So now you know the answer to
>>> why that happens.
>>>
>>> 2. I agree that the problem is not likely in ZFS, but rather either with
>>> CAM, the AHCI implementation used, or hardware (either disk or storage
>>> controller).
>>>
>>> 3. Your lack of "dmesg" is going to make this virtually impossible to
>>> solve. We really, ***really*** need that. I cannot stress this enough.
>>> This will tell us a lot of information about your system. We're also
>>> going to need to see "zpool status" output, as well as "zpool get all"
>>> and "zfs get all". "pciconf -lvbc" would also be useful.
>>>
>>> There are some known "gotchas" with certain models of hard disks or AHCI
>>> controllers (which is responsible is unknown at this time), but I don't
>>> want to start jumping to conclusions until full details can be provided
>>> first.
>>>
>>> I would recommend formatting a USB flash drive as FAT/FAT32, booting
>>> into single-user mode, then mounting the USB flash drive and issuing
>>> the above commands + writing the output to files on the flash drive,
>>> then provide those here.
>>>
>>> We really need this information.
>>>
>>> 4. Please involve the PC-BSD folks in this discussion. They need to be
>>> made aware of issues like this so they (and iXSystems, potentially) can
>>> investigate from their side.
>>>
>> OK, thanks for the info.
>> Since dmesg is so important, I'd say the best thing is to wait for
>> the problem to happen again. When it does, I'll restart the thread
>> with all the information that you requested here and with a PC-BSD
>> cross-post.
>>
>> However, I just got a different hang a while ago. This time it was
>> temporary. I switched to console0 after ~10 seconds; there were 2
>> errors. Nothing appeared for ~1 minute, so I switched back and the
>> system was OK. A different drive; I haven't seen problems with this
>> one. And I think they used to be ahci, here it's ata.
>>
>> dmesg:
>>
>> fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.19
>> (ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 82 46 b8 40 25 00 00 00 01 00
>> (ada1:ata0:0:0:0): CAM status: Command timeout
>> (ada1:ata0:0:0:0): Retrying command
>> vboxdrv: fAsync=0 offMin=0x53d offMax=0x52b9
>> linux: pid 17170 (npviewer.bin): syscall pipe2 not implemented
>> (ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 87 1a c7 40 1a 00 00 00 01 00
>> (ada1:ata0:0:0:0): CAM status: Command timeout
>> (ada1:ata0:0:0:0): Retrying command
>>
>> {another 150KBytes of data snipped}
>
> The above output indicates that there was a timeout when trying to issue
> a 48-bit DMA request to the disk. The disk did not respond to the
> request within 30 seconds.
>
> If you were using AHCI, we'd be able to see if the AHCI layer was
> reporting signalling problems or other anomalies that could explain the
> behaviour. With ATA, such is significantly limited. It's worse if
> you're hiding/not showing us the entire information.
>
> The classic FreeBSD ATA driver does not provide command queueing (NCQ),
> while AHCI via CAM does. The difference is that command queueing causes
> xxx_FPDMA_QUEUED CDBs to be issued to the disk.
>
> I'm going to repeat myself -- for the last time: CAN YOU PLEASE JUST
> PROVIDE "DMESG" FROM THE SYSTEM? Like after a fresh reboot? If you're
> able to provide all of the above, I don't know why you can't provide
> dmesg. It is the most important information that there is. I am sick
> and tired of stressing this point.

Sorry. I thought just the error was important. So here you are:
dmesg.boot: http://pastebin.com/LFXPusMX

> Furthermore, please stop changing ATA vs. AHCI interface drivers.
> The more you change/screw around with, the less likely people are going
> to help. CHANGE NOTHING ON THE SYSTEM. Leave it how it is. Do not
> fiddle with things or start flipping switches/changing settings/etc. to
> "try and relieve the problem". You're asking other people for help,
> which means you need to be patient and follow what we ask.

I haven't changed one bit myself. It may have been a change of defaults
in PC-BSD. I just asked them about it. Or maybe different drives use
different drivers.

> Thank you for the rest of the output, however. It looks like this is
> another system with an ATI-based controller (which is usually the kind
> involved in my aforementioned "gotchas"), but there still isn't enough
> information that can help. I have a gut feeling of what's about to
> come, but I need to see dmesg output before I can determine that.
>
> Furthermore, can you please provide this information with its formatting
> intact? Your Email client is screwing up "long lines" and causing
> unnecessary wrapping.
>
> The mailing list will nuke attachments, so please use pastebin or some
> similar service + provide URLs.

pciconf -lvbc: http://pastebin.com/vvCKAWm1
zpool status: http://pastebin.com/D3Av7x9X
zfs get all: http://pastebin.com/4sT37VqZ
zpool get all tank1: http://pastebin.com/HZJTJPa2

--
Twoje radio
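The failmode behaviour Jeremy describes above is a per-pool property. A
minimal sketch of inspecting and changing it, reusing the pool name
tank1 from the pastebin links; note that, as Jeremy says, even with
failmode=continue any subsequent I/O still blocks, so this is no cure
for a hanging root pool:

    # see the current setting (wait is the default)
    zpool get failmode tank1
    # fail in-flight writes with EIO instead of blocking them forever
    zpool set failmode=continue tank1
    # or panic immediately, which at least makes the failure visible
    zpool set failmode=panic tank1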
From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 11:08:28 2013
Date: Sun, 14 Apr 2013 13:08:15 +0200
From: Paul van der Zwan
To: Rick Macklem
Cc: freebsd-fs@freebsd.org
Subject: Re: FreeBSD 9.1 NFSv4 client attribute cache not caching ?
Message-Id: <2B576479-C83A-4D3F-B486-475625383E9C@vanderzwan.org>
In-Reply-To: <678464111.812434.1365908434250.JavaMail.root@erie.cs.uoguelph.ca>
References: <678464111.812434.1365908434250.JavaMail.root@erie.cs.uoguelph.ca>

On 14 Apr 2013, at 5:00, Rick Macklem wrote:

Thanks for taking the effort to send such an extensive reply.

> Paul van der Zwan wrote:
>> On 12 Apr 2013, at 16:28 , Paul van der Zwan wrote:
>>
>>> I am running a few VirtualBox VMs with 9.1 on my OpenIndiana server
>>> and I noticed that make buildworld seems to take much longer
>>> when the clients mount /usr/src and /usr/obj over NFS V4 than when
>>> they use V3.
>>> Unfortunately I have to use V4, as a buildworld on V3 hangs the
>>> server completely...
>>> I noticed the number of PUTFH/GETATTR/GETFH calls is in the order of
>>> a few thousand per second,
>>> and if I snoop the traffic I see the same filenames appear over and
>>> over again.
>>> It looks like the client is not caching anything at all and is
>>> issuing a server request every time.
>>> I use the default mount options:
>>> 192.168.178.24:/data/ports on /usr/ports (nfs, nfsv4acls)
>>> 192.168.178.24:/data/src on /usr/src (nfs, nfsv4acls)
>>> 192.168.178.24:/data/obj on /usr/obj (nfs, nfsv4acls)
>>>
>>
>> I had a look with dtrace
>> $ sudo dtrace -n '::getattr:start { @[stack()]=count();}'
>> and it seems the vast majority of the calls to getattr are from open()
>> and close() system calls:
>>
>>               kernel`newnfs_request+0x631
>>               kernel`nfscl_request+0x75
>>               kernel`nfsrpc_getattr+0xbe
>>               kernel`nfs_getattr+0x280
>>               kernel`VOP_GETATTR_APV+0x74
>>               kernel`nfs_lookup+0x3cc
>>               kernel`VOP_LOOKUP_APV+0x74
>>               kernel`lookup+0x69e
>>               kernel`namei+0x6df
>>               kernel`kern_execve+0x47a
>>               kernel`sys_execve+0x43
>>               kernel`amd64_syscall+0x3bf
>>               kernel`0xffffffff80784947
>>               26
>>
>>               kernel`newnfs_request+0x631
>>               kernel`nfscl_request+0x75
>>               kernel`nfsrpc_getattr+0xbe
>>               kernel`nfs_close+0x3e9
>>               kernel`VOP_CLOSE_APV+0x74
>>               kernel`kern_execve+0x15c5
>>               kernel`sys_execve+0x43
>>               kernel`amd64_syscall+0x3bf
>>               kernel`0xffffffff80784947
>>               26
>>
>>               kernel`newnfs_request+0x631
>>               kernel`nfscl_request+0x75
>>               kernel`nfsrpc_getattr+0xbe
>>               kernel`nfs_getattr+0x280
>>               kernel`VOP_GETATTR_APV+0x74
>>               kernel`nfs_lookup+0x3cc
>>               kernel`VOP_LOOKUP_APV+0x74
>>               kernel`lookup+0x69e
>>               kernel`namei+0x6df
>>               kernel`vn_open_cred+0x330
>>               kernel`vn_open+0x1c
>>               kernel`kern_openat+0x207
>>               kernel`kern_open+0x19
>>               kernel`sys_open+0x18
>>               kernel`amd64_syscall+0x3bf
>>               kernel`0xffffffff80784947
>>               2512
>>
>>               kernel`newnfs_request+0x631
>>               kernel`nfscl_request+0x75
>>               kernel`nfsrpc_getattr+0xbe
>>               kernel`nfs_close+0x3e9
>>               kernel`VOP_CLOSE_APV+0x74
>>               kernel`vn_close+0xee
>>               kernel`vn_closefile+0xff
>>               kernel`_fdrop+0x3a
>>               kernel`closef+0x332
>>               kernel`kern_close+0x183
>>               kernel`sys_close+0xb
>>               kernel`amd64_syscall+0x3bf
>>               kernel`0xffffffff80784947
>>               2530
>>
>> I had a look at the source of nfs_close and could not find a call to
>> nfsrpc_getattr, and I am wondering why close would be calling getattr
>> anyway.
>> If the file is closed, what do we care about its attributes....
>>
> Here are some random statements w.r.t. NFSv3 vs NFSv4 that might help
> with an understanding of what is going on. I do address the specific
> case of nfs_close() towards the end. (It is kinda long winded, but I
> threw out everything I could think of..)
>
> NFSv3 doesn't have any open/close RPC, but NFSv4 does have Open and
> Close operations.
>
> In NFSv3, each RPC is defined and usually includes attributes for files
> before and after the operation (implicit getattrs not counted in the RPC
> counts reported by nfsstat).
>
> For NFSv4, every RPC is a compound built up of a list of Operations like
> Getattr. Since the NFSv4 server doesn't know what the compound is doing,
> nfsstat reports the counts of Operations for the NFSv4 server, so the
> counts will be much higher than with NFSv3, but they do not reflect the
> number of RPCs being done. To get NFSv4 nfsstat output that can be
> compared to NFSv3, you need to do the command on the client(s), and even
> then it is only roughly the same.
> (I just realized this should be documented in man nfsstat.)
>
I ran nfsstat -s -v 4 on the server and saw the number of requests being
done. They were in the order of a few thousand per second for a single
FreeBSD 9.1 client doing a make buildworld.

> For the FreeBSD NFSv4 client, the compounds include Getattr operations
> similar to what NFSv3 does. It doesn't do a Getattr on the directory
> for Lookup, because that would have made the compound much more complex.
> I don't think this will have a significant performance impact, but it
> will result in some additional Getattr RPCs.
>
I ran snoop on port 2049 on the server and I saw a large number of
lookups. A lot of them seem to be for directories which are part of the
filenames of the compiler and include files which are on the NFS-mounted
/usr/obj.
The same names keep reappearing, so it looks like there is no caching
being done on the client.

> I suspect the slowness is caused by the extra overhead of doing the
> Open/Close operations against the server. The only way to avoid doing
> these against the server for NFSv4 is to enable delegations in both
> client and server. How to do this is documented in "man nfsv4".
> Basically, starting up the nfscbd in the client and setting:
> vfs.nfsd.issue_delegations=1
> in the server.
>
> Specifically for nfs_close(), the attributes (modify time) are used for
> what is called "close to open consistency". This can be disabled by the
> "nocto" mount option, if you don't need it for your build environment.
> (You only need it if one client is writing a file and then another
> client is reading the same file.)
>
I tried the nocto option in /etc/fstab but it does not show when mount
shows the mounted filesystems, so I am not sure if it is being used.
On the server, netstat shows an active connection to port 7745 on the
client, but snoop shows no data flowing on that session.

> Both the attribute caching and close to open consistency algorithms
> in the client are essentially the same for NFSv3 vs NFSv4.
>
> The NFSv4 Close operation(s) are actually done when the v_usecount for
> the vnode goes to 0, since mmap'd files can do I/O on pages after
> the close syscall. As such, they are only loosely related to the close
> syscall. They are actually closing Windows-style Openlock(s).
>
I had a look at the code of the NFSv4 client of Illumos (which is
basically what my server is running) and as far as I understand it,
they do the getattr only when the close was for a file that was opened
for write and when there was actually something written to the file.
The FreeBSD code seems to do the getattr for all close() calls.
For files that were never written, like executables or source files,
that seems to cause quite a lot of overhead.

> You mention that you see the same file over and over in a packet trace.
> You don't give specifics, but I'd suggest that you look at both NFSv3
> and NFSv4 for this (and file names are in lookups, not getattrs).
>
> I'd suggest you try enabling delegations in both client and server,
> plus trying the "nocto" mount option, and see if that helps.
>
Tried it, but it does not seem to make any noticeable difference.
I tried a make buildworld buildkernel with /usr/obj on a local FS in the
VBox VM; that completed in about 2 hours. With /usr/obj on an NFSv4
filesystem it takes about a day. A twelvefold increase in elapsed time
makes using NFSv4 unusable for this use case.
Too bad the server hangs when I use an NFSv3 mount for /usr/obj.
Having a shared /usr/obj makes it possible to run a make buildworld on a
single VM and just run make installworld on the others.

Paul

> rick
>
>>
>> Paul
>>
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
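The delegation and nocto settings discussed in this thread end up in
three places on a FreeBSD client/server pair. A minimal sketch; the
fstab line reuses Paul's export path but is otherwise illustrative, and
on an OpenIndiana server like Paul's the server-side delegation switch
is different (on Illumos it is typically the NFS_SERVER_DELEGATION
setting in /etc/default/nfs rather than this sysctl):

    # FreeBSD NFS server: allow the server to hand out delegations
    sysctl vfs.nfsd.issue_delegations=1

    # FreeBSD client /etc/rc.conf: run the NFSv4 callback daemon,
    # without which the server cannot issue delegations (see man nfsv4)
    nfscbd_enable="YES"

    # client /etc/fstab: NFSv4 mount with close-to-open consistency off
    192.168.178.24:/data/obj  /usr/obj  nfs  rw,nfsv4,nocto  0  0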
From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 14:09:31 2013
Date: Sun, 14 Apr 2013 15:09:38 +0100
From: "Steven Hartland"
To: Radio mlodych bandytów, freebsd-fs@freebsd.org
Cc: freebsd-fs@freebsd.org
Subject: Re: A failed drive causes system to hang
Message-ID: <9C59759CB64B4BE282C1D1345DD0C78E@multiplay.co.uk>
References: <516A8092.2080002@o2.pl>

----- Original Message -----
From: "Radio mlodych bandytów"
Sent: Sunday, April 14, 2013 11:10 AM
Subject: A failed drive causes system to hang

> Cross-post from freebsd-fs:
> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=333977+0+archive/2013/freebsd-fs/20130414.freebsd-fs
>
> I have a failing drive in my array. I need to RMA it, but don't have
> time, and it fails rarely enough to be yet another annoyance.
> The failure is simple: it fails to respond.
> When it happens, the only thing I found I can do is switch consoles.
> Any command hangs, login on different consoles hangs, apps hang.
> I run PC-BSD 9.1.
>
> On the 1st console I see a series of messages like:
>
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED
>
> I've seen it happening even when running an installer from a different
> drive, while preparing installation (don't remember which step).
>
> I have partial dmesg screenshots from an older failure (21st of
> December 2012), transcript below:
>
> Screen1:
> (ada0:ahcich0:0:0:0): FLUSHCACHE40. ACB: (ea?) 00 00 00 00 (cut?)
> (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut)
> 00

smartctl has the ability to print out the queued log file if
the drive supports it. This may give you some more information
on what the problem may be with your drive.

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and
the person or entity to whom it is addressed. In the event of
misdirection, the recipient is prohibited from using, copying, printing
or otherwise disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission
please telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.
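Steve presumably means the drive's NCQ/queued-command error logging. A
minimal sketch of pulling it with smartmontools, assuming the suspect
disk is ada0 as in the timeouts above and that the drive supports the
extended logs:

    # identity, health, attributes and recent errors in one shot
    smartctl -a /dev/ada0
    # extended comprehensive error log; queued (NCQ) command errors
    # are usually recorded here
    smartctl -l xerror /dev/ada0
    # SATA PHY event counters: link resets, CRC errors and the like
    smartctl -l sataphy /dev/ada0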
00 00 00 00 (cut?) >> (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut) >> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated >> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut) >> 00 > > smartctl has the ability to print out the queued log file if > the drive supports it. This may give you some more information > on what the problem may be with your drive. > > Regards > Steve > > ================================================ > This e.mail is private and confidential between Multiplay (UK) Ltd. and > the person or entity to whom it is addressed. In the event of > misdirection, the recipient is prohibited from using, copying, printing > or otherwise disseminating it or any information contained in it. > In the event of misdirection, illegible or incomplete transmission > please telephone +44 845 868 1337 > or return the E.mail to postmaster@multiplay.co.uk. > > No errors on any of these drives. -- Twoje radio From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 18:51:22 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 177C4E4A for ; Sun, 14 Apr 2013 18:51:22 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta15.emeryville.ca.mail.comcast.net (qmta15.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:44:76:96:27:228]) by mx1.freebsd.org (Postfix) with ESMTP id EF033D6D for ; Sun, 14 Apr 2013 18:51:21 +0000 (UTC) Received: from omta21.emeryville.ca.mail.comcast.net ([76.96.30.88]) by qmta15.emeryville.ca.mail.comcast.net with comcast id PuQK1l0021u4NiLAFurM08; Sun, 14 Apr 2013 18:51:21 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta21.emeryville.ca.mail.comcast.net with comcast id PurH1l00y1t3BNj8hurJoF; Sun, 14 Apr 2013 18:51:20 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id B640973A33; Sun, 14 Apr 2013 11:51:17 -0700 (PDT) Date: Sun, 14 Apr 2013 11:51:17 -0700 From: Jeremy Chadwick To: Radio =?unknown-8bit?B?bcU/b2R5Y2ggYmFuZHl0w7N3?= Subject: Re: A failed drive causes system to hang Message-ID: <20130414185117.GA38259@icarus.home.lan> References: <516A8092.2080002@o2.pl> <9C59759CB64B4BE282C1D1345DD0C78E@multiplay.co.uk> <516AF61B.7060204@o2.pl> MIME-Version: 1.0 Content-Type: text/plain; charset=unknown-8bit Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <516AF61B.7060204@o2.pl> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1365965481; bh=aFlhudGBuSyEIQpvUvzPrBEU0hp/uBYCAjeLPdB7KSw=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=ChOvudQdeeWdlUIUmjtao4JIR2ZUjo8pMBS2wfxzjt7tYdVdiSH6VeYl5IKhIAtS5 I8EypUwTxezsw+KJNIH7JE8NGBTA9OP7g8SSQKov88NDMAcFu78G8l5jhGTkY5rhmG GsphLoYvSWzfW3JVr+9bE+v9klkFwKm/53MFt0aH1D7ngWQvBo1uECzmAeXF8ePJOg uJrj16JMeXykR7RoBAUcVygfj0jNfW5z7domD0Fk6rrx4AqVmTEQsVSI8dsmjIauaT ljh3JWbRtQe/wRJrVFZTP/K/DGQUUK8+sgtP0niekkQK5+4V1yTonoapv0o6BtSo/C VGpPXj+L6aIMA== Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Apr 2013 18:51:22 -0000 On Sun, Apr 14, 2013 at 08:31:55PM +0200, Radio m?odych bandytw wrote: > On 14/04/2013 16:09, Steven Hartland wrote: > > > >----- Original Message ----- From: "Radio mlodych bandytów" > > > >To: > >Cc: > 
>Sent: Sunday, April 14, 2013 11:10 AM > >Subject: A failed drive causes system to hang > > > > > >>Cross-post from freebsd-fs: > >>http://docs.freebsd.org/cgi/getmsg.cgi?fetch=333977+0+archive/2013/freebsd-fs/20130414.freebsd-fs > >> > >> > >>I have a failing drive in my array. I need to RMA it, but don't have > >>time and it fails rarely enough to be a yet another annoyance. > >>The failure is simple: it fails to respond. > >>When it happens, the only thing I found I can do is switch consoles. > >>Any command hangs, login on different consoles hangs, apps hang. > >>I run PC-BSD 9.1. > >> > >>On the 1st console I see a series of messages like: > >> > >>(ada0:ahcich0:0:0:0): CAM status: Command timeout > >>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED > >> > >>I've seen it happening even when running an installer from a different > >>drive, while preparing installation (don't remember which step). > >> > >>I have partial dmesg screenshots from an older failure (21st of > >>December 2012), transcript below: > >> > >>Screen1: > >>(ada0:ahcich0:0:0:0): FLUSHCACHE40. ACB: (ea?) 00 00 00 00 (cut?) > >>(ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut) > >>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut) > >>00 > > > >smartctl has the ability to print out the queued log file if > >the drive supports it. This may give you some more information > >on what the problem may be with your drive. > > No errors on any of these drives. Please provide full output from the following command, and please retain the formatting (pastebin, etc.): smartctl -x /dev/ada0 I would also appreciate seeing the same output for the other drives on the system (specifically /dev/ada1 and /dev/ada2), now that I've seen the dmesg output. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 18:58:16 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 394ED16B for ; Sun, 14 Apr 2013 18:58:16 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: from mail-vb0-x236.google.com (mail-vb0-x236.google.com [IPv6:2607:f8b0:400c:c02::236]) by mx1.freebsd.org (Postfix) with ESMTP id F1B12DBA for ; Sun, 14 Apr 2013 18:58:15 +0000 (UTC) Received: by mail-vb0-f54.google.com with SMTP id w16so3304167vbf.27 for ; Sun, 14 Apr 2013 11:58:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=ewlU8n7ARbOs00OqEYT6KyC/XB5MbBDFjacepi2fRA0=; b=MxD8xSd1gphZFlXDMcbTXp+Mg6PxTCd2xMUK5Nqw4CkCyaoXOLSx/wlAURh2TnEvF6 va3rYZyzuUz0Vz60nRzMKVbRMU8VDtCHI4rB0/roFc+FkkbonZxTTHOWKw4J+QGoEwgV TpDofOE2wY5pKY4128kL/OMsxs5cR/T02zrZzw8TylajouBo66erex+WfCEiW9FuqEDW BdwPGiFPTAofLUSPjK4z/85O3GO5EHdGkZvIgqtZ1Wxr0dTC0AWC4GtYtwOrvMSunNzN U3DZPW0B/gzf7zzqNaBTm9MguvyA4jg1mbWuyLchROfxqXnp1nRKEOuwcC3gH1CKdn9V 1gjw== MIME-Version: 1.0 X-Received: by 10.52.183.36 with SMTP id ej4mr12056052vdc.95.1365965895509; Sun, 14 Apr 2013 11:58:15 -0700 (PDT) Received: by 10.220.91.83 with HTTP; Sun, 14 Apr 2013 11:58:15 -0700 (PDT) In-Reply-To: <20130414185117.GA38259@icarus.home.lan> References: <516A8092.2080002@o2.pl> <9C59759CB64B4BE282C1D1345DD0C78E@multiplay.co.uk> <516AF61B.7060204@o2.pl> <20130414185117.GA38259@icarus.home.lan> Date: Sun, 14 Apr 2013 14:58:15 -0400 Message-ID: Subject: Re: A failed drive causes system to hang From: Zaphod Beeblebrox To: Jeremy Chadwick Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs , =?UTF-8?B?UmFkaW8gbcS5P29keWNoIGJhbmR5dMSCxYJ3?= , support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Apr 2013 18:58:16 -0000 I'd like to throw in my two cents here. I've seen this (drives in RAID-1 configuration) hanging whole systems. Back in the IDE days, two drives were connected with one cable --- I largely wrote it off as a deficiency of IDE hardware and resolved to by SCSI hardware for more important systems. Of late, the physical hardware for SCSI (SAS) and SATA drives have converged. I'm willing to accept that SAS hardware may be built to a different standard, but I'm suspicious of the fact that a bad SATA drive on an ACH* controller can hang the whole system. ... it's not complete, however. Often pulling the drive's cable will unfreeze things. It's also not entirely consistent. Drives I have behind 4:1 port multipliers haven't (so far) hung the system that they're on (which uses ACH10). Right now, I have a remote ACH10 system that's hung hard a couple of times --- and it passes both it's short and long SMART tests on both drives. Is there no global timeout we can depend on here? 
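[A note for readers working through this thread: the smartctl checks
being traded above can be reproduced with something like the following.
This is a sketch only -- smartmontools must be installed, /dev/ada0 is
the suspect drive, and the GP log address 0x10 (the NCQ command error
log) is an assumption about which "queued log" Steven means, not
something stated in the thread:

  smartctl -x /dev/ada0             # everything: attributes, error logs, PHY counters
  smartctl -l xerror /dev/ada0      # extended comprehensive error log
  smartctl -l gplog,0x10 /dev/ada0  # raw dump of the NCQ command error log
  smartctl -t long /dev/ada0        # start the long self-test mentioned above;
                                    # read results later with "smartctl -l selftest"

A drive that passes its self-tests can still time out under NCQ load,
which is why the error-log dumps are being asked for separately.]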
From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 19:11:28 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C20D12BF for ; Sun, 14 Apr 2013 19:11:28 +0000 (UTC) (envelope-from radiomlodychbandytow@o2.pl) Received: from moh1-ve1.go2.pl (moh1-ve1.go2.pl [193.17.41.131]) by mx1.freebsd.org (Postfix) with ESMTP id 839BFE13 for ; Sun, 14 Apr 2013 19:11:28 +0000 (UTC) Received: from moh1-ve1.go2.pl (unknown [10.0.0.131]) by moh1-ve1.go2.pl (Postfix) with ESMTP id 452F691F25D for ; Sun, 14 Apr 2013 21:11:25 +0200 (CEST) Received: from unknown (unknown [10.0.0.108]) by moh1-ve1.go2.pl (Postfix) with SMTP for ; Sun, 14 Apr 2013 21:11:25 +0200 (CEST) Received: from unknown [93.175.66.185] by poczta.o2.pl with ESMTP id GtzhlA; Sun, 14 Apr 2013 21:11:22 +0200 Message-ID: <516AFF5A.9010508@o2.pl> Date: Sun, 14 Apr 2013 21:11:22 +0200 From: Radio młodych bandytów User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130407 Thunderbird/17.0.5 MIME-Version: 1.0 To: Jeremy Chadwick Subject: Re: A failed drive causes system to hang References: <516A8092.2080002@o2.pl> <9C59759CB64B4BE282C1D1345DD0C78E@multiplay.co.uk> <516AF61B.7060204@o2.pl> <20130414185117.GA38259@icarus.home.lan> In-Reply-To: <20130414185117.GA38259@icarus.home.lan> Content-Type: text/plain; charset=unknown-8bit Content-Transfer-Encoding: 7bit X-O2-Trust: 1, 32 X-O2-SPF: neutral Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Apr 2013 19:11:28 -0000 From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 19:28:31 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 93D278FA for ; Sun, 14 Apr 2013 19:28:31 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta03.emeryville.ca.mail.comcast.net (qmta03.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:32]) by mx1.freebsd.org (Postfix) with ESMTP id 7623BEA6 for ; Sun, 14 Apr 2013 19:28:31 +0000 (UTC) Received: from omta12.emeryville.ca.mail.comcast.net ([76.96.30.44]) by qmta03.emeryville.ca.mail.comcast.net with comcast id PuCg1l0090x6nqcA3vUX5l; Sun, 14 Apr 2013 19:28:31 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta12.emeryville.ca.mail.comcast.net with comcast id PvUW1l0051t3BNj8YvUWRB; Sun, 14 Apr 2013 19:28:30 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 37E4A73A33; Sun, 14 Apr 2013 12:28:30 -0700 (PDT) Date: Sun, 14 Apr 2013 12:28:30 -0700 From: Jeremy Chadwick To: Radio =?unknown-8bit?B?bcU/b2R5Y2ggYmFuZHl0w7N3?= Subject: Re: A failed drive causes system to hang Message-ID: <20130414192830.GA38338@icarus.home.lan> References: <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan> <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan> <51688BA6.1000507@o2.pl> <20130413000731.GA84309@icarus.home.lan> <516A8646.4000101@o2.pl> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <516A8646.4000101@o2.pl> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1365967711; bh=Xhq5IrUfIsGMwcteqwHSZ8Q8/UYGbrM8mlXwUKgQxV8=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: 
MIME-Version:Content-Type; b=qi6dwvBY1NZBMSLmCreMheNr6yEkZRH9bnMOWrAn2C3sD6foIX3cgbVOW5U2JCv5a 68rcMkLcQho+lQktkEUmc7gfo3F3ZndwPNNCgdjtul6N+JKJY4o7Trl+WxP8oC5zH8 gl9uSHaORGFtta+MGBw5Fli5BeV/9EDzoqbwRynJavDFq2b0Nhu4615rG7Gf5ZBNCt KH6zwoK+s2E7efwVyNZEidaAMrRK1FCcqqvN9b3zB5ihMCt1H7DIRtqojkKHG7Aq2Q Zzyun+2wrdbwSk5BD7b66oGLaO5Jfv1GP7bd491wwkeAIgwS+VufOrDtA45nzc1Blu XUQIA8JoR9xww== Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Apr 2013 19:28:31 -0000 On Sun, Apr 14, 2013 at 12:34:46PM +0200, Radio m?odych bandytw wrote: > On 13/04/2013 02:07, Jeremy Chadwick wrote: > >On Sat, Apr 13, 2013 at 12:33:10AM +0200, Radio m?odych bandytw wrote: > >>On 13/04/2013 00:03, Jeremy Chadwick wrote: > >>>On Fri, Apr 12, 2013 at 11:52:31PM +0200, Radio m?odych bandytw wrote: > >>>>On 11/04/2013 23:24, Jeremy Chadwick wrote: > >>>>>On Thu, Apr 11, 2013 at 10:47:32PM +0200, Radio m?odych bandytw wrote: > >>>>>>Seeing a ZFS thread, I decided to write about a similar problem that > >>>>>>I experience. > >>>>>>I have a failing drive in my array. I need to RMA it, but don't have > >>>>>>time and it fails rarely enough to be a yet another annoyance. > >>>>>>The failure is simple: it fails to respond. > >>>>>>When it happens, the only thing I found I can do is switch consoles. > >>>>>>Any command fails, login fails, apps hang. > >>>>>> > >>>>>>On the 1st console I see a series of messages like: > >>>>>> > >>>>>>(ada0:ahcich0:0:0:0): CAM status: Command timeout > >>>>>>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>>>>>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED > >>>>>> > >>>>>>I use RAIDZ1 and I'd expect that none single failure would cause the > >>>>>>system to fail... > >>>>> > >>>>>You need to provide full output from "dmesg", and you need to define > >>>>>what the word "fails" means (re: "any command fails", "login fails"). > >>>>Fails = hangs. When trying to log it, I can type my user name, but > >>>>after I press enter the prompt for password never appear. > >>>>As to dmesg, tough luck. I have 2 photos on my phone and their > >>>>transcripts are all I can give until the problem reappears (which > >>>>should take up to 2 weeks). Photos are blurry and in many cases I'm > >>>>not sure what exactly is there. > >>>> > >>>>Screen1: > >>>>(ada0:ahcich0:0:0:0): FLUSHCACHE40. ACB: (ea?) 00 00 00 00 (cut?) > >>>>(ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut) > >>>>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>>>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut) > >>>>00 > >>>>(ada0:ahcich0:0:0:0): CAM status: Command timeout > >>>>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>>>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 7b(cut) > >>>>00 > >>>>(ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut) > >>>>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>>>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 d0(cut) > >>>>00 > >>>>(ada0:ahcich0:0:0:0): CAM status: Command timeout > >>>>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>>> > >>>> > >>>>Screen 2: > >>>>ahcich0: Timeout on slot 29 port 0 > >>>>ahcich0: (unreadable, lots of numbers, some text) > >>>>(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 
00 (cut) > >>>>(aprobe0:ahcich0:0:0:0): CAM status: Command timeout > >>>>(aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked > >>>>ahcich0: Timeout on slot 29 port 0 > >>>>ahcich0: (unreadable, lots of numbers, some text) > >>>>(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut) > >>>>(aprobe0:ahcich0:0:0:0): CAM status: Command timeout > >>>>(aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked > >>>>ahcich0: Timeout on slot 30 port 0 > >>>>ahcich0: (unreadable, lots of numbers, some text) > >>>>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut) > >>>>(ada0:ahcich0:0:0:0): CAM status: Command timeout > >>>>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>>>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut) > >>>> > >>>>Both are from the same event. In general, messages: > >>>> > >>>>(ada0:ahcich0:0:0:0): CAM status: Command timeout > >>>>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>>>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. > >>>> > >>>>are the most common. > >>>> > >>>>I've waited for more than 1/2 hour once and the system didn't return > >>>>to a working state, the messages kept flowing and pretty much > >>>>nothing was working. What's interesting, I remember that it happened > >>>>to me even when I was using an installer (PC-BSD one), before the > >>>>actual installation began, so the disk stored no program data. And I > >>>>*think* there was no ZFS yet anyway. > >>>> > >>>>> > >>>>>I've already demonstrated that loss of a disk in raidz1 (or even 2 disks > >>>>>in raidz2) does not cause ""the system to fail"" on stable/9. However, > >>>>>if you lose enough members or vdevs to cause catastrophic failure, there > >>>>>may be anomalies depending on how your system is set up: > >>>>> > >>>>>http://lists.freebsd.org/pipermail/freebsd-fs/2013-March/016814.html > >>>>> > >>>>>If the pool has failmode=wait, any I/O to that pool will block (wait) > >>>>>indefinitely. This is the default. > >>>>> > >>>>>If the pool has failmode=continue, existing write I/O operations will > >>>>>fail with EIO (I/O error) (and hopefully applications/daemons will > >>>>>handle that gracefully -- if not, that's their fault) but any subsequent > >>>>>I/O (read or write) to that pool will block (wait) indefinitely. > >>>>> > >>>>>If the pool has failmode=panic, the kernel will immediately panic. > >>>>> > >>>>>If the CAM layer is what's wedged, that may be a different issue (and > >>>>>not related to ZFS). I would suggest running stable/9 as many > >>>>>improvements in this regard have been committed recently (some related > >>>>>to CAM, others related to ZFS and its new "deadman" watcher). > >>>> > >>>>Yeah, because of the installer failure, I don't think it's related to ZFS. > >>>>Even if it is, for now I won't set any ZFS properties in hope it > >>>>repeats and I can get better data. > >>>>> > >>>>>Bottom line: terse output of the problem does not help. Be verbose, > >>>>>provide all output (commands you type, everything!), as well as any > >>>>>physical actions you take. > >>>>> > >>>>Yep. In fact having little data was what made me hesitate to write > >>>>about it; since I did already, I'll do my best to get more info, > >>>>though for now I can only wait for a repetition. > >>>> > >>>> > >>>>On 12/04/2013 00:08, Quartz wrote:> > >>>>>>Seeing a ZFS thread, I decided to write about a similar problem that I > >>>>>>experience. > >>>>> > >>>>>I'm assuming you're referring to my "Failed pool causes system to hang" > >>>>>thread. 
I wonder if there's some common issue with zfs where it locks up > >>>>>if it can't write to disks how it wants to. > >>>>> > >>>>>I'm not sure how similar your problem is to mine. What's your pool setup > >>>>>look like? Redundancy options? Are you booting from a pool? I'd be > >>>>>interested to know if you can just yank the cable to the drive and see > >>>>>if the system recovers. > >>>>> > >>>>>You seem to be worse off than me- I can still login and run at least a > >>>>>couple commands. I'm booting from a straight ufs drive though. > >>>>> > >>>>>______________________________________ > >>>>>it has a certain smooth-brained appeal > >>>>> > >>>>Like I said, I don't think it's ZFS-specific, but just in case...: > >>>>RAIDZ1, root on ZFS. I should reduce severity of a pool loss before > >>>>pulling cables, so no tests for now. > >>> > >>>Key points: > >>> > >>>1. We now know why "commands hang" and anything I/O-related blocks > >>>(waits) for you: because your root filesystem is ZFS. If the ZFS layer > >>>is waiting on CAM, and CAM is waiting on your hardware, then those I/O > >>>requests are going to block indefinitely. So now you know the answer to > >>>why that happens. > >>> > >>>2. I agree that the problem is not likely in ZFS, but rather either with > >>>CAM, the AHCI implementation used, or hardware (either disk or storage > >>>controller). > >>> > >>>3. Your lack of "dmesg" is going to make this virtually impossible to > >>>solve. We really, ***really*** need that. I cannot stress this enough. > >>>This will tell us a lot of information about your system. We're also > >>>going to need to see "zpool status" output, as well as "zpool get all" > >>>and "zfs get all". "pciconf -lvbc" would also be useful. > >>> > >>>There are some known "gotchas" with certain models of hard disks or AHCI > >>>controllers (which is responsible is unknown at this time), but I don't > >>>want to start jumping to conclusions until full details can be provided > >>>first. > >>> > >>>I would recommend formatting a USB flash drive as FAT/FAT32, booting > >>>into single-user mode, then mounting the USB flash drive and issuing > >>>the above commands + writing the output to files on the flash drive, > >>>then provide those here. > >>> > >>>We really need this information. > >>> > >>>4. Please involve the PC-BSD folks in this discussion. They need to be > >>>made aware of issues like this so they (and iXSystems, potentially) can > >>>investigate from their side. > >>> > >>OK, thanks for the info. > >>Since dmesg is so important, I'd say the best thing is to wait for > >>the problem to happen again. When it does, I'll restart the thread > >>with every information that you requested here and with a PC-BSD > >>cross-post. > >> > >>However, I just got a different hang just a while ago. This time it > >>was temporary, I don't know, I switched to console0 after ~10 > >>seconds, there were 2 errors. Nothing appeared for ~1 minute, so I > >>switched back and the system was OK. Different drive, I haven't seen > >>problems with this one. And I think they used to be ahci, here's > >>ata. > >> > >>dmesg: > >> > >>fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.19 > >>(ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 82 46 b8 40 25 00 00 00 01 00 > >>(ada1:ata0:0:0:0): CAM status: Command timeout > >>(ada1:ata0:0:0:0): Retrying command > >>vboxdrv: fAsync=0 offMin=0x53d offMax=0x52b9 > >>linux: pid 17170 (npviewer.bin): syscall pipe2 not implemented > >>(ada1:ata0:0:0:0): READ_DMA48. 
ACB: 25 00 87 1a c7 40 1a 00 00 00 01 00 > >>(ada1:ata0:0:0:0): CAM status: Command timeout > >>(ada1:ata0:0:0:0): Retrying command > >> > >>{another 150KBytes of data snipped} > > > >The above output indicates that there was a timeout when trying to issue > >a 48-bit DMA request to the disk. The disk did not respond to the > >request within 30 seconds. > > > >If you were using AHCI, we'd be able to see if the AHCI layer was > >reporting signalling problems or other anomalies that could explain the > >behaviour. With ATA, such is significantly limited. It's worse if > >you're hiding/not showing us the entire information. > > > >The classic FreeBSD ATA driver does not provide command queueing (NCQ), > >while AHCI via CAM does. The difference is that command queueing causes > >xxx_FPDMA_QUEUED CDBs to be issued to the disk. > > > >I'm going to repeat myself -- for the last time: CAN YOU PLEASE JUST > >PROVIDE "DMESG" FROM THE SYSTEM? Like after a fresh reboot? If you're > >able to provide all of the above, I don't know why you can't provide > >dmesg. It is the most important information that there is. I am sick > >and tired of stressing this point. > Sorry. I thought just the error was important. So here you are: > dmesg.boot: > http://pastebin.com/LFXPusMX Thank you. Please read everything I have written below before doing anything. Based on this output, we can see the following: * AHCI is actively in use, and is a slowly-becoming-infamous ATI IXP700 controller: ahci0: port 0xb000-0xb007,0xa000-0xa003,0x9000-0x9007,0x8000-0x8003,0x7000-0x700f mem 0xf9fffc00-0xf9ffffff irq 19 at device 17.0 on pci0 * The system has 3 disks attached to this controller: ada0 at ahcich0 bus 0 scbus2 target 0 lun 0 ada0: ATA-8 SATA 2.x device ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) ada1 at ata0 bus 0 scbus6 target 0 lun 0 ada1: ATA-8 SATA 2.x device ada1: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) ada1: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) ada2 at ata0 bus 0 scbus6 target 1 lun 0 ada2: ATA-8 SATA 2.x device ada2: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) ada2: 610480MB (1250263728 512 byte sectors: 16H 63S/T 16383C) Let's talk about ada0 and ada1 first. The WD15EARS drives have two features that can cause problems during I/O requests: - They are "energy efficient" drives ("Green", "GreenPower", or -GP). These drives are known to repeatedly and very aggressively park their heads, which can cause horrible performance and other I/O anomalies, particularly timeouts during I/O operations (reads or writes). - Their physical sector size is 4096 bytes, but like all drives, logically advertise 512 byte sectors to retain compatibility. Partitions which do not align themselves to 4096-byte boundaries will result in abysmally degraded performance, particularly during writes. The drive internally may also have issues trying to deal with this situation after prolonged use (in a non-aligned state). When using 4KByte sector drives with ZFS, you have to "prep" them using gnop(8) first. Ivan Voras describes this procedure here: http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html When using 4KByte sector drives with UFS/UFS2, the easiest method is to use the GPT partitioning scheme (rather than classic MBR) to ensure alignment. This can be done manually through gpart(8). 
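[For readers: a minimal sketch of both procedures, assuming a freshly
created pool/partition (the device names and the pool name "tank" are
illustrative, not taken from this system; the gnop step is the one from
Ivan Voras' article above):

  # ZFS: force ashift=12 by presenting one member as a 4K-sector device
  gnop create -S 4096 /dev/ada0
  zpool create tank raidz ada0.nop ada1 ada2
  zpool export tank
  gnop destroy /dev/ada0.nop     # the .nop device is only needed at creation
  zpool import tank
  zdb -C tank | grep ashift      # should now report ashift: 12

  # UFS: GPT partition aligned to 4K, then newfs as usual
  gpart create -s gpt ada1
  gpart add -t freebsd-ufs -a 4k ada1
  newfs -U /dev/ada1p1

Note this only helps pools and filesystems created this way; it does
not realign existing data.]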
Warren Block's article on this is recommended (for both SSDs and
MHDDs), though for MHDDs you can change the alignment size to "4k"
rather than "1m":

http://www.wonkity.com/~wblock/docs/html/ssd.html

- ada1 is only negotiating SATA150 speeds, even though this is a
  SATA300-capable drive (compare it to ada0). The WD15EARS drives have
  a jumper that can limit the PHY speed to SATA150 speeds. Please shut
  the system down, remove the disk (ada1), and physically examine it to
  see if that jumper is installed. If it is, please remove it.

  Jumper location:
  http://wdc.custhelp.com/app/answers/detail/search/1/a_id/1679#jumper

Now on ada2...

The ST3640323AS is one of Seagate's infamous Barracuda 7200.11 drives,
which are known for:

- Infamous firmware bugs, the most major of which is the drive becoming
  permanently catatonic (this has been covered by the media at great
  length).

- Being "energy efficient", which means excessively parking its heads
  (same issue as the WD "Green" drives, but with no way to disable or
  inhibit the behaviour).

The firmware on your ST3640323AS is version SD13; the latest firmware
is SD1B (8 versions newer). I would strongly suggest upgrading this
drive ASAP. Thankfully the ST3640323AS is a true 512-byte sector drive.

Back to the rest of the specifics...

* ZFS is in use for the root filesystem and possibly others:

Trying to mount root from zfs:tank1/ROOT/default []...

* The system is amd64 and has 4GB RAM; ZFS prefetch is therefore
forcefully disabled:

real memory  = 4294967296 (4096 MB)
avail memory = 4073431040 (3884 MB)
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
ZFS filesystem version 5
ZFS storage pool version 28

...now, all that said, much more output is needed. I would like to see
output from the following commands:

- gpart show ada0
- gpart show ada1
- gpart show ada2
- zfs get all (keep reading for why I'm asking for this again)

And also this command, run once per pool you have on the system:

- zdb -C {poolname} | grep ashift

(For readers) I do not need "zpool status" because I've seen it here:

http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/017011.html
http://pastebin.com/D3Av7x9X

(Also for readers) I do not need "zfs get all" or "zpool get all"
because the individual has made them available here:

zfs get all   -- http://pastebin.com/4sT37VqZ
zpool get all -- http://pastebin.com/HZJTJPa2

One thing I do see is that at some point you did enable compression on
your tank1 filesystem (I can tell because "compressratio" is 1.03x
rather than 1.00x), but may have disabled it later. I'm not going to
get into a debate about this, but my advice is to not use compression
or dedup (either feature) on FreeBSD ZFS.

Finally, as I asked in another post in this thread, I would like you to
provide output from the following command (once per disk):

- smartctl -x /dev/adaX

> >Furthermore, please stop changing ATA vs. AHCI interface drivers.
> >The more you change/screw around with, the less likely people are going
> >to help. CHANGE NOTHING ON THE SYSTEM. Leave it how it is. Do not
> >fiddle with things or start flipping switches/changing settings/etc. to
> >"try and relieve the problem". You're asking other people for help,
> >which means you need to be patient and follow what we ask.
> I haven't changed one bit myself. It may have been a change of
> defaults in PC-BSD. I just asked them about it.
> Or maybe different drives use different drivers.
If AHCI is enabled in your system BIOS, FreeBSD 9.x will use AHCI with CAM. If AHCI is not enabled in your system BIOS, FreeBSD 9.x will use classic ata(4) with CAM. In both cases disks will show up as /dev/adaX, but whether one is controlled with ahcichXX or ataX depends on AHCI capability. Your initial post showed this: http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016996.html (ada0:ahcich0:0:0:0): CAM status: Command timeout (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED Which shows AHCI in use. But then later, a different post said this: http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/017011.html (ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 87 1a c7 40 1a 00 00 00 01 00 (ada1:ata0:0:0:0): CAM status: Command timeout (ada1:ata0:0:0:0): Retrying command Which shows AHCI **not** in use. FreeBSD **does not** use "different drivers per drive". No OS does this: period. This is not how storage subsystems work, nor have ever worked. If you ever see the system suddenly reporting "ataX" (read: I said "atax" not "adaX"), and you are **ABSOLUTELY CERTAIN** AHCI mode is enabled in your BIOS, then to me that means your motherboard or SATA controller is behaving very erratically/wrong. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 19:44:41 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 80828AD1 for ; Sun, 14 Apr 2013 19:44:41 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta01.emeryville.ca.mail.comcast.net (qmta01.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:16]) by mx1.freebsd.org (Postfix) with ESMTP id 63FFDF25 for ; Sun, 14 Apr 2013 19:44:41 +0000 (UTC) Received: from omta23.emeryville.ca.mail.comcast.net ([76.96.30.90]) by qmta01.emeryville.ca.mail.comcast.net with comcast id Pvah1l0051wfjNsA1vkhNK; Sun, 14 Apr 2013 19:44:41 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta23.emeryville.ca.mail.comcast.net with comcast id Pvkg1l00G1t3BNj8jvkggn; Sun, 14 Apr 2013 19:44:40 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 36D8773A33; Sun, 14 Apr 2013 12:44:40 -0700 (PDT) Date: Sun, 14 Apr 2013 12:44:40 -0700 From: Jeremy Chadwick To: Zaphod Beeblebrox Subject: Re: A failed drive causes system to hang Message-ID: <20130414194440.GB38338@icarus.home.lan> References: <516A8092.2080002@o2.pl> <9C59759CB64B4BE282C1D1345DD0C78E@multiplay.co.uk> <516AF61B.7060204@o2.pl> <20130414185117.GA38259@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1365968681; bh=0DLeLOXztL+2XP6AuRY3dQhcrDrRHJSz8iTbu3d9Bno=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=KtbYfALT8TkT9r4ArqmVd35mYKsaKk/qmhw+TTr+2TKiRUaX1kSY+cMzJvI2PFYAX bwNZ1p1NNxnvRM+WZm6hpurl1mJZJQy+x5sBBcZ4nD7488owDmXDHDVq3Nr42eACjy TZrdaWxcy/wB/4fac/PES5dS9K96fljAWKQT9Sjn7z19a31ijU8FTNTPbzOG53uwo5 hLugpJvLbcmA8ebaPXBtczcU41l1sT8y0qS0h9t1+zM219Z26pjynafOS2CnUoS0CB 1lpnVPxQZQiZ9EG8htCo7dUXovrJXDYkWJBgZxRI2PPNOtHISxo8M1OOWX6uKPGJQJ ebiy9uqHMmFbw== Cc: freebsd-fs , Radio 
=?unknown-8bit?B?bcS5P29keWNoIGJhbmR5dMQ/xT93?= , support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Apr 2013 19:44:41 -0000 On Sun, Apr 14, 2013 at 02:58:15PM -0400, Zaphod Beeblebrox wrote: > I'd like to throw in my two cents here. I've seen this (drives in RAID-1 > configuration) hanging whole systems. Back in the IDE days, two drives > were connected with one cable --- I largely wrote it off as a deficiency of > IDE hardware and resolved to by SCSI hardware for more important systems. > Of late, the physical hardware for SCSI (SAS) and SATA drives have > converged. I'm willing to accept that SAS hardware may be built to a > different standard, but I'm suspicious of the fact that a bad SATA drive on > an ACH* controller can hang the whole system. Note to readers: this is borderline off-topic and is going to confuse the thread even more. I will respond to this ONLY ONCE, and WILL NOT be responding to this part of the thread past this point. I have only seen this happen on very specific controllers (JMicron for example), where either the AHCI driver was broken/badly written, or the underlying AHCI option ROM/firmware code was broken/badly written. > ... it's not complete, however. Often pulling the drive's cable will > unfreeze things. It's also not entirely consistent. Drives I have > behind 4:1 port multipliers haven't (so far) hung the system that > they're on (which uses ACH10). Right now, I have a remote ACH10 > system that's hung hard a couple of times --- and it passes both it's > short and long SMART tests on both drives. PMPs (port multipliers) are a *completely* separate beast, where some AHCI controllers (at a silicon level) screw up/break. In fact, the IXP600/700 is one such controller, and workarounds had to be put into FreeBSD and Linux for them. I can dig up the commits if need be. Rule of thumb (which you know -- this is for other readers): when using a PM, it's VERY IMPORTANT that be disclosed up front. These add a serious complication to analysis of the SATA subsystem as a whole, and in a lot of cases visibility into details are lost as a result. PMPs in general are "bleh". > Is there no global timeout we can depend on here? Please see kern.cam.ada.default_timeout (for adaX devices) and kern.cam.pmp.default_timeout (for I/O requests going across a PMP). Otherwise Alexander Motin (mav@) would be the guy to ask about PMP issues, and/or get him hardware + provide a reliable reproduction methodology for the issue. All the above said: Respectfully, please do not conflate your issue with this one. Please start a new thread (do not reply to this thread and change the Subject line, please actually start a brand new Email to ensure no Reference headers are retained) about this issue if you wish. There is already too much crap going on in this thread with 4 different people with what are 4 different issues, and nobody at this point is able to keep track of it all (including the participants). This situation happens way, WAY too often with storage-related matters on the list. ANYTHING ZFS-related and ANYTHING storage-related results in bandwagon-jumping and threads that spiral out of control/become almost useless and certainly impossible to follow. It needs to stop. 
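[For readers: the two knobs Jeremy names just above are plain sysctls,
so a minimal check/adjustment looks like the following. A sketch only:
whether kern.cam.pmp.* exists depends on the kernel having PMP support,
and shrinking a timeout only makes a dying drive fail faster, it does
not fix it:

  sysctl kern.cam.ada.default_timeout
  sysctl kern.cam.pmp.default_timeout
  sysctl kern.cam.ada.default_timeout=10   # e.g. give up after 10s instead of 30s

The same names can be set at boot from /boot/loader.conf if the running
kernel accepts them as tunables.]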
-- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 19:52:14 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 486E6BA5 for ; Sun, 14 Apr 2013 19:52:14 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta03.emeryville.ca.mail.comcast.net (qmta03.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:32]) by mx1.freebsd.org (Postfix) with ESMTP id 2BC66F53 for ; Sun, 14 Apr 2013 19:52:14 +0000 (UTC) Received: from omta16.emeryville.ca.mail.comcast.net ([76.96.30.72]) by qmta03.emeryville.ca.mail.comcast.net with comcast id PvlN1l0061ZMdJ4A3vsEzT; Sun, 14 Apr 2013 19:52:14 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta16.emeryville.ca.mail.comcast.net with comcast id PvsC1l0031t3BNj8cvsCH4; Sun, 14 Apr 2013 19:52:13 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id F2A2873A33; Sun, 14 Apr 2013 12:52:11 -0700 (PDT) Date: Sun, 14 Apr 2013 12:52:11 -0700 From: Jeremy Chadwick To: Radio =?unknown-8bit?B?bcU/b2R5Y2ggYmFuZHl0w7N3?= Subject: Re: A failed drive causes system to hang Message-ID: <20130414195211.GA39201@icarus.home.lan> References: <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan> <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan> <51688BA6.1000507@o2.pl> <20130413000731.GA84309@icarus.home.lan> <516A8646.4000101@o2.pl> <20130414192830.GA38338@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130414192830.GA38338@icarus.home.lan> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1365969134; bh=aN5pjazc8G4sxyH4/0He8A6Tw+BncOmtEMIaLNiGkiQ=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=NJs0/JFwDP1bvE9ujhC4nVuq5ZlVhLDMNH9udeuaNiYdtnBTbESW48dLVZcp509A0 X2y/5ZdOk6UlpXGgY/54P/eoO1n82zJzHWcg5IundwgRmVx5v0kJ7wWUh7YV+FCL6E MkAJm5TwOHtuRkwv5JERNg3tI4B69zYCyLOU2o13/Kl9FW0zijMtumIYFDwbyQrlCn HlOXAR1RE+aee5b+uGsCjdeKnpJahU186Yb/vaOm2TvB3+Lx6X0F6aa7wR7ERO6maT fgUKxTx2nvh0gU/HzOuAAL3VOIGEW3z8+NEQJXzXWw4KxBBKhAnaJ5q/3P9oMb8x2v yHK+Wc1rBpRow== Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Apr 2013 19:52:14 -0000 {snipping lots for brevity} On Sun, Apr 14, 2013 at 12:28:30PM -0700, Jeremy Chadwick wrote: > On Sun, Apr 14, 2013 at 12:34:46PM +0200, Radio m?odych bandytw wrote: > > Sorry. I thought just the error was important. So here you are: > > dmesg.boot: > > http://pastebin.com/LFXPusMX > > Thank you. Please read everything I have written below before doing > anything. 
> > Based on this output, we can see the following: > > * AHCI is actively in use, and is a slowly-becoming-infamous ATI IXP700 > controller: > > ahci0: port 0xb000-0xb007,0xa000-0xa003,0x9000-0x9007,0x8000-0x8003,0x7000-0x700f mem 0xf9fffc00-0xf9ffffff irq 19 at device 17.0 on pci0 > > * The system has 3 disks attached to this controller: > > ada0 at ahcich0 bus 0 scbus2 target 0 lun 0 > ada0: ATA-8 SATA 2.x device > ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada0: Command Queueing enabled > ada0: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) > ada1 at ata0 bus 0 scbus6 target 0 lun 0 > ada1: ATA-8 SATA 2.x device > ada1: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) > ada1: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) > ada2 at ata0 bus 0 scbus6 target 1 lun 0 > ada2: ATA-8 SATA 2.x device > ada2: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) > ada2: 610480MB (1250263728 512 byte sectors: 16H 63S/T 16383C) > > Let's talk about ada0 and ada1 first. Hold up a minute -- I just noticed some key information here (see what happens with big conflated threads?), and it sheds some light on my concerns with AHCI vs. classic ata(4): ada0 -- attached to ahcich0 ada1 -- attached to ata0 (presumably a "master" drive) ada2 -- attached to ata0 (presumably a "slave" drive) This is extremely confusing, because ata0 is a classic ATA controller (I can even tell from the classic ISA I/O port ranges): atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0 ata0: at channel 0 on atapci1 ata1: at channel 1 on atapci1 Yet the WD15EARS and ST3640323AS drives are physically SATA drives. Are you using SATA-to-IDE adapters on these two drives? If not, this seems to indicate the motherboard and/or SATA controller is actually only binding 1 disk to AHCI, while the others are bound to the same controller operating in (possibly) "SATA Enhanced" mode. This would be the first I've ever seen of this (a controller operating in both modes simultaneously), but I have a lot more experience with Intel SATA controllers than I do AMD. I don't know why a system would do this, unless all of this can be controlled via the BIOS somehow. What a mess. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 20:35:54 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 33968776 for ; Sun, 14 Apr 2013 20:35:54 +0000 (UTC) (envelope-from radiomlodychbandytow@o2.pl) Received: from moh3-ve1.go2.pl (moh3-ve2.go2.pl [193.17.41.86]) by mx1.freebsd.org (Postfix) with ESMTP id A481C12C for ; Sun, 14 Apr 2013 20:35:53 +0000 (UTC) Received: from moh3-ve1.go2.pl (unknown [10.0.0.157]) by moh3-ve1.go2.pl (Postfix) with ESMTP id 4C40AAF696B for ; Sun, 14 Apr 2013 22:35:52 +0200 (CEST) Received: from unknown (unknown [10.0.0.108]) by moh3-ve1.go2.pl (Postfix) with SMTP for ; Sun, 14 Apr 2013 22:35:52 +0200 (CEST) Received: from unknown [93.175.66.185] by poczta.o2.pl with ESMTP id nEdjtI; Sun, 14 Apr 2013 22:35:44 +0200 Message-ID: <516B1315.8060408@o2.pl> Date: Sun, 14 Apr 2013 22:35:33 +0200 From: =?UTF-8?B?UmFkaW8gbcWCb2R5Y2ggYmFuZHl0w7N3?= User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130407 Thunderbird/17.0.5 MIME-Version: 1.0 To: Jeremy Chadwick Subject: Re: A failed drive causes system to hang References: <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan> <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan> <51688BA6.1000507@o2.pl> <20130413000731.GA84309@icarus.home.lan> <516A8646.4000101@o2.pl> <20130414192830.GA38338@icarus.home.lan> <20130414195211.GA39201@icarus.home.lan> In-Reply-To: <20130414195211.GA39201@icarus.home.lan> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-O2-Trust: 1, 30 X-O2-SPF: neutral Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Apr 2013 20:35:54 -0000 On 14/04/2013 21:52, Jeremy Chadwick wrote: > {snipping lots for brevity} > > On Sun, Apr 14, 2013 at 12:28:30PM -0700, Jeremy Chadwick wrote: >> On Sun, Apr 14, 2013 at 12:34:46PM +0200, Radio m?odych bandytw wrote: >>> Sorry. I thought just the error was important. So here you are: >>> dmesg.boot: >>> http://pastebin.com/LFXPusMX >> >> Thank you. Please read everything I have written below before doing >> anything. >> >> Based on this output, we can see the following: >> >> * AHCI is actively in use, and is a slowly-becoming-infamous ATI IXP700 >> controller: >> >> ahci0: port 0xb000-0xb007,0xa000-0xa003,0x9000-0x9007,0x8000-0x8003,0x7000-0x700f mem 0xf9fffc00-0xf9ffffff irq 19 at device 17.0 on pci0 >> >> * The system has 3 disks attached to this controller: >> >> ada0 at ahcich0 bus 0 scbus2 target 0 lun 0 >> ada0: ATA-8 SATA 2.x device >> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) >> ada0: Command Queueing enabled >> ada0: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) >> ada1 at ata0 bus 0 scbus6 target 0 lun 0 >> ada1: ATA-8 SATA 2.x device >> ada1: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) >> ada1: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) >> ada2 at ata0 bus 0 scbus6 target 1 lun 0 >> ada2: ATA-8 SATA 2.x device >> ada2: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) >> ada2: 610480MB (1250263728 512 byte sectors: 16H 63S/T 16383C) >> >> Let's talk about ada0 and ada1 first. 
> > Hold up a minute -- I just noticed some key information here (see what > happens with big conflated threads?), and it sheds some light on my > concerns with AHCI vs. classic ata(4): > > ada0 -- attached to ahcich0 > ada1 -- attached to ata0 (presumably a "master" drive) > ada2 -- attached to ata0 (presumably a "slave" drive) > > This is extremely confusing, because ata0 is a classic ATA controller (I > can even tell from the classic ISA I/O port ranges): > > atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0 > ata0: at channel 0 on atapci1 > ata1: at channel 1 on atapci1 > > Yet the WD15EARS and ST3640323AS drives are physically SATA drives. > > Are you using SATA-to-IDE adapters on these two drives? No. > > If not, this seems to indicate the motherboard and/or SATA controller > is actually only binding 1 disk to AHCI, while the others are bound to > the same controller operating in (possibly) "SATA Enhanced" mode. > > This would be the first I've ever seen of this (a controller operating > in both modes simultaneously), but I have a lot more experience with > Intel SATA controllers than I do AMD. > > I don't know why a system would do this, unless all of this can be > controlled via the BIOS somehow. What a mess. > I looked into BIOS and it can be controlled. 6 ports are divided into 2 triples and I can switch mode of each triple independently. One drive is connected to one and two to the other. Looks like there's a bug because both triples are set to ATA. I left them like that for now. Anyway, I got the hang again, so I can provide dmesg. I was not at the computer when it happened, so there's only the last screen though... pastebin.com/bjYtzPgs -- Twoje radio From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 21:24:42 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C953DC9 for ; Sun, 14 Apr 2013 21:24:42 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta01.emeryville.ca.mail.comcast.net (qmta01.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:16]) by mx1.freebsd.org (Postfix) with ESMTP id AA4512A7 for ; Sun, 14 Apr 2013 21:24:42 +0000 (UTC) Received: from omta02.emeryville.ca.mail.comcast.net ([76.96.30.19]) by qmta01.emeryville.ca.mail.comcast.net with comcast id PxPV1l0040QkzPwA1xQimh; Sun, 14 Apr 2013 21:24:42 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta02.emeryville.ca.mail.comcast.net with comcast id PxQg1l00F1t3BNj8NxQgnr; Sun, 14 Apr 2013 21:24:42 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 4F0D673A33; Sun, 14 Apr 2013 14:24:40 -0700 (PDT) Date: Sun, 14 Apr 2013 14:24:40 -0700 From: Jeremy Chadwick To: Radio =?unknown-8bit?B?bcU/b2R5Y2ggYmFuZHl0w7N3?= Subject: Re: A failed drive causes system to hang Message-ID: <20130414212440.GA40325@icarus.home.lan> References: <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan> <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan> <51688BA6.1000507@o2.pl> <20130413000731.GA84309@icarus.home.lan> <516A8646.4000101@o2.pl> <20130414192830.GA38338@icarus.home.lan> <20130414195211.GA39201@icarus.home.lan> <516B1315.8060408@o2.pl> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <516B1315.8060408@o2.pl> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1365974682; 
bh=tqDJomj+Mbsm94ct5aQI5JZO6ZTot4nO+yJwvUUFusE=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=LpFX2Kg8y3t6nXM+Dhr1vBtJ/OSPYdbKQqhZFqSsYMO0OVTnSEUw+MzVozTItB8Lg aHNXzixx/0WBur76N+4m6YpmbqsGiQ7hWuvM2g6CM/OLnQfJ9RrzhywvwBLIE1Z9sM 4yv+Yz2mzhIw9fODEKjZFA77ZqZ91fktBAOongnWT94rehVXxoBLEYVanls7sCs4NR iXtzxQVfxD36rQ285BUx9qDmF0qRhU78HxNTJKqh4XhL1+Up6ygcCA8k+N3E9qi4Kw Z3Y0wRKMk9RzgVQVMussUEm7ZXrXeV+ONwCuo7142UxTaa2Cw7LYQcWH9eE/th1wI/ i/TtMMiKtLP5w== Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Apr 2013 21:24:42 -0000 On Sun, Apr 14, 2013 at 10:35:33PM +0200, Radio m?odych bandytw wrote: > On 14/04/2013 21:52, Jeremy Chadwick wrote: > > {snipping lots for brevity} > > > > On Sun, Apr 14, 2013 at 12:28:30PM -0700, Jeremy Chadwick wrote: > >> On Sun, Apr 14, 2013 at 12:34:46PM +0200, Radio m?odych bandytw wrote: > >>> Sorry. I thought just the error was important. So here you are: > >>> dmesg.boot: > >>> http://pastebin.com/LFXPusMX > >> > >> Thank you. Please read everything I have written below before doing > >> anything. > >> > >> Based on this output, we can see the following: > >> > >> * AHCI is actively in use, and is a slowly-becoming-infamous ATI IXP700 > >> controller: > >> > >> ahci0: port 0xb000-0xb007,0xa000-0xa003,0x9000-0x9007,0x8000-0x8003,0x7000-0x700f mem 0xf9fffc00-0xf9ffffff irq 19 at device 17.0 on pci0 > >> > >> * The system has 3 disks attached to this controller: > >> > >> ada0 at ahcich0 bus 0 scbus2 target 0 lun 0 > >> ada0: ATA-8 SATA 2.x device > >> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > >> ada0: Command Queueing enabled > >> ada0: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) > >> ada1 at ata0 bus 0 scbus6 target 0 lun 0 > >> ada1: ATA-8 SATA 2.x device > >> ada1: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) > >> ada1: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) > >> ada2 at ata0 bus 0 scbus6 target 1 lun 0 > >> ada2: ATA-8 SATA 2.x device > >> ada2: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) > >> ada2: 610480MB (1250263728 512 byte sectors: 16H 63S/T 16383C) > >> > >> Let's talk about ada0 and ada1 first. > > > > Hold up a minute -- I just noticed some key information here (see what > > happens with big conflated threads?), and it sheds some light on my > > concerns with AHCI vs. classic ata(4): > > > > ada0 -- attached to ahcich0 > > ada1 -- attached to ata0 (presumably a "master" drive) > > ada2 -- attached to ata0 (presumably a "slave" drive) > > > > This is extremely confusing, because ata0 is a classic ATA controller (I > > can even tell from the classic ISA I/O port ranges): > > > > atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0 > > ata0: at channel 0 on atapci1 > > ata1: at channel 1 on atapci1 > > > > Yet the WD15EARS and ST3640323AS drives are physically SATA drives. > > > > Are you using SATA-to-IDE adapters on these two drives? > No. > > > > If not, this seems to indicate the motherboard and/or SATA controller > > is actually only binding 1 disk to AHCI, while the others are bound to > > the same controller operating in (possibly) "SATA Enhanced" mode. 
> >
> > This would be the first I've ever seen of this (a controller operating
> > in both modes simultaneously), but I have a lot more experience with
> > Intel SATA controllers than I do AMD.
> >
> > I don't know why a system would do this, unless all of this can be
> > controlled via the BIOS somehow. What a mess.
> >
> I looked into BIOS and it can be controlled. 6 ports are divided into 2
> triples and I can switch mode of each triple independently. One drive is
> connected to one and two to the other.
> Looks like there's a bug because both triples are set to ATA.
> I left them like that for now.

What exact motherboard model is this? I'd like to review the manual.

> Anyway, I got the hang again, so I can provide dmesg. I was not at the
> computer when it happened, so there's only the last screen though...
> pastebin.com/bjYtzPgs

Thank you. Sadly the log snippet doesn't have timestamps, but this is
what transpired:

The log snippet you showed indicates the following:

* An NCQ-based write CDB (WRITE_FPDMA_QUEUED) was issued to the ada0
  drive attached to channel ahcich0 of controller ahci0, and the disk or
  controller did not respond within 30 seconds (I'm assuming PC-BSD did
  not change kern.cam.ada.default_timeout from the default of 30 seconds)

* The same request was resubmitted to the controller (CAM will try
  submission of a CDB up to 5 times (i.e. 4 retries), which is
  controlled with kern.cam.ada.retry_count).

* The AHCI controller (rather the specific channel of the AHCI
  controller) also reported that the underlying disk/device was not
  responding (re: "Timeout on slot X port X"). I see no SERR condition.

* An ATA_IDENTIFY CDB was issued to the ada0 drive attached to channel
  ahcich0 of controller ahci0, and this also timed out after 30 seconds.
  My gut feeling is that this system is running smartd(8); it's possible
  the kernel itself could submit the CDB to the drive, but in this
  condition/state I don't know why it'd do that.

* Rinse, lather, repeat.

To me, at first glance, this looks like the ada0 disk is going
catatonic. The controller itself seems to be responding fine, just that
the disk attached to ahcich0 is locking up hard. I see no sign of an
AHCI reset ("AHCI reset..." message) either.

So why does your system "hang" (meaning why can't you log in, why do
applications stop working, etc.) when this happens? Simple:

You're using ZFS for your root filesystem, as shown here:

Trying to mount root from zfs:tank1/ROOT/default []...

Your ZFS pool called tank1 consists of a raidz1 pool of 3 devices (more
specifically partitions): ada0, ada1, and something that is missing.
Recap: http://pastebin.com/D3Av7x9X

The pool is already degraded, and as you know, raidz1 can only suffer
the loss of one vdev (in this case a disk) before ZFS will begin
behaving based on what the pool's "failmode" property is. In effect,
when this happens, you're down to only 1 disk: ada1, and that's not
sufficient.

So ZFS does exactly what it should (with failmode=wait, the default):
it waits indefinitely, hoping that things recover.

Because this is your root filesystem, as well as tons of other
filesystems (including /usr, /var, /var/log, and so on):
http://pastebin.com/4sT37VqZ

...any I/O submitted to filesystems that are part of pool tank1 will
block/wait indefinitely until things recover. Except they don't recover
(and that isn't the fault of ZFS).
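[For readers: the failmode behaviour described here is a per-pool
property, so it can be inspected and changed with zpool(8). A sketch,
using the pool name from this thread; changing it does not fix the
underlying hang, it only changes how ZFS reacts to one:

  zpool get failmode tank1
  NAME   PROPERTY  VALUE  SOURCE
  tank1  failmode  wait   default

  zpool set failmode=continue tank1   # EIO on outstanding writes; later I/O still blocks
  zpool set failmode=panic tank1      # immediate kernel panic instead]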
I imagine if you let the system sit for roughly 5*30 seconds (see above for how I calculated that), you would eventually see a message on the console that looks something like this: (ada0:ahcich0:0:0:0): lost device (ada0:ahcich0:0:0:0): removing device entry So, the crux of your problem is: 1. Your disks fall off the bus for reasons unknown at this time -- I'm still waiting on smartctl -x output for each of your disks. Your disks themselves may actually be okay (I need to review the output to determine that) and the issue may be with SATA cabling, a faulty PSU, or a completely broken SATA controller or motherboard (bad traces, etc.). I am not going to help diagnose those problems, because the only reliable method is to start replacing each part, piece by piece, and see if the issue goes away. 2. Your array is already degraded/broken yet you don't care to fix it. If the array was in decent shape and ada0 fell off the bus, things would still work because ada1 and ada2 would be functioning (re: raidz1). If you were using UFS instead of ZFS for your root filesystem, you would still have the same issue, just that the system would kernel panic. You can induce that behaviour with ZFS as well using failmode=panic. There isn't much more for me to say. Everything is behaving how it's designed, from what I can tell. When you lose your root filesystem, you really can't expect the system to be in some "magical usable state". -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Mon Apr 15 01:32:10 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 085512C4 for ; Mon, 15 Apr 2013 01:32:10 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 86B7A9F8 for ; Mon, 15 Apr 2013 01:32:08 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqIEAJNXa1GDaFvO/2dsb2JhbABQgzyDML4RgR50gh8BAQEDAQEBASArIAsFFhgCAg0ZAikBCRgBDQYIBwQBHASHbQYMqCmRXYEjjEJ+NAeCLoETA5M4gQyCQYEhj3CDJyAygQU1 X-IronPort-AV: E=Sophos;i="4.87,472,1363147200"; d="scan'208";a="23873831" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 14 Apr 2013 21:32:01 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 575C1B3F15; Sun, 14 Apr 2013 21:32:01 -0400 (EDT) Date: Sun, 14 Apr 2013 21:32:01 -0400 (EDT) From: Rick Macklem To: Paul van der Zwan Message-ID: <1091296771.826148.1365989521302.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <2B576479-C83A-4D3F-B486-475625383E9C@vanderzwan.org> Subject: Re: FreeBSD 9.1 NFSv4 client attribute cache not caching ? 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Apr 2013 01:32:10 -0000 Paul van der Zwan wrote: > On 14 Apr 2013, at 5:00 , Rick Macklem wrote: > > > Thanks for taking the effort to send such an extensive reply. > > > Paul van der Zwan wrote: > >> On 12 Apr 2013, at 16:28 , Paul van der Zwan > >> wrote: > >> > >>> > >>> I am running a few VirtualBox VMs with 9.1 on my OpenIndiana > >>> server > >>> and I noticed that make buildworld seem to take much longer > >>> when the clients mount /usr/src and /usr/obj over NFS V4 than when > >>> they use V3. > >>> Unfortunately I have to use V4 as a buildworld on V3 hangs the > >>> server completely... > >>> I noticed the number of PUTFH/GETATTR/GETFH calls in in the order > >>> of > >>> a few thousand per second > >>> and if I snoop the traffic I see the same filenames appear over > >>> and > >>> over again. > >>> It looks like the client is not caching anything at all and using > >>> a > >>> server request everytime. > >>> I use the default mount options: > >>> 192.168.178.24:/data/ports on /usr/ports (nfs, nfsv4acls) > >>> 192.168.178.24:/data/src on /usr/src (nfs, nfsv4acls) > >>> 192.168.178.24:/data/obj on /usr/obj (nfs, nfsv4acls) > >>> > >>> > >> > >> I had a look with dtrace > >> $ sudo dtrace -n '::getattr:start { @[stack()]=count();}' > >> and it seems the vast majority of the calls to getattr are from > >> open() > >> and close() system calls.: > >> kernel`newnfs_request+0x631 > >> kernel`nfscl_request+0x75 > >> kernel`nfsrpc_getattr+0xbe > >> kernel`nfs_getattr+0x280 > >> kernel`VOP_GETATTR_APV+0x74 > >> kernel`nfs_lookup+0x3cc > >> kernel`VOP_LOOKUP_APV+0x74 > >> kernel`lookup+0x69e > >> kernel`namei+0x6df > >> kernel`kern_execve+0x47a > >> kernel`sys_execve+0x43 > >> kernel`amd64_syscall+0x3bf > >> kernel`0xffffffff80784947 > >> 26 > >> > >> kernel`newnfs_request+0x631 > >> kernel`nfscl_request+0x75 > >> kernel`nfsrpc_getattr+0xbe > >> kernel`nfs_close+0x3e9 > >> kernel`VOP_CLOSE_APV+0x74 > >> kernel`kern_execve+0x15c5 > >> kernel`sys_execve+0x43 > >> kernel`amd64_syscall+0x3bf > >> kernel`0xffffffff80784947 > >> 26 > >> > >> kernel`newnfs_request+0x631 > >> kernel`nfscl_request+0x75 > >> kernel`nfsrpc_getattr+0xbe > >> kernel`nfs_getattr+0x280 > >> kernel`VOP_GETATTR_APV+0x74 > >> kernel`nfs_lookup+0x3cc > >> kernel`VOP_LOOKUP_APV+0x74 > >> kernel`lookup+0x69e > >> kernel`namei+0x6df > >> kernel`vn_open_cred+0x330 > >> kernel`vn_open+0x1c > >> kernel`kern_openat+0x207 > >> kernel`kern_open+0x19 > >> kernel`sys_open+0x18 > >> kernel`amd64_syscall+0x3bf > >> kernel`0xffffffff80784947 > >> 2512 > >> > >> kernel`newnfs_request+0x631 > >> kernel`nfscl_request+0x75 > >> kernel`nfsrpc_getattr+0xbe > >> kernel`nfs_close+0x3e9 > >> kernel`VOP_CLOSE_APV+0x74 > >> kernel`vn_close+0xee > >> kernel`vn_closefile+0xff > >> kernel`_fdrop+0x3a > >> kernel`closef+0x332 > >> kernel`kern_close+0x183 > >> kernel`sys_close+0xb > >> kernel`amd64_syscall+0x3bf > >> kernel`0xffffffff80784947 > >> 2530 > >> > >> I had a look at the source of nfs_close and could not find a call > >> to > >> nfsrpc_getattr, and I am wondering why close would be calling > >> getattr > >> anyway. 
> >> If the file is closed what do we care about it's attributes....
> >>
> > Here are some random statements w.r.t. NFSv3 vs NFSv4 that might help
> > with an understanding of what is going on. I do address the specific
> > case of nfs_close() towards the end. (It is kinda long winded, but I
> > threw out eveything I could think of..)
> >
> > NFSv3 doesn't have any open/close RPC, but NFSv4 does have Open and
> > Close operations.
> >
> > In NFSv3, each RPC is defined and usually includes attributes for files
> > before and after the operation (implicit getattrs not counted in the RPC
> > counts reported by nfsstat).
> >
> > For NFSv4, every RPC is a compound built up of a list of Operations like
> > Getattr. Since the NFSv4 server doesn't know what the compound is doing,
> > nfsstat reports the counts of Operations for the NFSv4 server, so the
> > counts will be much higher than with NFSv3, but do not reflect the
> > number of RPCs being done.
> > To get NFSv4 nfsstat output that can be compared to NFSv3, you need to
> > do the command on the client(s) and it still is only roughly the same.
> > (I just realized this should be documented in man nfsstat.)
> >
> I ran nfsstat -s -v 4 on the server and saw the number of requests
> being done.
> They were in the order of a few thousand per second for a single
> FreeBSD 9.1 client doing a make build world.
>
Yes, but as I noted above, for NFSv4, these are counts of operations,
not RPCs. Each RPC in NFSv4 consists of several operations. For example,
for read it is something like:
- PutFH, Read, Getattr

As such, you need to do "nfsstat -e -c" on the client in order to
see how many RPCs are happening.

> > For the FreeBSD NFSv4 client, the compounds include Getattr operations
> > similar to what NFSv3 does. It doesn't do a Getattr on the directory
> > for Lookup, because that would have made the compound much more complex.
> > I don't think this will have a significant performance impact, but will
> > result in some additional Getattr RPCs.
> >
> I ran snoop on port 2049 on the server and I saw a large number of
> lookups.
> A lot of them seem to be for directories which are part of the
> filenames of the compiler and include files which are on the nfs
> mounted /usr/obj.
> The same names keep reappearing so it looks like there is no caching
> being done on the client.
>
Well, the name caching code is virtually identical to what is used for
NFSv3 and I have compared RPC counts (using client stats) in the past
(some while ago), to see if they are comparable.
A name cache entry (like everything in NFS caching) is only valid for
some amount of time (there are mount options for adjusting the cache
timeouts). Now, I'm not saying it isn't broken. I'll take a look when I
get home, and it is also rather hard to tell when it is broken. Since
NFS has no cache coherency protocol, any amount of caching can break
correctness when files are being modified by another client. The longer
you cache, the more likely you are to see a breakage.

> > I suspect the slowness is caused by the extra overhead of doing the
> > Open/Close operations against the server. The only way to avoid doing
> > these against the server for NFSv4 is to enable delegations in both
> > client and server. How to do this is documented in "man nfsv4".
> > Basically starting up the nfscbd in the client and setting:
> > vfs.nfsd.issue_delegations=1
> > in the server.
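(As a concrete sketch of the above, assuming a FreeBSD server; an
OpenIndiana server has its own way of enabling delegations, so treat
this as illustrative only:

  # client side: run the NFSv4 callback daemon
  echo 'nfscbd_enable="YES"' >> /etc/rc.conf
  /etc/rc.d/nfscbd start

  # server side (FreeBSD): allow nfsd to issue delegations
  sysctl vfs.nfsd.issue_delegations=1

The callback daemon must be running on the client or the server will
never hand out delegations; "man nfsv4" covers the details.)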
> >
> > Specifically for nfs_close(), the attributes (modify time) are used
> > for what is called "close to open consistency". This can be
> > disabled by the "nocto" mount option, if you don't need it for your
> > build environment. (You only need it if one client is writing a file
> > and then another client is reading the same file.)
> >
> I tried the nocto option in /etc/fstab but it does not show when mount
> shows the mounted filesystems so I am not sure if it is being used.

Head (and I think stable9) is patched so that "nfsstat -m" shows
all the options actually being used. For 9.1, you just have to trust
that it has been set.

> On the server netstat shows an active connection to port 7745 on the
> client but snoop shows no data flowing on that session.
>
That connection is the callback path and is only used when delegations
are in use. You can check to see if the server is issuing delegations
via "nfsstat -e -s" on the server and looking at the delegation count,
to see if it is greater than 0. Remember that you must enable
delegations on the server by setting the sysctl:
vfs.nfsd.issue_delegations=1

> > Both the attribute caching and close to open consistency algorithms
> > in the client are essentially the same for NFSv3 vs NFSv4.
> >
> > The NFSv4 Close operation(s) are actually done when the v_usecount for
> > the vnode goes to 0, since mmap'd files can do I/O on pages after
> > the close syscall. As such, they are only loosely related to the close
> > syscall. They are actually closing Windows style Openlock(s).
> >
> I had a look at the code of the NFS v4 client of Illumos (which is
> basically what my server is running) and as far as I understand it
> they only do the getattr when the close was for a file that was opened
> for write and when there was actually something written to the file.
> The FreeBSD code seems to do the getattr for all close() calls.
> For files that were never written, like executables or source files,
> that seems to cause quite a lot of overhead.
>
Well, what would happen for the Illumos client if another client had
just written to the file and closed it just before Illumos opens it
for reading? For cto, the client needs to see an up to date modify
time for the close to open consistency check to work, including opens
for reading.
As hinted at above, there is no correct answer to these questions. It
is all about correctness vs caching for better performance. (That's
why jhb added the nocto option to turn it off if you don't need this
to work correctly.)

> > You mention that you see the same file over and over in a packet trace.
> > You don't give specifics, but I'd suggest that you look at both NFSv3
> > and NFSv4 for this (and file names are in lookups, not getattrs).
> >
> > I'd suggest you try enabling delegations in both client and server, plus
> > trying the "nocto" mount option and see if that helps.
> >
> Tried it but it does not seem to make any noticeable difference.
>
> I tried a make buildworld buildkernel with /usr/obj a local FS in the
> Vbox VM that completed in about 2 hours. With /usr/obj on an NFS v4
> filesystem it takes about a day. A twelve-fold increase in elapsed
> time makes using NFSv4 unusable for this use case.

Source builds on NFS mounts are notoriously slow. A big part of this is
the synchronous writes that get done because there is only one dirty
byte range for a block and the loader loves to write small
non-contiguous areas of its output file.
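(To make that write pattern concrete, a toy sketch -- the file name is
made up, and this only imitates the loader's behaviour, it is not taken
from it:

  # two small writes into the same block with a gap between them
  dd if=/dev/zero of=/usr/obj/demo.bin bs=16 count=1 conv=notrunc
  dd if=/dev/zero of=/usr/obj/demo.bin bs=16 count=1 oseek=100 conv=notrunc

With only one dirty byte range per buffer, the client has to push the
first range to the server synchronously before it can record the
second, which is where the extra synchronous writes come from.)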
I have a patch that extends this to a list of dirty byte ranges, but it
has not been committed to head. I should try and get back to it this
summer.

> Too bad the server hangs when I use nfsv3 mount for /usr/obj.

Try this mount command:
mount -t nfs -o nfsv3,nolockd ...
(I do builds of the src tree NFS mounted, so the only reason I can
think that it would hang would be a rpc.lockd issue.)
If this works, I suspect it will still be slow, but it would be nice to
find out how much slower NFSv4 is for your case.

rick

> Having a shared /usr/obj makes it possible to run a make buildworld on
> a single VM
> and just run make installworld on the others.
>
> Paul
>
> > rick
> >
> >>
> >> Paul
> >>
> >> _______________________________________________
> >> freebsd-fs@freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> >> To unsubscribe, send any mail to
> >> "freebsd-fs-unsubscribe@freebsd.org"
> >

From owner-freebsd-fs@FreeBSD.ORG Mon Apr 15 06:24:43 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id EC179899 for ; Mon, 15 Apr 2013 06:24:43 +0000 (UTC) (envelope-from radiomlodychbandytow@o2.pl) Received: from moh2-ve2.go2.pl (moh2-ve2.go2.pl [193.17.41.200]) by mx1.freebsd.org (Postfix) with ESMTP id 682846B7 for ; Mon, 15 Apr 2013 06:24:43 +0000 (UTC) Received: from moh2-ve2.go2.pl (unknown [10.0.0.200]) by moh2-ve2.go2.pl (Postfix) with ESMTP id CA82FB010E1 for ; Mon, 15 Apr 2013 08:24:26 +0200 (CEST) Received: from unknown (unknown [10.0.0.108]) by moh2-ve2.go2.pl (Postfix) with SMTP for ; Mon, 15 Apr 2013 08:24:26 +0200 (CEST) Received: from unknown [93.175.66.185] by poczta.o2.pl with ESMTP id IbXflU; Mon, 15 Apr 2013 08:24:26 +0200 Message-ID: <516B9D19.2030909@o2.pl> Date: Mon, 15 Apr 2013 08:24:25 +0200 From: =?UTF-8?B?UmFkaW8gbcWCb2R5Y2ggYmFuZHl0w7N3?= User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130407 Thunderbird/17.0.5 MIME-Version: 1.0 To: Jeremy Chadwick Subject: Re: A failed drive causes system to hang References: <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan> <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan> <51688BA6.1000507@o2.pl> <20130413000731.GA84309@icarus.home.lan> <516A8646.4000101@o2.pl> <20130414192830.GA38338@icarus.home.lan> <20130414195211.GA39201@icarus.home.lan> <516B1315.8060408@o2.pl> <20130414212440.GA40325@icarus.home.lan> In-Reply-To: <20130414212440.GA40325@icarus.home.lan> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-O2-Trust: 1, 35 X-O2-SPF: neutral Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Apr 2013 06:24:44 -0000

On 14/04/2013 23:24, Jeremy Chadwick wrote:
> On Sun, Apr 14, 2013 at 10:35:33PM +0200, Radio młodych bandytów wrote:
>> On 14/04/2013 21:52, Jeremy Chadwick wrote:
>>> {snipping lots for brevity}
>>>
>>> On Sun, Apr 14, 2013 at 12:28:30PM -0700, Jeremy Chadwick wrote:
>>>> On Sun, Apr 14, 2013 at 12:34:46PM +0200, Radio młodych bandytów wrote:
>>>>> Sorry. I thought just the error was important. So here you are:
>>>>> dmesg.boot:
>>>>> http://pastebin.com/LFXPusMX
>>>>
>>>> Thank you. Please read everything I have written below before doing
>>>> anything.
>>>> >>>> Based on this output, we can see the following: >>>> >>>> * AHCI is actively in use, and is a slowly-becoming-infamous ATI IXP700 >>>> controller: >>>> >>>> ahci0: port 0xb000-0xb007,0xa000-0xa003,0x9000-0x9007,0x8000-0x8003,0x7000-0x700f mem 0xf9fffc00-0xf9ffffff irq 19 at device 17.0 on pci0 >>>> >>>> * The system has 3 disks attached to this controller: >>>> >>>> ada0 at ahcich0 bus 0 scbus2 target 0 lun 0 >>>> ada0: ATA-8 SATA 2.x device >>>> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) >>>> ada0: Command Queueing enabled >>>> ada0: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) >>>> ada1 at ata0 bus 0 scbus6 target 0 lun 0 >>>> ada1: ATA-8 SATA 2.x device >>>> ada1: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) >>>> ada1: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) >>>> ada2 at ata0 bus 0 scbus6 target 1 lun 0 >>>> ada2: ATA-8 SATA 2.x device >>>> ada2: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) >>>> ada2: 610480MB (1250263728 512 byte sectors: 16H 63S/T 16383C) >>>> >>>> Let's talk about ada0 and ada1 first. >>> >>> Hold up a minute -- I just noticed some key information here (see what >>> happens with big conflated threads?), and it sheds some light on my >>> concerns with AHCI vs. classic ata(4): >>> >>> ada0 -- attached to ahcich0 >>> ada1 -- attached to ata0 (presumably a "master" drive) >>> ada2 -- attached to ata0 (presumably a "slave" drive) >>> >>> This is extremely confusing, because ata0 is a classic ATA controller (I >>> can even tell from the classic ISA I/O port ranges): >>> >>> atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0 >>> ata0: at channel 0 on atapci1 >>> ata1: at channel 1 on atapci1 >>> >>> Yet the WD15EARS and ST3640323AS drives are physically SATA drives. >>> >>> Are you using SATA-to-IDE adapters on these two drives? >> No. >>> >>> If not, this seems to indicate the motherboard and/or SATA controller >>> is actually only binding 1 disk to AHCI, while the others are bound to >>> the same controller operating in (possibly) "SATA Enhanced" mode. >>> >>> This would be the first I've ever seen of this (a controller operating >>> in both modes simultaneously), but I have a lot more experience with >>> Intel SATA controllers than I do AMD. >>> >>> I don't know why a system would do this, unless all of this can be >>> controlled via the BIOS somehow. What a mess. >>> >> I looked into BIOS and it can be controlled. 6 ports are divided into 2 >> triples and I can switch mode of each triple independently. One drive is >> connected to one and two to the other. >> Looks like there's a bug because both triples are set to ATA. >> I left them like that for now. > > What exact motherboard model is this? I'd like to review the manual. > >> Anyway, I got the hang again, so I can provide dmesg. I was not at the >> computer when it happened, so there's only the last screen though... >> pastebin.com/bjYtzPgs > > Thank you. Sadly the log snippet doesn't have timestamps but this is > what transpired: > > The log snippet you showed indicates the following: > > * An NCQ-based write CDB (WRITE_FPDMA_QUEUED) was issued to the > ada0 drive attached to channel ahcich0 of controller ahci0, and > the disk or controller did not respond within 30 seconds (I'm > assuming PC-BSD did not change kern.cam.ada.default_timeout from > the default of 30 seconds) Checked: default. > > * The same request was resubmit to the controller (CAM will try > submission of a CDB up to 5 times (i.e. 
4 retries), which is > controlled with kern.cam.ada.retry_count). > Checked: default. > * The AHCI controller (rather the specific channel of the AHCI > controller) also reported that the underlying disk/device was > not responding (re: "Timeout on slot X port X"). I see no > SERR condition. > > * An ATA_IDENTIFY CDB was issued to the ada0 drive attached to > channel ahcich0 of controller ahci0, and this also timed out > after 30 seconds. My gut feeling is that this system is > running smartd(8); it's possible the kernel itself could submit > the CDB to the drive, but in this condition/state I don't know > why it'd do that. Nope, smartd doesn't run. > > * Rinse lather repeat. > > To me, at first glance, this looks like the ada0 disk is going > catatonic. The controller itself seems to be responding fine, just that > the disk attached to ahcich0 is locking up hard. I see no sign of an > AHCI reset ("AHCI reset..." message) either. > > So why does your system "hang" (meaning why can't you log in, why do > applications stop working, etc.) when this happens? Simple: > > You're using ZFS for your root filesystem, as shown here: > > Trying to mount root from zfs:tank1/ROOT/default []... > > Your ZFS pool called tank1 consists of a raidz1 pool of 3 devices > (more specifically partitions): ada0, ada1, and something that is > missing. Recap: > > http://pastebin.com/D3Av7x9X > > The pool is already degraded, and as you know, raidz1 can only suffer up > to loss of one vdev (in this case a disk) before ZFS will begin behaving > based on what the pool's "failmode" property is. > > In effect, when this happens, you're down to only 1 disk: ada1, and > that's not sufficient. So ZFS does exactly what it should (with > failmode=wait, the default): it waits indefinitely, hoping that things > recover. > > Because this is your root filesystem, as well as tons of other > filesystems (including /usr, /var, /var/log, and so on): > > http://pastebin.com/4sT37VqZ > > ...any I/O submit to filesystems part of pool tank1 will indefinitely > block/wait until things recover. Except they don't recover (and that > isn't the fault of ZFS). > > I imagine if you let the system sit for roughly 5*30 seconds (see above > for how I calculated that), you would eventually see a message on the > console that looks something like this: > > (ada0:ahcich0:0:0:0): lost device > (ada0:ahcich0:0:0:0): removing device entry I don't think so. I'm nearly sure it took me longer than that to write the errors down alone. And when I discovered the system lockup it was in such state already. The next time I can take precise time measurements. > > So, the crux of your problem is: > > 1. Your disks fall off the bus for reasons unknown at this time -- I'm > still waiting on smartctl -x output for each of your disks. Your disks > themselves may actually be okay (I need to review the output to > determine that) and the issue may be with SATA cabling, a faulty PSU, or > a completely broken SATA controller or motherboard (bad traces, etc.). > I am not going to help diagnose those problems, because the only > reliable method is to start replacing each part, piece by piece, and see > if the issue goes away. > > 2. Your array is already degraded/broken yet you don't care to fix it. > If the array was in decent shape and ada0 fell off the bus, things would > still work because ada1 and ada2 would be functioning (re: raidz1). 
> > If you were using UFS instead of ZFS for your root filesystem, you would > still have the same issue, just that the system would kernel panic. You > can induce that behaviour with ZFS as well using failmode=panic. > > There isn't much more for me to say. Everything is behaving how it's > designed, from what I can tell. > > When you lose your root filesystem, you really can't expect the system > to be in some "magical usable state". > The disk is out of the array because I didn't put it back after RMA. I RMA'd it because it used to cause precisely this kind of lockups on a non-degraded array. And I've seen it in the installer running from an entirely different device too. I guess that after reproducing the issue and taking time measurements, I should put the RMA'd drive back. I expect the problem to keep happening. -- Twoje radio From owner-freebsd-fs@FreeBSD.ORG Mon Apr 15 07:11:53 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6FBD7EC0 for ; Mon, 15 Apr 2013 07:11:53 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: from mail-vc0-f171.google.com (mail-vc0-f171.google.com [209.85.220.171]) by mx1.freebsd.org (Postfix) with ESMTP id 323C18B0 for ; Mon, 15 Apr 2013 07:11:52 +0000 (UTC) Received: by mail-vc0-f171.google.com with SMTP id ha12so3571713vcb.30 for ; Mon, 15 Apr 2013 00:11:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=afVyt8kwLlcc4OxZUOw3rc18xJVOOYIy0xAvUE8/+Ac=; b=ueLxWq5yaENF4yuT6LpaMFd3WTrPNlbiNFjEf79t/draMPz9qqP2B8Exao1MlIR07e QYP6RRCza7L56KamZzez7j5Hbrabiv8AChmV3B5hmU3M8hba6pillwx+kH/qWHYT8QhD fxHy9VL43bQg2W3TV8kFnRqDUf1W4HLWutvtv39HVy/kNpUhZfrt74JHbbnDF9h78I4d xbAyrLrlfwDPXPO70fskkqK+ilLZ37E8yw7Hk32bcV6DtFXrd7ppm6BmCobkRHbHKPLI OVBkTKiGyxWz/yF+hJl935NHdUZ/VumfcitU15wccSAsqLdrTOgL+kHsj9SLkQNDbtKN dZ4A== MIME-Version: 1.0 X-Received: by 10.52.75.8 with SMTP id y8mr12937656vdv.2.1366009912224; Mon, 15 Apr 2013 00:11:52 -0700 (PDT) Received: by 10.220.91.83 with HTTP; Mon, 15 Apr 2013 00:11:52 -0700 (PDT) In-Reply-To: <20130414194440.GB38338@icarus.home.lan> References: <516A8092.2080002@o2.pl> <9C59759CB64B4BE282C1D1345DD0C78E@multiplay.co.uk> <516AF61B.7060204@o2.pl> <20130414185117.GA38259@icarus.home.lan> <20130414194440.GB38338@icarus.home.lan> Date: Mon, 15 Apr 2013 03:11:52 -0400 Message-ID: Subject: Re: A failed drive causes system to hang From: Zaphod Beeblebrox To: Jeremy Chadwick Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs , =?ISO-8859-1?B?UmFkaW8gbcS5P29keWNoIGJhbmR5dMQ/xT93?= , support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Apr 2013 07:11:53 -0000 If I was using my plain old UN*X mailer, I'd try to honor your request for a new thread (by editing the headers)... but I don't see any method by which google allows this. Anyways... rather than discuss my (admittedly vague) "me too" on the drive issue, I'd like to comment on the meta issue you raise. 
On Sun, Apr 14, 2013 at 3:44 PM, Jeremy Chadwick wrote:

> There is already too much crap going on in this thread with 4 different
> people with what are 4 different issues, and nobody at this point is
> able to keep track of it all (including the participants).
>
> This situation happens way, WAY too often with storage-related matters
> on the list. ANYTHING ZFS-related and ANYTHING storage-related results
> in bandwagon-jumping and threads that spiral out of control/become
> almost useless and certainly impossible to follow. It needs to stop.
>

I think what's happening here is that the whole storage subsystem is
(at this point) good enough that the people who still have problems are
hitting fairly obscure but serious corner cases... and since there
isn't much hardware advice from core anymore, the sufferers end up
conflating their issues, because general experience leaves everyone
thinking there are very few issues left.

When I say hardware advice... regular list readers might pick up on
hardware opinions dropped here, but they are easy to miss and they
remain uncollected. Worse, when software workarounds appear or fixed
hardware revisions ship, that is never reflected anywhere either. Some
driver man pages make statements about hardware capabilities... but
other hardware has none.

... and since I'm saying this, I'll volunteer...

We need, for each class of hardware, a simple table of information. As
an example, the columns for block storage might be (a strawman row
follows at the end of this message):

- chipset (list)
- driver (name)
- hot swap (y/n)
- known to hang on drive failures (y/n)
- pmp (y/n, 1:n)
- queuing (type)
- block sizes (512, 4k, ...)
- relative performance (cpu heavy, scatter-gather, etc)
- memory support (32 bit, 64 bit, bounce buffers)
- "recommended"

Similar lists can easily be generated for NICs, motherboards, video (a
particular mess) and whatnot.

There isn't much incentive for a computer retailer to put together
known-good hardware, as the list of components could then easily be
bought elsewhere ... undercutting the margin. So it seems to me that
this knowledge needs to be fostered inside the community itself.

So... what am I volunteering for? I would be happy to maintain a
portion of the FreeBSD wiki with hardware information in this form,
from components right up to systems, but I would need input from the
driver writers ... who are in the best position to know ... what works
and what doesn't.
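To make that concrete, a strawman row for the block storage table;
every value below is purely illustrative (including the made-up chipset
name), not a claim about real hardware:

  chipset     driver  hotswap  hangs  pmp      queuing  blocks  recommended
  ExampleHBA  ahci    y        n      y (1:5)  NCQ      512/4k  y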
From owner-freebsd-fs@FreeBSD.ORG Mon Apr 15 10:28:40 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 2DA33E1D for ; Mon, 15 Apr 2013 10:28:40 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx07.syd.optusnet.com.au (fallbackmx07.syd.optusnet.com.au [211.29.132.9]) by mx1.freebsd.org (Postfix) with ESMTP id 9DFD91CE for ; Mon, 15 Apr 2013 10:28:39 +0000 (UTC) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r3FASVmN005203 for ; Mon, 15 Apr 2013 20:28:31 +1000 Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r3FASITk007638 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 15 Apr 2013 20:28:20 +1000 Date: Mon, 15 Apr 2013 20:28:18 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem Subject: Re: FreeBSD 9.1 NFSv4 client attribute cache not caching ? In-Reply-To: <1091296771.826148.1365989521302.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20130415184639.V1081@besplex.bde.org> References: <1091296771.826148.1365989521302.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=Ov0XUFDt c=1 sm=1 a=xj4t0lYZ87oA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=r-ufCDgtFiYA:10 a=jyafP8MNAAAA:8 a=gzcLvKzMasgLP25ndJgA:9 a=CjuIK1q_8ugA:10 a=gmjzRuXrkl8A:10 a=nRcZRO9L01eZmAkF:21 a=88WQ7wBtLHMueRgm:21 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Apr 2013 10:28:40 -0000 On Sun, 14 Apr 2013, Rick Macklem wrote: > Paul van der Zwan wrote: >> On 14 Apr 2013, at 5:00 , Rick Macklem wrote: >> >> Thanks for taking the effort to send such an extensive reply. >> >>> Paul van der Zwan wrote: >>>> On 12 Apr 2013, at 16:28 , Paul van der Zwan >>>> wrote: > ... >>> In NFSv3, each RPC is defined and usually includes attributes for >>> files >>> before and after the operation (implicit getattrs not counted in the >>> RPC >>> counts reported by nfsstat). >>> >>> For NFSv4, every RPC is a compound built up of a list of Operations >>> like >>> Getattr. Since the NFSv4 server doesn't know what the compound is >>> doing, >>> nfsstat reports the counts of Operations for the NFSv4 server, so >>> the counts >>> will be much higher than with NFSv3, but do not reflect the number >>> of RPCs being done. >>> To get NFSv4 nfsstat output that can be compared to NFSv3, you need >>> to >>> do the command on the client(s) and it still is only roughly the >>> same. >>> (I just realized this should be documented in man nfsstat.) >>> >> I ran nfsstat -s -v 4 on the server and saw the number of requests >> being done. >> They were in the order of a few thousand per second for a single >> FreeBSD 9.1 client >> doing a make build world. >> > Yes, but as I noted above, for NFSv4, these are counts of operations, > not RPCs. Each RPC in NFSv4 consists of several operations. 
For example,
> for read it is something like:
> - PutFH, Read, Getattr
>
> As such, you need to do "nfsstat -e -c" on the client in order to
> see how many RPCs are happening.

Does it show the number of physical RPCs or only "roughly the same"?

>>> For the FreeBSD NFSv4 client, the compounds include Getattr operations
>>> similar to what NFSv3 does. It doesn't do a Getattr on the directory
>>> for Lookup, because that would have made the compound much more complex.
>>> I don't think this will have a significant performance impact, but will
>>> result in some additional Getattr RPCs.
>>>
>> I ran snoop on port 2049 on the server and I saw a large number of
>> lookups.
>> A lot of them seem to be for directories which are part of the
>> filenames of the compiler and include files which are on the nfs
>> mounted /usr/obj.
>> The same names keep reappearing so it looks like there is no caching
>> being done on the client.

When I worked on this in ~2007, unnecessary RPCs for lookup were a
large cause of slowness. This was fixed in at least nfsv3. Almost
all RPCs for makeworld (closer to 99% than 90%) should now be for open
of the excessively layered and polluted include files, since they are
opened so often compared with other files and every open goes to the
server (except "nocto" should fix this). There are lots of lookups
for the include files too, but the lookups are properly cached.

>> I tried the nocto option in /etc/fstab but it does not show when mount
>> shows the mounted filesystems so I am not sure if it is being used.
> Head (and I think stable9) is patched so that "nfsstat -m" shows
> all the options actually being used. For 9.1, you just have to trust
> that it has been set.

This doesn't work on ref10-amd64 running 10.0-CURRENT Apr 5. nfsstat -m
gives null output. Plain nfsstat confirms that there are some nfs mounts,
with so much activity on them that many of the cache counts are negative
after 9 days of uptime.

> ...
>> I tried a make buildworld buildkernel with /usr/obj a local FS in the
>> Vbox VM that completed in about 2 hours. With /usr/obj on an NFS v4
>> filesystem it takes about a day. A twelve-fold increase in elapsed
>> time makes using NFSv4 unusable for this use case.

That is extremely slow. Here I am unhappy with the makeworld time over
nfs staying about 13 minutes despite attempts to improve this, but I
only have old slow hardware (2 core 2GHz Turion laptop). I also have a
modified FreeBSD-5, which avoids some of the bloat in -current. My best
time without excessive tuning was:

@ --------------------------------------------------------------
@ >>> make world completed on Fri Nov 2 23:35:11 EST 2007
@ (started Fri Nov 2 23:21:27 EST 2007)
@ --------------------------------------------------------------
@ 823.53 real 1295.80 user 192.46 sys
@
@ Lookup Read Access Fsstat Other Total
@ 127134 23214 624060 24764 99 799271

The kernel was current at the time, but userland was ~5.2. Newer kernels
(1-2 years old) are only a bit slower and don't require any modifications
to get similar RPC counts (with Getattr instead of Access). /usr,
including /usr/bin and /usr/src, was on nfs, but /bin and /usr/obj were
local. Everything fits in RAM caches so there was no disk activity
except for new reads and new writes. Network latency was tuned to
60 usec (min for ping).

When nfs was pessimized, the above RPC counts blew out to no more than
2 million. Suppose you have 2 million RPCs with a latency of just 65
usec. That gives a latency of 130 seconds.
Not too bad, but large compared with 823 seconds. The latency is
amortized by having more than 1 CPU and/or building concurrently. Then
progress can usually be made in some threads while others are blocked
waiting for the RPCs. However, many networks have latencies much larger
than 65 usec. On the freebsd cluster now, the min latency is about
250 usec, and since it has multiple users the latency is sometimes over
1 msec. 2 million RPCs with a latency of 1 msec take 2000 seconds,
which is a lot compared with a build time of 823 seconds.

I consider "nocto" excessive tuning, since although it would help
makeworld benchmarks it is unsafe in general. Of course I tried my
version of it in the above. (The above RPC counts are with the
following critical modifications that weren't in FreeBSD at the time:
- negative caching
- fix for broken dotdot caching
- fix for broken "cto". It did twice as many RPCs as needed.)
Adding the equivalent of "nocto" reduced the RPC counts significantly,
but only reduced the real time by about 20 (?) seconds.

> Source builds on NFS mounts are notoriously slow. A big part of this is

Only when misconfigured. The nfs build time in the above is between 5%
and 10% slower than the local build time.

> the synchronous writes that get done because there is only one dirty
> byte range for a block and the loader loves to write small non-contiguous
> areas of its output file.

Writing to nfs would be slow, but I made /usr/obj local to avoid it.
Also, in other (kernel build) tests where object files are written to
the current directory which is on nfs, the non-separate object directory
is mounted async on the server so it is fast enough. Now my reference is
building a FreeBSD-4 kernel. My best times were:
- 32+ seconds (src and obj on nfs, async, -j4)
- 30- seconds (src and obj on ffs, async, -j4)
- 64+ (?) seconds (src and obj on nfs, async, -j1)
- 58 (?) seconds (src and obj on ffs, async, -j1)
(/usr on nfs, /bin on ffs). Without parallelism, everything has to wait
for the RPCs, and even with low network latency this costs 5-10%.

>> Too bad the server hangs when I use nfsv3 mount for /usr/obj.
> Try this mount command:
> mount -t nfs -o nfsv3,nolockd ...
> (I do builds of the src tree NFS mounted, so the only reason I can
> think that it would hang would be a rpc.lockd issue.)
> If this works, I suspect it will still be slow, but it would be nice to
> find out how much slower NFSv4 is for your case.

Needed to localize the slowness anyway. It might be just in the server.
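(As a back-of-envelope helper: the per-RPC cost can be estimated by
pinging the server and scaling the round trip by the RPC count;
"server" and the numbers below are placeholders, not measurements:

  ping -c 10 server          # read the avg round-trip time
  # serialized RPC wait in seconds = RPCs * latency(usec) / 1000000
  echo $((2000000 * 250 / 1000000))

which prints 500 for 2 million RPCs at 250 usec, and gives the 2000
seconds above at 1 msec.)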
Bruce From owner-freebsd-fs@FreeBSD.ORG Mon Apr 15 11:06:43 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 65C98959 for ; Mon, 15 Apr 2013 11:06:43 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 4A5C8799 for ; Mon, 15 Apr 2013 11:06:43 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3FB6hxN015080 for ; Mon, 15 Apr 2013 11:06:43 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3FB6gaN015078 for freebsd-fs@FreeBSD.org; Mon, 15 Apr 2013 11:06:42 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 15 Apr 2013 11:06:42 GMT Message-Id: <201304151106.r3FB6gaN015078@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Apr 2013 11:06:43 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/177658 fs [ufs] FreeBSD panics after get full filesystem with uf o kern/177536 fs [zfs] zfs livelock (deadlock) with high write-to-disk o kern/177445 fs [hast] HAST panic o kern/177240 fs [zfs] zpool import failed with state UNAVAIL but all d o kern/176978 fs [zfs] [panic] zfs send -D causes "panic: System call i o kern/176857 fs [softupdates] [panic] 9.1-RELEASE/amd64/GENERIC panic o bin/176253 fs zpool(8): zfs pool indentation is misleading/wrong o kern/176141 fs [zfs] sharesmb=on makes errors for sharenfs, and still o kern/175950 fs [zfs] Possible deadlock in zfs after long uptime o kern/175897 fs [zfs] operations on readonly zpool hang o kern/175179 fs [zfs] ZFS may attach wrong device on move o kern/175071 fs [ufs] [panic] softdep_deallocate_dependencies: unrecov o kern/174372 fs [zfs] Pagefault appears to be related to ZFS o kern/174315 fs [zfs] chflags uchg not supported o kern/174310 fs [zfs] root point mounting broken on CURRENT with multi o kern/174279 fs [ufs] UFS2-SU+J journal and filesystem corruption o kern/174060 fs [ext2fs] Ext2FS system crashes (buffer overflow?) 
o kern/173830 fs [zfs] Brain-dead simple change to ZFS error descriptio o kern/173718 fs [zfs] phantom directory in zraid2 pool f kern/173657 fs [nfs] strange UID map with nfsuserd o kern/173363 fs [zfs] [panic] Panic on 'zpool replace' on readonly poo o kern/173136 fs [unionfs] mounting above the NFS read-only share panic o kern/172942 fs [smbfs] Unmounting a smb mount when the server became o kern/172348 fs [unionfs] umount -f of filesystem in use with readonly o kern/172334 fs [unionfs] unionfs permits recursive union mounts; caus o kern/171626 fs [tmpfs] tmpfs should be noisier when the requested siz o kern/171415 fs [zfs] zfs recv fails with "cannot receive incremental o kern/170945 fs [gpt] disk layout not portable between direct connect o bin/170778 fs [zfs] [panic] FreeBSD panics randomly o kern/170680 fs [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA o kern/170497 fs [xfs][panic] kernel will panic whenever I ls a mounted o kern/169945 fs [zfs] [panic] Kernel panic while importing zpool (afte o kern/169480 fs [zfs] ZFS stalls on heavy I/O o kern/169398 fs [zfs] Can't remove file with permanent error o kern/169339 fs panic while " : > /etc/123" o kern/169319 fs [zfs] zfs resilver can't complete o kern/168947 fs [nfs] [zfs] .zfs/snapshot directory is messed up when o kern/168942 fs [nfs] [hang] nfsd hangs after being restarted (not -HU o kern/168158 fs [zfs] incorrect parsing of sharenfs options in zfs (fs o kern/167979 fs [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste o kern/167977 fs [smbfs] mount_smbfs results are differ when utf-8 or U o kern/167688 fs [fusefs] Incorrect signal handling with direct_io o kern/167685 fs [zfs] ZFS on USB drive prevents shutdown / reboot o kern/167612 fs [portalfs] The portal file system gets stuck inside po o kern/167272 fs [zfs] ZFS Disks reordering causes ZFS to pick the wron o kern/167260 fs [msdosfs] msdosfs disk was mounted the second time whe o kern/167109 fs [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene o kern/167105 fs [nfs] mount_nfs can not handle source exports wiht mor o kern/167067 fs [zfs] [panic] ZFS panics the server o kern/167065 fs [zfs] boot fails when a spare is the boot disk o kern/167048 fs [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF o kern/166912 fs [ufs] [panic] Panic after converting Softupdates to jo o kern/166851 fs [zfs] [hang] Copying directory from the mounted UFS di o kern/166477 fs [nfs] NFS data corruption. 
o kern/165950 fs [ffs] SU+J and fsck problem o kern/165521 fs [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31 o kern/165392 fs Multiple mkdir/rmdir fails with errno 31 o kern/165087 fs [unionfs] lock violation in unionfs o kern/164472 fs [ufs] fsck -B panics on particular data inconsistency o kern/164370 fs [zfs] zfs destroy for snapshot fails on i386 and sparc o kern/164261 fs [nullfs] [patch] fix panic with NFS served from NULLFS o kern/164256 fs [zfs] device entry for volume is not created after zfs o kern/164184 fs [ufs] [panic] Kernel panic with ufs_makeinode o kern/163801 fs [md] [request] allow mfsBSD legacy installed in 'swap' o kern/163770 fs [zfs] [hang] LOR between zfs&syncer + vnlru leading to o kern/163501 fs [nfs] NFS exporting a dir and a subdir in that dir to o kern/162944 fs [coda] Coda file system module looks broken in 9.0 o kern/162860 fs [zfs] Cannot share ZFS filesystem to hosts with a hyph o kern/162751 fs [zfs] [panic] kernel panics during file operations o kern/162591 fs [nullfs] cross-filesystem nullfs does not work as expe o kern/162519 fs [zfs] "zpool import" relies on buggy realpath() behavi o kern/161968 fs [zfs] [hang] renaming snapshot with -r including a zvo o kern/161864 fs [ufs] removing journaling from UFS partition fails on o bin/161807 fs [patch] add option for explicitly specifying metadata o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is o kern/161533 fs [zfs] [panic] zfs receive panic: system ioctl returnin o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_ o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou o kern/161280 fs [zfs] Stack overflow in gptzfsboot o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3 o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic f kern/160860 fs [ufs] Random UFS root filesystem corruption with SU+J o kern/160801 fs [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha o kern/159930 fs [ufs] [panic] kernel core o kern/159402 fs [zfs][loader] symlinks cause I/O errors o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by- o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs() o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option o kern/159077 fs [zfs] Can't cd .. with latest zfs version o kern/159048 fs [smbfs] smb mount corrupts large files o kern/159045 fs [zfs] [hang] ZFS scrub freezes system o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk o kern/158802 fs amd(8) ICMP storm and unkillable process. 
o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o f kern/157929 fs [nfs] NFS slow read o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and o kern/156781 fs [zfs] zfs is losing the snapshot directory, p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs p kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 p kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w o bin/153142 fs [zfs] ls -l outputs `ls: ./.zfs: Operation not support o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris installa o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " o 
kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an f bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141897 fs [msdosfs] [panic] Kernel panic. 
msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues ( o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis p kern/133174 fs [msdosfs] [patch] msdosfs must support multibyte inter o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121072 fs [smbfs] mount_smbfs(8) cannot 
normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o kern/118318 fs [nfs] NFS server hangs under special circumstances o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. 
o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779 fs Background-fsck checks one filesystem twice and omits
o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun
o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange
o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503 fs [smbfs] mount_smbfs does not work as non-root
o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc
o kern/36566 fs [smbfs] System reboot with dead smb mount and umount
o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc
o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t

305 problems total.

From owner-freebsd-fs@FreeBSD.ORG Tue Apr 16 02:19:32 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0CB062F8 for ; Tue, 16 Apr 2013 02:19:32 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 9A5BA6CE for ; Tue, 16 Apr 2013 02:19:30 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAD20bFGDaFvO/2dsb2JhbABQhmy9VYEedIIfAQEFI1YbDgoCAg0ZAlkGE4gUqVOSYYEjjUA0B2OBS4ETA5M4g02REYJ+KSCBbA X-IronPort-AV: E=Sophos;i="4.87,480,1363147200"; d="scan'208";a="25925760" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 15 Apr 2013 22:19:23 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 3B475B406C; Mon, 15 Apr 2013 22:19:23 -0400 (EDT) Date: Mon, 15 Apr 2013 22:19:23 -0400 (EDT) From: Rick Macklem To: Bruce Evans Message-ID: <1236177219.867591.1366078763224.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20130415184639.V1081@besplex.bde.org> Subject: Re: FreeBSD 9.1 NFSv4 client attribute cache not caching ? MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Apr 2013 02:19:32 -0000 Bruce Evans wrote: > On Sun, 14 Apr 2013, Rick Macklem wrote: > > > Paul van der Zwan wrote: > >> On 14 Apr 2013, at 5:00 , Rick Macklem > >> wrote: > >> > >> Thanks for taking the effort to send such an extensive reply. > >> > >>> Paul van der Zwan wrote: > >>>> On 12 Apr 2013, at 16:28 , Paul van der Zwan > >>>> > >>>> wrote: > > ... > >>> In NFSv3, each RPC is defined and usually includes attributes for > >>> files > >>> before and after the operation (implicit getattrs not counted in > >>> the > >>> RPC > >>> counts reported by nfsstat). > >>> > >>> For NFSv4, every RPC is a compound built up of a list of > >>> Operations > >>> like > >>> Getattr.
Since the NFSv4 server doesn't know what the compound is > >>> doing, > >>> nfsstat reports the counts of Operations for the NFSv4 server, so > >>> the counts > >>> will be much higher than with NFSv3, but do not reflect the number > >>> of RPCs being done. > >>> To get NFSv4 nfsstat output that can be compared to NFSv3, you > >>> need > >>> to > >>> do the command on the client(s) and it still is only roughly the > >>> same. > >>> (I just realized this should be documented in man nfsstat.) > >>> > >> I ran nfsstat -s -v 4 on the server and saw the number of requests > >> being done. > >> They were in the order of a few thousand per second for a single > >> FreeBSD 9.1 client > >> doing a make buildworld. > >> > > Yes, but as I noted above, for NFSv4, these are counts of > > operations, > > not RPCs. Each RPC in NFSv4 consists of several operations. For > > example, > > for read it is something like: > > - PutFH, Read, Getattr > > > > As such, you need to do "nfsstat -e -c" on the client in order to > > see how many RPCs are happening. > > Does it show the number of physical RPCs or only "roughly the same"? > Yes, for NFSv4, the client side counts are for the RPCs. The "roughly" referred to the fact that the NFSv4 compound doesn't do exactly the same thing as the NFSv3 RPC, although they tend to be very similar. > >>> For the FreeBSD NFSv4 client, the compounds include Getattr > >>> operations > >>> similar to what NFSv3 does. It doesn't do a Getattr on the > >>> directory > >>> for Lookup, because that would have made the compound much more > >>> complex. > >>> I don't think this will have a significant performance impact, but > >>> will > >>> result in some additional Getattr RPCs. > >>> > >> I ran snoop on port 2049 on the server and I saw a large number of > >> lookups. > >> A lot of them seem to be for directories which are part of the > >> filenames of > >> the compiler and include files which are on the nfs mounted /usr/obj. > >> The same names keep reappearing so it looks like there is no caching > >> being done on > >> the client. > > When I worked on this in ~2007, unnecessary RPCs for lookup was a > large cause of slowness. This was fixed in at least nfsv3. Almost > all RPCs for makeworld (closer to 99% than 90%) should now be for open > of the excessively layered and polluted include files, since they are > opened so often compared with other files and every open goes to the > server (except "nocto" should fix this). There are lots of lookups > for the include files too, but the lookups are properly cached. > > >> I tried the nocto option in /etc/fstab but it does not show when > >> mount > >> shows > >> the mounted filesystems so I am not sure if it is being used. > > Head (and I think stable9) is patched so that ``nfsstat -m`` shows > > all the options actually being used. For 9.1, you just have to trust > > that it has been set. > > This doesn't work on ref10-amd64 running 10.0-CURRENT Apr 5. nfsstat > -m > gives null output. Plain nfsstat confirms that there are some nfs > mounts, > with so much activity on them that many of the cache counts are > negative > after 9 days of uptime. > If both the kernel and nfsstat binary are from Apr. 5, I think it should work. (It will only do the new/default NFS mounts, not oldnfs ones.) I'll take another look, in case something got missed for the commit. rick > > ... > >> I tried a make buildworld buildkernel with /usr/obj a local FS in > >> the > >> Vbox VM > >> that completed in about 2 hours.
With /usr/obj on an NFS v4 > >> filesystem > >> it takes > >> about a day. A twelve-fold increase in elapsed time makes using > >> NFSv4 > >> unusable > >> for this use case. > > That is extremely slow. Here I am unhappy with the makeworld time over > nfs staying about 13 minutes despite attempts to improve this, but I > only have old slow hardware (2 core 2GHz Turion laptop). I also have > a modified FreeBSD-5, which avoids some of the bloat in -current. My > best > time without excessive tuning was:
>
> @ --------------------------------------------------------------
> @ >>> make world completed on Fri Nov 2 23:35:11 EST 2007
> @        (started Fri Nov 2 23:21:27 EST 2007)
> @ --------------------------------------------------------------
> @      823.53 real      1295.80 user      192.46 sys
> @
> @  Lookup    Read  Access  Fsstat  Other   Total
> @  127134   23214  624060   24764     99  799271
>
> The kernel was current at the time, but userland was ~5.2. Newer > kernels (1-2 years old) are only a bit slower and don't require any > modifications to get similar RPC counts (with Getattr instead of > Access). > /usr including /usr/bin and /usr/src was on nfs, but /bin and /usr/obj > were local. Everything fits in RAM caches so there was no disk > activity > except for new reads and new writes. Network latency was tuned to 60 > usec (min for ping). > > When nfs was pessimized, the above RPC counts blew out to more than 2 > million. Suppose you have 2 million RPCs with a latency of just 65 > usec. > That gives a latency of 130 seconds. Not too bad, but large compared > with > 823 seconds. The latency is amortized by having more than 1 CPU > and/or > building concurrently. Then progress can usually be made in some > threads > while others are blocked waiting for the RPCs. However, many networks > have latencies much larger than 65 usec. On the freebsd cluster now, > the > min latency is about 250 usec, and since it has multiple users the > latency is sometimes over 1 msec. 2 million RPCs with a latency of 1 > msec > take 2000 seconds, which is a lot compared with a build time of 823 > seconds. > > I consider "nocto" as excessive tuning, since although it would help > makeworld benchmarks it is unsafe in general. Of course I tried my > version of it in the above. (The above RPC counts are with the > following > critical modifications that weren't in FreeBSD at the time: > - negative caching > - fix for broken dotdot caching > - fix for broken "cto". It did twice as many RPCs as needed.) > Adding the equivalent of "nocto" reduced the RPC counts significantly, > but only reduced the real time by about 20 (?) seconds. > > > Source builds on NFS mounts are notoriously slow. A big part of this > > is > > Only when misconfigured. The nfs build time in the above is between 5% > and 10% slower than the local build time. > > > the synchronous writes that get done because there is only one dirty > > byte range for a block and the loader loves to write small > > non-contiguous > > areas of its output file. > > Writing to nfs would be slow, but I made /usr/obj local to avoid it. > Also, > in other (kernel build) tests where object files are written to the > current > directory which is on nfs, the non-separate object directory is > mounted > async on the server so it is fast enough. Now my reference is building > a FreeBSD-4 kernel. My best times were: > - 32+ seconds (src and obj on nfs, async, -j4) > - 30- seconds (src and obj on ffs, async, -j4) > - 64+ (?) seconds (src and obj on nfs, async, -j1) > - 58 (?)
seconds (src and obj on ffs, async, -j1) > (/usr on nfs, /bin on ffs). Without parallelism, everything has to > wait > for the RPCs, and even with low network latency this costs 5-10%. > > >> Too bad the server hangs when I use nfsv3 mount for /usr/obj. > > Try this mount command: > > mount -t nfs -o nfsv3,nolockd ... > > (I do builds of the src tree NFS mounted, so the only reason I can > > think that it would hang would be a rpc.lockd issue.) > > If this works, I suspect it will still be slow, but it would be nice > > to > > find out how much slower NFSv4 is for your case. > > Needed to localize the slowness anyway. It might be just in the > server. > > Bruce From owner-freebsd-fs@FreeBSD.ORG Tue Apr 16 05:59:05 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 068FEB8A for ; Tue, 16 Apr 2013 05:59:05 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id 843DFE38 for ; Tue, 16 Apr 2013 05:59:03 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r3G5wudp038487 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Tue, 16 Apr 2013 08:58:57 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <516CE8A0.3070808@digsys.bg> Date: Tue, 16 Apr 2013 08:58:56 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130415 Thunderbird/17.0.5 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: ZFS-inly server and dedicated ZIL References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Apr 2013 05:59:05 -0000 On 12.04.13 12:44, Dmitry Morozovsky wrote: > No, this will be 8*SAS in 4 mirrored pairs + 2*SSD for mirrored ZIL and striped > l2arc. Like the following (this is from another machine, but similar in setup):
>
>         NAME        STATE     READ WRITE CKSUM
>         pn          ONLINE       0     0     0
>           mirror-0  ONLINE       0     0     0
>             da0     ONLINE       0     0     0
>             da6     ONLINE       0     0     0
>           mirror-2  ONLINE       0     0     0
>             da1     ONLINE       0     0     0
>             da7     ONLINE       0     0     0
>         logs
>           mirror-1  ONLINE       0     0     0
>             da2d    ONLINE       0     0     0
>             da3d    ONLINE       0     0     0
>         cache
>           da2e      ONLINE       0     0     0
>           da3e      ONLINE       0     0     0
>
As already mentioned, multiple vdev zpool for root works just fine on FreeBSD. There is however one "restriction" -- all of the drives should be visible as BIOS drives at boot. If your BIOS (HBA SAS BIOS) supports that, then you should not expect problems. Just make sure you test it with all the drives, before you build the pool. On the other hand, using the same SSD for SLOG and L2ARC is not always a good idea because those two have quite opposite requirements.
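A minimal sketch of how such a log/cache layout is attached to an existing pool, assuming the pool name (pn) and the SSD partitions from the status output quoted above:

    # zpool add pn log mirror da2d da3d    # mirrored SLOG on two SSD partitions
    # zpool add pn cache da2e da3e         # L2ARC; cache devices are always striped

The asymmetry is deliberate: losing an unmirrored SLOG can cost recent synchronous writes, while a lost cache device only costs warm read data, so only the log is mirrored.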
Daniel From owner-freebsd-fs@FreeBSD.ORG Tue Apr 16 08:53:08 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BCDD3D47 for ; Tue, 16 Apr 2013 08:53:08 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 4A4C2C3D for ; Tue, 16 Apr 2013 08:53:06 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r3G8qjse059998; Tue, 16 Apr 2013 12:52:45 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Tue, 16 Apr 2013 12:52:45 +0400 (MSK) From: Dmitry Morozovsky To: Daniel Kalchev Subject: Re: ZFS-inly server and dedicated ZIL In-Reply-To: <516CE8A0.3070808@digsys.bg> Message-ID: References: <516CE8A0.3070808@digsys.bg> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Apr 2013 08:53:08 -0000 On Tue, 16 Apr 2013, Daniel Kalchev wrote: > As already mentioned, multiple vdev zpool for root works just fine on FreeBSD. Yes, but not pools with dedicated ZIL, as I cited from sources: zpool just does not allow to add ZIL to boot pool > There is however one "restriction" -- all of the drives should be visible as > BIOS drives at boot. If your BIOS (HBA SAS BIOS) supports that, then you > should not expect problems. > > Just make sure you test it with all the drives, before you build the pool. I think I'll drift to a UFS mirrored /boot, as I did for several servers already > On the other hand, using the same SSD for SLOG and L2ARC is not always a good > idea because those two have quite opposite requirements.
I know, but price and drive count restrictions do apply as well :) -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Tue Apr 16 11:02:20 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 06D7286A for ; Tue, 16 Apr 2013 11:02:20 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 34B5B1EA for ; Tue, 16 Apr 2013 11:02:18 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA14784; Tue, 16 Apr 2013 14:02:00 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <516D2FA8.2000901@FreeBSD.org> Date: Tue, 16 Apr 2013 14:02:00 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130404 Thunderbird/17.0.5 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: ZFS-inly server and dedicated ZIL References: <516CE8A0.3070808@digsys.bg> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Apr 2013 11:02:20 -0000 on 16/04/2013 11:52 Dmitry Morozovsky said the following: > Yes, but not pools with dedicated ZIL, as I cited from sources: zpool just does > not allow to add ZIL to boot pool I think that there is no reason for that. And a trivial workaround was already offered to you. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Apr 16 11:50:19 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 57C1588E for ; Tue, 16 Apr 2013 11:50:19 +0000 (UTC) (envelope-from toasty@dragondata.com) Received: from mail-ie0-x22e.google.com (mail-ie0-x22e.google.com [IPv6:2607:f8b0:4001:c03::22e]) by mx1.freebsd.org (Postfix) with ESMTP id 26BCE5FB for ; Tue, 16 Apr 2013 11:50:19 +0000 (UTC) Received: by mail-ie0-f174.google.com with SMTP id 10so360260ied.5 for ; Tue, 16 Apr 2013 04:50:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dragondata.com; s=google; h=x-received:content-type:mime-version:subject:from:in-reply-to:date :cc:content-transfer-encoding:message-id:references:to:x-mailer; bh=w9nf0PvL5XAHn75U94Gtue8TkhN573QnevsLQZbduUs=; b=WGexmsIGEkEw43lhR6fhVkjPmxupkUozaHEGyjmxt4wbTAPARrBEGQ0m0c0nCgGLgI BHgcePMjOjaqVSSWUOGomkc5fDvVvheY31hUjQUMuiWetCNc0Xd2irCMM0MPcs5mSRAI x7CX1fKPKa5rlTkFq+f4TL845FMuApiwIpDk4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:content-type:mime-version:subject:from:in-reply-to:date :cc:content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=w9nf0PvL5XAHn75U94Gtue8TkhN573QnevsLQZbduUs=; b=JrIFBbPQXFkd2vapJxXMlADaOvK5/Vg23O0XxT8MJISjB26K0PBL2C9SZWPtD3MMNx +YuCdVmM1cyzYpz1DiW/en2jCcVF5fm32U8psUhKbReOxAEqMsgCZVTncoGoweVLb4/f tCk6x+g2TiLYe1XcUdo8ja2rY4Td5EJ5sqP3HKeMK4d5Wif8w0Q0tjiiSpV/tMA4l5Ys Y3ePA3NRAg3QkXKb0dyHxUZDnxLpnkO2ngldvLLlPvKJ4ooexf2UXmQxEhB6O6lkyBCV TJNn0X8w8vRZiARkv+1B2QecIA3t3eB0XdjNZhV4nL2IWhK3jglIWqd7HK8mBHcGhz3S 8zaw== X-Received: by 10.50.77.110 with SMTP id r14mr1017932igw.85.1366113017804; Tue, 16 Apr 2013 04:50:17 -0700 (PDT) Received: from vpn132.rw1.your.org (vpn132.rw1.your.org. [204.9.51.132]) by mx.google.com with ESMTPS id ua6sm15157222igb.0.2013.04.16.04.50.15 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 16 Apr 2013 04:50:16 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.3 \(1503\)) Subject: Re: Does sync(8) really flush everything? Lost writes with journaled SU after sync+power cycle From: Kevin Day In-Reply-To: <20130411160253.V1041@besplex.bde.org> Date: Tue, 16 Apr 2013 06:50:13 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: References: <87CC14D8-7DC6-481A-8F85-46629F6D2249@dragondata.com> <20130411160253.V1041@besplex.bde.org> To: Bruce Evans X-Mailer: Apple Mail (2.1503) X-Gm-Message-State: ALoCoQm4ncTb/06dkLrHUQi8miztqtF4RIy9vUKwtbyyUAnFaxfBbL3hLHF4Z4eLwXay8Z/65TIh Cc: "freebsd-fs@FreeBSD.org Filesystems" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Apr 2013 11:50:19 -0000 On Apr 11, 2013, at 1:30 AM, Bruce Evans wrote: > sync(2) only schedules all writing of all modified buffers to disk. Its > man page even says this. It doesn't wait for any of the writes to complete. A very kind person has pointed out to me (off-list) that doing: mount -u -o ro / (without -f) causes mount to force a flush, waits for completion, THEN bails out because there are open files preventing the read-only downgrade. We've been testing this here and it seems to be a usable workaround.
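A minimal sketch of that test sequence, assuming / is the journaled filesystem in question (the failed downgrade is the expected outcome on a live root):

    # sync                 # only schedules the dirty buffers; does not wait
    # mount -u -o ro /     # without -f: flushes, waits for completion, then errors out if files are open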
I'm also pointing out that, for at least our purposes, this problem (sync(2) doesn't seem to actually cause any writes) only seems to be causing lost directories if I'm using journaling. I'm attempting to narrow down why journaling appears to make sync into a no-op. -- Kevin From owner-freebsd-fs@FreeBSD.ORG Tue Apr 16 12:05:55 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C1042F34; Tue, 16 Apr 2013 12:05:55 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 48960758; Tue, 16 Apr 2013 12:05:54 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r3GC5o2D077669; Tue, 16 Apr 2013 16:05:50 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Tue, 16 Apr 2013 16:05:50 +0400 (MSK) From: Dmitry Morozovsky To: Andriy Gapon Subject: Re: ZFS-inly server and dedicated ZIL In-Reply-To: <516D2FA8.2000901@FreeBSD.org> Message-ID: References: <516CE8A0.3070808@digsys.bg> <516D2FA8.2000901@FreeBSD.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Apr 2013 12:05:55 -0000 On Tue, 16 Apr 2013, Andriy Gapon wrote: > on 16/04/2013 11:52 Dmitry Morozovsky said the following: > > Yes, but not pools with dedicated ZIL, as I cited from sources: zpool just does > > not allow to add ZIL to boot pool > > I think that there is no reason for that. Excuse me, ECANTPARS ;) No reason for disallowing ZIL on boot pool -- or no reason to have such config? > And a trivial workaround was already offered to you. Possibly I've missed that, could you please point me?
What I definitely do not want is dedicating a pair of expensive SAS drives as a system mirror -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Tue Apr 16 12:11:47 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2EFE91A9 for ; Tue, 16 Apr 2013 12:11:47 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 74AD77B3 for ; Tue, 16 Apr 2013 12:11:46 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA15427; Tue, 16 Apr 2013 15:11:44 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <516D3FFF.9010906@FreeBSD.org> Date: Tue, 16 Apr 2013 15:11:43 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130404 Thunderbird/17.0.5 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: ZFS-inly server and dedicated ZIL References: <516CE8A0.3070808@digsys.bg> <516D2FA8.2000901@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Apr 2013 12:11:47 -0000 on 16/04/2013 15:05 Dmitry Morozovsky said the following: > On Tue, 16 Apr 2013, Andriy Gapon wrote: > >> on 16/04/2013 11:52 Dmitry Morozovsky said the following: >>> Yes, but not pools with dedicated ZIL, as I cited from sources: zpool just does >>> not allow to add ZIL to boot pool >> >> I think that there is no reason for that. > > Excuse me, ECANTPARS ;) > > No reason for disallowing ZIL on boot pool -- or no reason to have such config? No reason for disallowing. I promise to axe the check when I get some time. >> And a trivial workaround was already offered to you. > > Possibly I've missed that, could you please point me? Rewinding the thread - $100, free for old friends :-) It's here: http://thread.gmane.org/gmane.os.freebsd.devel.file-systems/17669/focus=17759 Search for "You do this by". > What I definitely do not want is dedicating a pair of expensive SAS drives as a system > mirror It's possible to use partitions / slices for a pool.
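A minimal sketch of the partition-based approach, using hypothetical gpart labels (sys0/sys1 for a small mirrored boot pool, leaving the rest of the SAS drives for the data pool):

    # gpart add -t freebsd-zfs -s 16G -l sys0 da0    # small system partition on the first drive
    # gpart add -t freebsd-zfs -s 16G -l sys1 da1    # its mirror partner on the second
    # zpool create sys mirror gpt/sys0 gpt/sys1      # boot pool without a dedicated ZIL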
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Apr 16 12:18:22 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CE0352C0; Tue, 16 Apr 2013 12:18:22 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 56EE1802; Tue, 16 Apr 2013 12:18:21 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r3GCILIg078721; Tue, 16 Apr 2013 16:18:21 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Tue, 16 Apr 2013 16:18:21 +0400 (MSK) From: Dmitry Morozovsky To: Andriy Gapon Subject: Re: ZFS-inly server and dedicated ZIL In-Reply-To: <516D3FFF.9010906@FreeBSD.org> Message-ID: References: <516CE8A0.3070808@digsys.bg> <516D2FA8.2000901@FreeBSD.org> <516D3FFF.9010906@FreeBSD.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=UTF-8 Content-Transfer-Encoding: 8BIT Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Apr 2013 12:18:22 -0000 On Tue, 16 Apr 2013, Andriy Gapon wrote: > > No reason for disallowing ZIL on boot pool -- or no reason to have such config? > > No reason for disallowing. I promise to axe the check when I get some time. Great, thank you. > >> And a trivial workaround was already offered to you. > > > > Possibly I've missed that, could you please point me? > > Rewinding the thread - $100, free for old friends :-) ;-P > It's here: > http://thread.gmane.org/gmane.os.freebsd.devel.file-systems/17669/focus=17759 > Search for "You do this by". Wow. I actually *did* miss that. Will try.
-- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Wed Apr 17 06:33:22 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 1516C91E for ; Wed, 17 Apr 2013 06:33:22 +0000 (UTC) (envelope-from slovichon@gmail.com) Received: from mail-ve0-f170.google.com (mail-ve0-f170.google.com [209.85.128.170]) by mx1.freebsd.org (Postfix) with ESMTP id CC8A7E18 for ; Wed, 17 Apr 2013 06:33:21 +0000 (UTC) Received: by mail-ve0-f170.google.com with SMTP id 14so1153423vea.1 for ; Tue, 16 Apr 2013 23:33:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:date:from:to:subject:message-id:mime-version :content-type:content-disposition; bh=SA/ZOtFQysbww8IwLBrR80YRkt083yYyFTCjkqMks3U=; b=Q1hNDSshWyVOOI2xfJNL+efbD+weWbnY7TCTqKIm43bCBsw+hjhwAKe9ZiMRy+xBWz BedF2dqb8m8f7wqvhtjN4p5SQHtYkWb2GuUp7nUNLctjAyLSEiYH0nduucBhKQhUP7az 79pwO345B+RL5r1WAt4TqlUN1KaWEZEzUdZbGXfl1xCJwowsHCt0bEEhEHum8PaCPeXJ aazGJtp4QgYDy/r1LCeOO+5KpKbBUXNOPZaXRARnBMxd113TgZbDMQY1xByeGZsg0l/y M8ftlkht2iLKgd12NuZ0fuOxV65e8xr7FcyST1D1PXWQ1NxCJOR+bLzyCa2xArBF4FCY mQIQ== X-Received: by 10.58.224.101 with SMTP id rb5mr3941730vec.17.1366180401166; Tue, 16 Apr 2013 23:33:21 -0700 (PDT) Received: from localhost (c-24-131-65-84.hsd1.pa.comcast.net. [24.131.65.84]) by mx.google.com with ESMTPS id j5sm4645802vdv.13.2013.04.16.23.33.19 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Tue, 16 Apr 2013 23:33:20 -0700 (PDT) Date: Wed, 17 Apr 2013 02:33:18 -0400 From: Jared Yanovich To: freebsd-fs@freebsd.org Subject: nfs client readdir eofflag Message-ID: <20130417063318.GK14599@nightderanger.bender.mtx> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="W2ydbIOJmkm74tJ2" Content-Disposition: inline X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Apr 2013 06:33:22 -0000 --W2ydbIOJmkm74tJ2 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi, is there a reason why eofflag isn't set in nfsclient readdir()? This now allows union mounts to work for NFS above NFS. 
/sys/fs/nfsclient

Index: nfs_clvnops.c
===================================================================
--- nfs_clvnops.c	(revision 249568)
+++ nfs_clvnops.c	(working copy)
@@ -2221,6 +2221,7 @@
 	    !NFS_TIMESPEC_COMPARE(&np->n_mtime, &vattr.va_mtime)) {
 		mtx_unlock(&np->n_mtx);
 		NFSINCRGLOBAL(newnfsstats.direofcache_hits);
+		*ap->a_eofflag = 1;
 		return (0);
 	} else
 		mtx_unlock(&np->n_mtx);
@@ -2233,8 +2234,10 @@
 	tresid = uio->uio_resid;
 	error = ncl_bioread(vp, uio, 0, ap->a_cred);

-	if (!error && uio->uio_resid == tresid)
+	if (!error && uio->uio_resid == tresid) {
 		NFSINCRGLOBAL(newnfsstats.direofcache_misses);
+		*ap->a_eofflag = 1;
+	}
 	return (error);
 }

From owner-freebsd-fs@FreeBSD.ORG Wed Apr 17 18:05:09 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 71D3642B for ; Wed, 17 Apr 2013 18:05:09 +0000 (UTC) (envelope-from zaphod@berentweb.com) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) by mx1.freebsd.org (Postfix) with ESMTP id 5A93EA1F for ; Wed, 17 Apr 2013 18:05:08 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1USWjD-0001X1-RF for freebsd-fs@freebsd.org; Wed, 17 Apr 2013 11:05:07 -0700 Date: Wed, 17 Apr 2013 11:05:07 -0700 (PDT) From: Beeblebrox To: freebsd-fs@freebsd.org Message-ID: <1366221907838-5804517.post@n5.nabble.com> Subject: [ZFS] recover destroyed zpool with ZDB MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Apr 2013 18:05:09 -0000 I destroyed my zpool but forgot to take the tar backup of /home folder. This was a single-HDD pool and I first did 'zpool destroy' then 'gpart destroy' before realizing my error. Since then, I have manually re-created the GPT partitions to the size they were (testdisk did not correctly identify the geom) and there have been no writes to the HDD. After a lengthy discussion here: http://freebsd.1045724.n5.nabble.com/ZFS-recover-destroyed-zpool-what-are-the-available-options-td5800299.html and getting no result with: # zpool import -D -f -R /bsdr -N -F -X 12018916494219117471 rescue => cannot import 'bsdr' as 'rescue': no such pool or dataset. Destroy and re-create the pool from a backup source. I sent an email to an expert and was advised to look into zdb and the -F & -X flags. Good news and bad news there.
'# zdb -e -F 12018916494219117471' gives a lot of output but this is conflicting because although there are no errors, %used is showing zero:

Traversing all blocks to verify checksums and verify nothing leaked ...

        No leaks (block sum matches space maps exactly)

        bp count:              43
        bp logical:        357888    avg:  8322
        bp physical:        36352    avg:   845    compression:  9.85
        bp allocated:       93184    avg:  2167    compression:  3.84
        bp deduped:             0    ref>1:    0   deduplication:  1.00
        SPA allocated:      93184    used: 0.00%

The zdb -F command is giving the internal info for the zpool but it is not importing it, nor does it change the status to importable. What can I read or change in the zdb command to get this to come online? The zdb output is available as a link if needed. Thanks and regards. ----- 10-Current-amd64-using ccache-portstree merged with marcuscom.gnome3 & xorg.devel -- View this message in context: http://freebsd.1045724.n5.nabble.com/ZFS-recover-destroyed-zpool-with-ZDB-tp5804517.html Sent from the freebsd-fs mailing list archive at Nabble.com. From owner-freebsd-fs@FreeBSD.ORG Wed Apr 17 18:53:41 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C22DE752 for ; Wed, 17 Apr 2013 18:53:41 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: from mail-wi0-x234.google.com (mail-wi0-x234.google.com [IPv6:2a00:1450:400c:c05::234]) by mx1.freebsd.org (Postfix) with ESMTP id 62EEBEE4 for ; Wed, 17 Apr 2013 18:53:41 +0000 (UTC) Received: by mail-wi0-f180.google.com with SMTP id h11so808993wiv.7 for ; Wed, 17 Apr 2013 11:53:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=TApCRvhVSbH/UbSTa5Mv0Ukm6f0W3ooKESoWZZQJS1o=; b=Bj5gm4i4RvL+O9H/nND0vM0NSZ25VL3Bf51gMFv09s57dahihCfcAxWWcBiCkQdNzE jG7MIAKW8LKY6a8zhIOdSRS5xujzgFW244CGc9pN63K8EPPcXrWA+zvqM83qt85Byon9 VD7vU3bdcQqnyFeiuN4M41XjeyLFATvvNXTH43khjTCCIDRQmeA4R156CSjRFZgg3ZPc 67l+Y42UemA05IiAQoLinQP+J843kzzsU+bWcdEkybHV10m06DJGOc/VXSz/mlBM2Kct 102YxRdf/JdkQ7DXol75HW8g7ICVlYacXx8HgWYgC1qxQksmi7FIaWqHW+59wMPeR3wj kBIw== X-Received: by 10.50.77.110 with SMTP id r14mr1017932igw.85.1366113017804; Tue, 16 Apr 2013 04:50:17 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.242.101 with HTTP; Wed, 17 Apr 2013 11:53:40 -0700 (PDT) In-Reply-To: <1366221907838-5804517.post@n5.nabble.com> References: <1366221907838-5804517.post@n5.nabble.com> Date: Wed, 17 Apr 2013 13:53:40 -0500 Message-ID: Subject: Re: [ZFS] recover destroyed zpool with ZDB From: Adam Vande More To: Beeblebrox Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Apr 2013 18:53:41 -0000 On Wed, Apr 17, 2013 at 1:05 PM, Beeblebrox wrote: > I destroyed my zpool but forgot to take the tar backup of /home folder. > This > was a single-HDD pool and I first did 'zpool destroy' then 'gpart destroy' > before realizing my error. > > Since then, I have manually re-created the GPT partitions to the size they > were (testdisk did not correctly identify the geom) and there have been no > writes to the HDD.
> After a lengthy discussion here: > http://freebsd.1045724.n5.nabble.com/ZFS-recover-destroyed-zpool-what-are-the-available-options-td5800299.html > and getting no result with: > # zpool import -D -f -R /bsdr -N -F -X 12018916494219117471 rescue => > cannot import 'bsdr' as 'rescue': no such pool or dataset. Destroy and > re-create the pool from a backup source. > > I sent an email to an expert and was advised to look into zdb and the -F & > -X flags. Good news and bad news there. '# zdb -e -F 12018916494219117471' > gives a lot of output but this is conflicting because although there are no > errors, %used is showing zero: > One thing is that you keep using zpool import -D when the pool isn't in a destroyed state. -- Adam Vande More From owner-freebsd-fs@FreeBSD.ORG Wed Apr 17 19:16:21 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id BA6FD1D5 for ; Wed, 17 Apr 2013 19:16:21 +0000 (UTC) (envelope-from zaphod@berentweb.com) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) by mx1.freebsd.org (Postfix) with ESMTP id A12E7FFC for ; Wed, 17 Apr 2013 19:16:21 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1USXq8-0007ps-Kr for freebsd-fs@freebsd.org; Wed, 17 Apr 2013 12:16:20 -0700 Date: Wed, 17 Apr 2013 12:16:20 -0700 (PDT) From: Beeblebrox To: freebsd-fs@freebsd.org Message-ID: <1366226180639-5804603.post@n5.nabble.com> In-Reply-To: References: <1366221907838-5804517.post@n5.nabble.com> Subject: [ZFS] recover destroyed zpool with ZDB MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Apr 2013 19:16:21 -0000 Hi, It's a long story by now and I was following volodymyr's suggestions. Anyway, 'zpool list' no longer shows the bsdr pool at all after having run # zdb -e -F 12018916494219117471 Obviously, since the ada0p2 metadata was written into the zpool.cache file with the above command, and zpool list reads the cache file. Regards. ----- 10-Current-amd64-using ccache-portstree merged with marcuscom.gnome3 & xorg.devel -- View this message in context: http://freebsd.1045724.n5.nabble.com/ZFS-recover-destroyed-zpool-with-ZDB-tp5804517p5804603.html Sent from the freebsd-fs mailing list archive at Nabble.com.
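For reference, a rough sketch of the import paths Adam is distinguishing (the GUID is the one from this thread; -F and -X are the rewind options already under discussion):

    # zpool import          # lists importable pools that are merely exported
    # zpool import -D       # lists only pools whose labels are marked destroyed
    # zpool import -fFX 12018916494219117471 rescue    # rewind attempt, without -D

The point is that -D filters on the destroyed flag in the on-disk labels; if the labels no longer carry that state, -D hides the pool instead of finding it.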
From owner-freebsd-fs@FreeBSD.ORG Wed Apr 17 19:32:41 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 490DA8DE for ; Wed, 17 Apr 2013 19:32:41 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: from mail-wg0-f43.google.com (mail-wg0-f43.google.com [74.125.82.43]) by mx1.freebsd.org (Postfix) with ESMTP id DC572172 for ; Wed, 17 Apr 2013 19:32:40 +0000 (UTC) Received: by mail-wg0-f43.google.com with SMTP id c11so1979672wgh.10 for ; Wed, 17 Apr 2013 12:32:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=SHJaDQYu0C1lHjYTDgkliGzC6BK+9yX8Fmw2/1gbyQs=; b=EGQIsxfw71ewrGE1OnielJYR/Q7MwVaFyrh8NSS1xGmOnNvDOBqtpT+k+V+trulbmB FLF9blj2S/qJzonzSvesSc+rAPYMll1GZcLEUZg9ojZf8MSeDytIzY3fccLJPuF4X9XI E1wv/we2GhVwouwBVYhSVXHnKq+KQqizRj4F5b7w2RoIQ0mdnp8nV1RrhMcWiMVkhOFf YuRNy4yBr0Z3dkGFAgWyVfBon4PdKaz9bVm3Nc28vddvkZ3Im4igIKEAY7/o2V2qRARm tXS0sWK8LjHj6q2q1DfJX643PnbVt9XZSS9zxvSW/5aBxxvIjWNwIKDPC7pw9RlNjO6C yHVQ== MIME-Version: 1.0 X-Received: by 10.194.5.4 with SMTP id o4mr7570099wjo.40.1366227154706; Wed, 17 Apr 2013 12:32:34 -0700 (PDT) Received: by 10.194.242.101 with HTTP; Wed, 17 Apr 2013 12:32:34 -0700 (PDT) In-Reply-To: <1366226180639-5804603.post@n5.nabble.com> References: <1366221907838-5804517.post@n5.nabble.com> <1366226180639-5804603.post@n5.nabble.com> Date: Wed, 17 Apr 2013 14:32:34 -0500 Message-ID: Subject: Re: [ZFS] recover destroyed zpool with ZDB From: Adam Vande More To: Beeblebrox Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Apr 2013 19:32:41 -0000 On Wed, Apr 17, 2013 at 2:16 PM, Beeblebrox wrote: > Hi, > It's a long story by now and I was following volodymyr's suggestions. > Anyway, 'zpool list' no longer shows the bsdr pool at all after having run > # zdb -e -F 12018916494219117471 > Obviously, since the ada0p2 metadata was written into the zpool.cache file > with the above command, and zpool list reads the cache file.
If you can get it back to faulted state, the official procedure is here: http://docs.oracle.com/cd/E19963-01/html/821-1448/gbbwl.html -- Adam Vande More From owner-freebsd-fs@FreeBSD.ORG Wed Apr 17 21:38:11 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E64125B5 for ; Wed, 17 Apr 2013 21:38:11 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id B17478E6 for ; Wed, 17 Apr 2013 21:38:11 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEANoVb1GDaFvO/2dsb2JhbABQhm26MYJqgRp0gh8BAQUjBFIbDgoCAg0ZAlkGiCeqeJJYgSONQwEzB4IzgRMDlwaRFIMnIIFs X-IronPort-AV: E=Sophos;i="4.87,496,1363147200"; d="scan'208";a="26243098" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 17 Apr 2013 17:37:59 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id CB583B4045; Wed, 17 Apr 2013 17:37:59 -0400 (EDT) Date: Wed, 17 Apr 2013 17:37:59 -0400 (EDT) From: Rick Macklem To: Jared Yanovich Message-ID: <1761576953.936301.1366234679793.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20130417063318.GK14599@nightderanger.bender.mtx> Subject: Re: nfs client readdir eofflag MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Linux)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Apr 2013 21:38:12 -0000 Jared Yanovich wrote: > Hi, is there a reason why eofflag isn't set in nfsclient readdir()? > > This now allows union mounts to work for NFS above NFS. > This patch looks ok to me. (I don't know, but my guess is that, since only the NFS server used eofflag for a long time, the code just didn't bother setting it.) If you aren't a src committer (I don't recognize your name), I will put testing/committing this patch on my "to do" list. (If you are a src committer, feel free to commit it.) 
Thanks for reporting this, rick

> /sys/fs/nfsclient
>
> Index: nfs_clvnops.c
> ===================================================================
> --- nfs_clvnops.c	(revision 249568)
> +++ nfs_clvnops.c	(working copy)
> @@ -2221,6 +2221,7 @@
>  	    !NFS_TIMESPEC_COMPARE(&np->n_mtime, &vattr.va_mtime)) {
>  		mtx_unlock(&np->n_mtx);
>  		NFSINCRGLOBAL(newnfsstats.direofcache_hits);
> +		*ap->a_eofflag = 1;
>  		return (0);
>  	} else
>  		mtx_unlock(&np->n_mtx);
> @@ -2233,8 +2234,10 @@
>  	tresid = uio->uio_resid;
>  	error = ncl_bioread(vp, uio, 0, ap->a_cred);
>
> -	if (!error && uio->uio_resid == tresid)
> +	if (!error && uio->uio_resid == tresid) {
>  		NFSINCRGLOBAL(newnfsstats.direofcache_misses);
> +		*ap->a_eofflag = 1;
> +	}
>  	return (error);
> }

From owner-freebsd-fs@FreeBSD.ORG Wed Apr 17 21:43:16 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id AB3B37B8 for ; Wed, 17 Apr 2013 21:43:16 +0000 (UTC) (envelope-from mxb@alumni.chalmers.se) Received: from mail-la0-x231.google.com (mail-la0-x231.google.com [IPv6:2a00:1450:4010:c03::231]) by mx1.freebsd.org (Postfix) with ESMTP id 31E3092E for ; Wed, 17 Apr 2013 21:43:16 +0000 (UTC) Received: by mail-la0-f49.google.com with SMTP id fs13so1204250lab.36 for ; Wed, 17 Apr 2013 14:43:15 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:from:content-type:message-id:mime-version:subject:date :references:to:in-reply-to:x-mailer:x-gm-message-state; bh=zzqy0WXuz+U2GqljW1+Rix3X0YNycdeAfZ1lpxlQ9Dk=; b=ciT6k6eZ2V8w95ZYHjA8LyBj43eQ8VnXFkyDPluGMy9R20jlodIzMwr7GKzC6ic+3O Iv96Av57CxHuXAgqK04oQbC0hGVpMLpKbNIDe2z1gGWxo3fF46JbltrN5C4GdFkVT3Oz LCIqWQtcKLc3K8MXcLkmCguyQv/KdHG8vVzm3qahr1Qk7KSf/KjSGW4RHFVukrF+G//c WKx2fmJ4o14XhzGju/WRg6bxVrMqHDa8XPUrUryXoXksrk3SQuRFswMthz+jcbG/cKCQ +hdtVcI+Idnq1Io2LoSdPpN0jfP7qED5TfvqhSPGBVquguIWu+5D9R+T3HisOV7CcBno i2ww== X-Received: by 10.152.3.4 with SMTP id 4mr4411435lay.29.1366234994857; Wed, 17 Apr 2013 14:43:14 -0700 (PDT) Received: from grey.home.unixconn.com (h-75-17.a183.priv.bahnhof.se. [46.59.75.17]) by mx.google.com with ESMTPS id y9sm3301246lae.10.2013.04.17.14.43.13 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 17 Apr 2013 14:43:13 -0700 (PDT) From: mxb Message-Id: <2753912E-0B91-4F73-B956-9D558F16EEAE@alumni.chalmers.se> Mime-Version: 1.0 (Mac OS X Mail 6.3 \(1503\)) Subject: Re: ZFS: ZIL device export/import Date: Wed, 17 Apr 2013 23:43:11 +0200 References: <5A2824CA-2A67-47FA-AB27-20C6EBD2C501@alumni.chalmers.se> <51699B8E.7050003@platinum.linux.pl> <2DE8AD5E-B84C-4D88-A242-EA30EA4A68FD@alumni.chalmers.se> <9EE9328B-40B1-4510-B404-242D0F2C7697@alumni.chalmers.se> To: "freebsd-fs@freebsd.org" In-Reply-To: <9EE9328B-40B1-4510-B404-242D0F2C7697@alumni.chalmers.se> X-Mailer: Apple Mail (2.1503) X-Gm-Message-State: ALoCoQmzQcwJKbfFRUFR+2d3+nD7u5N/FYki3cxnDhP/r8lYsxzw8LPfYp+GYxrnCod04OwxTthf Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Apr 2013 21:43:16 -0000 Thanks, everyone who replied! //mxb On 14 apr 2013, at 12:01, mxb wrote: > > Well, I'm trying to preclude any undesired effect in the whole setup, as this is going to production.
> > SAS-link might not be a bottleneck here and I'm overreacting. > > Locally, on a per-HU basis, I have 6Gbit/s SAS/SATA. Both the card and the disks (SSD) attached to it. > The SAS expander is also 6Gbit/s, attaching 10k RPM SAS mechanical disks on the JBOD. > > I use Intel 520 SSD and Pulsar SSD in this setup. > ZIL resided locally on the Intel SSD (per HU), but now will probably move to the Pulsar SSD (moved to the JBOD as those disks have a dual SAS/SATA link). L2ARC resided on Pulsar (Pulsar was in each HU, e.g. I have 2x Pulsar). > > Looks like I have to re-design the whole setup, as regards the ZIL. > > //mxb > > > On 13 apr 2013, at 22:51, Ronald Klop wrote: > >> I thought the idea of ZIL is a fast buffer before the write to slow disk. Are you really sure the SAS expander is the bottleneck in the system instead of the disks? > From owner-freebsd-fs@FreeBSD.ORG Thu Apr 18 01:37:00 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 518D896B for ; Thu, 18 Apr 2013 01:37:00 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 1BA641C2 for ; Thu, 18 Apr 2013 01:36:59 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqIEAG9Nb1GDaFvO/2dsb2JhbABQgzyDMb0mgRZ0gh8BAQEDAQEBASArIAsFFhgCAg0ZAikBCSYGCAcEARwEh20GDKpyklmBI4xFfjQHgjOBEwOTOYEMgkGBI49xgycgMoEFNQ X-IronPort-AV: E=Sophos;i="4.87,496,1363147200"; d="scan'208";a="24406644" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 17 Apr 2013 21:36:52 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 80C41B403E; Wed, 17 Apr 2013 21:36:52 -0400 (EDT) Date: Wed, 17 Apr 2013 21:36:52 -0400 (EDT) From: Rick Macklem To: Paul van der Zwan Message-ID: <986577218.940691.1366249012504.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <495AEA10-9B8F-4A03-B706-79BF43539482@vanderzwan.org> Subject: Re: FreeBSD 9.1 NFSv4 client attribute cache not caching ? MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Linux)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Apr 2013 01:37:00 -0000 Paul van der Zwan wrote: > On 12 Apr 2013, at 16:28 , Paul van der Zwan > wrote: > > > > > I am running a few VirtualBox VMs with 9.1 on my OpenIndiana server > > and I noticed that make buildworld seem to take much longer > > when the clients mount /usr/src and /usr/obj over NFS V4 than when > > they use V3. > > Unfortunately I have to use V4 as a buildworld on V3 hangs the > > server completely... > > I noticed the number of PUTFH/GETATTR/GETFH calls in in the order of > > a few thousand per second > > and if I snoop the traffic I see the same filenames appear over and > > over again. > > It looks like the client is not caching anything at all and using a > > server request everytime.
> > I use the default mount options: > > 192.168.178.24:/data/ports on /usr/ports (nfs, nfsv4acls) > > 192.168.178.24:/data/src on /usr/src (nfs, nfsv4acls) > > 192.168.178.24:/data/obj on /usr/obj (nfs, nfsv4acls) > > > > > just fyi, on a kernel build test I just did, I am seeing a much larger number of Lookups for NFSv4 vs NFSv3. I'll post again if/when I come up with a fix, rick > I had a look with dtrace > $ sudo dtrace -n '::getattr:start { @[stack()]=count();}' > and it seems the vast majority of the calls to getattr are from open() > and close() system calls.: > kernel`newnfs_request+0x631 > kernel`nfscl_request+0x75 > kernel`nfsrpc_getattr+0xbe > kernel`nfs_getattr+0x280 > kernel`VOP_GETATTR_APV+0x74 > kernel`nfs_lookup+0x3cc > kernel`VOP_LOOKUP_APV+0x74 > kernel`lookup+0x69e > kernel`namei+0x6df > kernel`kern_execve+0x47a > kernel`sys_execve+0x43 > kernel`amd64_syscall+0x3bf > kernel`0xffffffff80784947 > 26 > > kernel`newnfs_request+0x631 > kernel`nfscl_request+0x75 > kernel`nfsrpc_getattr+0xbe > kernel`nfs_close+0x3e9 > kernel`VOP_CLOSE_APV+0x74 > kernel`kern_execve+0x15c5 > kernel`sys_execve+0x43 > kernel`amd64_syscall+0x3bf > kernel`0xffffffff80784947 > 26 > > kernel`newnfs_request+0x631 > kernel`nfscl_request+0x75 > kernel`nfsrpc_getattr+0xbe > kernel`nfs_getattr+0x280 > kernel`VOP_GETATTR_APV+0x74 > kernel`nfs_lookup+0x3cc > kernel`VOP_LOOKUP_APV+0x74 > kernel`lookup+0x69e > kernel`namei+0x6df > kernel`vn_open_cred+0x330 > kernel`vn_open+0x1c > kernel`kern_openat+0x207 > kernel`kern_open+0x19 > kernel`sys_open+0x18 > kernel`amd64_syscall+0x3bf > kernel`0xffffffff80784947 > 2512 > > kernel`newnfs_request+0x631 > kernel`nfscl_request+0x75 > kernel`nfsrpc_getattr+0xbe > kernel`nfs_close+0x3e9 > kernel`VOP_CLOSE_APV+0x74 > kernel`vn_close+0xee > kernel`vn_closefile+0xff > kernel`_fdrop+0x3a > kernel`closef+0x332 > kernel`kern_close+0x183 > kernel`sys_close+0xb > kernel`amd64_syscall+0x3bf > kernel`0xffffffff80784947 > 2530 > > I had a look at the source of nfs_close and could not find a call to > nfsrpc_getattr, and I am wondering why close would be calling getattr > anyway. > If the file is closed what do we care about it's attributes.... 
> > Paul > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Thu Apr 18 05:15:18 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4F47B93B for ; Thu, 18 Apr 2013 05:15:18 +0000 (UTC) (envelope-from zaphod@berentweb.com) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) by mx1.freebsd.org (Postfix) with ESMTP id 33924B0A for ; Thu, 18 Apr 2013 05:15:17 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1UShBl-0008UI-60 for freebsd-fs@freebsd.org; Wed, 17 Apr 2013 22:15:17 -0700 Date: Wed, 17 Apr 2013 22:15:17 -0700 (PDT) From: Beeblebrox To: freebsd-fs@freebsd.org Message-ID: <1366262117117-5804714.post@n5.nabble.com> In-Reply-To: References: <1366221907838-5804517.post@n5.nabble.com> <1366226180639-5804603.post@n5.nabble.com> Subject: [ZFS] recover destroyed zpool with ZDB MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Apr 2013 05:15:18 -0000 Thanks, but that document does not appear very relevant to my situation. Also, the issue is not as straightforward as it seems. The DEFAULTED status of the zpool was a 'false positive', because:

A - The "present pool" did not accept any zpool commands and always gave a message like "no such pool or dataset ... recover the pool from a backup source."

B - The more relevant on-disk metadata showed and still shows this:

# zdb -l /dev/ada0p2 => all 4 labels intact, and
    pool_guid: 12018916494219117471
    vdev_tree: type: 'disk' id: 0 guid: 17860002997423999070

While the pool showing up in the zpool list was/is clearly in a worse state than the above pool:

# zdb -l /dev/ada0 => only label 2 intact, and
    pool_guid: 16018525702691588432

In my opinion, this problem is more similar to a "Resolving a Missing Device" problem rather than data corruption. Unfortunately, missing device repairs focus on mirrored setups and there is no decent document on a missing device in a single-HDD pool. ----- 10-Current-amd64-using ccache-portstree merged with marcuscom.gnome3 & xorg.devel -- View this message in context: http://freebsd.1045724.n5.nabble.com/ZFS-recover-destroyed-zpool-with-ZDB-tp5804517p5804714.html Sent from the freebsd-fs mailing list archive at Nabble.com.
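A short sketch of the label checks being compared above, using the device nodes from the message (zpool import -d scans devices directly instead of trusting zpool.cache):

    # zdb -l /dev/ada0p2    # dump the four vdev labels stored on the partition
    # zdb -l /dev/ada0      # the same check against the raw disk
    # zpool import -d /dev  # search all device nodes for importable pools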
From owner-freebsd-fs@FreeBSD.ORG Thu Apr 18 18:49:53 2013 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 360AB935; Thu, 18 Apr 2013 18:49:53 +0000 (UTC) (envelope-from ken@kdm.org) Received: from nargothrond.kdm.org (nargothrond.kdm.org [70.56.43.81]) by mx1.freebsd.org (Postfix) with ESMTP id BE290138E; Thu, 18 Apr 2013 18:49:52 +0000 (UTC) Received: from nargothrond.kdm.org (localhost [127.0.0.1]) by nargothrond.kdm.org (8.14.2/8.14.2) with ESMTP id r3IInpvi019293; Thu, 18 Apr 2013 12:49:51 -0600 (MDT) (envelope-from ken@nargothrond.kdm.org) Received: (from ken@localhost) by nargothrond.kdm.org (8.14.2/8.14.2/Submit) id r3IInp7l019292; Thu, 18 Apr 2013 12:49:51 -0600 (MDT) (envelope-from ken) Date: Thu, 18 Apr 2013 12:49:51 -0600 From: "Kenneth D. Merry" To: Bruce Evans Subject: Re: patches to add new stat(2) file flags Message-ID: <20130418184951.GA18777@nargothrond.kdm.org> References: <20130307000533.GA38950@nargothrond.kdm.org> <20130307222553.P981@besplex.bde.org> <20130308232155.GA47062@nargothrond.kdm.org> <20130310181127.D2309@besplex.bde.org> <20130409190838.GA60733@nargothrond.kdm.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130409190838.GA60733@nargothrond.kdm.org> User-Agent: Mutt/1.4.2i Cc: arch@FreeBSD.org, fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Apr 2013 18:49:53 -0000 On Tue, Apr 09, 2013 at 13:08:38 -0600, Kenneth D. Merry wrote: > On Sun, Mar 10, 2013 at 19:21:57 +1100, Bruce Evans wrote: > > On Fri, 8 Mar 2013, Kenneth D. Merry wrote: > > > > >On Fri, Mar 08, 2013 at 00:37:15 +1100, Bruce Evans wrote: > > >>On Wed, 6 Mar 2013, Kenneth D. Merry wrote: > > >> > > >>>I have attached diffs against head for some additional stat(2) file > > >>>flags. > > >>> > > >>>The primary purpose of these flags is to improve compatibility with CIFS, > > >>>both from the client and the server side. > > >>>... > > >> > > >>I missed looking at the diffs in my previous reply. > > >> > > >>% --- //depot/users/kenm/FreeBSD-test3/bin/chflags/chflags.1 2013-03-04 > > >>17:51:12.000000000 -0700 > > >>% +++ /usr/home/kenm/perforce4/kenm/FreeBSD-test3/bin/chflags/chflags.1 > > >>2013-03-04 17:51:12.000000000 -0700 > > >>% --- /tmp/tmp.49594.86 2013-03-06 16:42:43.000000000 -0700 > > >>% +++ /usr/home/kenm/perforce4/kenm/FreeBSD-test3/bin/chflags/chflags.1 > > >>2013-03-06 14:47:25.987128763 -0700 > > >>% @@ -117,6 +117,16 @@ > > >>% set the user immutable flag (owner or super-user only) > > >>% .It Cm uunlnk , uunlink > > >>% set the user undeletable flag (owner or super-user only) > > >>% +.It Cm system , usystem > > >>% +set the Windows system flag (owner or super-user only) > > >> > > >>This begins unsorting of the list. > > > > > >Fixed. > > > > > >>It's not just a Windows flag, since it also works in DOS. > > > > > >Fixed. > > > > Thanks. Hopefully all the simple bugs are fixed now. > > > > >>"Owner or" is too strict for msdosfs, since files can only have a > > >>single owner, so controlling access using groups is needed. I > > >>use owner root and group msdosfs for msdosfs mounts. This works for > > >>normal operations like open/read/write, but fails for most attributes > > >>including file flags.
msdosfs doesn't support many attributes, but > > >>this change is supposed to add support for 3 new file flags, so it would > > >>be good if it didn't restrict the support to root. > > > > > >I wasn't trying to change the existing security model for msdosfs, but if > > >you've got a suggested patch to fix it I can add that in. > > > > I can't think of anything better than making group write permission enough > > for attributes. > > > > msdosfs also has some style bugs in this area. It uses VOP_ACCESS() > > with VADMIN for the non-VA_UTIMES_NULL case of utimes(), but for all > > other attributes it hard-codes a direct uid check followed by a > > priv_check_cred() with PRIV_VFS_ADMIN. VADMIN requires even more than > > user write permission for POSIX file systems, and using it unchanged > > for all the attributes would be even more restrictive unless we changed > > it, but it would be easier to make it uniformly less restrictive for > > msdosfs by using it consistently. > > > > Oops, that was in the old version of ffs. ffs now has related > > complications and unnecessary style bugs (verboseness and misformatting) > > to support ACLs. It now uses VOP_ACCESSX() with VWRITE_ATTRIBUTES for > > utimes(), and VOP_ACCESSX() with other VFOO for all attributes except > > flags. It still uses VOP_ACCESS() with VADMIN for flags. > > > > >>... > > >>% .It Dv SF_ARCHIVED > > >>... > > >>% +Filesystems in FreeBSD may or may not have special handling for this > > >>flag. > > >>% +For instance, ZFS tracks changes to files and will clear this bit when > > >>a > > >>% +file is updated. > > >>% +UFS only stores the flag, and relies on the application to change it > > >>when > > >>% +needed. > > >> > > >>I think that is useless, since changing it is needed whenever the file > > >>changes, and applications can do that (short of running as daemons and > > >>watching for changes). > > > > > >Do you mean applications can't do that or can? > > > > Oops, can't. > > > > It is still hard for users to know what their file system supports. > > Even programmers don't know that it is backwards :-). > > > > >>% --- //depot/users/kenm/FreeBSD-test3/sys/fs/msdosfs/msdosfs_vnops.c > > >>2013-03-04 17:51:12.000000000 -0700 > > >>% +++ > > >>/usr/home/kenm/perforce4/kenm/FreeBSD-test3/sys/fs/msdosfs/msdosfs_vnops.c > > >>2013-03-04 17:51:12.000000000 -0700 > > >>% --- /tmp/tmp.49594.370 2013-03-06 16:42:43.000000000 -0700 > > >>% +++ > > >>/usr/home/kenm/perforce4/kenm/FreeBSD-test3/sys/fs/msdosfs/msdosfs_vnops.c > > >>2013-03-06 14:49:47.179130318 -0700 > > >>% @@ -345,8 +345,17 @@ > > >>% vap->va_birthtime.tv_nsec = 0; > > >>% } > > >>% vap->va_flags = 0; > > >>% + /* > > >>% + * The DOS Archive attribute means that a file needs to be > > >>% + * archived. The BSD SF_ARCHIVED attribute means that a file has > > >>% + * been archived. Thus the inversion here. > > >>% + */ > > >> > > >>No need to document it again. It goes without saying that ARCHIVE > > >>!= ARCHIVED. > > > > > >I disagree. It wasn't immediately obvious to me that SF_ARCHIVED was > > >generally used as the inverse of the DOS Archived bit until I started > > >digging into this. If this helps anyone figure that out more quickly, it's > > >useful. > > > > The surprising thing is that it is backwards in FreeBSD and not really > > supported except in msdosfs. Now several file systems have the comment > > about it being inverted, but man pages still don't. > > I made the change to UF_ARCHIVE, and updated the man pages.
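To illustrate the intended use -- a sketch only, not part of the patch set; it assumes a kernel and world built with these patches, so that find(1), chflags(1) and strtofflags(3) know a "uarch" keyword, and the paths are made up -- an incremental backup pass would archive whatever is marked as needing archiving and then clear the flag:

# find /data -flags +uarch -print | cpio -o > /backup/incr.cpio
# find /data -flags +uarch -exec chflags nouarch {} +

With the old inverted SF_ARCHIVED sense, both tests would have to be negated, and clearing a super-user flag would require privilege.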
> > > >>% @@ -420,12 +429,21 @@ > > >>% if (error) > > >>% return (error); > > >>% } > > >> > > >>The permissions check before this is delicate and was broken and is > > >>more broken now. It still short-circuits to handle setting the > > >>single flag that used to be supported, and is slightly broken for that: > > >>- unprivileged user asks to set ARCHIVE by passing !SF_ARCHIVED. We > > >> allow that, although this may toggle the flag and the normal semantics > > >> for SF flags are to not allow toggling. > > >>- unprivileged user asks to clear ARCHIVE by passing SF_ARCHIVED. We > > >> don't allow that. But we should allow preserving ARCHIVE if it is > > >> already clear. > > >>The bug wasn't very important when only 1 flag was supported. Now it > > >>prevents unprivileged users from managing the new UF flags if ARCHIVE is > > >>clear. Fortunately, this is the unusual case. Anyway, unprivileged > > >>users can set ARCHIVE by doing some other operation. Even the chflags() > > >>operation should set ARCHIVE and thus allow further chflags()'s that now > > >>keep ARCHIVE set. Except it is very confusing if a chflags() asks for > > >>ARCHIVE to be clear. This request might be just to try to preserve > > >>the current setting and not want it if other things are changed, or > > >>it might be to purposely clear it. Changing it from set to clear should > > >>still be privileged. > > > > > >I changed it to allow setting or clearing SF_ARCHIVED. Now I can set or > > >clear the flag as non-root: > > > > Actually, it seems OK, since there are no old or new SF_ immutable flags. > > Some of the actions are broken in the old and new code for directories -- > > see below. > > > > >>See the more complicated permissions check in ffs. It would be safest > > >>to duplicate most of it, to get different permissions checking for the > > >>SF and UF flags. Then decide if we want to keep allowing setting > > >>ARCHIVE without privilege. > > > > > >I think we should allow getting and setting SF_ARCHIVED without special > > >privileges. Given how it is generally used, I don't think it should be > > >restricted to the super-user. > > > > I don't really like that since changing the flags is mainly needed for > > the fairly privileged operation of managing other OS's file systems. > > However, since we're mapping the SYSTEM flag to a UF_ flag, the SYSTEM > > flag will require less privilege than the ARCHIVE flag. This is backwards, > > so we might as well require less privilege for ARCHIVE too. I think we, > > that is, you should use a new UF_ARCHIVE flag with the correct sense. > > Okay, done. The patches are attached with UF_ARCHIVE used instead of > SF_ARCHIVED, with the sense reversed. > > > >Can you provide some code demonstrating how the permissions code should > > >be changed in msdosfs? I don't know that much about that sort of thing, > > >so I'll probably spend an inordinate amount of time stumbling > > >through it. > > > > Now I think only cleanups are needed. > > Okay. > > > >>% return EOPNOTSUPP; > > >>% if (vap->va_flags & SF_ARCHIVED) > > >>% dep->de_Attributes &= ~ATTR_ARCHIVE; > > >>% else if (!(dep->de_Attributes & ATTR_DIRECTORY)) > > >>% dep->de_Attributes |= ATTR_ARCHIVE; > > >> > > >>The comment before this says that we ignore attempts to set ATTR_ARCHIVED > > >>for directories. However, it is out of date. WinXP allows setting it > > >>and all the new flags for directories, and so do we. > > > > > >Do you mean we allow setting it in UFS, or where? Obviously the code above > > >won't set it on a directory.
> > > > I meant it here. Actually, the comment matches the code -- I somehow missed > > the test in the code. However, the code is wrong. All directories except > > the root directory have this and other attributes, but FreeBSD refuses to > > set them. More below. > > > > >>The WinXP attrib command (at least on a FAT32 fs) doesn't allow setting > > >>or clearing ARCHIVE (even if it is already set or clear) if any of > > >>HIDDEN, READONLY or SYSTEM is already set and remains set after the > > >>command. Thus the HRS attributes act a bit like immutable flags, but > > >>subtly differently. (ffs has the quite different and worse behaviour > > >>of allowing chflags() irrespective of immutable flags being set before > > >>or after, provided there is enough privilege to change the immutable > > >>flags.) Anyway, they should all give some aspects of immutability. > > > > > >We could do that for msdosfs, but why add more things for the user to trip > > >over given how the filesystem is typically used? Most people probably > > >use it for USB thumb drives these days. Or perhaps on a dual boot system > > >to access their Windows partition. > > > > The small data drives won't have many files with attributes (except > > ARCHIVE). For multiple-boot, I think the permissions shouldn't be too > > much different than the foreign OS's. I used not to worry about this > > and liked deleting WinXP files without asking it, but recently I spent > > a lot of time recovering a WinXP ntfs partition and changed a bit too > > much using FreeBSD and Cygwin because I didn't understand the > > permissions (especially ACLs). ntfs in FreeBSD was less than r/o so it > > couldn't even back up the permissions (for file flags, it returned the > > garbage in its internal inode flags without translation...). > > > > >*** src/bin/chflags/chflags.1.orig > > >--- src/bin/chflags/chflags.1 > > >*************** > > >*** 101,120 **** > > > .Bl -tag -offset indent -width ".Cm opaque" > > > .It Cm arch , archived > > > set the archived flag (super-user only) > > > .It Cm opaque > > > set the opaque flag (owner or super-user only) > > >- .It Cm nodump > > >- set the nodump flag (owner or super-user only) > > > .It Cm sappnd , sappend > > > > The opaque flag is UF_ too. > > Yes, but all of the flag descriptions are sorted in alphabetical order. > How would you suggest sorting them instead? (SF first and then UF, both in > some version of alphabetical order?) > > > >+ .It Cm snapshot > > >+ set the snapshot flag (most filesystems do not allow changing this flag) > > > > I think none do. It can only be displayed. > > Fixed. > > > chflags(1) doesn't display flags, so this shouldn't be here. The problem > > is that this man page is the only place where the flag names are documented. > > ls(1) and strtofflags(3) just point to here. strtofflags(3) says that the > > flag names are documented here, but ls(1) just has an Xref to here. > > I fixed ls(1) at least. > > > >*** src/lib/libc/sys/chflags.2.orig > > >--- src/lib/libc/sys/chflags.2 > > >--- 71,127 ---- > > > the following values > > > .Pp > > > .Bl -tag -width ".Dv SF_IMMUTABLE" -compact -offset indent > > >! .It Dv SF_APPEND > > > The file may only be appended to. > > > .It Dv SF_ARCHIVED > > >! The file has been archived. > > >! This flag means the opposite of the Windows and CIFS > > >FILE_ATTRIBUTE_ARCHIVE > > > > DOS, Windows and CIFS... > > Fixed. > > > >! attribute. > > >! That attribute means that the file should be archived, whereas > > >! .Dv SF_ARCHIVED > > >!
means that the file has been archived. > > >! Filesystems in FreeBSD may or may not have special handling for this > > >flag. > > >! For instance, ZFS tracks changes to files and will clear this bit when a > > >! file is updated. > > > > Does zfs clear it in other circumstances? WinXP doesn't for msdosfs (or > > ntfs?), but FreeBSD clears it when changing some attributes, even for > > null changes (these are: times except for atimes, and the HIDDEN attribute > > when it is changed by chmod() -- even for null changes --, but not for > > the HIDDEN attribute when it is changed (or preserved) by chflags() in > > your new code). I want it to be cleared for metadata so that backup > > utilities can trust the ARCHIVE flag for metadata changes. > > Well, it does look like changing a file or touching it causes the archive > flag to get set with ZFS: > > # touch foo > # ls -lao foo > -rw-r--r-- 1 root wheel uarch 0 Apr 8 21:45 foo > # chflags 0 foo > # ls -lao foo > -rw-r--r-- 1 root wheel - 0 Apr 8 21:45 foo > # echo "hello" >> foo > # ls -lao foo > -rw-r--r-- 1 root wheel uarch 6 Apr 8 21:46 foo > # chflags 0 foo > # ls -lao foo > -rw-r--r-- 1 root wheel - 6 Apr 8 21:46 foo > # touch foo > # ls -lao foo > -rw-r--r-- 1 root wheel uarch 6 Apr 8 21:46 foo > > > >+ .It Dv UF_IMMUTABLE > > >+ The file may not be changed. > > >+ Filesystems may use this flag to maintain compatibility with the Windows > > >and > > >+ CIFS FILE_ATTRIBUTE_READONLY attribute. > > > > So READONLY is only mapped to UF_IMMUTABLE if it gives immutability? > > No, it's mapped to whatever the CIFS server decides. In my changes to > Likewise, I mapped it to UF_IMMUTABLE. I mapped UF_IMMUTABLE to the ZFS > READONLY flag. As Pawel pointed out, there has been some talk on the > Illumos developers list about just storing the READONLY bit and not > enforcing it in ZFS: > > http://www.listbox.com/member/archive/182179/2013/03/sort/time_rev/page/2/?search_for=readonly > > That complicates things somewhat in the Illumos CIFS server, and so I think > it's a reasonable thing to just record the bit and let the CIFS server > enforce things where it needs to. > > UFS does honor the UF_IMMUTABLE flag, so it may be that we need to create > a UF_READONLY flag that corresponds to the DOS readonly flag and is only > stored, and the enforcement would happen in the CIFS server. > > > >*** src/sys/fs/msdosfs/msdosfs_vnops.c.orig > > >--- src/sys/fs/msdosfs/msdosfs_vnops.c > > >*************** > > >*** 415,431 **** > > > * set ATTR_ARCHIVE for directories `cp -pr' from a more > > > * sensible filesystem attempts it a lot. > > > */ > > >! if (vap->va_flags & SF_SETTABLE) { > > > error = priv_check_cred(cred, PRIV_VFS_SYSFLAGS, 0); > > > if (error) > > > return (error); > > > } > > >! if (vap->va_flags & ~SF_ARCHIVED) > > > return EOPNOTSUPP; > > > if (vap->va_flags & SF_ARCHIVED) > > > dep->de_Attributes &= ~ATTR_ARCHIVE; > > > else if (!(dep->de_Attributes & ATTR_DIRECTORY)) > > > dep->de_Attributes |= ATTR_ARCHIVE; > > > dep->de_flag |= DE_MODIFIED; > > > } > > > > > >--- 424,448 ---- > > > * set ATTR_ARCHIVE for directories `cp -pr' from a more > > > * sensible filesystem attempts it a lot. > > > */ > > >! if (vap->va_flags & (SF_SETTABLE & ~(SF_ARCHIVED))) { > > > > Excessive parentheses. > > Fixed, by moving to UF_ARCHIVE. > > > > error = priv_check_cred(cred, PRIV_VFS_SYSFLAGS, 0); > > > if (error) > > > return (error); > > > } > > > > VADMIN is still needed, and that is too strict. This is a general problem > > and should be fixed separately.
> > I took out the check, since I changed the code to use UF_ARCHIVE instead of > SF_ARCHIVED. > > > >! if (vap->va_flags & ~(SF_ARCHIVED | UF_HIDDEN | UF_SYSTEM)) > > > return EOPNOTSUPP; > > > if (vap->va_flags & SF_ARCHIVED) > > > dep->de_Attributes &= ~ATTR_ARCHIVE; > > > else if (!(dep->de_Attributes & ATTR_DIRECTORY)) > > > dep->de_Attributes |= ATTR_ARCHIVE; > > >+ if (vap->va_flags & UF_HIDDEN) > > >+ dep->de_Attributes |= ATTR_HIDDEN; > > >+ else > > >+ dep->de_Attributes &= ~ATTR_HIDDEN; > > >+ if (vap->va_flags & UF_SYSTEM) > > >+ dep->de_Attributes |= ATTR_SYSTEM; > > >+ else > > >+ dep->de_Attributes &= ~ATTR_SYSTEM; > > > dep->de_flag |= DE_MODIFIED; > > > } > > > > Technical old and new problems with msdosfs: > > - all directories except the root directory support the 3 attributes > > handled above, and READONLY > > - the special case for the root directory is because before FAT32, the > > root directory didn't have an entry for itself (and was otherwise > > special). With FAT32, the root directory is not so special, but > > still doesn't have an entry for itself. > > - thus the old code in the above is wrong for all directories except > > the root directory > > - thus the new code in the above is wrong for the root directory. It > > will make changes to the in-core denode. These can be seen by stat() > > for a while, but go away when the vnode is recycled. > > - other code is wrong for directories too. deupdat() refuses to > > convert from the in-core denode to the disk directory entry for > > directories. So even when the above changes values for directories, > > the changes only get synced to the on-disk directory entry accidentally, > > when there is a large change (such as extending the directory). > > - being the root directory is best tested for using VV_ROOT. I use the > > following to fix the corresponding bugs in utimes(): > > > > /* Was: silently ignore the non-error or error for all dirs. > > */ > > if (DETOV(dep)->v_vflag & VV_ROOT) > > return (EINVAL); > > /* Otherwise valid. */ > > > > deupdat() needs a similar change to not ignore all directories. > > Okay, I think these issues should now be fixed. We now refuse to change > attributes only on the root directory. And I updated deupdat() to do the > same. > > When a directory is created or a file is added, the archive bit is not > changed on the directory. Not sure if we need to do that or not. (Simply > changing msdosfs_mkdir() to set ATTR_ARCHIVE was not enough to get the > archive bit set on directory creation.) Bruce, any comment on this?
Thanks, Ken -- Kenneth Merry ken@FreeBSD.ORG From owner-freebsd-fs@FreeBSD.ORG Thu Apr 18 19:14:13 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 62D2938E; Thu, 18 Apr 2013 19:14:13 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 3B3CF160F; Thu, 18 Apr 2013 19:14:13 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3IJED8A015352; Thu, 18 Apr 2013 19:14:13 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3IJEDs2015351; Thu, 18 Apr 2013 19:14:13 GMT (envelope-from linimon) Date: Thu, 18 Apr 2013 19:14:13 GMT Message-Id: <201304181914.r3IJEDs2015351@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/177966: [zfs] resilver completes but subsequent scrub reports errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Apr 2013 19:14:13 -0000 Synopsis: [zfs] resilver completes but subsequent scrub reports errors Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Thu Apr 18 19:13:57 UTC 2013 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=177966 From owner-freebsd-fs@FreeBSD.ORG Fri Apr 19 17:43:52 2013 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 3C881DFA; Fri, 19 Apr 2013 17:43:52 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id F0F985EA; Fri, 19 Apr 2013 15:57:28 +0000 (UTC) Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 7A1061041552; Fri, 19 Apr 2013 22:53:51 +1000 (EST) Date: Fri, 19 Apr 2013 22:53:50 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: "Kenneth D. 
Merry" Subject: Re: patches to add new stat(2) file flags In-Reply-To: <20130418184951.GA18777@nargothrond.kdm.org> Message-ID: <20130419215624.L1262@besplex.bde.org> References: <20130307000533.GA38950@nargothrond.kdm.org> <20130307222553.P981@besplex.bde.org> <20130308232155.GA47062@nargothrond.kdm.org> <20130310181127.D2309@besplex.bde.org> <20130409190838.GA60733@nargothrond.kdm.org> <20130418184951.GA18777@nargothrond.kdm.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=A8I0pNqG c=1 sm=1 a=n2O7wv11oSwA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=YOiZBDKP_E4A:10 a=QuKyM733q63FOVycHlwA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: arch@FreeBSD.org, fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Apr 2013 17:43:52 -0000 On Thu, 18 Apr 2013, Kenneth D. Merry wrote: > On Tue, Apr 09, 2013 at 13:08:38 -0600, Kenneth D. Merry wrote: >> ... >> Okay, I think these issues should now be fixed. We now refuse to change >> attributes only on the root directory. And I updatd deupdat() to do the >> same. >> >> When a directory is created or a file is added, the archive bit is not >> changed on the directory. Not sure if we need to do that or not. (Simply >> changing msdosfs_mkdir() to set ATTR_ARCHIVE was not enough to get the >> archive bit set on directory creation.) > > Bruce, any comment on this? I didn't get around to looking at it closely. Just had a quick look at the msdosfs parts. Apparently we are already doing the same as WinXP for ATTR_ARCHIVE on directories. Not the right thing, but: - don't set it on directory creation - don't set it on directory modification - allow setting and clearing it (with your changes). @ *** src/lib/libc/sys/chflags.2.orig @ --- src/lib/libc/sys/chflags.2 @ *************** @ *** 112,137 **** @ ... @ --- 112,170 ---- @ ... @ + .It Dv UF_IMMUTABLE @ + The file may not be changed. @ + Filesystems may use this flag to maintain compatibility with the DOS, Windows @ + and CIFS FILE_ATTRIBUTE_READONLY attribute. msdosfs doesn't use this yet. It uses ATTR_READONLY, and doesn't map this to or from UF_IMMUTABLE. I think I want ATTR_READONLY to be a flag and not affect the file permissions (just like immutable flags normally don't affect the file permissions. Does CIFS FILE_ATTRIBUTE_READONLY have exactly the same semantics as IMMUTABLE? That is, does it prevent all operations on the file and the file's metadata except read()? For IMMUTABLE, the other operations that it disallows include setattr(), rename() and unlink(). Well it doesn't in WinXP using Cygwin. I made a directory with attributes +R, and this didn't prevent creating files in the directory or rmdir of the directory. Even attributes +R +H +S didn't prevent these operations. Maybe +R isn't really used for directories, like +A. Then for a file with +R +H +S: - rm asked before deleting it (+R changed its fake permissions from rw-r--r-- to r--r--r--). - touching it succeeded - attrib on it succeeded - writing it failed. So it seems that in WinXP, ATTR_READONLY is ignored for directories, and more like the !writeable permission than the immutable flag. 
@ *** src/sys/fs/msdosfs/msdosfs_denode.c.orig @ --- src/sys/fs/msdosfs/msdosfs_denode.c @ *************** @ *** 300,307 **** @ if ((dep->de_flag & DE_MODIFIED) == 0) @ return (0); @ dep->de_flag &= ~DE_MODIFIED; @ ! if (dep->de_Attributes & ATTR_DIRECTORY) @ ! return (0); @ if (dep->de_refcnt <= 0) @ return (0); @ error = readde(dep, &bp, &dirp); @ --- 300,309 ---- @ if ((dep->de_flag & DE_MODIFIED) == 0) @ return (0); @ dep->de_flag &= ~DE_MODIFIED; @ ! /* Was: silently ignore attribute changes for all dirs. */ @ ! if (DETOV(dep)->v_vflag & VV_ROOT) @ ! return (EINVAL); @ ! /* Otherwise valid. */ Clean up the comments a bit. Say nothing, or that all attributes apply to all directories except the root directory. Perhaps the VV_ROOT case is unreachable because callers filter out this case. I have a debugger trap for it. @ if (dep->de_refcnt <= 0) @ return (0); @ error = readde(dep, &bp, &dirp); @ *** src/sys/fs/msdosfs/msdosfs_vnops.c.orig @ --- src/sys/fs/msdosfs/msdosfs_vnops.c @ *************** @ *** 398,403 **** @ --- 402,418 ---- @ if (vap->va_flags != VNOVAL) { @ if (vp->v_mount->mnt_flag & MNT_RDONLY) @ return (EROFS); @ + /* @ + * We don't allow setting attributes on the root directory, @ + * because according to Bruce Evans: "The special case for @ + * the root directory is because before FAT32, the root @ + * directory didn't have an entry for itself (and was @ + * otherwise special). With FAT32, the root directory is @ + * not so special, but still doesn't have an entry for itself." @ + */ @ + if (vp->v_vflag & VV_ROOT) @ + return (EINVAL); @ + @ if (cred->cr_uid != pmp->pm_uid) { @ error = priv_check_cred(cred, PRIV_VFS_ADMIN, 0); @ if (error) No need to give the source. I prefer to do this check after the permissions check, but if it is done early then it is best done as a single check for all attributes in msdosfs_setattr() and not just for flags. Currently there is: - no check for ownerships. We only allow null changes to ownerships. With no check like the above, we allow them even for the root directory, while the above disallows null changes to flags for the root directory. - for truncate(), the error is EISDIR for all directories. - for file times, we silently ignore changes for all directories, after doing permissions checks. Only the root directory should be special. - for file permissions, we handle directories as for file times. Now the only possible non-null change is of ATTR_READONLY, and since this apparently has no effect in WinXP, ignoring changes to it for directories is best.
Bruce From owner-freebsd-fs@FreeBSD.ORG Fri Apr 19 18:22:42 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0BD2C204 for ; Fri, 19 Apr 2013 18:22:42 +0000 (UTC) (envelope-from matthew.ahrens@delphix.com) Received: from mail-la0-x229.google.com (mail-la0-x229.google.com [IPv6:2a00:1450:4010:c03::229]) by mx1.freebsd.org (Postfix) with ESMTP id 6F0C5112A for ; Fri, 19 Apr 2013 18:22:41 +0000 (UTC) Received: by mail-la0-f41.google.com with SMTP id er20so3859239lab.0 for ; Fri, 19 Apr 2013 11:22:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=delphix.com; s=google; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=eGyYth67GAF9lWFYom85IEKgT9nY97BIAeKiCbVWoDY=; b=aHFJvGeVkX49hqTklfR/UUiSb+iQH6K4totX0zNNwrrZrIOM1aeDNeRaXpTaKLEWDF rFhwKckvTnkMetQdi5ympIZRqjkEfQzflx5rtmtMtVY9v1RA9PwswNqneNoSzIt233IX UGDC4o4tMrA87+isYpaQMhhdCZx+R7hhykFI8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type:x-gm-message-state; bh=eGyYth67GAF9lWFYom85IEKgT9nY97BIAeKiCbVWoDY=; b=YDDYgouyx9DmpSFdReQlKG/h56cGtsQ+xK5BbTQGnzKmZgcGkPuqQECaDDXvfoKfDe uB7xBtOJOH9KOD9dgxSyYMGdIUyC4AL8N5Z/Gd74cCZA0ysT0oYgjp1aNKTIeO/ie86v f9LqeIx7VmZ96guweoCITKQpy8Qv4DtvUH2jnKjfHwr32edPB23MhBhg3TMM1FGf87Uv BpkG6WrvYmLFAcFF10m4fMxg2/txHIl78lMumL9IZGrgtNge6ThWEb/guV0EddhckXxd YWlZ/vIBtNdG8xcVMa4KD8phdGw+q2lc4hnZ278G+rmp62JqNHm2LvmSDshl4Qlgwlw6 pGtQ== MIME-Version: 1.0 X-Received: by 10.112.167.200 with SMTP id zq8mr8491324lbb.58.1366395760206; Fri, 19 Apr 2013 11:22:40 -0700 (PDT) Received: by 10.114.22.4 with HTTP; Fri, 19 Apr 2013 11:22:40 -0700 (PDT) In-Reply-To: <5169B0D7.9090607@platinum.linux.pl> References: <5166EA43.7050700@platinum.linux.pl> <5167B1C5.8020402@FreeBSD.org> <51689A2C.4080402@platinum.linux.pl> <5169324A.3080309@FreeBSD.org> <516949C7.4030305@platinum.linux.pl> <5169B0D7.9090607@platinum.linux.pl> Date: Fri, 19 Apr 2013 11:22:40 -0700 Message-ID: Subject: Re: ZFS slow reads for unallocated blocks From: Matthew Ahrens To: Adam Nowacki X-Gm-Message-State: ALoCoQkbe53M2ILBG4BB3rq1v1hzEbSViUPXubQ4IV5J/iWZc/SLkKk9sBYJToATY0XnLc2j5zF/ Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: "freebsd-fs@freebsd.org" , illumos-zfs , Andriy Gapon X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Apr 2013 18:22:42 -0000 Sorry I'm late to the game here, just saw this email now. Yes, this is also a problem on illumos, though much less so on my system, only about 2x. It looks like the difference is due to the fact that the zeroed dbufs are not cached, so we have to zero the entire dbuf (e.g. 128k) for every read syscall (e.g. 8k). Increasing the size of the reads to match the recordsize results in performance parity between reading cached data and sparse zeros. 
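One quick way to see the read-size dependence -- a sketch; it assumes a dataset with the default 128k recordsize, and the file names are made up:

# truncate -s 100m t100m                # wholly sparse file
# dd if=t100m of=/dev/null bs=8k        # 16 syscalls per record, each zeroing the full 128k dbuf
# dd if=t100m of=/dev/null bs=128k      # one zeroing per record; roughly matches the cached case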
You can see this behavior in the following dtrace, which shows that we are initializing the dbuf in dbuf_read_impl() as many times as we do syscalls: sudo dtrace -n 'dbuf_read_impl:entry/pid==$target/{@[probefunc] = count()}' -c 'dd if=t100m of=/dev/null bs=8k' dtrace: description 'dbuf_read_impl:entry' matched 1 probe 12800+0 records in 12800+0 records out dtrace: pid 29419 has exited dbuf_read_impl 12800 --matt On Sat, Apr 13, 2013 at 12:24 PM, Adam Nowacki wrote: > Including zfs@illumos on this. To recap: > > Reads from sparse files are slow, with speed proportional to the ratio of > read size to filesystem recordsize. There is no physical disk I/O. > > # zfs create -o atime=off -o recordsize=128k -o compression=off -o > sync=disabled -o mountpoint=/home/testfs home/testfs > # dd if=/dev/random of=/home/testfs/random10m bs=1024k count=10 > # truncate -s 10m /home/testfs/trunc10m > # dd if=/home/testfs/random10m of=/dev/null bs=512 > 10485760 bytes transferred in 0.078637 secs (133344041 bytes/sec) > # dd if=/home/testfs/trunc10m of=/dev/null bs=512 > 10485760 bytes transferred in 1.011500 secs (10366544 bytes/sec) > > # zfs create -o atime=off -o recordsize=8M -o compression=off -o > sync=disabled -o mountpoint=/home/testfs home/testfs > # dd if=/home/testfs/random10m of=/dev/null bs=512 > 10485760 bytes transferred in 0.080430 secs (130371205 bytes/sec) > # dd if=/home/testfs/trunc10m of=/dev/null bs=512 > 10485760 bytes transferred in 72.465486 secs (144700 bytes/sec) > > This is from FreeBSD 9.1, and a possible solution is at > http://tepeserwery.pl/nowak/freebsd/zfs_sparse_optimization_v2.patch.txt - untested yet, the system will be busy building packages for a few more days. > > > On 2013-04-13 19:11, Will Andrews wrote: >> Hi, >> >> I think the idea of using a pre-zeroed region as the 'source' is a good >> one, but probably it would be better to set a special flag on a hole >> dbuf than to require caller flags. That way, ZFS can lazily evaluate >> the hole dbuf (i.e. avoid zeroing db_data until it has to). However, >> that could be complicated by the fact that there are many potential >> users of hole dbufs that would want to write to the dbuf. >> >> This sort of optimization should be brought to the illumos zfs list. As >> it stands, your patch is also FreeBSD-specific, since 'zero_region' only >> exists in vm/vm_kern.c. Given the frequency of zero-copying, however, >> it's quite possible there are other versions of this region elsewhere. >> >> --Will. >> >> >> On Sat, Apr 13, 2013 at 6:04 AM, Adam Nowacki wrote: >> >> Temporary dbufs are created for each missing (unallocated on disk) >> record, including indirects if the hole is large enough. Those dbufs >> never find their way to the ARC and are freed at the end of dmu_read_uio. >> >> A small read (from a hole) would in the best case bzero 128KiB >> (recordsize, more if missing indirects) ... and I'm running modified >> ZFS with record sizes up to 8MiB. >> >> # zfs create -o atime=off -o recordsize=8M -o compression=off -o >> mountpoint=/home/testfs home/testfs >> # truncate -s 8m /home/testfs/trunc8m >> # dd if=/dev/zero of=/home/testfs/zero8m bs=8m count=1 >> 1+0 records in >> 1+0 records out >> 8388608 bytes transferred in 0.010193 secs (822987745 bytes/sec) >> >> # time cat /home/testfs/trunc8m > /dev/null >> 0.000u 6.111s 0:06.11 100.0% 15+2753k 0+0io 0pf+0w >> >> # time cat /home/testfs/zero8m > /dev/null >> 0.000u 0.010s 0:00.01 100.0% 12+2168k 0+0io 0pf+0w >> >> 600x increase in system time and close to 1MB/s - insanity.
>> >> The fix - a lot of the code to efficiently handle this was already >> there. >> >> dbuf_hold_impl has an int fail_sparse argument to return ENOENT for >> holes. Just had to get there and somehow back to dmu_read_uio, where >> zeroing can happen at byte granularity. >> >> ... didn't have time to actually test it yet. >> >> >> On 2013-04-13 12:24, Andriy Gapon wrote: >> >> on 13/04/2013 02:35 Adam Nowacki said the following: >> >> http://tepeserwery.pl/nowak/freebsd/zfs_sparse_optimization.patch.txt >> >> Does it look sane? >> >> >> It's hard to tell from a quick look since the change is not >> small. >> What is your idea of the problem and the fix? >> >> On 2013-04-12 09:03, Andriy Gapon wrote: >> >> >> ENOTIME to really investigate, but here is a basic >> profile result for those >> interested: >> kernel`bzero+0xa >> kernel`dmu_buf_hold_array_by_dnode+0x1cf >> kernel`dmu_read_uio+0x66 >> kernel`zfs_freebsd_read+0x3c0 >> kernel`VOP_READ_APV+0x92 >> kernel`vn_read+0x1a3 >> kernel`vn_io_fault+0x23a >> kernel`dofileread+0x7b >> kernel`sys_read+0x9e >> kernel`amd64_syscall+0x238 >> kernel`0xffffffff80747e4b >> >> That's where > 99% of time is spent. >> >> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Sat Apr 20 17:06:50 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4665A1FB; Sat, 20 Apr 2013 17:06:50 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 1F813613; Sat, 20 Apr 2013 17:06:50 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3KH6nPY061216; Sat, 20 Apr 2013 17:06:49 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3KH6nut061215; Sat, 20 Apr 2013 17:06:49 GMT (envelope-from linimon) Date: Sat, 20 Apr 2013 17:06:49 GMT Message-Id: <201304201706.r3KH6nut061215@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Apr 2013 17:06:50 -0000 Old Synopsis: disk usage problem when copying from one zfs dataset to another on the same pool using mv command New Synopsis: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Sat Apr 20 17:06:38 UTC 2013
Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=177985 From owner-freebsd-fs@FreeBSD.ORG Sat Apr 20 17:08:58 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 9CA50334; Sat, 20 Apr 2013 17:08:58 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 76886641; Sat, 20 Apr 2013 17:08:58 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3KH8w8Q061399; Sat, 20 Apr 2013 17:08:58 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3KH8wr4061398; Sat, 20 Apr 2013 17:08:58 GMT (envelope-from linimon) Date: Sat, 20 Apr 2013 17:08:58 GMT Message-Id: <201304201708.r3KH8wr4061398@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/177971: [nfs] FreeBSD 9.1 nfs client dirlist problem w/ nfsv3, rsize=4096, wsize=4096 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Apr 2013 17:08:58 -0000 Old Synopsis: FreeBSD 9.1 nfs client dirlist problem w/ nfsv3,rsize=4096,wsize=4096 New Synopsis: [nfs] FreeBSD 9.1 nfs client dirlist problem w/ nfsv3,rsize=4096,wsize=4096 Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Sat Apr 20 17:08:46 UTC 2013 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=177971 From owner-freebsd-fs@FreeBSD.ORG Sat Apr 20 17:40:03 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 27CB1CBE for ; Sat, 20 Apr 2013 17:40:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 1A83F7D6 for ; Sat, 20 Apr 2013 17:40:03 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3KHe2IS067656 for ; Sat, 20 Apr 2013 17:40:02 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3KHe2HQ067655; Sat, 20 Apr 2013 17:40:02 GMT (envelope-from gnats) Date: Sat, 20 Apr 2013 17:40:02 GMT Message-Id: <201304201740.r3KHe2HQ067655@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: "Steven Hartland" Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Steven Hartland List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Apr 2013 17:40:03 -0000 The following reply was made to PR kern/177985; it has been noted by GNATS. 
From: "Steven Hartland" To: , Cc: Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command Date: Sat, 20 Apr 2013 18:30:26 +0100 Deletes / frees are lower priority than standard writes / reads so its quite possible in the scenario you describe that you could run out of space. Could you please confirm the exact behaviour by allow mv to process a number of files, before suspending and seeing if the free space is correct for the current progress after waiting for the pool to sync all outstanding requests. Regards Steve From owner-freebsd-fs@FreeBSD.ORG Sat Apr 20 19:20:02 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 87CB2996 for ; Sat, 20 Apr 2013 19:20:02 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 7A44CB37 for ; Sat, 20 Apr 2013 19:20:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3KJK2MT088018 for ; Sat, 20 Apr 2013 19:20:02 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3KJK2Pn088017; Sat, 20 Apr 2013 19:20:02 GMT (envelope-from gnats) Date: Sat, 20 Apr 2013 19:20:02 GMT Message-Id: <201304201920.r3KJK2Pn088017@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: Andriy Gapon Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Andriy Gapon List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Apr 2013 19:20:02 -0000 The following reply was made to PR kern/177985; it has been noted by GNATS. From: Andriy Gapon To: bug-followup@FreeBSD.org, sybersnake@gmail.com Cc: Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command Date: Sat, 20 Apr 2013 22:12:12 +0300 Sorry, but I do not see any bug reported here. mv behaves as it is expected/documented to behave. ZFS behaves as it should as well. If the behavior is surprising to you then please update your knowledge of the tools. If you need a different behavior then you can script it yourself or use different tools to accomplish your job. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sat Apr 20 19:50:01 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 68764ECB for ; Sat, 20 Apr 2013 19:50:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 5A96ED01 for ; Sat, 20 Apr 2013 19:50:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3KJo1WA093416 for ; Sat, 20 Apr 2013 19:50:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3KJo1bR093415; Sat, 20 Apr 2013 19:50:01 GMT (envelope-from gnats) Date: Sat, 20 Apr 2013 19:50:01 GMT Message-Id: <201304201950.r3KJo1bR093415@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: Jon Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Jon List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Apr 2013 19:50:01 -0000 The following reply was made to PR kern/177985; it has been noted by GNATS. From: Jon To: Andriy Gapon Cc: "bug-followup@FreeBSD.org" Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command Date: Sat, 20 Apr 2013 15:49:41 -0400 This is not a bug, it is a workflow problem introduced by the difference in behavior between ZFS datasets and fixed sized file systems. You should be able to move files from one dataset to another on the same pool without having to copy it to another pool and back. This all can be accomplished by deleting copied files more often than it currently does or at least adding a flag to turn on synchronized deletes. After I am done testing the same scenario on Solaris I will run the test Steve suggested. Sent from my iPhone On Apr 20, 2013, at 3:12 PM, Andriy Gapon wrote: > > Sorry, but I do not see any bug reported here. > mv behaves as it is expected/documented to behave. > ZFS behaves as it should as well. > If the behavior is surprising to you then please update your knowledge of the tools. > If you need a different behavior then you can script it yourself or use > different tools to accomplish your job.
> > -- > Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sat Apr 20 20:30:01 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 7D018263 for ; Sat, 20 Apr 2013 20:30:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 6FB50DE1 for ; Sat, 20 Apr 2013 20:30:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3KKU1xr001301 for ; Sat, 20 Apr 2013 20:30:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3KKU1uu001300; Sat, 20 Apr 2013 20:30:01 GMT (envelope-from gnats) Date: Sat, 20 Apr 2013 20:30:01 GMT Message-Id: <201304202030.r3KKU1uu001300@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: Andriy Gapon Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Andriy Gapon List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Apr 2013 20:30:01 -0000 The following reply was made to PR kern/177985; it has been noted by GNATS. From: Andriy Gapon To: Jon Cc: "bug-followup@FreeBSD.org" Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command Date: Sat, 20 Apr 2013 23:25:40 +0300 on 20/04/2013 22:49 Jon said the following: > This is not a bug, it is a workflow problem introduced by the difference in > behavior between ZFS datasets and fixed sized file systems. > > You should be able to move files from one dataset to another on the same pool > without having to copy it to another pool and back. You lost me at 'another pool'. Perhaps moving an object from one zfs dataset to another could be optimized, but... That would definitely require zfs-specific tools. It is not implemented in the code yet, as far as I know. > This all can be > accomplished by deleting copied files more often than it currently does or at > least adding a flag to turn on synchronized deletes. No, it cannot be accomplished that way, because it would violate how mv(1) across filesystems works. Perhaps it's indeed time to read the man page? > After I am done testing the same scenario on Solaris I will run the test > Steve suggested. Yes, please do. Personal experience is always more enlightening than someone else's words. > On Apr 20, 2013, at 3:12 PM, Andriy Gapon wrote: >> >> Sorry, but I do not see any bug reported here. mv behaves as it is >> expected/documented to behave. ZFS behaves as it should as well. If the >> behavior is surprising to you then please update your knowledge of the >> tools. If you need a different behavior then you can script it yourself or >> use different tools to accomplish your job. >> >> -- Andriy Gapon -- Andriy Gapon
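[For reference, the behavior debated in this PR is easy to reproduce; a sketch, with pool, dataset and file names made up:

# zfs create tank/a && zfs create tank/b
# dd if=/dev/zero of=/tank/a/big bs=1m count=1024
# mv /tank/a/big /tank/b/                       # cross-dataset mv is a copy followed by an unlink
# zfs list -o name,used,avail tank/a tank/b     # space freed from tank/a may lag until TXGs sync

The transient double accounting during the copy is why a nearly full pool can run out of space in the middle of such a move.]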