From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 03:00:36 2013
Date: Sat, 13 Apr 2013 23:00:34 -0400 (EDT)
From: Rick Macklem
To: Paul van der Zwan
Cc: freebsd-fs@freebsd.org
Subject: Re: FreeBSD 9.1 NFSv4 client attribute cache not caching ?
Message-ID: <678464111.812434.1365908434250.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <495AEA10-9B8F-4A03-B706-79BF43539482@vanderzwan.org>

Paul van der Zwan wrote:
> On 12 Apr 2013, at 16:28 , Paul van der Zwan wrote:
>
> > I am running a few VirtualBox VMs with 9.1 on my OpenIndiana server
> > and I noticed that make buildworld seems to take much longer
> > when the clients mount /usr/src and /usr/obj over NFS V4 than when
> > they use V3.
> > Unfortunately I have to use V4, as a buildworld on V3 hangs the
> > server completely...
> > I noticed the number of PUTFH/GETATTR/GETFH calls is in the order of
> > a few thousand per second,
> > and if I snoop the traffic I see the same filenames appear over and
> > over again.
> > It looks like the client is not caching anything at all and is
> > issuing a server request every time.
> > I use the default mount options:
> > 192.168.178.24:/data/ports on /usr/ports (nfs, nfsv4acls)
> > 192.168.178.24:/data/src on /usr/src (nfs, nfsv4acls)
> > 192.168.178.24:/data/obj on /usr/obj (nfs, nfsv4acls)
>
> I had a look with dtrace
> $ sudo dtrace -n '::getattr:start { @[stack()]=count();}'
> and it seems the vast majority of the calls to getattr are from open()
> and close() system calls:
>
>               kernel`newnfs_request+0x631
>               kernel`nfscl_request+0x75
>               kernel`nfsrpc_getattr+0xbe
>               kernel`nfs_getattr+0x280
>               kernel`VOP_GETATTR_APV+0x74
>               kernel`nfs_lookup+0x3cc
>               kernel`VOP_LOOKUP_APV+0x74
>               kernel`lookup+0x69e
>               kernel`namei+0x6df
>               kernel`kern_execve+0x47a
>               kernel`sys_execve+0x43
>               kernel`amd64_syscall+0x3bf
>               kernel`0xffffffff80784947
>               26
>
>               kernel`newnfs_request+0x631
>               kernel`nfscl_request+0x75
>               kernel`nfsrpc_getattr+0xbe
>               kernel`nfs_close+0x3e9
>               kernel`VOP_CLOSE_APV+0x74
>               kernel`kern_execve+0x15c5
>               kernel`sys_execve+0x43
>               kernel`amd64_syscall+0x3bf
>               kernel`0xffffffff80784947
>               26
>
>               kernel`newnfs_request+0x631
>               kernel`nfscl_request+0x75
>               kernel`nfsrpc_getattr+0xbe
>               kernel`nfs_getattr+0x280
>               kernel`VOP_GETATTR_APV+0x74
>               kernel`nfs_lookup+0x3cc
>               kernel`VOP_LOOKUP_APV+0x74
>               kernel`lookup+0x69e
>               kernel`namei+0x6df
>               kernel`vn_open_cred+0x330
>               kernel`vn_open+0x1c
>               kernel`kern_openat+0x207
>               kernel`kern_open+0x19
>               kernel`sys_open+0x18
>               kernel`amd64_syscall+0x3bf
>               kernel`0xffffffff80784947
>               2512
>
>               kernel`newnfs_request+0x631
>               kernel`nfscl_request+0x75
>               kernel`nfsrpc_getattr+0xbe
>               kernel`nfs_close+0x3e9
>               kernel`VOP_CLOSE_APV+0x74
>               kernel`vn_close+0xee
>               kernel`vn_closefile+0xff
>               kernel`_fdrop+0x3a
>               kernel`closef+0x332
>               kernel`kern_close+0x183
>               kernel`sys_close+0xb
>               kernel`amd64_syscall+0x3bf
>               kernel`0xffffffff80784947
>               2530
>
> I had a look at the source of nfs_close and could not find a call to
> nfsrpc_getattr, and I am wondering why close would be calling getattr
> anyway.
> If the file is closed, what do we care about its attributes....
>
Here are some random statements w.r.t. NFSv3 vs NFSv4 that might help
with an understanding of what is going on. I do address the specific
case of nfs_close() towards the end. (It is kinda long winded, but I
threw out everything I could think of..)

NFSv3 doesn't have any open/close RPC, but NFSv4 does have Open and
Close operations.

In NFSv3, each RPC is defined and usually includes attributes for files
before and after the operation (implicit getattrs not counted in the RPC
counts reported by nfsstat).

For NFSv4, every RPC is a compound built up of a list of Operations like
Getattr. Since the NFSv4 server doesn't know what the compound is doing,
nfsstat reports the counts of Operations for the NFSv4 server, so the
counts will be much higher than with NFSv3, but they do not reflect the
number of RPCs being done. To get NFSv4 nfsstat output that can be
compared to NFSv3, you need to do the command on the client(s), and even
then it is only roughly the same. (I just realized this should be
documented in man nfsstat.)

For the FreeBSD NFSv4 client, the compounds include Getattr operations
similar to what NFSv3 does. It doesn't do a Getattr on the directory
for Lookup, because that would have made the compound much more complex.
I don't think this will have a significant performance impact, but it
will result in some additional Getattr RPCs.

I suspect the slowness is caused by the extra overhead of doing the
Open/Close operations against the server.
The only way to avoid doing these against the server for NFSv4 is to
enable delegations in both client and server. How to do this is
documented in "man nfsv4". Basically, starting up the nfscbd in the
client and setting:
vfs.nfsd.issue_delegations=1
in the server.

Specifically for nfs_close(), the attributes (modify time) are used for
what is called "close to open consistency". This can be disabled by the
"nocto" mount option, if you don't need it for your build environment.
(You only need it if one client is writing a file and then another
client is reading the same file.)

Both the attribute caching and close to open consistency algorithms
in the client are essentially the same for NFSv3 vs NFSv4.

The NFSv4 Close operation(s) are actually done when the v_usecount for
the vnode goes to 0, since mmap'd files can do I/O on pages after
the close syscall. As such, they are only loosely related to the close
syscall. They are actually closing Windows-style Openlock(s).

You mention that you see the same file over and over in a packet trace.
You don't give specifics, but I'd suggest that you look at both NFSv3
and NFSv4 for this (and file names are in lookups, not getattrs).

I'd suggest you try enabling delegations in both client and server,
plus trying the "nocto" mount option, and see if that helps.

rick

>
> Paul
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 10:01:49 2013
Date: Sun, 14 Apr 2013 12:01:43 +0200
From: mxb
To: "freebsd-fs@freebsd.org"
Subject: Re: ZFS: ZIL device export/import
Message-Id: <9EE9328B-40B1-4510-B404-242D0F2C7697@alumni.chalmers.se>
References: <5A2824CA-2A67-47FA-AB27-20C6EBD2C501@alumni.chalmers.se> <51699B8E.7050003@platinum.linux.pl> <2DE8AD5E-B84C-4D88-A242-EA30EA4A68FD@alumni.chalmers.se>

Well, I'm trying to preclude any undesired effect in the whole setup,
as this is going to production. The SAS link might not be a bottleneck
here and I may be overreacting.

Locally, on a per-HU basis, I have 6 Gbit/s SAS/SATA, with both the
card and the disks (SSD) attached to it. The SAS expander is also
6 Gbit/s, attaching 10k RPM SAS mechanical disks in a JBOD. I use
Intel 520 SSDs and Pulsar SSDs in this setup.

The ZIL resided locally on an Intel SSD (per HU), but will now probably
move to a Pulsar SSD (moved to the JBOD, as those disks have a dual
SAS/SATA link). L2ARC resided on a Pulsar (there was a Pulsar in each
HU, i.e. I have 2x Pulsar).

Looks like I have to re-design the whole setup as far as the ZIL is
concerned.

//mxb

On 13 apr 2013, at 22:51, Ronald Klop wrote:

> I thought the idea of the ZIL is a fast buffer before the write to the
> slow disk. Are you really sure the SAS expander is the bottleneck in
> the system instead of the disks?
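Relocating a dedicated ZIL as described above is a log-vdev operation
rather than a data migration: the pool keeps running while the log
device is removed and re-added, and putting the log on the dual-ported
JBOD disks is what lets the other head import the pool intact. A minimal
sketch, assuming a pool named "tank" and hypothetical GPT labels for the
SSDs (neither name is from this thread):

    # drop the log vdev currently on the local Intel SSD
    zpool remove tank gpt/zil-intel
    # re-add the ZIL as a mirrored log vdev on the dual-ported JBOD SSDs
    zpool add tank log mirror gpt/zil-pulsar0 gpt/zil-pulsar1
    # confirm the new layout
    zpool status tank

Log-device removal needs a pool new enough to support it (pool version
19 or later); "zpool upgrade -v" lists what the pool supports.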
From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 10:10:46 2013
Date: Sun, 14 Apr 2013 12:10:26 +0200
From: Radio młodych bandytów
To: support@lists.pcbsd.org
Cc: freebsd-fs@freebsd.org
Subject: A failed drive causes system to hang
Message-ID: <516A8092.2080002@o2.pl>

Cross-post from freebsd-fs:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=333977+0+archive/2013/freebsd-fs/20130414.freebsd-fs

I have a failing drive in my array. I need to RMA it, but don't have
time, and it fails rarely enough to be yet another annoyance.
The failure is simple: it fails to respond.
When it happens, the only thing I found I can do is switch consoles.
Any command hangs, login on different consoles hangs, apps hang.
I run PC-BSD 9.1.

On the 1st console I see a series of messages like:

(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED

I've seen it happening even when running an installer from a different
drive, while preparing installation (don't remember which step).

I have partial dmesg screenshots from an older failure (21st of
December 2012), transcript below:

Screen1:
(ada0:ahcich0:0:0:0): FLUSHCACHE40. ACB: (ea?) 00 00 00 00 (cut?)
(ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut)
00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 7b(cut)
00
(ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 d0(cut)
00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated

Screen 2:
ahcich0: Timeout on slot 29 port 0
ahcich0: (unreadable, lots of numbers, some text)
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
ahcich0: Timeout on slot 29 port 0
ahcich0: (unreadable, lots of numbers, some text)
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
ahcich0: Timeout on slot 30 port 0
ahcich0: (unreadable, lots of numbers, some text)
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)

Both are from the same event. In general, messages:

(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED.

are the most common.

And one recent, though from a different drive (being a part of the same
array):
fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.19
(ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 82 46 b8 40 25 00 00 00 01 00
(ada1:ata0:0:0:0): CAM status: Command timeout
(ada1:ata0:0:0:0): Retrying command
vboxdrv: fAsync=0 offMin=0x53d offMax=0x52b9
linux: pid 17170 (npviewer.bin): syscall pipe2 not implemented
(ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 87 1a c7 40 1a 00 00 00 01 00
(ada1:ata0:0:0:0): CAM status: Command timeout
(ada1:ata0:0:0:0): Retrying command

A thing pointed out on freebsd-fs is that the driver changed from
ahcich0 to ata0. I haven't done any configuration here myself. Have you
changed some defaults?

--
Twoje radio

From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 10:18:12 2013
Date: Sun, 14 Apr 2013 12:18:07 +0200
From: "Ronald Klop"
To: support@lists.pcbsd.org, Radio młodych bandytów
Cc: freebsd-fs@freebsd.org
Subject: Re: A failed drive causes system to hang
In-Reply-To: <516A8092.2080002@o2.pl>
References: <516A8092.2080002@o2.pl>

On Sun, 14 Apr 2013 12:10:26 +0200, Radio młodych bandytów wrote:

> Cross-post from freebsd-fs:
> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=333977+0+archive/2013/freebsd-fs/20130414.freebsd-fs
>
> I have a failing drive in my array. I need to RMA it, but don't have
> time, and it fails rarely enough to be yet another annoyance.

Maybe offtopic, but you do have time to write long mails, but not to
RMA broken disks? I hope your clients don't read this. :-)

Ronald.

> The failure is simple: it fails to respond.
> When it happens, the only thing I found I can do is switch consoles.
> Any command hangs, login on different consoles hangs, apps hang.
> I run PC-BSD 9.1.
>
> On the 1st console I see a series of messages like:
>
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED
>
> I've seen it happening even when running an installer from a different
> drive, while preparing installation (don't remember which step).
>
> I have partial dmesg screenshots from an older failure (21st of
> December 2012), transcript below:
>
> Screen1:
> (ada0:ahcich0:0:0:0): FLUSHCACHE40. ACB: (ea?) 00 00 00 00 (cut?)
> (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut)
> 00
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 7b(cut)
> 00
> (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 d0(cut)
> 00
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>
> Screen 2:
> ahcich0: Timeout on slot 29 port 0
> ahcich0: (unreadable, lots of numbers, some text)
> (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
> (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
> (aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
> ahcich0: Timeout on slot 29 port 0
> ahcich0: (unreadable, lots of numbers, some text)
> (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
> (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
> (aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
> ahcich0: Timeout on slot 30 port 0
> ahcich0: (unreadable, lots of numbers, some text)
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
>
> Both are from the same event. In general, messages:
>
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED.
>
> are the most common.
>
> And one recent, though from a different drive (being a part of the same
> array):
> fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.19
> (ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 82 46 b8 40 25 00 00 00 01 00
> (ada1:ata0:0:0:0): CAM status: Command timeout
> (ada1:ata0:0:0:0): Retrying command
> vboxdrv: fAsync=0 offMin=0x53d offMax=0x52b9
> linux: pid 17170 (npviewer.bin): syscall pipe2 not implemented
> (ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 87 1a c7 40 1a 00 00 00 01 00
> (ada1:ata0:0:0:0): CAM status: Command timeout
> (ada1:ata0:0:0:0): Retrying command
>
> A thing pointed out on freebsd-fs is that the driver changed from
> ahcich0 to ata0. I haven't done any configuration here myself. Have you
> changed some defaults?
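The ahcich0-versus-ata0 question above is answerable from the running
system without changing any configuration. A minimal sketch, with the
device names taken from the logs in this thread:

    # list each disk under the controller channel it attached to,
    # e.g. ahcich0 (ahci(4), NCQ-capable) vs. ata0 (legacy ata(4))
    camcontrol devlist -v
    # see which driver claimed each channel at boot
    grep -E 'ahci|ata[0-9]' /var/run/dmesg.boot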
From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 10:26:35 2013
Date: Sun, 14 Apr 2013 12:26:15 +0200
From: Radio młodych bandytów
To: Ronald Klop
Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org
Subject: Re: A failed drive causes system to hang
Message-ID: <516A8447.90709@o2.pl>
References: <516A8092.2080002@o2.pl>

On 14/04/2013 12:18, Ronald Klop wrote:
> On Sun, 14 Apr 2013 12:10:26 +0200, Radio młodych bandytów wrote:
>
>> Cross-post from freebsd-fs:
>> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=333977+0+archive/2013/freebsd-fs/20130414.freebsd-fs
>>
>> I have a failing drive in my array. I need to RMA it, but don't have
>> time, and it fails rarely enough to be yet another annoyance.
>
> Maybe offtopic, but you do have time to write long mails, but not to
> RMA broken disks? I hope your clients don't read this. :-)
>
> Ronald.

It's my private desktop and it's a semi-test system. I don't care much
if I lose it.
--
Twoje radio

From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 10:34:53 2013
Date: Sun, 14 Apr 2013 12:34:46 +0200
From: Radio młodych bandytów
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org
Subject: Re: A failed drive causes system to hang
Message-ID: <516A8646.4000101@o2.pl>
In-Reply-To: <20130413000731.GA84309@icarus.home.lan>
References: <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan> <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan> <51688BA6.1000507@o2.pl> <20130413000731.GA84309@icarus.home.lan>

On 13/04/2013 02:07, Jeremy Chadwick wrote:
>
> On Sat, Apr 13, 2013 at 12:33:10AM +0200, Radio młodych bandytów wrote:
>> On 13/04/2013 00:03, Jeremy Chadwick wrote:
>>> On Fri, Apr 12, 2013 at 11:52:31PM +0200, Radio młodych bandytów wrote:
>>>> On 11/04/2013 23:24, Jeremy Chadwick wrote:
>>>>> On Thu, Apr 11, 2013 at 10:47:32PM +0200, Radio młodych bandytów wrote:
>>>>>> Seeing a ZFS thread, I decided to write about a similar problem that
>>>>>> I experience.
>>>>>> I have a failing drive in my array. I need to RMA it, but don't have
>>>>>> time, and it fails rarely enough to be yet another annoyance.
>>>>>> The failure is simple: it fails to respond.
>>>>>> When it happens, the only thing I found I can do is switch consoles.
>>>>>> Any command fails, login fails, apps hang.
>>>>>>
>>>>>> On the 1st console I see a series of messages like:
>>>>>>
>>>>>> (ada0:ahcich0:0:0:0): CAM status: Command timeout
>>>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED
>>>>>>
>>>>>> I use RAIDZ1 and I'd expect that no single failure would cause the
>>>>>> system to fail...
>>>>>
>>>>> You need to provide full output from "dmesg", and you need to define
>>>>> what the word "fails" means (re: "any command fails", "login fails").
>>>> Fails = hangs. When trying to log in, I can type my user name, but
>>>> after I press enter the prompt for the password never appears.
>>>> As to dmesg, tough luck. I have 2 photos on my phone and their
>>>> transcripts are all I can give until the problem reappears (which
>>>> should take up to 2 weeks). Photos are blurry and in many cases I'm
>>>> not sure what exactly is there.
>>>>
>>>> Screen1:
>>>> (ada0:ahcich0:0:0:0): FLUSHCACHE40. ACB: (ea?) 00 00 00 00 (cut?)
>>>> (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut)
>>>> 00
>>>> (ada0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 7b(cut)
>>>> 00
>>>> (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 d0(cut)
>>>> 00
>>>> (ada0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>>
>>>>
>>>> Screen 2:
>>>> ahcich0: Timeout on slot 29 port 0
>>>> ahcich0: (unreadable, lots of numbers, some text)
>>>> (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
>>>> (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
>>>> ahcich0: Timeout on slot 29 port 0
>>>> ahcich0: (unreadable, lots of numbers, some text)
>>>> (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
>>>> (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
>>>> ahcich0: Timeout on slot 30 port 0
>>>> ahcich0: (unreadable, lots of numbers, some text)
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
>>>> (ada0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
>>>>
>>>> Both are from the same event. In general, messages:
>>>>
>>>> (ada0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED.
>>>>
>>>> are the most common.
>>>>
>>>> I've waited for more than 1/2 hour once and the system didn't return
>>>> to a working state; the messages kept flowing and pretty much
>>>> nothing was working. What's interesting, I remember that it happened
>>>> to me even when I was using an installer (the PC-BSD one), before the
>>>> actual installation began, so the disk stored no program data. And I
>>>> *think* there was no ZFS yet anyway.
>>>>
>>>>>
>>>>> I've already demonstrated that loss of a disk in raidz1 (or even 2 disks
>>>>> in raidz2) does not cause ""the system to fail"" on stable/9. However,
>>>>> if you lose enough members or vdevs to cause catastrophic failure, there
>>>>> may be anomalies depending on how your system is set up:
>>>>>
>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-March/016814.html
>>>>>
>>>>> If the pool has failmode=wait, any I/O to that pool will block (wait)
>>>>> indefinitely. This is the default.
>>>>>
>>>>> If the pool has failmode=continue, existing write I/O operations will
>>>>> fail with EIO (I/O error) (and hopefully applications/daemons will
>>>>> handle that gracefully -- if not, that's their fault) but any subsequent
>>>>> I/O (read or write) to that pool will block (wait) indefinitely.
>>>>>
>>>>> If the pool has failmode=panic, the kernel will immediately panic.
>>>>>
>>>>> If the CAM layer is what's wedged, that may be a different issue (and
>>>>> not related to ZFS). I would suggest running stable/9 as many
>>>>> improvements in this regard have been committed recently (some related
>>>>> to CAM, others related to ZFS and its new "deadman" watcher).
>>>>
>>>> Yeah, because of the installer failure, I don't think it's related to ZFS.
>>>> Even if it is, for now I won't set any ZFS properties in the hope it
>>>> repeats and I can get better data.
>>>>>
>>>>> Bottom line: terse output of the problem does not help. Be verbose,
>>>>> provide all output (commands you type, everything!), as well as any
>>>>> physical actions you take.
>>>>>
>>>> Yep. In fact having little data was what made me hesitate to write
>>>> about it; since I did already, I'll do my best to get more info,
>>>> though for now I can only wait for a repetition.
>>>>
>>>>
>>>> On 12/04/2013 00:08, Quartz wrote:
>>>>>> Seeing a ZFS thread, I decided to write about a similar problem that I
>>>>>> experience.
>>>>>
>>>>> I'm assuming you're referring to my "Failed pool causes system to hang"
>>>>> thread. I wonder if there's some common issue with zfs where it locks up
>>>>> if it can't write to disks how it wants to.
>>>>>
>>>>> I'm not sure how similar your problem is to mine. What's your pool setup
>>>>> look like? Redundancy options? Are you booting from a pool? I'd be
>>>>> interested to know if you can just yank the cable to the drive and see
>>>>> if the system recovers.
>>>>>
>>>>> You seem to be worse off than me- I can still login and run at least a
>>>>> couple commands. I'm booting from a straight ufs drive though.
>>>>>
>>>>> ______________________________________
>>>>> it has a certain smooth-brained appeal
>>>>>
>>>> Like I said, I don't think it's ZFS-specific, but just in case...:
>>>> RAIDZ1, root on ZFS. I should reduce the severity of a pool loss before
>>>> pulling cables, so no tests for now.
>>>
>>> Key points:
>>>
>>> 1. We now know why "commands hang" and anything I/O-related blocks
>>> (waits) for you: because your root filesystem is ZFS. If the ZFS layer
>>> is waiting on CAM, and CAM is waiting on your hardware, then those I/O
>>> requests are going to block indefinitely. So now you know the answer to
>>> why that happens.
>>>
>>> 2. I agree that the problem is not likely in ZFS, but rather either with
>>> CAM, the AHCI implementation used, or hardware (either disk or storage
>>> controller).
>>>
>>> 3. Your lack of "dmesg" is going to make this virtually impossible to
>>> solve. We really, ***really*** need that. I cannot stress this enough.
>>> This will tell us a lot of information about your system. We're also
>>> going to need to see "zpool status" output, as well as "zpool get all"
>>> and "zfs get all". "pciconf -lvbc" would also be useful.
>>>
>>> There are some known "gotchas" with certain models of hard disks or AHCI
>>> controllers (which is responsible is unknown at this time), but I don't
>>> want to start jumping to conclusions until full details can be provided
>>> first.
>>>
>>> I would recommend formatting a USB flash drive as FAT/FAT32, booting
>>> into single-user mode, then mounting the USB flash drive and issuing
>>> the above commands + writing the output to files on the flash drive,
>>> then provide those here.
>>>
>>> We really need this information.
>>>
>>> 4. Please involve the PC-BSD folks in this discussion. They need to be
>>> made aware of issues like this so they (and iXSystems, potentially) can
>>> investigate from their side.
>>>
>> OK, thanks for the info.
>> Since dmesg is so important, I'd say the best thing is to wait for
>> the problem to happen again. When it does, I'll restart the thread
>> with all the information that you requested here and with a PC-BSD
>> cross-post.
>>
>> However, I just got a different hang a while ago. This time it was
>> temporary. I switched to console0 after ~10 seconds; there were 2
>> errors. Nothing appeared for ~1 minute, so I switched back and the
>> system was OK. A different drive; I haven't seen problems with this
>> one. And I think they used to be ahci, here it's ata.
>>
>> dmesg:
>>
>> fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.19
>> (ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 82 46 b8 40 25 00 00 00 01 00
>> (ada1:ata0:0:0:0): CAM status: Command timeout
>> (ada1:ata0:0:0:0): Retrying command
>> vboxdrv: fAsync=0 offMin=0x53d offMax=0x52b9
>> linux: pid 17170 (npviewer.bin): syscall pipe2 not implemented
>> (ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 87 1a c7 40 1a 00 00 00 01 00
>> (ada1:ata0:0:0:0): CAM status: Command timeout
>> (ada1:ata0:0:0:0): Retrying command
>>
>> {another 150KBytes of data snipped}
>
> The above output indicates that there was a timeout when trying to issue
> a 48-bit DMA request to the disk. The disk did not respond to the
> request within 30 seconds.
>
> If you were using AHCI, we'd be able to see if the AHCI layer was
> reporting signalling problems or other anomalies that could explain the
> behaviour. With ATA, such is significantly limited. It's worse if
> you're hiding/not showing us the entire information.
>
> The classic FreeBSD ATA driver does not provide command queueing (NCQ),
> while AHCI via CAM does. The difference is that command queueing causes
> xxx_FPDMA_QUEUED CDBs to be issued to the disk.
>
> I'm going to repeat myself -- for the last time: CAN YOU PLEASE JUST
> PROVIDE "DMESG" FROM THE SYSTEM? Like after a fresh reboot? If you're
> able to provide all of the above, I don't know why you can't provide
> dmesg. It is the most important information that there is. I am sick
> and tired of stressing this point.

Sorry. I thought just the error was important. So here you are:
dmesg.boot: http://pastebin.com/LFXPusMX

> Furthermore, please stop changing ATA vs. AHCI interface drivers.
> The more you change/screw around with, the less likely people are going
> to help. CHANGE NOTHING ON THE SYSTEM. Leave it how it is. Do not
> fiddle with things or start flipping switches/changing settings/etc. to
> "try and relieve the problem". You're asking other people for help,
> which means you need to be patient and follow what we ask.

I haven't changed one bit myself. It may have been a change of defaults
in PC-BSD. I just asked them about it. Or maybe different drives use
different drivers.

> Thank you for the rest of the output, however. It looks like this is
> another system with an ATI-based controller (which is usually the kind
> involved in my aforementioned "gotchas"), but there still isn't enough
> information that can help. I have a gut feeling of what's about to
> come, but I need to see dmesg output before I can determine that.
>
> Furthermore, can you please provide this information with its formatting
> intact? Your Email client is screwing up "long lines" and causing
> unnecessary wrapping.
>
> The mailing list will nuke attachments, so please use pastebin or some
> similar service + provide URLs.

pciconf -lvbc: http://pastebin.com/vvCKAWm1
zpool status: http://pastebin.com/D3Av7x9X
zfs get all: http://pastebin.com/4sT37VqZ
zpool get all tank1: http://pastebin.com/HZJTJPa2

--
Twoje radio
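The failmode behaviour Jeremy describes above is a per-pool property. A
minimal sketch of inspecting and changing it, reusing the pool name
tank1 from the pastebin links; note that, as Jeremy says, even with
failmode=continue any subsequent I/O still blocks, so this is no cure
for a hanging root pool:

    # see the current setting (wait is the default)
    zpool get failmode tank1
    # fail in-flight writes with EIO instead of blocking them forever
    zpool set failmode=continue tank1
    # or panic immediately, which at least makes the failure visible
    zpool set failmode=panic tank1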
From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 11:08:28 2013
Date: Sun, 14 Apr 2013 13:08:15 +0200
From: Paul van der Zwan
To: Rick Macklem
Cc: freebsd-fs@freebsd.org
Subject: Re: FreeBSD 9.1 NFSv4 client attribute cache not caching ?
Message-Id: <2B576479-C83A-4D3F-B486-475625383E9C@vanderzwan.org>
In-Reply-To: <678464111.812434.1365908434250.JavaMail.root@erie.cs.uoguelph.ca>
References: <678464111.812434.1365908434250.JavaMail.root@erie.cs.uoguelph.ca>

On 14 Apr 2013, at 5:00, Rick Macklem wrote:

Thanks for taking the effort to send such an extensive reply.

> Paul van der Zwan wrote:
>> On 12 Apr 2013, at 16:28 , Paul van der Zwan wrote:
>>
>>> I am running a few VirtualBox VMs with 9.1 on my OpenIndiana server
>>> and I noticed that make buildworld seems to take much longer
>>> when the clients mount /usr/src and /usr/obj over NFS V4 than when
>>> they use V3.
>>> Unfortunately I have to use V4, as a buildworld on V3 hangs the
>>> server completely...
>>> I noticed the number of PUTFH/GETATTR/GETFH calls is in the order of
>>> a few thousand per second,
>>> and if I snoop the traffic I see the same filenames appear over and
>>> over again.
>>> It looks like the client is not caching anything at all and is
>>> issuing a server request every time.
>>> I use the default mount options:
>>> 192.168.178.24:/data/ports on /usr/ports (nfs, nfsv4acls)
>>> 192.168.178.24:/data/src on /usr/src (nfs, nfsv4acls)
>>> 192.168.178.24:/data/obj on /usr/obj (nfs, nfsv4acls)
>>>
>>
>> I had a look with dtrace
>> $ sudo dtrace -n '::getattr:start { @[stack()]=count();}'
>> and it seems the vast majority of the calls to getattr are from open()
>> and close() system calls:
>>
>>               kernel`newnfs_request+0x631
>>               kernel`nfscl_request+0x75
>>               kernel`nfsrpc_getattr+0xbe
>>               kernel`nfs_getattr+0x280
>>               kernel`VOP_GETATTR_APV+0x74
>>               kernel`nfs_lookup+0x3cc
>>               kernel`VOP_LOOKUP_APV+0x74
>>               kernel`lookup+0x69e
>>               kernel`namei+0x6df
>>               kernel`kern_execve+0x47a
>>               kernel`sys_execve+0x43
>>               kernel`amd64_syscall+0x3bf
>>               kernel`0xffffffff80784947
>>               26
>>
>>               kernel`newnfs_request+0x631
>>               kernel`nfscl_request+0x75
>>               kernel`nfsrpc_getattr+0xbe
>>               kernel`nfs_close+0x3e9
>>               kernel`VOP_CLOSE_APV+0x74
>>               kernel`kern_execve+0x15c5
>>               kernel`sys_execve+0x43
>>               kernel`amd64_syscall+0x3bf
>>               kernel`0xffffffff80784947
>>               26
>>
>>               kernel`newnfs_request+0x631
>>               kernel`nfscl_request+0x75
>>               kernel`nfsrpc_getattr+0xbe
>>               kernel`nfs_getattr+0x280
>>               kernel`VOP_GETATTR_APV+0x74
>>               kernel`nfs_lookup+0x3cc
>>               kernel`VOP_LOOKUP_APV+0x74
>>               kernel`lookup+0x69e
>>               kernel`namei+0x6df
>>               kernel`vn_open_cred+0x330
>>               kernel`vn_open+0x1c
>>               kernel`kern_openat+0x207
>>               kernel`kern_open+0x19
>>               kernel`sys_open+0x18
>>               kernel`amd64_syscall+0x3bf
>>               kernel`0xffffffff80784947
>>               2512
>>
>>               kernel`newnfs_request+0x631
>>               kernel`nfscl_request+0x75
>>               kernel`nfsrpc_getattr+0xbe
>>               kernel`nfs_close+0x3e9
>>               kernel`VOP_CLOSE_APV+0x74
>>               kernel`vn_close+0xee
>>               kernel`vn_closefile+0xff
>>               kernel`_fdrop+0x3a
>>               kernel`closef+0x332
>>               kernel`kern_close+0x183
>>               kernel`sys_close+0xb
>>               kernel`amd64_syscall+0x3bf
>>               kernel`0xffffffff80784947
>>               2530
>>
>> I had a look at the source of nfs_close and could not find a call to
>> nfsrpc_getattr, and I am wondering why close would be calling getattr
>> anyway.
>> If the file is closed, what do we care about its attributes....
>>
> Here are some random statements w.r.t. NFSv3 vs NFSv4 that might help
> with an understanding of what is going on. I do address the specific
> case of nfs_close() towards the end. (It is kinda long winded, but I
> threw out everything I could think of..)
>
> NFSv3 doesn't have any open/close RPC, but NFSv4 does have Open and
> Close operations.
>
> In NFSv3, each RPC is defined and usually includes attributes for files
> before and after the operation (implicit getattrs not counted in the RPC
> counts reported by nfsstat).
>
> For NFSv4, every RPC is a compound built up of a list of Operations like
> Getattr. Since the NFSv4 server doesn't know what the compound is doing,
> nfsstat reports the counts of Operations for the NFSv4 server, so the
> counts will be much higher than with NFSv3, but they do not reflect the
> number of RPCs being done. To get NFSv4 nfsstat output that can be
> compared to NFSv3, you need to do the command on the client(s), and even
> then it is only roughly the same.
> (I just realized this should be documented in man nfsstat.)
>
I ran nfsstat -s -v 4 on the server and saw the number of requests being
done. They were in the order of a few thousand per second for a single
FreeBSD 9.1 client doing a make buildworld.

> For the FreeBSD NFSv4 client, the compounds include Getattr operations
> similar to what NFSv3 does. It doesn't do a Getattr on the directory
> for Lookup, because that would have made the compound much more complex.
> I don't think this will have a significant performance impact, but it
> will result in some additional Getattr RPCs.
>
I ran snoop on port 2049 on the server and I saw a large number of
lookups. A lot of them seem to be for directories which are part of the
filenames of the compiler and include files which are on the NFS-mounted
/usr/obj.
The same names keep reappearing, so it looks like there is no caching
being done on the client.

> I suspect the slowness is caused by the extra overhead of doing the
> Open/Close operations against the server. The only way to avoid doing
> these against the server for NFSv4 is to enable delegations in both
> client and server. How to do this is documented in "man nfsv4".
> Basically, starting up the nfscbd in the client and setting:
> vfs.nfsd.issue_delegations=1
> in the server.
>
> Specifically for nfs_close(), the attributes (modify time) are used for
> what is called "close to open consistency". This can be disabled by the
> "nocto" mount option, if you don't need it for your build environment.
> (You only need it if one client is writing a file and then another
> client is reading the same file.)
>
I tried the nocto option in /etc/fstab but it does not show when mount
shows the mounted filesystems, so I am not sure if it is being used.
On the server, netstat shows an active connection to port 7745 on the
client, but snoop shows no data flowing on that session.

> Both the attribute caching and close to open consistency algorithms
> in the client are essentially the same for NFSv3 vs NFSv4.
>
> The NFSv4 Close operation(s) are actually done when the v_usecount for
> the vnode goes to 0, since mmap'd files can do I/O on pages after
> the close syscall. As such, they are only loosely related to the close
> syscall. They are actually closing Windows-style Openlock(s).
>
I had a look at the code of the NFSv4 client of Illumos (which is
basically what my server is running) and as far as I understand it,
they do the getattr only when the close was for a file that was opened
for write and when there was actually something written to the file.
The FreeBSD code seems to do the getattr for all close() calls.
For files that were never written, like executables or source files,
that seems to cause quite a lot of overhead.

> You mention that you see the same file over and over in a packet trace.
> You don't give specifics, but I'd suggest that you look at both NFSv3
> and NFSv4 for this (and file names are in lookups, not getattrs).
>
> I'd suggest you try enabling delegations in both client and server,
> plus trying the "nocto" mount option, and see if that helps.
>
Tried it, but it does not seem to make any noticeable difference.
I tried a make buildworld buildkernel with /usr/obj on a local FS in the
VBox VM; that completed in about 2 hours. With /usr/obj on an NFSv4
filesystem it takes about a day. A twelvefold increase in elapsed time
makes using NFSv4 unusable for this use case.
Too bad the server hangs when I use an NFSv3 mount for /usr/obj.
Having a shared /usr/obj makes it possible to run a make buildworld on a
single VM and just run make installworld on the others.

Paul

> rick
>
>>
>> Paul
>>
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
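The delegation and nocto settings discussed in this thread end up in
three places on a FreeBSD client/server pair. A minimal sketch; the
fstab line reuses Paul's export path but is otherwise illustrative, and
on an OpenIndiana server like Paul's the server-side delegation switch
is different (on Illumos it is typically the NFS_SERVER_DELEGATION
setting in /etc/default/nfs rather than this sysctl):

    # FreeBSD NFS server: allow the server to hand out delegations
    sysctl vfs.nfsd.issue_delegations=1

    # FreeBSD client /etc/rc.conf: run the NFSv4 callback daemon,
    # without which the server cannot issue delegations (see man nfsv4)
    nfscbd_enable="YES"

    # client /etc/fstab: NFSv4 mount with close-to-open consistency off
    192.168.178.24:/data/obj  /usr/obj  nfs  rw,nfsv4,nocto  0  0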
From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 14:09:31 2013
Date: Sun, 14 Apr 2013 15:09:38 +0100
From: "Steven Hartland"
To: Radio mlodych bandytów, freebsd-fs@freebsd.org
Cc: freebsd-fs@freebsd.org
Subject: Re: A failed drive causes system to hang
Message-ID: <9C59759CB64B4BE282C1D1345DD0C78E@multiplay.co.uk>
References: <516A8092.2080002@o2.pl>

----- Original Message -----
From: "Radio mlodych bandytów"
Sent: Sunday, April 14, 2013 11:10 AM
Subject: A failed drive causes system to hang

> Cross-post from freebsd-fs:
> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=333977+0+archive/2013/freebsd-fs/20130414.freebsd-fs
>
> I have a failing drive in my array. I need to RMA it, but don't have
> time, and it fails rarely enough to be yet another annoyance.
> The failure is simple: it fails to respond.
> When it happens, the only thing I found I can do is switch consoles.
> Any command hangs, login on different consoles hangs, apps hang.
> I run PC-BSD 9.1.
>
> On the 1st console I see a series of messages like:
>
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED
>
> I've seen it happening even when running an installer from a different
> drive, while preparing installation (don't remember which step).
>
> I have partial dmesg screenshots from an older failure (21st of
> December 2012), transcript below:
>
> Screen1:
> (ada0:ahcich0:0:0:0): FLUSHCACHE40. ACB: (ea?) 00 00 00 00 (cut?)
> (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut)
> 00

smartctl has the ability to print out the queued log file if
the drive supports it. This may give you some more information
on what the problem may be with your drive.

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and
the person or entity to whom it is addressed. In the event of
misdirection, the recipient is prohibited from using, copying, printing
or otherwise disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission
please telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.
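Steve presumably means the drive's NCQ/queued-command error logging. A
minimal sketch of pulling it with smartmontools, assuming the suspect
disk is ada0 as in the timeouts above and that the drive supports the
extended logs:

    # identity, health, attributes and recent errors in one shot
    smartctl -a /dev/ada0
    # extended comprehensive error log; queued (NCQ) command errors
    # are usually recorded here
    smartctl -l xerror /dev/ada0
    # SATA PHY event counters: link resets, CRC errors and the like
    smartctl -l sataphy /dev/ada0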
00 00 00 00 (cut?) >> (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut) >> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated >> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut) >> 00 > > smartctl has the ability to print out the queued log file if > the drive supports it. This may give you some more information > on what the problem may be with your drive. > > Regards > Steve > > ================================================ > This e.mail is private and confidential between Multiplay (UK) Ltd. and > the person or entity to whom it is addressed. In the event of > misdirection, the recipient is prohibited from using, copying, printing > or otherwise disseminating it or any information contained in it. > In the event of misdirection, illegible or incomplete transmission > please telephone +44 845 868 1337 > or return the E.mail to postmaster@multiplay.co.uk. > > No errors on any of these drives. -- Twoje radio From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 18:51:22 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 177C4E4A for ; Sun, 14 Apr 2013 18:51:22 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta15.emeryville.ca.mail.comcast.net (qmta15.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:44:76:96:27:228]) by mx1.freebsd.org (Postfix) with ESMTP id EF033D6D for ; Sun, 14 Apr 2013 18:51:21 +0000 (UTC) Received: from omta21.emeryville.ca.mail.comcast.net ([76.96.30.88]) by qmta15.emeryville.ca.mail.comcast.net with comcast id PuQK1l0021u4NiLAFurM08; Sun, 14 Apr 2013 18:51:21 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta21.emeryville.ca.mail.comcast.net with comcast id PurH1l00y1t3BNj8hurJoF; Sun, 14 Apr 2013 18:51:20 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id B640973A33; Sun, 14 Apr 2013 11:51:17 -0700 (PDT) Date: Sun, 14 Apr 2013 11:51:17 -0700 From: Jeremy Chadwick To: Radio =?unknown-8bit?B?bcU/b2R5Y2ggYmFuZHl0w7N3?= Subject: Re: A failed drive causes system to hang Message-ID: <20130414185117.GA38259@icarus.home.lan> References: <516A8092.2080002@o2.pl> <9C59759CB64B4BE282C1D1345DD0C78E@multiplay.co.uk> <516AF61B.7060204@o2.pl> MIME-Version: 1.0 Content-Type: text/plain; charset=unknown-8bit Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <516AF61B.7060204@o2.pl> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1365965481; bh=aFlhudGBuSyEIQpvUvzPrBEU0hp/uBYCAjeLPdB7KSw=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=ChOvudQdeeWdlUIUmjtao4JIR2ZUjo8pMBS2wfxzjt7tYdVdiSH6VeYl5IKhIAtS5 I8EypUwTxezsw+KJNIH7JE8NGBTA9OP7g8SSQKov88NDMAcFu78G8l5jhGTkY5rhmG GsphLoYvSWzfW3JVr+9bE+v9klkFwKm/53MFt0aH1D7ngWQvBo1uECzmAeXF8ePJOg uJrj16JMeXykR7RoBAUcVygfj0jNfW5z7domD0Fk6rrx4AqVmTEQsVSI8dsmjIauaT ljh3JWbRtQe/wRJrVFZTP/K/DGQUUK8+sgtP0niekkQK5+4V1yTonoapv0o6BtSo/C VGpPXj+L6aIMA== Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Apr 2013 18:51:22 -0000 On Sun, Apr 14, 2013 at 08:31:55PM +0200, Radio m?odych bandytw wrote: > On 14/04/2013 16:09, Steven Hartland wrote: > > > >----- Original Message ----- From: "Radio mlodych bandytów" > > > >To: > >Cc: > 
>Sent: Sunday, April 14, 2013 11:10 AM > >Subject: A failed drive causes system to hang > > > > > >>Cross-post from freebsd-fs: > >>http://docs.freebsd.org/cgi/getmsg.cgi?fetch=333977+0+archive/2013/freebsd-fs/20130414.freebsd-fs > >> > >> > >>I have a failing drive in my array. I need to RMA it, but don't have > >>time and it fails rarely enough to be a yet another annoyance. > >>The failure is simple: it fails to respond. > >>When it happens, the only thing I found I can do is switch consoles. > >>Any command hangs, login on different consoles hangs, apps hang. > >>I run PC-BSD 9.1. > >> > >>On the 1st console I see a series of messages like: > >> > >>(ada0:ahcich0:0:0:0): CAM status: Command timeout > >>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED > >> > >>I've seen it happening even when running an installer from a different > >>drive, while preparing installation (don't remember which step). > >> > >>I have partial dmesg screenshots from an older failure (21st of > >>December 2012), transcript below: > >> > >>Screen1: > >>(ada0:ahcich0:0:0:0): FLUSHCACHE40. ACB: (ea?) 00 00 00 00 (cut?) > >>(ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut) > >>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut) > >>00 > > > >smartctl has the ability to print out the queued log file if > >the drive supports it. This may give you some more information > >on what the problem may be with your drive. > > No errors on any of these drives. Please provide full output from the following command, and please retain the formatting (pastebin, etc.): smartctl -x /dev/ada0 I would also appreciate seeing the same output for the other drives on the system (specifically /dev/ada1 and /dev/ada2), now that I've seen the dmesg output. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 18:58:16 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 394ED16B for ; Sun, 14 Apr 2013 18:58:16 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: from mail-vb0-x236.google.com (mail-vb0-x236.google.com [IPv6:2607:f8b0:400c:c02::236]) by mx1.freebsd.org (Postfix) with ESMTP id F1B12DBA for ; Sun, 14 Apr 2013 18:58:15 +0000 (UTC) Received: by mail-vb0-f54.google.com with SMTP id w16so3304167vbf.27 for ; Sun, 14 Apr 2013 11:58:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=ewlU8n7ARbOs00OqEYT6KyC/XB5MbBDFjacepi2fRA0=; b=MxD8xSd1gphZFlXDMcbTXp+Mg6PxTCd2xMUK5Nqw4CkCyaoXOLSx/wlAURh2TnEvF6 va3rYZyzuUz0Vz60nRzMKVbRMU8VDtCHI4rB0/roFc+FkkbonZxTTHOWKw4J+QGoEwgV TpDofOE2wY5pKY4128kL/OMsxs5cR/T02zrZzw8TylajouBo66erex+WfCEiW9FuqEDW BdwPGiFPTAofLUSPjK4z/85O3GO5EHdGkZvIgqtZ1Wxr0dTC0AWC4GtYtwOrvMSunNzN U3DZPW0B/gzf7zzqNaBTm9MguvyA4jg1mbWuyLchROfxqXnp1nRKEOuwcC3gH1CKdn9V 1gjw== MIME-Version: 1.0 X-Received: by 10.52.183.36 with SMTP id ej4mr12056052vdc.95.1365965895509; Sun, 14 Apr 2013 11:58:15 -0700 (PDT) Received: by 10.220.91.83 with HTTP; Sun, 14 Apr 2013 11:58:15 -0700 (PDT) In-Reply-To: <20130414185117.GA38259@icarus.home.lan> References: <516A8092.2080002@o2.pl> <9C59759CB64B4BE282C1D1345DD0C78E@multiplay.co.uk> <516AF61B.7060204@o2.pl> <20130414185117.GA38259@icarus.home.lan> Date: Sun, 14 Apr 2013 14:58:15 -0400 Message-ID: Subject: Re: A failed drive causes system to hang From: Zaphod Beeblebrox To: Jeremy Chadwick Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs , =?UTF-8?B?UmFkaW8gbcS5P29keWNoIGJhbmR5dMSCxYJ3?= , support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Apr 2013 18:58:16 -0000 I'd like to throw in my two cents here. I've seen this (drives in RAID-1 configuration) hanging whole systems. Back in the IDE days, two drives were connected with one cable --- I largely wrote it off as a deficiency of IDE hardware and resolved to by SCSI hardware for more important systems. Of late, the physical hardware for SCSI (SAS) and SATA drives have converged. I'm willing to accept that SAS hardware may be built to a different standard, but I'm suspicious of the fact that a bad SATA drive on an ACH* controller can hang the whole system. ... it's not complete, however. Often pulling the drive's cable will unfreeze things. It's also not entirely consistent. Drives I have behind 4:1 port multipliers haven't (so far) hung the system that they're on (which uses ACH10). Right now, I have a remote ACH10 system that's hung hard a couple of times --- and it passes both it's short and long SMART tests on both drives. Is there no global timeout we can depend on here? 
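[A note for readers working through this thread: the smartctl checks
being traded above can be reproduced with something like the following.
This is a sketch only -- smartmontools must be installed, /dev/ada0 is
the suspect drive, and the GP log address 0x10 (the NCQ command error
log) is an assumption about which "queued log" Steven means, not
something stated in the thread:

  smartctl -x /dev/ada0             # everything: attributes, error logs, PHY counters
  smartctl -l xerror /dev/ada0      # extended comprehensive error log
  smartctl -l gplog,0x10 /dev/ada0  # raw dump of the NCQ command error log
  smartctl -t long /dev/ada0        # start the long self-test mentioned above;
                                    # read results later with "smartctl -l selftest"

A drive that passes its self-tests can still time out under NCQ load,
which is why the error-log dumps are being asked for separately.]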
From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 19:11:28 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C20D12BF for ; Sun, 14 Apr 2013 19:11:28 +0000 (UTC) (envelope-from radiomlodychbandytow@o2.pl) Received: from moh1-ve1.go2.pl (moh1-ve1.go2.pl [193.17.41.131]) by mx1.freebsd.org (Postfix) with ESMTP id 839BFE13 for ; Sun, 14 Apr 2013 19:11:28 +0000 (UTC) Received: from moh1-ve1.go2.pl (unknown [10.0.0.131]) by moh1-ve1.go2.pl (Postfix) with ESMTP id 452F691F25D for ; Sun, 14 Apr 2013 21:11:25 +0200 (CEST) Received: from unknown (unknown [10.0.0.108]) by moh1-ve1.go2.pl (Postfix) with SMTP for ; Sun, 14 Apr 2013 21:11:25 +0200 (CEST) Received: from unknown [93.175.66.185] by poczta.o2.pl with ESMTP id GtzhlA; Sun, 14 Apr 2013 21:11:22 +0200 Message-ID: <516AFF5A.9010508@o2.pl> Date: Sun, 14 Apr 2013 21:11:22 +0200 From: Radio młodych bandytów User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130407 Thunderbird/17.0.5 MIME-Version: 1.0 To: Jeremy Chadwick Subject: Re: A failed drive causes system to hang References: <516A8092.2080002@o2.pl> <9C59759CB64B4BE282C1D1345DD0C78E@multiplay.co.uk> <516AF61B.7060204@o2.pl> <20130414185117.GA38259@icarus.home.lan> In-Reply-To: <20130414185117.GA38259@icarus.home.lan> Content-Type: text/plain; charset=unknown-8bit Content-Transfer-Encoding: 7bit X-O2-Trust: 1, 32 X-O2-SPF: neutral Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Apr 2013 19:11:28 -0000 From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 19:28:31 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 93D278FA for ; Sun, 14 Apr 2013 19:28:31 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta03.emeryville.ca.mail.comcast.net (qmta03.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:32]) by mx1.freebsd.org (Postfix) with ESMTP id 7623BEA6 for ; Sun, 14 Apr 2013 19:28:31 +0000 (UTC) Received: from omta12.emeryville.ca.mail.comcast.net ([76.96.30.44]) by qmta03.emeryville.ca.mail.comcast.net with comcast id PuCg1l0090x6nqcA3vUX5l; Sun, 14 Apr 2013 19:28:31 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta12.emeryville.ca.mail.comcast.net with comcast id PvUW1l0051t3BNj8YvUWRB; Sun, 14 Apr 2013 19:28:30 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 37E4A73A33; Sun, 14 Apr 2013 12:28:30 -0700 (PDT) Date: Sun, 14 Apr 2013 12:28:30 -0700 From: Jeremy Chadwick To: Radio =?unknown-8bit?B?bcU/b2R5Y2ggYmFuZHl0w7N3?= Subject: Re: A failed drive causes system to hang Message-ID: <20130414192830.GA38338@icarus.home.lan> References: <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan> <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan> <51688BA6.1000507@o2.pl> <20130413000731.GA84309@icarus.home.lan> <516A8646.4000101@o2.pl> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <516A8646.4000101@o2.pl> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1365967711; bh=Xhq5IrUfIsGMwcteqwHSZ8Q8/UYGbrM8mlXwUKgQxV8=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: 
MIME-Version:Content-Type; b=qi6dwvBY1NZBMSLmCreMheNr6yEkZRH9bnMOWrAn2C3sD6foIX3cgbVOW5U2JCv5a 68rcMkLcQho+lQktkEUmc7gfo3F3ZndwPNNCgdjtul6N+JKJY4o7Trl+WxP8oC5zH8 gl9uSHaORGFtta+MGBw5Fli5BeV/9EDzoqbwRynJavDFq2b0Nhu4615rG7Gf5ZBNCt KH6zwoK+s2E7efwVyNZEidaAMrRK1FCcqqvN9b3zB5ihMCt1H7DIRtqojkKHG7Aq2Q Zzyun+2wrdbwSk5BD7b66oGLaO5Jfv1GP7bd491wwkeAIgwS+VufOrDtA45nzc1Blu XUQIA8JoR9xww== Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Apr 2013 19:28:31 -0000 On Sun, Apr 14, 2013 at 12:34:46PM +0200, Radio m?odych bandytw wrote: > On 13/04/2013 02:07, Jeremy Chadwick wrote: > >On Sat, Apr 13, 2013 at 12:33:10AM +0200, Radio m?odych bandytw wrote: > >>On 13/04/2013 00:03, Jeremy Chadwick wrote: > >>>On Fri, Apr 12, 2013 at 11:52:31PM +0200, Radio m?odych bandytw wrote: > >>>>On 11/04/2013 23:24, Jeremy Chadwick wrote: > >>>>>On Thu, Apr 11, 2013 at 10:47:32PM +0200, Radio m?odych bandytw wrote: > >>>>>>Seeing a ZFS thread, I decided to write about a similar problem that > >>>>>>I experience. > >>>>>>I have a failing drive in my array. I need to RMA it, but don't have > >>>>>>time and it fails rarely enough to be a yet another annoyance. > >>>>>>The failure is simple: it fails to respond. > >>>>>>When it happens, the only thing I found I can do is switch consoles. > >>>>>>Any command fails, login fails, apps hang. > >>>>>> > >>>>>>On the 1st console I see a series of messages like: > >>>>>> > >>>>>>(ada0:ahcich0:0:0:0): CAM status: Command timeout > >>>>>>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>>>>>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED > >>>>>> > >>>>>>I use RAIDZ1 and I'd expect that none single failure would cause the > >>>>>>system to fail... > >>>>> > >>>>>You need to provide full output from "dmesg", and you need to define > >>>>>what the word "fails" means (re: "any command fails", "login fails"). > >>>>Fails = hangs. When trying to log it, I can type my user name, but > >>>>after I press enter the prompt for password never appear. > >>>>As to dmesg, tough luck. I have 2 photos on my phone and their > >>>>transcripts are all I can give until the problem reappears (which > >>>>should take up to 2 weeks). Photos are blurry and in many cases I'm > >>>>not sure what exactly is there. > >>>> > >>>>Screen1: > >>>>(ada0:ahcich0:0:0:0): FLUSHCACHE40. ACB: (ea?) 00 00 00 00 (cut?) > >>>>(ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut) > >>>>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>>>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut) > >>>>00 > >>>>(ada0:ahcich0:0:0:0): CAM status: Command timeout > >>>>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>>>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 7b(cut) > >>>>00 > >>>>(ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut) > >>>>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>>>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 d0(cut) > >>>>00 > >>>>(ada0:ahcich0:0:0:0): CAM status: Command timeout > >>>>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>>> > >>>> > >>>>Screen 2: > >>>>ahcich0: Timeout on slot 29 port 0 > >>>>ahcich0: (unreadable, lots of numbers, some text) > >>>>(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 
00 (cut) > >>>>(aprobe0:ahcich0:0:0:0): CAM status: Command timeout > >>>>(aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked > >>>>ahcich0: Timeout on slot 29 port 0 > >>>>ahcich0: (unreadable, lots of numbers, some text) > >>>>(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut) > >>>>(aprobe0:ahcich0:0:0:0): CAM status: Command timeout > >>>>(aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked > >>>>ahcich0: Timeout on slot 30 port 0 > >>>>ahcich0: (unreadable, lots of numbers, some text) > >>>>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut) > >>>>(ada0:ahcich0:0:0:0): CAM status: Command timeout > >>>>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>>>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut) > >>>> > >>>>Both are from the same event. In general, messages: > >>>> > >>>>(ada0:ahcich0:0:0:0): CAM status: Command timeout > >>>>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated > >>>>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. > >>>> > >>>>are the most common. > >>>> > >>>>I've waited for more than 1/2 hour once and the system didn't return > >>>>to a working state, the messages kept flowing and pretty much > >>>>nothing was working. What's interesting, I remember that it happened > >>>>to me even when I was using an installer (PC-BSD one), before the > >>>>actual installation began, so the disk stored no program data. And I > >>>>*think* there was no ZFS yet anyway. > >>>> > >>>>> > >>>>>I've already demonstrated that loss of a disk in raidz1 (or even 2 disks > >>>>>in raidz2) does not cause ""the system to fail"" on stable/9. However, > >>>>>if you lose enough members or vdevs to cause catastrophic failure, there > >>>>>may be anomalies depending on how your system is set up: > >>>>> > >>>>>http://lists.freebsd.org/pipermail/freebsd-fs/2013-March/016814.html > >>>>> > >>>>>If the pool has failmode=wait, any I/O to that pool will block (wait) > >>>>>indefinitely. This is the default. > >>>>> > >>>>>If the pool has failmode=continue, existing write I/O operations will > >>>>>fail with EIO (I/O error) (and hopefully applications/daemons will > >>>>>handle that gracefully -- if not, that's their fault) but any subsequent > >>>>>I/O (read or write) to that pool will block (wait) indefinitely. > >>>>> > >>>>>If the pool has failmode=panic, the kernel will immediately panic. > >>>>> > >>>>>If the CAM layer is what's wedged, that may be a different issue (and > >>>>>not related to ZFS). I would suggest running stable/9 as many > >>>>>improvements in this regard have been committed recently (some related > >>>>>to CAM, others related to ZFS and its new "deadman" watcher). > >>>> > >>>>Yeah, because of the installer failure, I don't think it's related to ZFS. > >>>>Even if it is, for now I won't set any ZFS properties in hope it > >>>>repeats and I can get better data. > >>>>> > >>>>>Bottom line: terse output of the problem does not help. Be verbose, > >>>>>provide all output (commands you type, everything!), as well as any > >>>>>physical actions you take. > >>>>> > >>>>Yep. In fact having little data was what made me hesitate to write > >>>>about it; since I did already, I'll do my best to get more info, > >>>>though for now I can only wait for a repetition. > >>>> > >>>> > >>>>On 12/04/2013 00:08, Quartz wrote:> > >>>>>>Seeing a ZFS thread, I decided to write about a similar problem that I > >>>>>>experience. > >>>>> > >>>>>I'm assuming you're referring to my "Failed pool causes system to hang" > >>>>>thread. 
I wonder if there's some common issue with zfs where it locks up > >>>>>if it can't write to disks how it wants to. > >>>>> > >>>>>I'm not sure how similar your problem is to mine. What's your pool setup > >>>>>look like? Redundancy options? Are you booting from a pool? I'd be > >>>>>interested to know if you can just yank the cable to the drive and see > >>>>>if the system recovers. > >>>>> > >>>>>You seem to be worse off than me- I can still login and run at least a > >>>>>couple commands. I'm booting from a straight ufs drive though. > >>>>> > >>>>>______________________________________ > >>>>>it has a certain smooth-brained appeal > >>>>> > >>>>Like I said, I don't think it's ZFS-specific, but just in case...: > >>>>RAIDZ1, root on ZFS. I should reduce severity of a pool loss before > >>>>pulling cables, so no tests for now. > >>> > >>>Key points: > >>> > >>>1. We now know why "commands hang" and anything I/O-related blocks > >>>(waits) for you: because your root filesystem is ZFS. If the ZFS layer > >>>is waiting on CAM, and CAM is waiting on your hardware, then those I/O > >>>requests are going to block indefinitely. So now you know the answer to > >>>why that happens. > >>> > >>>2. I agree that the problem is not likely in ZFS, but rather either with > >>>CAM, the AHCI implementation used, or hardware (either disk or storage > >>>controller). > >>> > >>>3. Your lack of "dmesg" is going to make this virtually impossible to > >>>solve. We really, ***really*** need that. I cannot stress this enough. > >>>This will tell us a lot of information about your system. We're also > >>>going to need to see "zpool status" output, as well as "zpool get all" > >>>and "zfs get all". "pciconf -lvbc" would also be useful. > >>> > >>>There are some known "gotchas" with certain models of hard disks or AHCI > >>>controllers (which is responsible is unknown at this time), but I don't > >>>want to start jumping to conclusions until full details can be provided > >>>first. > >>> > >>>I would recommend formatting a USB flash drive as FAT/FAT32, booting > >>>into single-user mode, then mounting the USB flash drive and issuing > >>>the above commands + writing the output to files on the flash drive, > >>>then provide those here. > >>> > >>>We really need this information. > >>> > >>>4. Please involve the PC-BSD folks in this discussion. They need to be > >>>made aware of issues like this so they (and iXSystems, potentially) can > >>>investigate from their side. > >>> > >>OK, thanks for the info. > >>Since dmesg is so important, I'd say the best thing is to wait for > >>the problem to happen again. When it does, I'll restart the thread > >>with every information that you requested here and with a PC-BSD > >>cross-post. > >> > >>However, I just got a different hang just a while ago. This time it > >>was temporary, I don't know, I switched to console0 after ~10 > >>seconds, there were 2 errors. Nothing appeared for ~1 minute, so I > >>switched back and the system was OK. Different drive, I haven't seen > >>problems with this one. And I think they used to be ahci, here's > >>ata. > >> > >>dmesg: > >> > >>fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.19 > >>(ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 82 46 b8 40 25 00 00 00 01 00 > >>(ada1:ata0:0:0:0): CAM status: Command timeout > >>(ada1:ata0:0:0:0): Retrying command > >>vboxdrv: fAsync=0 offMin=0x53d offMax=0x52b9 > >>linux: pid 17170 (npviewer.bin): syscall pipe2 not implemented > >>(ada1:ata0:0:0:0): READ_DMA48. 
ACB: 25 00 87 1a c7 40 1a 00 00 00 01 00 > >>(ada1:ata0:0:0:0): CAM status: Command timeout > >>(ada1:ata0:0:0:0): Retrying command > >> > >>{another 150KBytes of data snipped} > > > >The above output indicates that there was a timeout when trying to issue > >a 48-bit DMA request to the disk. The disk did not respond to the > >request within 30 seconds. > > > >If you were using AHCI, we'd be able to see if the AHCI layer was > >reporting signalling problems or other anomalies that could explain the > >behaviour. With ATA, such is significantly limited. It's worse if > >you're hiding/not showing us the entire information. > > > >The classic FreeBSD ATA driver does not provide command queueing (NCQ), > >while AHCI via CAM does. The difference is that command queueing causes > >xxx_FPDMA_QUEUED CDBs to be issued to the disk. > > > >I'm going to repeat myself -- for the last time: CAN YOU PLEASE JUST > >PROVIDE "DMESG" FROM THE SYSTEM? Like after a fresh reboot? If you're > >able to provide all of the above, I don't know why you can't provide > >dmesg. It is the most important information that there is. I am sick > >and tired of stressing this point. > Sorry. I thought just the error was important. So here you are: > dmesg.boot: > http://pastebin.com/LFXPusMX Thank you. Please read everything I have written below before doing anything. Based on this output, we can see the following: * AHCI is actively in use, and is a slowly-becoming-infamous ATI IXP700 controller: ahci0: port 0xb000-0xb007,0xa000-0xa003,0x9000-0x9007,0x8000-0x8003,0x7000-0x700f mem 0xf9fffc00-0xf9ffffff irq 19 at device 17.0 on pci0 * The system has 3 disks attached to this controller: ada0 at ahcich0 bus 0 scbus2 target 0 lun 0 ada0: ATA-8 SATA 2.x device ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) ada1 at ata0 bus 0 scbus6 target 0 lun 0 ada1: ATA-8 SATA 2.x device ada1: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) ada1: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) ada2 at ata0 bus 0 scbus6 target 1 lun 0 ada2: ATA-8 SATA 2.x device ada2: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) ada2: 610480MB (1250263728 512 byte sectors: 16H 63S/T 16383C) Let's talk about ada0 and ada1 first. The WD15EARS drives have two features that can cause problems during I/O requests: - They are "energy efficient" drives ("Green", "GreenPower", or -GP). These drives are known to repeatedly and very aggressively park their heads, which can cause horrible performance and other I/O anomalies, particularly timeouts during I/O operations (reads or writes). - Their physical sector size is 4096 bytes, but like all drives, logically advertise 512 byte sectors to retain compatibility. Partitions which do not align themselves to 4096-byte boundaries will result in abysmally degraded performance, particularly during writes. The drive internally may also have issues trying to deal with this situation after prolonged use (in a non-aligned state). When using 4KByte sector drives with ZFS, you have to "prep" them using gnop(8) first. Ivan Voras describes this procedure here: http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html When using 4KByte sector drives with UFS/UFS2, the easiest method is to use the GPT partitioning scheme (rather than classic MBR) to ensure alignment. This can be done manually through gpart(8). 
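[For readers: a minimal sketch of both procedures, assuming a freshly
created pool/partition (the device names and the pool name "tank" are
illustrative, not taken from this system; the gnop step is the one from
Ivan Voras' article above):

  # ZFS: force ashift=12 by presenting one member as a 4K-sector device
  gnop create -S 4096 /dev/ada0
  zpool create tank raidz ada0.nop ada1 ada2
  zpool export tank
  gnop destroy /dev/ada0.nop     # the .nop device is only needed at creation
  zpool import tank
  zdb -C tank | grep ashift      # should now report ashift: 12

  # UFS: GPT partition aligned to 4K, then newfs as usual
  gpart create -s gpt ada1
  gpart add -t freebsd-ufs -a 4k ada1
  newfs -U /dev/ada1p1

Note this only helps pools and filesystems created this way; it does
not realign existing data.]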
Warren Block's article on this is recommended (for both SSDs and
MHDDs), though for MHDDs you can change the alignment size to "4k"
rather than "1m":

http://www.wonkity.com/~wblock/docs/html/ssd.html

- ada1 is only negotiating SATA150 speeds, even though this is a
  SATA300-capable drive (compare it to ada0). The WD15EARS drives have
  a jumper that can limit the PHY speed to SATA150 speeds. Please shut
  the system down, remove the disk (ada1), and physically examine it to
  see if that jumper is installed. If it is, please remove it.

  Jumper location:
  http://wdc.custhelp.com/app/answers/detail/search/1/a_id/1679#jumper

Now on ada2...

The ST3640323AS is one of Seagate's infamous Barracuda 7200.11 drives,
which are known for:

- Infamous firmware bugs, the most major of which is the drive becoming
  permanently catatonic (this has been covered by the media at great
  length).

- Being "energy efficient", which means excessively parking its heads
  (same issue as the WD "Green" drives, but with no way to disable or
  inhibit the behaviour).

The firmware on your ST3640323AS is version SD13; the latest firmware
is SD1B (8 versions newer). I would strongly suggest upgrading this
drive ASAP. Thankfully the ST3640323AS is a true 512-byte sector drive.

Back to the rest of the specifics...

* ZFS is in use for the root filesystem and possibly others:

Trying to mount root from zfs:tank1/ROOT/default []...

* The system is amd64 and has 4GB RAM; ZFS prefetch is therefore
forcefully disabled:

real memory  = 4294967296 (4096 MB)
avail memory = 4073431040 (3884 MB)
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
ZFS filesystem version 5
ZFS storage pool version 28

...now, all that said, much more output is needed. I would like to see
output from the following commands:

- gpart show ada0
- gpart show ada1
- gpart show ada2
- zfs get all (keep reading for why I'm asking for this again)

And also this command, run once per pool you have on the system:

- zdb -C {poolname} | grep ashift

(For readers) I do not need "zpool status" because I've seen it here:

http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/017011.html
http://pastebin.com/D3Av7x9X

(Also for readers) I do not need "zfs get all" or "zpool get all"
because the individual has made them available here:

zfs get all   -- http://pastebin.com/4sT37VqZ
zpool get all -- http://pastebin.com/HZJTJPa2

One thing I do see is that at some point you did enable compression on
your tank1 filesystem (I can tell because "compressratio" is 1.03x
rather than 1.00x), but may have disabled it later. I'm not going to
get into a debate about this, but my advice is to not use compression
or dedup (either feature) on FreeBSD ZFS.

Finally, as I asked in another post in this thread, I would like you to
provide output from the following command (once per disk):

- smartctl -x /dev/adaX

> >Furthermore, please stop changing ATA vs. AHCI interface drivers.
> >The more you change/screw around with, the less likely people are going
> >to help. CHANGE NOTHING ON THE SYSTEM. Leave it how it is. Do not
> >fiddle with things or start flipping switches/changing settings/etc. to
> >"try and relieve the problem". You're asking other people for help,
> >which means you need to be patient and follow what we ask.
> I haven't changed one bit myself. It may have been a change of
> defaults in PC-BSD. I just asked them about it.
> Or maybe different drives use different drivers.
If AHCI is enabled in your system BIOS, FreeBSD 9.x will use AHCI with CAM. If AHCI is not enabled in your system BIOS, FreeBSD 9.x will use classic ata(4) with CAM. In both cases disks will show up as /dev/adaX, but whether one is controlled with ahcichXX or ataX depends on AHCI capability. Your initial post showed this: http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016996.html (ada0:ahcich0:0:0:0): CAM status: Command timeout (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED Which shows AHCI in use. But then later, a different post said this: http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/017011.html (ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 87 1a c7 40 1a 00 00 00 01 00 (ada1:ata0:0:0:0): CAM status: Command timeout (ada1:ata0:0:0:0): Retrying command Which shows AHCI **not** in use. FreeBSD **does not** use "different drivers per drive". No OS does this: period. This is not how storage subsystems work, nor have ever worked. If you ever see the system suddenly reporting "ataX" (read: I said "atax" not "adaX"), and you are **ABSOLUTELY CERTAIN** AHCI mode is enabled in your BIOS, then to me that means your motherboard or SATA controller is behaving very erratically/wrong. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 19:44:41 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 80828AD1 for ; Sun, 14 Apr 2013 19:44:41 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta01.emeryville.ca.mail.comcast.net (qmta01.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:16]) by mx1.freebsd.org (Postfix) with ESMTP id 63FFDF25 for ; Sun, 14 Apr 2013 19:44:41 +0000 (UTC) Received: from omta23.emeryville.ca.mail.comcast.net ([76.96.30.90]) by qmta01.emeryville.ca.mail.comcast.net with comcast id Pvah1l0051wfjNsA1vkhNK; Sun, 14 Apr 2013 19:44:41 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta23.emeryville.ca.mail.comcast.net with comcast id Pvkg1l00G1t3BNj8jvkggn; Sun, 14 Apr 2013 19:44:40 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 36D8773A33; Sun, 14 Apr 2013 12:44:40 -0700 (PDT) Date: Sun, 14 Apr 2013 12:44:40 -0700 From: Jeremy Chadwick To: Zaphod Beeblebrox Subject: Re: A failed drive causes system to hang Message-ID: <20130414194440.GB38338@icarus.home.lan> References: <516A8092.2080002@o2.pl> <9C59759CB64B4BE282C1D1345DD0C78E@multiplay.co.uk> <516AF61B.7060204@o2.pl> <20130414185117.GA38259@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1365968681; bh=0DLeLOXztL+2XP6AuRY3dQhcrDrRHJSz8iTbu3d9Bno=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=KtbYfALT8TkT9r4ArqmVd35mYKsaKk/qmhw+TTr+2TKiRUaX1kSY+cMzJvI2PFYAX bwNZ1p1NNxnvRM+WZm6hpurl1mJZJQy+x5sBBcZ4nD7488owDmXDHDVq3Nr42eACjy TZrdaWxcy/wB/4fac/PES5dS9K96fljAWKQT9Sjn7z19a31ijU8FTNTPbzOG53uwo5 hLugpJvLbcmA8ebaPXBtczcU41l1sT8y0qS0h9t1+zM219Z26pjynafOS2CnUoS0CB 1lpnVPxQZQiZ9EG8htCo7dUXovrJXDYkWJBgZxRI2PPNOtHISxo8M1OOWX6uKPGJQJ ebiy9uqHMmFbw== Cc: freebsd-fs , Radio 
=?unknown-8bit?B?bcS5P29keWNoIGJhbmR5dMQ/xT93?= , support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Apr 2013 19:44:41 -0000 On Sun, Apr 14, 2013 at 02:58:15PM -0400, Zaphod Beeblebrox wrote: > I'd like to throw in my two cents here. I've seen this (drives in RAID-1 > configuration) hanging whole systems. Back in the IDE days, two drives > were connected with one cable --- I largely wrote it off as a deficiency of > IDE hardware and resolved to by SCSI hardware for more important systems. > Of late, the physical hardware for SCSI (SAS) and SATA drives have > converged. I'm willing to accept that SAS hardware may be built to a > different standard, but I'm suspicious of the fact that a bad SATA drive on > an ACH* controller can hang the whole system. Note to readers: this is borderline off-topic and is going to confuse the thread even more. I will respond to this ONLY ONCE, and WILL NOT be responding to this part of the thread past this point. I have only seen this happen on very specific controllers (JMicron for example), where either the AHCI driver was broken/badly written, or the underlying AHCI option ROM/firmware code was broken/badly written. > ... it's not complete, however. Often pulling the drive's cable will > unfreeze things. It's also not entirely consistent. Drives I have > behind 4:1 port multipliers haven't (so far) hung the system that > they're on (which uses ACH10). Right now, I have a remote ACH10 > system that's hung hard a couple of times --- and it passes both it's > short and long SMART tests on both drives. PMPs (port multipliers) are a *completely* separate beast, where some AHCI controllers (at a silicon level) screw up/break. In fact, the IXP600/700 is one such controller, and workarounds had to be put into FreeBSD and Linux for them. I can dig up the commits if need be. Rule of thumb (which you know -- this is for other readers): when using a PM, it's VERY IMPORTANT that be disclosed up front. These add a serious complication to analysis of the SATA subsystem as a whole, and in a lot of cases visibility into details are lost as a result. PMPs in general are "bleh". > Is there no global timeout we can depend on here? Please see kern.cam.ada.default_timeout (for adaX devices) and kern.cam.pmp.default_timeout (for I/O requests going across a PMP). Otherwise Alexander Motin (mav@) would be the guy to ask about PMP issues, and/or get him hardware + provide a reliable reproduction methodology for the issue. All the above said: Respectfully, please do not conflate your issue with this one. Please start a new thread (do not reply to this thread and change the Subject line, please actually start a brand new Email to ensure no Reference headers are retained) about this issue if you wish. There is already too much crap going on in this thread with 4 different people with what are 4 different issues, and nobody at this point is able to keep track of it all (including the participants). This situation happens way, WAY too often with storage-related matters on the list. ANYTHING ZFS-related and ANYTHING storage-related results in bandwagon-jumping and threads that spiral out of control/become almost useless and certainly impossible to follow. It needs to stop. 
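[For readers: the two knobs Jeremy names just above are plain sysctls,
so a minimal check/adjustment looks like the following. A sketch only:
whether kern.cam.pmp.* exists depends on the kernel having PMP support,
and shrinking a timeout only makes a dying drive fail faster, it does
not fix it:

  sysctl kern.cam.ada.default_timeout
  sysctl kern.cam.pmp.default_timeout
  sysctl kern.cam.ada.default_timeout=10   # e.g. give up after 10s instead of 30s

The same names can be set at boot from /boot/loader.conf if the running
kernel accepts them as tunables.]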
-- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 19:52:14 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 486E6BA5 for ; Sun, 14 Apr 2013 19:52:14 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta03.emeryville.ca.mail.comcast.net (qmta03.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:32]) by mx1.freebsd.org (Postfix) with ESMTP id 2BC66F53 for ; Sun, 14 Apr 2013 19:52:14 +0000 (UTC) Received: from omta16.emeryville.ca.mail.comcast.net ([76.96.30.72]) by qmta03.emeryville.ca.mail.comcast.net with comcast id PvlN1l0061ZMdJ4A3vsEzT; Sun, 14 Apr 2013 19:52:14 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta16.emeryville.ca.mail.comcast.net with comcast id PvsC1l0031t3BNj8cvsCH4; Sun, 14 Apr 2013 19:52:13 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id F2A2873A33; Sun, 14 Apr 2013 12:52:11 -0700 (PDT) Date: Sun, 14 Apr 2013 12:52:11 -0700 From: Jeremy Chadwick To: Radio =?unknown-8bit?B?bcU/b2R5Y2ggYmFuZHl0w7N3?= Subject: Re: A failed drive causes system to hang Message-ID: <20130414195211.GA39201@icarus.home.lan> References: <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan> <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan> <51688BA6.1000507@o2.pl> <20130413000731.GA84309@icarus.home.lan> <516A8646.4000101@o2.pl> <20130414192830.GA38338@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130414192830.GA38338@icarus.home.lan> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1365969134; bh=aN5pjazc8G4sxyH4/0He8A6Tw+BncOmtEMIaLNiGkiQ=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=NJs0/JFwDP1bvE9ujhC4nVuq5ZlVhLDMNH9udeuaNiYdtnBTbESW48dLVZcp509A0 X2y/5ZdOk6UlpXGgY/54P/eoO1n82zJzHWcg5IundwgRmVx5v0kJ7wWUh7YV+FCL6E MkAJm5TwOHtuRkwv5JERNg3tI4B69zYCyLOU2o13/Kl9FW0zijMtumIYFDwbyQrlCn HlOXAR1RE+aee5b+uGsCjdeKnpJahU186Yb/vaOm2TvB3+Lx6X0F6aa7wR7ERO6maT fgUKxTx2nvh0gU/HzOuAAL3VOIGEW3z8+NEQJXzXWw4KxBBKhAnaJ5q/3P9oMb8x2v yHK+Wc1rBpRow== Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Apr 2013 19:52:14 -0000 {snipping lots for brevity} On Sun, Apr 14, 2013 at 12:28:30PM -0700, Jeremy Chadwick wrote: > On Sun, Apr 14, 2013 at 12:34:46PM +0200, Radio m?odych bandytw wrote: > > Sorry. I thought just the error was important. So here you are: > > dmesg.boot: > > http://pastebin.com/LFXPusMX > > Thank you. Please read everything I have written below before doing > anything. 
> > Based on this output, we can see the following: > > * AHCI is actively in use, and is a slowly-becoming-infamous ATI IXP700 > controller: > > ahci0: port 0xb000-0xb007,0xa000-0xa003,0x9000-0x9007,0x8000-0x8003,0x7000-0x700f mem 0xf9fffc00-0xf9ffffff irq 19 at device 17.0 on pci0 > > * The system has 3 disks attached to this controller: > > ada0 at ahcich0 bus 0 scbus2 target 0 lun 0 > ada0: ATA-8 SATA 2.x device > ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada0: Command Queueing enabled > ada0: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) > ada1 at ata0 bus 0 scbus6 target 0 lun 0 > ada1: ATA-8 SATA 2.x device > ada1: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) > ada1: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) > ada2 at ata0 bus 0 scbus6 target 1 lun 0 > ada2: ATA-8 SATA 2.x device > ada2: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) > ada2: 610480MB (1250263728 512 byte sectors: 16H 63S/T 16383C) > > Let's talk about ada0 and ada1 first. Hold up a minute -- I just noticed some key information here (see what happens with big conflated threads?), and it sheds some light on my concerns with AHCI vs. classic ata(4): ada0 -- attached to ahcich0 ada1 -- attached to ata0 (presumably a "master" drive) ada2 -- attached to ata0 (presumably a "slave" drive) This is extremely confusing, because ata0 is a classic ATA controller (I can even tell from the classic ISA I/O port ranges): atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0 ata0: at channel 0 on atapci1 ata1: at channel 1 on atapci1 Yet the WD15EARS and ST3640323AS drives are physically SATA drives. Are you using SATA-to-IDE adapters on these two drives? If not, this seems to indicate the motherboard and/or SATA controller is actually only binding 1 disk to AHCI, while the others are bound to the same controller operating in (possibly) "SATA Enhanced" mode. This would be the first I've ever seen of this (a controller operating in both modes simultaneously), but I have a lot more experience with Intel SATA controllers than I do AMD. I don't know why a system would do this, unless all of this can be controlled via the BIOS somehow. What a mess. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 20:35:54 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 33968776 for ; Sun, 14 Apr 2013 20:35:54 +0000 (UTC) (envelope-from radiomlodychbandytow@o2.pl) Received: from moh3-ve1.go2.pl (moh3-ve2.go2.pl [193.17.41.86]) by mx1.freebsd.org (Postfix) with ESMTP id A481C12C for ; Sun, 14 Apr 2013 20:35:53 +0000 (UTC) Received: from moh3-ve1.go2.pl (unknown [10.0.0.157]) by moh3-ve1.go2.pl (Postfix) with ESMTP id 4C40AAF696B for ; Sun, 14 Apr 2013 22:35:52 +0200 (CEST) Received: from unknown (unknown [10.0.0.108]) by moh3-ve1.go2.pl (Postfix) with SMTP for ; Sun, 14 Apr 2013 22:35:52 +0200 (CEST) Received: from unknown [93.175.66.185] by poczta.o2.pl with ESMTP id nEdjtI; Sun, 14 Apr 2013 22:35:44 +0200 Message-ID: <516B1315.8060408@o2.pl> Date: Sun, 14 Apr 2013 22:35:33 +0200 From: =?UTF-8?B?UmFkaW8gbcWCb2R5Y2ggYmFuZHl0w7N3?= User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130407 Thunderbird/17.0.5 MIME-Version: 1.0 To: Jeremy Chadwick Subject: Re: A failed drive causes system to hang References: <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan> <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan> <51688BA6.1000507@o2.pl> <20130413000731.GA84309@icarus.home.lan> <516A8646.4000101@o2.pl> <20130414192830.GA38338@icarus.home.lan> <20130414195211.GA39201@icarus.home.lan> In-Reply-To: <20130414195211.GA39201@icarus.home.lan> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-O2-Trust: 1, 30 X-O2-SPF: neutral Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Apr 2013 20:35:54 -0000 On 14/04/2013 21:52, Jeremy Chadwick wrote: > {snipping lots for brevity} > > On Sun, Apr 14, 2013 at 12:28:30PM -0700, Jeremy Chadwick wrote: >> On Sun, Apr 14, 2013 at 12:34:46PM +0200, Radio m?odych bandytw wrote: >>> Sorry. I thought just the error was important. So here you are: >>> dmesg.boot: >>> http://pastebin.com/LFXPusMX >> >> Thank you. Please read everything I have written below before doing >> anything. >> >> Based on this output, we can see the following: >> >> * AHCI is actively in use, and is a slowly-becoming-infamous ATI IXP700 >> controller: >> >> ahci0: port 0xb000-0xb007,0xa000-0xa003,0x9000-0x9007,0x8000-0x8003,0x7000-0x700f mem 0xf9fffc00-0xf9ffffff irq 19 at device 17.0 on pci0 >> >> * The system has 3 disks attached to this controller: >> >> ada0 at ahcich0 bus 0 scbus2 target 0 lun 0 >> ada0: ATA-8 SATA 2.x device >> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) >> ada0: Command Queueing enabled >> ada0: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) >> ada1 at ata0 bus 0 scbus6 target 0 lun 0 >> ada1: ATA-8 SATA 2.x device >> ada1: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) >> ada1: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) >> ada2 at ata0 bus 0 scbus6 target 1 lun 0 >> ada2: ATA-8 SATA 2.x device >> ada2: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) >> ada2: 610480MB (1250263728 512 byte sectors: 16H 63S/T 16383C) >> >> Let's talk about ada0 and ada1 first. 
> > Hold up a minute -- I just noticed some key information here (see what > happens with big conflated threads?), and it sheds some light on my > concerns with AHCI vs. classic ata(4): > > ada0 -- attached to ahcich0 > ada1 -- attached to ata0 (presumably a "master" drive) > ada2 -- attached to ata0 (presumably a "slave" drive) > > This is extremely confusing, because ata0 is a classic ATA controller (I > can even tell from the classic ISA I/O port ranges): > > atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0 > ata0: at channel 0 on atapci1 > ata1: at channel 1 on atapci1 > > Yet the WD15EARS and ST3640323AS drives are physically SATA drives. > > Are you using SATA-to-IDE adapters on these two drives? No. > > If not, this seems to indicate the motherboard and/or SATA controller > is actually only binding 1 disk to AHCI, while the others are bound to > the same controller operating in (possibly) "SATA Enhanced" mode. > > This would be the first I've ever seen of this (a controller operating > in both modes simultaneously), but I have a lot more experience with > Intel SATA controllers than I do AMD. > > I don't know why a system would do this, unless all of this can be > controlled via the BIOS somehow. What a mess. > I looked into BIOS and it can be controlled. 6 ports are divided into 2 triples and I can switch mode of each triple independently. One drive is connected to one and two to the other. Looks like there's a bug because both triples are set to ATA. I left them like that for now. Anyway, I got the hang again, so I can provide dmesg. I was not at the computer when it happened, so there's only the last screen though... pastebin.com/bjYtzPgs -- Twoje radio From owner-freebsd-fs@FreeBSD.ORG Sun Apr 14 21:24:42 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C953DC9 for ; Sun, 14 Apr 2013 21:24:42 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta01.emeryville.ca.mail.comcast.net (qmta01.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:16]) by mx1.freebsd.org (Postfix) with ESMTP id AA4512A7 for ; Sun, 14 Apr 2013 21:24:42 +0000 (UTC) Received: from omta02.emeryville.ca.mail.comcast.net ([76.96.30.19]) by qmta01.emeryville.ca.mail.comcast.net with comcast id PxPV1l0040QkzPwA1xQimh; Sun, 14 Apr 2013 21:24:42 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta02.emeryville.ca.mail.comcast.net with comcast id PxQg1l00F1t3BNj8NxQgnr; Sun, 14 Apr 2013 21:24:42 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 4F0D673A33; Sun, 14 Apr 2013 14:24:40 -0700 (PDT) Date: Sun, 14 Apr 2013 14:24:40 -0700 From: Jeremy Chadwick To: Radio =?unknown-8bit?B?bcU/b2R5Y2ggYmFuZHl0w7N3?= Subject: Re: A failed drive causes system to hang Message-ID: <20130414212440.GA40325@icarus.home.lan> References: <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan> <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan> <51688BA6.1000507@o2.pl> <20130413000731.GA84309@icarus.home.lan> <516A8646.4000101@o2.pl> <20130414192830.GA38338@icarus.home.lan> <20130414195211.GA39201@icarus.home.lan> <516B1315.8060408@o2.pl> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <516B1315.8060408@o2.pl> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1365974682; 
bh=tqDJomj+Mbsm94ct5aQI5JZO6ZTot4nO+yJwvUUFusE=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=LpFX2Kg8y3t6nXM+Dhr1vBtJ/OSPYdbKQqhZFqSsYMO0OVTnSEUw+MzVozTItB8Lg aHNXzixx/0WBur76N+4m6YpmbqsGiQ7hWuvM2g6CM/OLnQfJ9RrzhywvwBLIE1Z9sM 4yv+Yz2mzhIw9fODEKjZFA77ZqZ91fktBAOongnWT94rehVXxoBLEYVanls7sCs4NR iXtzxQVfxD36rQ285BUx9qDmF0qRhU78HxNTJKqh4XhL1+Up6ygcCA8k+N3E9qi4Kw Z3Y0wRKMk9RzgVQVMussUEm7ZXrXeV+ONwCuo7142UxTaa2Cw7LYQcWH9eE/th1wI/ i/TtMMiKtLP5w== Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Apr 2013 21:24:42 -0000 On Sun, Apr 14, 2013 at 10:35:33PM +0200, Radio m?odych bandytw wrote: > On 14/04/2013 21:52, Jeremy Chadwick wrote: > > {snipping lots for brevity} > > > > On Sun, Apr 14, 2013 at 12:28:30PM -0700, Jeremy Chadwick wrote: > >> On Sun, Apr 14, 2013 at 12:34:46PM +0200, Radio m?odych bandytw wrote: > >>> Sorry. I thought just the error was important. So here you are: > >>> dmesg.boot: > >>> http://pastebin.com/LFXPusMX > >> > >> Thank you. Please read everything I have written below before doing > >> anything. > >> > >> Based on this output, we can see the following: > >> > >> * AHCI is actively in use, and is a slowly-becoming-infamous ATI IXP700 > >> controller: > >> > >> ahci0: port 0xb000-0xb007,0xa000-0xa003,0x9000-0x9007,0x8000-0x8003,0x7000-0x700f mem 0xf9fffc00-0xf9ffffff irq 19 at device 17.0 on pci0 > >> > >> * The system has 3 disks attached to this controller: > >> > >> ada0 at ahcich0 bus 0 scbus2 target 0 lun 0 > >> ada0: ATA-8 SATA 2.x device > >> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > >> ada0: Command Queueing enabled > >> ada0: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) > >> ada1 at ata0 bus 0 scbus6 target 0 lun 0 > >> ada1: ATA-8 SATA 2.x device > >> ada1: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) > >> ada1: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) > >> ada2 at ata0 bus 0 scbus6 target 1 lun 0 > >> ada2: ATA-8 SATA 2.x device > >> ada2: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) > >> ada2: 610480MB (1250263728 512 byte sectors: 16H 63S/T 16383C) > >> > >> Let's talk about ada0 and ada1 first. > > > > Hold up a minute -- I just noticed some key information here (see what > > happens with big conflated threads?), and it sheds some light on my > > concerns with AHCI vs. classic ata(4): > > > > ada0 -- attached to ahcich0 > > ada1 -- attached to ata0 (presumably a "master" drive) > > ada2 -- attached to ata0 (presumably a "slave" drive) > > > > This is extremely confusing, because ata0 is a classic ATA controller (I > > can even tell from the classic ISA I/O port ranges): > > > > atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0 > > ata0: at channel 0 on atapci1 > > ata1: at channel 1 on atapci1 > > > > Yet the WD15EARS and ST3640323AS drives are physically SATA drives. > > > > Are you using SATA-to-IDE adapters on these two drives? > No. > > > > If not, this seems to indicate the motherboard and/or SATA controller > > is actually only binding 1 disk to AHCI, while the others are bound to > > the same controller operating in (possibly) "SATA Enhanced" mode. 
> >
> > This would be the first I've ever seen of this (a controller operating
> > in both modes simultaneously), but I have a lot more experience with
> > Intel SATA controllers than I do AMD.
> >
> > I don't know why a system would do this, unless all of this can be
> > controlled via the BIOS somehow. What a mess.
> >
> I looked into BIOS and it can be controlled. 6 ports are divided into 2
> triples and I can switch mode of each triple independently. One drive is
> connected to one and two to the other.
> Looks like there's a bug because both triples are set to ATA.
> I left them like that for now.

What exact motherboard model is this? I'd like to review the manual.

> Anyway, I got the hang again, so I can provide dmesg. I was not at the
> computer when it happened, so there's only the last screen though...
> pastebin.com/bjYtzPgs

Thank you. Sadly the log snippet doesn't have timestamps, but this is
what transpired:

The log snippet you showed indicates the following:

* An NCQ-based write CDB (WRITE_FPDMA_QUEUED) was issued to the ada0
  drive attached to channel ahcich0 of controller ahci0, and the disk or
  controller did not respond within 30 seconds (I'm assuming PC-BSD did
  not change kern.cam.ada.default_timeout from the default of 30 seconds)

* The same request was resubmitted to the controller (CAM will try
  submission of a CDB up to 5 times (i.e. 4 retries), which is
  controlled with kern.cam.ada.retry_count).

* The AHCI controller (rather the specific channel of the AHCI
  controller) also reported that the underlying disk/device was not
  responding (re: "Timeout on slot X port X"). I see no SERR condition.

* An ATA_IDENTIFY CDB was issued to the ada0 drive attached to channel
  ahcich0 of controller ahci0, and this also timed out after 30 seconds.
  My gut feeling is that this system is running smartd(8); it's possible
  the kernel itself could submit the CDB to the drive, but in this
  condition/state I don't know why it'd do that.

* Rinse, lather, repeat.

To me, at first glance, this looks like the ada0 disk is going
catatonic. The controller itself seems to be responding fine, just that
the disk attached to ahcich0 is locking up hard. I see no sign of an
AHCI reset ("AHCI reset..." message) either.

So why does your system "hang" (meaning why can't you log in, why do
applications stop working, etc.) when this happens? Simple:

You're using ZFS for your root filesystem, as shown here:

Trying to mount root from zfs:tank1/ROOT/default []...

Your ZFS pool called tank1 consists of a raidz1 pool of 3 devices (more
specifically partitions): ada0, ada1, and something that is missing.
Recap: http://pastebin.com/D3Av7x9X

The pool is already degraded, and as you know, raidz1 can only suffer
the loss of one vdev (in this case a disk) before ZFS will begin
behaving based on what the pool's "failmode" property is. In effect,
when this happens, you're down to only 1 disk: ada1, and that's not
sufficient.

So ZFS does exactly what it should (with failmode=wait, the default):
it waits indefinitely, hoping that things recover.

Because this is your root filesystem, as well as tons of other
filesystems (including /usr, /var, /var/log, and so on):
http://pastebin.com/4sT37VqZ

...any I/O submitted to filesystems that are part of pool tank1 will
block/wait indefinitely until things recover. Except they don't recover
(and that isn't the fault of ZFS).
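[For readers: the failmode behaviour described here is a per-pool
property, so it can be inspected and changed with zpool(8). A sketch,
using the pool name from this thread; changing it does not fix the
underlying hang, it only changes how ZFS reacts to one:

  zpool get failmode tank1
  NAME   PROPERTY  VALUE  SOURCE
  tank1  failmode  wait   default

  zpool set failmode=continue tank1   # EIO on outstanding writes; later I/O still blocks
  zpool set failmode=panic tank1      # immediate kernel panic instead]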
I imagine if you let the system sit for roughly 5*30 seconds (see above for how I calculated that), you would eventually see a message on the console that looks something like this: (ada0:ahcich0:0:0:0): lost device (ada0:ahcich0:0:0:0): removing device entry So, the crux of your problem is: 1. Your disks fall off the bus for reasons unknown at this time -- I'm still waiting on smartctl -x output for each of your disks. Your disks themselves may actually be okay (I need to review the output to determine that) and the issue may be with SATA cabling, a faulty PSU, or a completely broken SATA controller or motherboard (bad traces, etc.). I am not going to help diagnose those problems, because the only reliable method is to start replacing each part, piece by piece, and see if the issue goes away. 2. Your array is already degraded/broken yet you don't care to fix it. If the array was in decent shape and ada0 fell off the bus, things would still work because ada1 and ada2 would be functioning (re: raidz1). If you were using UFS instead of ZFS for your root filesystem, you would still have the same issue, just that the system would kernel panic. You can induce that behaviour with ZFS as well using failmode=panic. There isn't much more for me to say. Everything is behaving how it's designed, from what I can tell. When you lose your root filesystem, you really can't expect the system to be in some "magical usable state". -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Mon Apr 15 01:32:10 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 085512C4 for ; Mon, 15 Apr 2013 01:32:10 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 86B7A9F8 for ; Mon, 15 Apr 2013 01:32:08 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqIEAJNXa1GDaFvO/2dsb2JhbABQgzyDML4RgR50gh8BAQEDAQEBASArIAsFFhgCAg0ZAikBCRgBDQYIBwQBHASHbQYMqCmRXYEjjEJ+NAeCLoETA5M4gQyCQYEhj3CDJyAygQU1 X-IronPort-AV: E=Sophos;i="4.87,472,1363147200"; d="scan'208";a="23873831" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 14 Apr 2013 21:32:01 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 575C1B3F15; Sun, 14 Apr 2013 21:32:01 -0400 (EDT) Date: Sun, 14 Apr 2013 21:32:01 -0400 (EDT) From: Rick Macklem To: Paul van der Zwan Message-ID: <1091296771.826148.1365989521302.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <2B576479-C83A-4D3F-B486-475625383E9C@vanderzwan.org> Subject: Re: FreeBSD 9.1 NFSv4 client attribute cache not caching ? 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Apr 2013 01:32:10 -0000 Paul van der Zwan wrote: > On 14 Apr 2013, at 5:00 , Rick Macklem wrote: > > > Thanks for taking the effort to send such an extensive reply. > > > Paul van der Zwan wrote: > >> On 12 Apr 2013, at 16:28 , Paul van der Zwan > >> wrote: > >> > >>> > >>> I am running a few VirtualBox VMs with 9.1 on my OpenIndiana > >>> server > >>> and I noticed that make buildworld seem to take much longer > >>> when the clients mount /usr/src and /usr/obj over NFS V4 than when > >>> they use V3. > >>> Unfortunately I have to use V4 as a buildworld on V3 hangs the > >>> server completely... > >>> I noticed the number of PUTFH/GETATTR/GETFH calls in in the order > >>> of > >>> a few thousand per second > >>> and if I snoop the traffic I see the same filenames appear over > >>> and > >>> over again. > >>> It looks like the client is not caching anything at all and using > >>> a > >>> server request everytime. > >>> I use the default mount options: > >>> 192.168.178.24:/data/ports on /usr/ports (nfs, nfsv4acls) > >>> 192.168.178.24:/data/src on /usr/src (nfs, nfsv4acls) > >>> 192.168.178.24:/data/obj on /usr/obj (nfs, nfsv4acls) > >>> > >>> > >> > >> I had a look with dtrace > >> $ sudo dtrace -n '::getattr:start { @[stack()]=count();}' > >> and it seems the vast majority of the calls to getattr are from > >> open() > >> and close() system calls.: > >> kernel`newnfs_request+0x631 > >> kernel`nfscl_request+0x75 > >> kernel`nfsrpc_getattr+0xbe > >> kernel`nfs_getattr+0x280 > >> kernel`VOP_GETATTR_APV+0x74 > >> kernel`nfs_lookup+0x3cc > >> kernel`VOP_LOOKUP_APV+0x74 > >> kernel`lookup+0x69e > >> kernel`namei+0x6df > >> kernel`kern_execve+0x47a > >> kernel`sys_execve+0x43 > >> kernel`amd64_syscall+0x3bf > >> kernel`0xffffffff80784947 > >> 26 > >> > >> kernel`newnfs_request+0x631 > >> kernel`nfscl_request+0x75 > >> kernel`nfsrpc_getattr+0xbe > >> kernel`nfs_close+0x3e9 > >> kernel`VOP_CLOSE_APV+0x74 > >> kernel`kern_execve+0x15c5 > >> kernel`sys_execve+0x43 > >> kernel`amd64_syscall+0x3bf > >> kernel`0xffffffff80784947 > >> 26 > >> > >> kernel`newnfs_request+0x631 > >> kernel`nfscl_request+0x75 > >> kernel`nfsrpc_getattr+0xbe > >> kernel`nfs_getattr+0x280 > >> kernel`VOP_GETATTR_APV+0x74 > >> kernel`nfs_lookup+0x3cc > >> kernel`VOP_LOOKUP_APV+0x74 > >> kernel`lookup+0x69e > >> kernel`namei+0x6df > >> kernel`vn_open_cred+0x330 > >> kernel`vn_open+0x1c > >> kernel`kern_openat+0x207 > >> kernel`kern_open+0x19 > >> kernel`sys_open+0x18 > >> kernel`amd64_syscall+0x3bf > >> kernel`0xffffffff80784947 > >> 2512 > >> > >> kernel`newnfs_request+0x631 > >> kernel`nfscl_request+0x75 > >> kernel`nfsrpc_getattr+0xbe > >> kernel`nfs_close+0x3e9 > >> kernel`VOP_CLOSE_APV+0x74 > >> kernel`vn_close+0xee > >> kernel`vn_closefile+0xff > >> kernel`_fdrop+0x3a > >> kernel`closef+0x332 > >> kernel`kern_close+0x183 > >> kernel`sys_close+0xb > >> kernel`amd64_syscall+0x3bf > >> kernel`0xffffffff80784947 > >> 2530 > >> > >> I had a look at the source of nfs_close and could not find a call > >> to > >> nfsrpc_getattr, and I am wondering why close would be calling > >> getattr > >> anyway. 
> >> If the file is closed what do we care about it's attributes....
> >>
> > Here are some random statements w.r.t. NFSv3 vs NFSv4 that might help
> > with an understanding of what is going on. I do address the specific
> > case of nfs_close() towards the end. (It is kinda long winded, but I
> > threw out eveything I could think of..)
> >
> > NFSv3 doesn't have any open/close RPC, but NFSv4 does have Open and
> > Close operations.
> >
> > In NFSv3, each RPC is defined and usually includes attributes for files
> > before and after the operation (implicit getattrs not counted in the RPC
> > counts reported by nfsstat).
> >
> > For NFSv4, every RPC is a compound built up of a list of Operations like
> > Getattr. Since the NFSv4 server doesn't know what the compound is doing,
> > nfsstat reports the counts of Operations for the NFSv4 server, so the
> > counts will be much higher than with NFSv3, but do not reflect the
> > number of RPCs being done.
> > To get NFSv4 nfsstat output that can be compared to NFSv3, you need to
> > do the command on the client(s) and it still is only roughly the same.
> > (I just realized this should be documented in man nfsstat.)
> >
> I ran nfsstat -s -v 4 on the server and saw the number of requests
> being done.
> They were in the order of a few thousand per second for a single
> FreeBSD 9.1 client doing a make build world.
>
Yes, but as I noted above, for NFSv4, these are counts of operations,
not RPCs. Each RPC in NFSv4 consists of several operations. For example,
for read it is something like:
- PutFH, Read, Getattr

As such, you need to do "nfsstat -e -c" on the client in order to
see how many RPCs are happening.

> > For the FreeBSD NFSv4 client, the compounds include Getattr operations
> > similar to what NFSv3 does. It doesn't do a Getattr on the directory
> > for Lookup, because that would have made the compound much more complex.
> > I don't think this will have a significant performance impact, but will
> > result in some additional Getattr RPCs.
> >
> I ran snoop on port 2049 on the server and I saw a large number of
> lookups.
> A lot of them seem to be for directories which are part of the
> filenames of the compiler and include files which are on the nfs
> mounted /usr/obj.
> The same names keep reappearing so it looks like there is no caching
> being done on the client.
>
Well, the name caching code is virtually identical to what is used for
NFSv3 and I have compared RPC counts (using client stats) in the past
(some while ago), to see if they are comparable.
A name cache entry (like everything in NFS caching) is only valid for
some amount of time (there are mount options for adjusting the cache
timeouts). Now, I'm not saying it isn't broken. I'll take a look when I
get home, and it is also rather hard to tell when it is broken. Since
NFS has no cache coherency protocol, any amount of caching can break
correctness when files are being modified by another client. The longer
you cache, the more likely you are to see a breakage.

> > I suspect the slowness is caused by the extra overhead of doing the
> > Open/Close operations against the server. The only way to avoid doing
> > these against the server for NFSv4 is to enable delegations in both
> > client and server. How to do this is documented in "man nfsv4".
> > Basically starting up the nfscbd in the client and setting:
> > vfs.nfsd.issue_delegations=1
> > in the server.
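(As a concrete sketch of the above, assuming a FreeBSD server; an
OpenIndiana server has its own way of enabling delegations, so treat
this as illustrative only:

  # client side: run the NFSv4 callback daemon
  echo 'nfscbd_enable="YES"' >> /etc/rc.conf
  /etc/rc.d/nfscbd start

  # server side (FreeBSD): allow nfsd to issue delegations
  sysctl vfs.nfsd.issue_delegations=1

The callback daemon must be running on the client or the server will
never hand out delegations; "man nfsv4" covers the details.)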
> >
> > Specifically for nfs_close(), the attributes (modify time) are used
> > for what is called "close to open consistency". This can be
> > disabled by the "nocto" mount option, if you don't need it for your
> > build environment. (You only need it if one client is writing a file
> > and then another client is reading the same file.)
> >
> I tried the nocto option in /etc/fstab but it does not show when mount
> shows the mounted filesystems so I am not sure if it is being used.

Head (and I think stable9) is patched so that "nfsstat -m" shows
all the options actually being used. For 9.1, you just have to trust
that it has been set.

> On the server netstat shows an active connection to port 7745 on the
> client but snoop shows no data flowing on that session.
>
That connection is the callback path and is only used when delegations
are in use. You can check to see if the server is issuing delegations
via "nfsstat -e -s" on the server and looking at the delegation count,
to see if it is greater than 0. Remember that you must enable
delegations on the server by setting the sysctl:
vfs.nfsd.issue_delegations=1

> > Both the attribute caching and close to open consistency algorithms
> > in the client are essentially the same for NFSv3 vs NFSv4.
> >
> > The NFSv4 Close operation(s) are actually done when the v_usecount for
> > the vnode goes to 0, since mmap'd files can do I/O on pages after
> > the close syscall. As such, they are only loosely related to the close
> > syscall. They are actually closing Windows style Openlock(s).
> >
> I had a look at the code of the NFS v4 client of Illumos (which is
> basically what my server is running) and as far as I understand it
> they only do the getattr when the close was for a file that was opened
> for write and when there was actually something written to the file.
> The FreeBSD code seems to do the getattr for all close() calls.
> For files that were never written, like executables or source files,
> that seems to cause quite a lot of overhead.
>
Well, what would happen for the Illumos client if another client had
just written to the file and closed it just before Illumos opens it
for reading? For cto, the client needs to see an up to date modify
time for the close to open consistency check to work, including opens
for reading.
As hinted at above, there is no correct answer to these questions. It
is all about correctness vs caching for better performance. (That's
why jhb added the nocto option to turn it off if you don't need this
to work correctly.)

> > You mention that you see the same file over and over in a packet trace.
> > You don't give specifics, but I'd suggest that you look at both NFSv3
> > and NFSv4 for this (and file names are in lookups, not getattrs).
> >
> > I'd suggest you try enabling delegations in both client and server, plus
> > trying the "nocto" mount option and see if that helps.
> >
> Tried it but it does not seem to make any noticeable difference.
>
> I tried a make buildworld buildkernel with /usr/obj a local FS in the
> Vbox VM that completed in about 2 hours. With /usr/obj on an NFS v4
> filesystem it takes about a day. A twelve-fold increase in elapsed
> time makes using NFSv4 unusable for this use case.

Source builds on NFS mounts are notoriously slow. A big part of this is
the synchronous writes that get done because there is only one dirty
byte range for a block and the loader loves to write small
non-contiguous areas of its output file.
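(To make that write pattern concrete, a toy sketch -- the file name is
made up, and this only imitates the loader's behaviour, it is not taken
from it:

  # two small writes into the same block with a gap between them
  dd if=/dev/zero of=/usr/obj/demo.bin bs=16 count=1 conv=notrunc
  dd if=/dev/zero of=/usr/obj/demo.bin bs=16 count=1 oseek=100 conv=notrunc

With only one dirty byte range per buffer, the client has to push the
first range to the server synchronously before it can record the
second, which is where the extra synchronous writes come from.)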
I have a patch that extends this to a list of dirty byte ranges, but it
has not been committed to head. I should try and get back to it this
summer.

> Too bad the server hangs when I use nfsv3 mount for /usr/obj.

Try this mount command:
mount -t nfs -o nfsv3,nolockd ...
(I do builds of the src tree NFS mounted, so the only reason I can
think that it would hang would be a rpc.lockd issue.)
If this works, I suspect it will still be slow, but it would be nice to
find out how much slower NFSv4 is for your case.

rick

> Having a shared /usr/obj makes it possible to run a make buildworld on
> a single VM
> and just run make installworld on the others.
>
> Paul
>
> > rick
> >
> >>
> >> Paul
> >>
> >> _______________________________________________
> >> freebsd-fs@freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> >> To unsubscribe, send any mail to
> >> "freebsd-fs-unsubscribe@freebsd.org"
> >

From owner-freebsd-fs@FreeBSD.ORG Mon Apr 15 06:24:43 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id EC179899 for ; Mon, 15 Apr 2013 06:24:43 +0000 (UTC) (envelope-from radiomlodychbandytow@o2.pl) Received: from moh2-ve2.go2.pl (moh2-ve2.go2.pl [193.17.41.200]) by mx1.freebsd.org (Postfix) with ESMTP id 682846B7 for ; Mon, 15 Apr 2013 06:24:43 +0000 (UTC) Received: from moh2-ve2.go2.pl (unknown [10.0.0.200]) by moh2-ve2.go2.pl (Postfix) with ESMTP id CA82FB010E1 for ; Mon, 15 Apr 2013 08:24:26 +0200 (CEST) Received: from unknown (unknown [10.0.0.108]) by moh2-ve2.go2.pl (Postfix) with SMTP for ; Mon, 15 Apr 2013 08:24:26 +0200 (CEST) Received: from unknown [93.175.66.185] by poczta.o2.pl with ESMTP id IbXflU; Mon, 15 Apr 2013 08:24:26 +0200 Message-ID: <516B9D19.2030909@o2.pl> Date: Mon, 15 Apr 2013 08:24:25 +0200 From: =?UTF-8?B?UmFkaW8gbcWCb2R5Y2ggYmFuZHl0w7N3?= User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130407 Thunderbird/17.0.5 MIME-Version: 1.0 To: Jeremy Chadwick Subject: Re: A failed drive causes system to hang References: <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan> <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan> <51688BA6.1000507@o2.pl> <20130413000731.GA84309@icarus.home.lan> <516A8646.4000101@o2.pl> <20130414192830.GA38338@icarus.home.lan> <20130414195211.GA39201@icarus.home.lan> <516B1315.8060408@o2.pl> <20130414212440.GA40325@icarus.home.lan> In-Reply-To: <20130414212440.GA40325@icarus.home.lan> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-O2-Trust: 1, 35 X-O2-SPF: neutral Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Apr 2013 06:24:44 -0000

On 14/04/2013 23:24, Jeremy Chadwick wrote:
> On Sun, Apr 14, 2013 at 10:35:33PM +0200, Radio młodych bandytów wrote:
>> On 14/04/2013 21:52, Jeremy Chadwick wrote:
>>> {snipping lots for brevity}
>>>
>>> On Sun, Apr 14, 2013 at 12:28:30PM -0700, Jeremy Chadwick wrote:
>>>> On Sun, Apr 14, 2013 at 12:34:46PM +0200, Radio młodych bandytów wrote:
>>>>> Sorry. I thought just the error was important. So here you are:
>>>>> dmesg.boot:
>>>>> http://pastebin.com/LFXPusMX
>>>>
>>>> Thank you. Please read everything I have written below before doing
>>>> anything.
>>>> >>>> Based on this output, we can see the following: >>>> >>>> * AHCI is actively in use, and is a slowly-becoming-infamous ATI IXP700 >>>> controller: >>>> >>>> ahci0: port 0xb000-0xb007,0xa000-0xa003,0x9000-0x9007,0x8000-0x8003,0x7000-0x700f mem 0xf9fffc00-0xf9ffffff irq 19 at device 17.0 on pci0 >>>> >>>> * The system has 3 disks attached to this controller: >>>> >>>> ada0 at ahcich0 bus 0 scbus2 target 0 lun 0 >>>> ada0: ATA-8 SATA 2.x device >>>> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) >>>> ada0: Command Queueing enabled >>>> ada0: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) >>>> ada1 at ata0 bus 0 scbus6 target 0 lun 0 >>>> ada1: ATA-8 SATA 2.x device >>>> ada1: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) >>>> ada1: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) >>>> ada2 at ata0 bus 0 scbus6 target 1 lun 0 >>>> ada2: ATA-8 SATA 2.x device >>>> ada2: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) >>>> ada2: 610480MB (1250263728 512 byte sectors: 16H 63S/T 16383C) >>>> >>>> Let's talk about ada0 and ada1 first. >>> >>> Hold up a minute -- I just noticed some key information here (see what >>> happens with big conflated threads?), and it sheds some light on my >>> concerns with AHCI vs. classic ata(4): >>> >>> ada0 -- attached to ahcich0 >>> ada1 -- attached to ata0 (presumably a "master" drive) >>> ada2 -- attached to ata0 (presumably a "slave" drive) >>> >>> This is extremely confusing, because ata0 is a classic ATA controller (I >>> can even tell from the classic ISA I/O port ranges): >>> >>> atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0 >>> ata0: at channel 0 on atapci1 >>> ata1: at channel 1 on atapci1 >>> >>> Yet the WD15EARS and ST3640323AS drives are physically SATA drives. >>> >>> Are you using SATA-to-IDE adapters on these two drives? >> No. >>> >>> If not, this seems to indicate the motherboard and/or SATA controller >>> is actually only binding 1 disk to AHCI, while the others are bound to >>> the same controller operating in (possibly) "SATA Enhanced" mode. >>> >>> This would be the first I've ever seen of this (a controller operating >>> in both modes simultaneously), but I have a lot more experience with >>> Intel SATA controllers than I do AMD. >>> >>> I don't know why a system would do this, unless all of this can be >>> controlled via the BIOS somehow. What a mess. >>> >> I looked into BIOS and it can be controlled. 6 ports are divided into 2 >> triples and I can switch mode of each triple independently. One drive is >> connected to one and two to the other. >> Looks like there's a bug because both triples are set to ATA. >> I left them like that for now. > > What exact motherboard model is this? I'd like to review the manual. > >> Anyway, I got the hang again, so I can provide dmesg. I was not at the >> computer when it happened, so there's only the last screen though... >> pastebin.com/bjYtzPgs > > Thank you. Sadly the log snippet doesn't have timestamps but this is > what transpired: > > The log snippet you showed indicates the following: > > * An NCQ-based write CDB (WRITE_FPDMA_QUEUED) was issued to the > ada0 drive attached to channel ahcich0 of controller ahci0, and > the disk or controller did not respond within 30 seconds (I'm > assuming PC-BSD did not change kern.cam.ada.default_timeout from > the default of 30 seconds) Checked: default. > > * The same request was resubmit to the controller (CAM will try > submission of a CDB up to 5 times (i.e. 
4 retries), which is > controlled with kern.cam.ada.retry_count). > Checked: default. > * The AHCI controller (rather the specific channel of the AHCI > controller) also reported that the underlying disk/device was > not responding (re: "Timeout on slot X port X"). I see no > SERR condition. > > * An ATA_IDENTIFY CDB was issued to the ada0 drive attached to > channel ahcich0 of controller ahci0, and this also timed out > after 30 seconds. My gut feeling is that this system is > running smartd(8); it's possible the kernel itself could submit > the CDB to the drive, but in this condition/state I don't know > why it'd do that. Nope, smartd doesn't run. > > * Rinse lather repeat. > > To me, at first glance, this looks like the ada0 disk is going > catatonic. The controller itself seems to be responding fine, just that > the disk attached to ahcich0 is locking up hard. I see no sign of an > AHCI reset ("AHCI reset..." message) either. > > So why does your system "hang" (meaning why can't you log in, why do > applications stop working, etc.) when this happens? Simple: > > You're using ZFS for your root filesystem, as shown here: > > Trying to mount root from zfs:tank1/ROOT/default []... > > Your ZFS pool called tank1 consists of a raidz1 pool of 3 devices > (more specifically partitions): ada0, ada1, and something that is > missing. Recap: > > http://pastebin.com/D3Av7x9X > > The pool is already degraded, and as you know, raidz1 can only suffer up > to loss of one vdev (in this case a disk) before ZFS will begin behaving > based on what the pool's "failmode" property is. > > In effect, when this happens, you're down to only 1 disk: ada1, and > that's not sufficient. So ZFS does exactly what it should (with > failmode=wait, the default): it waits indefinitely, hoping that things > recover. > > Because this is your root filesystem, as well as tons of other > filesystems (including /usr, /var, /var/log, and so on): > > http://pastebin.com/4sT37VqZ > > ...any I/O submit to filesystems part of pool tank1 will indefinitely > block/wait until things recover. Except they don't recover (and that > isn't the fault of ZFS). > > I imagine if you let the system sit for roughly 5*30 seconds (see above > for how I calculated that), you would eventually see a message on the > console that looks something like this: > > (ada0:ahcich0:0:0:0): lost device > (ada0:ahcich0:0:0:0): removing device entry I don't think so. I'm nearly sure it took me longer than that to write the errors down alone. And when I discovered the system lockup it was in such state already. The next time I can take precise time measurements. > > So, the crux of your problem is: > > 1. Your disks fall off the bus for reasons unknown at this time -- I'm > still waiting on smartctl -x output for each of your disks. Your disks > themselves may actually be okay (I need to review the output to > determine that) and the issue may be with SATA cabling, a faulty PSU, or > a completely broken SATA controller or motherboard (bad traces, etc.). > I am not going to help diagnose those problems, because the only > reliable method is to start replacing each part, piece by piece, and see > if the issue goes away. > > 2. Your array is already degraded/broken yet you don't care to fix it. > If the array was in decent shape and ada0 fell off the bus, things would > still work because ada1 and ada2 would be functioning (re: raidz1). 
> > If you were using UFS instead of ZFS for your root filesystem, you would > still have the same issue, just that the system would kernel panic. You > can induce that behaviour with ZFS as well using failmode=panic. > > There isn't much more for me to say. Everything is behaving how it's > designed, from what I can tell. > > When you lose your root filesystem, you really can't expect the system > to be in some "magical usable state". > The disk is out of the array because I didn't put it back after RMA. I RMA'd it because it used to cause precisely this kind of lockups on a non-degraded array. And I've seen it in the installer running from an entirely different device too. I guess that after reproducing the issue and taking time measurements, I should put the RMA'd drive back. I expect the problem to keep happening. -- Twoje radio From owner-freebsd-fs@FreeBSD.ORG Mon Apr 15 07:11:53 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6FBD7EC0 for ; Mon, 15 Apr 2013 07:11:53 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: from mail-vc0-f171.google.com (mail-vc0-f171.google.com [209.85.220.171]) by mx1.freebsd.org (Postfix) with ESMTP id 323C18B0 for ; Mon, 15 Apr 2013 07:11:52 +0000 (UTC) Received: by mail-vc0-f171.google.com with SMTP id ha12so3571713vcb.30 for ; Mon, 15 Apr 2013 00:11:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=afVyt8kwLlcc4OxZUOw3rc18xJVOOYIy0xAvUE8/+Ac=; b=ueLxWq5yaENF4yuT6LpaMFd3WTrPNlbiNFjEf79t/draMPz9qqP2B8Exao1MlIR07e QYP6RRCza7L56KamZzez7j5Hbrabiv8AChmV3B5hmU3M8hba6pillwx+kH/qWHYT8QhD fxHy9VL43bQg2W3TV8kFnRqDUf1W4HLWutvtv39HVy/kNpUhZfrt74JHbbnDF9h78I4d xbAyrLrlfwDPXPO70fskkqK+ilLZ37E8yw7Hk32bcV6DtFXrd7ppm6BmCobkRHbHKPLI OVBkTKiGyxWz/yF+hJl935NHdUZ/VumfcitU15wccSAsqLdrTOgL+kHsj9SLkQNDbtKN dZ4A== MIME-Version: 1.0 X-Received: by 10.52.75.8 with SMTP id y8mr12937656vdv.2.1366009912224; Mon, 15 Apr 2013 00:11:52 -0700 (PDT) Received: by 10.220.91.83 with HTTP; Mon, 15 Apr 2013 00:11:52 -0700 (PDT) In-Reply-To: <20130414194440.GB38338@icarus.home.lan> References: <516A8092.2080002@o2.pl> <9C59759CB64B4BE282C1D1345DD0C78E@multiplay.co.uk> <516AF61B.7060204@o2.pl> <20130414185117.GA38259@icarus.home.lan> <20130414194440.GB38338@icarus.home.lan> Date: Mon, 15 Apr 2013 03:11:52 -0400 Message-ID: Subject: Re: A failed drive causes system to hang From: Zaphod Beeblebrox To: Jeremy Chadwick Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs , =?ISO-8859-1?B?UmFkaW8gbcS5P29keWNoIGJhbmR5dMQ/xT93?= , support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Apr 2013 07:11:53 -0000 If I was using my plain old UN*X mailer, I'd try to honor your request for a new thread (by editing the headers)... but I don't see any method by which google allows this. Anyways... rather than discuss my (admittedly vague) "me too" on the drive issue, I'd like to comment on the meta issue you raise. 
On Sun, Apr 14, 2013 at 3:44 PM, Jeremy Chadwick wrote:

> There is already too much crap going on in this thread with 4 different
> people with what are 4 different issues, and nobody at this point is
> able to keep track of it all (including the participants).
>
> This situation happens way, WAY too often with storage-related matters
> on the list. ANYTHING ZFS-related and ANYTHING storage-related results
> in bandwagon-jumping and threads that spiral out of control/become
> almost useless and certainly impossible to follow. It needs to stop.
>

I think what's happening here is that the whole storage subsystem is
(at this point) good enough that the people who still have problems are
hitting fairly obscure but serious corner cases... and since there
isn't much hardware advice from core anymore, the sufferers end up
conflating their issues, because general experience leaves everyone
thinking there are very few issues left.

When I say hardware advice... regular list readers might pick up on
hardware opinions dropped here, but they are easy to miss and they
remain uncollected. Worse, when software workarounds appear or fixed
hardware revisions ship, that is never reflected anywhere either. Some
driver man pages make statements about hardware capabilities... but
other hardware has none.

... and since I'm saying this, I'll volunteer...

We need, for each class of hardware, a simple table of information. As
an example, the columns for block storage might be (a strawman row
follows at the end of this message):

- chipset (list)
- driver (name)
- hot swap (y/n)
- known to hang on drive failures (y/n)
- pmp (y/n, 1:n)
- queuing (type)
- block sizes (512, 4k, ...)
- relative performance (cpu heavy, scatter-gather, etc)
- memory support (32 bit, 64 bit, bounce buffers)
- "recommended"

Similar lists can easily be generated for NICs, motherboards, video (a
particular mess) and whatnot.

There isn't much incentive for a computer retailer to put together
known-good hardware, as the list of components could then easily be
bought elsewhere ... undercutting the margin. So it seems to me that
this knowledge needs to be fostered inside the community itself.

So... what am I volunteering for? I would be happy to maintain a
portion of the FreeBSD wiki with hardware information in this form,
from components right up to systems, but I would need input from the
driver writers ... who are in the best position to know ... what works
and what doesn't.
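To make that concrete, a strawman row for the block storage table;
every value below is purely illustrative (including the made-up chipset
name), not a claim about real hardware:

  chipset     driver  hotswap  hangs  pmp      queuing  blocks  recommended
  ExampleHBA  ahci    y        n      y (1:5)  NCQ      512/4k  y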
From owner-freebsd-fs@FreeBSD.ORG Mon Apr 15 10:28:40 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 2DA33E1D for ; Mon, 15 Apr 2013 10:28:40 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx07.syd.optusnet.com.au (fallbackmx07.syd.optusnet.com.au [211.29.132.9]) by mx1.freebsd.org (Postfix) with ESMTP id 9DFD91CE for ; Mon, 15 Apr 2013 10:28:39 +0000 (UTC) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r3FASVmN005203 for ; Mon, 15 Apr 2013 20:28:31 +1000 Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r3FASITk007638 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 15 Apr 2013 20:28:20 +1000 Date: Mon, 15 Apr 2013 20:28:18 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem Subject: Re: FreeBSD 9.1 NFSv4 client attribute cache not caching ? In-Reply-To: <1091296771.826148.1365989521302.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20130415184639.V1081@besplex.bde.org> References: <1091296771.826148.1365989521302.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=Ov0XUFDt c=1 sm=1 a=xj4t0lYZ87oA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=r-ufCDgtFiYA:10 a=jyafP8MNAAAA:8 a=gzcLvKzMasgLP25ndJgA:9 a=CjuIK1q_8ugA:10 a=gmjzRuXrkl8A:10 a=nRcZRO9L01eZmAkF:21 a=88WQ7wBtLHMueRgm:21 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Apr 2013 10:28:40 -0000 On Sun, 14 Apr 2013, Rick Macklem wrote: > Paul van der Zwan wrote: >> On 14 Apr 2013, at 5:00 , Rick Macklem wrote: >> >> Thanks for taking the effort to send such an extensive reply. >> >>> Paul van der Zwan wrote: >>>> On 12 Apr 2013, at 16:28 , Paul van der Zwan >>>> wrote: > ... >>> In NFSv3, each RPC is defined and usually includes attributes for >>> files >>> before and after the operation (implicit getattrs not counted in the >>> RPC >>> counts reported by nfsstat). >>> >>> For NFSv4, every RPC is a compound built up of a list of Operations >>> like >>> Getattr. Since the NFSv4 server doesn't know what the compound is >>> doing, >>> nfsstat reports the counts of Operations for the NFSv4 server, so >>> the counts >>> will be much higher than with NFSv3, but do not reflect the number >>> of RPCs being done. >>> To get NFSv4 nfsstat output that can be compared to NFSv3, you need >>> to >>> do the command on the client(s) and it still is only roughly the >>> same. >>> (I just realized this should be documented in man nfsstat.) >>> >> I ran nfsstat -s -v 4 on the server and saw the number of requests >> being done. >> They were in the order of a few thousand per second for a single >> FreeBSD 9.1 client >> doing a make build world. >> > Yes, but as I noted above, for NFSv4, these are counts of operations, > not RPCs. Each RPC in NFSv4 consists of several operations. 
For example,
> for read it is something like:
> - PutFH, Read, Getattr
>
> As such, you need to do "nfsstat -e -c" on the client in order to
> see how many RPCs are happening.

Does it show the number of physical RPCs or only "roughly the same"?

>>> For the FreeBSD NFSv4 client, the compounds include Getattr operations
>>> similar to what NFSv3 does. It doesn't do a Getattr on the directory
>>> for Lookup, because that would have made the compound much more complex.
>>> I don't think this will have a significant performance impact, but will
>>> result in some additional Getattr RPCs.
>>>
>> I ran snoop on port 2049 on the server and I saw a large number of
>> lookups.
>> A lot of them seem to be for directories which are part of the
>> filenames of the compiler and include files which are on the nfs
>> mounted /usr/obj.
>> The same names keep reappearing so it looks like there is no caching
>> being done on the client.

When I worked on this in ~2007, unnecessary RPCs for lookup were a
large cause of slowness. This was fixed in at least nfsv3. Almost
all RPCs for makeworld (closer to 99% than 90%) should now be for open
of the excessively layered and polluted include files, since they are
opened so often compared with other files and every open goes to the
server (except "nocto" should fix this). There are lots of lookups
for the include files too, but the lookups are properly cached.

>> I tried the nocto option in /etc/fstab but it does not show when mount
>> shows the mounted filesystems so I am not sure if it is being used.
> Head (and I think stable9) is patched so that "nfsstat -m" shows
> all the options actually being used. For 9.1, you just have to trust
> that it has been set.

This doesn't work on ref10-amd64 running 10.0-CURRENT Apr 5. nfsstat -m
gives null output. Plain nfsstat confirms that there are some nfs mounts,
with so much activity on them that many of the cache counts are negative
after 9 days of uptime.

> ...
>> I tried a make buildworld buildkernel with /usr/obj a local FS in the
>> Vbox VM that completed in about 2 hours. With /usr/obj on an NFS v4
>> filesystem it takes about a day. A twelve-fold increase in elapsed
>> time makes using NFSv4 unusable for this use case.

That is extremely slow. Here I am unhappy with the makeworld time over
nfs staying about 13 minutes despite attempts to improve this, but I
only have old slow hardware (2 core 2GHz Turion laptop). I also have a
modified FreeBSD-5, which avoids some of the bloat in -current. My best
time without excessive tuning was:

@ --------------------------------------------------------------
@ >>> make world completed on Fri Nov 2 23:35:11 EST 2007
@ (started Fri Nov 2 23:21:27 EST 2007)
@ --------------------------------------------------------------
@ 823.53 real 1295.80 user 192.46 sys
@
@ Lookup Read Access Fsstat Other Total
@ 127134 23214 624060 24764 99 799271

The kernel was current at the time, but userland was ~5.2. Newer kernels
(1-2 years old) are only a bit slower and don't require any modifications
to get similar RPC counts (with Getattr instead of Access). /usr,
including /usr/bin and /usr/src, was on nfs, but /bin and /usr/obj were
local. Everything fits in RAM caches so there was no disk activity
except for new reads and new writes. Network latency was tuned to
60 usec (min for ping).

When nfs was pessimized, the above RPC counts blew out to no more than
2 million. Suppose you have 2 million RPCs with a latency of just 65
usec. That gives a latency of 130 seconds.
Not too bad, but large compared with 823 seconds. The latency is
amortized by having more than 1 CPU and/or building concurrently. Then
progress can usually be made in some threads while others are blocked
waiting for the RPCs. However, many networks have latencies much larger
than 65 usec. On the freebsd cluster now, the min latency is about
250 usec, and since it has multiple users the latency is sometimes over
1 msec. 2 million RPCs with a latency of 1 msec take 2000 seconds,
which is a lot compared with a build time of 823 seconds.

I consider "nocto" excessive tuning, since although it would help
makeworld benchmarks it is unsafe in general. Of course I tried my
version of it in the above. (The above RPC counts are with the
following critical modifications that weren't in FreeBSD at the time:
- negative caching
- fix for broken dotdot caching
- fix for broken "cto". It did twice as many RPCs as needed.)
Adding the equivalent of "nocto" reduced the RPC counts significantly,
but only reduced the real time by about 20 (?) seconds.

> Source builds on NFS mounts are notoriously slow. A big part of this is

Only when misconfigured. The nfs build time in the above is between 5%
and 10% slower than the local build time.

> the synchronous writes that get done because there is only one dirty
> byte range for a block and the loader loves to write small non-contiguous
> areas of its output file.

Writing to nfs would be slow, but I made /usr/obj local to avoid it.
Also, in other (kernel build) tests where object files are written to
the current directory which is on nfs, the non-separate object directory
is mounted async on the server so it is fast enough. Now my reference is
building a FreeBSD-4 kernel. My best times were:
- 32+ seconds (src and obj on nfs, async, -j4)
- 30- seconds (src and obj on ffs, async, -j4)
- 64+ (?) seconds (src and obj on nfs, async, -j1)
- 58 (?) seconds (src and obj on ffs, async, -j1)
(/usr on nfs, /bin on ffs). Without parallelism, everything has to wait
for the RPCs, and even with low network latency this costs 5-10%.

>> Too bad the server hangs when I use nfsv3 mount for /usr/obj.
> Try this mount command:
> mount -t nfs -o nfsv3,nolockd ...
> (I do builds of the src tree NFS mounted, so the only reason I can
> think that it would hang would be a rpc.lockd issue.)
> If this works, I suspect it will still be slow, but it would be nice to
> find out how much slower NFSv4 is for your case.

Needed to localize the slowness anyway. It might be just in the server.
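(As a back-of-envelope helper: the per-RPC cost can be estimated by
pinging the server and scaling the round trip by the RPC count;
"server" and the numbers below are placeholders, not measurements:

  ping -c 10 server          # read the avg round-trip time
  # serialized RPC wait in seconds = RPCs * latency(usec) / 1000000
  echo $((2000000 * 250 / 1000000))

which prints 500 for 2 million RPCs at 250 usec, and gives the 2000
seconds above at 1 msec.)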
Bruce From owner-freebsd-fs@FreeBSD.ORG Mon Apr 15 11:06:43 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 65C98959 for ; Mon, 15 Apr 2013 11:06:43 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 4A5C8799 for ; Mon, 15 Apr 2013 11:06:43 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3FB6hxN015080 for ; Mon, 15 Apr 2013 11:06:43 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3FB6gaN015078 for freebsd-fs@FreeBSD.org; Mon, 15 Apr 2013 11:06:42 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 15 Apr 2013 11:06:42 GMT Message-Id: <201304151106.r3FB6gaN015078@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Apr 2013 11:06:43 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/177658 fs [ufs] FreeBSD panics after get full filesystem with uf o kern/177536 fs [zfs] zfs livelock (deadlock) with high write-to-disk o kern/177445 fs [hast] HAST panic o kern/177240 fs [zfs] zpool import failed with state UNAVAIL but all d o kern/176978 fs [zfs] [panic] zfs send -D causes "panic: System call i o kern/176857 fs [softupdates] [panic] 9.1-RELEASE/amd64/GENERIC panic o bin/176253 fs zpool(8): zfs pool indentation is misleading/wrong o kern/176141 fs [zfs] sharesmb=on makes errors for sharenfs, and still o kern/175950 fs [zfs] Possible deadlock in zfs after long uptime o kern/175897 fs [zfs] operations on readonly zpool hang o kern/175179 fs [zfs] ZFS may attach wrong device on move o kern/175071 fs [ufs] [panic] softdep_deallocate_dependencies: unrecov o kern/174372 fs [zfs] Pagefault appears to be related to ZFS o kern/174315 fs [zfs] chflags uchg not supported o kern/174310 fs [zfs] root point mounting broken on CURRENT with multi o kern/174279 fs [ufs] UFS2-SU+J journal and filesystem corruption o kern/174060 fs [ext2fs] Ext2FS system crashes (buffer overflow?) 
o kern/173830 fs [zfs] Brain-dead simple change to ZFS error descriptio o kern/173718 fs [zfs] phantom directory in zraid2 pool f kern/173657 fs [nfs] strange UID map with nfsuserd o kern/173363 fs [zfs] [panic] Panic on 'zpool replace' on readonly poo o kern/173136 fs [unionfs] mounting above the NFS read-only share panic o kern/172942 fs [smbfs] Unmounting a smb mount when the server became o kern/172348 fs [unionfs] umount -f of filesystem in use with readonly o kern/172334 fs [unionfs] unionfs permits recursive union mounts; caus o kern/171626 fs [tmpfs] tmpfs should be noisier when the requested siz o kern/171415 fs [zfs] zfs recv fails with "cannot receive incremental o kern/170945 fs [gpt] disk layout not portable between direct connect o bin/170778 fs [zfs] [panic] FreeBSD panics randomly o kern/170680 fs [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA o kern/170497 fs [xfs][panic] kernel will panic whenever I ls a mounted o kern/169945 fs [zfs] [panic] Kernel panic while importing zpool (afte o kern/169480 fs [zfs] ZFS stalls on heavy I/O o kern/169398 fs [zfs] Can't remove file with permanent error o kern/169339 fs panic while " : > /etc/123" o kern/169319 fs [zfs] zfs resilver can't complete o kern/168947 fs [nfs] [zfs] .zfs/snapshot directory is messed up when o kern/168942 fs [nfs] [hang] nfsd hangs after being restarted (not -HU o kern/168158 fs [zfs] incorrect parsing of sharenfs options in zfs (fs o kern/167979 fs [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste o kern/167977 fs [smbfs] mount_smbfs results are differ when utf-8 or U o kern/167688 fs [fusefs] Incorrect signal handling with direct_io o kern/167685 fs [zfs] ZFS on USB drive prevents shutdown / reboot o kern/167612 fs [portalfs] The portal file system gets stuck inside po o kern/167272 fs [zfs] ZFS Disks reordering causes ZFS to pick the wron o kern/167260 fs [msdosfs] msdosfs disk was mounted the second time whe o kern/167109 fs [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene o kern/167105 fs [nfs] mount_nfs can not handle source exports wiht mor o kern/167067 fs [zfs] [panic] ZFS panics the server o kern/167065 fs [zfs] boot fails when a spare is the boot disk o kern/167048 fs [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF o kern/166912 fs [ufs] [panic] Panic after converting Softupdates to jo o kern/166851 fs [zfs] [hang] Copying directory from the mounted UFS di o kern/166477 fs [nfs] NFS data corruption. 
o kern/165950 fs [ffs] SU+J and fsck problem o kern/165521 fs [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31 o kern/165392 fs Multiple mkdir/rmdir fails with errno 31 o kern/165087 fs [unionfs] lock violation in unionfs o kern/164472 fs [ufs] fsck -B panics on particular data inconsistency o kern/164370 fs [zfs] zfs destroy for snapshot fails on i386 and sparc o kern/164261 fs [nullfs] [patch] fix panic with NFS served from NULLFS o kern/164256 fs [zfs] device entry for volume is not created after zfs o kern/164184 fs [ufs] [panic] Kernel panic with ufs_makeinode o kern/163801 fs [md] [request] allow mfsBSD legacy installed in 'swap' o kern/163770 fs [zfs] [hang] LOR between zfs&syncer + vnlru leading to o kern/163501 fs [nfs] NFS exporting a dir and a subdir in that dir to o kern/162944 fs [coda] Coda file system module looks broken in 9.0 o kern/162860 fs [zfs] Cannot share ZFS filesystem to hosts with a hyph o kern/162751 fs [zfs] [panic] kernel panics during file operations o kern/162591 fs [nullfs] cross-filesystem nullfs does not work as expe o kern/162519 fs [zfs] "zpool import" relies on buggy realpath() behavi o kern/161968 fs [zfs] [hang] renaming snapshot with -r including a zvo o kern/161864 fs [ufs] removing journaling from UFS partition fails on o bin/161807 fs [patch] add option for explicitly specifying metadata o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is o kern/161533 fs [zfs] [panic] zfs receive panic: system ioctl returnin o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_ o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou o kern/161280 fs [zfs] Stack overflow in gptzfsboot o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3 o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic f kern/160860 fs [ufs] Random UFS root filesystem corruption with SU+J o kern/160801 fs [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha o kern/159930 fs [ufs] [panic] kernel core o kern/159402 fs [zfs][loader] symlinks cause I/O errors o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by- o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs() o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option o kern/159077 fs [zfs] Can't cd .. with latest zfs version o kern/159048 fs [smbfs] smb mount corrupts large files o kern/159045 fs [zfs] [hang] ZFS scrub freezes system o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk o kern/158802 fs amd(8) ICMP storm and unkillable process. 
o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o f kern/157929 fs [nfs] NFS slow read o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and o kern/156781 fs [zfs] zfs is losing the snapshot directory, p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs p kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 p kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w o bin/153142 fs [zfs] ls -l outputs `ls: ./.zfs: Operation not support o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris installa o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " o 
kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an f bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141897 fs [msdosfs] [panic] Kernel panic. 
msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues ( o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis p kern/133174 fs [msdosfs] [patch] msdosfs must support multibyte inter o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121072 fs [smbfs] mount_smbfs(8) cannot 
normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o kern/118318 fs [nfs] NFS server hangs under special circumstances o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. 
o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779 fs Background-fsck checks one filesystem twice and omits
o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun
o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange
o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503 fs [smbfs] mount_smbfs does not work as non-root
o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc
o kern/36566 fs [smbfs] System reboot with dead smb mount and umount
o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc
o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t

305 problems total.

From owner-freebsd-fs@FreeBSD.ORG Tue Apr 16 02:19:32 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0CB062F8 for ; Tue, 16 Apr 2013 02:19:32 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 9A5BA6CE for ; Tue, 16 Apr 2013 02:19:30 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAD20bFGDaFvO/2dsb2JhbABQhmy9VYEedIIfAQEFI1YbDgoCAg0ZAlkGE4gUqVOSYYEjjUA0B2OBS4ETA5M4g02REYJ+KSCBbA X-IronPort-AV: E=Sophos;i="4.87,480,1363147200"; d="scan'208";a="25925760" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 15 Apr 2013 22:19:23 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 3B475B406C; Mon, 15 Apr 2013 22:19:23 -0400 (EDT) Date: Mon, 15 Apr 2013 22:19:23 -0400 (EDT) From: Rick Macklem To: Bruce Evans Message-ID: <1236177219.867591.1366078763224.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20130415184639.V1081@besplex.bde.org> Subject: Re: FreeBSD 9.1 NFSv4 client attribute cache not caching ? MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Apr 2013 02:19:32 -0000 Bruce Evans wrote: > On Sun, 14 Apr 2013, Rick Macklem wrote: > > > Paul van der Zwan wrote: > >> On 14 Apr 2013, at 5:00 , Rick Macklem > >> wrote: > >> > >> Thanks for taking the effort to send such an extensive reply. > >> > >>> Paul van der Zwan wrote: > >>>> On 12 Apr 2013, at 16:28 , Paul van der Zwan > >>>> > >>>> wrote: > > ... > >>> In NFSv3, each RPC is defined and usually includes attributes for > >>> files > >>> before and after the operation (implicit getattrs not counted in > >>> the > >>> RPC > >>> counts reported by nfsstat). > >>> > >>> For NFSv4, every RPC is a compound built up of a list of > >>> Operations > >>> like > >>> Getattr.
Since the NFSv4 server doesn't know what the compound is > >>> doing, > >>> nfsstat reports the counts of Operations for the NFSv4 server, so > >>> the counts > >>> will be much higher than with NFSv3, but do not reflect the number > >>> of RPCs being done. > >>> To get NFSv4 nfsstat output that can be compared to NFSv3, you > >>> need > >>> to > >>> do the command on the client(s) and it still is only roughly the > >>> same. > >>> (I just realized this should be documented in man nfsstat.) > >>> > >> I ran nfsstat -s -v 4 on the server and saw the number of requests > >> being done. > >> They were in the order of a few thousand per second for a single > >> FreeBSD 9.1 client > >> doing a make buildworld. > >> > > Yes, but as I noted above, for NFSv4, these are counts of > > operations, > > not RPCs. Each RPC in NFSv4 consists of several operations. For > > example, > > for read it is something like: > > - PutFH, Read, Getattr > > > > As such, you need to do "nfsstat -e -c" on the client in order to > > see how many RPCs are happening. > > Does it show the number of physical RPCs or only "roughly the same"? > Yes, for NFSv4, the client side counts are for the RPCs. The "roughly" referred to the fact that the NFSv4 compound doesn't do exactly the same thing as the NFSv3 RPC, although they tend to be very similar. > >>> For the FreeBSD NFSv4 client, the compounds include Getattr > >>> operations > >>> similar to what NFSv3 does. It doesn't do a Getattr on the > >>> directory > >>> for Lookup, because that would have made the compound much more > >>> complex. > >>> I don't think this will have a significant performance impact, but > >>> will > >>> result in some additional Getattr RPCs. > >>> > >> I ran snoop on port 2049 on the server and I saw a large number of > >> lookups. > >> A lot of them seem to be for directories which are part of the > >> filenames of > >> the compiler and include files which are on the nfs mounted /usr/obj. > >> The same names keep reappearing so it looks like there is no caching > >> being done on > >> the client. > > When I worked on this in ~2007, unnecessary RPCs for lookup was a > large cause of slowness. This was fixed in at least nfsv3. Almost > all RPCs for makeworld (closer to 99% than 90%) should now be for open > of the excessively layered and polluted include files, since they are > opened so often compared with other files and every open goes to the > server (except "nocto" should fix this). There are lots of lookups > for the include files too, but the lookups are properly cached. > > >> I tried the nocto option in /etc/fstab but it does not show when > >> mount > >> shows > >> the mounted filesystems so I am not sure if it is being used. > > Head (and I think stable9) is patched so that ``nfsstat -m`` shows > > all the options actually being used. For 9.1, you just have to trust > > that it has been set. > > This doesn't work on ref10-amd64 running 10.0-CURRENT Apr 5. nfsstat > -m > gives null output. Plain nfsstat confirms that there are some nfs > mounts, > with so much activity on them that many of the cache counts are > negative > after 9 days of uptime. > If both the kernel and nfsstat binary are from Apr. 5, I think it should work. (It will only do the new/default NFS mounts, not oldnfs ones.) I'll take another look, in case something got missed for the commit. rick > > ... > >> I tried a make buildworld buildkernel with /usr/obj a local FS in > >> the > >> Vbox VM > >> that completed in about 2 hours.
With /usr/obj on an NFS v4 > >> filesystem > >> it takes > >> about a day. A twelve-fold increase in elapsed time makes using > >> NFSv4 > >> unusable > >> for this use case. > > That is extremely slow. Here I am unhappy with the makeworld time over > nfs staying about 13 minutes despite attempts to improve this, but I > only have old slow hardware (2 core 2GHz Turion laptop). I also have > a modified FreeBSD-5, which avoids some of the bloat in -current. My > best > time without excessive tuning was:
>
> @ --------------------------------------------------------------
> @ >>> make world completed on Fri Nov 2 23:35:11 EST 2007
> @        (started Fri Nov 2 23:21:27 EST 2007)
> @ --------------------------------------------------------------
> @      823.53 real      1295.80 user      192.46 sys
> @
> @  Lookup    Read  Access  Fsstat  Other   Total
> @  127134   23214  624060   24764     99  799271
>
> The kernel was current at the time, but userland was ~5.2. Newer > kernels (1-2 years old) are only a bit slower and don't require any > modifications to get similar RPC counts (with Getattr instead of > Access). > /usr including /usr/bin and /usr/src was on nfs, but /bin and /usr/obj > were local. Everything fits in RAM caches so there was no disk > activity > except for new reads and new writes. Network latency was tuned to 60 > usec (min for ping). > > When nfs was pessimized, the above RPC counts blew out to more than 2 > million. Suppose you have 2 million RPCs with a latency of just 65 > usec. > That gives a latency of 130 seconds. Not too bad, but large compared > with > 823 seconds. The latency is amortized by having more than 1 CPU > and/or > building concurrently. Then progress can usually be made in some > threads > while others are blocked waiting for the RPCs. However, many networks > have latencies much larger than 65 usec. On the freebsd cluster now, > the > min latency is about 250 usec, and since it has multiple users the > latency is sometimes over 1 msec. 2 million RPCs with a latency of 1 > msec > take 2000 seconds, which is a lot compared with a build time of 823 > seconds. > > I consider "nocto" as excessive tuning, since although it would help > makeworld benchmarks it is unsafe in general. Of course I tried my > version of it in the above. (The above RPC counts are with the > following > critical modifications that weren't in FreeBSD at the time: > - negative caching > - fix for broken dotdot caching > - fix for broken "cto". It did twice as many RPCs as needed.) > Adding the equivalent of "nocto" reduced the RPC counts significantly, > but only reduced the real time by about 20 (?) seconds. > > > Source builds on NFS mounts are notoriously slow. A big part of this > > is > > Only when misconfigured. The nfs build time in the above is between 5% > and 10% slower than the local build time. > > > the synchronous writes that get done because there is only one dirty > > byte range for a block and the loader loves to write small > > non-contiguous > > areas of its output file. > > Writing to nfs would be slow, but I made /usr/obj local to avoid it. > Also, > in other (kernel build) tests where object files are written to the > current > directory which is on nfs, the non-separate object directory is > mounted > async on the server so it is fast enough. Now my reference is building > a FreeBSD-4 kernel. My best times were: > - 32+ seconds (src and obj on nfs, async, -j4) > - 30- seconds (src and obj on ffs, async, -j4) > - 64+ (?) seconds (src and obj on nfs, async, -j1) > - 58 (?)
seconds (src and obj on ffs, async, -j1) > (/usr on nfs, /bin on ffs). Without parallelism, everything has to > wait > for the RPCs, and even with low network latency this costs 5-10%. > > >> Too bad the server hangs when I use nfsv3 mount for /usr/obj. > > Try this mount command: > > mount -t nfs -o nfsv3,nolockd ... > > (I do builds of the src tree NFS mounted, so the only reason I can > > think that it would hang would be a rpc.lockd issue.) > > If this works, I suspect it will still be slow, but it would be nice > > to > > find out how much slower NFSv4 is for your case. > > Needed to localize the slowness anyway. It might be just in the > server. > > Bruce From owner-freebsd-fs@FreeBSD.ORG Tue Apr 16 05:59:05 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 068FEB8A for ; Tue, 16 Apr 2013 05:59:05 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id 843DFE38 for ; Tue, 16 Apr 2013 05:59:03 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r3G5wudp038487 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Tue, 16 Apr 2013 08:58:57 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <516CE8A0.3070808@digsys.bg> Date: Tue, 16 Apr 2013 08:58:56 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130415 Thunderbird/17.0.5 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: ZFS-inly server and dedicated ZIL References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Apr 2013 05:59:05 -0000 On 12.04.13 12:44, Dmitry Morozovsky wrote: > No, this will be 8*SAS in 4 mirrored pairs + 2*SSD for mirrored ZIL and striped > l2arc. Like the following (this is from another machine, but similar in setup):
>
>         NAME        STATE     READ WRITE CKSUM
>         pn          ONLINE       0     0     0
>           mirror-0  ONLINE       0     0     0
>             da0     ONLINE       0     0     0
>             da6     ONLINE       0     0     0
>           mirror-2  ONLINE       0     0     0
>             da1     ONLINE       0     0     0
>             da7     ONLINE       0     0     0
>         logs
>           mirror-1  ONLINE       0     0     0
>             da2d    ONLINE       0     0     0
>             da3d    ONLINE       0     0     0
>         cache
>           da2e      ONLINE       0     0     0
>           da3e      ONLINE       0     0     0
>
As already mentioned, multiple vdev zpool for root works just fine on FreeBSD. There is however one "restriction" -- all of the drives should be visible as BIOS drives at boot. If your BIOS (HBA SAS BIOS) supports that, then you should not expect problems. Just make sure you test it with all the drives, before you build the pool. On the other hand, using the same SSD for SLOG and L2ARC is not always a good idea because those two have quite opposite requirements.
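A minimal sketch of how such a log/cache layout is attached to an existing pool, assuming the pool name (pn) and the SSD partitions from the status output quoted above:

    # zpool add pn log mirror da2d da3d    # mirrored SLOG on two SSD partitions
    # zpool add pn cache da2e da3e         # L2ARC; cache devices are always striped

The asymmetry is deliberate: losing an unmirrored SLOG can cost recent synchronous writes, while a lost cache device only costs warm read data, so only the log is mirrored.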
Daniel From owner-freebsd-fs@FreeBSD.ORG Tue Apr 16 08:53:08 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BCDD3D47 for ; Tue, 16 Apr 2013 08:53:08 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 4A4C2C3D for ; Tue, 16 Apr 2013 08:53:06 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r3G8qjse059998; Tue, 16 Apr 2013 12:52:45 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Tue, 16 Apr 2013 12:52:45 +0400 (MSK) From: Dmitry Morozovsky To: Daniel Kalchev Subject: Re: ZFS-inly server and dedicated ZIL In-Reply-To: <516CE8A0.3070808@digsys.bg> Message-ID: References: <516CE8A0.3070808@digsys.bg> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Apr 2013 08:53:08 -0000 On Tue, 16 Apr 2013, Daniel Kalchev wrote: > As already mentioned, multiple vdev zpool for root works just fine on FreeBSD. Yes, but not pools with dedicated ZIL, as I cited from sources: zpool just does not allow to add ZIL to boot pool > There is however one "restriction" -- all of the drives should be visible as > BIOS drives at boot. If your BIOS (HBA SAS BIOS) supports that, then you > should not expect problems. > > Just make sure you test it with all the drives, before you build the pool. I think I'll drift to a UFS mirrored /boot, as I did for several servers already > On the other hand, using the same SSD for SLOG and L2ARC is not always a good > idea because those two have quite opposite requirements.
I know, but price and drive count restrictions do apply as well :) -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Tue Apr 16 11:02:20 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 06D7286A for ; Tue, 16 Apr 2013 11:02:20 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 34B5B1EA for ; Tue, 16 Apr 2013 11:02:18 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA14784; Tue, 16 Apr 2013 14:02:00 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <516D2FA8.2000901@FreeBSD.org> Date: Tue, 16 Apr 2013 14:02:00 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130404 Thunderbird/17.0.5 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: ZFS-inly server and dedicated ZIL References: <516CE8A0.3070808@digsys.bg> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Apr 2013 11:02:20 -0000 on 16/04/2013 11:52 Dmitry Morozovsky said the following: > Yes, but not pools with dedicated ZIL, as I cited from sources: zpool just does > not allow to add ZIL to boot pool I think that there is no reason for that. And a trivial workaround was already offered to you. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Apr 16 11:50:19 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 57C1588E for ; Tue, 16 Apr 2013 11:50:19 +0000 (UTC) (envelope-from toasty@dragondata.com) Received: from mail-ie0-x22e.google.com (mail-ie0-x22e.google.com [IPv6:2607:f8b0:4001:c03::22e]) by mx1.freebsd.org (Postfix) with ESMTP id 26BCE5FB for ; Tue, 16 Apr 2013 11:50:19 +0000 (UTC) Received: by mail-ie0-f174.google.com with SMTP id 10so360260ied.5 for ; Tue, 16 Apr 2013 04:50:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dragondata.com; s=google; h=x-received:content-type:mime-version:subject:from:in-reply-to:date :cc:content-transfer-encoding:message-id:references:to:x-mailer; bh=w9nf0PvL5XAHn75U94Gtue8TkhN573QnevsLQZbduUs=; b=WGexmsIGEkEw43lhR6fhVkjPmxupkUozaHEGyjmxt4wbTAPARrBEGQ0m0c0nCgGLgI BHgcePMjOjaqVSSWUOGomkc5fDvVvheY31hUjQUMuiWetCNc0Xd2irCMM0MPcs5mSRAI x7CX1fKPKa5rlTkFq+f4TL845FMuApiwIpDk4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:content-type:mime-version:subject:from:in-reply-to:date :cc:content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=w9nf0PvL5XAHn75U94Gtue8TkhN573QnevsLQZbduUs=; b=JrIFBbPQXFkd2vapJxXMlADaOvK5/Vg23O0XxT8MJISjB26K0PBL2C9SZWPtD3MMNx +YuCdVmM1cyzYpz1DiW/en2jCcVF5fm32U8psUhKbReOxAEqMsgCZVTncoGoweVLb4/f tCk6x+g2TiLYe1XcUdo8ja2rY4Td5EJ5sqP3HKeMK4d5Wif8w0Q0tjiiSpV/tMA4l5Ys Y3ePA3NRAg3QkXKb0dyHxUZDnxLpnkO2ngldvLLlPvKJ4ooexf2UXmQxEhB6O6lkyBCV TJNn0X8w8vRZiARkv+1B2QecIA3t3eB0XdjNZhV4nL2IWhK3jglIWqd7HK8mBHcGhz3S 8zaw== X-Received: by 10.50.77.110 with SMTP id r14mr1017932igw.85.1366113017804; Tue, 16 Apr 2013 04:50:17 -0700 (PDT) Received: from vpn132.rw1.your.org (vpn132.rw1.your.org. [204.9.51.132]) by mx.google.com with ESMTPS id ua6sm15157222igb.0.2013.04.16.04.50.15 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 16 Apr 2013 04:50:16 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.3 \(1503\)) Subject: Re: Does sync(8) really flush everything? Lost writes with journaled SU after sync+power cycle From: Kevin Day In-Reply-To: <20130411160253.V1041@besplex.bde.org> Date: Tue, 16 Apr 2013 06:50:13 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: References: <87CC14D8-7DC6-481A-8F85-46629F6D2249@dragondata.com> <20130411160253.V1041@besplex.bde.org> To: Bruce Evans X-Mailer: Apple Mail (2.1503) X-Gm-Message-State: ALoCoQm4ncTb/06dkLrHUQi8miztqtF4RIy9vUKwtbyyUAnFaxfBbL3hLHF4Z4eLwXay8Z/65TIh Cc: "freebsd-fs@FreeBSD.org Filesystems" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Apr 2013 11:50:19 -0000 On Apr 11, 2013, at 1:30 AM, Bruce Evans wrote: > sync(2) only schedules all writing of all modified buffers to disk. Its > man page even says this. It doesn't wait for any of the writes to complete. A very kind person has pointed out to me (off-list) that doing: mount -u -o ro / (without -f) causes mount to force a flush, waits for completion, THEN bails out because there are open files preventing the read-only downgrade. We've been testing this here and it seems to be a usable workaround.
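A minimal sketch of that test sequence, assuming / is the journaled filesystem in question (the failed downgrade is the expected outcome on a live root):

    # sync                 # only schedules the dirty buffers; does not wait
    # mount -u -o ro /     # without -f: flushes, waits for completion, then errors out if files are open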
I'm also pointing out that, for at least our purposes, this problem (sync(2) doesn't seem to actually cause any writes) only seems to be causing lost directories if I'm using journaling. I'm attempting to narrow down why journaling appears to make sync into a no-op. -- Kevin From owner-freebsd-fs@FreeBSD.ORG Tue Apr 16 12:05:55 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C1042F34; Tue, 16 Apr 2013 12:05:55 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 48960758; Tue, 16 Apr 2013 12:05:54 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r3GC5o2D077669; Tue, 16 Apr 2013 16:05:50 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Tue, 16 Apr 2013 16:05:50 +0400 (MSK) From: Dmitry Morozovsky To: Andriy Gapon Subject: Re: ZFS-inly server and dedicated ZIL In-Reply-To: <516D2FA8.2000901@FreeBSD.org> Message-ID: References: <516CE8A0.3070808@digsys.bg> <516D2FA8.2000901@FreeBSD.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Apr 2013 12:05:55 -0000 On Tue, 16 Apr 2013, Andriy Gapon wrote: > on 16/04/2013 11:52 Dmitry Morozovsky said the following: > > Yes, but not pools with dedicated ZIL, as I cited from sources: zpool just does > > not allow to add ZIL to boot pool > > I think that there is no reason for that. Excuse me, ECANTPARS ;) No reason for disallowing ZIL on boot pool -- or no reason to have such config? > And a trivial workaround was already offered to you. Possibly I've missed that, could you please point me?
What I definitely do not want is dedicating a pair of expensive SAS drives as a system mirror -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Tue Apr 16 12:11:47 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2EFE91A9 for ; Tue, 16 Apr 2013 12:11:47 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 74AD77B3 for ; Tue, 16 Apr 2013 12:11:46 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA15427; Tue, 16 Apr 2013 15:11:44 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <516D3FFF.9010906@FreeBSD.org> Date: Tue, 16 Apr 2013 15:11:43 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130404 Thunderbird/17.0.5 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: ZFS-inly server and dedicated ZIL References: <516CE8A0.3070808@digsys.bg> <516D2FA8.2000901@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Apr 2013 12:11:47 -0000 on 16/04/2013 15:05 Dmitry Morozovsky said the following: > On Tue, 16 Apr 2013, Andriy Gapon wrote: > >> on 16/04/2013 11:52 Dmitry Morozovsky said the following: >>> Yes, but not pools with dedicated ZIL, as I cited from sources: zpool just does >>> not allow to add ZIL to boot pool >> >> I think that there is no reason for that. > > Excuse me, ECANTPARS ;) > > No reason for disallowing ZIL on boot pool -- or no reason to have such config? No reason for disallowing. I promise to axe the check when I get some time. >> And a trivial workaround was already offered to you. > > Possibly I've missed that, could you please point me? Rewinding the thread - $100, free for old friends :-) It's here: http://thread.gmane.org/gmane.os.freebsd.devel.file-systems/17669/focus=17759 Search for "You do this by". > What I definitely do not want is dedicating a pair of expensive SAS drives as a system > mirror It's possible to use partitions / slices for a pool.
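A minimal sketch of the partition-based approach, using hypothetical gpart labels (sys0/sys1 for a small mirrored boot pool, leaving the rest of the SAS drives for the data pool):

    # gpart add -t freebsd-zfs -s 16G -l sys0 da0    # small system partition on the first drive
    # gpart add -t freebsd-zfs -s 16G -l sys1 da1    # its mirror partner on the second
    # zpool create sys mirror gpt/sys0 gpt/sys1      # boot pool without a dedicated ZIL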
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Apr 16 12:18:22 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CE0352C0; Tue, 16 Apr 2013 12:18:22 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 56EE1802; Tue, 16 Apr 2013 12:18:21 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r3GCILIg078721; Tue, 16 Apr 2013 16:18:21 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Tue, 16 Apr 2013 16:18:21 +0400 (MSK) From: Dmitry Morozovsky To: Andriy Gapon Subject: Re: ZFS-inly server and dedicated ZIL In-Reply-To: <516D3FFF.9010906@FreeBSD.org> Message-ID: References: <516CE8A0.3070808@digsys.bg> <516D2FA8.2000901@FreeBSD.org> <516D3FFF.9010906@FreeBSD.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=UTF-8 Content-Transfer-Encoding: 8BIT Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Apr 2013 12:18:22 -0000 On Tue, 16 Apr 2013, Andriy Gapon wrote: > > No reason for disallowing ZIL on boot pool -- or no reason to have such config? > > No reason for disallowing. I promise to axe the check when I get some time. Great, thank you. > >> And a trivial workaround was already offered to you. > > > > Possibly I've missed that, could you please point me? > > Rewinding the thread - $100, free for old friends :-) ;-P > It's here: > http://thread.gmane.org/gmane.os.freebsd.devel.file-systems/17669/focus=17759 > Search for "You do this by". Wow. I actually *did* miss that. Will try.
-- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Wed Apr 17 06:33:22 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 1516C91E for ; Wed, 17 Apr 2013 06:33:22 +0000 (UTC) (envelope-from slovichon@gmail.com) Received: from mail-ve0-f170.google.com (mail-ve0-f170.google.com [209.85.128.170]) by mx1.freebsd.org (Postfix) with ESMTP id CC8A7E18 for ; Wed, 17 Apr 2013 06:33:21 +0000 (UTC) Received: by mail-ve0-f170.google.com with SMTP id 14so1153423vea.1 for ; Tue, 16 Apr 2013 23:33:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:date:from:to:subject:message-id:mime-version :content-type:content-disposition; bh=SA/ZOtFQysbww8IwLBrR80YRkt083yYyFTCjkqMks3U=; b=Q1hNDSshWyVOOI2xfJNL+efbD+weWbnY7TCTqKIm43bCBsw+hjhwAKe9ZiMRy+xBWz BedF2dqb8m8f7wqvhtjN4p5SQHtYkWb2GuUp7nUNLctjAyLSEiYH0nduucBhKQhUP7az 79pwO345B+RL5r1WAt4TqlUN1KaWEZEzUdZbGXfl1xCJwowsHCt0bEEhEHum8PaCPeXJ aazGJtp4QgYDy/r1LCeOO+5KpKbBUXNOPZaXRARnBMxd113TgZbDMQY1xByeGZsg0l/y M8ftlkht2iLKgd12NuZ0fuOxV65e8xr7FcyST1D1PXWQ1NxCJOR+bLzyCa2xArBF4FCY mQIQ== X-Received: by 10.58.224.101 with SMTP id rb5mr3941730vec.17.1366180401166; Tue, 16 Apr 2013 23:33:21 -0700 (PDT) Received: from localhost (c-24-131-65-84.hsd1.pa.comcast.net. [24.131.65.84]) by mx.google.com with ESMTPS id j5sm4645802vdv.13.2013.04.16.23.33.19 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Tue, 16 Apr 2013 23:33:20 -0700 (PDT) Date: Wed, 17 Apr 2013 02:33:18 -0400 From: Jared Yanovich To: freebsd-fs@freebsd.org Subject: nfs client readdir eofflag Message-ID: <20130417063318.GK14599@nightderanger.bender.mtx> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="W2ydbIOJmkm74tJ2" Content-Disposition: inline X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Apr 2013 06:33:22 -0000 --W2ydbIOJmkm74tJ2 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi, is there a reason why eofflag isn't set in nfsclient readdir()? This now allows union mounts to work for NFS above NFS. 
/sys/fs/nfsclient

Index: nfs_clvnops.c
===================================================================
--- nfs_clvnops.c	(revision 249568)
+++ nfs_clvnops.c	(working copy)
@@ -2221,6 +2221,7 @@
 	    !NFS_TIMESPEC_COMPARE(&np->n_mtime, &vattr.va_mtime)) {
 		mtx_unlock(&np->n_mtx);
 		NFSINCRGLOBAL(newnfsstats.direofcache_hits);
+		*ap->a_eofflag = 1;
 		return (0);
 	} else
 		mtx_unlock(&np->n_mtx);
@@ -2233,8 +2234,10 @@
 	tresid = uio->uio_resid;
 	error = ncl_bioread(vp, uio, 0, ap->a_cred);

-	if (!error && uio->uio_resid == tresid)
+	if (!error && uio->uio_resid == tresid) {
 		NFSINCRGLOBAL(newnfsstats.direofcache_misses);
+		*ap->a_eofflag = 1;
+	}
 	return (error);
 }

From owner-freebsd-fs@FreeBSD.ORG Wed Apr 17 18:05:09 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 71D3642B for ; Wed, 17 Apr 2013 18:05:09 +0000 (UTC) (envelope-from zaphod@berentweb.com) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) by mx1.freebsd.org (Postfix) with ESMTP id 5A93EA1F for ; Wed, 17 Apr 2013 18:05:08 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1USWjD-0001X1-RF for freebsd-fs@freebsd.org; Wed, 17 Apr 2013 11:05:07 -0700 Date: Wed, 17 Apr 2013 11:05:07 -0700 (PDT) From: Beeblebrox To: freebsd-fs@freebsd.org Message-ID: <1366221907838-5804517.post@n5.nabble.com> Subject: [ZFS] recover destroyed zpool with ZDB MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Apr 2013 18:05:09 -0000 I destroyed my zpool but forgot to take the tar backup of /home folder. This was a single-HDD pool and I first did 'zpool destroy' then 'gpart destroy' before realizing my error. Since then, I have manually re-created the GPT partitions to the size they were (testdisk did not correctly identify the geom) and there have been no writes to the HDD. After a lengthy discussion here: http://freebsd.1045724.n5.nabble.com/ZFS-recover-destroyed-zpool-what-are-the-available-options-td5800299.html and getting no result with: # zpool import -D -f -R /bsdr -N -F -X 12018916494219117471 rescue => cannot import 'bsdr' as 'rescue': no such pool or dataset. Destroy and re-create the pool from a backup source. I sent an email to an expert and was advised to look into zdb and the -F & -X flags. Good news and bad news there.
'# zdb -e -F 12018916494219117471' gives a lot of output but this is conflicting because although there are no errors, %used is showing zero:

Traversing all blocks to verify checksums and verify nothing leaked ...

        No leaks (block sum matches space maps exactly)

        bp count:              43
        bp logical:        357888    avg:  8322
        bp physical:        36352    avg:   845    compression:  9.85
        bp allocated:       93184    avg:  2167    compression:  3.84
        bp deduped:             0    ref>1:    0   deduplication:  1.00
        SPA allocated:      93184    used: 0.00%

The zdb -F command is giving the internal info for the zpool but it is not importing it, nor does it change the status to importable. What can I read or change in the zdb command to get this to come online? The zdb output is available as a link if needed. Thanks and regards. ----- 10-Current-amd64-using ccache-portstree merged with marcuscom.gnome3 & xorg.devel -- View this message in context: http://freebsd.1045724.n5.nabble.com/ZFS-recover-destroyed-zpool-with-ZDB-tp5804517.html Sent from the freebsd-fs mailing list archive at Nabble.com. From owner-freebsd-fs@FreeBSD.ORG Wed Apr 17 18:53:41 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C22DE752 for ; Wed, 17 Apr 2013 18:53:41 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: from mail-wi0-x234.google.com (mail-wi0-x234.google.com [IPv6:2a00:1450:400c:c05::234]) by mx1.freebsd.org (Postfix) with ESMTP id 62EEBEE4 for ; Wed, 17 Apr 2013 18:53:41 +0000 (UTC) Received: by mail-wi0-f180.google.com with SMTP id h11so808993wiv.7 for ; Wed, 17 Apr 2013 11:53:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=TApCRvhVSbH/UbSTa5Mv0Ukm6f0W3ooKESoWZZQJS1o=; b=Bj5gm4i4RvL+O9H/nND0vM0NSZ25VL3Bf51gMFv09s57dahihCfcAxWWcBiCkQdNzE jG7MIAKW8LKY6a8zhIOdSRS5xujzgFW244CGc9pN63K8EPPcXrWA+zvqM83qt85Byon9 VD7vU3bdcQqnyFeiuN4M41XjeyLFATvvNXTH43khjTCCIDRQmeA4R156CSjRFZgg3ZPc 67l+Y42UemA05IiAQoLinQP+J843kzzsU+bWcdEkybHV10m06DJGOc/VXSz/mlBM2Kct 102YxRdf/JdkQ7DXol75HW8g7ICVlYacXx8HgWYgC1qxQksmi7FIaWqHW+59wMPeR3wj kBIw== X-Received: by 10.50.77.110 with SMTP id r14mr1017932igw.85.1366113017804; Tue, 16 Apr 2013 04:50:17 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.242.101 with HTTP; Wed, 17 Apr 2013 11:53:40 -0700 (PDT) In-Reply-To: <1366221907838-5804517.post@n5.nabble.com> References: <1366221907838-5804517.post@n5.nabble.com> Date: Wed, 17 Apr 2013 13:53:40 -0500 Message-ID: Subject: Re: [ZFS] recover destroyed zpool with ZDB From: Adam Vande More To: Beeblebrox Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Apr 2013 18:53:41 -0000 On Wed, Apr 17, 2013 at 1:05 PM, Beeblebrox wrote: > I destroyed my zpool but forgot to take the tar backup of /home folder. > This > was a single-HDD pool and I first did 'zpool destroy' then 'gpart destroy' > before realizing my error. > > Since then, I have manually re-created the GPT partitions to the size they > were (testdisk did not correctly identify the geom) and there have been no > writes to the HDD.
> After a lengthy discussion here: > http://freebsd.1045724.n5.nabble.com/ZFS-recover-destroyed-zpool-what-are-the-available-options-td5800299.html > and getting no result with: > # zpool import -D -f -R /bsdr -N -F -X 12018916494219117471 rescue => > cannot import 'bsdr' as 'rescue': no such pool or dataset. Destroy and > re-create the pool from a backup source. > > I sent an email to an expert and was advised to look into zdb and the -F & > -X flags. Good news and bad news there. '# zdb -e -F 12018916494219117471' > gives a lot of output but this is conflicting because although there are no > errors, %used is showing zero: > One thing is that you keep using zpool import -D when the pool isn't in a destroyed state. -- Adam Vande More From owner-freebsd-fs@FreeBSD.ORG Wed Apr 17 19:16:21 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id BA6FD1D5 for ; Wed, 17 Apr 2013 19:16:21 +0000 (UTC) (envelope-from zaphod@berentweb.com) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) by mx1.freebsd.org (Postfix) with ESMTP id A12E7FFC for ; Wed, 17 Apr 2013 19:16:21 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1USXq8-0007ps-Kr for freebsd-fs@freebsd.org; Wed, 17 Apr 2013 12:16:20 -0700 Date: Wed, 17 Apr 2013 12:16:20 -0700 (PDT) From: Beeblebrox To: freebsd-fs@freebsd.org Message-ID: <1366226180639-5804603.post@n5.nabble.com> In-Reply-To: References: <1366221907838-5804517.post@n5.nabble.com> Subject: [ZFS] recover destroyed zpool with ZDB MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Apr 2013 19:16:21 -0000 Hi, It's a long story by now and I was following volodymyr's suggestions. Anyway, 'zpool list' no longer shows the bsdr pool at all after having run # zdb -e -F 12018916494219117471 Obviously, since the ada0p2 metadata was written into the zpool.cache file with the above command, and zpool list reads the cache file. Regards. ----- 10-Current-amd64-using ccache-portstree merged with marcuscom.gnome3 & xorg.devel -- View this message in context: http://freebsd.1045724.n5.nabble.com/ZFS-recover-destroyed-zpool-with-ZDB-tp5804517p5804603.html Sent from the freebsd-fs mailing list archive at Nabble.com.
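For reference, a rough sketch of the import paths Adam is distinguishing (the GUID is the one from this thread; -F and -X are the rewind options already under discussion):

    # zpool import          # lists importable pools that are merely exported
    # zpool import -D       # lists only pools whose labels are marked destroyed
    # zpool import -fFX 12018916494219117471 rescue    # rewind attempt, without -D

The point is that -D filters on the destroyed flag in the on-disk labels; if the labels no longer carry that state, -D hides the pool instead of finding it.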
From owner-freebsd-fs@FreeBSD.ORG Wed Apr 17 19:32:41 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 490DA8DE for ; Wed, 17 Apr 2013 19:32:41 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: from mail-wg0-f43.google.com (mail-wg0-f43.google.com [74.125.82.43]) by mx1.freebsd.org (Postfix) with ESMTP id DC572172 for ; Wed, 17 Apr 2013 19:32:40 +0000 (UTC) Received: by mail-wg0-f43.google.com with SMTP id c11so1979672wgh.10 for ; Wed, 17 Apr 2013 12:32:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=SHJaDQYu0C1lHjYTDgkliGzC6BK+9yX8Fmw2/1gbyQs=; b=EGQIsxfw71ewrGE1OnielJYR/Q7MwVaFyrh8NSS1xGmOnNvDOBqtpT+k+V+trulbmB FLF9blj2S/qJzonzSvesSc+rAPYMll1GZcLEUZg9ojZf8MSeDytIzY3fccLJPuF4X9XI E1wv/we2GhVwouwBVYhSVXHnKq+KQqizRj4F5b7w2RoIQ0mdnp8nV1RrhMcWiMVkhOFf YuRNy4yBr0Z3dkGFAgWyVfBon4PdKaz9bVm3Nc28vddvkZ3Im4igIKEAY7/o2V2qRARm tXS0sWK8LjHj6q2q1DfJX643PnbVt9XZSS9zxvSW/5aBxxvIjWNwIKDPC7pw9RlNjO6C yHVQ== MIME-Version: 1.0 X-Received: by 10.194.5.4 with SMTP id o4mr7570099wjo.40.1366227154706; Wed, 17 Apr 2013 12:32:34 -0700 (PDT) Received: by 10.194.242.101 with HTTP; Wed, 17 Apr 2013 12:32:34 -0700 (PDT) In-Reply-To: <1366226180639-5804603.post@n5.nabble.com> References: <1366221907838-5804517.post@n5.nabble.com> <1366226180639-5804603.post@n5.nabble.com> Date: Wed, 17 Apr 2013 14:32:34 -0500 Message-ID: Subject: Re: [ZFS] recover destroyed zpool with ZDB From: Adam Vande More To: Beeblebrox Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Apr 2013 19:32:41 -0000 On Wed, Apr 17, 2013 at 2:16 PM, Beeblebrox wrote: > Hi, > It's a long story by now and I was following volodymyr's suggestions. > Anyway, 'zpool list' no longer shows the bsdr pool at all after having run > # zdb -e -F 12018916494219117471 > Obviously, since the ada0p2 metadata was written into the zpool.cache file > with the above command, and zpool list reads the cache file.
If you can get it back to faulted state, the official procedure is here: http://docs.oracle.com/cd/E19963-01/html/821-1448/gbbwl.html -- Adam Vande More From owner-freebsd-fs@FreeBSD.ORG Wed Apr 17 21:38:11 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E64125B5 for ; Wed, 17 Apr 2013 21:38:11 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id B17478E6 for ; Wed, 17 Apr 2013 21:38:11 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEANoVb1GDaFvO/2dsb2JhbABQhm26MYJqgRp0gh8BAQUjBFIbDgoCAg0ZAlkGiCeqeJJYgSONQwEzB4IzgRMDlwaRFIMnIIFs X-IronPort-AV: E=Sophos;i="4.87,496,1363147200"; d="scan'208";a="26243098" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 17 Apr 2013 17:37:59 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id CB583B4045; Wed, 17 Apr 2013 17:37:59 -0400 (EDT) Date: Wed, 17 Apr 2013 17:37:59 -0400 (EDT) From: Rick Macklem To: Jared Yanovich Message-ID: <1761576953.936301.1366234679793.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20130417063318.GK14599@nightderanger.bender.mtx> Subject: Re: nfs client readdir eofflag MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Linux)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Apr 2013 21:38:12 -0000 Jared Yanovich wrote: > Hi, is there a reason why eofflag isn't set in nfsclient readdir()? > > This now allows union mounts to work for NFS above NFS. > This patch looks ok to me. (I don't know, but my guess is that, since only the NFS server used eofflag for a long time, the code just didn't bother setting it.) If you aren't a src committer (I don't recognize your name), I will put testing/committing this patch on my "to do" list. (If you are a src committer, feel free to commit it.) 
Thanks for reporting this, rick

> /sys/fs/nfsclient
>
> Index: nfs_clvnops.c
> ===================================================================
> --- nfs_clvnops.c	(revision 249568)
> +++ nfs_clvnops.c	(working copy)
> @@ -2221,6 +2221,7 @@
>  	    !NFS_TIMESPEC_COMPARE(&np->n_mtime, &vattr.va_mtime)) {
>  		mtx_unlock(&np->n_mtx);
>  		NFSINCRGLOBAL(newnfsstats.direofcache_hits);
> +		*ap->a_eofflag = 1;
>  		return (0);
>  	} else
>  		mtx_unlock(&np->n_mtx);
> @@ -2233,8 +2234,10 @@
>  	tresid = uio->uio_resid;
>  	error = ncl_bioread(vp, uio, 0, ap->a_cred);
>
> -	if (!error && uio->uio_resid == tresid)
> +	if (!error && uio->uio_resid == tresid) {
>  		NFSINCRGLOBAL(newnfsstats.direofcache_misses);
> +		*ap->a_eofflag = 1;
> +	}
>  	return (error);
> }

From owner-freebsd-fs@FreeBSD.ORG Wed Apr 17 21:43:16 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id AB3B37B8 for ; Wed, 17 Apr 2013 21:43:16 +0000 (UTC) (envelope-from mxb@alumni.chalmers.se) Received: from mail-la0-x231.google.com (mail-la0-x231.google.com [IPv6:2a00:1450:4010:c03::231]) by mx1.freebsd.org (Postfix) with ESMTP id 31E3092E for ; Wed, 17 Apr 2013 21:43:16 +0000 (UTC) Received: by mail-la0-f49.google.com with SMTP id fs13so1204250lab.36 for ; Wed, 17 Apr 2013 14:43:15 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:from:content-type:message-id:mime-version:subject:date :references:to:in-reply-to:x-mailer:x-gm-message-state; bh=zzqy0WXuz+U2GqljW1+Rix3X0YNycdeAfZ1lpxlQ9Dk=; b=ciT6k6eZ2V8w95ZYHjA8LyBj43eQ8VnXFkyDPluGMy9R20jlodIzMwr7GKzC6ic+3O Iv96Av57CxHuXAgqK04oQbC0hGVpMLpKbNIDe2z1gGWxo3fF46JbltrN5C4GdFkVT3Oz LCIqWQtcKLc3K8MXcLkmCguyQv/KdHG8vVzm3qahr1Qk7KSf/KjSGW4RHFVukrF+G//c WKx2fmJ4o14XhzGju/WRg6bxVrMqHDa8XPUrUryXoXksrk3SQuRFswMthz+jcbG/cKCQ +hdtVcI+Idnq1Io2LoSdPpN0jfP7qED5TfvqhSPGBVquguIWu+5D9R+T3HisOV7CcBno i2ww== X-Received: by 10.152.3.4 with SMTP id 4mr4411435lay.29.1366234994857; Wed, 17 Apr 2013 14:43:14 -0700 (PDT) Received: from grey.home.unixconn.com (h-75-17.a183.priv.bahnhof.se. [46.59.75.17]) by mx.google.com with ESMTPS id y9sm3301246lae.10.2013.04.17.14.43.13 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 17 Apr 2013 14:43:13 -0700 (PDT) From: mxb Message-Id: <2753912E-0B91-4F73-B956-9D558F16EEAE@alumni.chalmers.se> Mime-Version: 1.0 (Mac OS X Mail 6.3 \(1503\)) Subject: Re: ZFS: ZIL device export/import Date: Wed, 17 Apr 2013 23:43:11 +0200 References: <5A2824CA-2A67-47FA-AB27-20C6EBD2C501@alumni.chalmers.se> <51699B8E.7050003@platinum.linux.pl> <2DE8AD5E-B84C-4D88-A242-EA30EA4A68FD@alumni.chalmers.se> <9EE9328B-40B1-4510-B404-242D0F2C7697@alumni.chalmers.se> To: "freebsd-fs@freebsd.org" In-Reply-To: <9EE9328B-40B1-4510-B404-242D0F2C7697@alumni.chalmers.se> X-Mailer: Apple Mail (2.1503) X-Gm-Message-State: ALoCoQmzQcwJKbfFRUFR+2d3+nD7u5N/FYki3cxnDhP/r8lYsxzw8LPfYp+GYxrnCod04OwxTthf Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Apr 2013 21:43:16 -0000 Thanks, everyone who replied! //mxb On 14 apr 2013, at 12:01, mxb wrote: > > Well, I'm trying to preclude any undesired effect in the whole setup, as this is going to production.
> > SAS-link might not be a bottleneck here and I'm overreacting. > > Locally, on a per-HU basis, I have 6Gbit/s SAS/SATA. Both the card and the disks (SSD) attached to it. > The SAS expander is also 6Gbit/s, attaching 10k RPM SAS mechanical disks on the JBOD. > > I use Intel 520 SSD and Pulsar SSD in this setup. > ZIL resided locally on the Intel SSD (per HU), but now will probably move to the Pulsar SSD (moved to the JBOD as those disks have a dual SAS/SATA link). L2ARC resided on Pulsar (Pulsar was in each HU, e.g. I have 2x Pulsar). > > Looks like I have to re-design the whole setup, as regards the ZIL. > > //mxb > > > On 13 apr 2013, at 22:51, Ronald Klop wrote: > >> I thought the idea of ZIL is a fast buffer before the write to slow disk. Are you really sure the SAS expander is the bottleneck in the system instead of the disks? > From owner-freebsd-fs@FreeBSD.ORG Thu Apr 18 01:37:00 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 518D896B for ; Thu, 18 Apr 2013 01:37:00 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 1BA641C2 for ; Thu, 18 Apr 2013 01:36:59 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqIEAG9Nb1GDaFvO/2dsb2JhbABQgzyDMb0mgRZ0gh8BAQEDAQEBASArIAsFFhgCAg0ZAikBCSYGCAcEARwEh20GDKpyklmBI4xFfjQHgjOBEwOTOYEMgkGBI49xgycgMoEFNQ X-IronPort-AV: E=Sophos;i="4.87,496,1363147200"; d="scan'208";a="24406644" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 17 Apr 2013 21:36:52 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 80C41B403E; Wed, 17 Apr 2013 21:36:52 -0400 (EDT) Date: Wed, 17 Apr 2013 21:36:52 -0400 (EDT) From: Rick Macklem To: Paul van der Zwan Message-ID: <986577218.940691.1366249012504.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <495AEA10-9B8F-4A03-B706-79BF43539482@vanderzwan.org> Subject: Re: FreeBSD 9.1 NFSv4 client attribute cache not caching ? MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Linux)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Apr 2013 01:37:00 -0000 Paul van der Zwan wrote: > On 12 Apr 2013, at 16:28 , Paul van der Zwan > wrote: > > > > > I am running a few VirtualBox VMs with 9.1 on my OpenIndiana server > > and I noticed that make buildworld seem to take much longer > > when the clients mount /usr/src and /usr/obj over NFS V4 than when > > they use V3. > > Unfortunately I have to use V4 as a buildworld on V3 hangs the > > server completely... > > I noticed the number of PUTFH/GETATTR/GETFH calls in in the order of > > a few thousand per second > > and if I snoop the traffic I see the same filenames appear over and > > over again. > > It looks like the client is not caching anything at all and using a > > server request everytime.
> > I use the default mount options: > > 192.168.178.24:/data/ports on /usr/ports (nfs, nfsv4acls) > > 192.168.178.24:/data/src on /usr/src (nfs, nfsv4acls) > > 192.168.178.24:/data/obj on /usr/obj (nfs, nfsv4acls) > > > > > just fyi, on a kernel build test I just did, I am seeing a much larger number of Lookups for NFSv4 vs NFSv3. I'll post again if/when I come up with a fix, rick > I had a look with dtrace > $ sudo dtrace -n '::getattr:start { @[stack()]=count();}' > and it seems the vast majority of the calls to getattr are from open() > and close() system calls.: > kernel`newnfs_request+0x631 > kernel`nfscl_request+0x75 > kernel`nfsrpc_getattr+0xbe > kernel`nfs_getattr+0x280 > kernel`VOP_GETATTR_APV+0x74 > kernel`nfs_lookup+0x3cc > kernel`VOP_LOOKUP_APV+0x74 > kernel`lookup+0x69e > kernel`namei+0x6df > kernel`kern_execve+0x47a > kernel`sys_execve+0x43 > kernel`amd64_syscall+0x3bf > kernel`0xffffffff80784947 > 26 > > kernel`newnfs_request+0x631 > kernel`nfscl_request+0x75 > kernel`nfsrpc_getattr+0xbe > kernel`nfs_close+0x3e9 > kernel`VOP_CLOSE_APV+0x74 > kernel`kern_execve+0x15c5 > kernel`sys_execve+0x43 > kernel`amd64_syscall+0x3bf > kernel`0xffffffff80784947 > 26 > > kernel`newnfs_request+0x631 > kernel`nfscl_request+0x75 > kernel`nfsrpc_getattr+0xbe > kernel`nfs_getattr+0x280 > kernel`VOP_GETATTR_APV+0x74 > kernel`nfs_lookup+0x3cc > kernel`VOP_LOOKUP_APV+0x74 > kernel`lookup+0x69e > kernel`namei+0x6df > kernel`vn_open_cred+0x330 > kernel`vn_open+0x1c > kernel`kern_openat+0x207 > kernel`kern_open+0x19 > kernel`sys_open+0x18 > kernel`amd64_syscall+0x3bf > kernel`0xffffffff80784947 > 2512 > > kernel`newnfs_request+0x631 > kernel`nfscl_request+0x75 > kernel`nfsrpc_getattr+0xbe > kernel`nfs_close+0x3e9 > kernel`VOP_CLOSE_APV+0x74 > kernel`vn_close+0xee > kernel`vn_closefile+0xff > kernel`_fdrop+0x3a > kernel`closef+0x332 > kernel`kern_close+0x183 > kernel`sys_close+0xb > kernel`amd64_syscall+0x3bf > kernel`0xffffffff80784947 > 2530 > > I had a look at the source of nfs_close and could not find a call to > nfsrpc_getattr, and I am wondering why close would be calling getattr > anyway. > If the file is closed what do we care about it's attributes.... 
> > Paul > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Thu Apr 18 05:15:18 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4F47B93B for ; Thu, 18 Apr 2013 05:15:18 +0000 (UTC) (envelope-from zaphod@berentweb.com) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) by mx1.freebsd.org (Postfix) with ESMTP id 33924B0A for ; Thu, 18 Apr 2013 05:15:17 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1UShBl-0008UI-60 for freebsd-fs@freebsd.org; Wed, 17 Apr 2013 22:15:17 -0700 Date: Wed, 17 Apr 2013 22:15:17 -0700 (PDT) From: Beeblebrox To: freebsd-fs@freebsd.org Message-ID: <1366262117117-5804714.post@n5.nabble.com> In-Reply-To: References: <1366221907838-5804517.post@n5.nabble.com> <1366226180639-5804603.post@n5.nabble.com> Subject: [ZFS] recover destroyed zpool with ZDB MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Apr 2013 05:15:18 -0000 Thanks, but that document does not appear very relevant to my situation. Also, the issue is not as straightforward as it seems. The DEFAULTED status of the zpool was a 'false positive', because:

A - The "present pool" did not accept any zpool commands and always gave a message like "no such pool or dataset ... recover the pool from a backup source."

B - The more relevant on-disk metadata showed and still shows this:

# zdb -l /dev/ada0p2 => all 4 labels intact, and
    pool_guid: 12018916494219117471
    vdev_tree: type: 'disk' id: 0 guid: 17860002997423999070

While the pool showing up in the zpool list was/is clearly in a worse state than the above pool:

# zdb -l /dev/ada0 => only label 2 intact, and
    pool_guid: 16018525702691588432

In my opinion, this problem is more similar to a "Resolving a Missing Device" problem rather than data corruption. Unfortunately, missing device repairs focus on mirrored setups and there is no decent document on a missing device in a single-HDD pool. ----- 10-Current-amd64-using ccache-portstree merged with marcuscom.gnome3 & xorg.devel -- View this message in context: http://freebsd.1045724.n5.nabble.com/ZFS-recover-destroyed-zpool-with-ZDB-tp5804517p5804714.html Sent from the freebsd-fs mailing list archive at Nabble.com.
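A short sketch of the label checks being compared above, using the device nodes from the message (zpool import -d scans devices directly instead of trusting zpool.cache):

    # zdb -l /dev/ada0p2    # dump the four vdev labels stored on the partition
    # zdb -l /dev/ada0      # the same check against the raw disk
    # zpool import -d /dev  # search all device nodes for importable pools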
From owner-freebsd-fs@FreeBSD.ORG Thu Apr 18 18:49:53 2013 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 360AB935; Thu, 18 Apr 2013 18:49:53 +0000 (UTC) (envelope-from ken@kdm.org) Received: from nargothrond.kdm.org (nargothrond.kdm.org [70.56.43.81]) by mx1.freebsd.org (Postfix) with ESMTP id BE290138E; Thu, 18 Apr 2013 18:49:52 +0000 (UTC) Received: from nargothrond.kdm.org (localhost [127.0.0.1]) by nargothrond.kdm.org (8.14.2/8.14.2) with ESMTP id r3IInpvi019293; Thu, 18 Apr 2013 12:49:51 -0600 (MDT) (envelope-from ken@nargothrond.kdm.org) Received: (from ken@localhost) by nargothrond.kdm.org (8.14.2/8.14.2/Submit) id r3IInp7l019292; Thu, 18 Apr 2013 12:49:51 -0600 (MDT) (envelope-from ken) Date: Thu, 18 Apr 2013 12:49:51 -0600 From: "Kenneth D. Merry" To: Bruce Evans Subject: Re: patches to add new stat(2) file flags Message-ID: <20130418184951.GA18777@nargothrond.kdm.org> References: <20130307000533.GA38950@nargothrond.kdm.org> <20130307222553.P981@besplex.bde.org> <20130308232155.GA47062@nargothrond.kdm.org> <20130310181127.D2309@besplex.bde.org> <20130409190838.GA60733@nargothrond.kdm.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130409190838.GA60733@nargothrond.kdm.org> User-Agent: Mutt/1.4.2i Cc: arch@FreeBSD.org, fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Apr 2013 18:49:53 -0000 On Tue, Apr 09, 2013 at 13:08:38 -0600, Kenneth D. Merry wrote: > On Sun, Mar 10, 2013 at 19:21:57 +1100, Bruce Evans wrote: > > On Fri, 8 Mar 2013, Kenneth D. Merry wrote: > > > > >On Fri, Mar 08, 2013 at 00:37:15 +1100, Bruce Evans wrote: > > >>On Wed, 6 Mar 2013, Kenneth D. Merry wrote: > > >> > > >>>I have attached diffs against head for some additional stat(2) file > > >>>flags. > > >>> > > >>>The primary purpose of these flags is to improve compatibility with CIFS, > > >>>both from the client and the server side. > > >>>... > > >> > > >>I missed looking at the diffs in my previous reply. > > >> > > >>% --- //depot/users/kenm/FreeBSD-test3/bin/chflags/chflags.1 2013-03-04 > > >>17:51:12.000000000 -0700 > > >>% +++ /usr/home/kenm/perforce4/kenm/FreeBSD-test3/bin/chflags/chflags.1 > > >>2013-03-04 17:51:12.000000000 -0700 > > >>% --- /tmp/tmp.49594.86 2013-03-06 16:42:43.000000000 -0700 > > >>% +++ /usr/home/kenm/perforce4/kenm/FreeBSD-test3/bin/chflags/chflags.1 > > >>2013-03-06 14:47:25.987128763 -0700 > > >>% @@ -117,6 +117,16 @@ > > >>% set the user immutable flag (owner or super-user only) > > >>% .It Cm uunlnk , uunlink > > >>% set the user undeletable flag (owner or super-user only) > > >>% +.It Cm system , usystem > > >>% +set the Windows system flag (owner or super-user only) > > >> > > >>This begins unsorting of the list. > > > > > >Fixed. > > > > > >>It's not just a Windows flag, since it also works in DOS. > > > > > >Fixed. > > > > Thanks. Hopefully all the simple bugs are fixed now. > > > > >>"Owner or" is too strict for msdosfs, since files can only have a > > >>single owner, so controlling access using groups is needed. I > > >>use owner root and group msdosfs for msdosfs mounts. This works for > > >>normal operations like open/read/write, but fails for most attributes > > >>including file flags.
msdosfs doesn't support many attributes, but > > >>this change is supposed to add support for 3 new file flags, so it would > > >>be good if it didn't restrict the support to root. > > > > > >I wasn't trying to change the existing security model for msdosfs, but if > > >you've got a suggested patch to fix it I can add that in. > > > > I can't think of anything better than making group write permission enough > > for attributes. > > > > msdosfs also has some style bugs in this area. It uses VOP_ACCESS() > > with VADMIN for the non-VA_UTIMES_NULL case of utimes(), but for all > > other attributes it hard-codes a direct uid check followed by a > > priv_check_cred() with PRIV_VFS_ADMIN. VADMIN requires even more than > > user write permission for POSIX file systems, and using it unchanged > > for all the attributes would be even more restrictive unless we changed > > it, but it would be easier to make it uniformly less restrictive for > > msdosfs by using it consistently. > > > > Oops, that was in the old version of ffs. ffs now has related > > complications and unnecessary style bugs (verboseness and misformatting) > > to support ACLs. It now uses VOP_ACCESSX() with VWRITE_ATTRIBUTES for > > utimes(), and VOP_ACCESSX() with other VFOO for all attributes except > > flags. It still uses VOP_ACCESS() with VADMIN for flags. > > > > >>... > > >>% .It Dv SF_ARCHIVED > > >>... > > >>% +Filesystems in FreeBSD may or may not have special handling for this > > >>flag. > > >>% +For instance, ZFS tracks changes to files and will clear this bit when > > >>a > > >>% +file is updated. > > >>% +UFS only stores the flag, and relies on the application to change it > > >>when > > >>% +needed. > > >> > > >>I think that is useless, since changing it is needed whenever the file > > >>changes, and applications can do that (short of running as daemons and > > >>watching for changes). > > > > > >Do you mean applications can't do that or can? > > > > Oops, can't. > > > > It is still hard for users to know what their file system supports. > > Even programmers don't know that it is backwards :-). > > > > >>% --- //depot/users/kenm/FreeBSD-test3/sys/fs/msdosfs/msdosfs_vnops.c > > >>2013-03-04 17:51:12.000000000 -0700 > > >>% +++ > > >>/usr/home/kenm/perforce4/kenm/FreeBSD-test3/sys/fs/msdosfs/msdosfs_vnops.c > > >>2013-03-04 17:51:12.000000000 -0700 > > >>% --- /tmp/tmp.49594.370 2013-03-06 16:42:43.000000000 -0700 > > >>% +++ > > >>/usr/home/kenm/perforce4/kenm/FreeBSD-test3/sys/fs/msdosfs/msdosfs_vnops.c > > >>2013-03-06 14:49:47.179130318 -0700 > > >>% @@ -345,8 +345,17 @@ > > >>% vap->va_birthtime.tv_nsec = 0; > > >>% } > > >>% vap->va_flags = 0; > > >>% + /* > > >>% + * The DOS Archive attribute means that a file needs to be > > >>% + * archived. The BSD SF_ARCHIVED attribute means that a file has > > >>% + * been archived. Thus the inversion here. > > >>% + */ > > >> > > >>No need to document it again. It goes without saying that ARCHIVE > > >>!= ARCHIVED. > > > > > >I disagree. It wasn't immediately obvious to me that SF_ARCHIVED was > > >generally used as the inverse of the DOS Archived bit until I started > > >digging into this. If this helps anyone figure that out more quickly, it's > > >useful. > > > > The surprising thing is that it is backwards in FreeBSD and not really > > supported except in msdosfs. Now several file systems have the comment > > about it being inverted, but man pages still don't. > > I made the change to UF_ARCHIVE, and updated the man pages.
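To illustrate the intended use -- a sketch only, not part of the patch set; it assumes a kernel and world built with these patches, so that find(1), chflags(1) and strtofflags(3) know a "uarch" keyword, and the paths are made up -- an incremental backup pass would archive whatever is marked as needing archiving and then clear the flag:

# find /data -flags +uarch -print | cpio -o > /backup/incr.cpio
# find /data -flags +uarch -exec chflags nouarch {} +

With the old inverted SF_ARCHIVED sense, both tests would have to be negated, and clearing a super-user flag would require privilege.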
> > > >>% @@ -420,12 +429,21 @@ > > >>% if (error) > > >>% return (error); > > >>% } > > >> > > >>The permissions check before this is delicate and was broken and is > > >>more broken now. It still short-circuits to handle setting the > > >>single flag that used to be supported, and is slightly broken for that: > > >>- unprivileged user asks to set ARCHIVE by passing !SF_ARCHIVED. We > > >> allow that, although this may toggle the flag and the normal semantics > > >> for SF flags are to not allow toggling. > > >>- unprivileged user asks to clear ARCHIVE by passing SF_ARCHIVED. We > > >> don't allow that. But we should allow preserving ARCHIVE if it is > > >> already clear. > > >>The bug wasn't very important when only 1 flag was supported. Now it > > >>prevents unprivileged users from managing the new UF flags if ARCHIVE is > > >>clear. Fortunately, this is the unusual case. Anyway, unprivileged > > >>users can set ARCHIVE by doing some other operation. Even the chflags() > > >>operation should set ARCHIVE and thus allow further chflags()'s that now > > >>keep ARCHIVE set. Except it is very confusing if a chflags() asks for > > >>ARCHIVE to be clear. This request might be just to try to preserve > > >>the current setting and not want it if other things are changed, or > > >>it might be to purposely clear it. Changing it from set to clear should > > >>still be privileged. > > > > > >I changed it to allow setting or clearing SF_ARCHIVED. Now I can set or > > >clear the flag as non-root: > > > > Actually, it seems OK, since there are no old or new SF_ immutable flags. > > Some of the actions are broken in the old and new code for directories -- > > see below. > > > > >>See the more complicated permissions check in ffs. It would be safest > > >>to duplicate most of it, to get different permissions checking for the > > >>SF and UF flags. Then decide if we want to keep allowing setting > > >>ARCHIVE without privilege. > > > > > >I think we should allow getting and setting SF_ARCHIVED without special > > >privileges. Given how it is generally used, I don't think it should be > > >restricted to the super-user. > > > > I don't really like that since changing the flags is mainly needed for > > the fairly privileged operation of managing other OS's file systems. > > However, since we're mapping the SYSTEM flag to a UF_ flag, the SYSTEM > > flag will require less privilege than the ARCHIVE flag. This is backwards, > > so we might as well require less privilege for ARCHIVE too. I think we, > > that is, you should use a new UF_ARCHIVE flag with the correct sense. > > Okay, done. The patches are attached with UF_ARCHIVE used instead of > SF_ARCHIVED, with the sense reversed. > > > >Can you provide some code demonstrating how the permissions code should > > >be changed in msdosfs? I don't know that much about that sort of thing, > > >so I'll probably spend an inordinate amount of time stumbling > > >through it. > > > > Now I think only cleanups are needed. > > Okay. > > > >>% return EOPNOTSUPP; > > >>% if (vap->va_flags & SF_ARCHIVED) > > >>% dep->de_Attributes &= ~ATTR_ARCHIVE; > > >>% else if (!(dep->de_Attributes & ATTR_DIRECTORY)) > > >>% dep->de_Attributes |= ATTR_ARCHIVE; > > >> > > >>The comment before this says that we ignore attempts to set ATTR_ARCHIVED > > >>for directories. However, it is out of date. WinXP allows setting it > > >>and all the new flags for directories, and so do we. > > > > > >Do you mean we allow setting it in UFS, or where? Obviously the code above > > >won't set it on a directory.
> > > > I meant it here. Actually, the comment matches the code -- I somehow missed > > the test in the code. However, the code is wrong. All directories except > > the root directory have this and other attributes, but FreeBSD refuses to > > set them. More below. > > > > >>The WinXP attrib command (at least on a FAT32 fs) doesn't allow setting > > >>or clearing ARCHIVE (even if it is already set or clear) if any of > > >>HIDDEN, READONLY or SYSTEM is already set and remains set after the > > >>command. Thus the HRS attributes act a bit like immutable flags, but > > >>subtly differently. (ffs has the quite different and worse behaviour > > >>of allowing chflags() irrespective of immutable flags being set before > > >>or after, provided there is enough privilege to change the immutable > > >>flags.) Anyway, they should all give some aspects of immutability. > > > > > >We could do that for msdosfs, but why add more things for the user to trip > > >over given how the filesystem is typically used? Most people probably > > >use it for USB thumb drives these days. Or perhaps on a dual boot system > > >to access their Windows partition. > > > > The small data drives won't have many files with attributes (except > > ARCHIVE). For multiple-boot, I think the permissions shouldn't be too > > much different than the foreign OS's. I used not to worry about this > > and liked deleting WinXP files without asking it, but recently I spent > > a lot of time recovering a WinXP ntfs partition and changed a bit too > > much using FreeBSD and Cygwin because I didn't understand the > > permissions (especially ACLs). ntfs in FreeBSD was less than r/o so it > > couldn't even back up the permissions (for file flags, it returned the > > garbage in its internal inode flags without translation...). > > > > >*** src/bin/chflags/chflags.1.orig > > >--- src/bin/chflags/chflags.1 > > >*************** > > >*** 101,120 **** > > > .Bl -tag -offset indent -width ".Cm opaque" > > > .It Cm arch , archived > > > set the archived flag (super-user only) > > > .It Cm opaque > > > set the opaque flag (owner or super-user only) > > >- .It Cm nodump > > >- set the nodump flag (owner or super-user only) > > > .It Cm sappnd , sappend > > > > The opaque flag is UF_ too. > > Yes, but all of the flag descriptions are sorted in alphabetical order. > How would you suggest sorting them instead? (SF first and then UF, both in > some version of alphabetical order?) > > > >+ .It Cm snapshot > > >+ set the snapshot flag (most filesystems do not allow changing this flag) > > > > I think none do. It can only be displayed. > > Fixed. > > > chflags(1) doesn't display flags, so this shouldn't be here. The problem > > is that this man page is the only place where the flag names are documented. > > ls(1) and strtofflags(3) just point to here. strtofflags(3) says that the > > flag names are documented here, but ls(1) just has an Xref to here. > > I fixed ls(1) at least. > > > >*** src/lib/libc/sys/chflags.2.orig > > >--- src/lib/libc/sys/chflags.2 > > >--- 71,127 ---- > > > the following values > > > .Pp > > > .Bl -tag -width ".Dv SF_IMMUTABLE" -compact -offset indent > > >! .It Dv SF_APPEND > > > The file may only be appended to. > > > .It Dv SF_ARCHIVED > > >! The file has been archived. > > >! This flag means the opposite of the Windows and CIFS > > >FILE_ATTRIBUTE_ARCHIVE > > > > DOS, Windows and CIFS... > > Fixed. > > > >! attribute. > > >! That attribute means that the file should be archived, whereas > > >! .Dv SF_ARCHIVED > > >!
means that the file has been archived. > > >! Filesystems in FreeBSD may or may not have special handling for this > > >flag. > > >! For instance, ZFS tracks changes to files and will clear this bit when a > > >! file is updated. > > > > Does zfs clear it in other circumstances? WinXP doesn't for msdosfs (or > > ntfs?), but FreeBSD clears it when changing some attributes, even for > > null changes (these are: times except for atimes, and the HIDDEN attribute > > when it is changed by chmod() -- even for null changes --, but not for > > the HIDDEN attribute when it is changed (or preserved) by chflags() in > > your new code). I want it to be cleared for metadata so that backup > > utilities can trust the ARCHIVE flag for metadata changes. > > Well, it does look like changing a file or touching it causes the archive > flag to get set with ZFS: > > # touch foo > # ls -lao foo > -rw-r--r-- 1 root wheel uarch 0 Apr 8 21:45 foo > # chflags 0 foo > # ls -lao foo > -rw-r--r-- 1 root wheel - 0 Apr 8 21:45 foo > # echo "hello" >> foo > # ls -lao foo > -rw-r--r-- 1 root wheel uarch 6 Apr 8 21:46 foo > # chflags 0 foo > # ls -lao foo > -rw-r--r-- 1 root wheel - 6 Apr 8 21:46 foo > # touch foo > # ls -lao foo > -rw-r--r-- 1 root wheel uarch 6 Apr 8 21:46 foo > > > >+ .It Dv UF_IMMUTABLE > > >+ The file may not be changed. > > >+ Filesystems may use this flag to maintain compatibility with the Windows > > >and > > >+ CIFS FILE_ATTRIBUTE_READONLY attribute. > > > > So READONLY is only mapped to UF_IMMUTABLE if it gives immutability? > > No, it's mapped to whatever the CIFS server decides. In my changes to > Likewise, I mapped it to UF_IMMUTABLE. I mapped UF_IMMUTABLE to the ZFS > READONLY flag. As Pawel pointed out, there has been some talk on the > Illumos developers list about just storing the READONLY bit and not > enforcing it in ZFS: > > http://www.listbox.com/member/archive/182179/2013/03/sort/time_rev/page/2/?search_for=readonly > > That complicates things somewhat in the Illumos CIFS server, and so I think > it's a reasonable thing to just record the bit and let the CIFS server > enforce things where it needs to. > > UFS does honor the UF_IMMUTABLE flag, so it may be that we need to create > a UF_READONLY flag that corresponds to the DOS readonly flag and is only > stored, and the enforcement would happen in the CIFS server. > > > >*** src/sys/fs/msdosfs/msdosfs_vnops.c.orig > > >--- src/sys/fs/msdosfs/msdosfs_vnops.c > > >*************** > > >*** 415,431 **** > > > * set ATTR_ARCHIVE for directories `cp -pr' from a more > > > * sensible filesystem attempts it a lot. > > > */ > > >! if (vap->va_flags & SF_SETTABLE) { > > > error = priv_check_cred(cred, PRIV_VFS_SYSFLAGS, 0); > > > if (error) > > > return (error); > > > } > > >! if (vap->va_flags & ~SF_ARCHIVED) > > > return EOPNOTSUPP; > > > if (vap->va_flags & SF_ARCHIVED) > > > dep->de_Attributes &= ~ATTR_ARCHIVE; > > > else if (!(dep->de_Attributes & ATTR_DIRECTORY)) > > > dep->de_Attributes |= ATTR_ARCHIVE; > > > dep->de_flag |= DE_MODIFIED; > > > } > > > > > >--- 424,448 ---- > > > * set ATTR_ARCHIVE for directories `cp -pr' from a more > > > * sensible filesystem attempts it a lot. > > > */ > > >! if (vap->va_flags & (SF_SETTABLE & ~(SF_ARCHIVED))) { > > > > Excessive parentheses. > > Fixed, by moving to UF_ARCHIVE. > > > > error = priv_check_cred(cred, PRIV_VFS_SYSFLAGS, 0); > > > if (error) > > > return (error); > > > } > > > > VADMIN is still needed, and that is too strict. This is a general problem > > and should be fixed separately.
> > I took out the check, since I changed the code to use UF_ARCHIVE instead of > SF_ARCHIVED. > > > >! if (vap->va_flags & ~(SF_ARCHIVED | UF_HIDDEN | UF_SYSTEM)) > > > return EOPNOTSUPP; > > > if (vap->va_flags & SF_ARCHIVED) > > > dep->de_Attributes &= ~ATTR_ARCHIVE; > > > else if (!(dep->de_Attributes & ATTR_DIRECTORY)) > > > dep->de_Attributes |= ATTR_ARCHIVE; > > >+ if (vap->va_flags & UF_HIDDEN) > > >+ dep->de_Attributes |= ATTR_HIDDEN; > > >+ else > > >+ dep->de_Attributes &= ~ATTR_HIDDEN; > > >+ if (vap->va_flags & UF_SYSTEM) > > >+ dep->de_Attributes |= ATTR_SYSTEM; > > >+ else > > >+ dep->de_Attributes &= ~ATTR_SYSTEM; > > > dep->de_flag |= DE_MODIFIED; > > > } > > > > Technical old and new problems with msdosfs: > > - all directories except the root directory support the 3 attributes > > handled above, and READONLY > > - the special case for the root directory is because before FAT32, the > > root directory didn't have an entry for itself (and was otherwise > > special). With FAT32, the root directory is not so special, but > > still doesn't have an entry for itself. > > - thus the old code in the above is wrong for all directories except > > the root directory > > - thus the new code in the above is wrong for the root directory. It > > will make changes to the in-core denode. These can be seen by stat() > > for a while, but go away when the vnode is recycled. > > - other code is wrong for directories too. deupdat() refuses to > > convert from the in-core denode to the disk directory entry for > > directories. So even when the above changes values for directories, > > the changes only get synced to the on-disk directory entry accidentally, > > when there is a large change (such as extending the directory). > > - being the root directory is best tested for using VV_ROOT. I use the > > following to fix the corresponding bugs in utimes(): > > > > /* Was: silently ignore the non-error or error for all dirs. > > */ > > if (DETOV(dep)->v_vflag & VV_ROOT) > > return (EINVAL); > > /* Otherwise valid. */ > > > > deupdat() needs a similar change to not ignore all directories. > > Okay, I think these issues should now be fixed. We now refuse to change > attributes only on the root directory. And I updated deupdat() to do the > same. > > When a directory is created or a file is added, the archive bit is not > changed on the directory. Not sure if we need to do that or not. (Simply > changing msdosfs_mkdir() to set ATTR_ARCHIVE was not enough to get the > archive bit set on directory creation.) Bruce, any comment on this?
Thanks, Ken -- Kenneth Merry ken@FreeBSD.ORG From owner-freebsd-fs@FreeBSD.ORG Thu Apr 18 19:14:13 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 62D2938E; Thu, 18 Apr 2013 19:14:13 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 3B3CF160F; Thu, 18 Apr 2013 19:14:13 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3IJED8A015352; Thu, 18 Apr 2013 19:14:13 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3IJEDs2015351; Thu, 18 Apr 2013 19:14:13 GMT (envelope-from linimon) Date: Thu, 18 Apr 2013 19:14:13 GMT Message-Id: <201304181914.r3IJEDs2015351@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/177966: [zfs] resilver completes but subsequent scrub reports errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Apr 2013 19:14:13 -0000 Synopsis: [zfs] resilver completes but subsequent scrub reports errors Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Thu Apr 18 19:13:57 UTC 2013 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=177966 From owner-freebsd-fs@FreeBSD.ORG Fri Apr 19 17:43:52 2013 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 3C881DFA; Fri, 19 Apr 2013 17:43:52 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id F0F985EA; Fri, 19 Apr 2013 15:57:28 +0000 (UTC) Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 7A1061041552; Fri, 19 Apr 2013 22:53:51 +1000 (EST) Date: Fri, 19 Apr 2013 22:53:50 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: "Kenneth D. 
Merry" Subject: Re: patches to add new stat(2) file flags In-Reply-To: <20130418184951.GA18777@nargothrond.kdm.org> Message-ID: <20130419215624.L1262@besplex.bde.org> References: <20130307000533.GA38950@nargothrond.kdm.org> <20130307222553.P981@besplex.bde.org> <20130308232155.GA47062@nargothrond.kdm.org> <20130310181127.D2309@besplex.bde.org> <20130409190838.GA60733@nargothrond.kdm.org> <20130418184951.GA18777@nargothrond.kdm.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=A8I0pNqG c=1 sm=1 a=n2O7wv11oSwA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=YOiZBDKP_E4A:10 a=QuKyM733q63FOVycHlwA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: arch@FreeBSD.org, fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Apr 2013 17:43:52 -0000 On Thu, 18 Apr 2013, Kenneth D. Merry wrote: > On Tue, Apr 09, 2013 at 13:08:38 -0600, Kenneth D. Merry wrote: >> ... >> Okay, I think these issues should now be fixed. We now refuse to change >> attributes only on the root directory. And I updatd deupdat() to do the >> same. >> >> When a directory is created or a file is added, the archive bit is not >> changed on the directory. Not sure if we need to do that or not. (Simply >> changing msdosfs_mkdir() to set ATTR_ARCHIVE was not enough to get the >> archive bit set on directory creation.) > > Bruce, any comment on this? I didn't get around to looking at it closely. Just had a quick look at the msdosfs parts. Apparently we are already doing the same as WinXP for ATTR_ARCHIVE on directories. Not the right thing, but: - don't set it on directory creation - don't set it on directory modification - allow setting and clearing it (with your changes). @ *** src/lib/libc/sys/chflags.2.orig @ --- src/lib/libc/sys/chflags.2 @ *************** @ *** 112,137 **** @ ... @ --- 112,170 ---- @ ... @ + .It Dv UF_IMMUTABLE @ + The file may not be changed. @ + Filesystems may use this flag to maintain compatibility with the DOS, Windows @ + and CIFS FILE_ATTRIBUTE_READONLY attribute. msdosfs doesn't use this yet. It uses ATTR_READONLY, and doesn't map this to or from UF_IMMUTABLE. I think I want ATTR_READONLY to be a flag and not affect the file permissions (just like immutable flags normally don't affect the file permissions. Does CIFS FILE_ATTRIBUTE_READONLY have exactly the same semantics as IMMUTABLE? That is, does it prevent all operations on the file and the file's metadata except read()? For IMMUTABLE, the other operations that it disallows include setattr(), rename() and unlink(). Well it doesn't in WinXP using Cygwin. I made a directory with attributes +R, and this didn't prevent creating files in the directory or rmdir of the directory. Even attributes +R +H +S didn't prevent these operations. Maybe +R isn't really used for directories, like +A. Then for a file with +R +H +S: - rm asked before deleting it (+R changed its fake permissions from rw-r--r-- to r--r--r--). - touching it succeeded - attrib on it succeeded - writing it failed. So it seems that in WinXP, ATTR_READONLY is ignored for directories, and more like the !writeable permission than the immutable flag. 
@ *** src/sys/fs/msdosfs/msdosfs_denode.c.orig @ --- src/sys/fs/msdosfs/msdosfs_denode.c @ *************** @ *** 300,307 **** @ if ((dep->de_flag & DE_MODIFIED) == 0) @ return (0); @ dep->de_flag &= ~DE_MODIFIED; @ ! if (dep->de_Attributes & ATTR_DIRECTORY) @ ! return (0); @ if (dep->de_refcnt <= 0) @ return (0); @ error = readde(dep, &bp, &dirp); @ --- 300,309 ---- @ if ((dep->de_flag & DE_MODIFIED) == 0) @ return (0); @ dep->de_flag &= ~DE_MODIFIED; @ ! /* Was: silently ignore attribute changes for all dirs. */ @ ! if (DETOV(dep)->v_vflag & VV_ROOT) @ ! return (EINVAL); @ ! /* Otherwise valid. */ Clean up the comments a bit. Say nothing, or that all attributes apply to all directories except the root directory. Perhaps the VV_ROOT case is unreachable because callers filter out this case. I have a debugger trap for it. @ if (dep->de_refcnt <= 0) @ return (0); @ error = readde(dep, &bp, &dirp); @ *** src/sys/fs/msdosfs/msdosfs_vnops.c.orig @ --- src/sys/fs/msdosfs/msdosfs_vnops.c @ *************** @ *** 398,403 **** @ --- 402,418 ---- @ if (vap->va_flags != VNOVAL) { @ if (vp->v_mount->mnt_flag & MNT_RDONLY) @ return (EROFS); @ + /* @ + * We don't allow setting attributes on the root directory, @ + * because according to Bruce Evans: "The special case for @ + * the root directory is because before FAT32, the root @ + * directory didn't have an entry for itself (and was @ + * otherwise special). With FAT32, the root directory is @ + * not so special, but still doesn't have an entry for itself." @ + */ @ + if (vp->v_vflag & VV_ROOT) @ + return (EINVAL); @ + @ if (cred->cr_uid != pmp->pm_uid) { @ error = priv_check_cred(cred, PRIV_VFS_ADMIN, 0); @ if (error) No need to give the source. I prefer to do this check after the permissions check, but if it is done early then it is best done as a single check for all attributes in msdosfs_setattr() and not just for flags. Currently there is: - no check for ownerships. We only allow null changes to ownerships. With no check like the above, we allow them even for the root directory, while the above disallows null changes to flags for the root directory. - for truncate(), the error is EISDIR for all directories. - for file times, we silently ignore changes for all directories, after doing permissions checks. Only the root directory should be special. - for file permissions, we handle directories as for file times. Now the only possible non-null change is of ATTR_READONLY, and since this apparently has no effect in WinXP, ignoring changes to it for directories is best.
Bruce From owner-freebsd-fs@FreeBSD.ORG Fri Apr 19 18:22:42 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0BD2C204 for ; Fri, 19 Apr 2013 18:22:42 +0000 (UTC) (envelope-from matthew.ahrens@delphix.com) Received: from mail-la0-x229.google.com (mail-la0-x229.google.com [IPv6:2a00:1450:4010:c03::229]) by mx1.freebsd.org (Postfix) with ESMTP id 6F0C5112A for ; Fri, 19 Apr 2013 18:22:41 +0000 (UTC) Received: by mail-la0-f41.google.com with SMTP id er20so3859239lab.0 for ; Fri, 19 Apr 2013 11:22:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=delphix.com; s=google; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=eGyYth67GAF9lWFYom85IEKgT9nY97BIAeKiCbVWoDY=; b=aHFJvGeVkX49hqTklfR/UUiSb+iQH6K4totX0zNNwrrZrIOM1aeDNeRaXpTaKLEWDF rFhwKckvTnkMetQdi5ympIZRqjkEfQzflx5rtmtMtVY9v1RA9PwswNqneNoSzIt233IX UGDC4o4tMrA87+isYpaQMhhdCZx+R7hhykFI8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type:x-gm-message-state; bh=eGyYth67GAF9lWFYom85IEKgT9nY97BIAeKiCbVWoDY=; b=YDDYgouyx9DmpSFdReQlKG/h56cGtsQ+xK5BbTQGnzKmZgcGkPuqQECaDDXvfoKfDe uB7xBtOJOH9KOD9dgxSyYMGdIUyC4AL8N5Z/Gd74cCZA0ysT0oYgjp1aNKTIeO/ie86v f9LqeIx7VmZ96guweoCITKQpy8Qv4DtvUH2jnKjfHwr32edPB23MhBhg3TMM1FGf87Uv BpkG6WrvYmLFAcFF10m4fMxg2/txHIl78lMumL9IZGrgtNge6ThWEb/guV0EddhckXxd YWlZ/vIBtNdG8xcVMa4KD8phdGw+q2lc4hnZ278G+rmp62JqNHm2LvmSDshl4Qlgwlw6 pGtQ== MIME-Version: 1.0 X-Received: by 10.112.167.200 with SMTP id zq8mr8491324lbb.58.1366395760206; Fri, 19 Apr 2013 11:22:40 -0700 (PDT) Received: by 10.114.22.4 with HTTP; Fri, 19 Apr 2013 11:22:40 -0700 (PDT) In-Reply-To: <5169B0D7.9090607@platinum.linux.pl> References: <5166EA43.7050700@platinum.linux.pl> <5167B1C5.8020402@FreeBSD.org> <51689A2C.4080402@platinum.linux.pl> <5169324A.3080309@FreeBSD.org> <516949C7.4030305@platinum.linux.pl> <5169B0D7.9090607@platinum.linux.pl> Date: Fri, 19 Apr 2013 11:22:40 -0700 Message-ID: Subject: Re: ZFS slow reads for unallocated blocks From: Matthew Ahrens To: Adam Nowacki X-Gm-Message-State: ALoCoQkbe53M2ILBG4BB3rq1v1hzEbSViUPXubQ4IV5J/iWZc/SLkKk9sBYJToATY0XnLc2j5zF/ Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: "freebsd-fs@freebsd.org" , illumos-zfs , Andriy Gapon X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Apr 2013 18:22:42 -0000 Sorry I'm late to the game here, just saw this email now. Yes, this is also a problem on illumos, though much less so on my system, only about 2x. It looks like the difference is due to the fact that the zeroed dbufs are not cached, so we have to zero the entire dbuf (e.g. 128k) for every read syscall (e.g. 8k). Increasing the size of the reads to match the recordsize results in performance parity between reading cached data and sparse zeros. 
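One quick way to see the read-size dependence -- a sketch; it assumes a dataset with the default 128k recordsize, and the file names are made up:

# truncate -s 100m t100m                # wholly sparse file
# dd if=t100m of=/dev/null bs=8k        # 16 syscalls per record, each zeroing the full 128k dbuf
# dd if=t100m of=/dev/null bs=128k      # one zeroing per record; roughly matches the cached case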
You can see this behavior in the following dtrace, which shows that we are initializing the dbuf in dbuf_read_impl() as many times as we do syscalls: sudo dtrace -n 'dbuf_read_impl:entry/pid==$target/{@[probefunc] = count()}' -c 'dd if=t100m of=/dev/null bs=8k' dtrace: description 'dbuf_read_impl:entry' matched 1 probe 12800+0 records in 12800+0 records out dtrace: pid 29419 has exited dbuf_read_impl 12800 --matt On Sat, Apr 13, 2013 at 12:24 PM, Adam Nowacki wrote: > Including zfs@illumos on this. To recap: > > Reads from sparse files are slow, with speed proportional to the ratio of > read size to filesystem recordsize. There is no physical disk I/O. > > # zfs create -o atime=off -o recordsize=128k -o compression=off -o > sync=disabled -o mountpoint=/home/testfs home/testfs > # dd if=/dev/random of=/home/testfs/random10m bs=1024k count=10 > # truncate -s 10m /home/testfs/trunc10m > # dd if=/home/testfs/random10m of=/dev/null bs=512 > 10485760 bytes transferred in 0.078637 secs (133344041 bytes/sec) > # dd if=/home/testfs/trunc10m of=/dev/null bs=512 > 10485760 bytes transferred in 1.011500 secs (10366544 bytes/sec) > > # zfs create -o atime=off -o recordsize=8M -o compression=off -o > sync=disabled -o mountpoint=/home/testfs home/testfs > # dd if=/home/testfs/random10m of=/dev/null bs=512 > 10485760 bytes transferred in 0.080430 secs (130371205 bytes/sec) > # dd if=/home/testfs/trunc10m of=/dev/null bs=512 > 10485760 bytes transferred in 72.465486 secs (144700 bytes/sec) > > This is from FreeBSD 9.1, and a possible solution is at > http://tepeserwery.pl/nowak/freebsd/zfs_sparse_optimization_v2.patch.txt - untested yet, the system will be busy building packages for a few more days. > > > On 2013-04-13 19:11, Will Andrews wrote: >> Hi, >> >> I think the idea of using a pre-zeroed region as the 'source' is a good >> one, but probably it would be better to set a special flag on a hole >> dbuf than to require caller flags. That way, ZFS can lazily evaluate >> the hole dbuf (i.e. avoid zeroing db_data until it has to). However, >> that could be complicated by the fact that there are many potential >> users of hole dbufs that would want to write to the dbuf. >> >> This sort of optimization should be brought to the illumos zfs list. As >> it stands, your patch is also FreeBSD-specific, since 'zero_region' only >> exists in vm/vm_kern.c. Given the frequency of zero-copying, however, >> it's quite possible there are other versions of this region elsewhere. >> >> --Will. >> >> >> On Sat, Apr 13, 2013 at 6:04 AM, Adam Nowacki wrote: >> >> Temporary dbufs are created for each missing (unallocated on disk) >> record, including indirects if the hole is large enough. Those dbufs >> never find their way to the ARC and are freed at the end of dmu_read_uio. >> >> A small read (from a hole) would in the best case bzero 128KiB >> (recordsize, more if missing indirects) ... and I'm running modified >> ZFS with record sizes up to 8MiB. >> >> # zfs create -o atime=off -o recordsize=8M -o compression=off -o >> mountpoint=/home/testfs home/testfs >> # truncate -s 8m /home/testfs/trunc8m >> # dd if=/dev/zero of=/home/testfs/zero8m bs=8m count=1 >> 1+0 records in >> 1+0 records out >> 8388608 bytes transferred in 0.010193 secs (822987745 bytes/sec) >> >> # time cat /home/testfs/trunc8m > /dev/null >> 0.000u 6.111s 0:06.11 100.0% 15+2753k 0+0io 0pf+0w >> >> # time cat /home/testfs/zero8m > /dev/null >> 0.000u 0.010s 0:00.01 100.0% 12+2168k 0+0io 0pf+0w >> >> 600x increase in system time and close to 1MB/s - insanity.
>> >> The fix - a lot of the code to efficiently handle this was already >> there. >> >> dbuf_hold_impl has an int fail_sparse argument to return ENOENT for >> holes. Just had to get there and somehow back to dmu_read_uio, where >> zeroing can happen at byte granularity. >> >> ... didn't have time to actually test it yet. >> >> >> On 2013-04-13 12:24, Andriy Gapon wrote: >> >> on 13/04/2013 02:35 Adam Nowacki said the following: >> >> http://tepeserwery.pl/nowak/freebsd/zfs_sparse_optimization.patch.txt >> >> Does it look sane? >> >> >> It's hard to tell from a quick look since the change is not >> small. >> What is your idea of the problem and the fix? >> >> On 2013-04-12 09:03, Andriy Gapon wrote: >> >> >> ENOTIME to really investigate, but here is a basic >> profile result for those >> interested: >> kernel`bzero+0xa >> kernel`dmu_buf_hold_array_by_dnode+0x1cf >> kernel`dmu_read_uio+0x66 >> kernel`zfs_freebsd_read+0x3c0 >> kernel`VOP_READ_APV+0x92 >> kernel`vn_read+0x1a3 >> kernel`vn_io_fault+0x23a >> kernel`dofileread+0x7b >> kernel`sys_read+0x9e >> kernel`amd64_syscall+0x238 >> kernel`0xffffffff80747e4b >> >> That's where > 99% of time is spent. >> >> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Sat Apr 20 17:06:50 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4665A1FB; Sat, 20 Apr 2013 17:06:50 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 1F813613; Sat, 20 Apr 2013 17:06:50 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3KH6nPY061216; Sat, 20 Apr 2013 17:06:49 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3KH6nut061215; Sat, 20 Apr 2013 17:06:49 GMT (envelope-from linimon) Date: Sat, 20 Apr 2013 17:06:49 GMT Message-Id: <201304201706.r3KH6nut061215@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Apr 2013 17:06:50 -0000 Old Synopsis: disk usage problem when copying from one zfs dataset to another on the same pool using mv command New Synopsis: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Sat Apr 20 17:06:38 UTC 2013
Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=177985 From owner-freebsd-fs@FreeBSD.ORG Sat Apr 20 17:08:58 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 9CA50334; Sat, 20 Apr 2013 17:08:58 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 76886641; Sat, 20 Apr 2013 17:08:58 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3KH8w8Q061399; Sat, 20 Apr 2013 17:08:58 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3KH8wr4061398; Sat, 20 Apr 2013 17:08:58 GMT (envelope-from linimon) Date: Sat, 20 Apr 2013 17:08:58 GMT Message-Id: <201304201708.r3KH8wr4061398@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/177971: [nfs] FreeBSD 9.1 nfs client dirlist problem w/ nfsv3, rsize=4096, wsize=4096 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Apr 2013 17:08:58 -0000 Old Synopsis: FreeBSD 9.1 nfs client dirlist problem w/ nfsv3,rsize=4096,wsize=4096 New Synopsis: [nfs] FreeBSD 9.1 nfs client dirlist problem w/ nfsv3,rsize=4096,wsize=4096 Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Sat Apr 20 17:08:46 UTC 2013 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=177971 From owner-freebsd-fs@FreeBSD.ORG Sat Apr 20 17:40:03 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 27CB1CBE for ; Sat, 20 Apr 2013 17:40:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 1A83F7D6 for ; Sat, 20 Apr 2013 17:40:03 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3KHe2IS067656 for ; Sat, 20 Apr 2013 17:40:02 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3KHe2HQ067655; Sat, 20 Apr 2013 17:40:02 GMT (envelope-from gnats) Date: Sat, 20 Apr 2013 17:40:02 GMT Message-Id: <201304201740.r3KHe2HQ067655@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: "Steven Hartland" Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Steven Hartland List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Apr 2013 17:40:03 -0000 The following reply was made to PR kern/177985; it has been noted by GNATS. 
From: "Steven Hartland" To: , Cc: Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command Date: Sat, 20 Apr 2013 18:30:26 +0100 Deletes / frees are lower priority than standard writes / reads so its quite possible in the scenario you describe that you could run out of space. Could you please confirm the exact behaviour by allow mv to process a number of files, before suspending and seeing if the free space is correct for the current progress after waiting for the pool to sync all outstanding requests. Regards Steve From owner-freebsd-fs@FreeBSD.ORG Sat Apr 20 19:20:02 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 87CB2996 for ; Sat, 20 Apr 2013 19:20:02 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 7A44CB37 for ; Sat, 20 Apr 2013 19:20:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3KJK2MT088018 for ; Sat, 20 Apr 2013 19:20:02 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3KJK2Pn088017; Sat, 20 Apr 2013 19:20:02 GMT (envelope-from gnats) Date: Sat, 20 Apr 2013 19:20:02 GMT Message-Id: <201304201920.r3KJK2Pn088017@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: Andriy Gapon Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Andriy Gapon List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Apr 2013 19:20:02 -0000 The following reply was made to PR kern/177985; it has been noted by GNATS. From: Andriy Gapon To: bug-followup@FreeBSD.org, sybersnake@gmail.com Cc: Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command Date: Sat, 20 Apr 2013 22:12:12 +0300 Sorry, but I do not see any bug reported here. mv behaves as it is expected/documented to behave. ZFS behaves as it should as well. If the behavior is surprising to you then please update your knowledge of the tools. If you need a different behavior then you can script it yourself or use different tools to accomplish your job. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sat Apr 20 19:50:01 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 68764ECB for ; Sat, 20 Apr 2013 19:50:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 5A96ED01 for ; Sat, 20 Apr 2013 19:50:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3KJo1WA093416 for ; Sat, 20 Apr 2013 19:50:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3KJo1bR093415; Sat, 20 Apr 2013 19:50:01 GMT (envelope-from gnats) Date: Sat, 20 Apr 2013 19:50:01 GMT Message-Id: <201304201950.r3KJo1bR093415@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: Jon Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Jon List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Apr 2013 19:50:01 -0000 The following reply was made to PR kern/177985; it has been noted by GNATS. From: Jon To: Andriy Gapon Cc: "bug-followup@FreeBSD.org" Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command Date: Sat, 20 Apr 2013 15:49:41 -0400 This is not a bug, it is a workflow problem introduced by the difference in behavior between ZFS datasets and fixed sized file systems. You should be able to move files from one dataset to another on the same pool without having to copy it to another pool and back. This all can be accomplished by deleting copied files more often than it currently does or at least adding a flag to turn on synchronized deletes. After I am done testing the same scenario on Solaris I will run the test Steve suggested. Sent from my iPhone On Apr 20, 2013, at 3:12 PM, Andriy Gapon wrote: > > Sorry, but I do not see any bug reported here. > mv behaves as it is expected/documented to behave. > ZFS behaves as it should as well. > If the behavior is surprising to you then please update your knowledge of the tools. > If you need a different behavior then you can script it yourself or use > different tools to accomplish your job.
> > -- > Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sat Apr 20 20:30:01 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 7D018263 for ; Sat, 20 Apr 2013 20:30:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 6FB50DE1 for ; Sat, 20 Apr 2013 20:30:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3KKU1xr001301 for ; Sat, 20 Apr 2013 20:30:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3KKU1uu001300; Sat, 20 Apr 2013 20:30:01 GMT (envelope-from gnats) Date: Sat, 20 Apr 2013 20:30:01 GMT Message-Id: <201304202030.r3KKU1uu001300@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: Andriy Gapon Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Andriy Gapon List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Apr 2013 20:30:01 -0000 The following reply was made to PR kern/177985; it has been noted by GNATS. From: Andriy Gapon To: Jon Cc: "bug-followup@FreeBSD.org" Subject: Re: kern/177985: [zfs] disk usage problem when copying from one zfs dataset to another on the same pool using mv command Date: Sat, 20 Apr 2013 23:25:40 +0300 on 20/04/2013 22:49 Jon said the following: > This is not a bug, it is a workflow problem introduced by the difference in > behavior between ZFS datasets and fixed sized file systems. > > You should be able to move files from one dataset to another on the same pool > without having to copy it to another pool and back. You lost me at 'another pool'. Perhaps moving an object from one zfs dataset to another could be optimized, but... That would definitely require zfs-specific tools. It is not implemented in the code yet, as far as I know. > This all can be > accomplished by deleting copied files more often than it currently does or at > least adding a flag to turn on synchronized deletes. No, it cannot be accomplished that way, because it would violate how mv(1) across filesystems works. Perhaps it's indeed time to read the man page? > After I am done testing the same scenario on Solaris I will run the test > Steve suggested. Yes, please do. Personal experience is always more enlightening than someone else's words. > On Apr 20, 2013, at 3:12 PM, Andriy Gapon wrote: >> >> Sorry, but I do not see any bug reported here. mv behaves as it is >> expected/documented to behave. ZFS behaves as it should as well. If the >> behavior is surprising to you then please update your knowledge of the >> tools. If you need a different behavior then you can script it yourself or >> use different tools to accomplish your job. >> >> -- Andriy Gapon -- Andriy Gapon
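[For reference, the behavior debated in this PR is easy to reproduce; a sketch, with pool, dataset and file names made up:

# zfs create tank/a && zfs create tank/b
# dd if=/dev/zero of=/tank/a/big bs=1m count=1024
# mv /tank/a/big /tank/b/                       # cross-dataset mv is a copy followed by an unlink
# zfs list -o name,used,avail tank/a tank/b     # space freed from tank/a may lag until TXGs sync

The transient double accounting during the copy is why a nearly full pool can run out of space in the middle of such a move.]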