From owner-freebsd-fs@FreeBSD.ORG Sun Jun 12 04:37:17 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 912E4106564A; Sun, 12 Jun 2011 04:37:17 +0000 (UTC) (envelope-from dmagda@ee.ryerson.ca) Received: from eccles.ee.ryerson.ca (ee.ryerson.ca [141.117.1.2]) by mx1.freebsd.org (Postfix) with ESMTP id 1D9358FC16; Sun, 12 Jun 2011 04:37:16 +0000 (UTC) Received: from [10.0.1.2] (bas2-toronto09-1176443988.dsl.bell.ca [70.31.28.84]) (authenticated bits=0) by eccles.ee.ryerson.ca (8.14.4/8.14.4) with ESMTP id p5C3wY5L015382 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Sat, 11 Jun 2011 23:58:35 -0400 (EDT) (envelope-from dmagda@ee.ryerson.ca) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: David Magda In-Reply-To: <4DF28BCF.3060008@gmail.com> Date: Sat, 11 Jun 2011 23:58:39 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: <525D503A-240C-49F2-9AAD-EC8E3C1CDB9A@ee.ryerson.ca> References: <4DECB197.8020102@FreeBSD.org> <4DF28BCF.3060008@gmail.com> To: Volodymyr Kostyrko X-Mailer: Apple Mail (2.1084) Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Subject: Re: HEADS UP: ZFS v28 merged to 8-STABLE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 12 Jun 2011 04:37:17 -0000

On Jun 10, 2011, at 17:25, Volodymyr Kostyrko wrote:

> Am I missing something? How about using fletcher[24] for dedup?

Fletcher is fairly weak as things go, and so even though two checksums are the same, there's a decent chance that the data is actually different.
At least with recent releases of (Open)Solaris, when you set 'dedup=on' the hash used is SHA-256, which has very, very, very low odds of producing the same value from two different blocks of data.

When ZFS dedup originally came out you could have one of the following values:

. off
. on (== sha256)
. fletcher4 with verify/compare
. sha256 (without verify/compare)
. sha256 with verify

There was a long-ish thread on zfs-discuss fairly recently on whether SHA-256 was "good enough" to be trusted on its own, or whether one should do a verify step in addition to SHA-256:

http://mail.opensolaris.org/pipermail/zfs-discuss/2011-January/046875.html

While some people argued that it was prudent to use "verify" (especially with your data/job on the line), a good portion of folks said that it's not worth it (i.e., if you're not worried about being hit by lightning (2^-17 to 2^-18), you shouldn't be worried about a hash collision (2^-128)).

From owner-freebsd-fs@FreeBSD.ORG Sun Jun 12 04:46:00 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 037A9106566B; Sun, 12 Jun 2011 04:46:00 +0000 (UTC) (envelope-from dmagda@ee.ryerson.ca) Received: from eccles.ee.ryerson.ca (ee.ryerson.ca [141.117.1.2]) by mx1.freebsd.org (Postfix) with ESMTP id B99648FC14; Sun, 12 Jun 2011 04:45:59 +0000 (UTC) Received: from [10.0.1.2] (bas2-toronto09-1176443988.dsl.bell.ca [70.31.28.84]) (authenticated bits=0) by eccles.ee.ryerson.ca (8.14.4/8.14.4) with ESMTP id p5C4jpHp016219 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Sun, 12 Jun 2011 00:45:52 -0400 (EDT) (envelope-from dmagda@ee.ryerson.ca) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: David Magda In-Reply-To: Date: Sun, 12 Jun 2011 00:45:57 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: References: <4DECB197.8020102@FreeBSD.org> <20110610211202.GA52253@icarus.home.lan> To: Bob Friesenhahn X-Mailer: Apple Mail (2.1084) Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Subject: Re: HEADS UP: ZFS v28 merged to 8-STABLE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 12 Jun 2011 04:46:00 -0000

On Jun 10, 2011, at 17:24, Bob Friesenhahn wrote:

> Dedup can require a huge amount of RAM, or a dedicated L2ARC SSD, depending on the size of your storage. You should not enable it unless you are prepared for the consequences.

Under OpenSolaris, each tracking entry for a deduped block (which can be between 512 B and 128 KB) can be up to 376 bytes (struct ddt_entry): so for 1 GB (10^9 bytes) of deduped data (244140 blocks at 4K), you would need ~91 MB of overhead to keep track of it; for 1 TB (10^12 bytes) of deduped data you would need ~91 GB of space to keep track of all the blocks. And if you can't fit the DDT in RAM, it will have to be saved to disk, which means more I/O to fetch the data.

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/ddt.h?rev=1.2

If your data is in blocks smaller than 4K you'll need more memory for the DDT; if the data is broken up into blocks larger than 4K you'll probably need less.
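The sizing arithmetic above can be sketched in a few lines of Python. This is only a rough estimate using the figures quoted in this message (up to 376 bytes per ddt_entry, a uniform 4K block size); the function name and its defaults are made up for illustration, and the real structure sizes on FreeBSD may differ.

```python
def ddt_overhead_bytes(data_bytes, block_bytes=4096, entry_bytes=376):
    """Estimate the DDT size for data_bytes of deduped data.

    Assumes every block has the same size and costs one tracking
    entry of entry_bytes (376 is the OpenSolaris upper bound quoted
    in the message, not a measured FreeBSD value).
    """
    blocks = data_bytes // block_bytes
    return blocks * entry_bytes


# The two cases from the message: 1 GB and 1 TB of deduped data at 4K.
for label, size in (("1 GB", 10**9), ("1 TB", 10**12)):
    print(label, ddt_overhead_bytes(size), "bytes")

# Smaller blocks mean more entries: 512-byte blocks need roughly
# eight times as many entries as 4K blocks for the same data.
print("1 GB at 512 B:", ddt_overhead_bytes(10**9, block_bytes=512), "bytes")
```

Changing block_bytes shows how strongly the estimate depends on the average block size, which is the point of the last paragraph above.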
Also remember that even though an L2ARC cache may save you from having to go to spinning rust, you still need to use some RAM (struct arc_buf_hdr; ~178B) to reference the DDT stuff in L2ARC:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c

A few threads on zfs-discuss on this:

http://mail.opensolaris.org/pipermail/zfs-discuss/2011-April/thread.html#48026
http://mail.opensolaris.org/pipermail/zfs-discuss/2011-May/thread.html#48185

Also, the above numbers are for OpenSolaris: someone may want to check the structure sizes for FreeBSD to be sure. They should get you in the right ballpark though.

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 09:25:05 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E51DB1065672 for ; Mon, 13 Jun 2011 09:25:05 +0000 (UTC) (envelope-from prvs=1145dbc1e5=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 709F78FC13 for ; Mon, 13 Jun 2011 09:25:05 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Mon, 13 Jun 2011 10:13:14 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 13 Jun 2011 10:13:14 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50013570687.msg for ; Mon, 13 Jun 2011 10:13:13 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1145dbc1e5=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.ORG Message-ID: From: "Steven Hartland" To: Date: Mon, 13 Jun 2011 10:13:33 +0100 MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6090 Cc: Subject: Impossible compression ratio on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 09:25:06 -0000

I've just done an optimisation of a mysql table which is located on a compressed zfs partition and spotted that the size reported by du is impossibly small for the file:-

du -h detail*
1.5K    detail.frm
7.5K    detail.ibd

ls -l detail*
-rw-rw----  1 mysql  mysql       8660 Jun 13 10:00 detail.frm
-rw-rw----  1 mysql  mysql  650117120 Jun 13 10:04 detail.ibd

The table format, for those interested, is 3 ints and 3 indexes, and it contains 8 million rows.

I highly doubt that my 620MB table is taking up just 7.5K on disk. Any ideas?

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
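A quick plausibility check on the 7.5K figure, as a sketch only. It relies on assumptions the message does not state: the default 128K recordsize, 512-byte sectors, and that any record holding some non-zero data occupies at least one sector even after compression. The function name is invented for illustration.

```python
def min_allocated_bytes(file_bytes, record_bytes=128 * 1024, sector_bytes=512):
    """Lower bound on disk usage if no record of the file is all zeros.

    Assumption: each record with real data needs at least one sector
    on disk, however well it compresses.
    """
    records = -(-file_bytes // record_bytes)  # ceiling division
    return records * sector_bytes


size = 650117120  # size of detail.ibd from the ls -l output above
floor = min_allocated_bytes(size)
print(floor, "bytes minimum versus", int(7.5 * 1024), "bytes reported by du")
```

Under those assumptions the file spans 4960 records and needs about 2.4 MB at the very least, so a 7.5K figure would only make sense if almost every record were stored as a hole (all zeros) or had not yet been counted.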
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 09:29:08 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6C0F4106566B for ; Mon, 13 Jun 2011 09:29:08 +0000 (UTC) (envelope-from numisemis@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id E46618FC19 for ; Mon, 13 Jun 2011 09:29:07 +0000 (UTC) Received: by bwz12 with SMTP id 12so5244636bwz.13 for ; Mon, 13 Jun 2011 02:29:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=TDY5QwLagz84ziNl1vImo/YLQP8RhKxvIo5EuWwGPUY=; b=kP/zPFgQNwye+b72v1va2aKTgH9ypqXAm9xhqD3poi0UjilRZJtS5t4PR6NEVrrXhd dbb+ZYb4oQ8D7E56amaVven8CVWwz9HcE7thaYPZkhdq+BCqSK3gIv7xt8rDXFOmnmf0 yWhPNvlU3OWCTgCsvAzbFHMXKRQ09VJx+AhaE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=PhVVY6oYEcUfzazbJfj6AOngmkxd+sGdnus7u9dxNVbSNT9Ros+IE5cxUejY/YTaeI AD6yETTTiMJTbfzWLwealUzWbEVJI5mF1xVxhIG2Ms6ZodRmMPi15ksNSCYX/aJvYtU4 O7nXfHK6zl637SV1l123HtIrX/dw5qwMm0jvQ= MIME-Version: 1.0 Received: by 10.205.81.193 with SMTP id zz1mr4389621bkb.3.1307957346509; Mon, 13 Jun 2011 02:29:06 -0700 (PDT) Received: by 10.204.180.139 with HTTP; Mon, 13 Jun 2011 02:29:06 -0700 (PDT) In-Reply-To: References: Date: Mon, 13 Jun 2011 11:29:06 +0200 Message-ID: From: =?UTF-8?Q?=C5=A0imun_Mikecin?= To: Steven Hartland , freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: Re: Impossible compression ratio on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 09:29:08 -0000

2011/6/13 Steven Hartland

> I've just done an optimisation of a mysql table which is located
> on a compressed zfs partition and spotted that the size reported
> by du is impossibly small for the file:-
>
> du -h detail*
> 1.5K detail.frm
> 7.5K detail.ibd
>
> ls -l detail*
> -rw-rw---- 1 mysql mysql 8660 Jun 13 10:00 detail.frm
> -rw-rw---- 1 mysql mysql 650117120 Jun 13 10:04 detail.ibd
>
> The table format for those interested is 3 int's, 3 indexes and
> contains 8million rows.
>
> I highly doubt that my 620MB table is taking up just 7.5K on disk
> any ideas?
>

It is possible if the file detail.ibd is mostly filled with zeros (that would be the case unless you have 620MB of real data).

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 09:48:05 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1B7EE106566B for ; Mon, 13 Jun 2011 09:48:05 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta09.emeryville.ca.mail.comcast.net (qmta09.emeryville.ca.mail.comcast.net [76.96.30.96]) by mx1.freebsd.org (Postfix) with ESMTP id F35778FC13 for ; Mon, 13 Jun 2011 09:48:04 +0000 (UTC) Received: from omta06.emeryville.ca.mail.comcast.net ([76.96.30.51]) by qmta09.emeryville.ca.mail.comcast.net with comcast id vMix1g00116AWCUA9Mo2RB; Mon, 13 Jun 2011 09:48:02 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta06.emeryville.ca.mail.comcast.net with comcast id vMo31g0041t3BNj8SMo3g0; Mon, 13 Jun 2011 09:48:03 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 39C0F102C19; Mon, 13 Jun 2011 02:48:03 -0700 (PDT) Date: Mon, 13 Jun 2011 02:48:03 -0700 From: Jeremy Chadwick To: Steven Hartland Message-ID: <20110613094803.GA10290@icarus.home.lan> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Impossible compression ratio on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 09:48:05 -0000

On Mon, Jun 13, 2011 at 10:13:33AM +0100, Steven Hartland wrote:
> I've just done an optimisation of a mysql table which is located
> on a compressed zfs partition and spotted that the size reported
> by du is impossibly small for the file:-
>
> du -h detail*
> 1.5K detail.frm
> 7.5K detail.ibd
>
> ls -l detail*
> -rw-rw---- 1 mysql mysql 8660 Jun 13 10:00 detail.frm
> -rw-rw---- 1 mysql mysql 650117120 Jun 13 10:04 detail.ibd
>
> The table format for those interested is 3 int's, 3 indexes and
> contains 8million rows.
>
> I highly doubt that my 620MB table is taking up just 7.5K on disk
> any ideas?

Well-known "quirk"; welcome to ZFS. :-) The following article is long, but if you grab a coffee and read it in full, it'll shed some light on the ordeal:

http://www.cuddletech.com/blog/pivot/entry.php?id=983

There's also this:

http://blog.buttermountain.co.uk/2008/05/10/zfs-compression-when-du-and-ls-appear-to-disagree/

This is one of the many reasons I do not use ZFS compression. Not spreading FUD, just saying stuff like this throws users for a loop, case in point.

--
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 09:56:21 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 037B2106564A for ; Mon, 13 Jun 2011 09:56:21 +0000 (UTC) (envelope-from numisemis@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 833278FC12 for ; Mon, 13 Jun 2011 09:56:20 +0000 (UTC) Received: by bwz12 with SMTP id 12so5265970bwz.13 for ; Mon, 13 Jun 2011 02:56:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=1Q31GP5IILhMpmrGA78KfB9uyqHkgsZmBVWwyeSvhYk=; b=qo3SDf/vI0JZt1lko5dElxNZCb0tFqkvR4W94kZTjnGupbIQgahQj2j1kfSCTEvgX/ Xhg2k89y/Icdhu6na7QqLUTV2hntQQ96kBzgRyVd1y8qN6SfB1Dr1/VWHot/hx6U30fU dMMgow8yiqaDfJhoGPypGl/IopsTrugBrgk9I= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=OH0P/lD+Ht9wT92dpXT9bfNAqgqymjsxp6wgDj9sdPyD0AIWfv0Vj6BU0ijb0AmHsj 9REY/xLifCy5rMOgljjp8LChvy6ggVcafVRvDJx3m5zWNeRRFlbtPUyw9SYzwQnz5aQt S917pD4bIhjP47xuo7lCwRgb60qq7Ep7iJVuA= MIME-Version: 1.0 Received: by 10.204.84.166 with SMTP id j38mr1369334bkl.84.1307958979361; Mon, 13 Jun 2011 02:56:19 -0700 (PDT) Received: by 10.204.180.139 with HTTP; Mon, 13 Jun 2011 02:56:19 -0700 (PDT) In-Reply-To: <20110613094803.GA10290@icarus.home.lan> References: <20110613094803.GA10290@icarus.home.lan> Date: Mon, 13 Jun 2011 11:56:19 +0200 Message-ID: From: =?UTF-8?Q?=C5=A0imun_Mikecin?= To: Jeremy Chadwick , freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: Re: Impossible compression ratio on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 
Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 09:56:21 -0000

2011/6/13 Jeremy Chadwick

> This is one of the many reasons I do not use ZFS compression. Not
> spreading FUD, just saying stuff like this throws users for a loop, case
> in point.

Using 'du' for file sizes (without -A option) is wrong in the first place. Any program or script that is using it in such a way is broken and should be corrected.

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 10:45:08 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9DC7E106564A for ; Mon, 13 Jun 2011 10:45:08 +0000 (UTC) (envelope-from prvs=1145dbc1e5=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 293B18FC17 for ; Mon, 13 Jun 2011 10:45:07 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Mon, 13 Jun 2011 11:44:02 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 13 Jun 2011 11:44:02 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50013571279.msg for ; Mon, 13 Jun 2011 11:44:01 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1145dbc1e5=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <3DA28334D5774636A0DDEB48D8A43A91@multiplay.co.uk> From: "Steven Hartland" To: =?iso-8859-1?Q?Simun_Mikecin?= , "Jeremy Chadwick" , References: <20110613094803.GA10290@icarus.home.lan> Date: Mon, 13 Jun 2011 11:44:23 +0100 MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6090 Cc: Subject: Re: Impossible compression ratio on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 10:45:08 -0000

----- Original Message -----
From: "Simun Mikecin"

> Using 'du' for file sizes (without -A option) is wrong in the first place.
> Any program or script that is using it in such a way is broken and should be
> corrected.

That's not true, -A displays the apparent size not the actual disk usage which is what we want.

Regards
Steve
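The difference between apparent size and disk usage discussed here can be seen directly in the fields returned by stat(2). A minimal Python sketch, assuming a Unix-like system where st_blocks counts 512-byte units (true on FreeBSD and Linux, but not guaranteed by POSIX); the helper name is hypothetical:

```python
import os


def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for path.

    apparent_bytes is st_size, the figure shown by 'ls -l' and 'du -A'.
    allocated_bytes is st_blocks * 512, the figure plain 'du' is based
    on, assuming st_blocks is counted in 512-byte units.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


# Example: compare the two figures for the current directory entry.
apparent, allocated = sizes(os.curdir)
print(f"apparent={apparent} allocated={allocated}")
```

On a compressed or sparse file the second number can be far smaller than the first, which is the case being argued about in this thread.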
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 10:56:48 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9255B106564A for ; Mon, 13 Jun 2011 10:56:48 +0000 (UTC) (envelope-from numisemis@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 1B31D8FC12 for ; Mon, 13 Jun 2011 10:56:47 +0000 (UTC) Received: by bwz12 with SMTP id 12so5316941bwz.13 for ; Mon, 13 Jun 2011 03:56:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=Y4pICHgw95NQYbd4PH6A2b5pTRbCS9OgaMoy8pb11os=; b=quy3yzlpdT5j+nXc522F9fEXAqoRpERHnRHi8bjbid3ikJJLTZVcCEq7gnxBFtnXz9 zGHMBo8QgR9Hm1vMBILVYfUl1yNODlcOoCau5K3CrvdVEWvV5XIdDrVUIrUHGLuII7/S mW6t3c37/yF6UT4MGakHXPrtrxYs7jGsh3P60= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=rO+t/m7jj97xl+LJzNOmJ2spbbwzTCdp4wsXhU99SePZTz0J0KEtyxxEulEAf3lD5C ugjwdrbRuapjrF+J7iDBbboVDEdIX6yzqfafBTPqT+ehbzaP0qIi8sXpzy4NOf3PuuPW 6jvu3OC+V56cf2bD/HhgUiWoDEmEZwM4t/WlA= MIME-Version: 1.0 Received: by 10.204.130.16 with SMTP id q16mr4441609bks.192.1307962606926; Mon, 13 Jun 2011 03:56:46 -0700 (PDT) Received: by 10.204.180.139 with HTTP; Mon, 13 Jun 2011 03:56:46 -0700 (PDT) In-Reply-To: <3DA28334D5774636A0DDEB48D8A43A91@multiplay.co.uk> References: <20110613094803.GA10290@icarus.home.lan> <3DA28334D5774636A0DDEB48D8A43A91@multiplay.co.uk> Date: Mon, 13 Jun 2011 12:56:46 +0200 Message-ID: From: =?UTF-8?Q?=C5=A0imun_Mikecin?= To: Steven Hartland , freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: Re: Impossible compression ratio on ZFS X-BeenThere: 
freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 10:56:48 -0000

2011/6/13 Steven Hartland

>> Using 'du' for file sizes (without -A option) is wrong in the first place.
>> Any program or script that is using it in such a way is broken and should
>> be corrected.
>
> That's not true, -A displays the apparent size not the actual disk usage
> which is what we want.

Disk usage is not equal to file size. 'ls -al' shows file size, 'du' shows disk usage. Use the one you need, but don't expect them to be the same thing.

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 10:58:08 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4F9AF106564A for ; Mon, 13 Jun 2011 10:58:08 +0000 (UTC) (envelope-from prvs=1145dbc1e5=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id CD02E8FC1B for ; Mon, 13 Jun 2011 10:58:07 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Mon, 13 Jun 2011 11:57:02 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 13 Jun 2011 11:57:02 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50013571354.msg for ; Mon, 13 Jun 2011 11:57:02 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1145dbc1e5=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.ORG Message-ID: <4E09C82B45BA46019281930B2EB13AC1@multiplay.co.uk> From: "Steven Hartland" To: "Jeremy Chadwick" References: <20110613094803.GA10290@icarus.home.lan> Date: Mon, 13 Jun 2011 11:57:22 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6090 Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Impossible compression ratio on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 10:58:08 -0000

----- Original Message -----
From: "Jeremy Chadwick"

> Well-known "quirk"; welcome to ZFS. :-) The following article is long,
> but if you grab a coffee and read it in full, it'll shed some light on
> the ordeal:
>
> http://www.cuddletech.com/blog/pivot/entry.php?id=983
>
> There's also this:
>
> http://blog.buttermountain.co.uk/2008/05/10/zfs-compression-when-du-and-ls-appear-to-disagree/
>
> This is one of the many reasons I do not use ZFS compression. Not
> spreading FUD, just saying stuff like this throws users for a loop, case
> in point.

I think you're misunderstanding my question. It's not the fact that it's showing different sizes from du and ls, that's 100% expected, but clearly 8 million rows of 3 ints can't possibly compress down to 7.5K.

Having just looked back at the machine, an hour later, the values now seem correct, with du showing:-

278M detail.ibd

I checked this several times, over what had to be 10 mins or more, and even did a flush tables to ensure everything had been written out as far as mysql was concerned.

So it seems that zfs was still processing the file for a good amount of time, and during that time was showing incorrect disk usage for said file. I'm wondering if the data is somehow being processed in L2ARC or something?
For reference we're running 8.2-RELEASE, on an Areca-backed RAID6 with two SSD drives as L2ARC.

zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          da0p3     ONLINE       0     0     0
        cache
          ada0      ONLINE       0     0     0
          ada1      ONLINE       0     0     0

errors: No known data errors

Obviously everything seems to have caught up and is now showing real stats, but I'm confused as to why it would take quite so long to display the real usage via du.

Regards
Steve

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 11:07:02 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A91DB106564A for ; Mon, 13 Jun 2011 11:07:02 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 8E2418FC22 for ; Mon, 13 Jun 2011 11:07:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p5DB72RX092056 for ; Mon, 13 Jun 2011 11:07:02 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p5DB71vD092054 for freebsd-fs@FreeBSD.org; Mon, 13 Jun 2011 11:07:02 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 13 Jun 2011 11:07:02 GMT Message-Id: <201106131107.p5DB71vD092054@freefall.freebsd.org> X-Authentication-Warning:
freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 11:07:02 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/157728  fs         [zfs] zfs (v28) incremental receive may leave behind t
o kern/157722  fs         [geli] unable to newfs a geli encrypted partition
o kern/157684  fs         [nfs] NFSv4 ignoring "-ro" option in exports file
o kern/157399  fs         [zfs] trouble with: mdconfig force delete && zfs strip
f kern/157365  fs         [nfs] cannot umount an nfs from dead server
o kern/157179  fs         [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156933  fs         [zfs] ZFS receive after read on readonly=on filesystem
o kern/156797  fs         [zfs] [panic] Double panic with FreeBSD 9-CURRENT and
o kern/156781  fs         [zfs] zfs is losing the snapshot directory,
p kern/156545  fs         [ufs] mv could break UFS on SMP systems
o kern/156193  fs         [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156168  fs         [nfs] [panic] Kernel panic under concurrent access ove
o kern/156039  fs         [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615  fs         [zfs] zfs v28 broken on sparc64 -current
o kern/155587  fs         [zfs] [panic] kernel panic with zfs
o kern/155411  fs         [regression] [8.2-release] [tmpfs]: mount: tmpfs : No
o kern/155199  fs         [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104   fs         [zfs][patch] use /dev prefix by default when importing
o kern/154930  fs         [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828  fs         [msdosfs] Unable to create directories on external USB
o kern/154491  fs         [smbfs] smb_co_lock: recursive lock for object 1
o kern/154447  fs         [zfs] [panic] Occasional panics - solaris assert somew
p kern/154228  fs         [md] md getting stuck in wdrain state
o kern/153996  fs         [zfs] zfs root mount error while kernel is not located
o kern/153847  fs         [nfs] [panic] Kernel panic from incorrect m_free in nf
o kern/153753  fs         [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716  fs         [zfs] zpool scrub time remaining is incorrect
o kern/153695  fs         [patch] [zfs] Booting from zpool created on 4k-sector
o kern/153680  fs         [xfs] 8.1 failing to mount XFS partitions
o kern/153520  fs         [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418  fs         [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351  fs         [zfs] locking directories/files in ZFS
o bin/153258   fs         [patch][zfs] creating ZVOLs requires `refreservation'
s kern/153173  fs         [zfs] booting from a gzip-compressed dataset doesn't w
o kern/153126  fs         [zfs] vdev failure, zpool=peegel type=vdev.too_small
p kern/152488  fs         [tmpfs] [patch] mtime of file updated when only inode
o kern/152022  fs         [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs         [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs         [zfs] page fault under load in /sbin/zfs
o kern/151845  fs         [smbfs] [patch] smbfs should be upgraded to support Un
o bin/151713   fs         [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs         [zfs] disk wait bug
o kern/151629  fs         [fs] [patch] Skip empty directory entries during name
o kern/151330  fs         [zfs] will unshare all zfs filesystem after execute a
o kern/151326  fs         [nfs] nfs exports fail if netgroups contain duplicate
o kern/151251  fs         [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs         [zfs] can't delete zfs snapshot
o kern/151111  fs         [zfs] vnodes leakage during zfs unmount
o kern/150503  fs         [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs         [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs         [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs         [nfs] mountd/nfsd became confused; refused to reload n
o kern/150207  fs         zpool(1): zpool import -d /dev tries to open weird dev
o kern/149208  fs         mksnap_ffs(8) hang/deadlock
o kern/149173  fs         [patch] [zfs] make OpenSolaris installa
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o bin/148296   fs         [zfs] [loader] [patch] Very slow probe in /usr/src/sys
o kern/148204  fs         [nfs] UDP NFS causes overload
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147790  fs         [zfs] zfs set acl(mode|inherit) fails on existing zfs
o kern/147560  fs         [zfs] [boot] Booting 8.1-PRERELEASE raidz system take
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an
o bin/145309   fs         bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs         [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142914  fs         [zfs] ZFS performance degradation over time
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140134  fs         [msdosfs] write and fsck destroy filesystem integrity
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
p bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs         [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs         [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/138662  fs         [panic] ffs_blkfree: freeing free block
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/133174 fs [msdosfs] [patch] msdosfs must support multibyte inter o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs f kern/130133 fs [panic] [zfs] 'kmem_map too small' caused by make clea o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs f kern/127375 fs [zfs] If vm.kmem_size_max>"1073741823" then write spee o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi f kern/126703 fs [panic] [zfs] _mtx_lock_sleep: recursed on non-recursi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files f sparc/123566 fs [zfs] zpool import issue: EOVERFLOW o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 
6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121366 fs [zfs] [patch] Automatic disk scrubbing from periodic(8 o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F f kern/120210 fs [zfs] [panic] reboot after panic: solaris assert: arc_ o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117314 fs [ntfs] Long-filename only NTFS fs'es cause kernel pani o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using f kern/116170 fs [panic] Kernel panic when mounting /tmp o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs 
[msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o kern/109024 fs [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat o kern/109010 fs [msdosfs] can't mv directory within fat32 file system o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro f kern/106030 fs [ufs] [panic] panic in ufs from geom when a dead disk o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. 
o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/51583 fs [nullfs] [patch] allow to work with devices and socket o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o kern/33464 fs [ufs] soft update inconsistencies after system crash o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 234 problems total. 
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 11:11:08 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E0581106566C for ; Mon, 13 Jun 2011 11:11:08 +0000 (UTC) (envelope-from prvs=1145dbc1e5=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 676CE8FC1A for ; Mon, 13 Jun 2011 11:11:07 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Mon, 13 Jun 2011 12:10:34 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 13 Jun 2011 12:10:34 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50013571502.msg for ; Mon, 13 Jun 2011 12:10:33 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1145dbc1e5=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <7BE33B31ED3440A19C9B86EFF32563D3@multiplay.co.uk> From: "Steven Hartland" To: =?iso-8859-1?Q?Simun_Mikecin?= , References: <20110613094803.GA10290@icarus.home.lan><3DA28334D5774636A0DDEB48D8A43A91@multiplay.co.uk> Date: Mon, 13 Jun 2011 12:10:54 +0100 MIME-Version: 1.0 X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6090 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: Re: Impossible compression ratio on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 11:11:09 -0000

I wasn't but 7.5K vs 620MB not gonna happen ;-)

----- Original Message -----
From: Simun Mikecin
To: Steven Hartland ; freebsd-fs@freebsd.org
Sent: Monday, June 13, 2011 11:56 AM
Subject: Re: Impossible compression ratio on ZFS

2011/6/13 Steven Hartland

Using 'du' for file sizes (without -A option) is wrong in the first place. Any program or script that is using it in such a way is broken and should be corrected.

That's not true, -A displays the apparent size not the actual disk usage which is what we want.

Disk usage is not equal to file size. 'ls -al' shows file size, 'du' shows disk usage. Use the one you need, but don't expect them to be the same thing.

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
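For anyone who wants to see the two numbers side by side without du or ls, here is a minimal sketch in Python. It assumes a POSIX-like system where st_blocks is counted in 512-byte units; the helper names (sizes, make_sparse_file) are made up for illustration. It creates a file that is one big hole and prints the apparent size next to the allocated size.

```python
import os
import tempfile


def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for path."""
    st = os.stat(path)
    # st_size is the file length that 'ls -l' prints (and 'du -A' sums).
    # st_blocks counts 512-byte units actually allocated, which is what
    # plain 'du' adds up.
    return st.st_size, st.st_blocks * 512


def make_sparse_file(size):
    """Create a temporary file of `size` bytes without writing any data."""
    fd, path = tempfile.mkstemp()
    try:
        # Extending with ftruncate leaves a hole on file systems that
        # support sparse files; others may allocate the full size.
        os.ftruncate(fd, size)
    finally:
        os.close(fd)
    return path


if __name__ == "__main__":
    path = make_sparse_file(64 * 1024 * 1024)
    try:
        apparent, allocated = sizes(path)
        print(f"apparent  (ls -l, du -A): {apparent} bytes")
        print(f"allocated (du):           {allocated} bytes")
    finally:
        os.remove(path)
```

On a file system with sparse-file or compression support the second number will usually be far smaller than the first, which is the same effect as in the du/ls output earlier in this thread. How small, and how quickly it settles after a write, depends on the file system.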
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 11:13:40 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3DF1A106564A for ; Mon, 13 Jun 2011 11:13:40 +0000 (UTC) (envelope-from edhoprima@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id BF53F8FC13 for ; Mon, 13 Jun 2011 11:13:39 +0000 (UTC) Received: by bwz12 with SMTP id 12so5332789bwz.13 for ; Mon, 13 Jun 2011 04:13:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-type:content-transfer-encoding; bh=8z9MaYXYt+WxnTTG+UP4bYy9Zxyhch23Sme5sLWzeqY=; b=EulINdk4MjFNX9oXjtwZNuD8JeQjda+nVVaQFA3nSjEXds9ywn0WRKJYehUzXSi2/8 5oAGY5XWaLpA0gV3ntQp2vfrYm0JSD9svYy4XL7pKT+SsdvNIm9V4CAMWcX8uyyDzSW/ y7wz1iJJ6ZJs9Fn8i42YsyP8seebqMuna1ckw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:content-transfer-encoding; b=CKPnz7SMwIV1JhbQC2R3mI07nfO7E6e+S8NdzaK5zOvvr7l8ZwF1v9e5bv6fDJTrKt iUz/KmMfRabfhUnDn7NX7ZhfJmoqdvmkcMh/DXIdTaPKtfdN3wqUhwKFw222IHSsuzbK yO5knHgjikSNEh9DlFibhybkDsxQOcBmJuLeI= Received: by 10.204.74.67 with SMTP id t3mr149986bkj.43.1307962154120; Mon, 13 Jun 2011 03:49:14 -0700 (PDT) MIME-Version: 1.0 Received: by 10.204.103.15 with HTTP; Mon, 13 Jun 2011 03:48:54 -0700 (PDT) In-Reply-To: References: From: Edho P Arief Date: Mon, 13 Jun 2011 17:48:54 +0700 Message-ID: To: Steven Hartland Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: Impossible compression ratio on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 11:13:40 -0000

On Mon, Jun 13, 2011 at 4:13 PM, Steven Hartland wrote:
> I've just done an optimisation of a mysql table which is located
> on a compressed zfs partition and spotted that the size reported
> by du is impossibly small for the file:-
>
> du -h detail*
> 1.5K    detail.frm
> 7.5K    detail.ibd
>
> ls -l detail*
> -rw-rw----  1 mysql  mysql       8660 Jun 13 10:00 detail.frm
> -rw-rw----  1 mysql  mysql  650117120 Jun 13 10:04 detail.ibd
>
> The table format for those interested is 3 int's, 3 indexes and
> contains 8million rows.
>
> I highly doubt that my 620MB table is taking up just 7.5K on disk
> any ideas?
>

you can try gzipping the file to verify this...

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 12:26:05 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2D7E8106564A for ; Mon, 13 Jun 2011 12:26:04 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by mx1.freebsd.org (Postfix) with ESMTP id 56B8B8FC13 for ; Mon, 13 Jun 2011 12:26:04 +0000 (UTC) Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au (c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p5DCQ0In016212 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 13 Jun 2011 22:26:00 +1000 Date: Mon, 13 Jun 2011 22:26:00 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: =?UTF-8?Q?=C5=A0imun_Mikecin?= In-Reply-To: Message-ID: <20110613222215.J1792@besplex.bde.org> References: <20110613094803.GA10290@icarus.home.lan> <3DA28334D5774636A0DDEB48D8A43A91@multiplay.co.uk> MIME-Version: 1.0 Content-Type: MULTIPART/MIXED;
BOUNDARY="0-384147912-1307967960=:1792" Cc: freebsd-fs@freebsd.org Subject: Re: Impossible compression ratio on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 12:26:05 -0000

This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools.

--0-384147912-1307967960=:1792
Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE

On Mon, 13 Jun 2011, [UTF-8] Šimun Mikecin wrote:

> 2011/6/13 Steven Hartland
>
>> Using 'du' for file sizes (without -A option) is wrong in the first place.
>>> Any program or script that is using it in such a way is broken and should
>>> be
>>> corrected.

You mean that any script that uses du with the -A option is broken. -A just shows the file size in bad units.

>> That's not true, -A displays the apparent size not the actual disk usage
>> which is what we want.
>
> Disk usage is not equal to file size. 'ls -al' shows file size, 'du' shows
> disk usage.
> Use the one you need, but don't expect them to be the same thing.

Indeed, the file size is only vaguely related to the disk usage. Normal du shows disk usage including increases of it due to metadata and decreases of it due to compression or sparseness (which is a particular type of compression). du -A might be useful if the file is to be copied to another file system with no compression at all.
Bruce
--0-384147912-1307967960=:1792--

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 19:23:48 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B5C5D106566C for ; Mon, 13 Jun 2011 19:23:47 +0000 (UTC) (envelope-from mirror176@hotmail.com) Received: from snt0-omc4-s30.snt0.hotmail.com (snt0-omc4-s30.snt0.hotmail.com [65.55.90.233]) by mx1.freebsd.org (Postfix) with ESMTP id 8BF808FC13 for ; Mon, 13 Jun 2011 19:23:47 +0000 (UTC) Received: from SNT105-W27 ([65.55.90.200]) by snt0-omc4-s30.snt0.hotmail.com with Microsoft SMTPSVC(6.0.3790.4675); Mon, 13 Jun 2011 12:23:46 -0700 Message-ID: X-Originating-IP: [24.56.42.84] From: Edward Sutton To: Date: Mon, 13 Jun 2011 12:23:46 -0700 Importance: Normal MIME-Version: 1.0 X-OriginalArrivalTime: 13 Jun 2011 19:23:46.0901 (UTC) FILETIME=[6A9D8850:01CC29FF] Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Re: zfs mirror: 1 disk lost, corrupted other disk. crashes zfs tools and panics system X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 19:23:48 -0000

Wanted to report from a state I could describe without confusion of steps I had taken. Restored DD copy and ran `zpool import -fFX zroot` from the same -current disk mentioned earlier and this time it did roll back a few minutes and gave access to the corrupt filesystems. I must have touched it more with an older copy every time though I thought that was not the first time I started with -current. -X still is not defined in the manpage but is referenced a lot online.

I still have the disk copy that can be restored in the corrupt form if anyone is interested in fixing issues such as scrubbing without the v28 rollback taking place. The scrub was when it got stuck in a panic on import because the scrub would resume and would cause a panic in doing so.

Current was prone to occasional random crashes on my system so it took a while to copy off the data in a successful run, though that may be my hardware (which seems most stable after being up and running for a few hours under full cpu load such as boinc).

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 20:05:38 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8A7B0106566B for ; Mon, 13 Jun 2011 20:05:38 +0000 (UTC) (envelope-from cmdlnkid@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 085058FC13 for ; Mon, 13 Jun 2011 20:05:37 +0000 (UTC) Received: by iwn33 with SMTP id 33so6043327iwn.13 for ; Mon, 13 Jun 2011 13:05:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:sender:date:from:to:cc:subject:message-id :references:mime-version:content-type:content-disposition :in-reply-to; bh=NQM9je2H97xggKWpVRLPsEJNQv2ClTzzbTJ2GJJqZIA=; b=mzR7IaB1ZraI/5LBDyh+9G0Q9UPSdRHMoaBR6ToDTYj7yjVymnMz9Khv8unSLAdIDI vCnyEVXzEzm7YhBMJTBomAGv/IPESM+DaJHD+7ZMlaKz1L+UXlpcN/EX0JGCloaxnY9P Q6yDS3C7e7Cwh+opo0V05sdNG/cE5hKSVMtWA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to; b=Wn5EW7sWMhTtMyPS4pmLD/7oCtAEDjuFQ+i8hu46daw/Tipptj0TT2kjlY25TxDc7J K1+fw+xAh8jxR+V0NhEvWgcNaPMzffEUPGFWy5KuULUDxwFU6+jAKOQs2UfJVAPaP23y KL79D/TlgI/lCFSxR0FRlY2KKEAkx/Hkbj5ls= Received: by 10.231.34.1 with SMTP id
j1mr5970744ibd.87.1307993734221; Mon, 13 Jun 2011 12:35:34 -0700 (PDT) Received: from DataIX.net (adsl-99-181-139-216.dsl.klmzmi.sbcglobal.net [99.181.139.216]) by mx.google.com with ESMTPS id q15sm2924858ibb.48.2011.06.13.12.35.32 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 13 Jun 2011 12:35:33 -0700 (PDT) Sender: The Command Line Kid Received: from DataIX.net (localhost [127.0.0.1]) by DataIX.net (8.14.4/8.14.4) with ESMTP id p5DJZUZp033334 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 13 Jun 2011 15:35:30 -0400 (EDT) (envelope-from jhell@DataIX.net) Received: (from jhell@localhost) by DataIX.net (8.14.4/8.14.4/Submit) id p5DJZTNZ033324; Mon, 13 Jun 2011 15:35:29 -0400 (EDT) (envelope-from jhell@DataIX.net) Date: Mon, 13 Jun 2011 15:35:29 -0400 From: jhell To: Steven Hartland Message-ID: <20110613193529.GA21103@DataIX.net> References: <20110613094803.GA10290@icarus.home.lan> <4E09C82B45BA46019281930B2EB13AC1@multiplay.co.uk> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="sdtB3X0nJg68CQEu" Content-Disposition: inline In-Reply-To: <4E09C82B45BA46019281930B2EB13AC1@multiplay.co.uk> Cc: freebsd-fs@freebsd.org Subject: Re: Impossible compression ratio on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 20:05:38 -0000

--sdtB3X0nJg68CQEu
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Jun 13, 2011 at 11:57:22AM +0100, Steven Hartland wrote:
> ----- Original Message -----
> From: "Jeremy Chadwick"
>
> > Well-known "quirk"; welcome to ZFS. :-) The following article is long,
> > but if you grab a coffee and read it in full, it'll shed some light on
> > the ordeal:
> >
> > http://www.cuddletech.com/blog/pivot/entry.php?id=983
> >
> > There's also this:
> >
> > http://blog.buttermountain.co.uk/2008/05/10/zfs-compression-when-du-and-ls-appear-to-disagree/
> >
> > This is one of the many reasons I do not use ZFS compression. Not
> > spreading FUD, just saying stuff like this throws users for a loop, case
> > in point.
>
> I think you're misunderstanding my question; it's not the fact that it's
> showing different sizes from du and ls, that's 100% expected, but clearly
> 8million rows of 3 int's can't possibly compress down to 7.5K.
>
> Having just looked back at the machine, an hour later, the values now
> seem correct with du showing:-
> 278M detail.ibd
>
> I checked this several times, over what had to be 10mins or more even
> did a flush tables to ensure everything had been written out as far
> as mysql was concerned.
>
> So it seems that zfs was still processing the file for a good amount of
> time, and during that time was showing incorrect disk usage for said file.
>
> I'm wondering if the data is somehow being processed in l2 arc or
> something?
>
> For reference we're running 8.2-RELEASE, on an areca backed raid6 with
> two ssd drives in l2 arc.
>
> zpool status
>   pool: tank
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           da0p3     ONLINE       0     0     0
>         cache
>           ada0      ONLINE       0     0     0
>           ada1      ONLINE       0     0     0
>
> errors: No known data errors
>
> Obviously everything seems to have caught up and is now showing real
> stats but confused as to why it would take quite so long to display
> the real usage via du.
>

Hi Steve,

Knowing that there were patches out for v28 on 8.X can you confirm that in fact you are using v15 ZFS? I would assume you are because of the release but I don't want to do that.

If you happen to have patched up to v28, did you turn dedup on? If so I would expect the behavior you're seeing with the data not being written right away. If not, then seeing you have compression turned on... did you just dump that whole table into the database? It's quite possible that the compression was still happening in ARC before it was finally written out and this would also explain why that happened.

Also what level of compression are you using?

--sdtB3X0nJg68CQEu
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (FreeBSD)
Comment: http://bit.ly/0x89D8547E

iQEcBAEBAgAGBQJN9mZ/AAoJEJBXh4mJ2FR+6ucH/RW82Bh9i0AAJ56m3Ojx+GqY
BdoizrBBoJrxAqu+XpvMU/P4B94TAfe921ZOE1GH9fy2eZzthh9uzQ/329+BqJ5Z
Lnvq0AdmZFfO2xFiGvABnBkBNSCXQUNM/Yh4EGpKXZmP5Ga69o8845Fm++0xC4sr
7pyrXNUTpQtDUfN/BABnjE52MA6VUxVUPsSqMnQ/ugN5fLLOmHbKJETCoPkBRdEX
+aZEBAmikP02Y+K+Jo5YseWy92m/B2pH2DTVMZN9nyoZVbgppeacEasG09Kl4q02
gnNDsGd4lNBgFtg3akev2so7xDmB2FzX5dBGSuOSMhfJ7uGhcBeNHz5k+geUJ34=
=a5WH
-----END PGP SIGNATURE-----

--sdtB3X0nJg68CQEu--

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 20:15:53 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 771FD106566C; Mon, 13 Jun 2011 20:15:53 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id 22D608FC08; Mon, 13 Jun 2011 20:15:52 +0000 (UTC) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id CE8F245C98; Mon, 13 Jun 2011 22:15:50 +0200 (CEST) Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id B9E9045683; Mon, 13 Jun 2011 22:15:45 +0200 (CEST) Date: Mon, 13 Jun 2011 22:15:43 +0200 From: Pawel Jakub Dawidek To: "Justin T.
Gibbs" Message-ID: <20110613201543.GA1733@garage.freebsd.pl> References: <4DF25544.3020301@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="KsGdsel6WgEHnImy" Content-Disposition: inline In-Reply-To: <4DF25544.3020301@FreeBSD.org> X-OS: FreeBSD 9.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-0.6 required=4.5 tests=BAYES_00,RCVD_IN_SORBS_DUL autolearn=no version=3.0.4 Cc: fs@FreeBSD.org Subject: Re: Drop of spa_namespace lock in vdev_geom.c X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 20:15:53 -0000

--KsGdsel6WgEHnImy
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Jun 10, 2011 at 11:32:52AM -0600, Justin T. Gibbs wrote:
> Dropping and reacquiring the spa_namespace lock in vdev_geom_open()
> creates a lock order reversal with the spa_config locks. As the
> spa_config locks are not standard mutexes, witness will not warn
> about this issue. I only noticed this problem when debugging a ZFS
> deadlock. The deadlock can be triggered anytime that there are
> multiple insert/remove processes going on (e.g. vdev orphan processing
> while a fault management daemon is onlining a replacement device for
> some other vdev).
>
> I haven't noticed any issues with just holding the namespace lock
> for the duration of the open. Does anyone know why this lock drop
> was added in v28?

I did that as part of @182208 to fix another LOR. Full commit log:

Change 182208 on 2010/08/10 by pjd@pjd_zoo

	OpenSolaris switched to lazy creation of /dev/ entries for ZVOLs. It creates /dev/ entries on VOP_LOOKUP() or VOP_READDIR(). This of course can't work this way on FreeBSD with GEOM, so we need to create ZVOL providers where appropriate. I found the following cases:

	1. Pool first open (pool is loaded based on zpool/cache configuration and is then opened for a first time on e.g. zfs mount).
	2. Pool import. It's not the same as 1.
	3. ZVOL creation: zfs create -V .
	4. Creation of ZVOL snapshot, this includes recursive snapshot creation.

	To make it work I had to fix LOR between the zfsdev_state_lock, the GEOM topology lock and the spa_namespace_lock. They are now always obtained in the following order:

	1. zfsdev_state_lock
	2. g_topology_lock
	3. spa_namespace_lock

	Also, we can't use taskqueue to scan for VDEVs as this introduces deadlock (because there is no way to honour the order above). This also allows simplifying vdev_geom.c quite a bit as it is no longer a problem to taste ZVOL or ZVOL-based provider.

	Update /etc/rc.d/zvol as there are no longer volinit and volfini subcommands to zfs(8).

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com

--KsGdsel6WgEHnImy
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)

iEYEARECAAYFAk32b+8ACgkQForvXbEpPzTTMQCeOpNr4VS569h9QhAbnCGgVqh/
cI8AoOS/q1Y0dNsRP2hBO2KYWdtwnWUU
=SQsf
-----END PGP SIGNATURE-----

--KsGdsel6WgEHnImy--

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 20:20:40 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 30D891065673; Mon, 13 Jun 2011 20:20:40 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id C73ED8FC16; Mon, 13 Jun 2011 20:20:39 +0000 (UTC) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 9267E45C8C; Mon, 13 Jun 2011 22:20:38 +0200 (CEST) Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 87A4E45685; Mon, 13 Jun 2011 22:20:33 +0200 (CEST) Date: Mon, 13 Jun 2011 22:20:31 +0200 From: Pawel Jakub Dawidek To: "K.
Macy" Message-ID: <20110613202031.GB1733@garage.freebsd.pl> References: <4DF25544.3020301@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ftEhullJWpWg/VHq" Content-Disposition: inline In-Reply-To: X-OS: FreeBSD 9.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-0.6 required=4.5 tests=BAYES_00,RCVD_IN_SORBS_DUL autolearn=no version=3.0.4 Cc: fs@freebsd.org Subject: Re: Drop of spa_namespace lock in vdev_geom.c X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 20:20:40 -0000 --ftEhullJWpWg/VHq Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Jun 10, 2011 at 09:35:08PM +0200, K. Macy wrote: > On Fri, Jun 10, 2011 at 7:32 PM, Justin T. Gibbs wrote: > > Dropping and reacquiring the spa_namespace lock in vdev_geom_open() > > creates a lock order reversal with the spa_config locks. As the > > spa_config locks are not standard mutexes, witness will not warn > > about this issue. > > The real problem is that WITNESS is disabled on the sx locks used for > mutex compatibility in ZFS. This questionable decision has made > debugging deadlocks quite painful on a number of occasions. I think > this choice should be revisited and perhaps special workaround shims > added for cases where cv_wait is called. WITNESS is disabled only if you are compiling ZFS without debug. This was done because of the huge number of false-positive LOR reports from users. If you are developing ZFS you should have debug turned on anyway.
I had a patch for an additional sx-creation flag to tell witness that we don't want LOR reports, but we still want to track the lock itself, but there was no agreement among the people I talked to about that, so it never went in. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com --ftEhullJWpWg/VHq Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEYEARECAAYFAk32cQ4ACgkQForvXbEpPzSgMgCfYTiI2LqbDzOBfnBeItLGY7NH BRYAoNZ3G4LcZugdHtnjJZ1vK2SBhMWu =JE2c -----END PGP SIGNATURE----- --ftEhullJWpWg/VHq-- From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 20:21:01 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 125D31065670; Mon, 13 Jun 2011 20:21:01 +0000 (UTC) (envelope-from gibbs@FreeBSD.org) Received: from aslan.scsiguy.com (mail.scsiguy.com [70.89.174.89]) by mx1.freebsd.org (Postfix) with ESMTP id D7F758FC12; Mon, 13 Jun 2011 20:21:00 +0000 (UTC) Received: from Justins-MacBook-Pro.local (207-225-98-3.dia.static.qwest.net [207.225.98.3]) (authenticated bits=0) by aslan.scsiguy.com (8.14.4/8.14.4) with ESMTP id p5DKMLPE008685 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Mon, 13 Jun 2011 14:22:21 -0600 (MDT) (envelope-from gibbs@FreeBSD.org) Message-ID: <4DF67126.50404@FreeBSD.org> Date: Mon, 13 Jun 2011 14:20:54 -0600 From: "Justin T.
Gibbs" Organization: The FreeBSD Project User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: Pawel Jakub Dawidek References: <4DF25544.3020301@FreeBSD.org> <20110613201543.GA1733@garage.freebsd.pl> In-Reply-To: <20110613201543.GA1733@garage.freebsd.pl> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (aslan.scsiguy.com [70.89.174.89]); Mon, 13 Jun 2011 14:22:21 -0600 (MDT) Cc: fs@FreeBSD.org Subject: Re: Drop of spa_namespace lock in vdev_geom.c X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: gibbs@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 20:21:01 -0000 On 6/13/11 2:15 PM, Pawel Jakub Dawidek wrote: > On Fri, Jun 10, 2011 at 11:32:52AM -0600, Justin T. Gibbs wrote: >> Dropping and reacquiring the spa_namespace lock in vdev_geom_open() >> creates a lock order reversal with the spa_config locks. As the >> spa_config locks are not standard mutexes, witness will not warn >> about this issue. I only noticed this problem when debugging a ZFS >> deadlock. The deadlock can be triggered anytime that there are >> multiple insert/remove processes going on (e.g. vdev orphan processing >> while a fault management daemon is onlining a replacement device for >> some other vdev). >> >> I haven't noticed any issues with just holding the namespace lock >> for the duration of the open. Does anyone know why this lock drop >> was added in v28? > I did that as part of @182208 to fix another LOR. Full commit log: > > Change 182208 on 2010/08/10 by pjd@pjd_zoo ... > To make it work I had to fix LOR between the zfsdev_state_lock, the > GEOM topology lock and the spa_namespace_lock. They are now always > obtained in the following order: > 1. 
zfsdev_state_lock > 2. g_topology_lock > 3. spa_namespace_lock > Also, we can't use taskqueue to scan for VDEVs as this introduces > deadlock (because there is no way to honour the order above). I'll review these code paths later today and see if I can find an alternate way to resolve the issue. -- Justin From owner-freebsd-fs@FreeBSD.ORG Mon Jun 13 22:51:30 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4D1A0106566B for ; Mon, 13 Jun 2011 22:51:30 +0000 (UTC) (envelope-from prvs=1145dbc1e5=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 93F4A8FC08 for ; Mon, 13 Jun 2011 22:51:29 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Mon, 13 Jun 2011 23:50:28 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 13 Jun 2011 23:50:28 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50013577816.msg for ; Mon, 13 Jun 2011 23:50:27 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1145dbc1e5=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: From: "Steven Hartland" To: "jhell" References: <20110613094803.GA10290@icarus.home.lan> <4E09C82B45BA46019281930B2EB13AC1@multiplay.co.uk> <20110613193529.GA21103@DataIX.net> Date: Mon, 13 Jun 2011 23:50:50 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6090 Cc: freebsd-fs@freebsd.org Subject: Re: Impossible compression ratio on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2011 22:51:30 -0000 ----- Original Message ----- From: "jhell" To: "Steven Hartland" Cc: "Jeremy Chadwick" ; Sent: Monday, June 13, 2011 8:35 PM Subject: Re: Impossible compression ratio on ZFS > > Hi Steve, > > Knowing that there were patches out for v28 on 8.X can you confirm that > in fact you are using v15 ZFS ? I would assume you are because of the > release but I don't want to do that. Confirmed, this is a pure 8.2 release build machine with no additional patches except for compiling libz without assembly optimisations, as that's known to cause crashes. Specifically the following, as directed by Xin LI:-
cd /usr/src/lib/libz
make cleandir
make cleandir (yes, do it the second time)
make MACHINE_ARCH=x86_64 obj depend all
make MACHINE_ARCH=x86_64 install
> If not, then seeing you have compression turned on... did you just dump > that whole table into the database ? it's quite possible that the > compression was still happening in ARC before it was finally written out > and this would also explain why that happened. The table was just rebuilt due to changing an index, so in effect yes, the data would have been copied from the old table into a fresh new copy and then renamed. > Also what level of compression are you using ? Standard lzjb, which is achieving 1.9 overall and 2.45 on this table file. It does indeed sound like this data was still being processed in some way, but I'm surprised it took quite so long to show something other than the initial file creation size.
It's not a big issue in this case, but it does raise concerns that if it wasn't showing the "correct" file size, the data may not have been committed to disk, and hence could have been unsafe for this quite extended period. Settings within mysql that may be relevant in this case are:-
innodb_log_file_size = 1024M
innodb_log_buffer_size = 8M
innodb_flush_method = O_DIRECT
innodb_use_native_aio = 1
So it's possible that the table was in the innodb log, but I've never witnessed that before tbh. It's also only very recently that we moved our db server from ufs to zfs, hence the questions. Regards Steve
================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
From owner-freebsd-fs@FreeBSD.ORG Tue Jun 14 07:19:37 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 15C5C106567B for ; Tue, 14 Jun 2011 07:19:37 +0000 (UTC) (envelope-from pvz@itassistans.se) Received: from zcs1.itassistans.net (zcs1.itassistans.net [212.112.191.37]) by mx1.freebsd.org (Postfix) with ESMTP id 84BBE8FC14 for ; Tue, 14 Jun 2011 07:19:35 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by zcs1.itassistans.net (Postfix) with ESMTP id 8B912C0258 for ; Tue, 14 Jun 2011 09:19:34 +0200 (CEST) X-Virus-Scanned: amavisd-new at zcs1.itassistans.net Received: from zcs1.itassistans.net ([127.0.0.1]) by localhost (zcs1.itassistans.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 0ZwwLVVtkbVo for ; Tue, 14 Jun 2011 09:19:34 +0200 (CEST) Received: from [192.168.1.239] (c213-89-160-61.bredband.comhem.se [213.89.160.61]) by zcs1.itassistans.net (Postfix) with ESMTPSA id 18E1DC01C5 for ; Tue, 14 Jun 2011 09:19:34 +0200 (CEST) From: Per von Zweigbergk Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Date: Tue, 14 Jun 2011 09:19:32 +0200 Message-Id: <9544F7B9-E286-4266-86E3-B4D1A667CBBD@itassistans.se> To: freebsd-fs@freebsd.org Mime-Version: 1.0 (Apple Message framework v1084) X-Mailer: Apple Mail (2.1084) Subject: Disk usage and ZFS deduplication X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Jun 2011 07:19:37 -0000 I've been following the "Impossible compression ratio on ZFS" thread with some interest, and it made me ask myself this: Let us say we have a hypothetical zfs filesystem with the equally hypothetical files A and B. The filesystem has deduplication enabled.
Both files have an apparent file size of 100 MB, but 50 MB of that data is common between the two files and thus can be deduplicated. This would mean that total disk usage would be 150 MB. If you use "du" to determine disk size for a deduplication, what would be the result? Which file would the common data be accounted to? Or would it be accounted to both files somehow, in part or in full? From owner-freebsd-fs@FreeBSD.ORG Tue Jun 14 11:47:59 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AD3971065670 for ; Tue, 14 Jun 2011 11:47:59 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 697B08FC0A for ; Tue, 14 Jun 2011 11:47:59 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1QWS6A-0005s2-CY for freebsd-fs@freebsd.org; Tue, 14 Jun 2011 13:47:58 +0200 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 14 Jun 2011 13:47:58 +0200 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 14 Jun 2011 13:47:58 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Tue, 14 Jun 2011 13:47:46 +0200 Lines: 7 Message-ID: References: <20110613094803.GA10290@icarus.home.lan> <4E09C82B45BA46019281930B2EB13AC1@multiplay.co.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.12) Gecko/20101102 Thunderbird/3.1.6 In-Reply-To: <4E09C82B45BA46019281930B2EB13AC1@multiplay.co.uk> X-Enigmail-Version: 1.1.2 Subject: Re: Impossible compression ratio on ZFS
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Jun 2011 11:47:59 -0000 On 13/06/2011 12:57, Steven Hartland wrote: > So it seems that zfs was still processing the file for a good amount of > time, and during that time was showing incorrect disk usage for said file. Yes, this is my observation also. "du" display will not "settle down" on ZFS until some time after the last write. From owner-freebsd-fs@FreeBSD.ORG Tue Jun 14 15:06:21 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3BABD106566B for ; Tue, 14 Jun 2011 15:06:21 +0000 (UTC) (envelope-from cmdlnkid@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id EA5568FC1A for ; Tue, 14 Jun 2011 15:06:20 +0000 (UTC) Received: by iwn33 with SMTP id 33so6972265iwn.13 for ; Tue, 14 Jun 2011 08:06:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:sender:date:from:to:cc:subject:message-id :references:mime-version:content-type:content-disposition :in-reply-to; bh=L5RUTa7PWv41O6F03ytSG98fWT+cMnnl4xDlYSHmW+Q=; b=eaDQG8RKU2Qu/vmqtwAxVn6Y3cCCV8wGtPy8WPPhHMEeicRwdKoU7wknbv7Ifk8ClJ F0yV0tsGMnOIF1/0Qyk/Wz9Btmq7gmqr9OfhLXuYcNBhFfjpn6Gpo03gPmpPGUs0AQan d7EwHH5xX2YsROghQ4u3kBvl9cZurXwbjXu0I= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to; b=JJOX6uYI4rOesPTso1nFoq91Vmi/81bjXKKp0gGnO2A8sSUWFqdOSTiIDqdbRvdS8I +GWJ6EKWJzNJPuK9qy0SLiL/PPApVpG81D95N1JkADb+bWE0vdkE1xjU1wmDpiYdLVkx GsTapOEwgyPfC/kBwBQU0XzOGHrrCDv1FxqiA= Received: by 10.42.138.129 with SMTP id c1mr8542398icu.249.1308063979182; 
Tue, 14 Jun 2011 08:06:19 -0700 (PDT) Received: from DataIX.net (adsl-99-181-139-216.dsl.klmzmi.sbcglobal.net [99.181.139.216]) by mx.google.com with ESMTPS id hp8sm5716665icc.11.2011.06.14.08.06.18 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 14 Jun 2011 08:06:18 -0700 (PDT) Sender: The Command Line Kid Received: from DataIX.net (localhost [127.0.0.1]) by DataIX.net (8.14.4/8.14.4) with ESMTP id p5EF6EDp078523 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 14 Jun 2011 11:06:16 -0400 (EDT) (envelope-from jhell@DataIX.net) Received: (from jhell@localhost) by DataIX.net (8.14.4/8.14.4/Submit) id p5EF6DCq078522; Tue, 14 Jun 2011 11:06:13 -0400 (EDT) (envelope-from jhell@DataIX.net) Date: Tue, 14 Jun 2011 11:06:13 -0400 From: jhell To: Per von Zweigbergk Message-ID: <20110614150613.GB27199@DataIX.net> References: <9544F7B9-E286-4266-86E3-B4D1A667CBBD@itassistans.se> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="QKdGvSO+nmPlgiQ/" Content-Disposition: inline In-Reply-To: <9544F7B9-E286-4266-86E3-B4D1A667CBBD@itassistans.se> Cc: freebsd-fs@freebsd.org Subject: Re: Disk usage and ZFS deduplication X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Jun 2011 15:06:21 -0000 --QKdGvSO+nmPlgiQ/ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jun 14, 2011 at 09:19:32AM +0200, Per von Zweigbergk wrote: > I've been following the "Impossible compression ratio on ZFS" thread with some interest, and it made me ask myself this: > > Let us say we have a hypothetical zfs filesystem with the equally hypothetical files A and B. The filesystem has deduplication enabled.
Both files have an apparent file size of 100 MB, but 50 MB of that data is common between the two files and thus can be deduplicated. This would mean that total disk usage would be 150 MB. > > If you use "du" to determine disk size for a deduplication, what would be the result? Which file would the common data be accounted to? Or would it be accounted to both files somehow, in part or in full? Logical answer would be that both files should be showing their resulting size regardless of how ZFS processes them. Being deduped does not mean representing files to the user any different. --QKdGvSO+nmPlgiQ/ Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (FreeBSD) Comment: http://bit.ly/0x89D8547E iQEcBAEBAgAGBQJN93jkAAoJEJBXh4mJ2FR+dpIH/1c+Q47ZHLaRjqezauJ+GzWw QFyfu9AzvVxkIJDCtDuiSdp3/9l112cCOeaSpzA9MsNgjqxt2xq7TOlxBfP5wi6O 1PMNCv6geh+y/yc6nW6PVyUvzyPf4s1Lq+bWRwN+Tb12t+ttKxY/7G7Pa/M8waOm xonxaRXtCDNmGr2OpuBbo/rYpsIY6CoBuGsxwl0KM5HA+kGTvg0+MpYcsVAIodgN DliDxUKvvPMY9cu/z8vfpC58TFFUObmn9JsVYMV44+rPYW7PgBabv9LfJZ6xxJeV bpEAuxmwIc9ywCFMBU/kzsZhCwhTnKQn7XfwDJSkJCAx2pQR6jWFu1pqvrRiskA= =Cjdv -----END PGP SIGNATURE----- --QKdGvSO+nmPlgiQ/-- From owner-freebsd-fs@FreeBSD.ORG Tue Jun 14 15:12:08 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4C119106564A for ; Tue, 14 Jun 2011 15:12:08 +0000 (UTC) (envelope-from pvz@itassistans.se) Received: from zcs1.itassistans.net (zcs1.itassistans.net [212.112.191.37]) by mx1.freebsd.org (Postfix) with ESMTP id 023EF8FC14 for ; Tue, 14 Jun 2011 15:12:07 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by zcs1.itassistans.net (Postfix) with ESMTP id A607EC025F; Tue, 14 Jun 2011 17:12:06 +0200 (CEST) X-Virus-Scanned: amavisd-new at zcs1.itassistans.net Received: from zcs1.itassistans.net ([127.0.0.1]) by localhost (zcs1.itassistans.net [127.0.0.1])
(amavisd-new, port 10024) with ESMTP id 3vbSitlNYlbs; Tue, 14 Jun 2011 17:12:06 +0200 (CEST) Received: from [192.168.1.239] (c213-89-160-61.bredband.comhem.se [213.89.160.61]) by zcs1.itassistans.net (Postfix) with ESMTPSA id 0EBD2C01C5; Tue, 14 Jun 2011 17:12:06 +0200 (CEST) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Per von Zweigbergk In-Reply-To: <20110614150613.GB27199@DataIX.net> Date: Tue, 14 Jun 2011 17:12:05 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <61335943-0172-4483-A221-5C77CD8BAEFB@itassistans.se> References: <9544F7B9-E286-4266-86E3-B4D1A667CBBD@itassistans.se> <20110614150613.GB27199@DataIX.net> To: jhell X-Mailer: Apple Mail (2.1084) Cc: freebsd-fs@freebsd.org Subject: Re: Disk usage and ZFS deduplication X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Jun 2011 15:12:08 -0000 On 14 Jun 2011, at 17:06, jhell wrote: > On Tue, Jun 14, 2011 at 09:19:32AM +0200, Per von Zweigbergk wrote: >> I've been following the "Impossible compression ratio on ZFS" thread with some interest, and it made me ask myself this: >> >> Let us say we have a hypothetical zfs filesystem with the equally hypothetical files A and B. The filesystem has deduplication enabled. Both files have an apparent file size of 100 MB, but 50 MB of that data is common between the two files and thus can be deduplicated. This would mean that total disk usage would be 150 MB. >> >> If you use "du" to determine disk size for a deduplication, what would be the result? Which file would the common data be accounted to? Or would it be accounted to both files somehow, in part or in full? > > Logical answer would be that both files should be showing their > resulting size regardless of how ZFS processes them.
Being deduped does > not mean representing files to the user any different. That would be the file size, yes, as opposed to the disk usage. From owner-freebsd-fs@FreeBSD.ORG Tue Jun 14 15:15:46 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5A365106566C for ; Tue, 14 Jun 2011 15:15:46 +0000 (UTC) (envelope-from numisemis@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id D6F348FC08 for ; Tue, 14 Jun 2011 15:15:45 +0000 (UTC) Received: by fxm11 with SMTP id 11so5615449fxm.13 for ; Tue, 14 Jun 2011 08:15:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=efXh4OfnNI/FUDTLrD0kLwzBQW4iv/3kIno4i4txhwY=; b=LjEDAGMDjIpC+i4pXFeTENtNRPT6szGl1phfL7EqnG5LDzc+BqxhdineIf5r71mHQ4 5zM/vOHUgBR/3mKy773+wvW3yr1a2HezzURfvYDGvft4QEr46Q8xdnVxRTVVvNxld9Kd AlreaXIvdGklwSgX9E8Lfd/X0nVFB/HXIG+Y8= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=mB2sPlW9M1ZnnJRy+ZUpCaShRavaS3Ssb1ylSunoCnOY7N/9zDT00XqQrOZuYTiLWT MzqWFrESQNBqEWl5ENXFAaiXnPUnIBRpBtYwL/JAaQUyYf1/E1x+9qk/E2X6s6nvFnXm Ax2QeGKbyr1rnvCNLHFE3Aoc9203siKKpKD3s= MIME-Version: 1.0 Received: by 10.223.43.145 with SMTP id w17mr1388224fae.12.1308064544827; Tue, 14 Jun 2011 08:15:44 -0700 (PDT) Received: by 10.204.180.139 with HTTP; Tue, 14 Jun 2011 08:15:44 -0700 (PDT) In-Reply-To: <61335943-0172-4483-A221-5C77CD8BAEFB@itassistans.se> References: <9544F7B9-E286-4266-86E3-B4D1A667CBBD@itassistans.se> <20110614150613.GB27199@DataIX.net> <61335943-0172-4483-A221-5C77CD8BAEFB@itassistans.se> Date: Tue, 14 Jun 2011 17:15:44 +0200 Message-ID: From: =?UTF-8?Q?=C5=A0imun_Mikecin?= To: Per von
Zweigbergk , freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: Re: Disk usage and ZFS deduplication X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Jun 2011 15:15:46 -0000 2011/6/14 Per von Zweigbergk > >> If you use "du" to determine disk size for a deduplication, what would > be the result? Which file would the common data be accounted to? Or would it > be accounted to both files somehow, in part or in full? > > > > Logical answer would be that both files should be showing their > > resulting size regardless of how ZFS processes them. Being deduped does > > not mean representing files to the user any different. > > That would be the file size, yes, as opposed to the disk usage. What about hard linked files? They are similar to dedup in this regard. From owner-freebsd-fs@FreeBSD.ORG Tue Jun 14 15:17:36 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B79E0106566B for ; Tue, 14 Jun 2011 15:17:36 +0000 (UTC) (envelope-from pvz@itassistans.se) Received: from zcs1.itassistans.net (zcs1.itassistans.net [212.112.191.37]) by mx1.freebsd.org (Postfix) with ESMTP id 6D4178FC1A for ; Tue, 14 Jun 2011 15:17:36 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by zcs1.itassistans.net (Postfix) with ESMTP id 6250DC0260; Tue, 14 Jun 2011 17:17:35 +0200 (CEST) X-Virus-Scanned: amavisd-new at zcs1.itassistans.net Received: from zcs1.itassistans.net ([127.0.0.1]) by localhost (zcs1.itassistans.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id lVBq0lcb6ueY; Tue, 14 Jun 2011 17:17:34 +0200 (CEST) Received: from [192.168.1.239] (c213-89-160-61.bredband.comhem.se [213.89.160.61]) by zcs1.itassistans.net (Postfix) with
ESMTPSA id A21DAC01C5; Tue, 14 Jun 2011 17:17:34 +0200 (CEST) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=windows-1252 From: Per von Zweigbergk In-Reply-To: Date: Tue, 14 Jun 2011 17:17:33 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: References: <9544F7B9-E286-4266-86E3-B4D1A667CBBD@itassistans.se> <20110614150613.GB27199@DataIX.net> <61335943-0172-4483-A221-5C77CD8BAEFB@itassistans.se> To: =?windows-1252?Q?=8Aimun_Mikecin?= X-Mailer: Apple Mail (2.1084) Cc: freebsd-fs@freebsd.org Subject: Re: Disk usage and ZFS deduplication X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Jun 2011 15:17:36 -0000 On 14 Jun 2011, at 17:15, Šimun Mikecin wrote: > What about hard linked files? They are similar to dedup in this regard. I know that a hard linked file will show the same disk usage, so for example:
$ du a
12345 a
$ ln a b
$ du a b
12345 a
12345 b
$
But in this case it's not the entire file being hardlinked, rather just some parts of the file being deduplicated, so it's not exactly the same. Or is it? This is why I asked on the mailing list.
:-) From owner-freebsd-fs@FreeBSD.ORG Tue Jun 14 15:27:13 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0A9601065670 for ; Tue, 14 Jun 2011 15:27:13 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id 8A8C08FC29 for ; Tue, 14 Jun 2011 15:27:12 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p5EFR1eA056909 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Tue, 14 Jun 2011 18:27:07 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <4DF77DC5.7030503@digsys.bg> Date: Tue, 14 Jun 2011 18:27:01 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.17) Gecko/20110519 Thunderbird/3.1.10 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <9544F7B9-E286-4266-86E3-B4D1A667CBBD@itassistans.se> <20110614150613.GB27199@DataIX.net> <61335943-0172-4483-A221-5C77CD8BAEFB@itassistans.se> In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Disk usage and ZFS deduplication X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Jun 2011 15:27:13 -0000 On 14.06.11 18:17, Per von Zweigbergk wrote: > > But in this case it's not the entire file being hardlinked, rather just some parts of the file being deduplicated so it's not exactly the same. Or is it? This is why I asked on the mailing list. :-) > > Consider, 'storage' is different than file allocation. With ZFS dedup, the storage (layer) decides whether to store new record, or to link it to an existing record. 
You have no control over this. If you ask how many blocks the file occupies in storage, that would be the entire file size. If some of the blocks are shared with other files (or whatever) that does not change how many blocks the file uses. It is different with compression. Daniel From owner-freebsd-fs@FreeBSD.ORG Tue Jun 14 20:26:52 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B68FF106566B for ; Tue, 14 Jun 2011 20:26:52 +0000 (UTC) (envelope-from gibbs@scsiguy.com) Received: from aslan.scsiguy.com (aslan.scsiguy.com [70.89.174.89]) by mx1.freebsd.org (Postfix) with ESMTP id 6FEE68FC1D for ; Tue, 14 Jun 2011 20:26:52 +0000 (UTC) Received: from Justins-MacBook-Pro.local (207-225-98-3.dia.static.qwest.net [207.225.98.3]) (authenticated bits=0) by aslan.scsiguy.com (8.14.4/8.14.4) with ESMTP id p5EKSDYQ015510 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Tue, 14 Jun 2011 14:28:13 -0600 (MDT) (envelope-from gibbs@scsiguy.com) Message-ID: <4DF7C406.1080903@scsiguy.com> Date: Tue, 14 Jun 2011 14:26:46 -0600 From: "Justin T. Gibbs" User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: fs@FreeBSD.org Content-Type: multipart/mixed; boundary="------------060201040900000201070407" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (aslan.scsiguy.com [70.89.174.89]); Tue, 14 Jun 2011 14:28:13 -0600 (MDT) Cc: Subject: [CFR][ZFS] Show removed devices by GUID in zpool output. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Jun 2011 20:26:52 -0000 This is a multi-part message in MIME format. 
--------------060201040900000201070407
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

The current behavior of zpool_vdev_name() is to report the vdev path (e.g. /dev/da0) unless a vdev has the ZPOOL_CONFIG_NOT_PRESENT attribute set. This attribute is only set when a vdev is not found during import/mount of a pool. The attached patch also displays a vdev by GUID if it cannot be opened post import or is marked removed (e.g. via a GEOM orphan event).

The main motivation for this change is that vdev paths are not unique to a physical leaf vdev. It is easy to get into a situation where you need to "detach /dev/da0" even though da0 is an active member of the same pool in which a "previous da0" was once removed. With zpool_vdev_name() reporting the GUID, the user is equipped to provide an unambiguous command that represents their desired action.

--
Justin

--------------060201040900000201070407
Content-Type: text/plain; x-mac-type="0"; x-mac-creator="0"; name="zpool.diffs"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="zpool.diffs"

Index: libzfs_pool.c
===================================================================
--- libzfs_pool.c	(revision 223089)
+++ libzfs_pool.c	(working copy)
@@ -3082,15 +3082,25 @@
 	char buf[64];
 	vdev_stat_t *vs;
 	uint_t vsc;
+	int have_stats;
+	int have_path;
 
-	if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT,
-	    &value) == 0) {
+	have_stats = nvlist_lookup_uint64_array(nv, ZPOOL_CONFIG_VDEV_STATS,
+	    (uint64_t **)&vs, &vsc) == 0;
+	have_path = nvlist_lookup_string(nv, ZPOOL_CONFIG_PATH, &path) == 0;
+
+	/*
+	 * If the device is not currently present, assume it will not
+	 * come back at the same device path. Display the device by GUID.
+	 */
+	if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT, &value) == 0 ||
+	    have_path && have_stats && vs->vs_state <= VDEV_STATE_CANT_OPEN) {
 		verify(nvlist_lookup_uint64(nv, ZPOOL_CONFIG_GUID,
 		    &value) == 0);
 		(void) snprintf(buf, sizeof (buf), "%llu",
 		    (u_longlong_t)value);
 		path = buf;
-	} else if (nvlist_lookup_string(nv, ZPOOL_CONFIG_PATH, &path) == 0) {
+	} else if (have_path) {
 
 		/*
 		 * If the device is dead (faulted, offline, etc) then don't
@@ -3098,8 +3108,7 @@
 		 * open a misbehaving device, which can have undesirable
 		 * effects.
 		 */
-		if ((nvlist_lookup_uint64_array(nv, ZPOOL_CONFIG_VDEV_STATS,
-		    (uint64_t **)&vs, &vsc) != 0 ||
+		if ((have_stats == 0 ||
 		    vs->vs_state >= VDEV_STATE_DEGRADED) &&
 		    zhp != NULL &&
 		    nvlist_lookup_string(nv, ZPOOL_CONFIG_DEVID, &devid) == 0) {

--------------060201040900000201070407--

From owner-freebsd-fs@FreeBSD.ORG Tue Jun 14 20:51:56 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A057A106566C for ; Tue, 14 Jun 2011 20:51:56 +0000 (UTC) (envelope-from gibbs@scsiguy.com) Received: from aslan.scsiguy.com (www.scsiguy.com [70.89.174.89]) by mx1.freebsd.org (Postfix) with ESMTP id 724B88FC15 for ; Tue, 14 Jun 2011 20:51:56 +0000 (UTC) Received: from Justins-MacBook-Pro.local (207-225-98-3.dia.static.qwest.net [207.225.98.3]) (authenticated bits=0) by aslan.scsiguy.com (8.14.4/8.14.4) with ESMTP id p5EKrHk8015664 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Tue, 14 Jun 2011 14:53:17 -0600 (MDT) (envelope-from gibbs@scsiguy.com) Message-ID: <4DF7C9E6.1030800@scsiguy.com> Date: Tue, 14 Jun 2011 14:51:50 -0600 From: "Justin T.
Gibbs" User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: fs@FreeBSD.org Content-Type: multipart/mixed; boundary="------------080601000502040509010506" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (aslan.scsiguy.com [70.89.174.89]); Tue, 14 Jun 2011 14:53:17 -0600 (MDT) Cc: Subject: [CFR][ZFS] Show "previous device location" for removed vdevs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Jun 2011 20:51:56 -0000 This is a multi-part message in MIME format. --------------080601000502040509010506 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit When a vdev cannot be found during ZFS pool import/mount time, "zpool status" reports the device GUID and a "device was at" message as a user aid. This patch provides the same behavior when a device is removed post zpool mount/import. 
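
The display rule described above can be sketched in userland C. This is only an illustration, not the patch itself: the sk_ names are hypothetical, the enum is a simplified stand-in whose ordering is assumed to mirror ZFS's vdev_state_t, and the output string is illustrative rather than the exact "zpool status" layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for vdev_state_t. Only the relative ordering
 * matters here, since the patch compares with "<=".
 */
typedef enum {
	SK_VDEV_STATE_UNKNOWN = 0,
	SK_VDEV_STATE_CLOSED,
	SK_VDEV_STATE_OFFLINE,
	SK_VDEV_STATE_REMOVED,
	SK_VDEV_STATE_CANT_OPEN,
	SK_VDEV_STATE_FAULTED,
	SK_VDEV_STATE_DEGRADED,
	SK_VDEV_STATE_HEALTHY
} sk_vdev_state_t;

/*
 * Show the previous location when the vdev was missing at import
 * (not_present) or has since dropped to CANT_OPEN or below.
 */
bool
sk_show_previous_path(bool not_present, sk_vdev_state_t state)
{
	return (not_present || state <= SK_VDEV_STATE_CANT_OPEN);
}

/* Format either "<guid> was <path>" or just "<path>" into buf. */
int
sk_format_vdev(char *buf, size_t len, unsigned long long guid,
    const char *path, bool not_present, sk_vdev_state_t state)
{
	if (sk_show_previous_path(not_present, state))
		return (snprintf(buf, len, "%llu was %s", guid, path));
	return (snprintf(buf, len, "%s", path));
}
```

With this sketch, a vdev in the REMOVED state is reported by GUID with its old path, while a HEALTHY vdev is reported by path alone.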
--
Justin

--------------080601000502040509010506
Content-Type: text/plain; x-mac-type="0"; x-mac-creator="0"; name="zpool.diffs"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="zpool.diffs"

diff -u -r -x cscope.out -x out -x ctl -x compile vendor/FreeBSD/head/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c SpectraBSD/head/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c
--- vendor/FreeBSD/head/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c	2011-02-28 13:51:22.120585187 -0700
+++ SpectraBSD/head/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c	2011-06-08 17:22:53.450540438 -0600
@@ -1084,7 +1209,8 @@
 	}
 
 	if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT,
-	    &notpresent) == 0) {
+	    &notpresent) == 0 ||
+	    vs->vs_state <= VDEV_STATE_CANT_OPEN) {
 		char *path;
 		verify(nvlist_lookup_string(nv, ZPOOL_CONFIG_PATH, &path) == 0);
 		(void) printf(" was %s", path);

--------------080601000502040509010506--

From owner-freebsd-fs@FreeBSD.ORG Tue Jun 14 21:08:38 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9645C1065672 for ; Tue, 14 Jun 2011 21:08:38 +0000 (UTC) (envelope-from gibbs@scsiguy.com) Received: from aslan.scsiguy.com (aslan.scsiguy.com [70.89.174.89]) by mx1.freebsd.org (Postfix) with ESMTP id 637C08FC17 for ; Tue, 14 Jun 2011 21:08:38 +0000 (UTC) Received: from Justins-MacBook-Pro.local (207-225-98-3.dia.static.qwest.net [207.225.98.3]) (authenticated bits=0) by aslan.scsiguy.com (8.14.4/8.14.4) with ESMTP id p5EL9w9E015751 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Tue, 14 Jun 2011 15:09:59 -0600 (MDT) (envelope-from gibbs@scsiguy.com) Message-ID: <4DF7CDD0.8040108@scsiguy.com> Date: Tue, 14 Jun 2011 15:08:32 -0600 From: "Justin T.
Gibbs" User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: fs@FreeBSD.org Content-Type: multipart/mixed; boundary="------------090202020409040002080001" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (aslan.scsiguy.com [70.89.174.89]); Tue, 14 Jun 2011 15:09:59 -0600 (MDT) Cc: Subject: [CFR][ZFS] Add "zpool labelclear" command. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Jun 2011 21:08:38 -0000

This is a multi-part message in MIME format.
--------------090202020409040002080001
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

ZFS rightfully has a lot of safety belts in place to ward off unintended data loss. But in some scenarios, the safety belts are so restrictive that the only way to proceed is to wipe the label information off of a drive.

Here's an example: Pull a drive that is active in a pool on one system and stick it into another system. ZFS will correctly reject this drive as a member of a new pool or as the argument of a replace command. But if you really want to use that drive, how do you clear its "potentially active" status? If the pool were imported, you could destroy it, but ZFS won't allow you to import a pool unless there are sufficient members for it to serve I/O (I know about the undocumented -F option for import, but users aren't going to find that). You can use dd to wipe the label data off, but where exactly does ZFS keep its four copies of the label?

"zpool labelclear" allows the user to unbuckle a few seatbelts in a controlled manner. Without the "-f" flag, it will allow you to wipe label data from a destroyed pool.
With the "-f" flag, it will allow you to wipe metadata from "potentially active" vdevs - vdevs that are part of an exported pool, or are listed as active in a pool that is not mounted. It will not allow you to clear the label data from a vdev that is active in the system. Note: there are some const correctness fixes to zpool_state_to_name() in these patches. I added a zpool_pool_state_to_name() method with a const correct signature, and it made sense to me to update the sibling method at the same time. -- Justin --------------090202020409040002080001 Content-Type: text/plain; x-mac-type="0"; x-mac-creator="0"; name="label_clear.diffs" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="label_clear.diffs" diff -u -r -x cscope.out -x out -x ctl -x compile vendor/FreeBSD/head/cddl/contrib/opensolaris/cmd/zpool/zpool.8 SpectraBSD/head/cddl/contrib/opensolaris/cmd/zpool/zpool.8 --- vendor/FreeBSD/head/cddl/contrib/opensolaris/cmd/zpool/zpool.8 2011-02-28 13:51:22.118582660 -0700 +++ SpectraBSD/head/cddl/contrib/opensolaris/cmd/zpool/zpool.8 2011-06-06 17:28:33.578173928 -0600 @@ -82,6 +82,11 @@ .LP .nf +\fBzpool labelclear\fR [\fB-f\fR] \fIdevice\fR +.fi + +.LP +.nf \fBzpool list\fR [\fB-H\fR] [\fB-o\fR \fIproperty\fR[,...]] [\fIpool\fR] ... .fi @@ -1205,6 +1210,28 @@ .ne 2 .mk .na +\fB\fBzpool labelclear\fR [\fB-f\fR] \fIdevice\fR +.ad +.sp .6 +.RS 4n +Removes ZFS label information from the specified device. The device must not be part of an active pool configuration. +.sp +.ne 2 +.mk +.na +\fB\fB-f\fR\fR +.ad +.RS 12n +.rt +Treat exported or foreign devices as inactive. 
+.RE + +.RE + +.sp +.ne 2 +.mk +.na \fB\fBzpool list\fR [\fB-H\fR] [\fB-o\fR \fIprops\fR[,...]] [\fIpool\fR] ...\fR .ad .sp .6 diff -u -r -x cscope.out -x out -x ctl -x compile vendor/FreeBSD/head/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c SpectraBSD/head/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c --- vendor/FreeBSD/head/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c 2011-02-28 13:51:22.120585187 -0700 +++ SpectraBSD/head/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c 2011-06-08 17:22:53.450540438 -0600 @@ -57,6 +57,7 @@ static int zpool_do_add(int, char **); static int zpool_do_remove(int, char **); +static int zpool_do_labelclear(int, char **); static int zpool_do_list(int, char **); static int zpool_do_iostat(int, char **); @@ -113,6 +114,7 @@ HELP_HISTORY, HELP_IMPORT, HELP_IOSTAT, + HELP_LABELCLEAR, HELP_LIST, HELP_OFFLINE, HELP_ONLINE, @@ -149,6 +151,8 @@ { "add", zpool_do_add, HELP_ADD }, { "remove", zpool_do_remove, HELP_REMOVE }, { NULL }, + { "labelclear", zpool_do_labelclear, HELP_LABELCLEAR }, + { NULL }, { "list", zpool_do_list, HELP_LIST }, { "iostat", zpool_do_iostat, HELP_IOSTAT }, { "status", zpool_do_status, HELP_STATUS }, @@ -215,6 +219,8 @@ case HELP_IOSTAT: return (gettext("\tiostat [-v] [-T d|u] [pool] ... [interval " "[count]]\n")); + case HELP_LABELCLEAR: + return (gettext("\tlabelclear [-f] \n")); case HELP_LIST: return (gettext("\tlist [-H] [-o property[,...]] " "[-T d|u] [pool] ... [interval [count]]\n")); @@ -561,6 +567,125 @@ } /* + * zpool labelclear + * + * Verifies that the vdev is not active and zeros out the label information + * on the device. 
+ */ +int +zpool_do_labelclear(int argc, char **argv) +{ + char *vdev, *name; + int c, fd = -1, ret = 0; + pool_state_t state; + boolean_t inuse = B_FALSE; + boolean_t force = B_FALSE; + + /* check options */ + while ((c = getopt(argc, argv, "f")) != -1) { + switch (c) { + case 'f': + force = B_TRUE; + break; + default: + (void) fprintf(stderr, gettext("invalid option '%c'\n"), + optopt); + usage(B_FALSE); + } + } + + argc -= optind; + argv += optind; + + /* get vdev name */ + if (argc < 1) { + (void) fprintf(stderr, gettext("missing vdev device name\n")); + usage(B_FALSE); + } + + vdev = argv[0]; + if ((fd = open(vdev, O_RDWR)) < 0) { + (void) fprintf(stderr, gettext("Unable to open %s\n"), vdev); + return (B_FALSE); + } + + name = NULL; + if (zpool_in_use(g_zfs, fd, &state, &name, &inuse) != 0) { + if (force) + goto wipe_label; + + (void) fprintf(stderr, + gettext("Unable to determine pool state for %s\n" + "Use -f to force the clearing any label data\n"), vdev); + + return (1); + } + + if (inuse) { + switch (state) { + default: + case POOL_STATE_ACTIVE: + case POOL_STATE_SPARE: + case POOL_STATE_L2CACHE: + (void) fprintf(stderr, +gettext("labelclear operation failed.\n" + "\tVdev %s is a member (%s), of pool \"%s\".\n" + "\tTo remove label information from this device, export or destroy\n" + "\tthe pool, or remove %s from the configuration of this pool\n" + "\tand retry the labelclear operation\n"), + vdev, zpool_pool_state_to_name(state), name, vdev); + ret = 1; + goto errout; + + case POOL_STATE_EXPORTED: + if (force) + break; + + (void) fprintf(stderr, +gettext("labelclear operation failed.\n" + "\tVdev %s is a member of the exported pool \"%s\".\n" + "\tUse \"zpool labelclear -f %s\" to force the removal of label\n" + "\tinformation.\n"), + vdev, name, vdev); + ret = 1; + goto errout; + + case POOL_STATE_POTENTIALLY_ACTIVE: + if (force) + break; + + (void) fprintf(stderr, +gettext("labelclear operation failed.\n" + "\tVdev %s is a member of the pool 
\"%s\".\n" + "\tThis pool is unknown to this system, but may be active on\n" + "\tanother system. Use \'zpool labelclear -f %s\' to force the\n" + "\tremoval of label information.\n"), + vdev, name, vdev); + ret = 1; + goto errout; + + case POOL_STATE_DESTROYED: + /* inuse should never be set for a destoryed pool... */ + break; + } + } + +wipe_label: + if (zpool_clear_label(fd) != 0) { + (void) fprintf(stderr, + gettext("Label clear failed on vdev %s\n"), vdev); + ret = 1; + } + +errout: + close(fd); + if (name != NULL) + free(name); + + return (ret); +} + +/* * zpool create [-fn] [-o property=value] ... * [-O file-system-property=value] ... * [-R root] [-m mountpoint] ... @@ -1052,7 +1177,7 @@ char *vname; uint64_t notpresent; spare_cbdata_t cb; - char *state; + const char *state; if (nvlist_lookup_nvlist_array(nv, ZPOOL_CONFIG_CHILDREN, &child, &children) != 0) diff -u -r -x cscope.out -x out -x ctl -x compile vendor/FreeBSD/head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h SpectraBSD/head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h --- vendor/FreeBSD/head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h 2011-02-28 13:51:22.250862280 -0700 +++ SpectraBSD/head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h 2011-06-07 13:59:20.387329174 -0600 @@ -200,7 +200,8 @@ extern void zpool_close(zpool_handle_t *); extern const char *zpool_get_name(zpool_handle_t *); extern int zpool_get_state(zpool_handle_t *); -extern char *zpool_state_to_name(vdev_state_t, vdev_aux_t); +extern const char *zpool_state_to_name(vdev_state_t, vdev_aux_t); +extern const char *zpool_pool_state_to_name(pool_state_t); extern void zpool_free_handles(libzfs_handle_t *); /* @@ -249,7 +250,7 @@ boolean_t *, boolean_t *); extern nvlist_t *zpool_find_vdev_by_physpath(zpool_handle_t *, const char *, boolean_t *, boolean_t *, boolean_t *); -extern int zpool_label_disk(libzfs_handle_t *, zpool_handle_t *, char *); +extern int zpool_label_disk(libzfs_handle_t *, zpool_handle_t *, 
const char *); /* * Functions to manage pool properties diff -u -r -x cscope.out -x out -x ctl -x compile vendor/FreeBSD/head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c SpectraBSD/head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c --- vendor/FreeBSD/head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c 2011-02-28 13:51:22.254931589 -0700 +++ SpectraBSD/head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c 2011-06-08 17:16:38.396640386 -0600 @@ -1084,8 +1084,8 @@ /* * Given a file descriptor, clear (zero) the label information. This function - * is currently only used in the appliance stack as part of the ZFS sysevent - * module. + * is used in the appliance stack as part of the ZFS sysevent module and + * to implement the "zpool labelclear" command. */ int zpool_clear_label(int fd) diff -u -r -x cscope.out -x out -x ctl -x compile vendor/FreeBSD/head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c SpectraBSD/head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c --- vendor/FreeBSD/head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c 2011-02-28 13:51:22.272292499 -0700 +++ SpectraBSD/head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c 2011-06-06 17:32:19.581512402 -0600 @@ -174,7 +174,7 @@ /* * Map VDEV STATE to printed strings. */ -char * +const char * zpool_state_to_name(vdev_state_t state, vdev_aux_t aux) { switch (state) { @@ -202,6 +202,34 @@ } /* + * Map POOL STATE to printed strings. 
+ */ +const char * +zpool_pool_state_to_name(pool_state_t state) +{ + switch (state) { + case POOL_STATE_ACTIVE: + return (gettext("ACTIVE")); + case POOL_STATE_EXPORTED: + return (gettext("EXPORTED")); + case POOL_STATE_DESTROYED: + return (gettext("DESTROYED")); + case POOL_STATE_SPARE: + return (gettext("SPARE")); + case POOL_STATE_L2CACHE: + return (gettext("L2CACHE")); + case POOL_STATE_UNINITIALIZED: + return (gettext("UNINITIALIZED")); + case POOL_STATE_UNAVAIL: + return (gettext("UNAVAIL")); + case POOL_STATE_POTENTIALLY_ACTIVE: + return (gettext("POTENTIALLY_ACTIVE")); + } + + return (gettext("UNKNOWN")); +} + +/* * Get a zpool property value for 'prop' and return the value in * a pre-allocated buffer. */ @@ -3605,7 +3642,7 @@ * stripped of any leading /dev path. */ int -zpool_label_disk(libzfs_handle_t *hdl, zpool_handle_t *zhp, char *name) +zpool_label_disk(libzfs_handle_t *hdl, zpool_handle_t *zhp, const char *name) { #ifdef sun char path[MAXPATHLEN]; --------------090202020409040002080001-- From owner-freebsd-fs@FreeBSD.ORG Tue Jun 14 21:58:54 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 107A2106564A for ; Tue, 14 Jun 2011 21:58:54 +0000 (UTC) (envelope-from gibbs@scsiguy.com) Received: from aslan.scsiguy.com (mail.scsiguy.com [70.89.174.89]) by mx1.freebsd.org (Postfix) with ESMTP id BCB408FC08 for ; Tue, 14 Jun 2011 21:58:53 +0000 (UTC) Received: from Justins-MacBook-Pro.local (207-225-98-3.dia.static.qwest.net [207.225.98.3]) (authenticated bits=0) by aslan.scsiguy.com (8.14.4/8.14.4) with ESMTP id p5EM0EnE016000 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Tue, 14 Jun 2011 16:00:14 -0600 (MDT) (envelope-from gibbs@scsiguy.com) Message-ID: <4DF7D997.8020100@scsiguy.com> Date: Tue, 14 Jun 2011 15:58:47 -0600 From: "Justin T. 
Gibbs" User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: fs@freebsd.org Content-Type: multipart/mixed; boundary="------------030808090403080204000900" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (aslan.scsiguy.com [70.89.174.89]); Tue, 14 Jun 2011 16:00:14 -0600 (MDT) Cc: Subject: [CFR][ZFS] Use vdev_path instead of vdev_physpath in sysevent X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Jun 2011 21:58:54 -0000 This is a multi-part message in MIME format. --------------030808090403080204000900 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit In spa_async_autoexpand(), we can avoid the creation of a valid devfs name (which is currently performed using Solaris' "/devices") by just using the vdev_path. Since the device is online, there is no benefit to using the physical path location. Further, the physical path data may include prefix data (e.g. physical path quality) that will require more code to strip out. 
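
To illustrate the difference, here is a userland sketch only. The sk_ helpers are hypothetical and do not exist in ZFS; the first mimics the old "/devices" prefixing with snprintf(), the second shows that the new approach needs no buffer or formatting at all.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Old approach (sketch): build "/devices<physpath>" into a caller
 * supplied buffer. Returns 0 on success, -1 on a NULL path or
 * truncation.
 */
int
sk_old_event_path(char *buf, size_t len, const char *physpath)
{
	int n;

	if (physpath == NULL)
		return (-1);
	n = snprintf(buf, len, "/devices%s", physpath);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}

/*
 * New approach (sketch): the vdev path is already a usable device
 * name, so it is handed to the event unchanged.
 */
const char *
sk_new_event_path(const char *vdev_path)
{
	return (vdev_path);
}
```

The second helper is deliberately trivial: dropping the temporary buffer is the point of the change.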
-- Justin --------------030808090403080204000900 Content-Type: text/plain; x-mac-type="0"; x-mac-creator="0"; name="spa.diffs" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="spa.diffs" diff -u -r -x cscope.out -x out -x ctl -x compile vendor/FreeBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c --- vendor/FreeBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c 2011-02-28 13:51:27.986816115 -0700 +++ SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c 2011-06-06 15:50:11.169668215 -0600 @@ -5051,7 +5067,6 @@ { sysevent_id_t eid; nvlist_t *attr; - char *physpath; if (!spa->spa_autoexpand) return; @@ -5061,20 +5076,16 @@ spa_async_autoexpand(spa, cvd); } - if (!vd->vdev_ops->vdev_op_leaf || vd->vdev_physpath == NULL) + if (!vd->vdev_ops->vdev_op_leaf || vd->vdev_path == NULL) return; - physpath = kmem_zalloc(MAXPATHLEN, KM_SLEEP); - (void) snprintf(physpath, MAXPATHLEN, "/devices%s", vd->vdev_physpath); - VERIFY(nvlist_alloc(&attr, NV_UNIQUE_NAME, KM_SLEEP) == 0); - VERIFY(nvlist_add_string(attr, DEV_PHYS_PATH, physpath) == 0); + VERIFY(nvlist_add_string(attr, DEV_PATH, vd->vdev_path) == 0); (void) ddi_log_sysevent(zfs_dip, SUNW_VENDOR, EC_DEV_STATUS, ESC_ZFS_VDEV_AUTOEXPAND, attr, &eid, DDI_SLEEP); nvlist_free(attr); - kmem_free(physpath, MAXPATHLEN); } static void --------------030808090403080204000900-- From owner-freebsd-fs@FreeBSD.ORG Tue Jun 14 22:45:13 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0E5D91065670 for ; Tue, 14 Jun 2011 22:45:13 +0000 (UTC) (envelope-from gibbs@scsiguy.com) Received: from aslan.scsiguy.com (ns1.scsiguy.com [70.89.174.89]) by mx1.freebsd.org (Postfix) with ESMTP id DAE018FC18 for ; Tue, 14 Jun 2011 22:45:12 +0000 (UTC) Received: from Justins-MacBook-Pro.local 
(207-225-98-3.dia.static.qwest.net [207.225.98.3]) (authenticated bits=0) by aslan.scsiguy.com (8.14.4/8.14.4) with ESMTP id p5EMkXoI016177 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Tue, 14 Jun 2011 16:46:33 -0600 (MDT) (envelope-from gibbs@scsiguy.com) Message-ID: <4DF7E472.9030601@scsiguy.com> Date: Tue, 14 Jun 2011 16:45:06 -0600 From: "Justin T. Gibbs" User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: fs@freebsd.org Content-Type: multipart/mixed; boundary="------------040305070307000903020709" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (aslan.scsiguy.com [70.89.174.89]); Tue, 14 Jun 2011 16:46:34 -0600 (MDT) Cc: Subject: [CFR][ZFS] Allow async event processing with a R/O root FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Jun 2011 22:45:13 -0000

This is a multi-part message in MIME format.
--------------040305070307000903020709
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Allow ZFS asynchronous event handling to proceed even if the root file system is mounted read-only. This restriction appears to have been put in place to avoid errors with updating the configuration cache file. However:

 o The majority of asynchronous event handling does not involve configuration cache file updates.
 o The configuration cache file need not be on the root file system, so the check was not complete.
 o Other classes of errors (e.g. file system full) can also prevent a successful update yet do not prevent asynchronous event processing.
 o Configurations such as NanoBSD never have a read-write root, so ZFS event processing is permanently disabled in these systems.
 o Failure to handle asynchronous events promptly can extend the window of time that a pool is in a critical state.

At worst, a missed configuration cache update will force the operator to perform a manual "zpool import" (note -f is not required) to inform the system about a newly created pool. To minimize the likelihood of this rare occurrence, configuration cache write failures now emit FMA events so the operator can take corrective action, and the write is retried every 5 minutes. The retry interval, in seconds, is tunable via the sysctl "vfs.zfs.ccw_retry_interval".

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:
 o Add the sysctl "vfs.zfs.ccw_retry_interval". The value defaults to 5 minutes and is used to rate limit, on a per-pool basis, configuration cache file write attempts.
 o Modify spa_async_dispatch() to honor configuration cache write limiting. If other events are pending, a configuration cache write will be attempted at the same time, so the rate limiting only applies when the asynchronous dispatch system is otherwise idle. Async events should be rare (e.g. device arrival/departure) and configuration cache writes rarer, so a more complicated system to strictly honor the retry limit seems unwarranted.
 o Remove the check in spa_async_dispatch() for the root file system being read-write.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c:
 Instead of silently ignoring configuration cache write failures, report them via a new FMA event as well as to the console. The current zfs_ereport_post() doesn't allow arbitrary name=value pairs to be appended to the report, so the configuration cache file name is only available on the console output. This limitation should be addressed in a future update.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h:
 Add an int64_t (spa_ccw_fail_time) to the spa data structure to track the time (via LBOLT) of the last configuration cache file write failure. This is referenced in spa_async_dispatch() to effect the rate limiting.
sys/cddl/contrib/opensolaris/uts/common/sys/fm/fs/zfs.h: Add FM_EREPORT_ZFS_CONFIG_CACHE_WRITE as an ereport class. --------------040305070307000903020709 Content-Type: text/plain; x-mac-type="0"; x-mac-creator="0"; name="spa_cache.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="spa_cache.diff" diff -u -r -x cscope.out -x out -x ctl -x compile vendor/FreeBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c --- vendor/FreeBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c 2011-02-28 13:51:27.986816115 -0700 +++ SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c 2011-06-14 16:08:11.387632264 -0600 @@ -73,10 +73,20 @@ /* Check hostid on import? */ static int check_hostid = 1; +/* + * The interval at which failed configuration cache file writes + * should be retried. + */ +static int zfs_ccw_retry_interval = 300; + SYSCTL_DECL(_vfs_zfs); TUNABLE_INT("vfs.zfs.check_hostid", &check_hostid); SYSCTL_INT(_vfs_zfs, OID_AUTO, check_hostid, CTLFLAG_RW, &check_hostid, 0, "Check hostid on import?"); +TUNABLE_INT("vfs.zfs.ccw_retry_interval", &zfs_ccw_retry_interval); +SYSCTL_INT(_vfs_zfs, OID_AUTO, ccw_retry_interval, CTLFLAG_RW, + &zfs_ccw_retry_interval, 0, + "Configuration cache file write, retry after failure, interval (seconds)"); typedef enum zti_modes { zti_mode_fixed, /* value is # of threads (min 1) */ @@ -5183,13 +5188,34 @@ mutex_exit(&spa->spa_async_lock); } +static int +spa_async_tasks_pending(spa_t *spa) +{ + u_int non_config_tasks; + u_int config_task; + boolean_t config_task_suspended; + + non_config_tasks = spa->spa_async_tasks & ~SPA_ASYNC_CONFIG_UPDATE; + config_task = spa->spa_async_tasks & SPA_ASYNC_CONFIG_UPDATE; + if (spa->spa_ccw_fail_time == 0) { + config_task_suspended = B_FALSE; + } else { + config_task_suspended = + (ddi_get_lbolt64() - spa->spa_ccw_fail_time) + < (zfs_ccw_retry_interval * hz); + } + + return 
(non_config_tasks || (config_task && !config_task_suspended)); +} + static void spa_async_dispatch(spa_t *spa) { mutex_enter(&spa->spa_async_lock); - if (spa->spa_async_tasks && !spa->spa_async_suspended && + if (spa_async_tasks_pending(spa) && + !spa->spa_async_suspended && spa->spa_async_thread == NULL && - rootdir != NULL && !vn_is_readonly(rootdir)) + rootdir != NULL) spa->spa_async_thread = thread_create(NULL, 0, spa_async_thread, spa, 0, &p0, TS_RUN, maxclsyspri); mutex_exit(&spa->spa_async_lock); diff -u -r -x cscope.out -x out -x ctl -x compile vendor/FreeBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c --- vendor/FreeBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c 2011-02-28 13:51:27.987815981 -0700 +++ SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c 2011-03-25 18:24:55.085601405 -0600 @@ -24,6 +24,7 @@ */ #include +#include #include #include #include @@ -136,7 +137,7 @@ kobj_close_file(file); } -static void +static int spa_config_write(spa_config_dirent_t *dp, nvlist_t *nvl) { size_t buflen; @@ -144,13 +145,14 @@ vnode_t *vp; int oflags = FWRITE | FTRUNC | FCREAT | FOFFMAX; char *temp; + int err; /* * If the nvlist is empty (NULL), then remove the old cachefile. 
*/ if (nvl == NULL) { - (void) vn_remove(dp->scd_path, UIO_SYSSPACE, RMFILE); - return; + err = vn_remove(dp->scd_path, UIO_SYSSPACE, RMFILE); + return (err); } /* @@ -171,11 +173,12 @@ */ (void) snprintf(temp, MAXPATHLEN, "%s.tmp", dp->scd_path); - if (vn_open(temp, UIO_SYSSPACE, oflags, 0644, &vp, CRCREAT, 0) == 0) { - if (vn_rdwr(UIO_WRITE, vp, buf, buflen, 0, UIO_SYSSPACE, - 0, RLIM64_INFINITY, kcred, NULL) == 0 && - VOP_FSYNC(vp, FSYNC, kcred, NULL) == 0) { - (void) vn_rename(temp, dp->scd_path, UIO_SYSSPACE); + err = vn_open(temp, UIO_SYSSPACE, oflags, 0644, &vp, CRCREAT, 0); + if (err == 0) { + if ((err = vn_rdwr(UIO_WRITE, vp, buf, buflen, 0, UIO_SYSSPACE, + 0, RLIM64_INFINITY, kcred, NULL)) == 0 && + (err = VOP_FSYNC(vp, FSYNC, kcred, NULL)) == 0) { + err = vn_rename(temp, dp->scd_path, UIO_SYSSPACE); } (void) VOP_CLOSE(vp, oflags, 1, 0, kcred, NULL); } @@ -184,6 +187,7 @@ kmem_free(buf, buflen); kmem_free(temp, MAXPATHLEN); + return (err); } /* @@ -195,6 +199,8 @@ { spa_config_dirent_t *dp, *tdp; nvlist_t *nvl; + boolean_t ccw_failure; + int error; ASSERT(MUTEX_HELD(&spa_namespace_lock)); @@ -206,6 +212,7 @@ * cachefile is changed, the new one is pushed onto this list, allowing * us to update previous cachefiles that no longer contain this pool. */ + ccw_failure = B_FALSE; for (dp = list_head(&target->spa_config_list); dp != NULL; dp = list_next(&target->spa_config_list, dp)) { spa_t *spa = NULL; @@ -238,10 +245,35 @@ mutex_exit(&spa->spa_props_lock); } - spa_config_write(dp, nvl); + error = spa_config_write(dp, nvl); + if (error != 0) { + + printf("ZFS ERROR: Update of cache file %s failed: " + "Errno %d\n", dp->scd_path, error); + ccw_failure = B_TRUE; + } + nvlist_free(nvl); } + if (ccw_failure) { + /* + * Keep trying so that configuration data is + * written if/when any temporary filesystem + * resource issues are resolved. 
+ */ + target->spa_ccw_fail_time = ddi_get_lbolt64(); + spa_async_request(target, SPA_ASYNC_CONFIG_UPDATE); + zfs_ereport_post(FM_EREPORT_ZFS_CONFIG_CACHE_WRITE, + target, NULL, NULL, 0, 0); + } else { + /* + * Do not rate limit future attempts to update + * the config cache. + */ + target->spa_ccw_fail_time = 0; + } + /* * Remove any config entries older than the current one. */ diff -u -r -x cscope.out -x out -x ctl -x compile vendor/FreeBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h --- vendor/FreeBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h 2011-02-28 13:51:28.086708890 -0700 +++ SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h 2011-03-25 18:24:55.837175120 -0600 @@ -216,6 +216,7 @@ int spa_vdev_locks; /* locks grabbed */ uint64_t spa_creation_version; /* version at pool creation */ uint64_t spa_prev_software_version; + int64_t spa_ccw_fail_time; /* Conf cache write fail time */ /* * spa_refcnt & spa_config_lock must be the last elements * because refcount_t changes size based on compilation options. 
diff -u -r -x cscope.out -x out -x ctl -x compile vendor/FreeBSD/head/sys/cddl/contrib/opensolaris/uts/common/sys/fm/fs/zfs.h SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/sys/fm/fs/zfs.h --- vendor/FreeBSD/head/sys/cddl/contrib/opensolaris/uts/common/sys/fm/fs/zfs.h 2011-02-28 13:51:28.230602494 -0700 +++ SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/sys/fm/fs/zfs.h 2011-03-25 18:24:59.769604479 -0600 @@ -46,6 +46,7 @@ #define FM_EREPORT_ZFS_IO_FAILURE "io_failure" #define FM_EREPORT_ZFS_PROBE_FAILURE "probe_failure" #define FM_EREPORT_ZFS_LOG_REPLAY "log_replay" +#define FM_EREPORT_ZFS_CONFIG_CACHE_WRITE "config_cache_write" #define FM_EREPORT_PAYLOAD_ZFS_POOL "pool" #define FM_EREPORT_PAYLOAD_ZFS_POOL_FAILMODE "pool_failmode" --------------040305070307000903020709-- From owner-freebsd-fs@FreeBSD.ORG Tue Jun 14 23:06:12 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9FB901065676 for ; Tue, 14 Jun 2011 23:06:12 +0000 (UTC) (envelope-from gibbs@scsiguy.com) Received: from aslan.scsiguy.com (aslan.scsiguy.com [70.89.174.89]) by mx1.freebsd.org (Postfix) with ESMTP id 5E0448FC1B for ; Tue, 14 Jun 2011 23:06:12 +0000 (UTC) Received: from Justins-MacBook-Pro.local (207-225-98-3.dia.static.qwest.net [207.225.98.3]) (authenticated bits=0) by aslan.scsiguy.com (8.14.4/8.14.4) with ESMTP id p5EN7XYu016263 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Tue, 14 Jun 2011 17:07:33 -0600 (MDT) (envelope-from gibbs@scsiguy.com) Message-ID: <4DF7E95E.60507@scsiguy.com> Date: Tue, 14 Jun 2011 17:06:06 -0600 From: "Justin T. 
Gibbs" User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: fs@freebsd.org Content-Type: multipart/mixed; boundary="------------020608030508010500020709" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (aslan.scsiguy.com [70.89.174.89]); Tue, 14 Jun 2011 17:07:33 -0600 (MDT) Cc: Subject: [CFR][ZFS] vdev_geom enhancements X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Jun 2011 23:06:12 -0000 This is a multi-part message in MIME format. --------------020608030508010500020709 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Modify the geom vdev provider's open behavior so that it will only unconditionally open a device by path if the open is part of a pool create or device add operation, and a search of all known GEOM providers' label data doesn't yield a device with matching pool and vdev GUIDs. This fixes a bug where the wrong disk could be associated with a vdev's configuration data when device devfs paths change due to insert and remove events. While ZFS detects this kind of coding mixup and immediately flags the device as faulted before the confusion can cause permanent data loss, a reboot was necessary in order to resurrect the configuration. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c: o When opening by GUID, require both the pool and vdev GUIDs to match. While it is highly unlikely for two vdevs to have the same vdev GUID, the ZFS storage pool allocator only guarantees they are unique within a pool. o Modify the open behavior to: - Open by recorded device path with GUID matching - If that fails, search all geom providers for a device with matching GUIDs.
- If that fails and we are opening a "new to a pool configuration" vdev, open by path. - Otherwise fail the open. Fix race conditions in the GEOM vdev provider that can occur when a "reopen" is attempted (because an I/O failure triggered the ZFS probe code) while an asynchronous vdev removal request is being generated. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c: Initialize the private field in vdev_geom's consumer objects to reference ZFS's vdev object before GEOM's topology lock is released. This ensures that, should this consumer be orphaned before ZFS's open processing completes, the proper data is available to post an async removal request. Export physical path information to ZFS sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c: Subscribe to attribute change notifications and update vdev physical path information (in core and on disk) when a GEOM::physpath event indicates it has changed. NOTE: These diffs still contain the removal of the drop of the spa namespace lock during vdev_geom open. I'm not proposing to commit this change now as pjd has informed me that the drop is required to support Zvols. I'm still looking into how to resolve both the lock order reversal that can occur with the SCL locks here and the Zvol issue.
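
The open order described above (recorded path with GUID check, then a GUID search over all providers, then a by-path open only for vdevs that are new to the pool, otherwise failure) can be sketched in user space roughly as follows. This is only an illustration under stated assumptions, not the code in the attached diff: `struct provider`, `struct vdev`, the `new_to_pool` flag, and `vdev_open_sketch()` are hypothetical stand-ins for the GEOM provider list, `vdev_t`, and the `spa_load_state`/`vdev_prevstate` tests in the real patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a GEOM provider and the GUIDs in its label. */
struct provider {
	const char *name;
	uint64_t pool_guid;
	uint64_t vdev_guid;
};

/* Hypothetical stand-in for the parts of vdev_t consulted during open. */
struct vdev {
	const char *path;	/* recorded device path */
	uint64_t pool_guid;	/* GUID of the owning pool */
	uint64_t vdev_guid;	/* GUID of this vdev */
	bool new_to_pool;	/* pool create, device add, or pool split */
};

/* Look a provider up by name only; no label data is examined. */
static const struct provider *
find_by_path(const struct provider *pp, size_t n, const char *path)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(pp[i].name, path) == 0)
			return (&pp[i]);
	return (NULL);
}

/* Both the pool GUID and the vdev GUID must match. */
static bool
guids_match(const struct provider *p, const struct vdev *vd)
{
	return (p->pool_guid == vd->pool_guid &&
	    p->vdev_guid == vd->vdev_guid);
}

const struct provider *
vdev_open_sketch(const struct vdev *vd, const struct provider *pp, size_t n)
{
	const struct provider *p;

	assert(vd != NULL && vd->path != NULL);

	/* 1. Recorded path, accepted only if its label has both GUIDs. */
	p = find_by_path(pp, n, vd->path);
	if (p != NULL && guids_match(p, vd))
		return (p);

	/* 2. The disks may have moved: search every provider by GUID. */
	for (size_t i = 0; i < n; i++)
		if (guids_match(&pp[i], vd))
			return (&pp[i]);

	/* 3. New vdev: open by path, ignoring GUID mismatches. */
	if (vd->new_to_pool)
		return (find_by_path(pp, n, vd->path));

	/* 4. Otherwise fail the open. */
	return (NULL);
}
```

With two providers whose device names have swapped, a lookup for the vdev recorded at the old name resolves to the provider that actually carries its GUIDs, while a vdev whose GUIDs appear nowhere is only opened by path if it is new to the pool.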
--------------020608030508010500020709 Content-Type: text/plain; x-mac-type="0"; x-mac-creator="0"; name="vdev_geom.diffs" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="vdev_geom.diffs" diff -u -r -x cscope.out -x out -x ctl -x compile vendor/FreeBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c --- vendor/FreeBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c 2011-02-28 13:51:28.112874995 -0700 +++ SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c 2011-06-08 17:12:55.953572950 -0600 @@ -84,8 +84,52 @@ spa_async_request(vd->vdev_spa, SPA_ASYNC_REMOVE); } +static void +vdev_geom_attrchanged(struct g_consumer *cp, const char *attr) +{ + vdev_t *vd; + spa_t *spa; + char *physpath; + int error, physpath_len; + + g_topology_assert(); + + if (strcmp(attr, "GEOM::physpath") != 0) + return; + + if (g_access(cp, 1, 0, 0) != 0) + return; + + /* + * Record/Update physical path information for this device. 
+ */ + vd = cp->private; + spa = vd->vdev_spa; + physpath_len = MAXPATHLEN; + physpath = g_malloc(physpath_len, M_WAITOK|M_ZERO); + error = g_io_getattr("GEOM::physpath", cp, &physpath_len, physpath); + g_access(cp, -1, 0, 0); + if (error == 0) { + int held_lock; + + held_lock = spa_config_held(spa, SCL_STATE, RW_WRITER); + if (held_lock == 0) + spa_config_enter(spa, SCL_STATE, FTAG, RW_WRITER); + + if (vd->vdev_physpath != NULL) + spa_strfree(vd->vdev_physpath); + vd->vdev_physpath = spa_strdup(physpath); + + spa_async_request(spa, SPA_ASYNC_CONFIG_UPDATE); + + if (held_lock == 0) + spa_config_exit(spa, SCL_STATE, FTAG); + } + g_free(physpath); +} + static struct g_consumer * -vdev_geom_attach(struct g_provider *pp) +vdev_geom_attach(struct g_provider *pp, vdev_t *vd) { struct g_geom *gp; struct g_consumer *cp; @@ -104,6 +148,7 @@ if (gp == NULL) { gp = g_new_geomf(&zfs_vdev_class, "zfs::vdev"); gp->orphan = vdev_geom_orphan; + gp->attrchanged = vdev_geom_attrchanged; cp = g_new_consumer(gp); if (g_attach(cp, pp) != 0) { g_wither_geom(gp, ENXIO); @@ -140,6 +185,12 @@ ZFS_LOG(1, "Used existing consumer for %s.", pp->name); } } + + cp->private = vd; + + /* Fetch initial physical path information for this device. 
*/ + vdev_geom_attrchanged(cp, "GEOM::physpath"); + return (cp); } @@ -170,20 +221,26 @@ } } -static uint64_t -nvlist_get_guid(nvlist_t *list) +static void +nvlist_get_guids(nvlist_t *list, uint64_t *pguid, uint64_t *vguid) { nvpair_t *elem = NULL; - uint64_t value; + *vguid = 0; + *pguid = 0; while ((elem = nvlist_next_nvpair(list, elem)) != NULL) { - if (nvpair_type(elem) == DATA_TYPE_UINT64 && - strcmp(nvpair_name(elem), "guid") == 0) { - VERIFY(nvpair_value_uint64(elem, &value) == 0); - return (value); + if (nvpair_type(elem) != DATA_TYPE_UINT64) + continue; + + if (strcmp(nvpair_name(elem), ZPOOL_CONFIG_POOL_GUID) == 0) { + VERIFY(nvpair_value_uint64(elem, pguid) == 0); + } else if (strcmp(nvpair_name(elem), ZPOOL_CONFIG_GUID) == 0) { + VERIFY(nvpair_value_uint64(elem, vguid) == 0); } + + if (*pguid != 0 && *vguid != 0) + break; } - return (0); } static int @@ -221,8 +278,8 @@ return (error); } -static uint64_t -vdev_geom_read_guid(struct g_consumer *cp) +static void +vdev_geom_read_guids(struct g_consumer *cp, uint64_t *pguid, uint64_t *vguid) { struct g_provider *pp; vdev_label_t *label; @@ -230,11 +287,12 @@ size_t buflen; uint64_t psize; off_t offset, size; - uint64_t guid; int error, l, len; g_topology_assert_not(); + *pguid = 0; + *vguid = 0; pp = cp->provider; ZFS_LOG(1, "Reading guid from %s...", pp->name); @@ -244,7 +302,6 @@ size = sizeof(*label) + pp->sectorsize - ((sizeof(*label) - 1) % pp->sectorsize) - 1; - guid = 0; label = kmem_alloc(size, KM_SLEEP); buflen = sizeof(label->vl_vdev_phys.vp_nvlist); @@ -262,16 +319,16 @@ if (nvlist_unpack(buf, buflen, &config, 0) != 0) continue; - guid = nvlist_get_guid(config); + nvlist_get_guids(config, pguid, vguid); nvlist_free(config); - if (guid != 0) + if (*pguid != 0 && *vguid != 0) break; } kmem_free(label, size); - if (guid != 0) - ZFS_LOG(1, "guid for %s is %ju", pp->name, (uintmax_t)guid); - return (guid); + if (*pguid != 0 && *vguid != 0) + ZFS_LOG(1, "guid for %s is %ju:%ju", pp->name, + 
(uintmax_t)*pguid, (uintmax_t)*vguid); } static void @@ -283,13 +340,14 @@ } static struct g_consumer * -vdev_geom_attach_by_guid(uint64_t guid) +vdev_geom_attach_by_guid(vdev_t *vd) { struct g_class *mp; struct g_geom *gp, *zgp; struct g_provider *pp; struct g_consumer *cp, *zcp; uint64_t pguid; + uint64_t vguid; g_topology_assert(); @@ -314,13 +372,14 @@ continue; } g_topology_unlock(); - pguid = vdev_geom_read_guid(zcp); + vdev_geom_read_guids(zcp, &pguid, &vguid); g_topology_lock(); g_access(zcp, -1, 0, 0); g_detach(zcp); - if (pguid != guid) + if (pguid != spa_guid(vd->vdev_spa) || + vguid != vd->vdev_guid) continue; - cp = vdev_geom_attach(pp); + cp = vdev_geom_attach(pp, vd); if (cp == NULL) { printf("ZFS WARNING: Unable to attach to %s.\n", pp->name); @@ -341,7 +400,7 @@ } static struct g_consumer * -vdev_geom_open_by_guid(vdev_t *vd) +vdev_geom_open_by_guids(vdev_t *vd) { struct g_consumer *cp; char *buf; @@ -349,8 +408,9 @@ g_topology_assert(); - ZFS_LOG(1, "Searching by guid [%ju].", (uintmax_t)vd->vdev_guid); - cp = vdev_geom_attach_by_guid(vd->vdev_guid); + ZFS_LOG(1, "Searching by guid [%ju:%ju].", + (uintmax_t)spa_guid(vd->vdev_spa), (uintmax_t)vd->vdev_guid); + cp = vdev_geom_attach_by_guid(vd); if (cp != NULL) { len = strlen(cp->provider->name) + strlen("/dev/") + 1; buf = kmem_alloc(len, KM_SLEEP); @@ -359,10 +419,12 @@ spa_strfree(vd->vdev_path); vd->vdev_path = buf; - ZFS_LOG(1, "Attach by guid [%ju] succeeded, provider %s.", + ZFS_LOG(1, "Attach by guid [%ju:%ju] succeeded, provider %s.", + (uintmax_t)spa_guid(vd->vdev_spa), (uintmax_t)vd->vdev_guid, vd->vdev_path); } else { - ZFS_LOG(1, "Search by guid [%ju] failed.", + ZFS_LOG(1, "Search by guid [%ju:%ju] failed.", + (uintmax_t)spa_guid(vd->vdev_spa), (uintmax_t)vd->vdev_guid); } @@ -374,7 +436,8 @@ { struct g_provider *pp; struct g_consumer *cp; - uint64_t guid; + uint64_t pguid; + uint64_t vguid; g_topology_assert(); @@ -382,18 +445,21 @@ pp = g_provider_by_name(vd->vdev_path + 
sizeof("/dev/") - 1); if (pp != NULL) { ZFS_LOG(1, "Found provider by name %s.", vd->vdev_path); - cp = vdev_geom_attach(pp); + cp = vdev_geom_attach(pp, vd); if (cp != NULL && check_guid && ISP2(pp->sectorsize) && pp->sectorsize <= VDEV_PAD_SIZE) { g_topology_unlock(); - guid = vdev_geom_read_guid(cp); + vdev_geom_read_guids(cp, &pguid, &vguid); g_topology_lock(); - if (guid != vd->vdev_guid) { + if (pguid != spa_guid(vd->vdev_spa) || + vguid != vd->vdev_guid) { vdev_geom_detach(cp, 0); cp = NULL; ZFS_LOG(1, "guid mismatch for provider %s: " - "%ju != %ju.", vd->vdev_path, - (uintmax_t)vd->vdev_guid, (uintmax_t)guid); + "%ju:%ju != %ju:%ju.", vd->vdev_path, + (uintmax_t)spa_guid(vd->vdev_spa), + (uintmax_t)vd->vdev_guid, + (uintmax_t)pguid, (uintmax_t)vguid); } else { ZFS_LOG(1, "guid match for provider %s.", vd->vdev_path); @@ -410,7 +476,7 @@ struct g_provider *pp; struct g_consumer *cp; size_t bufsize; - int error, lock; + int error; /* * We must have a pathname, and it must be absolute. @@ -422,34 +488,48 @@ vd->vdev_tsd = NULL; - if (mutex_owned(&spa_namespace_lock)) { - mutex_exit(&spa_namespace_lock); - lock = 1; - } else { - lock = 0; - } DROP_GIANT(); g_topology_lock(); error = 0; /* - * If we're creating or splitting a pool, just find the GEOM provider - * by its name and ignore GUID mismatches. + * Try using the recorded path for this device, but only + * accept it if its label data contains the expected GUIDs. */ - if (vd->vdev_spa->spa_load_state == SPA_LOAD_NONE || - vd->vdev_spa->spa_splitting_newspa == B_TRUE) + cp = vdev_geom_open_by_path(vd, 1); + if (cp == NULL) { + /* + * The device at vd->vdev_path doesn't have the + * expected GUIDs. The disks might have merely + * moved around so try all other GEOM providers + * to find one with the right GUIDs. 
+ */ + cp = vdev_geom_open_by_guids(vd); + } + + if (cp == NULL && + ((vd->vdev_prevstate == VDEV_STATE_UNKNOWN && + vd->vdev_spa->spa_load_state == SPA_LOAD_NONE) || + vd->vdev_spa->spa_splitting_newspa == B_TRUE)) { + /* + * We are dealing with a vdev that hasn't been previously + * opened (since boot), and we are not loading an + * existing pool configuration (add vdev to new or + * existing pool) or we are splitting a pool. + * Find the GEOM provider by its name, ignoring GUID + * mismatches. + * + * XXPOLICY: It would be safer to only allow a device + * that is labeled but missing GUID information + * to be opened in this fashion. This would + * require a new option to the zpool command line + * tool allowing the label information to be reset + * on a device, and augmented error reporting + * so the user can understand why their request + * failed and the required steps to repurpose + * the device. + */ cp = vdev_geom_open_by_path(vd, 0); - else { - cp = vdev_geom_open_by_path(vd, 1); - if (cp == NULL) { - /* - * The device at vd->vdev_path doesn't have the - * expected guid. The disks might have merely - * moved around so try all other GEOM providers - * to find one with the right guid. - */ - cp = vdev_geom_open_by_guid(vd); - } } if (cp == NULL) { @@ -459,11 +539,7 @@ !ISP2(cp->provider->sectorsize)) { ZFS_LOG(1, "Provider %s has unsupported sectorsize.", vd->vdev_path); - - g_topology_lock(); vdev_geom_detach(cp, 0); - g_topology_unlock(); - error = EINVAL; cp = NULL; } else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) { @@ -484,18 +560,15 @@ cp = NULL; } } + g_topology_unlock(); PICKUP_GIANT(); - if (lock) - mutex_enter(&spa_namespace_lock); if (cp == NULL) { vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED; return (error); } - - cp->private = vd; - vd->vdev_tsd = cp; pp = cp->provider; + vd->vdev_tsd = cp; /* * Determine the actual size of the device.
@@ -513,12 +586,6 @@ */ vd->vdev_nowritecache = B_FALSE; - if (vd->vdev_physpath != NULL) - spa_strfree(vd->vdev_physpath); - bufsize = sizeof("/dev/") + strlen(pp->name); - vd->vdev_physpath = kmem_alloc(bufsize, KM_SLEEP); - snprintf(vd->vdev_physpath, bufsize, "/dev/%s", pp->name); - return (0); } --------------020608030508010500020709-- From owner-freebsd-fs@FreeBSD.ORG Wed Jun 15 11:59:14 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5F4D71065747 for ; Wed, 15 Jun 2011 11:59:14 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id CD46B8FC1B for ; Wed, 15 Jun 2011 11:59:13 +0000 (UTC) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 1519A45CA6; Wed, 15 Jun 2011 13:59:11 +0200 (CEST) Received: from localhost (58.wheelsystems.com [83.12.187.58]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id C8CA245C89; Wed, 15 Jun 2011 13:58:55 +0200 (CEST) Date: Wed, 15 Jun 2011 13:58:53 +0200 From: Pawel Jakub Dawidek To: "Justin T. Gibbs" Message-ID: <20110615115853.GG1975@garage.freebsd.pl> References: <4DF7C406.1080903@scsiguy.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="oXNgvKVxGWJ0RPMJ" Content-Disposition: inline In-Reply-To: <4DF7C406.1080903@scsiguy.com> X-OS: FreeBSD 9.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-3.9 required=4.5 tests=ALL_TRUSTED,BAYES_00, RCVD_IN_SORBS_DUL autolearn=ham version=3.0.4 Cc: fs@FreeBSD.org Subject: Re: [CFR][ZFS] Show removed devices by GUID in zpool output. 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Jun 2011 11:59:14 -0000 --oXNgvKVxGWJ0RPMJ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jun 14, 2011 at 02:26:46PM -0600, Justin T. Gibbs wrote: > The current behavior of zpool_vdev_name() is to report the vdev path > (e.g. /dev/da0) unless > a vdev has the ZPOOL_CONFIG_NOT_PRESENT attribute set. This > attribute is only set when > a vdev is not found during import/mount of a pool. The attached > patch also displays a vdev > by GUID if it cannot be opened post import or is marked removed > (e.g. via a GEOM orphan > event). > > The main motivation for this change is that vdev paths are not > unique to a physical leaf vdev. > It is easy to get into a situation where you need to "detach > /dev/da0" even though da0 is > an active member of the same pool in which a "previous da0" was once > removed. With > zpool_vdev_name() reporting the GUID, the user is equipped to > provide an unambiguous > command that represents their desired action. That's a useful change. It confused users in the past. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am!
http://yomoli.com --oXNgvKVxGWJ0RPMJ Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEYEARECAAYFAk34nn0ACgkQForvXbEpPzSAqwCgti3FTi2oxoOJPIbVwDQXUSY6 0ywAoI3aBjV9eBJGXQlc7rg8Fo2mij2t =R8cW -----END PGP SIGNATURE----- --oXNgvKVxGWJ0RPMJ-- From owner-freebsd-fs@FreeBSD.ORG Wed Jun 15 12:00:36 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 17F301065707 for ; Wed, 15 Jun 2011 12:00:36 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id 9B14B8FC16 for ; Wed, 15 Jun 2011 12:00:35 +0000 (UTC) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 3CD5745E11; Wed, 15 Jun 2011 14:00:34 +0200 (CEST) Received: from localhost (58.wheelsystems.com [83.12.187.58]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id E2C1745CDC; Wed, 15 Jun 2011 14:00:26 +0200 (CEST) Date: Wed, 15 Jun 2011 14:00:24 +0200 From: Pawel Jakub Dawidek To: "Justin T. 
Gibbs" Message-ID: <20110615120024.GH1975@garage.freebsd.pl> References: <4DF7C9E6.1030800@scsiguy.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="K1n7F7fSdjvFAEnM" Content-Disposition: inline In-Reply-To: <4DF7C9E6.1030800@scsiguy.com> X-OS: FreeBSD 9.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-3.9 required=4.5 tests=ALL_TRUSTED,BAYES_00, RCVD_IN_SORBS_DUL autolearn=ham version=3.0.4 Cc: fs@FreeBSD.org Subject: Re: [CFR][ZFS] Show "previous device location" for removed vdevs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Jun 2011 12:00:36 -0000 --K1n7F7fSdjvFAEnM Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jun 14, 2011 at 02:51:50PM -0600, Justin T. Gibbs wrote: > When a vdev cannot be found during ZFS pool import/mount time, > "zpool status" > reports the device GUID and a "device was at" message as a user aid. This > patch provides the same behavior when a device is removed post zpool > mount/import. Sounds good. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am!
http://yomoli.com --K1n7F7fSdjvFAEnM Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEYEARECAAYFAk34ntcACgkQForvXbEpPzSwwQCfbF4iQgtGIhV9rzj6UA5wIQ/o J8wAn1M291MKt0gUMaQJGSsRoYE16jDz =kz5L -----END PGP SIGNATURE----- --K1n7F7fSdjvFAEnM-- From owner-freebsd-fs@FreeBSD.ORG Wed Jun 15 12:05:40 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1DFD91065677 for ; Wed, 15 Jun 2011 12:05:40 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id 927D58FC0C for ; Wed, 15 Jun 2011 12:05:39 +0000 (UTC) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 0F2B645CD9; Wed, 15 Jun 2011 14:05:37 +0200 (CEST) Received: from localhost (58.wheelsystems.com [83.12.187.58]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 34CF545C8C; Wed, 15 Jun 2011 14:05:27 +0200 (CEST) Date: Wed, 15 Jun 2011 14:05:24 +0200 From: Pawel Jakub Dawidek To: "Justin T. Gibbs" Message-ID: <20110615120524.GI1975@garage.freebsd.pl> References: <4DF7CDD0.8040108@scsiguy.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="UthUFkbMtH2ceUK2" Content-Disposition: inline In-Reply-To: <4DF7CDD0.8040108@scsiguy.com> X-OS: FreeBSD 9.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-3.9 required=4.5 tests=ALL_TRUSTED,BAYES_00, RCVD_IN_SORBS_DUL autolearn=ham version=3.0.4 Cc: fs@FreeBSD.org Subject: Re: [CFR][ZFS] Add "zpool labelclear" command. 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Jun 2011 12:05:40 -0000 --UthUFkbMtH2ceUK2 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jun 14, 2011 at 03:08:32PM -0600, Justin T. Gibbs wrote: > ZFS rightfully has a lot of safety belts in place to ward off unintended > data loss. But in some scenarios, the safety belts are so restrictive, > the only way to proceed is to wipe the label information off of a drive. > > Here's an example: > > Pull a drive that is active in a pool on one system and stick it into > another system. ZFS will correctly reject this drive as a member of > a new pool or as the argument of a replace command. But if you really > want to use that drive, how do you clear its "potentially active" > status? If the pool were imported, you could destroy it, but ZFS won't > allow you to import a pool unless there are sufficient members for it > to serve I/O (I know about the undocumented -F option for import, > but users aren't going to find that). You can use dd to wipe the label > data off, but where exactly does ZFS keep its four copies of the label? In most cases like that you can use the -f switch; e.g., you can create a pool or replace a vdev using a device that is active when you use that switch. I'm sure you are aware of this, so I guess it doesn't always work for you? -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am!
http://yomoli.com --UthUFkbMtH2ceUK2 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEYEARECAAYFAk34oAQACgkQForvXbEpPzTMPQCfdOF/B55vnF1dEVZpS5/ZFyAd PGQAnR/Mq+I8jT8plWA2axWUoBE9Y+Fc =yrzz -----END PGP SIGNATURE----- --UthUFkbMtH2ceUK2-- From owner-freebsd-fs@FreeBSD.ORG Wed Jun 15 12:15:45 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 32C16106566B for ; Wed, 15 Jun 2011 12:15:45 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id 999FF8FC12 for ; Wed, 15 Jun 2011 12:15:44 +0000 (UTC) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 2B1B545CD9; Wed, 15 Jun 2011 14:15:41 +0200 (CEST) Received: from localhost (58.wheelsystems.com [83.12.187.58]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 72E7745C8C; Wed, 15 Jun 2011 14:15:32 +0200 (CEST) Date: Wed, 15 Jun 2011 14:15:29 +0200 From: Pawel Jakub Dawidek To: "Justin T. 
Gibbs" Message-ID: <20110615121529.GJ1975@garage.freebsd.pl> References: <4DF7E472.9030601@scsiguy.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="PEkEgRdBLZYkpbX2" Content-Disposition: inline In-Reply-To: <4DF7E472.9030601@scsiguy.com> X-OS: FreeBSD 9.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-3.9 required=4.5 tests=ALL_TRUSTED,BAYES_00, RCVD_IN_SORBS_DUL autolearn=ham version=3.0.4 Cc: fs@freebsd.org Subject: Re: [CFR][ZFS] Allow async event processing with a R/O root FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Jun 2011 12:15:45 -0000 --PEkEgRdBLZYkpbX2 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jun 14, 2011 at 04:45:06PM -0600, Justin T. Gibbs wrote: > Allow ZFS asynchronous event handling to proceed even if the > root file system is mounted read-only. This restriction appears > to have been put in place to avoid errors with updating the > configuration cache file. However: > > o The majority of asynchronous event handling does not involve > configuration cache file updates. > o The configuration cache file need not be on the root file system, > so the check was not complete. Why is that? I keep the cache file on the ZFS root file system. It is not used by the boot loader, but is loaded by the loader: # tail -7 /boot/defaults/loader.conf -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am!
http://yomoli.com --PEkEgRdBLZYkpbX2 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEYEARECAAYFAk34omEACgkQForvXbEpPzS/LwCg063aHn7VuYfVWVptLp323o4j 0aUAoLqNrfJOyb8M4WubElKbBrPBaf+j =M8Cm -----END PGP SIGNATURE----- --PEkEgRdBLZYkpbX2-- From owner-freebsd-fs@FreeBSD.ORG Wed Jun 15 12:43:11 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 62E341065676 for ; Wed, 15 Jun 2011 12:43:11 +0000 (UTC) (envelope-from mail.lexa@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 2DEF38FC0C for ; Wed, 15 Jun 2011 12:43:10 +0000 (UTC) Received: by iwn33 with SMTP id 33so365373iwn.13 for ; Wed, 15 Jun 2011 05:43:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:from:date:message-id:subject:to :content-type; bh=i1ngK9tlY3xQChhbSk5ZUBflEJc+s5NmTALoLLhaWDo=; b=DU1Hr4IYeF25bMI96pIx0GmxXrUSbl5TsqSgPUubGjt/ySrCQZi7rDO4ojleeFIuyn pO2KoyGcLcvQb7BXa9YxgyCuRy35A9Ziayi2oo2sAUEKNIyXyW9hEAgTteCrT6S4gjXC wQYVA8JhP6puwaeBgqPWlB7tkMdlWX81MLFtg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:from:date:message-id:subject:to:content-type; b=Rb5A3S7Jfizi10Oj2KS1R7sMQOsHyxKPaqloxwPqTHF7uvX6O4a7iYz0hRU4oJ7j5d r9stt63u9jWmYWHAEhN25FSlVXno/kTswN1zmb3D9RxiqcVcHpDAQ3fTq2kUInlwj04J nw3s86ts3wyfn4v0OSXHORInz4/LB5NOfcbRc= Received: by 10.42.39.139 with SMTP id h11mr416444ice.170.1308140176050; Wed, 15 Jun 2011 05:16:16 -0700 (PDT) MIME-Version: 1.0 Received: by 10.42.179.131 with HTTP; Wed, 15 Jun 2011 05:15:56 -0700 (PDT) From: =?UTF-8?B?0JDQu9C10LrRgdC10Lk=?= Date: Wed, 15 Jun 2011 16:15:56 +0400 Message-ID: To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: ZFS 
raid1 crash kernel panic. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Jun 2011 12:43:11 -0000 Hello, I have a ZFS raid1 built from two 1 TB drives. Recently my system (FreeBSD 8.2-RELEASE) crashed with a kernel panic: -------------------------------- panic: solaris assert: ss->ss_end >= end (0x6a80753600 >= 0x6a80753800), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c, line: 174 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Unread portion of the kernel message buffer: panic: solaris assert: ss->ss_end >= end (0x6a80753600 >= 0x6a80753800), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c, line: 174 cpuid = 0 KDB: stack backtrace: #0 0xffffffff805f4e0e at kdb_backtrace+0x5e #1 0xffffffff805c2d07 at panic+0x187 #2 0xffffffff80ee36f6 at space_map_remove+0x296 #3 0xffffffff80ee3d9b at space_map_load+0x1bb #4 0xffffffff80ed4c19 at metaslab_activate+0x89 #5 0xffffffff80ed586e at metaslab_alloc+0x6ae #6 0xffffffff80f00299 at zio_dva_allocate+0x69 #7 0xffffffff80efe287 at zio_execute+0x77 #8 0xffffffff80e9e303 at taskq_run_safe+0x13 #9 0xffffffff805ffeb5 at taskqueue_run_locked+0x85 #10 0xffffffff8060004e at taskqueue_thread_loop+0x4e #11 0xffffffff805994f8 at fork_exit+0x118 #12 0xffffffff8089547e at fork_trampoline+0xe --------------- Reinstalling the OS and importing the ZFS pool again did not change anything. smartctl -a says that everything is OK. Can anybody tell me what this is?
From owner-freebsd-fs@FreeBSD.ORG Wed Jun 15 12:47:30 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 54D371065673; Wed, 15 Jun 2011 12:47:30 +0000 (UTC) (envelope-from gibbs@FreeBSD.org) Received: from aslan.scsiguy.com (mail.scsiguy.com [70.89.174.89]) by mx1.freebsd.org (Postfix) with ESMTP id 2CA2F8FC0C; Wed, 15 Jun 2011 12:47:29 +0000 (UTC) Received: from Justins-MacBook-Pro.local (macbook.scsiguy.com [192.168.0.99]) (authenticated bits=0) by aslan.scsiguy.com (8.14.4/8.14.4) with ESMTP id p5FCmpuT020189 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 15 Jun 2011 06:48:51 -0600 (MDT) (envelope-from gibbs@FreeBSD.org) Message-ID: <4DF8A9E1.6050501@FreeBSD.org> Date: Wed, 15 Jun 2011 06:47:29 -0600 From: "Justin T. Gibbs" Organization: The FreeBSD Project User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: Pawel Jakub Dawidek References: <201106141710.p5EHAXYS044119@svn.freebsd.org> <20110615094202.GB1975@garage.freebsd.pl> In-Reply-To: <20110615094202.GB1975@garage.freebsd.pl> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (aslan.scsiguy.com [70.89.174.89]); Wed, 15 Jun 2011 06:48:51 -0600 (MDT) Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: fs@FreeBSD.org Subject: Re: svn commit: r223089 - in head: sys/cam/ata sys/cam/scsi sys/geom sys/sys usr.sbin/diskinfo X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: gibbs@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Jun 2011 12:47:30 -0000 On 6/15/11 3:42 AM, Pawel Jakub Dawidek wrote: > On Tue, Jun 14, 2011 at 05:10:33PM +0000, Justin T. 
Gibbs wrote:
> > Author: gibbs
> > Date: Tue Jun 14 17:10:32 2011
> > New Revision: 223089
> > URL: http://svn.freebsd.org/changeset/base/223089
> >
> > Log:
> [...]
> > sys/sys/geom/geom.h:
> > sys/geom/geom_event.c:
> > - Provide the g_attr_changed() function that providers
> > can use to advertise attribute changes.
> > - Perform delivery of attribute change notifications
> > from a thread context via the standard GEOM event
> > mechanism.
>
> Would be nice to discuss it before the commit (or did I miss the
> discussion?).

I thought we discussed this at BSDCAN, but perhaps it was when I was talking with mav? Sorry about that.

> I was working on something that could be easily merged with
> your changes. I had a patch to implement provider's properties change
> notification to consumers and devd:
>
> http://people.freebsd.org/~pjd/patches/geom_property_change.patch
>
> Currently it implements only mediasize changes, so the upper layers can
> act accordingly. The patch also implements ZFS bits to detect vdev size
> changes and eventually autoexpand the pool.
>
> Could you look at the patch and see how we could add property changes to
> your API?

You say "property," I say "attribute". I used GEOM's string names for properties, you used flags. Other than that, I don't see much difference in the implementations.

As far as supporting size changes in the API I committed, the intention is for the API to support notification of arbitrary provider changes, but changes that don't necessarily require consumers to go through a costly "re-taste process". If we define a string constant for the size property, emit that via "g_attr_changed()" in all the places you currently emit g_property_change(), and use strcmp instead of bit tests in your handlers, it should work.

If you'd like, I can merge your changes into the API. However, I leave for a two week vacation on Friday, so it will have to be after that.
One other issue I need to resolve is that notifying ZFS of things like a physical path change opens us up to lock order reversals. What I'd really like ZFS to do is, during any vdev config generation call, just call a vdev method to refresh attributes. We could then just post a SPA async event from the attr_changed handler and do all of the attribute fetching from the "refresh" method knowing that locks are acquired in the correct order (g_topology last). -- Justin From owner-freebsd-fs@FreeBSD.ORG Wed Jun 15 13:00:04 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 432321065675; Wed, 15 Jun 2011 13:00:04 +0000 (UTC) (envelope-from gibbs@scsiguy.com) Received: from aslan.scsiguy.com (ns1.scsiguy.com [70.89.174.89]) by mx1.freebsd.org (Postfix) with ESMTP id 157248FC0A; Wed, 15 Jun 2011 13:00:03 +0000 (UTC) Received: from Justins-MacBook-Pro.local (macbook.scsiguy.com [192.168.0.99]) (authenticated bits=0) by aslan.scsiguy.com (8.14.4/8.14.4) with ESMTP id p5FD1OPG020269 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 15 Jun 2011 07:01:25 -0600 (MDT) (envelope-from gibbs@scsiguy.com) Message-ID: <4DF8ACD3.1070202@scsiguy.com> Date: Wed, 15 Jun 2011 07:00:03 -0600 From: "Justin T. 
Gibbs" User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: Pawel Jakub Dawidek References: <4DF7CDD0.8040108@scsiguy.com> <20110615120524.GI1975@garage.freebsd.pl> In-Reply-To: <20110615120524.GI1975@garage.freebsd.pl> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (aslan.scsiguy.com [70.89.174.89]); Wed, 15 Jun 2011 07:01:25 -0600 (MDT) Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: fs@FreeBSD.org Subject: Re: [CFR][ZFS] Add "zpool labelclear" command. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Jun 2011 13:00:04 -0000

On 6/15/11 6:05 AM, Pawel Jakub Dawidek wrote:
> On Tue, Jun 14, 2011 at 03:08:32PM -0600, Justin T. Gibbs wrote:
> > ZFS rightfully has a lot of safety belts in place to ward off unintended
> > data loss. But in some scenarios, the safety belts are so restrictive,
> > the only way to proceed is to wipe the label information off of a drive.
> >
> > Here's an example:
> >
> > Pull a drive that is active in a pool on one system and stick it into
> > another system. ZFS will correctly reject this drive as a member of
> > a new pool or as the argument of a replace command. But if you really
> > want to use that drive, how do you clear its "potentially active"
> > status? If the pool were imported, you could destroy it, but ZFS won't
> > allow you to import a pool unless there are sufficient members for it
> > to serve I/O (I know about the undocumented -F option for import,
> > but users aren't going to find that). You can use dd to wipe the label
> > data off, but where exactly does ZFS keep its four copies of the label?
>
> In most of the cases like that you can use -f switch, eg.
you can create > pool or replace vdev using one that is active when you use that switch. > I'm sure you are aware of this, so I guess it doesn't always work for you? Most of my testing has been on v15, so perhaps the situation is better on v28? On v15, "replace -f" certainly didn't work. Even if "replace -f" does work in v28 (or is made to work), what would be the correct way to just delete the label off of such a drive in the current zpool command set? At Spectra Logic, we've found it very useful in our drive fault testing to be able to easily restore a drive to an unlabeled state in order to verify that ZFSD does the right thing with both labeled and unlabeled drives. If our use case is considered rare, I don't need to push this change back into FreeBSD. However, a quick search indicates that at least some Solaris users have desired a similar command: http://opensolaris.org/jive/thread.jspa?messageID=462337 -- Justin From owner-freebsd-fs@FreeBSD.ORG Wed Jun 15 13:07:16 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DA962106564A; Wed, 15 Jun 2011 13:07:16 +0000 (UTC) (envelope-from gibbs@scsiguy.com) Received: from aslan.scsiguy.com (mail.scsiguy.com [70.89.174.89]) by mx1.freebsd.org (Postfix) with ESMTP id AE4FC8FC14; Wed, 15 Jun 2011 13:07:16 +0000 (UTC) Received: from Justins-MacBook-Pro.local (macbook.scsiguy.com [192.168.0.99]) (authenticated bits=0) by aslan.scsiguy.com (8.14.4/8.14.4) with ESMTP id p5FD8b0q020298 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 15 Jun 2011 07:08:38 -0600 (MDT) (envelope-from gibbs@scsiguy.com) Message-ID: <4DF8AE84.9000400@scsiguy.com> Date: Wed, 15 Jun 2011 07:07:16 -0600 From: "Justin T. 
Gibbs" User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: Pawel Jakub Dawidek References: <4DF7E472.9030601@scsiguy.com> <20110615121529.GJ1975@garage.freebsd.pl> In-Reply-To: <20110615121529.GJ1975@garage.freebsd.pl> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (aslan.scsiguy.com [70.89.174.89]); Wed, 15 Jun 2011 07:08:38 -0600 (MDT) Cc: fs@FreeBSD.org Subject: Re: [CFR][ZFS] Allow async event processing with a R/O root FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Jun 2011 13:07:16 -0000 On 6/15/11 6:15 AM, Pawel Jakub Dawidek wrote: > On Tue, Jun 14, 2011 at 04:45:06PM -0600, Justin T. Gibbs wrote: >> Allow ZFS asynchronous event handling to proceed even if the >> root file system is mounted read-only. This restriction appears >> to have been put in place to avoid errors with updating the >> configuration cache file. However: >> >> o The majority of asynchronous event handling does not involve >> configuration cache file updates. >> o The configuration cache file need not be on the root file system, >> so the check was not complete. > Why is that? I keep cache file on the ZFS root file system. > It is not used by the boot loader, but is loaded by the loader: > > # tail -7 /boot/defaults/loader.conf > You can specify an alternate cache file location that doesn't have to be on the root file system. By "the check was not complete", I mean the "root is R/W" check doesn't ensure that the cache file is located on a file system mounted R/W. 
-- Justin From owner-freebsd-fs@FreeBSD.ORG Wed Jun 15 13:27:50 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 59A2C1065673 for ; Wed, 15 Jun 2011 13:27:50 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id 9522B8FC15 for ; Wed, 15 Jun 2011 13:27:49 +0000 (UTC) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 9F37345CA6; Wed, 15 Jun 2011 15:27:47 +0200 (CEST) Received: from localhost (58.wheelsystems.com [83.12.187.58]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 09D5545683; Wed, 15 Jun 2011 15:27:40 +0200 (CEST) Date: Wed, 15 Jun 2011 15:27:36 +0200 From: Pawel Jakub Dawidek To: "Justin T. Gibbs" Message-ID: <20110615132736.GL1975@garage.freebsd.pl> References: <4DF7E472.9030601@scsiguy.com> <20110615121529.GJ1975@garage.freebsd.pl> <4DF8AE84.9000400@scsiguy.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="L/Qt9NZ8t00Dhfad" Content-Disposition: inline In-Reply-To: <4DF8AE84.9000400@scsiguy.com> X-OS: FreeBSD 9.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-3.9 required=4.5 tests=ALL_TRUSTED,BAYES_00, RCVD_IN_SORBS_DUL autolearn=ham version=3.0.4 Cc: fs@FreeBSD.org Subject: Re: [CFR][ZFS] Allow async event processing with a R/O root FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Jun 2011 13:27:50 -0000 --L/Qt9NZ8t00Dhfad Content-Type: text/plain; charset=us-ascii 
Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Wed, Jun 15, 2011 at 07:07:16AM -0600, Justin T. Gibbs wrote:
> On 6/15/11 6:15 AM, Pawel Jakub Dawidek wrote:
> >On Tue, Jun 14, 2011 at 04:45:06PM -0600, Justin T. Gibbs wrote:
> >>Allow ZFS asynchronous event handling to proceed even if the
> >>root file system is mounted read-only. This restriction appears
> >>to have been put in place to avoid errors with updating the
> >>configuration cache file. However:
> >>
> >> o The majority of asynchronous event handling does not involve
> >> configuration cache file updates.
> >> o The configuration cache file need not be on the root file system,
> >> so the check was not complete.
> >Why is that? I keep cache file on the ZFS root file system.
> >It is not used by the boot loader, but is loaded by the loader:
> >
> > # tail -7 /boot/defaults/loader.conf
> >
> You can specify an alternate cache file location that doesn't have to be
> on the root file system. By "the check was not complete", I mean the
> "root is R/W" check doesn't ensure that the cache file is located on a
> file system mounted R/W.

Ah, sorry for my lame English. I read 'need not be on the root file system' as your recommendation against putting cache file there. All is clear now.

--
Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am!
http://yomoli.com --L/Qt9NZ8t00Dhfad Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEYEARECAAYFAk34s0gACgkQForvXbEpPzSlAACg4xnfHr8/TOA68ohZqtJp8EVr /LEAoI4QgiMLUS0WP7WTnc+tI2LBrpvd =YhvI -----END PGP SIGNATURE----- --L/Qt9NZ8t00Dhfad-- From owner-freebsd-fs@FreeBSD.ORG Wed Jun 15 13:32:39 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5409A106566C for ; Wed, 15 Jun 2011 13:32:39 +0000 (UTC) (envelope-from pvz@itassistans.se) Received: from zcs1.itassistans.net (zcs1.itassistans.net [212.112.191.37]) by mx1.freebsd.org (Postfix) with ESMTP id 0A0088FC0C for ; Wed, 15 Jun 2011 13:32:39 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by zcs1.itassistans.net (Postfix) with ESMTP id DD2E6C026A; Wed, 15 Jun 2011 15:13:35 +0200 (CEST) X-Virus-Scanned: amavisd-new at zcs1.itassistans.net Received: from zcs1.itassistans.net ([127.0.0.1]) by localhost (zcs1.itassistans.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id psUul7M5Tvi9; Wed, 15 Jun 2011 15:13:35 +0200 (CEST) Received: from [192.168.1.239] (c213-89-160-61.bredband.comhem.se [213.89.160.61]) by zcs1.itassistans.net (Postfix) with ESMTPSA id 7386DC0267; Wed, 15 Jun 2011 15:13:35 +0200 (CEST) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Per von Zweigbergk In-Reply-To: <4DF8ACD3.1070202@scsiguy.com> Date: Wed, 15 Jun 2011 15:13:23 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: References: <4DF7CDD0.8040108@scsiguy.com> <20110615120524.GI1975@garage.freebsd.pl> <4DF8ACD3.1070202@scsiguy.com> To: "Justin T. Gibbs" X-Mailer: Apple Mail (2.1084) Cc: Pawel Jakub Dawidek , fs@FreeBSD.org Subject: Re: [CFR][ZFS] Add "zpool labelclear" command. 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Jun 2011 13:32:39 -0000

We here at IT-assistans have found the ability to clear ZFS labels very useful for testing. We have an internal utility written in-house to clear ZFS labels. It's rather dumb and dangerous, which is why I have chosen not to publish it. (Oh, the irony of being held accountable for a utility designed to cause data loss causing data loss.) Having a labelclear command with some extra molly guards on it in the normal userspace ZFS commands would be useful to us at least for testing.

From owner-freebsd-fs@FreeBSD.ORG Wed Jun 15 14:09:11 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B1BEE106566B; Wed, 15 Jun 2011 14:09:11 +0000 (UTC) (envelope-from gibbs@FreeBSD.org) Received: from aslan.scsiguy.com (ns1.scsiguy.com [70.89.174.89]) by mx1.freebsd.org (Postfix) with ESMTP id 897028FC14; Wed, 15 Jun 2011 14:09:11 +0000 (UTC) Received: from Justins-MacBook-Pro.local (207-225-98-3.dia.static.qwest.net [207.225.98.3]) (authenticated bits=0) by aslan.scsiguy.com (8.14.4/8.14.4) with ESMTP id p5FEAWTY020636 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 15 Jun 2011 08:10:32 -0600 (MDT) (envelope-from gibbs@FreeBSD.org) Message-ID: <4DF8BD01.5040206@FreeBSD.org> Date: Wed, 15 Jun 2011 08:09:05 -0600 From: "Justin T.
Gibbs" Organization: The FreeBSD Project User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: Pawel Jakub Dawidek References: <201106141710.p5EHAXYS044119@svn.freebsd.org> <20110615094202.GB1975@garage.freebsd.pl> <4DF8A934.8070500@FreeBSD.org> <20110615132458.GK1975@garage.freebsd.pl> In-Reply-To: <20110615132458.GK1975@garage.freebsd.pl> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (aslan.scsiguy.com [70.89.174.89]); Wed, 15 Jun 2011 08:10:32 -0600 (MDT) Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: fs@FreeBSD.org Subject: Re: svn commit: r223089 - in head: sys/cam/ata sys/cam/scsi sys/geom sys/sys usr.sbin/diskinfo X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: gibbs@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Jun 2011 14:09:11 -0000 On 6/15/11 7:24 AM, Pawel Jakub Dawidek wrote: > On Wed, Jun 15, 2011 at 06:44:36AM -0600, Justin T. Gibbs wrote: > >> http://people.freebsd.org/~pjd/patches/geom_property_change.patch > >> > >> Currently it implements only mediasize changes, so the upper layers can > >> act accordingly. The patch also implements ZFS bits to detect vdev size > >> changes and eventually autoexpand the pool. > >> > >> Could you look at the patch and see how we could add property changes to > >> your API? > > > > You say "property," I say "attribute". I used GEOM's string names for > > properties, you used flags. Other than that, I don't see much difference > > in the implementations. 
> >
> > As far as supporting size changes in the API I committed, the intention
> > is for the API to support notification of arbitrary provider changes,
> > but changes that don't necessarily require consumers to go through a
> > costly "re-taste process". If we define a string constant for the size
> > property, emit that via "g_attr_changed()" in all the places you currently
> > emit g_property_change(), and use strcmp instead of bit tests in your
> > handlers, it should work.
>
> Well, you notify about attributes passed around with BIO_GETATTR, my
> change is about changes in provider properties like mediasize and maybe
> name in the future. This is different namespace. But I agree that
> reserving and using "mediasize" or "GEOM::mediasize" should be fine.

Ah. I understand your distinction now. I think it would be best to just have one notification scheme regardless of how the properties/attributes are accessed (directly in a struct or via a getattr call). I even considered folding the spoiling stuff into this but then thought that might be considered too radical a change.

As for a size change, at what point is it safe to change the size field in the provider? I know that the ZFS vdevs cache the size data, so the provider bumping its size field shouldn't be a problem, but what about other GEOM consumers? Will the GEOM RAID transforms suddenly and unintentionally start putting their label information in a different location? Similar situations may apply to other properties/attributes.
-- Justin From owner-freebsd-fs@FreeBSD.ORG Wed Jun 15 15:24:27 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7B8F61065673; Wed, 15 Jun 2011 15:24:27 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id 163348FC15; Wed, 15 Jun 2011 15:24:26 +0000 (UTC) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 2143545C89; Wed, 15 Jun 2011 17:24:25 +0200 (CEST) Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 1A69145683; Wed, 15 Jun 2011 17:24:19 +0200 (CEST) Date: Wed, 15 Jun 2011 17:24:14 +0200 From: Pawel Jakub Dawidek To: "Justin T. Gibbs" Message-ID: <20110615152414.GA2068@garage.freebsd.pl> References: <201106141710.p5EHAXYS044119@svn.freebsd.org> <20110615094202.GB1975@garage.freebsd.pl> <4DF8A934.8070500@FreeBSD.org> <20110615132458.GK1975@garage.freebsd.pl> <4DF8BD01.5040206@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="J/dobhs11T7y2rNN" Content-Disposition: inline In-Reply-To: <4DF8BD01.5040206@FreeBSD.org> X-OS: FreeBSD 9.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-0.6 required=4.5 tests=BAYES_00,RCVD_IN_SORBS_DUL autolearn=no version=3.0.4 Cc: fs@FreeBSD.org Subject: Re: svn commit: r223089 - in head: sys/cam/ata sys/cam/scsi sys/geom sys/sys usr.sbin/diskinfo X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Wed, 15 Jun 2011 15:24:27 -0000 --J/dobhs11T7y2rNN Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Wed, Jun 15, 2011 at 08:09:05AM -0600, Justin T. Gibbs wrote:
> As for a size change, at what point is it safe to change the size field
> in the provider? I know that the ZFS vdevs cache the size data, so the
> provider bumping its size field shouldn't be a problem, but what about other
> GEOM consumers? Will the GEOM RAID transforms suddenly and unintentionally
> start putting their label information in a different location? Similar
> situations may apply to other properties/attributes.

I thought about that - I was wondering if we should allow a given consumer to veto the change, but it will be too complex for various reasons. For example, if you change the disk size for your virtual machine it would be hard to report the error back. Another problem is that when you have more than one consumer and you start informing them about the size change, what would you do if the last one returns an error? Would you inform the previous consumers that the provider shrank? It might be too late. Maybe the default behaviour (unless you override it) should be to disconnect from such a provider (e.g. by sending the orphan event to consumers that don't handle mediasize change)?

Currently, if a GEOM class is offline and you resize a partition that the class "owns" and then bring the class online, it won't be able to find its metadata or will do something strange. We consider it an administrator mistake. Doing an online resize is a bit different, but maybe not that much different, and we should consider it the same?

--
Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am!
http://yomoli.com --J/dobhs11T7y2rNN Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEYEARECAAYFAk34zp0ACgkQForvXbEpPzTOCwCgovlkOfeW8xcjeaF2U5AlmS5z fD4An2HRmfD/5PIPhFt8Jshq7Z/qR1ty =vaLO -----END PGP SIGNATURE----- --J/dobhs11T7y2rNN-- From owner-freebsd-fs@FreeBSD.ORG Thu Jun 16 14:52:20 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 05D531065672 for ; Thu, 16 Jun 2011 14:52:20 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id BEA0A8FC12 for ; Thu, 16 Jun 2011 14:52:19 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap0EAD0Y+k2DaFvO/2dsb2JhbABShEmjDqoNjkiQeYErg3KBCgSRWZAS X-IronPort-AV: E=Sophos;i="4.65,375,1304308800"; d="scan'208";a="124209774" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 16 Jun 2011 10:52:18 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id F2E42B3F09 for ; Thu, 16 Jun 2011 10:52:18 -0400 (EDT) Date: Thu, 16 Jun 2011 10:52:18 -0400 (EDT) From: Rick Macklem To: FreeBSD FS Message-ID: <2030796212.662722.1308235938983.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Subject: RFC: don't allow any access to unexported mounts for NFSv4 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Jun 2011 14:52:20 -0000 Hi, I'm doing NFS 
interop testing this week and found out that the Linux NFSv4 client needs to do the Access Op during mounting. This is a one line patch, but I realized that if this exports semantic should change, now seems to be the right time.

Background:
So that clients could do NFSv4 mounts using the same paths as would be used for NFSv3, I put a "hack" in the NFSv4 server that allowed a minimal set of non-modifying operations to be done on file systems that aren't exported, so that they could be traversed during a mount. Subsequent to this, I found out that this doesn't work for ZFS.

So, currently...
- The semantics for UFS/FFS are not the same as for ZFS.
- Allowing this minimal set of operations introduces the potential for a security risk because...
- This semantic is confusing to users. (Partially because the exports.5 man page doesn't explain it well/at all.)

As such, I think it might be better to remove the "hack" and simply require that all file systems from the NFSv4 root down be exported (which is what is needed for ZFS now, afaik).

The downside of doing this is that the mount paths for NFSv4 are different than for NFSv3 unless all file systems on the server are at least exported read-only. (A workaround is to build a small file system that mimics the directory tree above the exported file systems, with leaves that point to the mount points, and export that with its root being the NFSv4 root. At least I think this works, although I haven't tested it with symlinks for the leaves to the mount points.)

So, what do you think w.r.t. removing this "hack" for FreeBSD9?

Thanks in advance for any comments, rick
ps: Sorry this is so long, but I thought I'd better try and explain it.
From owner-freebsd-fs@FreeBSD.ORG Thu Jun 16 22:11:52 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6A07A106566C for ; Thu, 16 Jun 2011 22:11:52 +0000 (UTC) (envelope-from ady@ady.ro) Received: from mail-ew0-f54.google.com (mail-ew0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 0BCF38FC0A for ; Thu, 16 Jun 2011 22:11:51 +0000 (UTC) Received: by ewy1 with SMTP id 1so1028907ewy.13 for ; Thu, 16 Jun 2011 15:11:50 -0700 (PDT) Received: by 10.14.127.68 with SMTP id c44mr618087eei.103.1308260892681; Thu, 16 Jun 2011 14:48:12 -0700 (PDT) MIME-Version: 1.0 Sender: ady@ady.ro Received: by 10.14.27.205 with HTTP; Thu, 16 Jun 2011 14:47:48 -0700 (PDT) From: Adrian Penisoara Date: Thu, 16 Jun 2011 23:47:48 +0200 X-Google-Sender-Auth: shoRHM4eIOD7aZII6oGRM765p78 Message-ID: To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Cc: Pawel Jakub Dawidek Subject: ZFS sharenfs mangling NFS options X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Jun 2011 22:11:52 -0000 Hi, I was nicely surprised to see that the "sharenfs" option for a ZFS dataset does the expected trick and automatically shares on NFS the dataset via /etc/zfs/exports. However, I had some not so pleasant experiences customizing the NFS sharing parameters -- e.g. 
when the contents of the "sharenfs" property was automatically translated into [/etc/zfs/]exports entries:
 * any hostname that contains a "-" (dash) gets malformed by being split at the dash character
 * any NFS parameter that is prefixed with a "-" (dash) and is not the first in the list gets transformed into a hostname entry

After some hunting in the CDDL sources I have been able to pinpoint the exact (library) code doing the translation as function translate_opts() in src/cddl/compat/opensolaris/misc/fsshare.c :

        while ((o = strsep(&s, "-, ")) != NULL) {
                if (o[0] == '\0')
                        continue;
                for (i = 0; known_opts[i] != NULL; i++) {
                        len = strlen(known_opts[i]);
                        if (strncmp(known_opts[i], o, len) == 0 &&
                            (o[len] == '\0' || o[len] == '=')) {
                                strlcat(newopts, "-", sizeof(newopts));
                                break;
                        }
                }
                strlcat(newopts, o, sizeof(newopts));
                strlcat(newopts, " ", sizeof(newopts));
        }

If I'm able to read C correctly, then it looks like the code above fails to take into consideration the case of hostnames containing dashes and the case of options prefixed with dashes (although it is advertised as valid format in the comments).

On the other hand, I'm not too convinced that the contents of the sharenfs property should be translated at all (when different from values "on" and "off") -- I can't seem to find a good documentation reference for [Open]Solaris, but it looks like they are expecting share(1M) style options. I would assume then that FreeBSD would expect exports(5) style options, so why was this translation step needed anyway? Take into account that known_opts[] is a statically assigned vector of strings identifying valid exports(5) keywords -- thus it needs to be kept in sync with mountd code, which was already the case recently with new NFSv4 option(s).

If any translation should be occurring at all then I would expect a single common syntax for sharenfs among all OSes (e.g.
share(1M) style options) and the translation to be done over to the OS'es native exports(5) format. Thank you for your time, Adrian Penisoara EnterpriseBSD.com From owner-freebsd-fs@FreeBSD.ORG Thu Jun 16 22:34:40 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 66393106564A; Thu, 16 Jun 2011 22:34:40 +0000 (UTC) (envelope-from ady@ady.ro) Received: from mail-ey0-f182.google.com (mail-ey0-f182.google.com [209.85.215.182]) by mx1.freebsd.org (Postfix) with ESMTP id 6E64F8FC08; Thu, 16 Jun 2011 22:34:39 +0000 (UTC) Received: by eyg7 with SMTP id 7so1037228eyg.13 for ; Thu, 16 Jun 2011 15:34:38 -0700 (PDT) Received: by 10.14.186.14 with SMTP id v14mr617135eem.7.1308263678216; Thu, 16 Jun 2011 15:34:38 -0700 (PDT) MIME-Version: 1.0 Sender: ady@ady.ro Received: by 10.14.27.205 with HTTP; Thu, 16 Jun 2011 15:34:18 -0700 (PDT) In-Reply-To: References: From: Adrian Penisoara Date: Fri, 17 Jun 2011 00:34:18 +0200 X-Google-Sender-Auth: imWZVtv09jYQjEcccKCy3v0hk34 Message-ID: To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: Pawel Jakub Dawidek Subject: Re: ZFS sharenfs mangling NFS options X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Jun 2011 22:34:40 -0000 On Thu, Jun 16, 2011 at 11:47 PM, Adrian Penisoara wro= te: > Hi, > > =A0I was nicely surprised to see that the "sharenfs" option for a ZFS > dataset does the expected trick and automatically shares on NFS the > dataset via /etc/zfs/exports. However, I had some not so pleasant > experiences customizing the NFS sharing parameters -- e.g. 
when the
> contents of the "sharenfs" property was automatically translated into
> [/etc/zfs/]exports entries:
>   * whatever hostname contains a "-" (dash) it gets malformed by
> being split over the dash character
>   * whatever NFS parameter is prefixed with a "-" (dash) and it's not
> the first in the list it gets transformed into a hostname entry
>
>  After some hunting into the CDDL sources I have been able to
> pinpoint the exact (library) code doing the translation as function
> translate_opts() in src/cddl/compat/opensolaris/misc/fsshare.c :
>
>        while ((o = strsep(&s, "-, ")) != NULL) {
>                if (o[0] == '\0')
>                        continue;
>                for (i = 0; known_opts[i] != NULL; i++) {
>                        len = strlen(known_opts[i]);
>                        if (strncmp(known_opts[i], o, len) == 0 &&
>                            (o[len] == '\0' || o[len] == '=')) {
>                                strlcat(newopts, "-", sizeof(newopts));
>                                break;
>                        }
>                }
>                strlcat(newopts, o, sizeof(newopts));
>                strlcat(newopts, " ", sizeof(newopts));
>        }
>
>  If I'm able to read C correctly, then it looks like the code above
> fails to take into consideration the case of hostnames containing
> dashes and the case of options prefixed with dashes (although it is
> advertised as valid format in the comments).

To be more clear, here is a sample sharenfs property entry:

"-maproot=0 -alldirs client-nfs1"

This gets transformed to something like "-maproot=0 alldirs client nfs1" ...
Regards, Adrian Penisoara EntepriseBSD.com From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 02:19:25 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D178F106564A for ; Fri, 17 Jun 2011 02:19:25 +0000 (UTC) (envelope-from thejll@gmail.com) Received: from mail-qy0-f175.google.com (mail-qy0-f175.google.com [209.85.216.175]) by mx1.freebsd.org (Postfix) with ESMTP id 894808FC14 for ; Fri, 17 Jun 2011 02:19:25 +0000 (UTC) Received: by qyk30 with SMTP id 30so193111qyk.13 for ; Thu, 16 Jun 2011 19:19:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=jlauser.net; s=google; h=domainkey-signature:message-id:date:from:user-agent:mime-version:to :subject:content-type:content-transfer-encoding; bh=1FbHiehdyoAjm+BHW+DBlsnFSo4t0zHTBRyJlu8PatQ=; b=iIkpKZY97jJ6cSspBDu+hZntyieHgy4NbXDi9T/d4OkFHYr/Pq3of1Foq1FMP/SJO/ w7B3NJdLifUshx57HTakjYC/RI4lZ7uuUFQrKkoXwuvx70neQkiaPLOTA/YMqNN/WgUZ P2ruBvt/MqBxFmGhg8H6k3UEGmGh5tBNhZvnQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=jlauser.net; s=google; h=message-id:date:from:user-agent:mime-version:to:subject :content-type:content-transfer-encoding; b=YR7InTRg4/zK6HWKf0qAzZX4cQQA1IWEeKFlcOy8p6vP098L8yOqZQp0Kf83igqHD8 xl4sdEFa5tPuU5VFybruPQObSV09BYv5cZoFYTE2hfoU8qKMR+9U2RdQM7AXFoBQ75Lb CqLF6RrWQZb7aqvrRLlmj3Vlrs+xdb85XCFVQ= Received: by 10.224.191.10 with SMTP id dk10mr1109178qab.170.1308275326721; Thu, 16 Jun 2011 18:48:46 -0700 (PDT) Received: from [10.0.10.253] (cpe-74-76-129-142.nycap.res.rr.com [74.76.129.142]) by mx.google.com with ESMTPS id j18sm1555605qck.27.2011.06.16.18.48.45 (version=SSLv3 cipher=OTHER); Thu, 16 Jun 2011 18:48:45 -0700 (PDT) Message-ID: <4DFAB27B.7030402@jlauser.net> Date: Thu, 16 Jun 2011 21:48:43 -0400 From: "James L. 
Lauser" User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110516 Thunderbird/3.1.10 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Another zfs sharenfs issue X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 02:19:26 -0000

Adrian's email reminded me that I'm having a persistent issue with
'zfs sharenfs' as well, which I don't think is related.

On my server, I have two datasets exported, pool/mythtv and pool/home/james:

[Sledge:~] james$ zfs get sharenfs | grep local
pool/home/james  sharenfs  -alldirs -mapall james quadraplex  local
pool/mythtv      sharenfs  -alldirs -mapall mythtv herman     local

Whenever I have to reboot sledge, I lose the ability to mount my
home directory export from quadraplex. When I attempt the mount, I
get a 'permission denied' error from the server. Running "sudo zfs
set sharenfs='-alldirs -mapall james quadraplex' pool/home/james"
(setting it to what it's already set to) fixes the problem, and it
remains fixed until I reboot the machine again. This ONLY affects
the share to quadraplex. The mythtv share to herman works without
issue.

The server is 8-STABLE (built a few days ago, right after ZFSv28 was
MFC'd) on amd64. The machine is currently running ZFSv28, but this
problem has existed since well before that upgrade. I am running
the "experimental" NFS server, but not using NFSv4. Switching to
the old server does not seem to affect anything.

Both clients are Ubuntu 11.04 on x86_64.

The only thought I had was the fact that I'm trying to export
/home/james, but /home itself is not exported. Exporting it as
well, however, did not fix the problem.
Other useful info: [Sledge:~] james$ uname -a FreeBSD Sledge.home.jlauser.net 8.2-STABLE FreeBSD 8.2-STABLE #13: Wed Jun 8 00:08:53 EDT 2011 root@Sledge.home.jlauser.net:/usr/obj/usr/src/sys/GENERIC amd64 [Sledge:~] james$ grep nfs /etc/rc.conf nfs_server_enable="YES" nfsv4_server_enable="YES" [Quadraplex:~] james$ grep sledge /etc/fstab sledge:/home/james /mnt/sledge nfs rw,tcp,nfsvers=3,async 0 0 Any insight would be appreciated, though seeing as how I only normally reboot the server about 4 times per year, this isn't exactly a very high priority issue. -- -- James L. Lauser james@jlauser.net http://jlauser.net/ From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 03:45:51 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8F12F1065670 for ; Fri, 17 Jun 2011 03:45:51 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta03.emeryville.ca.mail.comcast.net (qmta03.emeryville.ca.mail.comcast.net [76.96.30.32]) by mx1.freebsd.org (Postfix) with ESMTP id 3FF658FC18 for ; Fri, 17 Jun 2011 03:45:50 +0000 (UTC) Received: from omta22.emeryville.ca.mail.comcast.net ([76.96.30.89]) by qmta03.emeryville.ca.mail.comcast.net with comcast id wriN1g0051vN32cA3rlpd3; Fri, 17 Jun 2011 03:45:49 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta22.emeryville.ca.mail.comcast.net with comcast id wrle1g01z1t3BNj8irlfTj; Fri, 17 Jun 2011 03:45:41 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 63919102C19; Thu, 16 Jun 2011 20:45:47 -0700 (PDT) Date: Thu, 16 Jun 2011 20:45:47 -0700 From: Jeremy Chadwick To: "James L. 
Lauser" Message-ID: <20110617034547.GA97087@icarus.home.lan> References: <4DFAB27B.7030402@jlauser.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4DFAB27B.7030402@jlauser.net> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: Another zfs sharenfs issue X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 03:45:51 -0000 On Thu, Jun 16, 2011 at 09:48:43PM -0400, James L. Lauser wrote: > Adrian's email reminded me that I'm having a persistent issues with > 'zfs sharenfs' as well, which I don't think is related. > > On my server, I have two datasets exported, pool/mythtv and pool/home/james: > > [Sledge:~] james$ zfs get sharenfs | grep local > pool/home/james sharenfs -alldirs -mapall james > quadraplex local > pool/mythtv sharenfs -alldirs -mapall mythtv > herman local > > Whenever I have to reboot sledge, I lose the ability to mount my > home directory export from quadraplex. When I attempt the mount, I > get a 'permission denied' error from the server. Running "sudo zfs > set sharenfs='-alldirs -mapall james quadraplex' pool/home/james" > (setting it to what it's already set to) fixes the problem, and it > remains fixed until I reboot the machine again. This ONLY affects > the share to quadraplex. The mythtv share to herman works without > issue. > > The server is 8-STABLE (built a few days ago, right after ZFSv28 was > MFC'd) on amd64. The machine is currently running ZFSv28, but this > problem has existed since well before that upgrade. I am running > the "experimental" NFS server, but not using NFSv4. Switching to > the old server does not seem to affect anything. > > Both clients are Ubuntu 11.04 on x86_64. 
> > The only thought I had was the fact that I'm trying to export > /home/james, but /home itself is not exported. Exporting it as > well, however, did not fix the problem. > > Other useful info: > > [Sledge:~] james$ uname -a > FreeBSD Sledge.home.jlauser.net 8.2-STABLE FreeBSD 8.2-STABLE #13: > Wed Jun 8 00:08:53 EDT 2011 > root@Sledge.home.jlauser.net:/usr/obj/usr/src/sys/GENERIC amd64 > [Sledge:~] james$ grep nfs /etc/rc.conf > nfs_server_enable="YES" > nfsv4_server_enable="YES" > > [Quadraplex:~] james$ grep sledge /etc/fstab > sledge:/home/james /mnt/sledge nfs > rw,tcp,nfsvers=3,async 0 0 > > Any insight would be appreciated, though seeing as how I only > normally reboot the server about 4 times per year, this isn't > exactly a very high priority issue. On our FreeBSD (RELENG_8-based) NFS filer for our local network, we never bothered with the "sharenfs" attribute of the filesystems because, simply put, it didn't seem to work reliably. We use /etc/exports natively and everything Just Works(tm). We've had literally zero problems over the years with this method, and have rebooted the filer numerous times without any repercussions on the client side. Given that this is the 2nd "sharenfs is wonky" thread in the past few hours, I'm left wondering why people bother with it and don't just use /etc/exports. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 06:57:50 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 713281065670 for ; Fri, 17 Jun 2011 06:57:50 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 1A1A98FC14 for ; Fri, 17 Jun 2011 06:57:49 +0000 (UTC) Received: from outgoing.leidinger.net (p4FC43341.dip.t-dialin.net [79.196.51.65]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 2513C84400D; Fri, 17 Jun 2011 08:57:36 +0200 (CEST) Received: from webmail.leidinger.net (webmail.Leidinger.net [IPv6:fd73:10c7:2053:1::3:102]) by outgoing.leidinger.net (Postfix) with ESMTP id 5B86725F8; Fri, 17 Jun 2011 08:57:33 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=Leidinger.net; s=outgoing-alex; t=1308293853; bh=N+A+L/qiqwJQ8uwVd4ajAb0Kn+Axflfr8tMnOyF4tA0=; h=Message-ID:Date:From:To:Cc:Subject:References:In-Reply-To: MIME-Version:Content-Type:Content-Transfer-Encoding; b=kjBIoVPfyodtZHZca3XoT24CPJaRd9n1w0EOe2HtmVjhq9nnSNIvYHAjdzMUqjoG1 m8v2EkfPT+woZ7nFrd6fpxnjQKw5UQLgT4/Nu3bUszxHp/Okkr2XfdT0Tzj8V1hbgD W+jd1IhToEjQEXn1PfYVPZmRgHFo1zpA2uDkqByHA2WKMv+Ol9yJ/oVfpNMHqKPltZ 36+FtqMMOWz/Y8T1wLC7WfKeMP2g5yhc4E8jRiPv5HiItc2/CDN02cBFFGDWuDucLy rNj4jqoewalgqyN1uotMgo3Wf/2PJ3+YRMwjKKL67WwY/VDHonrV/F15N8CSEhLgrO x1qISf3ylg69A== Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.14.4/Submit) id p5H6vWKX012543; Fri, 17 Jun 2011 08:57:32 +0200 (CEST) (envelope-from Alexander@Leidinger.net) X-Authentication-Warning: webmail.leidinger.net: www set sender to Alexander@Leidinger.net using -f Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Fri, 17 Jun 2011 08:57:32 +0200 Message-ID: 
<20110617085732.34932j5fvh8v93vg@webmail.leidinger.net> Date: Fri, 17 Jun 2011 08:57:32 +0200 From: Alexander Leidinger To: Rick Macklem References: <2030796212.662722.1308235938983.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <2030796212.662722.1308235938983.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: 2513C84400D.A13EA X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-0.1, required 6, autolearn=disabled, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1308898657.16316@lvsPeaaNEb+yX4MbNLjlIQ X-EBL-Spam-Status: No Cc: FreeBSD FS Subject: Re: RFC: don't allow any access to unexported mounts for NFSv4 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 06:57:50 -0000 Quoting Rick Macklem (from Thu, 16 Jun 2011 10:52:18 -0400 (EDT)): > As such, I think it might be better to remove the "hack" and > simply require that all file systems from the NFSv4 root down > be exported (which is what is needed for ZFS now, afaik). This does not match the behavior on Solaris. There we have pool/not_exported_dataset/exported_dataset and a v4 mount works (I didn't see how to verify if a mounted FS is NFSv4, but I modified /etc/default/nfs to have NFS_CLIENT_VERSMIN=4). Bye, Alexander. -- Marxist Law of Distribution of Wealth: Shortages will be divided equally among the peasants. 
http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 07:00:31 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 16EEF1065670 for ; Fri, 17 Jun 2011 07:00:31 +0000 (UTC) (envelope-from marcus@odin.blazingdot.com) Received: from odin.blazingdot.com (odin.blazingdot.com [199.48.133.254]) by mx1.freebsd.org (Postfix) with ESMTP id F0AC78FC43 for ; Fri, 17 Jun 2011 07:00:30 +0000 (UTC) Received: by odin.blazingdot.com (Postfix, from userid 1001) id 641191140ED; Fri, 17 Jun 2011 06:45:22 +0000 (UTC) Date: Fri, 17 Jun 2011 06:45:22 +0000 From: Marcus Reid To: Per von Zweigbergk Message-ID: <20110617064522.GA91945@blazingdot.com> References: <9544F7B9-E286-4266-86E3-B4D1A667CBBD@itassistans.se> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <9544F7B9-E286-4266-86E3-B4D1A667CBBD@itassistans.se> X-Coffee-Level: nearly-fatal User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: Disk usage and ZFS deduplication X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 07:00:31 -0000 On Tue, Jun 14, 2011 at 09:19:32AM +0200, Per von Zweigbergk wrote: > I've been following the "Impossible compression ratio on ZFS" thread > with some interest, and it made me ask myself this: > > Let us say we have a hypothetical zfs filesystem with the equally > hypothetical files A and B. The filesystem has deduplication enabled. > Both files have an apparent file size of 100 MB, but 50 MB of that > data is common between the two files and thus can be deduplicated. > This would mean that total disk usage would be 150 MB. 
>
> If you use "du" to determine disk size for a deduplication, what would
> be the result? Which file would the common data be accounted to? Or
> would it be accounted to both files somehow, in part or in
> full?

Pretty simple test.

[root@luna /root]# zfs create -o mountpoint=/dedup -o dedup=on data/dedup
[root@luna /usr/data]# dd if=/dev/urandom of=set_a_50MiB bs=1m count=50
[root@luna /usr/data]# dd if=/dev/urandom of=set_b_50MiB bs=1m count=50
[root@luna /usr/data]# dd if=/dev/urandom of=set_c_50MiB bs=1m count=50
[root@luna /usr/data]# cat set_a_50MiB set_b_50MiB > file_1
[root@luna /usr/data]# cat set_a_50MiB set_c_50MiB > file_2
[root@luna /usr/data]# cp file_1 /dedup
[root@luna /usr/data]# cp file_2 /dedup
[root@luna /usr/data]# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
data   101G  32.8G  68.2G    32%  1.33x  ONLINE  -
[root@luna /usr/data]# cd /dedup
[root@luna /dedup]# du -sk *
102479  file_1
102479  file_2

Marcus

From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 09:10:28 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2C4101065672 for ; Fri, 17 Jun 2011 09:10:28 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-gx0-f182.google.com (mail-gx0-f182.google.com [209.85.161.182]) by mx1.freebsd.org (Postfix) with ESMTP id D8E158FC08 for ; Fri, 17 Jun 2011 09:10:27 +0000 (UTC) Received: by gxk28 with SMTP id 28so685539gxk.13 for ; Fri, 17 Jun 2011 02:10:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=bUwN3dXzqVA4Vn84Cf86D/qp1UwIpNpPKRjYBGbXvaQ=; b=XnebTuNLCNi/PavRN0snFfEvDZAl2Tg/EYK87ooFaJjEkLBp9thgB0Xx35ezD8DMaN 08klO5hxpT87Z6Kx7gRmJf9A4gP3ALHa/si3CnBePedR5mXTevGxjxySzNJ7Ure8CkhG dx7J+Fe+gMM2XQIYFopedMmHJBM8sICvKfz5U= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com;
s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=AIzapv9PbCK9Dkh1d6L6pWCMeIZvK4nHDDSp+cHwVQXSrABDh/p5HoyiOT+PdQKKHG EOjm77kobqUGK3awCYB0D4Qfmv2AfHz3tCvMNDirPj7oJ/7zinbRwOJGCWUMCIvNQMsU MoZQBmq7lq09Yl96yOmGrfJJMyrig2DRPvkGE= MIME-Version: 1.0 Received: by 10.236.152.9 with SMTP id c9mr3199586yhk.38.1308301826900; Fri, 17 Jun 2011 02:10:26 -0700 (PDT) Received: by 10.236.102.178 with HTTP; Fri, 17 Jun 2011 02:10:26 -0700 (PDT) In-Reply-To: References: <20110613094803.GA10290@icarus.home.lan> <4E09C82B45BA46019281930B2EB13AC1@multiplay.co.uk> <20110613193529.GA21103@DataIX.net> Date: Fri, 17 Jun 2011 10:10:26 +0100 Message-ID: From: krad To: Steven Hartland Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: Impossible compression ratio on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 09:10:28 -0000 On 13 June 2011 23:50, Steven Hartland wrote: > > ----- Original Message ----- From: "jhell" > To: "Steven Hartland" > Cc: "Jeremy Chadwick" ; > Sent: Monday, June 13, 2011 8:35 PM > > Subject: Re: Impossible compression ratio on ZFS > >> >> > Hi Steve, >> >> Knowing that there were patches out for v28 on 8.X can you confirm that >> in fact you are using v15 ZFS ? I would assume you are because of the >> release but I don't want to do that. >> > > Confirmed this is a pure 8.2 release build machine no additional patches > except for compiling libz without assembly optimisations as thats known > to cause crashes. 
> > Specifically the following as directed by Xin LI:- > cd /usr/src/lib/libz > make cleandir > make cleandir (yes, do it the second time) > make MACHINE_ARCH=x86_64 obj depend all > make MACHINE_ARCH=x86_64 install > > > > If not, then seeing you have compression turned on... did you just dump >> that whole table into the database ? its quite possible that the >> compression was still happening in ARC before it was finally written out >> and this would also explain why that happened. >> > > The table was just rebuilt due to changing an index, so in effect yes > the data would have been copied from the old table into a fresh new copy > and then renamed. > > > Also what level of compression are you using ? >> > > Standard lzjb, which is achieving 1.9 overall and 2.45 on this table file. > > Does indeed sound like this data was still being processed in some way but > surprised it took quite so long to show something other than the initial > file > creation size. > > Its not a big issue in this case, but does raise concerns that if it wasn't > showing the "correct" file size that the data may not have been commited to > disk, hence could have been unsafe for this quite extended period. > > Setting that may be relavent in the case within mysql are:- > innodb_log_file_size = 1024M > innodb_log_buffer_size = 8M > innodb_flush_method = O_DIRECT > innodb_use_native_aio = 1 > > So its possible that the table was in the innodb log, but I've never > witnessed that before tbh but its also only very recently we have moved > our db server from ufs to zfs, hence the questions. > > > Regards > Steve > > > > ==============================**================== > This e.mail is private and confidential between Multiplay (UK) Ltd. and the > person or entity to whom it is addressed. In the event of misdirection, the > recipient is prohibited from using, copying, printing or otherwise > disseminating it or any information contained in it. 
> In the event of misdirection, illegible or incomplete transmission please > telephone +44 845 868 1337 > or return the E.mail to postmaster@multiplay.co.uk. > > ______________________________**_________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/**mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@**freebsd.org > " > is that cvs'd to release or stable though? If stable when? zfs v28 was commited to stable a week or so ago. Do a 'zpool upgrade' to check From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 09:29:31 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B7191106567E for ; Fri, 17 Jun 2011 09:29:31 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id 6F40F8FC1E for ; Fri, 17 Jun 2011 09:29:31 +0000 (UTC) Received: by gyb13 with SMTP id 13so802964gyb.13 for ; Fri, 17 Jun 2011 02:29:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=0JAVTop1TKoIWFOLBHw9GnLXGNj6pllOCNztnX78nwc=; b=SWAB0mXYOQr/LnrUu/qb5Vg4jBzoBvNskRZKK3ZTlFmOIfmPv5th77+PtFRLjwQSbF zEFriwTRgCD8mQFtT/r1qLUsMJKv2ZYf3vXMt5s9ai6gU4RJ5Cg/KpqNyJ72F1y8g22+ Z2aTtTQ3w+4fMUBP5jb2m5RFzKyBVYwBGF978= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=fMXcvf/DsxUtfC97LdvZmfWIEVCqc1gYgq5fSYsipnajk8o8FanwtcKI7t2modOXMy 5113WCA8/TPiVfwblJ+TtSScpSCsmRMM7dVBny0HGPGaoFfwu3GVq6bT5yLMh6zNX0xl 7G5mkscK6fuvXatUMSoDWLfn3j5/YyHlpl6gQ= MIME-Version: 1.0 Received: by 10.236.109.136 with SMTP id s8mr2652031yhg.306.1308301351010; Fri, 17 Jun 2011 02:02:31 -0700 (PDT) Received: by 
10.236.102.178 with HTTP; Fri, 17 Jun 2011 02:02:30 -0700 (PDT) In-Reply-To: References: <4DF7CDD0.8040108@scsiguy.com> <20110615120524.GI1975@garage.freebsd.pl> <4DF8ACD3.1070202@scsiguy.com> Date: Fri, 17 Jun 2011 10:02:30 +0100 Message-ID: From: krad To: Per von Zweigbergk Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: "Justin T. Gibbs" , Pawel Jakub Dawidek , fs@freebsd.org Subject: Re: [CFR][ZFS] Add "zpool labelclear" command. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 09:29:31 -0000

On 15 June 2011 14:13, Per von Zweigbergk wrote:

> We here at IT-assistans have found the ability to clear ZFS labels very
> useful for testing. We have an internal utility written in-house to clear
> ZFS labels. It's rather dumb and dangerous which is why I have chosen not to
> publish it. (Oh, the irony of being held accountable for a utility designed
> to cause data loss causing data loss.) Having a labelclear command with some
> extra molly guards on it in the normal userspace ZFS commands would be
> useful to us at least for
> testing.
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

It's worth noting that it's not necessarily a good idea to let ZFS handle a
raw disk. There is a good argument that you should partition/slice it up
with a gig or so less capacity than the drive has. This is because not all
drives of a nominal X TB capacity are exactly the same size. One block less
and your new 2TB drive won't work in your existing array.

Because of this I always manually GPT-partition my disks first.
Therefore I only have to wipe the GPT label to clear the disk From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 10:02:11 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DCDE2106564A for ; Fri, 17 Jun 2011 10:02:11 +0000 (UTC) (envelope-from edhoprima@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 5DA768FC14 for ; Fri, 17 Jun 2011 10:02:10 +0000 (UTC) Received: by bwz12 with SMTP id 12so2881752bwz.13 for ; Fri, 17 Jun 2011 03:02:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-type; bh=m7gOMfs0dQHhtzSFVvMm3dEBNFbPsXQ4eiQGucPne0o=; b=w86ang8olUZWs1LO5mp1ZGMp5ZYouZX+1U/2VCx5OVOIssv+dr/6ByriWMjoADWJ0N qMATc9pGIhTqul5iwc/x5y83EHYijjhczJVi9C+q5A3bAL9Xaq8tIIAS8Newz3zdu1Dg iWiPAEyp5uHp59Ioi+Y9v0NvAAGnLXgUfuIUk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; b=tHLFIfXswBJbmNpwcshcxqugGLtYC+kON3s5mDagw5Bjv3iZ6bkjkXgMMYbh+gQQB5 47dEHzr85xda2gwwNpUnxj6yDAjNOcWYOFp61Y5BjeLhxK5WxswnoVX1qe0ghPEMSMOE +q3pinZQThEyHBHEViQGmyozvGgusbAj7PnKI= Received: by 10.204.128.198 with SMTP id l6mr1481723bks.19.1308303517016; Fri, 17 Jun 2011 02:38:37 -0700 (PDT) MIME-Version: 1.0 Received: by 10.204.144.207 with HTTP; Fri, 17 Jun 2011 02:37:48 -0700 (PDT) In-Reply-To: References: <4DF7CDD0.8040108@scsiguy.com> <20110615120524.GI1975@garage.freebsd.pl> <4DF8ACD3.1070202@scsiguy.com> From: Edho P Arief Date: Fri, 17 Jun 2011 16:37:48 +0700 Message-ID: To: krad Content-Type: text/plain; charset=UTF-8 Cc: "Justin T. Gibbs" , Pawel Jakub Dawidek , fs@freebsd.org Subject: Re: [CFR][ZFS] Add "zpool labelclear" command. 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 10:02:11 -0000 On Fri, Jun 17, 2011 at 4:02 PM, krad wrote: > > Its worth noting that its not necessarily a good idea to let zfs handle a > raw disk. There is a good argument that you should partition/slice it up > with a gig or so less capacity than the drive has.This is due to all drives > X TB capacity not being equal. One block less and you new 2TB drive wont > work in you existing array. Actually it will. At least up to 10 MB based on my previous experience (under openindiana oi_148, at least). From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 10:05:34 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0682A106564A for ; Fri, 17 Jun 2011 10:05:34 +0000 (UTC) (envelope-from edhoprima@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 7B6888FC13 for ; Fri, 17 Jun 2011 10:05:33 +0000 (UTC) Received: by bwz12 with SMTP id 12so2884733bwz.13 for ; Fri, 17 Jun 2011 03:05:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-type; bh=azZ84DxyL93O0wx3wfUpaigtJCjYcHAchr5Owcy0OpA=; b=uzu1Mhbz7AhUJB/juYM29JDTmpOMnfboUZdr/PudJh5+sRLkxlMq7PbveCuG8lwwyy afsEooGd+Yv2Y5UqdU4wXNTDg+2goST11oFN3bD6pMUibj5TSyud2thJdpx8ADvrxeHd AnYhob9Io/zG09kHNdqlYWbWE0/Uuy30nAFX0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; b=iiDGRS0qF2GrtSWf+jKpiBw0Czx4Gnc0dAGKGuWvHX7kQumX5YgvQ0ss8WD2TGchsd 
WpT9M6rCWMEdZ7sUXMgNr33gJ+o7io2nGCbhZT7h+DVzs01icRsXY7tzaZotif6S9I+W n7iFVrLG77gW0oX12nUZGfYCaKsgjcQd+GQh8= Received: by 10.204.32.65 with SMTP id b1mr1452499bkd.73.1308303634132; Fri, 17 Jun 2011 02:40:34 -0700 (PDT) MIME-Version: 1.0 Received: by 10.204.144.207 with HTTP; Fri, 17 Jun 2011 02:40:14 -0700 (PDT) In-Reply-To: References: <4DF7CDD0.8040108@scsiguy.com> <20110615120524.GI1975@garage.freebsd.pl> <4DF8ACD3.1070202@scsiguy.com> From: Edho P Arief Date: Fri, 17 Jun 2011 16:40:14 +0700 Message-ID: To: krad Content-Type: text/plain; charset=UTF-8 Cc: "Justin T. Gibbs" , Pawel Jakub Dawidek , fs@freebsd.org Subject: Re: [CFR][ZFS] Add "zpool labelclear" command. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 10:05:34 -0000 On Fri, Jun 17, 2011 at 4:37 PM, Edho P Arief wrote: > On Fri, Jun 17, 2011 at 4:02 PM, krad wrote: >> >> Its worth noting that its not necessarily a good idea to let zfs handle a >> raw disk. There is a good argument that you should partition/slice it up >> with a gig or so less capacity than the drive has.This is due to all drives >> X TB capacity not being equal. One block less and you new 2TB drive wont >> work in you existing array. > > Actually it will. At least up to 10 MB based on my previous experience > (under openindiana oi_148, at least). > It also must be noted that giving raw device to solaris' zpool doesn't make it to use entire disk - zpool command automatically creates gpt label with two partitions (of type usr and reserved) and creates the pool on the usr partition (not entire disk). 
From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 12:41:17 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AE1A7106564A for ; Fri, 17 Jun 2011 12:41:17 +0000 (UTC) (envelope-from cmdlnkid@gmail.com) Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com [209.85.210.182]) by mx1.freebsd.org (Postfix) with ESMTP id 695888FC16 for ; Fri, 17 Jun 2011 12:41:17 +0000 (UTC) Received: by iyj12 with SMTP id 12so2813331iyj.13 for ; Fri, 17 Jun 2011 05:41:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:sender:date:from:to:cc:subject:message-id :references:mime-version:content-type:content-disposition :in-reply-to; bh=qwh54qo9cS1phfKfhuDUy4jmDctstfA+bBIIlraVXLI=; b=QxgXaTQ9s30NdZ0dok5ps9pCHwcncpdbUUA0QWHVMckGIbjbwTAvCp8NLbdlh2Oepd CIgcX2o2XhiNbD3ef8JUinH6FsUJT51mYpGgPTuBGsV3YjNB6p4X7twrmwYOQ5uqgHhl ISg2xBhs5YrCSRzADz2m7PWnj8UKNDaUr6pHw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to; b=g8eW+Zt9AVg6yInija3FvPAdEOEzi9ryljYQVEv1y4U5bNuMg/aygZOxU41d7sNnGy xCfdiyDWr5TZSo+r4+qL9DOOUWlOrDTWSHXgsjnKG0GXgb7eEvwdS8fKOwgMskUeHPTj YZdCXWQ1gjDzaHTjZakavPtFdvAqhnxh8G1XE= Received: by 10.42.131.71 with SMTP id y7mr2143409ics.315.1308314476561; Fri, 17 Jun 2011 05:41:16 -0700 (PDT) Received: from DataIX.net ([108.73.113.243]) by mx.google.com with ESMTPS id s2sm2545916icw.17.2011.06.17.05.41.14 (version=TLSv1/SSLv3 cipher=OTHER); Fri, 17 Jun 2011 05:41:15 -0700 (PDT) Sender: The Command Line Kid Received: from DataIX.net (localhost [127.0.0.1]) by DataIX.net (8.14.4/8.14.4) with ESMTP id p5HCfCDd012857 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 17 Jun 2011 08:41:12 -0400 (EDT) (envelope-from jhell@DataIX.net) Received: (from jhell@localhost) 
by DataIX.net (8.14.4/8.14.4/Submit) id p5HCfBgw012856; Fri, 17 Jun 2011 08:41:11 -0400 (EDT) (envelope-from jhell@DataIX.net) Date: Fri, 17 Jun 2011 08:41:11 -0400 From: jhell To: krad Message-ID: <20110617124111.GA12660@DataIX.net> References: <20110613094803.GA10290@icarus.home.lan> <4E09C82B45BA46019281930B2EB13AC1@multiplay.co.uk> <20110613193529.GA21103@DataIX.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Cc: freebsd-fs@freebsd.org Subject: Re: Impossible compression ratio on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 12:41:17 -0000 On Fri, Jun 17, 2011 at 10:10:26AM +0100, krad wrote: > On 13 June 2011 23:50, Steven Hartland wrote: > > > > > ----- Original Message ----- From: "jhell" > > To: "Steven Hartland" > > Cc: "Jeremy Chadwick" ; > > Sent: Monday, June 13, 2011 8:35 PM > > > > Subject: Re: Impossible compression ratio on ZFS > > > >> > >> > > Hi Steve, > >> > >> Knowing that there were patches out for v28 on 8.X can you confirm that > >> in fact you are using v15 ZFS ? I would assume you are because of the > >> release but I don't want to do that. > >> > > > > Confirmed this is a pure 8.2 release build machine no additional patches > > except for compiling libz without assembly optimisations as thats known > > to cause crashes. > > > > Specifically the following as directed by Xin LI:- > > cd /usr/src/lib/libz > > make cleandir > > make cleandir (yes, do it the second time) > > make MACHINE_ARCH=x86_64 obj depend all > > make MACHINE_ARCH=x86_64 install > > > > > > > > If not, then seeing you have compression turned on... did you just dump > >> that whole table into the database ? 
It's quite possible that the
> >> compression was still happening in ARC before it was finally written out
> >> and this would also explain why that happened.
> >>
> >
> > The table was just rebuilt due to changing an index, so in effect yes
> > the data would have been copied from the old table into a fresh new copy
> > and then renamed.
> >
> > Also what level of compression are you using ?
> >>
> >
> > Standard lzjb, which is achieving 1.9 overall and 2.45 on this table file.
> >
> > It does indeed sound like this data was still being processed in some way,
> > but I'm surprised it took quite so long to show something other than the
> > initial file creation size.
> >
> > It's not a big issue in this case, but it does raise concerns that if it
> > wasn't showing the "correct" file size then the data may not have been
> > committed to disk, and hence could have been unsafe for this quite
> > extended period.
> >
> > Settings that may be relevant in this case within MySQL are:
> > innodb_log_file_size = 1024M
> > innodb_log_buffer_size = 8M
> > innodb_flush_method = O_DIRECT
> > innodb_use_native_aio = 1
> >
> > So it's possible that the table was in the InnoDB log. I've never
> > witnessed that before, to be honest, but we have only very recently moved
> > our db server from UFS to ZFS, hence the questions.
> >
> > Regards
> > Steve
> >
>
> Is that cvs'd to release or stable though? If stable, when? ZFS v28 was
> committed to stable a week or so ago. Do a 'zpool upgrade' to check.

I think it's safe to say from Steve's reply, if you've been following, that it
is v15.

From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 12:54:03 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3BCB01065670 for ; Fri, 17 Jun 2011 12:54:03 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta09.westchester.pa.mail.comcast.net (qmta09.westchester.pa.mail.comcast.net [76.96.62.96]) by mx1.freebsd.org (Postfix) with ESMTP id DE1BD8FC1D for ; Fri, 17 Jun 2011 12:54:02 +0000 (UTC) Received: from omta20.westchester.pa.mail.comcast.net ([76.96.62.71]) by qmta09.westchester.pa.mail.comcast.net with comcast id x0jg1g0071YDfWL590u3Sv; Fri, 17 Jun 2011 12:54:03 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta20.westchester.pa.mail.comcast.net with comcast id x0u11g00f1t3BNj3g0u2Lu; Fri, 17 Jun 2011 12:54:02 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 8E24A102C19; Fri, 17 Jun 2011 05:54:00 -0700 (PDT) Date: Fri, 17 Jun 2011 05:54:00 -0700 From: Jeremy Chadwick To: ?????????????? Message-ID: <20110617125400.GA6291@icarus.home.lan> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS raid1 crash kernel panic.
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 12:54:03 -0000 On Wed, Jun 15, 2011 at 04:15:56PM +0400, ?????????????? wrote: > I have a ZFS raid1 from 2 drives to 1TB . > Recently, my system OS: FreeBSD 8.2-RELEASE has crashed, with kernel > panic: > > -------------------------------- > panic: solaris assert: ss->ss_end >= end (0x6a80753600 >= 0x6a80753800), > file: > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c, > line: 174 > > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain > conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "amd64-marcel-freebsd"... 
>
> Unread portion of the kernel message buffer:
> panic: solaris assert: ss->ss_end >= end (0x6a80753600 >= 0x6a80753800),
> file:
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c,
> line: 174
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff805f4e0e at kdb_backtrace+0x5e
> #1 0xffffffff805c2d07 at panic+0x187
> #2 0xffffffff80ee36f6 at space_map_remove+0x296
> #3 0xffffffff80ee3d9b at space_map_load+0x1bb
> #4 0xffffffff80ed4c19 at metaslab_activate+0x89
> #5 0xffffffff80ed586e at metaslab_alloc+0x6ae
> #6 0xffffffff80f00299 at zio_dva_allocate+0x69
> #7 0xffffffff80efe287 at zio_execute+0x77
> #8 0xffffffff80e9e303 at taskq_run_safe+0x13
> #9 0xffffffff805ffeb5 at taskqueue_run_locked+0x85
> #10 0xffffffff8060004e at taskqueue_thread_loop+0x4e
> #11 0xffffffff805994f8 at fork_exit+0x118
> #12 0xffffffff8089547e at fork_trampoline+0xe
> ---------------
>
> Reinstalling the OS and importing the ZFS pool did not change anything.
> smartctl -a says that everything is OK.
> Can anybody tell me what this is?

The panic happens intentionally due to an assertion check. Assuming
these "solaris asserts" work the same way as KASSERT(), then what the
output says is:

"ss->ss_end should be >= end, but obviously 0x6a80753600 is less than
0x6a80753800, thus panic".

ZFS developers, do you have any ideas/explanations for this crash or
what Aleksey can provide to help diagnose the cause?

-- 
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977.
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 13:19:18 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8159B106566B for ; Fri, 17 Jun 2011 13:19:18 +0000 (UTC) (envelope-from martin@lispworks.com) Received: from lwfs1-cam.cam.lispworks.com (mail.lispworks.com [193.34.186.230]) by mx1.freebsd.org (Postfix) with ESMTP id 186918FC1A for ; Fri, 17 Jun 2011 13:19:17 +0000 (UTC) Received: from higson.cam.lispworks.com (higson [192.168.1.7]) by lwfs1-cam.cam.lispworks.com (8.14.3/8.14.3) with ESMTP id p5HD8rOi073197; Fri, 17 Jun 2011 14:08:53 +0100 (BST) (envelope-from martin@lispworks.com) Received: from higson.cam.lispworks.com (localhost.localdomain [127.0.0.1]) by higson.cam.lispworks.com (8.14.4) id p5HD8rNK030557; Fri, 17 Jun 2011 14:08:53 +0100 Received: (from martin@localhost) by higson.cam.lispworks.com (8.14.4/8.14.4/Submit) id p5HD8rY4030553; Fri, 17 Jun 2011 14:08:53 +0100 Date: Fri, 17 Jun 2011 14:08:53 +0100 Message-Id: <201106171308.p5HD8rY4030553@higson.cam.lispworks.com> From: Martin Simmons To: fs@freebsd.org In-reply-to: (message from krad on Fri, 17 Jun 2011 10:02:30 +0100) References: <4DF7CDD0.8040108@scsiguy.com> <20110615120524.GI1975@garage.freebsd.pl> <4DF8ACD3.1070202@scsiguy.com> Cc: Subject: Re: [CFR][ZFS] Add "zpool labelclear" command. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 13:19:18 -0000 >>>>> On Fri, 17 Jun 2011 10:02:30 +0100, krad said: > > Therefore I only have to wipe the GPT label to clear the disk That doesn't sound sufficient to me. Won't the old ZFS label reappear if you reuse the disk and create the same partitioning scheme? 
__Martin From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 14:47:58 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9495E1065673 for ; Fri, 17 Jun 2011 14:47:58 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 55FBA8FC15 for ; Fri, 17 Jun 2011 14:47:57 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap0EAIpo+02DaFvO/2dsb2JhbABShEmjHIhzrh2QZ4Erg3KBCgSRXpAZ X-IronPort-AV: E=Sophos;i="4.65,381,1304308800"; d="scan'208";a="128212953" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 17 Jun 2011 10:47:57 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 4F8BBB3F24; Fri, 17 Jun 2011 10:47:57 -0400 (EDT) Date: Fri, 17 Jun 2011 10:47:57 -0400 (EDT) From: Rick Macklem To: Alexander Leidinger Message-ID: <728179041.718184.1308322077278.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110617085732.34932j5fvh8v93vg@webmail.leidinger.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: FreeBSD FS Subject: Re: RFC: don't allow any access to unexported mounts for NFSv4 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 14:47:58 -0000 > Quoting Rick Macklem (from Thu, 16 Jun 2011 > 10:52:18 -0400 (EDT)): > > > As such, I think it might be better to remove the "hack" and > > simply require that all file systems from the NFSv4 root down > > be 
exported (which is what is needed for ZFS now, afaik).
>
> This does not match the behavior on Solaris. There we have
> pool/not_exported_dataset/exported_dataset
> and a v4 mount works (I didn't see how to verify if a mounted FS is
> NFSv4, but I modified /etc/default/nfs to have NFS_CLIENT_VERSMIN=4).
>
Yes, one of the reasons I originally did the "hack" was that it made
things "Solaris compatible". However, I found out Solaris does this by
building what generally gets called a "pseudo file system" which, as I
understand it, is basically a file system of empty directories that
mimics the unexported paths to the exported ones. You could build such
a file system on a small volume. (My comment w.r.t. a workaround.)

Isilon does have a pseudo file system, but my most recent discussion
with them suggested that theirs might not be suitable for upstreaming.
(I once wrote one, but it was garbage that I threw away. :-)

rick

From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 15:26:03 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D7D6C106564A for ; Fri, 17 Jun 2011 15:26:03 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 6B0FC8FC15 for ; Fri, 17 Jun 2011 15:26:03 +0000 (UTC) Received: from outgoing.leidinger.net (p4FC43341.dip.t-dialin.net [79.196.51.65]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 3148184400D; Fri, 17 Jun 2011 17:25:48 +0200 (CEST) Received: from webmail.leidinger.net (webmail.Leidinger.net [IPv6:fd73:10c7:2053:1::3:102]) by outgoing.leidinger.net (Postfix) with ESMTP id 742452633; Fri, 17 Jun 2011 17:25:45 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=Leidinger.net; s=outgoing-alex; t=1308324345; bh=hQzti5etYbPQdhVJDDmqQ7FBD2bNJ+TRarA1c15UGBQ=;
h=Message-ID:Date:From:To:Cc:Subject:References:In-Reply-To: MIME-Version:Content-Type:Content-Transfer-Encoding; b=j2nuiKRCDDtozxp72Ey+z+Ja0k4+WHRwvKYtIonVMiVkBkyFBu2AD+OpWkLcU0xqe jtwASiOrIUfA2jxmPv8MjuUY2ifOqr1+Am6P+l4u4NZ9bkye2IhN4Zpk32fT055TSS S0lEVD3FK1ngBVlYs2+jfF/1UC3JjPCnvQ9qYbUuiLy5VysayC9gjP66CTnd9VLcQ3 0fAExSweL+PrMWPDK0v/diGUUopjvZNry/lfmoeyf3VsEsM547yBXQoNAwPqeBlU68 xH+d/+xArRed1UJv22p1hxQslVjZ0oEvqWQPz4TPKJjXnjo7PE1OMU4f/xzT0BiVgZ lvbiXgqI9mvXw== Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.14.4/Submit) id p5HFPjsW043641; Fri, 17 Jun 2011 17:25:45 +0200 (CEST) (envelope-from Alexander@Leidinger.net) X-Authentication-Warning: webmail.leidinger.net: www set sender to Alexander@Leidinger.net using -f Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Fri, 17 Jun 2011 17:25:45 +0200 Message-ID: <20110617172545.175366za32r42gvt@webmail.leidinger.net> Date: Fri, 17 Jun 2011 17:25:45 +0200 From: Alexander Leidinger To: Rick Macklem References: <728179041.718184.1308322077278.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <728179041.718184.1308322077278.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: 3148184400D.AF3B0 X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-0.1, required 6, autolearn=disabled, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1308929150.00932@ZR9UMZSt/eg3WmmBTKEEsA X-EBL-Spam-Status: No Cc: FreeBSD FS Subject: Re: RFC: don't allow any access to unexported mounts for NFSv4 X-BeenThere: 
freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 15:26:04 -0000

Quoting Rick Macklem (from Fri, 17 Jun 2011 10:47:57 -0400 (EDT)):

>> Quoting Rick Macklem (from Thu, 16 Jun 2011
>> 10:52:18 -0400 (EDT)):
>>
>> > As such, I think it might be better to remove the "hack" and
>> > simply require that all file systems from the NFSv4 root down
>> > be exported (which is what is needed for ZFS now, afaik).
>>
>> This does not match the behavior on Solaris. There we have
>> pool/not_exported_dataset/exported_dataset
>> and a v4 mount works (I didn't see how to verify if a mounted FS is
>> NFSv4, but I modified /etc/default/nfs to have NFS_CLIENT_VERSMIN=4).
>>
> Yes, one of the reasons I originally did the "hack" was that it made
> things "Solaris compatible". However, I found out Solaris does this by
> building what generally gets called a "pseudo file system" which, as I
> understand it, is basically a file system of empty directories that
> mimics the unexported paths to the exported ones. You could build such
> a file system on a small volume. (My comment w.r.t. a workaround.)

The workarounds you propose contradict everything people are used to.
They are not easy or you need to care what you put in the parent
directories of the one you want to export. It basically means that you
can only use NFSv4 on newly set up systems; upgraded or migrated ones
look out of the question (yes, I'm oversimplifying a bit).

I really hope someone can come up with a fix for this, else it would
mean I would not use NFSv4 anywhere.

Bye,
Alexander.

-- 
My haircut is totally traditional!
http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 15:41:27 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2E6101065673 for ; Fri, 17 Jun 2011 15:41:27 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id E1B438FC2B for ; Fri, 17 Jun 2011 15:41:26 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap0EAEF0+02DaFvO/2dsb2JhbABShEmjKYhzrmuQZ4Erg3KBCgSRXpAZ X-IronPort-AV: E=Sophos;i="4.65,382,1304308800"; d="scan'208";a="128222929" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 17 Jun 2011 11:41:24 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 57E9FB3F1F; Fri, 17 Jun 2011 11:41:24 -0400 (EDT) Date: Fri, 17 Jun 2011 11:41:24 -0400 (EDT) From: Rick Macklem To: Alexander Leidinger Message-ID: <183637159.722200.1308325284347.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110617172545.175366za32r42gvt@webmail.leidinger.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: FreeBSD FS Subject: Re: RFC: don't allow any access to unexported mounts for NFSv4 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 15:41:27 -0000 > The workarounds you propose contradict everything people are used to. 
> They are not easy or you need to care what you put in the parent
> directories of the one you want to export. It basically means that you
> can only use NFSv4 on newly set up systems; upgraded or migrated ones
> look out of the question (yes, I'm oversimplifying a bit).
>
> I really hope someone can come up with a fix for this, else it would
> mean I would not use NFSv4 anywhere.
>
Ok, can I assume that's a vote for "leave the hack in"?

Beyond that, all I can say is the NFSv4 model (and protocol) is
very different (one tree rooted at some point). Conversion may or
may not be worth the pain, in any individual case.

rick

From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 18:25:00 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6306B106566B; Fri, 17 Jun 2011 18:25:00 +0000 (UTC) (envelope-from rmacklem@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 3AC978FC13; Fri, 17 Jun 2011 18:25:00 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p5HIP0FP091820; Fri, 17 Jun 2011 18:25:00 GMT (envelope-from rmacklem@freefall.freebsd.org) Received: (from rmacklem@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p5HIOx1c091812; Fri, 17 Jun 2011 18:24:59 GMT (envelope-from rmacklem) Date: Fri, 17 Jun 2011 18:24:59 GMT Message-Id: <201106171824.p5HIOx1c091812@freefall.freebsd.org> To: claudiu.vasadi@gmail.com, rmacklem@FreeBSD.org, freebsd-fs@FreeBSD.org From: rmacklem@FreeBSD.org Cc: Subject: Re: kern/157684: [nfs] NFSv4 ignoring "-ro" option in exports file X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 18:25:00 -0000
Synopsis: [nfs] NFSv4 ignoring "-ro" option in exports file State-Changed-From-To: open->closed State-Changed-By: rmacklem State-Changed-When: Fri Jun 17 18:23:20 UTC 2011 State-Changed-Why: An email discussion determined that the file system was actually exported read/write to the client. Some improvements to the exports.5 man page are needed. http://www.freebsd.org/cgi/query-pr.cgi?pr=157684 From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 18:28:13 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 896D01065670; Fri, 17 Jun 2011 18:28:13 +0000 (UTC) (envelope-from rmacklem@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 621998FC19; Fri, 17 Jun 2011 18:28:13 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p5HISDei091932; Fri, 17 Jun 2011 18:28:13 GMT (envelope-from rmacklem@freefall.freebsd.org) Received: (from rmacklem@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p5HISCGR091928; Fri, 17 Jun 2011 18:28:12 GMT (envelope-from rmacklem) Date: Fri, 17 Jun 2011 18:28:12 GMT Message-Id: <201106171828.p5HISCGR091928@freefall.freebsd.org> To: eugene@zhegan.in, rmacklem@FreeBSD.org, freebsd-fs@FreeBSD.org From: rmacklem@FreeBSD.org Cc: Subject: Re: kern/157365: [nfs] cannot umount an nfs from dead server X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 18:28:13 -0000 Synopsis: [nfs] cannot umount an nfs from dead server State-Changed-From-To: feedback->closed State-Changed-By: rmacklem State-Changed-When: Fri Jun 17 18:27:01 UTC 2011 State-Changed-Why: The changes that allow a "umount -f" to work on an NFS mount 
point against an unresponsive NFS server are now in stable/8. http://www.freebsd.org/cgi/query-pr.cgi?pr=157365 From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 18:43:42 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 39804106566C for ; Fri, 17 Jun 2011 18:43:42 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (unknown [IPv6:2001:470:8761:2:4a5b:39ff:fe12:452]) by mx1.freebsd.org (Postfix) with ESMTP id E08A28FC0A for ; Fri, 17 Jun 2011 18:43:37 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id p5HIgQjn018296; Fri, 17 Jun 2011 11:42:30 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201106171842.p5HIgQjn018296@chez.mckusick.com> To: Hans Ottevanger In-reply-to: <20110617153415.GA92803@testsoekris.hotsoft.nl> Date: Fri, 17 Jun 2011 11:42:26 -0700 From: Kirk McKusick X-Spam-Status: No, score=0.0 required=5.0 tests=MISSING_MID, UNPARSEABLE_RELAY autolearn=failed version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on chez.mckusick.com Cc: freebsd-fs@freebsd.org, Jeff Roberson Subject: Re: SU+J: negative used diskspace (for a while) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 18:43:42 -0000 > Date: Fri, 17 Jun 2011 17:34:15 +0200 > From: Hans Ottevanger > To: freebsd-current@freebsd.org > Subject: SU+J: negative used diskspace (for a while) > > Hi, > > I found a possible issue with SU+J on recent versions of -CURRENT. > > After deleting a large file hierarchy (copy of /usr/src, ~1.5 Gbyte), > df reports a negative number of blocks "Used" for a while. > > I am using a GENERIC kernel (r223184) on an amd64 platform. 
The hardware
> is relatively simple: Intel DP965LT mainboard with a Q6600 CPU, 8 Gbyte
> RAM and two Samsung 501LJ 500 Gbyte SATA disks.
>
> The issue can be demonstrated by copying /usr/src to the current directory
> (cp -R /usr/src .) and running the following script to delete the copy
> and print the free space at 10 second intervals:
>
> #!/bin/sh
>
> df .
>
> time rm -rf src
>
> echo 'src is gone ...'
>
> while true
> do
>     df . | tail -1
>     sleep 10
> done
>
> This yields the following output:
>
> Filesystem   1K-blocks    Used     Avail Capacity Mounted on
> /dev/ada0s1g 416144900 1612066 381241242       0% /home
>        51.21 real         1.00 user        17.38 sys
> src is gone ...
> /dev/ada0s1g 416144900 -164692 383018000      -0% /home
> /dev/ada0s1g 416144900 -165082 383018390      -0% /home
> /dev/ada0s1g 416144900 -246852 383100160      -0% /home
> /dev/ada0s1g 416144900 -246852 383100160      -0% /home
> /dev/ada0s1g 416144900 -246852 383100160      -0% /home
> /dev/ada0s1g 416144900  -64146 382917454      -0% /home
> /dev/ada0s1g 416144900  -64146 382917454      -0% /home
> /dev/ada0s1g 416144900  -64146 382917454      -0% /home
> /dev/ada0s1g 416144900   32910 382820398       0% /home
> /dev/ada0s1g 416144900   32910 382820398       0% /home
>
> So it takes more than a minute before the disk space is back to "normal"
> values.
>
> After disabling journaling (tunefs -j disable) I get the following output:
>
> Filesystem   1K-blocks    Used     Avail Capacity Mounted on
> /dev/ada0s1g 416144900 1579284 381274024       0% /home
>        35.40 real         0.96 user        13.32 sys
> src is gone ...
> /dev/ada0s1g 416144900     128 382853180       0% /home
> /dev/ada0s1g 416144900     128 382853180       0% /home
> /dev/ada0s1g 416144900     128 382853180       0% /home
> /dev/ada0s1g 416144900     128 382853180       0% /home
>
> which is as it should be.
>
> The problem also does not occur with journaling enabled when I revert
> to r222723.
>
> Is anybody else seeing these weird phenomena?
> Could this be related to the recent changes to UFS?
> > Kind regards, > > Hans Ottevanger We used to account for deleted blocks at the instant that they were removed. This accounting was rather complex, so as part of doing SU+J, Jeff simplified it. Under the simplification, the removal is not accounted for until part way through the removal process. The result is that you now get these false negative block counts until the blocks have been partially reclaimed. If this behavior causes enough trouble, Jeff might be convinced that the more accurate block accounting is necessary. Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Fri Jun 17 22:02:07 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BC75C106566B; Fri, 17 Jun 2011 22:02:07 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 94BAF8FC0C; Fri, 17 Jun 2011 22:02:07 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p5HM27Pc092583; Fri, 17 Jun 2011 22:02:07 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p5HM2796092579; Fri, 17 Jun 2011 22:02:07 GMT (envelope-from linimon) Date: Fri, 17 Jun 2011 22:02:07 GMT Message-Id: <201106172202.p5HM2796092579@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/157929: [nfs] NFS slow read X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2011 22:02:07 -0000 Old Synopsis: NFS slow read New Synopsis: [nfs] NFS slow read Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: linimon Responsible-Changed-When: Fri Jun 17 22:01:52 UTC 2011 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=157929 From owner-freebsd-fs@FreeBSD.ORG Sat Jun 18 00:51:27 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DA7361065674 for ; Sat, 18 Jun 2011 00:51:27 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta05.westchester.pa.mail.comcast.net (qmta05.westchester.pa.mail.comcast.net [76.96.62.48]) by mx1.freebsd.org (Postfix) with ESMTP id 895738FC13 for ; Sat, 18 Jun 2011 00:51:27 +0000 (UTC) Received: from omta19.westchester.pa.mail.comcast.net ([76.96.62.98]) by qmta05.westchester.pa.mail.comcast.net with comcast id xCeV1g00327AodY55CrTDN; Sat, 18 Jun 2011 00:51:27 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta19.westchester.pa.mail.comcast.net with comcast id xCrR1g00H1t3BNj3fCrTT5; Sat, 18 Jun 2011 00:51:27 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 7EA40102C19; Fri, 17 Jun 2011 17:51:24 -0700 (PDT) Date: Fri, 17 Jun 2011 17:51:24 -0700 From: Jeremy Chadwick To: freebsd-stable@freebsd.org Message-ID: <20110618005124.GA43568@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: MFC: graid(8) (RAID GEOM) support X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Jun 2011 00:51:27 -0000 Sorry for the cross-post, but I thought both lists would want to know about this. 
Looks like mav@ just committed this ~17 hours ago:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/geom/raid/g_raid.c

Those who have historically wanted to use Intel MatrixRAID (now called
Intel RST (Rapid Storage Technology)), but haven't due to the severe
issues/risks with ataraid(4), will probably be very interested in this
commit. I know I am!

I plan on stress-testing the Intel support on a 2-disk system with
RAID-1 enabled, and will document my experiences, procedures, etc...

Thanks, mav@ and imp@ !

I'll be sending another mail momentarily asking about USB memory stick
image building, since to accomplish the above, I want to do a
"bare-bones" install on our test system (e.g. enable Intel RAID, set up
2 disks in a RAID-1 mirror, boot a USB memory stick that contains this
latest RELENG_8 build, and do sysinstall, etc.. the normal way).

=====================================================================
MFC r219974, r220209, r220210, r220790:

Add new RAID GEOM class, that is going to replace ataraid(4) in
supporting various BIOS-based software RAIDs. Unlike ataraid(4) this
implementation does not depend on legacy ata(4) subsystem and can be
used with any disk drivers, including new CAM-based ones (ahci(4),
siis(4), mvs(4), ata(4) with `options ATA_CAM`). To make code more
readable and extensible, this implementation follows modular design,
including core part and two sets of modules, implementing support for
different metadata formats and RAID levels.

Support for such popular metadata formats is now implemented: Intel,
JMicron, NVIDIA, Promise (also used by AMD/ATI) and SiliconImage.

Such RAID levels are now supported: RAID0, RAID1, RAID1E, RAID10,
SINGLE, CONCAT.

For all of these RAID levels and metadata formats this class supports
full cycle of volume operations: reading, writing, creation, deletion,
disk removal and insertion, rebuilding, dirty shutdown detection and
resynchronization, bad sector recovery, faulty disks tracking,
hot-spare disks.
For Intel and Promise formats there is support multiple volumes per disk set. Look graid(8) manual page for additional details. Co-authored by: imp Sponsored by: Cisco Systems, Inc. and iXsystems, Inc. ===================================================================== -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sat Jun 18 09:55:14 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B42F31065670 for ; Sat, 18 Jun 2011 09:55:14 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id E296E8FC2E for ; Sat, 18 Jun 2011 09:55:05 +0000 (UTC) Received: from outgoing.leidinger.net (p4FC46FD5.dip.t-dialin.net [79.196.111.213]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 4976A84400D; Sat, 18 Jun 2011 11:54:50 +0200 (CEST) Received: from unknown (IO.Leidinger.net [192.168.2.110]) by outgoing.leidinger.net (Postfix) with ESMTP id 875D226F8; Sat, 18 Jun 2011 11:54:47 +0200 (CEST) Date: Sat, 18 Jun 2011 11:54:48 +0200 From: Alexander Leidinger To: Rick Macklem Message-ID: <20110618115448.00004b7f@unknown> In-Reply-To: <183637159.722200.1308325284347.JavaMail.root@erie.cs.uoguelph.ca> References: <20110617172545.175366za32r42gvt@webmail.leidinger.net> <183637159.722200.1308325284347.JavaMail.root@erie.cs.uoguelph.ca> X-Mailer: Claws Mail 3.7.8cvs47 (GTK+ 2.16.6; i586-pc-mingw32msvc) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: 4976A84400D.A0910 X-EBL-MailScanner: Found to be clean 
X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-1, required 6, autolearn=disabled, ALL_TRUSTED -1.00) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1308995693.53382@BaNU+dTDy8/SKGrmIBZN5Q X-EBL-Spam-Status: No Cc: FreeBSD FS Subject: Re: RFC: don't allow any access to unexported mounts for NFSv4 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Jun 2011 09:55:14 -0000 On Fri, 17 Jun 2011 11:41:24 -0400 (EDT) Rick Macklem wrote: > > The workarounds you propose contradict everything people are used > > to. They are not easy or you need to care what you put in the parent > > directories of the one you want to export. It basically means that > > you can only use NFSv4 on newly setup systems, upgraded or migrated > > ones look out of the question (yes, I'm over-simplificating a bit). > > > > I really hope someone can come up with a fix for this, else it would > > mean I would not use NFSv4 anywhere. > > > Ok, can I assume that's a vote for "leave the hack in"? If the pain to let the hack in is not too big: yes, please let it in. Bye, Alexander. 
-- http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Sat Jun 18 14:26:22 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B83B61065670; Sat, 18 Jun 2011 14:26:22 +0000 (UTC) (envelope-from stephane.lapie@darkbsd.org) Received: from quasar.darkbsd.org (shinigami.darkbsd.org [82.227.96.182]) by mx1.freebsd.org (Postfix) with ESMTP id D16338FC08; Sat, 18 Jun 2011 14:26:21 +0000 (UTC) Received: from quasar.darkbsd.org (localhost [127.0.0.1]) by quasar.darkbsd.org (Postfix) with ESMTP id 4F98A6FF5; Sat, 18 Jun 2011 16:07:56 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=darkbsd.org; h=message-id :date:from:mime-version:to:subject:content-type; s=selector1; bh=qxMQ+Z1joU4CsnjA1GngSTtHhqk=; b=mrQ0I/2cbCztGatgJll5G1XS6CSW 7wDz7XQpYdmMD9i6k3FeKNcYNm1XZGoVTTIG+4hQvhBEPaHPm1HfZ5Y2r3t2luph YkXZdwr7CMsMHyT1O/h3pFejCWP/6mWRjYrrlmOkvQpVO6WkyxRnHP8+SmWTj5L3 3s8gkbGjQoerC+Y= DomainKey-Signature: a=rsa-sha1; c=nofws; d=darkbsd.org; h=message-id :date:from:mime-version:to:subject:content-type; q=dns; s= selector1; b=qS0+daBzZu7nx51fQowh3IC//wFYOuxpZTM9UVh87WQiu001Y2W eYDQbUnFP4VbjNrWQPSEs7D6JOfrDT6vPDnT/AwcJJxJigAqBKhTeEljSfKzizDg Bd+m9TnCyaVmKf9wczgERte81q2U/nLQOlB40kwSuHGr1joMxF0Rut8A= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=darkbsd.org; h= content-type:content-type:subject:subject:mime-version :user-agent:from:from:date:date:message-id:received:received; s= selector1; t=1308406071; bh=HRSrx71OBoInPYqWhZzR/iCvG3HZKXjGv8EX 0GijCXc=; b=xX5WbHzQrh/GYCJ5WkYe/kbhdcNqFou5DoE6V1TSOjz3hPjO0k89 ilQemy/FMDtQtdhwVzkDNrJOySUfWClce883yye1EM2xDniXV1cKZpUwbJ+53+AJ Yi4Vx8JxEWWGk3jRJwEOkeZQrBY3oppTc4js+J/F40LePlQSKbj9I7I= Received: from quasar.darkbsd.org ([127.0.0.1]) by quasar.darkbsd.org (quasar.darkbsd.org [127.0.0.1]) 
(amavisd-new, port 10026) with ESMTP id dRF6YtYnkAJN; Sat, 18 Jun 2011 16:07:51 +0200 (CEST) Received: from [192.168.3.42] (archer.yomi.darkbsd.org [192.168.3.42]) (Authenticated sender: darksoul) by quasar.darkbsd.org (Postfix) with ESMTPSA id 1984C6FEE; Sat, 18 Jun 2011 16:07:47 +0200 (CEST) Message-ID: <4DFCB12A.6030805@darkbsd.org> Date: Sat, 18 Jun 2011 23:07:38 +0900 From: Stephane LAPIE User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110516 Thunderbird/3.1.10 MIME-Version: 1.0 To: freebsd-hardware@freebsd.org, freebsd-drivers@freebsd.org, freebsd-fs@freebsd.org X-Enigmail-Version: 1.1.2 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig8BE633BB83E59ACA09FD0D2A" Cc: Subject: Problem with a LSILogic SAS/SATA adapter on 8.2-STABLE/ZFSv28 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Jun 2011 14:26:22 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig8BE633BB83E59ACA09FD0D2A Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hello list, I have a problem with my 8.2-STABLE/ZFSv28 server. I am currently upgrading my disks from 1.5TB Seagate drives to 2TB Seagate drives, and therefore replacing devices within ZFS. (I have activated deduplication on a few file systems, for the record) I think this is more related to a hardware problem (flaky memory ? flaky controller/driver maybe ?), but I would appreciate any input. 
I experienced several kernel panics, all of which seem to point at mpt0 mis-handling interrupts:

www.darkbsd.org/~darksoul/kernel-panic-mpt1.txt (no target cmd ptrs)
www.darkbsd.org/~darksoul/kernel-panic-mpt2.txt (mpt_intr index == ...)
www.darkbsd.org/~darksoul/kernel-panic-mpt3.txt (NMI in kernel mode)
www.darkbsd.org/~darksoul/kernel-panic-mpt4.txt (LAN CONTEXT REPLY)
www.darkbsd.org/~darksoul/kernel-panic-mpt5.txt (LAN CONTEXT REPLY)
www.darkbsd.org/~darksoul/kernel-panic-mpt6.txt (LAN CONTEXT REPLY)
www.darkbsd.org/~darksoul/kernel-panic-mpt7.txt (LAN CONTEXT REPLY)

I would appreciate any pointers to what on earth "LAN CONTEXT REPLY" means for an LSI controller (using driver mpt(4)), as I have no idea, and the source was not really helpful.

The error message about an NMI and RAM parity error is what is scaring me the most here, and points me in the direction of flaky memory.

This is a personal machine, so I can add debug options and try stuff if it can help figure out what is going on. Also, any critical data is replicated, backed up and accounted for.

Thanks in advance for your time.

Here is a zpool list and a zpool status:

NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH    ALTROOT
prana  22.7T  17.4T  5.29T    76%  1.18x  DEGRADED  -

  pool: prana
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jun 18 12:43:02 2011
        13.8T scanned out of 17.3T at 236/s, (scan is slow, no estimated time)
        899G resilvered, 79.38% done
config:

        NAME              STATE     READ WRITE CKSUM
        prana             DEGRADED     0     0     0
          raidz1-0        DEGRADED     0     0     0
            da3           OFFLINE      0     0     0
            ad14          ONLINE       0     0     0
            ad12          ONLINE       0     0     0
            da1           ONLINE       0     0     0
            da0           ONLINE       0     0     0
          raidz1-1        DEGRADED     0     0     0
            ad26          ONLINE       0     0     0
            replacing-1   DEGRADED     0     0     0
              da6/old     OFFLINE      0     0     0
              da6         ONLINE       0     0     0  (resilvering)
            da4           ONLINE       0     0     0
            da7           ONLINE       0     0     0
            da5           ONLINE       0     0     0
          raidz1-2        ONLINE       0     0     0
            ad28          ONLINE       0     0     0
            ad8           ONLINE       0     0     0
            ad6           ONLINE       0     0     0
            ad16          ONLINE       0     0     0
            ad18          ONLINE       0     0     0
        cache
          gptid/d9c047d5-c1a7-11df-b584-000e0c707d1e  ONLINE       0     0     0
          gptid/da695e56-c1a7-11df-b584-000e0c707d1e  ONLINE       0     0     0
        spares
          da8             AVAIL
          da9             AVAIL

Here is my dmesg trace:

Copyright (c) 1992-2011 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.2-STABLE #1: Thu Jun 16 23:22:47 JST 2011
    darksoul@eirei-no-za.yomi.darkbsd.org:/usr/storage/tech/eirei-no-za.yomi.darkbsd.org/usr/obj/usr/storage/tech/eirei-no-za.yomi.darkbsd.org/usr/src/sys/DARK-2011KERN amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz (2666.68-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0x1067a Family = 6 Model = 17 Stepping = 10
  Features=0xbfebfbff
  Features2=0x408e3fd
  AMD Features=0x20100800
  AMD Features2=0x1
  TSC: P-state invariant
real memory = 8589934592 (8192 MB)
avail memory = 8254509056 (7872 MB)
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 2
 cpu3 (AP): APIC ID: 3
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-47 on motherboard
kbd1 at kbdmux0
ichwd module loaded
iscsi: version 2.2.4.2
cryptosoft0: on motherboard
acpi0: on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: on acpi0
cpu1: on acpi0
cpu2: on acpi0
cpu3: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib1: irq 16 at device 6.0 on pci0
pci3: on pcib1
pcib2: at device 0.0 on pci3
pci4: on pcib2
em0: port 0x2000-0x203f mem 0xdf980000-0xdf99ffff,0xdf900000-0xdf93ffff irq 16 at device 4.0 on pci4
em0: [FILTER]
em0: Ethernet address: 00:0e:0c:70:7d:1e
em1: port 0x2040-0x207f mem 0xdf9a0000-0xdf9bffff,0xdf940000-0xdf97ffff irq 17 at device 4.1 on pci4
em1: [FILTER]
em1: Ethernet address: 00:0e:0c:70:7d:1f
pcib3: at device 0.2 on pci3
pci5: on pcib3
em2: port 0x1820-0x183f mem 0xdfb00000-0xdfb1ffff,0xdfb20000-0xdfb20fff irq 16 at device 25.0 on pci0
em2: Using an MSI interrupt
em2: [FILTER]
em2: Ethernet address: 00:30:48:de:84:88
uhci0: port 0x1840-0x185f irq 16 at device 26.0 on pci0
uhci0: [ITHREAD]
usbus0: on
uhci0
uhci1: port 0x1860-0x187f irq 17 at device 26.1 on pci0
uhci1: [ITHREAD]
usbus1: on uhci1
uhci2: port 0x1880-0x189f irq 18 at device 26.2 on pci0
uhci2: [ITHREAD]
usbus2: on uhci2
ehci0: mem 0xdfb22800-0xdfb22bff irq 18 at device 26.7 on pci0
ehci0: [ITHREAD]
usbus3: EHCI version 1.0
usbus3: on ehci0
pcib4: irq 16 at device 28.0 on pci0
pci6: on pcib4
pcib5: at device 0.0 on pci6
pci7: on pcib5
mpt0: port 0x3000-0x30ff mem 0xdf310000-0xdf313fff,0xdf300000-0xdf30ffff irq 24 at device 1.0 on pci7
mpt0: [ITHREAD]
mpt0: MPI Version=1.5.12.0
mpt0: Capabilities: ( RAID-0 RAID-1E RAID-1 )
mpt0: 0 Active Volumes (2 Max)
mpt0: 0 Hidden Drive Members (10 Max)
atapci0: port 0x3400-0x34ff mem 0xdf200000-0xdf2fffff irq 28 at device 7.0 on pci7
atapci0: [ITHREAD]
ata2: on atapci0
ata2: [ITHREAD]
ata3: on atapci0
ata3: [ITHREAD]
ata4: on atapci0
ata4: [ITHREAD]
ata5: on atapci0
ata5: [ITHREAD]
ata6: on atapci0
ata6: [ITHREAD]
ata7: on atapci0
ata7: [ITHREAD]
ata8: on atapci0
ata8: [ITHREAD]
ata9: on atapci0
ata9: [ITHREAD]
uhci3: port 0x18a0-0x18bf irq 23 at device 29.0 on pci0
uhci3: [ITHREAD]
usbus4: on uhci3
uhci4: port 0x18c0-0x18df irq 22 at device 29.1 on pci0
uhci4: [ITHREAD]
usbus5: on uhci4
uhci5: port 0x18e0-0x18ff irq 18 at device 29.2 on pci0
uhci5: [ITHREAD]
usbus6: on uhci5
ehci1: mem 0xdfb22c00-0xdfb22fff irq 23 at device 29.7 on pci0
ehci1: [ITHREAD]
usbus7: EHCI version 1.0
usbus7: on ehci1
pcib6: at device 30.0 on pci0
pci17: on pcib6
em3: port 0x4080-0x40bf mem 0xdfa00000-0xdfa1ffff irq 20 at device 0.0 on pci17
em3: [FILTER]
em3: Ethernet address: 00:07:e9:0f:a3:80
em4: port 0x40c0-0x40ff mem 0xdfa20000-0xdfa3ffff irq 21 at device 0.1 on pci17
em4: [FILTER]
em4: Ethernet address: 00:07:e9:0f:a3:81
vgapci0: port 0x4000-0x407f mem 0xde800000-0xdeffffff,0xdfa40000-0xdfa4ffff at device 1.0 on pci17
fwohci0: mem 0xdfa54000-0xdfa547ff,0xdfa50000-0xdfa53fff irq 22 at device 3.0 on pci17
fwohci0: [ITHREAD]
fwohci0: OHCI version 1.10 (ROM=1)
fwohci0: No.
of Isochronous channels is 4.
fwohci0: EUI64 00:30:48:00:00:20:42:f6
fwohci0: Phy 1394a available S400, 2 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: on fwohci0
fwe0: on firewire0
if_fwe0: Fake Ethernet address: 02:30:48:20:42:f6
fwe0: Ethernet address: 02:30:48:20:42:f6
fwip0: on firewire0
fwip0: Firewire address: 00:30:48:00:00:20:42:f6 @ 0xfffe00000000, S400, maxrec 2048
dcons_crom0: on firewire0
dcons_crom0: bus_addr 0x80ab60
fwohci0: Initiate bus reset
fwohci0: fwohci_intr_core: BUS reset
fwohci0: fwohci_intr_core: node_id=0x00000000, SelfID Count=1, CYCLEMASTER mode
atapci1: port 0x4420-0x4427,0x4414-0x4417,0x4418-0x441f,0x4410-0x4413,0x4400-0x440f irq 23 at device 4.0 on pci17
atapci1: [ITHREAD]
ata10: on atapci1
ata10: [ITHREAD]
isab0: at device 31.0 on pci0
isa0: on isab0
atapci2: port 0x1c70-0x1c77,0x1c64-0x1c67,0x1c68-0x1c6f,0x1c60-0x1c63,0x1c00-0x1c1f mem 0xdfb22000-0xdfb227ff irq 17 at device 31.2 on pci0
atapci2: [ITHREAD]
atapci2: AHCI called from vendor specific driver
atapci2: AHCI v1.20 controller with 6 3Gbps ports, PM supported
ata11: on atapci2
ata11: [ITHREAD]
ata12: on atapci2
ata12: [ITHREAD]
ata13: on atapci2
ata13: [ITHREAD]
ata14: on atapci2
ata14: [ITHREAD]
ata15: on atapci2
ata15: [ITHREAD]
ata16: on atapci2
ata16: [ITHREAD]
ichsmb0: port 0x1100-0x111f mem 0xdfb23000-0xdfb230ff irq 17 at device 31.3 on pci0
ichsmb0: [ITHREAD]
smbus0: on ichsmb0
smb0: on smbus0
pci0: at device 31.6 (no driver attached)
acpi_button0: on acpi0
atrtc0: port 0x70-0x71 irq 8 on acpi0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: [FILTER]
uart0: console (115200,n,8,1)
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
uart1: [FILTER]
ichwd0: on isa0
ichwd0: Intel ICH9R watchdog timer (ICH9 or equivalent)
orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xc8fff on isa0
sc0: at flags 0x100 on
isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
coretemp0: on cpu0
est0: on cpu0
p4tcc0: on cpu0
coretemp1: on cpu1
est1: on cpu1
p4tcc1: on cpu1
coretemp2: on cpu2
est2: on cpu2
p4tcc2: on cpu2
coretemp3: on cpu3
est3: on cpu3
p4tcc3: on cpu3
ZFS filesystem version 5
ZFS storage pool version 28
Timecounters tick every 1.000 msec
firewire0: 1 nodes, maxhop <= 0 cable IRM irm(0) (me)
firewire0: bus manager 0
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 12Mbps Full Speed USB v1.0
usbus3: 480Mbps High Speed USB v2.0
usbus4: 12Mbps Full Speed USB v1.0
usbus5: 12Mbps Full Speed USB v1.0
usbus6: 12Mbps Full Speed USB v1.0
usbus7: 480Mbps High Speed USB v2.0
ugen0.1: at usbus0
uhub0: on usbus0
ugen1.1: at usbus1
uhub1: on usbus1
ugen2.1: at usbus2
uhub2: on usbus2
ugen3.1: at usbus3
uhub3: on usbus3
ugen4.1: at usbus4
uhub4: on usbus4
ugen5.1: at usbus5
uhub5: on usbus5
ugen6.1: at usbus6
uhub6: on usbus6
ugen7.1: at usbus7
uhub7: on usbus7
ad6: 1907729MB at ata3-master UDMA100 SATA 3Gb/s
ad8: 1907729MB at ata4-master UDMA100 SATA 3Gb/s
ad12: 1430799MB at ata6-master UDMA100 SATA 3Gb/s
ad14: 1907729MB at ata7-master UDMA100 SATA 3Gb/s
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
uhub4: 2 ports with 2 removable, self powered
uhub5: 2 ports with 2 removable, self powered
uhub6: 2 ports with 2 removable, self powered
ad16: 1907729MB at ata8-master UDMA100 SATA 3Gb/s
ad18: 1907729MB at ata9-master UDMA100 SATA 3Gb/s
ata10: DMA limited to UDMA33, controller found non-ATA66 cable
ad20: 3823MB at ata10-master UDMA33
ad21: 61136MB at ata10-slave UDMA133
ad26: 1907729MB at ata13-master UDMA100 SATA 3Gb/s
ad28: 1907729MB at ata14-master UDMA100 SATA 3Gb/s
uhub3: 6 ports with 6 removable, self powered
uhub7: 6 ports with 6 removable, self powered
ugen7.2: at usbus7
umass0: on usbus7
umass0:
SCSI over Bulk-Only; quirks = 0x0000
ugen3.2: at usbus3
da0 at mpt0 bus 0 scbus0 target 0 lun 0
da0: Fixed Direct Access SCSI-5 device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da1 at mpt0 bus 0 scbus0 target 1 lun 0
da1: Fixed Direct Access SCSI-5 device
da1: 300.000MB/s transfers
da1: Command Queueing enabled
da1: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da2 at mpt0 bus 0 scbus0 target 2 lun 0
da2: Fixed Direct Access SCSI-5 device
da2: 300.000MB/s transfers
da2: Command Queueing enabled
da2: 61136MB (125206528 512 byte sectors: 255H 63S/T 7793C)
da3 at mpt0 bus 0 scbus0 target 3 lun 0
da3: Fixed Direct Access SCSI-5 device
da3: 300.000MB/s transfers
da3: Command Queueing enabled
da3: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da4 at mpt0 bus 0 scbus0 target 4 lun 0
da4: Fixed Direct Access SCSI-5 device
da4: 300.000MB/s transfers
da4: Command Queueing enabled
da4: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da5 at mpt0 bus 0 scbus0 target 5 lun 0
da5: Fixed Direct Access SCSI-5 device
da5: 300.000MB/s transfers
da5: Command Queueing enabled
da5: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da6 at mpt0 bus 0 scbus0 target 6 lun 0
da6: Fixed Direct Access SCSI-5 device
da6: 300.000MB/s transfers
da6: Command Queueing enabled
da6: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da7 at mpt0 bus 0 scbus0 target 7 lun 0
da7: Fixed Direct Access SCSI-5 device
da7: 300.000MB/s transfers
da7: Command Queueing enabled
da7: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
Root mount waiting for: usbus7
umass0:2:0:-1: Attached to scbus2
da8 at umass-sim0 bus 0 scbus2 target 0 lun 0
da8: Fixed Direct Access SCSI-4 device
da8: 40.000MB/s transfers
da8: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
ugen7.3: at usbus7
uhub8: on usbus7
Root mount waiting for: usbus7
uhub8: 4 ports with 4 removable, self powered
ugen7.4: at usbus7
umass1: on usbus7
umass1: SCSI over Bulk-Only; quirks = 0x0000
Root mount waiting for: usbus7
umass1:3:1:-1: Attached to scbus3
da9 at umass-sim1 bus 1 scbus3 target 0 lun 0
da9: Fixed Direct Access SCSI-4 device
da9: 40.000MB/s transfers
da9: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
ugen7.5: at usbus7
uhub9: on usbus7
Root mount waiting for: usbus7
uhub9: 4 ports with 4 removable, self powered
Trying to mount root from zfs:prana

--
Stephane LAPIE, EPITA SRS, Promo 2005
"Even when they have digital readouts, I can't understand them."
--MegaTokyo

--------------enig8BE633BB83E59ACA09FD0D2A
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk38sS4ACgkQ24Ql8u6TF2MdTQCfXGnImFL+4qSWHbV2SW6Qk0DT
DkcAniV5OC8yVxhigvYA/4Cpb+UP1eNk
=6Q2i
-----END PGP SIGNATURE-----

--------------enig8BE633BB83E59ACA09FD0D2A--

From owner-freebsd-fs@FreeBSD.ORG Sat Jun 18 14:45:39 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 929831065672 for ; Sat, 18 Jun 2011 14:45:39 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta10.emeryville.ca.mail.comcast.net (qmta10.emeryville.ca.mail.comcast.net [76.96.30.17]) by mx1.freebsd.org (Postfix) with ESMTP id 7AC868FC1A for ; Sat, 18 Jun 2011 14:45:39 +0000 (UTC) Received:
from omta24.emeryville.ca.mail.comcast.net ([76.96.30.92]) by qmta10.emeryville.ca.mail.comcast.net with comcast id xSNA1g0051zF43QAASldPt; Sat, 18 Jun 2011 14:45:37 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta24.emeryville.ca.mail.comcast.net with comcast id xSl51g00u1t3BNj8kSl6hX; Sat, 18 Jun 2011 14:45:07 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id AFC32102C36; Sat, 18 Jun 2011 07:45:36 -0700 (PDT) Date: Sat, 18 Jun 2011 07:45:36 -0700 From: Jeremy Chadwick To: Stephane LAPIE Message-ID: <20110618144536.GA15627@icarus.home.lan> References: <4DFCB12A.6030805@darkbsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4DFCB12A.6030805@darkbsd.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, freebsd-drivers@freebsd.org, freebsd-hardware@freebsd.org Subject: Re: Problem with a LSILogic SAS/SATA adapter on 8.2-STABLE/ZFSv28 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Jun 2011 14:45:39 -0000 On Sat, Jun 18, 2011 at 11:07:38PM +0900, Stephane LAPIE wrote: > I have a problem with my 8.2-STABLE/ZFSv28 server. I am currently > upgrading my disks from 1.5TB Seagate drives to 2TB Seagate drives, and > therefore replacing devices within ZFS. (I have activated deduplication > on a few file systems, for the record) > > I think this is more related to a hardware problem (flaky memory ? flaky > controller/driver maybe ?), but I would appreciate any input. > > I experienced several kernel panics, all of which seem to point at mpt0 > mis-handling interrupts : > www.darkbsd.org/~darksoul/kernel-panic-mpt1.txt (no target cmd ptrs) > www.darkbsd.org/~darksoul/kernel-panic-mpt2.txt (mpt_intr index == ...) 
> www.darkbsd.org/~darksoul/kernel-panic-mpt3.txt (NMI in kernel mode)
> www.darkbsd.org/~darksoul/kernel-panic-mpt4.txt (LAN CONTEXT REPLY)
> www.darkbsd.org/~darksoul/kernel-panic-mpt5.txt (LAN CONTEXT REPLY)
> www.darkbsd.org/~darksoul/kernel-panic-mpt6.txt (LAN CONTEXT REPLY)
> www.darkbsd.org/~darksoul/kernel-panic-mpt7.txt (LAN CONTEXT REPLY)
>
> I would appreciate any pointers to what on earth "LAN CONTEXT REPLY"
> means for an LSI controller (using driver mpt(4)), as I have no idea,
> and the source was not really helpful.
>
> The error message about an NMI and RAM parity error is what is scaring
> me the most here, and points me in the direction of flaky memory.
>
> This is a personal machine, so I can add debug options and try stuff if
> it can help figure out what is going on. Also, any critical data is
> replicated, backed up and accounted for.

For readers, the NMI and RAM parity error message in question is shown here:

http://www.darkbsd.org/~darksoul/kernel-panic-mpt2.txt

But it is difficult to decode due to the well-established problem with the FreeBSD kernel interspersing text output. (I imagine this gets worse the more cores you have on your system, but that's not relevant to this discussion)

Anyway, to expand on the "RAM parity error" and NMI message: this information I'm going to give you isn't specific to the LSI controller; it's a general piece of information. I've talked about this in the past. Please read it and focus on the SERR/PERR and NMI details:

http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010938.html

If you want to rule out actual system RAM issues, I would recommend running memtest86 for about 30 minutes, and then memtest86+ for the same amount of time. This might sound crazy ("why can't I just run one?!"), but you need to review the ChangeLog for memtest86 to see why.
Their support for detecting corrected ECC errors was removed with 4.0, but in 4.0 they added multi-CPU support (which is good to have in this situation), while memtest86 may still have support for ECC. Neither of these utilities are as excellent as a hardware RAM tester (which does cool things like sending extreme amounts of voltage through each DRAM module, looks for soft and hard errors, etc.), but those are expensive. Usually system memory problems will show up in memtest86/86+ pretty quickly though. All that said: it may be possible that the NMIs you're seeing aren't being induced by system RAM issues at all, but somehow are being generated or caused by the LSI controller. I wasn't under the impression that a PCIe MSI and/or MSI-X generated an NMI, but I could be completely wrong. You may want to try the memtest86/86+ tests with and without the LSI controller plugged into the system to see if there's any difference as well. So that's another hour of testing. Anyway, hope this helps in some regard. P.S. -- In the future, try to avoid cross-posting. :-) -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sat Jun 18 22:18:46 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2B1DF1065677 for ; Sat, 18 Jun 2011 22:18:46 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id D96FA8FC1B for ; Sat, 18 Jun 2011 22:18:45 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap0EAP4i/U2DaFvO/2dsb2JhbABShEmjDYhzq0SPdYErg3WBCgSRXpAd X-IronPort-AV: E=Sophos;i="4.65,387,1304308800"; d="scan'208";a="128325995" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 18 Jun 2011 18:18:44 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id D83CDB3F07; Sat, 18 Jun 2011 18:18:44 -0400 (EDT) Date: Sat, 18 Jun 2011 18:18:44 -0400 (EDT) From: Rick Macklem To: Alexander Leidinger Message-ID: <1829558003.762155.1308435524848.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110618115448.00004b7f@unknown> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: FreeBSD FS Subject: Re: RFC: don't allow any access to unexported mounts for NFSv4 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Jun 2011 22:18:46 -0000 > On Fri, 17 Jun 2011 11:41:24 -0400 (EDT) Rick Macklem > wrote: > > > > The workarounds you propose contradict everything people are used > > > to. 
They are not easy or you need to care what you put in the parent
> > > directories of the one you want to export. It basically means that
> > > you can only use NFSv4 on newly setup systems, upgraded or migrated
> > > ones look out of the question (yes, I'm over-simplificating a bit).
> > >
> > > I really hope someone can come up with a fix for this, else it would
> > > mean I would not use NFSv4 anywhere.
> > >
> > Ok, can I assume that's a vote for "leave the hack in"?
>
> If the pain to let the hack in is not too big: yes, please let it in.
>
No pain at all. I just wanted to check to see what people thought of it.

(I can easily add the Access case for Linux mounts and also a small patch that disallows lookups of regular files. With this, all clients can do is look up dirs and get their attributes and access info. Neither Read nor Readdir is allowed, so clients must know/guess names.)

Thanks for the input, rick
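
For readers of the archive, the restriction described in the parenthetical above can be pictured with a toy model. This is only a sketch under stated assumptions: it is not code from the FreeBSD NFS server, and the function name, the set name and the string labels for the operations are all made up for illustration.

```python
# Toy model of the rule for NFSv4 requests that hit a file system which is
# mounted on the server but not exported. Hypothetical names throughout;
# this is NOT taken from the FreeBSD nfsd sources.

# Operations that stay permitted, and only when the target is a directory.
DIR_ONLY_OPS = {"Lookup", "Getattr", "Access"}


def allowed_on_unexported_mount(op, target_is_dir):
    """Return True if `op` may proceed on an unexported mount.

    Lookups of regular files are refused, and so are Read and Readdir,
    which means a client cannot list anything and has to know (or guess)
    each directory name on the path down to the exported file system.
    """
    return target_is_dir and op in DIR_ONLY_OPS
```

With this model, `allowed_on_unexported_mount("Lookup", True)` passes while a Lookup of a regular file, or any Read or Readdir, is refused.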