From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 00:15:54 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 52A0A6D1 for ; Sun, 9 Jun 2013 00:15:54 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) by mx1.freebsd.org (Postfix) with ESMTP id CE40C1FA0 for ; Sun, 9 Jun 2013 00:15:53 +0000 (UTC) Received: from mfilter16-d.gandi.net (mfilter16-d.gandi.net [217.70.178.144]) by relay5-d.mail.gandi.net (Postfix) with ESMTP id 67E4541C056; Sun, 9 Jun 2013 02:15:36 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter16-d.gandi.net Received: from relay5-d.mail.gandi.net ([217.70.183.197]) by mfilter16-d.gandi.net (mfilter16-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id q9rpAwHWpxl3; Sun, 9 Jun 2013 02:15:34 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay5-d.mail.gandi.net (Postfix) with ESMTPSA id 2F0BD41C054; Sun, 9 Jun 2013 02:15:34 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id 548CB73A1C; Sat, 8 Jun 2013 17:15:32 -0700 (PDT) Date: Sat, 8 Jun 2013 17:15:32 -0700 From: Jeremy Chadwick To: Steven Hartland Subject: Re: Changing the default for ZFS atime to off? Message-ID: <20130609001532.GA21540@icarus.home.lan> References: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> <20130608213331.GB18201@icarus.home.lan> <01719722FD8A41B4A4366611972A703A@multiplay.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <01719722FD8A41B4A4366611972A703A@multiplay.co.uk> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 00:15:54 -0000 On Sun, Jun 09, 2013 at 12:34:29AM +0100, Steven Hartland wrote: > ----- Original Message ----- From: "Jeremy Chadwick" > > To: "Steven Hartland" > Cc: > Sent: Saturday, June 08, 2013 10:33 PM > Subject: Re: Changing the default for ZFS atime to off? > > > >On Sat, Jun 08, 2013 at 07:54:04PM +0100, Steven Hartland wrote: > >>One of the first changes we make here when installing machines > >>here to changing atime=off on all ZFS pool roots. > >> > >>I know there are a few apps which can rely on atime updates > >>such as qmail and possibly postfix, but those seem like special > >>cases for which admins should enable atime instead of the other > >>way round. > >> > >>This is going to of particular interest for flash based storage > >>which should avoid unnessacary writes to reduce wear, but it will > >>also help improve performance in general. > >> > >>So what do people think is it worth considering changing the > >>default from atime=on to atime=off moving forward? > >> > >>If so what about UFS, same change? > > > >I **strongly** oppose this change, for one key reason: the classic > >Berkeley UNIX mail spool format (known as "mbox"), which is still > >predominantly used on most UNIX systems today. > > > >Mail clients which read mbox files require a combination of atime and > >mtime to determine if new mail has arrived within the mailbox. If > >mtime > atime, then there's new mail. 
Not all mail clients support > >alternate methods of detection (for example mutt has check_mbox_size, > >which has had bugs/problems in the past (Google check_mbox_size), > >and is fallible in other ways). > .. > > To clarify when I say "by default" this only effect newly created > pools / volumes, it would not effect any existing volumes and hence > couldn't break existing installs. > > As I mentioned there are apps, mainly mail focused ones, which rely > on on atime, but thats easy to keep working by ensuring these are > stored on volumes which do have atime=on. The problem is that your proposed change (to set atime=off as the default) means the administrator: 1. Has to be aware that the default is now atime=off going forward, and thus, 2. Must manually set atime=on on filesystems where it matters, which may also mean creating a separate filesystem just for certain purposes/tasks (which may not be possible with UFS after-the-fact). The reality of #1, I'm sorry to say, is that barring some kind of mass announcement on every single FreeBSD mailing list (I don't mean just -announce, I mean EVERY LIST) to inform people of this change, as well as some gigantic 72pt font text on www.freebsd.org telling people, most people are not going to know about it. I know that reality doesn't work in your favour, but it's how things are. A single line in the Release Notes is going to be overlooked. I cannot even begin to cover all the situations/cases of #2, so I'll just do a brain dump as I think: i) ZFS: You might think this is as easy as creating a separate filesystem that's for /var/mail -- it is not that simple. Many people have their mail delivered to mboxes within $HOME, i.e. ~user/Mail, and /var/mail never gets used. It worsens when you consider people are being insane with ZFS filesystems, such as creating a separate filesystem for every single user on the system. ii) With UFS, you might think it's as easy as removing noatime from /etc/fstab for /var, but it isn't -- same situation as (i). iii) There is the situation with UFS and bsdinstall where you can choose the "quick and easy" partitioning/filesystem setu results in one big / and that's all. Now the admin has to remove noatime from /etc/fstab and basically loses any benefit noatime provided per your proposal. iv) It is very common for setups to have two separate places for mail storage, i.e. the default is /var/mail/username, but users with a .forward and/or .procmailrc may be siphoning mail to $HOME/Mail/folder instead. So now you have two filesystems where atime needs to be enabled. v) Non-mail-related stuff, meaning there may actually be users and administrators who rely upon access times to indicate something. None of these touche base on what Bruce Evans stated too: that atime=on by default is a requirement to be POSIX-compliant. That's also confirmed here at Wikipedia WRT stat(2) (which also mentions some other software that relies on atime too): http://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime > The messaging and changes to installers which support ZFS root > installs, such as mfsbsd, would need to be included in this but > I don't see that as a blocker. See above -- I think you are assuming mail always gets stored on one filesystem, which quite often not the case. > I suggesting this now as it seems like its time to consider that > the vast majority of systems don't need this option for all volumes > and the performance and reliability of systems are in question if > we don't consider it. 
My personal feeling is that this is extremely hasty -- do we have any idea how much software relies on atime? Because I certainly don't. Sorry for sounding rude (I don't mean to be, I just can't be bothered to phrase it differently), but: were you yourself even aware that atime was relied upon/used for classic UNIX mailboxes? I get the impression you weren't, which just strengthens my point. For example, I use atime everywhere, simply because I do not know what might break/stop working reliably if atime was disabled on some filesystems. I do not know the internals of every single daemon and program on a system (does anyone?), so I must take the stance of choosing stability/reliability. All said and done: I do appreciate having this discussion, particularly publicly on a list. Too many "key changes" in FreeBSD in the past few years have been results of closed-door meetings of sorts (private mail or in-person *con meetings), so the fact this is public is good. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 00:49:01 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 82AC2DB9 for ; Sun, 9 Jun 2013 00:49:01 +0000 (UTC) (envelope-from prvs=18721298a7=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 26ECD11CC for ; Sun, 9 Jun 2013 00:49:00 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004223432.msg for ; Sun, 09 Jun 2013 01:48:59 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sun, 09 Jun 2013 01:48:59 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=18721298a7=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: fs@freebsd.org Message-ID: <459E2FCADB4E40079066E4ABDBE47AFE@multiplay.co.uk> From: "Steven Hartland" To: "Jeremy Chadwick" References: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> <20130608213331.GB18201@icarus.home.lan> <01719722FD8A41B4A4366611972A703A@multiplay.co.uk> <20130609001532.GA21540@icarus.home.lan> Subject: Re: Changing the default for ZFS atime to off? Date: Sun, 9 Jun 2013 01:48:57 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 00:49:01 -0000 ----- Original Message ----- From: "Jeremy Chadwick" >> To clarify when I say "by default" this only effect newly created >> pools / volumes, it would not effect any existing volumes and hence >> couldn't break existing installs. >> >> As I mentioned there are apps, mainly mail focused ones, which rely >> on on atime, but thats easy to keep working by ensuring these are >> stored on volumes which do have atime=on. 
> > The problem is that your proposed change (to set atime=off as the > default) means the administrator: > > 1. Has to be aware that the default is now atime=off going forward, > and thus, > > 2. Must manually set atime=on on filesystems where it matters, which may > also mean creating a separate filesystem just for certain > purposes/tasks (which may not be possible with UFS after-the-fact). > > The reality of #1, I'm sorry to say, is that barring some kind of mass > announcement on every single FreeBSD mailing list (I don't mean just > -announce, I mean EVERY LIST) to inform people of this change, as well > as some gigantic 72pt font text on www.freebsd.org telling people, most > people are not going to know about it. I know that reality doesn't work > in your favour, but it's how things are. A single line in the Release > Notes is going to be overlooked. > > I cannot even begin to cover all the situations/cases of #2, so I'll > just do a brain dump as I think: > > i) ZFS: You might think this is as easy as creating a separate > filesystem that's for /var/mail -- it is not that simple. Many people > have their mail delivered to mboxes within $HOME, i.e. ~user/Mail, and > /var/mail never gets used. It worsens when you consider people are > being insane with ZFS filesystems, such as creating a separate > filesystem for every single user on the system. > > ii) With UFS, you might think it's as easy as removing noatime from > /etc/fstab for /var, but it isn't -- same situation as (i). > > iii) There is the situation with UFS and bsdinstall where you can choose > the "quick and easy" partitioning/filesystem setu results in one big / > and that's all. Now the admin has to remove noatime from /etc/fstab and > basically loses any benefit noatime provided per your proposal. The initial question was for ZFS, with UFS being secondary, but yes UFS isn't as easy as UFS. > iv) It is very common for setups to have two separate places for mail > storage, i.e. the default is /var/mail/username, but users with a > .forward and/or .procmailrc may be siphoning mail to $HOME/Mail/folder > instead. So now you have two filesystems where atime needs to be > enabled. Could that not be covered by: /var /home for the common case at least? > v) Non-mail-related stuff, meaning there may actually be users and > administrators who rely upon access times to indicate something. > > None of these touche base on what Bruce Evans stated too: that atime=on > by default is a requirement to be POSIX-compliant. That's also > confirmed here at Wikipedia WRT stat(2) (which also mentions some other > software that relies on atime too): > > http://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime So yes others think its a less than stellar idea ;-) >> The messaging and changes to installers which support ZFS root >> installs, such as mfsbsd, would need to be included in this but >> I don't see that as a blocker. > > See above -- I think you are assuming mail always gets stored on one > filesystem, which quite often not the case. Its still seems simple to fix, see above. >> I suggesting this now as it seems like its time to consider that >> the vast majority of systems don't need this option for all volumes >> and the performance and reliability of systems are in question if >> we don't consider it. > > My personal feeling is that this is extremely hasty -- do we have any > idea how much software relies on atime? Because I certainly don't. 
Hasty no, just opening the idea up for discussion ;-) > Sorry for sounding rude (I don't mean to be, I just can't be bothered to > phrase it differently), but: were you yourself even aware that atime was > relied upon/used for classic UNIX mailboxes? I get the impression you > weren't, which just strengthens my point. Yes I am aware, which is why I mentioned mail in my original post. > For example, I use atime everywhere, simply because I do not know what > might break/stop working reliably if atime was disabled on some > filesystems. I do not know the internals of every single daemon and > program on a system (does anyone?), so I must take the stance of > choosing stability/reliability. I did already mention, we set atime=off on everything and have never had an issue, there's been similar mentions on the illumos list too. Now that doesn't mean its suitable for everthing, mail has already been mentioned, but thats still seems like a small set of use cases where its required. I guess where I'm coming from is making better for the vast majority. I believe there's no point in configuring for a rare case by default when it will make the much more common case worse. > All said and done: I do appreciate having this discussion, particularly > publicly on a list. Too many "key changes" in FreeBSD in the past few > years have been results of closed-door meetings of sorts (private mail > or in-person *con meetings), so the fact this is public is good. Everyone has their different uses of any OS, different experience etc, so things like this need open discussion IMO. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 01:04:47 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2EC1B1C7 for ; Sun, 9 Jun 2013 01:04:47 +0000 (UTC) (envelope-from cross+freebsd@distal.com) Received: from mail.distal.com (mail.distal.com [IPv6:2001:470:e24c:200::ae25]) by mx1.freebsd.org (Postfix) with ESMTP id DB8A41254 for ; Sun, 9 Jun 2013 01:04:46 +0000 (UTC) Received: from magrathea.distal.com (magrathea.distal.com [IPv6:2001:470:e24c:200:ea06:88ff:feca:960e]) (authenticated bits=0) by mail.distal.com (8.14.3/8.14.3) with ESMTP id r5914iip011649 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Sat, 8 Jun 2013 21:04:45 -0400 (EDT) Subject: Re: Changing the default for ZFS atime to off? 
Mime-Version: 1.0 (Mac OS X Mail 6.3 \(1503\)) Content-Type: text/plain; charset=us-ascii From: Chris Ross X-Priority: 3 In-Reply-To: <459E2FCADB4E40079066E4ABDBE47AFE@multiplay.co.uk> Date: Sat, 8 Jun 2013 21:04:44 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: References: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> <20130608213331.GB18201@icarus.home.lan> <01719722FD8A41B4A4366611972A703A@multiplay.co.uk> <20130609001532.GA21540@icarus.home.lan> <459E2FCADB4E40079066E4ABDBE47AFE@multiplay.co.uk> To: "Steven Hartland" X-Mailer: Apple Mail (2.1503) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.2 (mail.distal.com [IPv6:2001:470:e24c:200::ae25]); Sat, 08 Jun 2013 21:04:45 -0400 (EDT) Cc: fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 01:04:47 -0000

I agree strongly with Jeremy's general opinion. But, am far less established in the community, so only wanted to make a couple of small points.

On Jun 8, 2013, at 20:48 , "Steven Hartland" wrote:
> I guess where I'm coming from is making better for the vast majority.
>
> I believe there's no point in configuring for a rare case by default
> when it will make the much more common case worse.

I think the point being made, and certainly in my mind reading this thread, is that you're considering the "rare" case to be more rare than you factually know it to be, and more importantly (IMO), you're considering "worse" on something that I consider a very small issue. I understand the reasons we chose to turn off atime (by adding it to the kernel, at the time, in 1994) at UUNET for the USENET filesystems. It was just too much activity. But, for a less than 110% active system, and given the relatively small number of things that are accessed far more often than they're updated, I just don't think it's that big of an issue.

And, yes, I'm aware of the flash write issue, and I side with turning off there, though I wouldn't by default. (And, defaulting filesystem parameters based on some impression of the underlying hardware seems risky at best anyway.) I think there are a small number of cases where it's an issue, and those people, yourself included, already know how to solve the problem. Myself, personally, running only small systems, have never turned off atime updates. Don't feel any need to. For specific heavy-load production systems, _everything_ is looked at with a fine-toothed comb. No reason to "default" something that only those systems need.

>> All said and done: I do appreciate having this discussion, particularly
>> publicly on a list. Too many "key changes" in FreeBSD in the past few
>> years have been results of closed-door meetings of sorts (private mail
>> or in-person *con meetings), so the fact this is public is good.
>
> Everyone has their different uses of any OS, different experience etc,
> so things like this need open discussion IMO.

I agree very much, and while my opinions may not match many others, I've been very pleased to read this discussion. Thank you for bringing it up.
- Chris From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 02:58:45 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 3FC07E2A for ; Sun, 9 Jun 2013 02:58:45 +0000 (UTC) (envelope-from prvs=18721298a7=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id D9DDA190C for ; Sun, 9 Jun 2013 02:58:44 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004224407.msg for ; Sun, 09 Jun 2013 03:58:43 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sun, 09 Jun 2013 03:58:43 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=18721298a7=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: fs@freebsd.org Message-ID: <798D298E63D34820AF2D804E6123997B@multiplay.co.uk> From: "Steven Hartland" To: , References: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> <20130608200522.GA77122@neutralgood.org> Subject: Re: Changing the default for ZFS atime to off? Date: Sun, 9 Jun 2013 03:58:38 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 02:58:45 -0000 ----- Original Message ----- From: > On Sat, Jun 08, 2013 at 07:54:04PM +0100, Steven Hartland wrote: >> One of the first changes we make here when installing machines >> here to changing atime=off on all ZFS pool roots. >> >> I know there are a few apps which can rely on atime updates >> such as qmail and possibly postfix, but those seem like special >> cases for which admins should enable atime instead of the other >> way round. > > I believe mutt also uses them. Basically, any mail program using mbox mail > folders uses them to correctly report which mailboxes have not been read > yet. > > There are probably other cases as well. I don't think they should be > discounted simply because nobody here who bothers to speak up runs into > them. > > Turning off atime creates surprises for users. > >> This is going to of particular interest for flash based storage >> which should avoid unnessacary writes to reduce wear, but it will >> also help improve performance in general. >> >> So what do people think is it worth considering changing the >> default from atime=on to atime=off moving forward? > > I vote no. At least, don't change it unless the filesystem is actually on > a flash device. Otherwise we risk breakage down the road because something > that used to work doesn't work on a fresh FreeBSD install. I don't think having different defaults for different disks would be a good thing as that would just cause confusion. Would updating the installers to enable atime on the volumes that require it be an acceptable solution? > Has anyone done any kind of study to see exactly how much I/O is caused > by having atime updates be enabled? 
Does it _really_ make that much of > a difference to performance, and would it _really_ help prolong the life > of flash devices? I've just done some a very basic tests here on an 8.3-RELEASE machine:- 1. make buildkernel # atime=on adds 2k writes totalling 27MB 2. find /usr/src # atime=on adds 100 writes totaling 3MB Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 02:59:44 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4B6EFEBF; Sun, 9 Jun 2013 02:59:44 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: from mail-pd0-f173.google.com (mail-pd0-f173.google.com [209.85.192.173]) by mx1.freebsd.org (Postfix) with ESMTP id 262121919; Sun, 9 Jun 2013 02:59:43 +0000 (UTC) Received: by mail-pd0-f173.google.com with SMTP id v14so2328773pde.18 for ; Sat, 08 Jun 2013 19:59:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=bK7ZJ30mOgv+o+SdSrCGZ6glYF6nqX0U4ob2VewHOSw=; b=q85BECHlYON5f9ydymr1f/uF0gziErmNIbGERxcnSLdnYxXX2t/5HRnurZblx8VCE9 rbGJUAntksUi/SemdEQ6WNlRKM+/BGPQhBm1J/0zpvNnSY0GfMRuNU3mUgetm1nkNU7B 34h7v7mI5fiPYWltWBOm13yp5gElRmO/rY2M/S2q/d2oAYobp0cWMQCAXVo7umbXZW1K 34uUBbRKwEtXwpwY8HX3gdNR3xCBmVD4hrC10cKM0CUtYP3VzYRQl087yorCMkJwtaEn RgJ9/iWt3tI+YYugeyd++IknPM3u2xuhpyrsm4VVdDCg0cxHoLxjmrz8O2ts265YcjL2 +/6A== MIME-Version: 1.0 X-Received: by 10.66.175.205 with SMTP id cc13mr8616764pac.191.1370746783517; Sat, 08 Jun 2013 19:59:43 -0700 (PDT) Received: by 10.70.31.195 with HTTP; Sat, 8 Jun 2013 19:59:43 -0700 (PDT) In-Reply-To: <20130608213331.GB18201@icarus.home.lan> References: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> <20130608213331.GB18201@icarus.home.lan> Date: Sat, 8 Jun 2013 21:59:43 -0500 Message-ID: Subject: Re: Changing the default for ZFS atime to off? From: Adam Vande More To: Jeremy Chadwick Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Steven Hartland , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 02:59:44 -0000 On Sat, Jun 8, 2013 at 4:33 PM, Jeremy Chadwick wrote: > I **strongly** oppose this change, for one key reason: the classic > Berkeley UNIX mail spool format (known as "mbox"), which is still > predominantly used on most UNIX systems today. > > Mail clients which read mbox files require a combination of atime and > mtime to determine if new mail has arrived within the mailbox. If > mtime > atime, then there's new mail. Not all mail clients support > alternate methods of detection (for example mutt has check_mbox_size, > which has had bugs/problems in the past (Google check_mbox_size), > and is fallible in other ways). 
> > Further points: > > - FreeBSD comes with sendmail (MTA/MDA), which supports only mbox > natively > - FreeBSD comes with mail/Mail/mailx (client), which only supports > only mbox natively > - FreeBSD comes with biff/comsat, as well as from(1), which supports > only mbox natively > Most modern linuce use relatime eg the benefits of noatime and preserving functionality for mail stuff. https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Power_Management_Guide/Relatime.html -- Adam Vande More From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 03:04:58 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 601E01BE; Sun, 9 Jun 2013 03:04:58 +0000 (UTC) (envelope-from delphij@gmail.com) Received: from mail-qc0-x22f.google.com (mail-qc0-x22f.google.com [IPv6:2607:f8b0:400d:c01::22f]) by mx1.freebsd.org (Postfix) with ESMTP id 18D1D1A86; Sun, 9 Jun 2013 03:04:58 +0000 (UTC) Received: by mail-qc0-f175.google.com with SMTP id k14so2355367qcv.20 for ; Sat, 08 Jun 2013 20:04:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=Q2fwWn7BP/WwDilp1kAoUkHimjEBFLi5yc+0oBcmmnU=; b=Je0bGJ+IXL/7nrWYrbqCKLMkEJilfMJ+HIiAR4CD5nz86/FfnpDQlR31ytpm3IRBrY TfgFsc32eNieISaCSD39RmKDkOznQSjvH3MJD4mxxmFuidsovR3pN9prNSV09/t5/Z2Q ZbweUQkYrzkEDGzpRUBj3rygo2laNLMS32UcTbT9MU+jxMZNQBXk5ZSI/kRCj6wopMF/ nKNwU3snPdsJTsgkSj7pBALLKZDgFo7k/0I4h2qwS3HOIvI1KKRP7Gt7LLn8K0FWMoZc k3Zy6CkDqiapAD763Y2Nc/33qmiqLMfYvt1oPqYE1ZdwveaQXdBxxCIPOOSJSO4eZzTk 8LCw== MIME-Version: 1.0 X-Received: by 10.224.51.7 with SMTP id b7mr8862370qag.8.1370747097551; Sat, 08 Jun 2013 20:04:57 -0700 (PDT) Received: by 10.49.42.73 with HTTP; Sat, 8 Jun 2013 20:04:57 -0700 (PDT) In-Reply-To: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> References: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> Date: Sat, 8 Jun 2013 20:04:57 -0700 Message-ID: Subject: Re: Changing the default for ZFS atime to off? 
From: Xin LI To: Steven Hartland Content-Type: text/plain; charset=UTF-8 Cc: fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 03:04:58 -0000 I'd suggest implementing relative atime in VFS layer first: https://github.com/delphij/freebsd/commit/6a199821fbdbf424027499d4a0f8f113f6943e16 From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 03:14:23 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 99DE83C9 for ; Sun, 9 Jun 2013 03:14:23 +0000 (UTC) (envelope-from prvs=18721298a7=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 3F9AF1B0E for ; Sun, 9 Jun 2013 03:14:22 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004224506.msg for ; Sun, 09 Jun 2013 04:14:22 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sun, 09 Jun 2013 04:14:22 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=18721298a7=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: fs@freebsd.org Message-ID: <8C34552BD7074953A74E0443BAD1CCB7@multiplay.co.uk> From: "Steven Hartland" To: "Adam Vande More" , "Jeremy Chadwick" References: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> <20130608213331.GB18201@icarus.home.lan> Subject: Re: Changing the default for ZFS atime to off? Date: Sun, 9 Jun 2013 04:14:18 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 03:14:23 -0000 ----- Original Message ----- From: "Adam Vande More" To: "Jeremy Chadwick" Cc: "Steven Hartland" ; Sent: Sunday, June 09, 2013 3:59 AM Subject: Re: Changing the default for ZFS atime to off? > On Sat, Jun 8, 2013 at 4:33 PM, Jeremy Chadwick wrote: > >> I **strongly** oppose this change, for one key reason: the classic >> Berkeley UNIX mail spool format (known as "mbox"), which is still >> predominantly used on most UNIX systems today. >> >> Mail clients which read mbox files require a combination of atime and >> mtime to determine if new mail has arrived within the mailbox. If >> mtime > atime, then there's new mail. Not all mail clients support >> alternate methods of detection (for example mutt has check_mbox_size, >> which has had bugs/problems in the past (Google check_mbox_size), >> and is fallible in other ways). 
>> >> Further points: >> >> - FreeBSD comes with sendmail (MTA/MDA), which supports only mbox >> natively >> - FreeBSD comes with mail/Mail/mailx (client), which only supports >> only mbox natively >> - FreeBSD comes with biff/comsat, as well as from(1), which supports >> only mbox natively >> > > Most modern linuce use relatime eg the benefits of noatime and preserving > functionality for mail stuff. > > https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Power_Management_Guide/Relatime.html Now thats a clever idea, like it. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 04:46:20 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 0EF2E478 for ; Sun, 9 Jun 2013 04:46:20 +0000 (UTC) (envelope-from rcartwri@asu.edu) Received: from mail-wg0-x22e.google.com (mail-wg0-x22e.google.com [IPv6:2a00:1450:400c:c00::22e]) by mx1.freebsd.org (Postfix) with ESMTP id 9C0F71045 for ; Sun, 9 Jun 2013 04:46:19 +0000 (UTC) Received: by mail-wg0-f46.google.com with SMTP id c11so574758wgh.1 for ; Sat, 08 Jun 2013 21:46:18 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:x-gm-message-state; bh=crxu/YAp6wm4o+Ec8vQytjho0IEdl5iVyOIoflllzRU=; b=l/EXFp0qriZlYFSiXogmkklJsfMrf3DKXMhr8E8K4ffJE8FZIGllA+iuQNtSo/4YGh 8gfmqjDA2GaAtjqMlcMJvTCKrufjwV6Vk6UPXDQmR5p6ImZNrwaMms2GLcBoZ9Tx3UQw AmGUiOHYvQcwaI0rJuk6uMebyoakv3CGyXk+8IIl9lE+X1wI6CBkreNcsSENEOBQ4EcU bUUvBLHzzLHZ+RXazXmUhk/1DNNSSoz4gk+wFCBoDPwoAXVvPLorTKnngOWf8vayA37m ODvRJ0ygVoWMDDl0PmLp5Cb4/Obs8s7h9kfSYF/jtFv7BMtAIIcpWbF5MeZTZk6D3/BD yDVg== MIME-Version: 1.0 X-Received: by 10.180.185.44 with SMTP id ez12mr2041578wic.7.1370753178216; Sat, 08 Jun 2013 21:46:18 -0700 (PDT) Received: by 10.180.76.114 with HTTP; Sat, 8 Jun 2013 21:46:18 -0700 (PDT) In-Reply-To: References: Date: Sat, 8 Jun 2013 21:46:18 -0700 Message-ID: Subject: Re: ZFS and Glabel From: "Reed A. Cartwright" To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Gm-Message-State: ALoCoQnYp8BcflJUSX0m8B6NbC9U3wMOAwwPbVIE7Wp3165LWSROFEXHx2TGB169IOXi8K5pBoYr X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 04:46:20 -0000 I'm looking at my dmesg.boot to figure out what settings I need to wire down my HDDs. I read the cam(4) documentation but I'm not sure I know what I'm doing. Any advice would be helpful. Let's assume that I want to wire everything down to their current positions, what should I put in loader.conf? I'll paste below some of my hardware configuration and lines from dmesg.boot that I think I need to look at. I have 4 LSI cards in the system: mps0, mps1, mps2, mps3. 
mps0: port 0xd000-0xd0ff mem 0xdff3c000-0xdff3ffff,0xdff40000-0xdff7ffff irq 24 at device 0.0 on pci5 mps1: port 0xc000-0xc0ff mem 0xdfe3c000-0xdfe3ffff,0xdfe40000-0xdfe7ffff irq 44 at device 0.0 on pci4 mps2: port 0xb000-0xb0ff mem 0xdfd3c000-0xdfd3ffff,0xdfd40000-0xdfd7ffff irq 32 at device 0.0 on pci3 mps3: port 0xe000-0xe0ff mem 0xdbf3c000-0xdbf3ffff,0xdbf40000-0xdbf7ffff irq 56 at device 0.0 on pci65 I have drives attached to two of those cards: da0 at mps0 bus 0 scbus0 target 0 lun 0 da1 at mps0 bus 0 scbus0 target 1 lun 0 da2 at mps0 bus 0 scbus0 target 2 lun 0 da3 at mps0 bus 0 scbus0 target 3 lun 0 da4 at mps0 bus 0 scbus0 target 4 lun 0 da5 at mps0 bus 0 scbus0 target 5 lun 0 da6 at mps0 bus 0 scbus0 target 6 lun 0 da7 at mps0 bus 0 scbus0 target 7 lun 0 da8 at mps3 bus 0 scbus9 target 0 lun 0 da9 at mps3 bus 0 scbus9 target 1 lun 0 da10 at mps3 bus 0 scbus9 target 2 lun 0 da11 at mps3 bus 0 scbus9 target 3 lun 0 da12 at mps3 bus 0 scbus9 target 4 lun 0 # camcontrol devlist -v scbus0 on mps0 bus 0: at scbus0 target 0 lun 0 (pass8,da7) at scbus0 target 1 lun 0 (pass9,da8) at scbus0 target 2 lun 0 (pass6,da5) at scbus0 target 3 lun 0 (pass7,da6) at scbus0 target 4 lun 0 (pass13,da12) at scbus0 target 5 lun 0 (pass12,da11) at scbus0 target 6 lun 0 (pass11,da10) at scbus0 target 7 lun 0 (pass10,da9) scbus1 on mps1 bus 0: scbus2 on mps2 bus 0: scbus3 on ahcich0 bus 0: <> at scbus3 target -1 lun -1 () scbus4 on ahcich1 bus 0: <> at scbus4 target -1 lun -1 () scbus5 on ahcich2 bus 0: <> at scbus5 target -1 lun -1 () scbus6 on ahcich3 bus 0: <> at scbus6 target -1 lun -1 () scbus7 on ata0 bus 0: <> at scbus7 target -1 lun -1 () scbus8 on ata1 bus 0: <> at scbus8 target -1 lun -1 () scbus9 on mps3 bus 0: at scbus9 target 0 lun 0 (da0,pass0) at scbus9 target 1 lun 0 (da1,pass1) at scbus9 target 2 lun 0 (da2,pass2) at scbus9 target 3 lun 0 (da3,pass3) at scbus9 target 4 lun 0 (da4,pass4) scbus10 on umass-sim0 bus 0: at scbus10 target 0 lun 0 (cd0,pass5) scbus-1 on xpt0 bus 0: <> at scbus-1 target -1 lun -1 (xpt0) # zpool status pool: storage state: ONLINE scan: scrub repaired 0 in 18h56m with 0 errors on Mon May 13 22:00:51 2013 config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz2-0 ONLINE 0 0 0 da6 ONLINE 0 0 0 da5 ONLINE 0 0 0 da8 ONLINE 0 0 0 da7 ONLINE 0 0 0 da9 ONLINE 0 0 0 da10 ONLINE 0 0 0 da11 ONLINE 0 0 0 da12 ONLINE 0 0 0 cache da3 ONLINE 0 0 0 errors: No known data errors pool: zroot state: ONLINE scan: scrub repaired 0 in 0h29m with 0 errors on Mon May 13 03:34:18 2013 config: NAME STATE READ WRITE CKSUM zroot ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 gptid/8e7b4f79-7367-11e1-8722-00259058939a ONLINE 0 0 0 gptid/910120c5-7367-11e1-8722-00259058939a ONLINE 0 0 0 errors: No known data errors -- Reed A. Cartwright, PhD Assistant Professor of Genomics, Evolution, and Bioinformatics School of Life Sciences Center for Evolutionary Medicine and Informatics The Biodesign Institute Arizona State University - Address: The Biodesign Institute, PO Box 875301, Tempe, AZ 85287-5301 USA Packages: The Biodesign Institute, 1001 S. 
McAllister Ave, Tempe, AZ 85287-5301 USA Office: Biodesign A-224A, 1-480-965-9949 From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 06:54:48 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 9BEB0F86 for ; Sun, 9 Jun 2013 06:54:48 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net [217.70.183.195]) by mx1.freebsd.org (Postfix) with ESMTP id 22FEE1AE9 for ; Sun, 9 Jun 2013 06:54:47 +0000 (UTC) Received: from mfilter4-d.gandi.net (mfilter4-d.gandi.net [217.70.178.134]) by relay3-d.mail.gandi.net (Postfix) with ESMTP id 12A71A80B9; Sun, 9 Jun 2013 08:54:36 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter4-d.gandi.net Received: from relay3-d.mail.gandi.net ([217.70.183.195]) by mfilter4-d.gandi.net (mfilter4-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id 8CZWnTkV-RuU; Sun, 9 Jun 2013 08:54:34 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay3-d.mail.gandi.net (Postfix) with ESMTPSA id 08309A80B1; Sun, 9 Jun 2013 08:54:33 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id 402CF73A1C; Sat, 8 Jun 2013 23:54:30 -0700 (PDT) Date: Sat, 8 Jun 2013 23:54:30 -0700 From: Jeremy Chadwick To: "Reed A. Cartwright" Subject: Re: ZFS and Glabel Message-ID: <20130609065430.GA28206@icarus.home.lan> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 06:54:48 -0000 On Sat, Jun 08, 2013 at 09:46:18PM -0700, Reed A. Cartwright wrote: > I'm looking at my dmesg.boot to figure out what settings I need to > wire down my HDDs. I read the cam(4) documentation but I'm not sure I > know what I'm doing. Any advice would be helpful. > > Let's assume that I want to wire everything down to their current > positions, what should I put in loader.conf? I'll paste below some of > my hardware configuration and lines from dmesg.boot that I think I > need to look at. > > I have 4 LSI cards in the system: mps0, mps1, mps2, mps3. 
> > mps0: port 0xd000-0xd0ff mem > 0xdff3c000-0xdff3ffff,0xdff40000-0xdff7ffff irq 24 at device 0.0 on > pci5 > mps1: port 0xc000-0xc0ff mem > 0xdfe3c000-0xdfe3ffff,0xdfe40000-0xdfe7ffff irq 44 at device 0.0 on > pci4 > mps2: port 0xb000-0xb0ff mem > 0xdfd3c000-0xdfd3ffff,0xdfd40000-0xdfd7ffff irq 32 at device 0.0 on > pci3 > mps3: port 0xe000-0xe0ff mem > 0xdbf3c000-0xdbf3ffff,0xdbf40000-0xdbf7ffff irq 56 at device 0.0 on > pci65 > > I have drives attached to two of those cards: > > da0 at mps0 bus 0 scbus0 target 0 lun 0 > da1 at mps0 bus 0 scbus0 target 1 lun 0 > da2 at mps0 bus 0 scbus0 target 2 lun 0 > da3 at mps0 bus 0 scbus0 target 3 lun 0 > da4 at mps0 bus 0 scbus0 target 4 lun 0 > da5 at mps0 bus 0 scbus0 target 5 lun 0 > da6 at mps0 bus 0 scbus0 target 6 lun 0 > da7 at mps0 bus 0 scbus0 target 7 lun 0 > da8 at mps3 bus 0 scbus9 target 0 lun 0 > da9 at mps3 bus 0 scbus9 target 1 lun 0 > da10 at mps3 bus 0 scbus9 target 2 lun 0 > da11 at mps3 bus 0 scbus9 target 3 lun 0 > da12 at mps3 bus 0 scbus9 target 4 lun 0 > > {snip} As usual, the situation is insane because you have so many controllers on the system (more than just mps(4)) -- specifically 11 separate controllers or systems using CAM (hence scbus0 to scbus10). Below is for mps(4). If you want to wire down ahci(4), things are a bit different, but you can read this post of mine: http://lists.freebsd.org/pipermail/freebsd-stable/2013-January/071851.html Enjoy: hint.scbus.0.at="mps0" hint.scbus.1.at="mps1" hint.scbus.2.at="mps2" hint.scbus.9.at="mps3" hint.da.0.at="scbus0" hint.da.1.at="scbus0" hint.da.2.at="scbus0" hint.da.3.at="scbus0" hint.da.4.at="scbus0" hint.da.5.at="scbus0" hint.da.6.at="scbus0" hint.da.7.at="scbus0" hint.da.8.at="scbus9" hint.da.9.at="scbus9" hint.da.10.at="scbus9" hint.da.11.at="scbus9" hint.da.12.at="scbus9" hint.da.13.at="scbus9" hint.da.14.at="scbus9" hint.da.15.at="scbus9" hint.da.16.at="scbus1" hint.da.17.at="scbus1" hint.da.18.at="scbus1" hint.da.19.at="scbus1" hint.da.20.at="scbus1" hint.da.21.at="scbus1" hint.da.22.at="scbus1" hint.da.23.at="scbus1" hint.da.24.at="scbus2" hint.da.25.at="scbus2" hint.da.26.at="scbus2" hint.da.27.at="scbus2" hint.da.28.at="scbus2" hint.da.29.at="scbus2" hint.da.30.at="scbus2" hint.da.31.at="scbus2" hint.da.0.target="0" hint.da.1.target="1" hint.da.2.target="2" hint.da.3.target="3" hint.da.4.target="4" hint.da.5.target="5" hint.da.6.target="6" hint.da.7.target="7" hint.da.8.target="0" hint.da.9.target="1" hint.da.10.target="2" hint.da.11.target="3" hint.da.12.target="4" hint.da.13.target="5" hint.da.14.target="6" hint.da.15.target="7" hint.da.16.target="0" hint.da.17.target="1" hint.da.18.target="2" hint.da.19.target="3" hint.da.20.target="4" hint.da.21.target="5" hint.da.22.target="6" hint.da.23.target="7" hint.da.24.target="0" hint.da.25.target="1" hint.da.26.target="2" hint.da.27.target="3" hint.da.28.target="4" hint.da.29.target="5" hint.da.30.target="6" hint.da.31.target="7" -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 10:40:24 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6CB7EEC8 for ; Sun, 9 Jun 2013 10:40:24 +0000 (UTC) (envelope-from ml@my.gd) Received: from mail-wg0-x234.google.com (mail-wg0-x234.google.com [IPv6:2a00:1450:400c:c00::234]) by mx1.freebsd.org (Postfix) with ESMTP id 0666F1095 for ; Sun, 9 Jun 2013 10:40:23 +0000 (UTC) Received: by mail-wg0-f52.google.com with SMTP id z12so3523073wgg.19 for ; Sun, 09 Jun 2013 03:40:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=references:mime-version:in-reply-to:content-type :content-transfer-encoding:message-id:cc:x-mailer:from:subject:date :to:x-gm-message-state; bh=9N+NLraiOty4QUjVP50QXOHB1IUkRH1XedUPb5pWWPA=; b=U+Xv4bPY+eCx9APmqFQ7xUyJoWMUhB22lZyK+t/uuSJYHaEHY1oDx1zw/UYF3CybpW U3o39YuJTJZ+Wb17JWhTNscb5GIAEEdWRAHHpivYQDaL7V6Hd1j1Px8w8miaInfQoZqI IwWR96HgdUHWHKOteO3h7JH2wW1lbNUuvuGm6OD8bdYWF8iYgddm8RZXiPzZ98rA/ruH 4yojs7GjS8/UN+9tMLFMW12rCRfRSfNxVprrydjFWZx4n9DOB8NWOHcKIL02bgdK231H LBW/Jb7Yt8EQK/papN6pvwmFVuAtzy2EX96ovUqxNnfBTAfPG3mkWtUoDzk6uO3ZEO2y xHBQ== X-Received: by 10.180.89.140 with SMTP id bo12mr2448667wib.22.1370774423141; Sun, 09 Jun 2013 03:40:23 -0700 (PDT) Received: from [192.168.0.9] (AAubervilliers-652-1-225-231.w83-112.abo.wanadoo.fr. [83.112.232.231]) by mx.google.com with ESMTPSA id ft10sm5532746wib.7.2013.06.09.03.40.21 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sun, 09 Jun 2013 03:40:22 -0700 (PDT) References: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> Mime-Version: 1.0 (1.0) In-Reply-To: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Message-Id: <2AC5E8F4-3AF1-4EA5-975D-741506AC70A5@my.gd> X-Mailer: iPhone Mail (10B144) From: Damien Fleuriot Subject: Re: Changing the default for ZFS atime to off? Date: Sun, 9 Jun 2013 12:39:17 +0200 To: Steven Hartland X-Gm-Message-State: ALoCoQlUgO9W/8K2n7fCgPkhgsxEN6lAa7nkTrhfhhOmwdDOyrT+Z4Hx/WcCBxymmmDLO/JnsiYM Cc: "" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 10:40:24 -0000 On 8 Jun 2013, at 20:54, "Steven Hartland" wrote: > One of the first changes we make here when installing machines > here to changing atime=3Doff on all ZFS pool roots. >=20 > I know there are a few apps which can rely on atime updates > such as qmail and possibly postfix, but those seem like special > cases for which admins should enable atime instead of the other > way round. >=20 > This is going to of particular interest for flash based storage > which should avoid unnessacary writes to reduce wear, but it will > also help improve performance in general. >=20 > So what do people think is it worth considering changing the > default from atime=3Don to atime=3Doff moving forward? >=20 > If so what about UFS, same change? >=20 I strongly oppose the change for reasons already raised by many people regar= ding the mbox file. Besides, if atime should default to off on 2 filesystems and on on all other= s, that would definitely create confusion. Last, I believe it should be the admin's decision to turn atime off, just li= ke it is his decision to turn compression on. 
Don't mistake me, we turn atime=3Doff on every box, every filesystem, even o= n Mac's HFS. Yet I believe defaulting it to off is a mistake.= From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 11:45:35 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id EE63ECA1 for ; Sun, 9 Jun 2013 11:45:35 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 800051374 for ; Sun, 9 Jun 2013 11:45:35 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r59BjSIC084468 for ; Sun, 9 Jun 2013 15:45:28 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Sun, 9 Jun 2013 15:45:28 +0400 (MSK) From: Dmitry Morozovsky To: freebsd-fs@FreeBSD.org Subject: /tmp: change default to mdmfs and/or tmpfs? Message-ID: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Sun, 09 Jun 2013 15:45:28 +0400 (MSK) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 11:45:36 -0000 Dear colleagues, what do you think about stop using precious disk or even SSD resources for /tmp? For last several (well, maybe over 10?) years I constantly use md (swap-backed) for /tmp, usually 128M in size, which is enough for most of our server needs. Some require more, but none more than 512M. Regarding the options, we use tmpmfs_flags="-S -n -o async -b 4096 -f 512" Given more and more fixes/improvements committed to tmpfs, switching /tmp to it would be even better idea. You thoughts? Thank you! -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 12:00:36 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 386FEEA for ; Sun, 9 Jun 2013 12:00:36 +0000 (UTC) (envelope-from loic.blot@unix-experience.fr) Received: from smtp.smtpout.orange.fr (smtp03.smtpout.orange.fr [80.12.242.125]) by mx1.freebsd.org (Postfix) with ESMTP id ADB9115EB for ; Sun, 9 Jun 2013 12:00:34 +0000 (UTC) Received: from [10.42.69.5] ([82.120.202.131]) by mwinf5d06 with ME id mBsx1l0082qcW6A03Bsx5c; Sun, 09 Jun 2013 13:52:57 +0200 Message-ID: <1370779193.2018.10.camel@Nerz-PC> Subject: Re: /tmp: change default to mdmfs and/or tmpfs? 
From: Loïc BLOT To: freebsd-fs@freebsd.org Date: Sun, 09 Jun 2013 13:59:53 +0200 In-Reply-To: References: Organization: UNIX Experience Fr Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="=-S1znKLqXGdPP+SXnlk4B" X-Mailer: Evolution 3.8.3 Mime-Version: 1.0 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: loic.blot@unix-experience.fr List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 12:00:36 -0000

--=-S1znKLqXGdPP+SXnlk4B
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hello Dmitry,

I agree with you. /tmp is a temporary filesystem. On machines (both servers and clients) I think /tmp should be like Linux and cleared at reboot, because it's a temporary FS. The one Linux point I don't agree with is the size of /tmp (on Linux: RAM/2). That formula is fine for systems with less than 2 GB of RAM, but I think 2 GB is sufficient for systems with more RAM (we could control this at install time, or in fstab afterwards?).

--
Best regards,
Loïc BLOT,
UNIX systems, security and network expert
http://www.unix-experience.fr

On Sunday, 9 June 2013 at 15:45 +0400, Dmitry Morozovsky wrote:
> Dear colleagues,
>
> what do you think about stop using precious disk or even SSD resources for
> /tmp?
>
> For last several (well, maybe over 10?) years I constantly use md (swap-backed)
> for /tmp, usually 128M in size, which is enough for most of our server needs.
> Some require more, but none more than 512M. Regarding the options, we use
> tmpmfs_flags="-S -n -o async -b 4096 -f 512"
>
> Given more and more fixes/improvements committed to tmpfs, switching /tmp to it
> would be even better idea.
>
> You thoughts? Thank you!
>
>

--=-S1znKLqXGdPP+SXnlk4B
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (GNU/Linux)

iF4EABEIAAYFAlG0bjkACgkQh290DZyz8uaPGQEApIRm/z1EYj4jJvdWHGnn2X+j
hLQTsdMMktHhYm2t0e8BANHRRbEGr1coZpLYnCJnzIS7YkFOLHhMvsMoSYVqlDlZ
=Esvp
-----END PGP SIGNATURE-----

--=-S1znKLqXGdPP+SXnlk4B--

From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 12:18:31 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 9EF3B33D for ; Sun, 9 Jun 2013 12:18:31 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl [195.190.28.81]) by mx1.freebsd.org (Postfix) with ESMTP id 62E961674 for ; Sun, 9 Jun 2013 12:18:30 +0000 (UTC) Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.69) (envelope-from ) id 1UleZh-0004kY-P1 for freebsd-fs@freebsd.org; Sun, 09 Jun 2013 14:18:22 +0200 Received: from dhcp-077-251-158-153.chello.nl ([77.251.158.153] helo=pinky) by smtp.greenhost.nl with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.72) (envelope-from ) id 1UleZe-0003CC-TS for freebsd-fs@freebsd.org; Sun, 09 Jun 2013 14:18:18 +0200 Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs@freebsd.org Subject: Re: /tmp: change default to mdmfs and/or tmpfs?
References: Date: Sun, 09 Jun 2013 14:18:20 +0200 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: "Ronald Klop" Message-ID: In-Reply-To: User-Agent: Opera Mail/12.15 (Win32) X-Virus-Scanned: by clamav at smarthost1.samage.net X-Spam-Level: / X-Spam-Score: 0.8 X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.1 X-Scan-Signature: 2ecd0b53b7de9511489f92806276a3d7 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 12:18:31 -0000 On Sun, 09 Jun 2013 13:45:28 +0200, Dmitry Morozovsky wrote: > Dear colleagues, > > what do you think about stop using precious disk or even SSD resources > for > /tmp? > > For last several (well, maybe over 10?) years I constantly use md > (swap-backed) > for /tmp, usually 128M in size, which is enough for most of our server > needs. > Some require more, but none more than 512M. Regarding the options, we > use > tmpmfs_flags="-S -n -o async -b 4096 -f 512" > > Given more and more fixes/improvements committed to tmpfs, switching > /tmp to it > would be even better idea. > > You thoughts? Thank you! > > What keeps you from putting this in fstab and stop using the tmpmfs rc.conf variable? 'tmpfs /tmp tmpfs rw,size=536870912 0 0' I thought tmpmfs/varmfs infrastructure was more for diskless/full-NFS systems anyways. Ronald. From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 12:23:13 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D3DA940A for ; Sun, 9 Jun 2013 12:23:13 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 665A0169A for ; Sun, 9 Jun 2013 12:23:12 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r59CN95Y085950; Sun, 9 Jun 2013 16:23:09 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Sun, 9 Jun 2013 16:23:09 +0400 (MSK) From: Dmitry Morozovsky To: Ronald Klop Subject: Re: /tmp: change default to mdmfs and/or tmpfs? In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Sun, 09 Jun 2013 16:23:09 +0400 (MSK) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 12:23:13 -0000 On Sun, 9 Jun 2013, Ronald Klop wrote: > > what do you think about stop using precious disk or even SSD resources for > > /tmp? > > > > For last several (well, maybe over 10?) years I constantly use md > > (swap-backed) > > for /tmp, usually 128M in size, which is enough for most of our server > > needs. > > Some require more, but none more than 512M. Regarding the options, we use > > tmpmfs_flags="-S -n -o async -b 4096 -f 512" > > > > Given more and more fixes/improvements committed to tmpfs, switching /tmp to > > it > > would be even better idea. > > > > You thoughts? Thank you! > > > > > > What keeps you from putting this in fstab and stop using the tmpmfs rc.conf > variable? 
> 'tmpfs /tmp tmpfs rw,size=536870912 0 0' > > I thought tmpmfs/varmfs infrastructure was more for diskless/full-NFS systems > anyways. I do not see much difference here, to be honest. Either way, you have memory-backed /tmp (though via using /etc/rc.d/tmp you can fine-tune FS options a bit easier, at least for my PoV) The question is: shouldn't we treat this as a default at least for usual amd64/i386 installation with "non-embedded" quantity of RAM (like, e.g. > 512M)? -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 12:33:59 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 8F65651C for ; Sun, 9 Jun 2013 12:33:59 +0000 (UTC) (envelope-from lee@dilkie.com) Received: from data.snhdns.com (data.snhdns.com [208.76.82.136]) by mx1.freebsd.org (Postfix) with ESMTP id 5C5F116DE for ; Sun, 9 Jun 2013 12:33:59 +0000 (UTC) Received: from [142.46.160.218] (port=60357 helo=[206.51.1.11]) by data.snhdns.com with esmtpsa (TLSv1:DHE-RSA-AES256-SHA:256) (Exim 4.80) (envelope-from ) id 1UldxB-0004xR-Gt; Sun, 09 Jun 2013 07:38:33 -0400 Message-ID: <51B4693B.8020704@dilkie.com> Date: Sun, 09 Jun 2013 07:38:35 -0400 From: Lee Dilkie User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130509 Thunderbird/17.0.6 MIME-Version: 1.0 To: Steven Hartland Subject: Re: Changing the default for ZFS atime to off? References: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> <20130608213331.GB18201@icarus.home.lan> <8C34552BD7074953A74E0443BAD1CCB7@multiplay.co.uk> In-Reply-To: <8C34552BD7074953A74E0443BAD1CCB7@multiplay.co.uk> X-Enigmail-Version: 1.5.1 X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - data.snhdns.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - dilkie.com X-Get-Message-Sender-Via: data.snhdns.com: authenticated_id: lee@dilkie.com Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 12:33:59 -0000 On 6/8/2013 11:14 PM, Steven Hartland wrote: > >> Most modern linuce use relatime eg the benefits of noatime and >> preserving >> functionality for mail stuff. >> >> https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Power_Management_Guide/Relatime.html >> > > Now thats a clever idea, like it. > Indeed... very clever. caching atime itself. I like it too. 
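For readers skimming the archive: the attraction of relatime is that it still lets the traditional mbox check work (a mailbox whose mtime is newer than its atime has unread mail) while dropping the atime write on every ordinary read. A rough, purely illustrative shell version of that check on FreeBSD, assuming a conventional /var/mail spool (the path and $USER are only placeholders):

  # print epoch timestamps with stat(1): %m = mtime, %a = atime
  MBOX="/var/mail/$USER"
  if [ "$(stat -f %m "$MBOX")" -gt "$(stat -f %a "$MBOX")" ]; then
      echo "new mail in $MBOX"   # written to since it was last read
  fi

With atime=off the access time never advances, so a check like this keeps reporting new mail even after the box has been read; relatime-style updates keep it accurate at a fraction of the write cost.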
-lee From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 12:46:18 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E0CAAA93 for ; Sun, 9 Jun 2013 12:46:18 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) by mx1.freebsd.org (Postfix) with ESMTP id 82ACA178E for ; Sun, 9 Jun 2013 12:46:18 +0000 (UTC) Received: from mfilter14-d.gandi.net (mfilter14-d.gandi.net [217.70.178.142]) by relay5-d.mail.gandi.net (Postfix) with ESMTP id 500ED41C067; Sun, 9 Jun 2013 14:46:07 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter14-d.gandi.net Received: from relay5-d.mail.gandi.net ([217.70.183.197]) by mfilter14-d.gandi.net (mfilter14-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id WFI7CVzrHShu; Sun, 9 Jun 2013 14:46:05 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay5-d.mail.gandi.net (Postfix) with ESMTPSA id 1C89141C060; Sun, 9 Jun 2013 14:46:05 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id 4114D73A1C; Sun, 9 Jun 2013 05:46:03 -0700 (PDT) Date: Sun, 9 Jun 2013 05:46:03 -0700 From: Jeremy Chadwick To: Dmitry Morozovsky Subject: Re: /tmp: change default to mdmfs and/or tmpfs? Message-ID: <20130609124603.GA35681@icarus.home.lan> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 12:46:18 -0000 On Sun, Jun 09, 2013 at 03:45:28PM +0400, Dmitry Morozovsky wrote: > Dear colleagues, > > what do you think about stop using precious disk or even SSD resources for > /tmp? > > For last several (well, maybe over 10?) years I constantly use md (swap-backed) > for /tmp, usually 128M in size, which is enough for most of our server needs. > Some require more, but none more than 512M. Regarding the options, we use > tmpmfs_flags="-S -n -o async -b 4096 -f 512" Hold up. Let's start with what you just gave. Everything I'm talking about below is for stable/9 by the way: 1. grep -r tmpfs /etc returns nothing, so I don't know where this magic comes from, 2. tmpfs(5) documents none of these flags, and the flags you've given cannot be mdconfig(8) flags because: a) -S requires a sector size (you specified none), b) -n would have no bearing given the context, c) -o async applies only to vnode-backed models (default is malloc, and I see no -t vnode), d) There is no -b flag, e) The -f flag is for -t vnode only, and refers to a filename for the vnode-backing store. So consider me very, very confused with what you've given. Maybe the flags were different on FreeBSD 6.x or 7.x or 8.x? I haven't checked http://www.freebsd.org/cgi/man.cgi yet. > Given more and more fixes/improvements committed to tmpfs, switching /tmp to it > would be even better idea. > > You thoughts? Thank you! As I understand it, there are (or were -- because I remember seeing them repeatedly brought up on the mailing lists) problems with tmpfs. Sometimes these issues would turn out to be with other filesystems (such as unionfs), but other times not so much. 
If my memory serves me correct, there are major complexities with VM/memory management when intermixing tmpfs + ZFS + UFS on a system***. Skimming lists and my memory, I come across these (and I recommend anyone replying please read the full thread from that post onward): http://lists.freebsd.org/pipermail/freebsd-current/2011-June/025459.html http://lists.freebsd.org/pipermail/freebsd-current/2011-June/025461.html http://lists.freebsd.org/pipermail/freebsd-fs/2013-January/016165.html Be aware the -current thread posts I linked come from a thread started asking if tmpfs should "really still be considered experimental or not". Then there's this, which shows issues getting MFC'd to stable/9 but not 8.x, so one may want to be very careful about decisions where tmpfs gets used by default going forward (but keep reading): http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/139312 http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/159418 http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/155411 http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/171626 However PR 155411 claims the issue happens on 9.0-RELEASE as well, and PR 139312 even mentions/brings up ZFS -- I have no idea what "State: patched" means (is it fixed? Is it committed? Why isn't the PR closed? etc.) I also see this: http://forums.freebsd.org/archive/index.php/t-30467.html Where someone stated that excessive ARC usage on ZFS had an indirect effect on tmpfs. r233769 to stable/9 may have fixed this, but given the history of all of this "juggling" of Feature X causing memory exhaustion for Feature Y, and in turn affecting Feature Z, all within kernel space, I really don't know how much I can trust all of this. One should probably review the FreeBSD forums for other posts as well, as gut feeling says there's probably more there too. Now some more generic items: tmpfs does not retain data across reboots -- that's by design, of course. I have concerns with regards to stuff that may end up in /tmp that *should* persist across reboots and may surprise an administrator that the files he/she placed in /tmp + reboot no longer appear. While this may be considered a social problem of sorts, it definitely requires one to reconsider use of /tmp (instead /var/tmp, for example) for certain tasks. In closing: If you want to make bsdinstall ask/prompt the administrator "would you like to use tmpfs for /tmp?", then I'm all for it -- sounds good to me. But doing it by default would be something (at this time) I would not be in favour of. I just don't get the impression of stability from tmpfs given its track record. (Yes, I am paranoid in this regard) *** -- For example I personally have experienced strange behaviour when ZFS+UFS are used on the same system with massive amounts of I/O being done between the two (my experience showed the ZFS ARC suddenly limiting itself in a strange manner, to some abysmally small limit (much lower than arc_max)). In this case, I can only imagine tmpfs making things "even worse" given added memory pressure and so on. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 13:01:57 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id A0EC6916 for ; Sun, 9 Jun 2013 13:01:57 +0000 (UTC) (envelope-from fullermd@over-yonder.net) Received: from thyme.infocus-llc.com (server.infocus-llc.com [206.156.254.44]) by mx1.freebsd.org (Postfix) with ESMTP id 7E5F218F8 for ; Sun, 9 Jun 2013 13:01:57 +0000 (UTC) Received: from draco.over-yonder.net (c-75-65-60-66.hsd1.ms.comcast.net [75.65.60.66]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by thyme.infocus-llc.com (Postfix) with ESMTPSA id A0C1637B4AE; Sun, 9 Jun 2013 08:01:56 -0500 (CDT) Received: by draco.over-yonder.net (Postfix, from userid 100) id 3bSyKw1nrDzG2w; Sun, 9 Jun 2013 08:01:56 -0500 (CDT) Date: Sun, 9 Jun 2013 08:01:56 -0500 From: "Matthew D. Fuller" To: Ronald Klop Subject: Re: /tmp: change default to mdmfs and/or tmpfs? Message-ID: <20130609130156.GN61341@over-yonder.net> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Editor: vi X-OS: FreeBSD User-Agent: Mutt/1.5.21-fullermd.4 (2010-09-15) X-Virus-Scanned: clamav-milter 0.97.6 at thyme.infocus-llc.com X-Virus-Status: Clean Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 13:01:57 -0000 On Sun, Jun 09, 2013 at 02:18:20PM +0200 I heard the voice of Ronald Klop, and lo! it spake thus: > > What keeps you from putting this in fstab and stop using the tmpmfs > rc.conf variable? > 'tmpfs /tmp tmpfs rw,size=536870912 0 0' That makes a tmpfs(5) filesystem, not a ufs-on-md(8) filesystem like rc.conf tmpmfs does. Whether that matters depends on your own peculiar situation, but they're not exactly the same. -- Matthew Fuller (MF4839) | fullermd@over-yonder.net Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ On the Internet, nobody can hear you scream. From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 13:06:08 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 87FBAA06 for ; Sun, 9 Jun 2013 13:06:08 +0000 (UTC) (envelope-from fullermd@over-yonder.net) Received: from thyme.infocus-llc.com (server.infocus-llc.com [206.156.254.44]) by mx1.freebsd.org (Postfix) with ESMTP id 65D4B191C for ; Sun, 9 Jun 2013 13:06:08 +0000 (UTC) Received: from draco.over-yonder.net (c-75-65-60-66.hsd1.ms.comcast.net [75.65.60.66]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by thyme.infocus-llc.com (Postfix) with ESMTPSA id 109ED37B4E4; Sun, 9 Jun 2013 08:00:38 -0500 (CDT) Received: by draco.over-yonder.net (Postfix, from userid 100) id 3bSyJP3WXhzG2l; Sun, 9 Jun 2013 08:00:37 -0500 (CDT) Date: Sun, 9 Jun 2013 08:00:37 -0500 From: "Matthew D. Fuller" To: Jeremy Chadwick Subject: Re: /tmp: change default to mdmfs and/or tmpfs? 
Message-ID: <20130609130037.GM61341@over-yonder.net> References: <20130609124603.GA35681@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130609124603.GA35681@icarus.home.lan> X-Editor: vi X-OS: FreeBSD User-Agent: Mutt/1.5.21-fullermd.4 (2010-09-15) X-Virus-Scanned: clamav-milter 0.97.6 at thyme.infocus-llc.com X-Virus-Status: Clean Cc: freebsd-fs@FreeBSD.org, Dmitry Morozovsky X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 13:06:08 -0000 On Sun, Jun 09, 2013 at 05:46:03AM -0700 I heard the voice of Jeremy Chadwick, and lo! it spake thus: > > 1. grep -r tmpfs /etc returns nothing, so I don't know where this magic > comes from, Remembering the second 'm' in tmpmfs 8-} > 2. tmpfs(5) documents none of these flags, and the flags you've given > cannot be mdconfig(8) flags because: Which is why they're mdmfs(8) flags (/etc/rc.d/tmp -> mount_md from rc.subr -> mdmfs). -- Matthew Fuller (MF4839) | fullermd@over-yonder.net Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ On the Internet, nobody can hear you scream. From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 13:16:25 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 86721F63 for ; Sun, 9 Jun 2013 13:16:25 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id F1F151985 for ; Sun, 9 Jun 2013 13:16:24 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r59DGNIe088449; Sun, 9 Jun 2013 17:16:23 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Sun, 9 Jun 2013 17:16:23 +0400 (MSK) From: Dmitry Morozovsky To: Jeremy Chadwick Subject: Re: /tmp: change default to mdmfs and/or tmpfs? In-Reply-To: <20130609124603.GA35681@icarus.home.lan> Message-ID: References: <20130609124603.GA35681@icarus.home.lan> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Sun, 09 Jun 2013 17:16:23 +0400 (MSK) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 13:16:25 -0000 On Sun, 9 Jun 2013, Jeremy Chadwick wrote: > > what do you think about stop using precious disk or even SSD resources for > > /tmp? > > > > For last several (well, maybe over 10?) years I constantly use md (swap-backed) > > for /tmp, usually 128M in size, which is enough for most of our server needs. > > Some require more, but none more than 512M. Regarding the options, we use > > tmpmfs_flags="-S -n -o async -b 4096 -f 512" > > Hold up. Let's start with what you just gave. Everything I'm talking > about below is for stable/9 by the way: Don't mix md-backed tmp with tmpfs, see below: > 1. grep -r tmpfs /etc returns nothing, so I don't know where this magic > comes from, it is /etc/rc.d/tmp with tmpmfs_* rc variables actually > 2. 
tmpfs(5) documents none of these flags, and the flags you've given > cannot be mdconfig(8) flags because: > > a) -S requires a sector size (you specified none), > b) -n would have no bearing given the context, > c) -o async applies only to vnode-backed models (default is malloc, > and I see no -t vnode), > d) There is no -b flag, > e) The -f flag is for -t vnode only, and refers to a filename for the > vnode-backing store.
all these are related to mdmfs(8)
> So consider me very, very confused with what you've given. Maybe the > flags were different on FreeBSD 6.x or 7.x or 8.x? I haven't checked > http://www.freebsd.org/cgi/man.cgi yet.
Actually, there are two different questions (or kinds of questions): - are we considering switching /tmp off real media-backed storage? - if so, what are we selecting: memory/swap-backed UFS (mdmfs) or tmpfs?
> As I understand it, there are (or were -- because I remember seeing them > repeatedly brought up on the mailing lists) problems with tmpfs. > Sometimes these issues would turn out to be with other filesystems (such > as unionfs), but other times not so much. > > If my memory serves me correct, there are major complexities with > VM/memory management when intermixing tmpfs + ZFS + UFS on a system***.
Yes, hence my question about the status of tmpfs now. And yes, I personally do *not* use tmpfs-backed /tmp on real production servers -- just mdmfs-backed. OTOH, I *do* use tmpfs for my builder (for tinderbox for now, but I'm planning to switch buildworld/buildkernel there too), with few issues so far.
[snip the rest, I have to dig a bit more to answer]
-- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------
From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 13:17:01 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CF63CFD9 for ; Sun, 9 Jun 2013 13:17:01 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net [217.70.183.195]) by mx1.freebsd.org (Postfix) with ESMTP id 8CACA198D for ; Sun, 9 Jun 2013 13:17:01 +0000 (UTC) Received: from mfilter3-d.gandi.net (mfilter3-d.gandi.net [217.70.178.133]) by relay3-d.mail.gandi.net (Postfix) with ESMTP id 7F4A4A80D0; Sun, 9 Jun 2013 15:16:50 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter3-d.gandi.net Received: from relay3-d.mail.gandi.net ([217.70.183.195]) by mfilter3-d.gandi.net (mfilter3-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id GvWTETafvw6Z; Sun, 9 Jun 2013 15:16:48 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay3-d.mail.gandi.net (Postfix) with ESMTPSA id 47B63A80C4; Sun, 9 Jun 2013 15:16:48 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id 8EC7773A1C; Sun, 9 Jun 2013 06:16:46 -0700 (PDT) Date: Sun, 9 Jun 2013 06:16:46 -0700 From: Jeremy Chadwick To: "Matthew D. Fuller" Subject: Re: /tmp: change default to mdmfs and/or tmpfs?
Message-ID: <20130609131646.GA37012@icarus.home.lan> References: <20130609124603.GA35681@icarus.home.lan> <20130609130037.GM61341@over-yonder.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130609130037.GM61341@over-yonder.net> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.org, Dmitry Morozovsky X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 13:17:01 -0000 On Sun, Jun 09, 2013 at 08:00:37AM -0500, Matthew D. Fuller wrote: > On Sun, Jun 09, 2013 at 05:46:03AM -0700 I heard the voice of > Jeremy Chadwick, and lo! it spake thus: > > > > 1. grep -r tmpfs /etc returns nothing, so I don't know where this magic > > comes from, > > Remembering the second 'm' in tmpmfs 8-} > > > > 2. tmpfs(5) documents none of these flags, and the flags you've given > > cannot be mdconfig(8) flags because: > > Which is why they're mdmfs(8) flags (/etc/rc.d/tmp -> mount_md from > rc.subr -> mdmfs). Thank you -- the magic has been discovered! ;-) I had never heard of mdmfs(8) until now (mdconfig(8) sure, mdmfs(8) nope). Looking at the source, this thing is just a "fancy wrapper" written in C, using mdconfig(8) and newfs(8), as well as geom_uzip(4) (which I also didn't know about until now) in some manner (not sure how that fits into the puzzle). I guess it's mainly a program for convenience. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 13:25:56 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 707E0458 for ; Sun, 9 Jun 2013 13:25:56 +0000 (UTC) (envelope-from wblock@wonkity.com) Received: from wonkity.com (wonkity.com [67.158.26.137]) by mx1.freebsd.org (Postfix) with ESMTP id 35B4819C6 for ; Sun, 9 Jun 2013 13:25:55 +0000 (UTC) Received: from wonkity.com (localhost [127.0.0.1]) by wonkity.com (8.14.7/8.14.7) with ESMTP id r59DPtKv070635; Sun, 9 Jun 2013 07:25:55 -0600 (MDT) (envelope-from wblock@wonkity.com) Received: from localhost (wblock@localhost) by wonkity.com (8.14.7/8.14.7/Submit) with ESMTP id r59DPtLM070632; Sun, 9 Jun 2013 07:25:55 -0600 (MDT) (envelope-from wblock@wonkity.com) Date: Sun, 9 Jun 2013 07:25:55 -0600 (MDT) From: Warren Block To: Dmitry Morozovsky Subject: Re: /tmp: change default to mdmfs and/or tmpfs? In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (wonkity.com [127.0.0.1]); Sun, 09 Jun 2013 07:25:55 -0600 (MDT) Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 13:25:56 -0000 On Sun, 9 Jun 2013, Dmitry Morozovsky wrote: > Dear colleagues, > > what do you think about stop using precious disk or even SSD resources for > /tmp? > > For last several (well, maybe over 10?) years I constantly use md (swap-backed) > for /tmp, usually 128M in size, which is enough for most of our server needs. > Some require more, but none more than 512M. 
Regarding the options, we use > tmpmfs_flags="-S -n -o async -b 4096 -f 512" > > Given more and more fixes/improvements committed to tmpfs, switching /tmp to it > would be even better idea. > > You thoughts? Thank you! tmpfs has been working fine here for /tmp. I also use it for /usr/obj. It does not tie up a fixed chunk of RAM, and can grow to large sizes if necessary. And maximum size can be limited in fstab. (Possible improvement: allow human-readable sizes instead of just blocks.) One problem is that tmpfs is cleared by a reboot. This would surprise users expecting the default behavior (clear_tmp_enable="NO"), and would require some prominent warnings in the release notes and maybe in the installer. Or in the startup scripts: "/tmp on tmpfs, contents will be discarded on reboot". From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 14:09:07 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 06B80CC1 for ; Sun, 9 Jun 2013 14:09:07 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 6F4601AD5 for ; Sun, 9 Jun 2013 14:09:06 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r59E950f090752; Sun, 9 Jun 2013 18:09:05 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Sun, 9 Jun 2013 18:09:05 +0400 (MSK) From: Dmitry Morozovsky To: Jeremy Chadwick Subject: Re: /tmp: change default to mdmfs and/or tmpfs? In-Reply-To: <20130609124603.GA35681@icarus.home.lan> Message-ID: References: <20130609124603.GA35681@icarus.home.lan> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Sun, 09 Jun 2013 18:09:05 +0400 (MSK) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 14:09:07 -0000 On Sun, 9 Jun 2013, Jeremy Chadwick wrote: [back to second part] [and snip a lot here too] > Where someone stated that excessive ARC usage on ZFS had an indirect > effect on tmpfs. r233769 to stable/9 may have fixed this, but given the > history of all of this "juggling" of Feature X causing memory exhaustion > for Feature Y, and in turn affecting Feature Z, all within kernel space, > I really don't know how much I can trust all of this. > > One should probably review the FreeBSD forums for other posts as well, > as gut feeling says there's probably more there too. .. that's why I'm trying to discuss this in public (maybe wrong list had been chosen, perhaps -stable@ would fit a bit more) -- to share knowledge, opinions and other related stuff ;) > In closing: > > If you want to make bsdinstall ask/prompt the administrator "would you > like to use tmpfs for /tmp?", then I'm all for it -- sounds good to me. > But doing it by default would be something (at this time) I would not be > in favour of. I just don't get the impression of stability from tmpfs > given its track record. (Yes, I am paranoid in this regard) Agree at most. 
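To make the md-backed variant being discussed concrete: the tmpmfs machinery in /etc/rc.d/tmp ends up calling mdmfs(8), so the rc.conf fragment quoted above corresponds roughly to the following hand-run command. This is only a sketch -- the 128m size is taken from the earlier post, and the rc.conf variable names are quoted from memory, so check rc.conf(5) before relying on them:

  # swap-backed, UFS-formatted /tmp: no soft-updates, no .snap, async mount (illustrative)
  mdmfs -S -n -o async -b 4096 -f 512 -s 128m md /tmp
  # roughly equivalent rc.conf settings:
  #   tmpmfs="YES"
  #   tmpsize="128m"
  #   tmpmfs_flags="-S -n -o async -b 4096 -f 512"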
> *** -- For example I personally have experienced strange behaviour when > ZFS+UFS are used on the same system with massive amounts of I/O being > done between the two (my experience showed the ZFS ARC suddenly limiting > itself in a strange manner, to some abysmally small limit (much lower > than arc_max)). In this case, I can only imagine tmpfs making things > "even worse" given added memory pressure and so on. For our backup server, which uses rather huge 24*2T raidz2 and periodically synced on eSATA UFS, I sometimes seen speed drops, but nothing really bad. It's stable/9 with 16G of RAM though, perhaps on systems where RAM is tighter the situation could be much worse... -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 16:37:41 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E4A0E1FD; Sun, 9 Jun 2013 16:37:41 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: from mail-pd0-f177.google.com (mail-pd0-f177.google.com [209.85.192.177]) by mx1.freebsd.org (Postfix) with ESMTP id BDECE111A; Sun, 9 Jun 2013 16:37:41 +0000 (UTC) Received: by mail-pd0-f177.google.com with SMTP id p10so1071300pdj.8 for ; Sun, 09 Jun 2013 09:37:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=HuRdr8G1D01VyeayCEqb/wxl1tWu6SzrR37YxYA0d6I=; b=TgRZuto+mvf9mSNglXbvbD9t8pIr/Z8kUEW32XQ2zrzA/dWIb0g/ZHeHXeD+GW+nkS DjQnGrs519FXmSKETod+ltqPGPEr3tRgyy9OXGHkWsnamegR7So8RNjzqBCWpeSCt0AZ xORRN0g9zERMZahjymll9TtAutwKiJxFTnAiBbYEz7Msx+UPyaFDgMhirsfvjA9JjcUe mjn3HZ7m7pq4Azu8LWdmFrtxe36QKuc4z5/I8li4ryGRNvXs8gM+Mxxxzo8ruKumFiAs +RvrQij1LvY7LxAwNC6nnrd+MQUnaE7X+twcUahukfoJYDNa9seYj2K4JqAc9l8zyhdY +y9A== MIME-Version: 1.0 X-Received: by 10.66.26.231 with SMTP id o7mr10647586pag.207.1370795861194; Sun, 09 Jun 2013 09:37:41 -0700 (PDT) Received: by 10.70.31.195 with HTTP; Sun, 9 Jun 2013 09:37:41 -0700 (PDT) In-Reply-To: References: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> Date: Sun, 9 Jun 2013 11:37:41 -0500 Message-ID: Subject: Re: Changing the default for ZFS atime to off? From: Adam Vande More To: Xin LI Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Steven Hartland , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 16:37:42 -0000 On Sat, Jun 8, 2013 at 10:04 PM, Xin LI wrote: > I'd suggest implementing relative atime in VFS layer first: > > > https://github.com/delphij/freebsd/commit/6a199821fbdbf424027499d4a0f8f113f6943e16 Cool, looks like you were already on this. I would offer to test some, but I'm pretty much ZFS only at this point. I imagine there would be much less objections to defaulting to relatime rather than noatime. AFAIK, relatime doesn't break any major tools. 
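One easy way to see what the current atime default actually does (and what noatime or a future relatime would change) is to watch a file's timestamps across a read; a small illustrative test that can be run on any scratch filesystem:

  echo test > /tmp/atime-demo
  stat -f 'before: atime=%Sa  mtime=%Sm' /tmp/atime-demo
  sleep 1
  cat /tmp/atime-demo > /dev/null
  stat -f 'after:  atime=%Sa  mtime=%Sm' /tmp/atime-demo
  # with atime enabled, the "after" atime moves forward; with atime=off
  # (or a noatime mount) it stays put -- that is exactly the write being saved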
-- Adam Vande More From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 16:39:46 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 844C62E0 for ; Sun, 9 Jun 2013 16:39:46 +0000 (UTC) (envelope-from prvs=18721298a7=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 2915C1131 for ; Sun, 9 Jun 2013 16:39:45 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004231304.msg for ; Sun, 09 Jun 2013 17:39:44 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sun, 09 Jun 2013 17:39:44 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=18721298a7=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: fs@freebsd.org Message-ID: <3152D35416D047BCA14009F3108A8967@multiplay.co.uk> From: "Steven Hartland" To: "Damien Fleuriot" References: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> <2AC5E8F4-3AF1-4EA5-975D-741506AC70A5@my.gd> Subject: Re: Changing the default for ZFS atime to off? Date: Sun, 9 Jun 2013 17:39:42 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 16:39:46 -0000 ----- Original Message ----- From: "Damien Fleuriot" To: "Steven Hartland" Cc: Sent: Sunday, June 09, 2013 11:39 AM Subject: Re: Changing the default for ZFS atime to off? > > On 8 Jun 2013, at 20:54, "Steven Hartland" wrote: > >> One of the first changes we make here when installing machines >> here to changing atime=off on all ZFS pool roots. >> >> I know there are a few apps which can rely on atime updates >> such as qmail and possibly postfix, but those seem like special >> cases for which admins should enable atime instead of the other >> way round. >> >> This is going to of particular interest for flash based storage >> which should avoid unnessacary writes to reduce wear, but it will >> also help improve performance in general. >> >> So what do people think is it worth considering changing the >> default from atime=on to atime=off moving forward? >> >> If so what about UFS, same change? > > I strongly oppose the change for reasons already raised by many > people regarding the mbox file. > > Besides, if atime should default to off on 2 filesystems and on > on all others, that would definitely create confusion. A very valid point. > Last, I believe it should be the admin's decision to turn atime > off, just like it is his decision to turn compression on. Trying to play devils advocate here; compression is off by default because it uses resources and doesn't give a benefit for all cases. Is that not the same as atime, and it should be an admins decision to turn it on where it's wanted? > Don't mistake me, we turn atime=off on every box, every > filesystem, even on Mac's HFS. > Yet I believe defaulting it to off is a mistake. 
That's what prompted me to start this discussion. If a large portion of users either disable atime already or would disable atime if they knew about it, does that bring into question the current default? Potentially a better solution would be to make atime an option in the installer, as that helps educate admins that the option exists, which is potentially the biggest issue here? Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 16:42:34 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D00903AA for ; Sun, 9 Jun 2013 16:42:34 +0000 (UTC) (envelope-from prvs=18721298a7=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 7138D1146 for ; Sun, 9 Jun 2013 16:42:34 +0000 (UTC) Received: from r2d2 ([82.69.141.170]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50004231359.msg for ; Sun, 09 Jun 2013 17:42:34 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sun, 09 Jun 2013 17:42:34 +0100 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 82.69.141.170 X-Return-Path: prvs=18721298a7=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: fs@freebsd.org Message-ID: From: "Steven Hartland" To: "Xin LI" References: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> Subject: Re: Changing the default for ZFS atime to off? Date: Sun, 9 Jun 2013 17:42:31 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="UTF-8"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 16:42:34 -0000 ----- Original Message ----- From: "Xin LI" > I'd suggest implementing relative atime in VFS layer first: > > https://github.com/delphij/freebsd/commit/6a199821fbdbf424027499d4a0f8f113f6943e16 > Cool, its this something your looking to commit to HEAD Xin? Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
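For anyone who wants the behaviour being debated here today, without waiting on an installer checkbox or a relatime implementation, the per-filesystem switches are straightforward; the pool, dataset and device names below are only examples:

  # ZFS: the property is inherited, so setting it on the pool root covers child datasets
  zfs set atime=off tank
  zfs get -r atime tank
  # UFS: add noatime to the options column of the fstab entry, e.g.
  #   /dev/ada0p2   /usr   ufs   rw,noatime   2   2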
From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 17:14:08 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4215B257 for ; Sun, 9 Jun 2013 17:14:08 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay4-d.mail.gandi.net (relay4-d.mail.gandi.net [217.70.183.196]) by mx1.freebsd.org (Postfix) with ESMTP id D9BF312FE for ; Sun, 9 Jun 2013 17:14:07 +0000 (UTC) Received: from mfilter10-d.gandi.net (mfilter10-d.gandi.net [217.70.178.139]) by relay4-d.mail.gandi.net (Postfix) with ESMTP id BF177172089; Sun, 9 Jun 2013 19:13:56 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter10-d.gandi.net Received: from relay4-d.mail.gandi.net ([217.70.183.196]) by mfilter10-d.gandi.net (mfilter10-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id NT9OyW5jv1WZ; Sun, 9 Jun 2013 19:13:55 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay4-d.mail.gandi.net (Postfix) with ESMTPSA id 97F37172067; Sun, 9 Jun 2013 19:13:54 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id DEBAF73A1C; Sun, 9 Jun 2013 10:13:51 -0700 (PDT) Date: Sun, 9 Jun 2013 10:13:51 -0700 From: Jeremy Chadwick To: Steven Hartland Subject: Re: Changing the default for ZFS atime to off? Message-ID: <20130609171351.GA41133@icarus.home.lan> References: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> <2AC5E8F4-3AF1-4EA5-975D-741506AC70A5@my.gd> <3152D35416D047BCA14009F3108A8967@multiplay.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3152D35416D047BCA14009F3108A8967@multiplay.co.uk> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 17:14:08 -0000 On Sun, Jun 09, 2013 at 05:39:42PM +0100, Steven Hartland wrote: > > ----- Original Message ----- From: "Damien Fleuriot" > To: "Steven Hartland" > Cc: > Sent: Sunday, June 09, 2013 11:39 AM > Subject: Re: Changing the default for ZFS atime to off? > > > > > >On 8 Jun 2013, at 20:54, "Steven Hartland" wrote: > > > >>One of the first changes we make here when installing machines > >>here to changing atime=off on all ZFS pool roots. > >> > >>I know there are a few apps which can rely on atime updates > >>such as qmail and possibly postfix, but those seem like special > >>cases for which admins should enable atime instead of the other > >>way round. > >> > >>This is going to of particular interest for flash based storage > >>which should avoid unnessacary writes to reduce wear, but it will > >>also help improve performance in general. > >> > >>So what do people think is it worth considering changing the > >>default from atime=on to atime=off moving forward? > >> > >>If so what about UFS, same change? > > > >I strongly oppose the change for reasons already raised by many > >people regarding the mbox file. > > > >Besides, if atime should default to off on 2 filesystems and on > >on all others, that would definitely create confusion. > > A very valid point. > > >Last, I believe it should be the admin's decision to turn atime > >off, just like it is his decision to turn compression on. 
> > Trying to play devils advocate here; compression is off by default > because it uses resources and doesn't give a benefit for all cases. Not to mention ZFS on FreeBSD, specifically WRT compression and dedup, still lack a separate priority class for their threads. Info on that: http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012718.html http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012726.html While the discussion about atime default is fine/good to have, there are bigger/more impacting than atime. (Compression and dedup are something people *really* want to use, and I understand -- hell, I'd be using compression if it weren't for the above problem. It's the sole blocker for me -- really). > Is that not the same as atime, and it should be an admins decision > to turn it on where it's wanted? While I understand you're playing devil's advocate, you will find I, as well as most BSD people (in my experience), tend to err on the side of caution. That means atime=on as a default. > >Don't mistake me, we turn atime=off on every box, every > >filesystem, even on Mac's HFS. > >Yet I believe defaulting it to off is a mistake. > > That's what prompted me to start this discussion. If a large portion > of users either disable atime already or would disable atime if they > knew about it, does that bring into question the current default? You've just encountered 1 user who sets atime=off on every box they admin/maintain. And you have me -- who has atime=on on every box he admins/maintains. If you're looking for a vote, you won't get one that satisfies everyone, nor the majority of FreeBSD users -- because most users are not subscribed to the mailing list, do not visit the forums, etc.. They install the OS + use it and live happily in their hobbit hole. I also have no idea how this would impact the commercial companies who rely on FreeBSD for their enterprise products. I imagine their feedback would (should? Matter of opinion) hold more weight. > Potentially a better solution would be to make atime an option > in the installer, as that helps educate admins that the option > exists, which is potentially the biggest issue here? As I've stated in some other threads (probably on -stable), I'm all for people adding options/checkboxes/etc. to bsdinstall to allow more granularity during installation (vs. having to do things after-the-fact or the "final shell" prior to rebooting). If someone wants to add an atime checkbox (checked == atime enabled) to the filesystem creation phase, that's fantastic. But I strongly feel that checkbox needs to default to checked/enabled, solely so there are no "unwanted surprises" since we have no idea what software they'll be using on the system. There is also (still) the concern of POSIX compliance, which the BSDs have historically been very strict about. I guess you can hash that out with Bruce. Honestly the relatime thing from Linux sounds like a decent compromise. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sun Jun 9 20:25:49 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 1521E7A8 for ; Sun, 9 Jun 2013 20:25:49 +0000 (UTC) (envelope-from rcartwri@asu.edu) Received: from mail-wg0-x22c.google.com (mail-wg0-x22c.google.com [IPv6:2a00:1450:400c:c00::22c]) by mx1.freebsd.org (Postfix) with ESMTP id A47A71089 for ; Sun, 9 Jun 2013 20:25:48 +0000 (UTC) Received: by mail-wg0-f44.google.com with SMTP id m15so3301962wgh.23 for ; Sun, 09 Jun 2013 13:25:47 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:x-gm-message-state; bh=Xq5I2j4UaEqkJhZFPlUFAtbAZnv5+9q6B0USzNHIsLg=; b=OLudLj6WcgGVCkZZL8yrjZE7oL0rhBZz59ByaR3lrkzFsNBcEz8DB7qvOe4XorfHLh 3KuQfwEo19SyRwPMiTk6nyRCEKRy1J8/UvyDzTPzG1SCj+6U0q5zyeNCNOKX+prrZS7D A/dCOzzgniKYpxpr/B5qje46iduomSdsNlOFskDkhFjtfjFK4QI5j6lF04m8XH2claz1 TBylVXZvKilAAnlD3RExoQwZqamRCQhty14VUTftnt6lyycHlxCtrYOkRzJVoKVpYunz iak7B7Rj7QcBr1EHVcYOcFMPYJNySyuob/NmRXkmfvil/vdG2/GNsnCIs1i/gqIUQRuf PkOQ== MIME-Version: 1.0 X-Received: by 10.194.123.9 with SMTP id lw9mr4104756wjb.24.1370809547842; Sun, 09 Jun 2013 13:25:47 -0700 (PDT) Received: by 10.180.76.114 with HTTP; Sun, 9 Jun 2013 13:25:47 -0700 (PDT) In-Reply-To: <20130609065430.GA28206@icarus.home.lan> References: <20130609065430.GA28206@icarus.home.lan> Date: Sun, 9 Jun 2013 13:25:47 -0700 Message-ID: Subject: Re: ZFS and Glabel From: "Reed A. Cartwright" To: Jeremy Chadwick Content-Type: text/plain; charset=ISO-8859-1 X-Gm-Message-State: ALoCoQnwqpJD6hBCDxu4D/NnKeHWRVFbbG2p3vXTz2wE4M8NzpdiaW7a8MqhhuLGZxKzKY3eYWZ6 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Jun 2013 20:25:49 -0000 Thanks, it makes sense now. Would it make sense to have a script that processes the output of "cam devlist -v" to produce such an example output? On Sat, Jun 8, 2013 at 11:54 PM, Jeremy Chadwick wrote: > On Sat, Jun 08, 2013 at 09:46:18PM -0700, Reed A. Cartwright wrote: >> I'm looking at my dmesg.boot to figure out what settings I need to >> wire down my HDDs. I read the cam(4) documentation but I'm not sure I >> know what I'm doing. Any advice would be helpful. >> >> Let's assume that I want to wire everything down to their current >> positions, what should I put in loader.conf? I'll paste below some of >> my hardware configuration and lines from dmesg.boot that I think I >> need to look at. >> >> I have 4 LSI cards in the system: mps0, mps1, mps2, mps3. 
>> >> mps0: port 0xd000-0xd0ff mem >> 0xdff3c000-0xdff3ffff,0xdff40000-0xdff7ffff irq 24 at device 0.0 on >> pci5 >> mps1: port 0xc000-0xc0ff mem >> 0xdfe3c000-0xdfe3ffff,0xdfe40000-0xdfe7ffff irq 44 at device 0.0 on >> pci4 >> mps2: port 0xb000-0xb0ff mem >> 0xdfd3c000-0xdfd3ffff,0xdfd40000-0xdfd7ffff irq 32 at device 0.0 on >> pci3 >> mps3: port 0xe000-0xe0ff mem >> 0xdbf3c000-0xdbf3ffff,0xdbf40000-0xdbf7ffff irq 56 at device 0.0 on >> pci65 >> >> I have drives attached to two of those cards: >> >> da0 at mps0 bus 0 scbus0 target 0 lun 0 >> da1 at mps0 bus 0 scbus0 target 1 lun 0 >> da2 at mps0 bus 0 scbus0 target 2 lun 0 >> da3 at mps0 bus 0 scbus0 target 3 lun 0 >> da4 at mps0 bus 0 scbus0 target 4 lun 0 >> da5 at mps0 bus 0 scbus0 target 5 lun 0 >> da6 at mps0 bus 0 scbus0 target 6 lun 0 >> da7 at mps0 bus 0 scbus0 target 7 lun 0 > >> da8 at mps3 bus 0 scbus9 target 0 lun 0 >> da9 at mps3 bus 0 scbus9 target 1 lun 0 >> da10 at mps3 bus 0 scbus9 target 2 lun 0 >> da11 at mps3 bus 0 scbus9 target 3 lun 0 >> da12 at mps3 bus 0 scbus9 target 4 lun 0 >> >> {snip} > > As usual, the situation is insane because you have so many controllers > on the system (more than just mps(4)) -- specifically 11 separate > controllers or systems using CAM (hence scbus0 to scbus10). > > Below is for mps(4). If you want to wire down ahci(4), things are > a bit different, but you can read this post of mine: > > http://lists.freebsd.org/pipermail/freebsd-stable/2013-January/071851.html > > Enjoy: > > hint.scbus.0.at="mps0" > hint.scbus.1.at="mps1" > hint.scbus.2.at="mps2" > hint.scbus.9.at="mps3" > hint.da.0.at="scbus0" > hint.da.1.at="scbus0" > hint.da.2.at="scbus0" > hint.da.3.at="scbus0" > hint.da.4.at="scbus0" > hint.da.5.at="scbus0" > hint.da.6.at="scbus0" > hint.da.7.at="scbus0" > hint.da.8.at="scbus9" > hint.da.9.at="scbus9" > hint.da.10.at="scbus9" > hint.da.11.at="scbus9" > hint.da.12.at="scbus9" > hint.da.13.at="scbus9" > hint.da.14.at="scbus9" > hint.da.15.at="scbus9" > hint.da.16.at="scbus1" > hint.da.17.at="scbus1" > hint.da.18.at="scbus1" > hint.da.19.at="scbus1" > hint.da.20.at="scbus1" > hint.da.21.at="scbus1" > hint.da.22.at="scbus1" > hint.da.23.at="scbus1" > hint.da.24.at="scbus2" > hint.da.25.at="scbus2" > hint.da.26.at="scbus2" > hint.da.27.at="scbus2" > hint.da.28.at="scbus2" > hint.da.29.at="scbus2" > hint.da.30.at="scbus2" > hint.da.31.at="scbus2" > hint.da.0.target="0" > hint.da.1.target="1" > hint.da.2.target="2" > hint.da.3.target="3" > hint.da.4.target="4" > hint.da.5.target="5" > hint.da.6.target="6" > hint.da.7.target="7" > hint.da.8.target="0" > hint.da.9.target="1" > hint.da.10.target="2" > hint.da.11.target="3" > hint.da.12.target="4" > hint.da.13.target="5" > hint.da.14.target="6" > hint.da.15.target="7" > hint.da.16.target="0" > hint.da.17.target="1" > hint.da.18.target="2" > hint.da.19.target="3" > hint.da.20.target="4" > hint.da.21.target="5" > hint.da.22.target="6" > hint.da.23.target="7" > hint.da.24.target="0" > hint.da.25.target="1" > hint.da.26.target="2" > hint.da.27.target="3" > hint.da.28.target="4" > hint.da.29.target="5" > hint.da.30.target="6" > hint.da.31.target="7" > > -- > | Jeremy Chadwick jdc@koitsu.org | > | UNIX Systems Administrator http://jdc.koitsu.org/ | > | Making life hard for others since 1977. PGP 4BD6C0CB | > -- Reed A. 
Cartwright, PhD Assistant Professor of Genomics, Evolution, and Bioinformatics School of Life Sciences Center for Evolutionary Medicine and Informatics The Biodesign Institute Arizona State University - Address: The Biodesign Institute, PO Box 875301, Tempe, AZ 85287-5301 USA Packages: The Biodesign Institute, 1001 S. McAllister Ave, Tempe, AZ 85287-5301 USA Office: Biodesign A-224A, 1-480-965-9949 From owner-freebsd-fs@FreeBSD.ORG Mon Jun 10 02:20:14 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0F5C263D for ; Mon, 10 Jun 2013 02:20:14 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) by mx1.freebsd.org (Postfix) with ESMTP id AA69F1622 for ; Mon, 10 Jun 2013 02:20:13 +0000 (UTC) Received: from mfilter14-d.gandi.net (mfilter14-d.gandi.net [217.70.178.142]) by relay5-d.mail.gandi.net (Postfix) with ESMTP id 15EC441C064; Mon, 10 Jun 2013 04:19:57 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter14-d.gandi.net Received: from relay5-d.mail.gandi.net ([217.70.183.197]) by mfilter14-d.gandi.net (mfilter14-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id A5-+RIutWa4L; Mon, 10 Jun 2013 04:19:55 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay5-d.mail.gandi.net (Postfix) with ESMTPSA id B5BCB41C05C; Mon, 10 Jun 2013 04:19:54 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id D00C773A1C; Sun, 9 Jun 2013 19:19:50 -0700 (PDT) Date: Sun, 9 Jun 2013 19:19:50 -0700 From: Jeremy Chadwick To: "Reed A. Cartwright" Subject: Re: ZFS and Glabel Message-ID: <20130610021950.GA50356@icarus.home.lan> References: <20130609065430.GA28206@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Jun 2013 02:20:14 -0000 On Sun, Jun 09, 2013 at 01:25:47PM -0700, Reed A. Cartwright wrote: > Thanks, it makes sense now. > > Would it make sense to have a script that processes the output of "cam > devlist -v" to produce such an example output? > > On Sat, Jun 8, 2013 at 11:54 PM, Jeremy Chadwick wrote: > > On Sat, Jun 08, 2013 at 09:46:18PM -0700, Reed A. Cartwright wrote: > >> I'm looking at my dmesg.boot to figure out what settings I need to > >> wire down my HDDs. I read the cam(4) documentation but I'm not sure I > >> know what I'm doing. Any advice would be helpful. > >> > >> Let's assume that I want to wire everything down to their current > >> positions, what should I put in loader.conf? I'll paste below some of > >> my hardware configuration and lines from dmesg.boot that I think I > >> need to look at. > >> > >> I have 4 LSI cards in the system: mps0, mps1, mps2, mps3. 
> >> > >> mps0: port 0xd000-0xd0ff mem > >> 0xdff3c000-0xdff3ffff,0xdff40000-0xdff7ffff irq 24 at device 0.0 on > >> pci5 > >> mps1: port 0xc000-0xc0ff mem > >> 0xdfe3c000-0xdfe3ffff,0xdfe40000-0xdfe7ffff irq 44 at device 0.0 on > >> pci4 > >> mps2: port 0xb000-0xb0ff mem > >> 0xdfd3c000-0xdfd3ffff,0xdfd40000-0xdfd7ffff irq 32 at device 0.0 on > >> pci3 > >> mps3: port 0xe000-0xe0ff mem > >> 0xdbf3c000-0xdbf3ffff,0xdbf40000-0xdbf7ffff irq 56 at device 0.0 on > >> pci65 > >> > >> I have drives attached to two of those cards: > >> > >> da0 at mps0 bus 0 scbus0 target 0 lun 0 > >> da1 at mps0 bus 0 scbus0 target 1 lun 0 > >> da2 at mps0 bus 0 scbus0 target 2 lun 0 > >> da3 at mps0 bus 0 scbus0 target 3 lun 0 > >> da4 at mps0 bus 0 scbus0 target 4 lun 0 > >> da5 at mps0 bus 0 scbus0 target 5 lun 0 > >> da6 at mps0 bus 0 scbus0 target 6 lun 0 > >> da7 at mps0 bus 0 scbus0 target 7 lun 0 > > > >> da8 at mps3 bus 0 scbus9 target 0 lun 0 > >> da9 at mps3 bus 0 scbus9 target 1 lun 0 > >> da10 at mps3 bus 0 scbus9 target 2 lun 0 > >> da11 at mps3 bus 0 scbus9 target 3 lun 0 > >> da12 at mps3 bus 0 scbus9 target 4 lun 0 > >> > >> {snip} > > > > As usual, the situation is insane because you have so many controllers > > on the system (more than just mps(4)) -- specifically 11 separate > > controllers or systems using CAM (hence scbus0 to scbus10). > > > > Below is for mps(4). If you want to wire down ahci(4), things are > > a bit different, but you can read this post of mine: > > > > http://lists.freebsd.org/pipermail/freebsd-stable/2013-January/071851.html > > > > Enjoy: > > > > hint.scbus.0.at="mps0" > > hint.scbus.1.at="mps1" > > hint.scbus.2.at="mps2" > > hint.scbus.9.at="mps3" > > hint.da.0.at="scbus0" > > hint.da.1.at="scbus0" > > hint.da.2.at="scbus0" > > hint.da.3.at="scbus0" > > hint.da.4.at="scbus0" > > hint.da.5.at="scbus0" > > hint.da.6.at="scbus0" > > hint.da.7.at="scbus0" > > hint.da.8.at="scbus9" > > hint.da.9.at="scbus9" > > hint.da.10.at="scbus9" > > hint.da.11.at="scbus9" > > hint.da.12.at="scbus9" > > hint.da.13.at="scbus9" > > hint.da.14.at="scbus9" > > hint.da.15.at="scbus9" > > hint.da.16.at="scbus1" > > hint.da.17.at="scbus1" > > hint.da.18.at="scbus1" > > hint.da.19.at="scbus1" > > hint.da.20.at="scbus1" > > hint.da.21.at="scbus1" > > hint.da.22.at="scbus1" > > hint.da.23.at="scbus1" > > hint.da.24.at="scbus2" > > hint.da.25.at="scbus2" > > hint.da.26.at="scbus2" > > hint.da.27.at="scbus2" > > hint.da.28.at="scbus2" > > hint.da.29.at="scbus2" > > hint.da.30.at="scbus2" > > hint.da.31.at="scbus2" > > hint.da.0.target="0" > > hint.da.1.target="1" > > hint.da.2.target="2" > > hint.da.3.target="3" > > hint.da.4.target="4" > > hint.da.5.target="5" > > hint.da.6.target="6" > > hint.da.7.target="7" > > hint.da.8.target="0" > > hint.da.9.target="1" > > hint.da.10.target="2" > > hint.da.11.target="3" > > hint.da.12.target="4" > > hint.da.13.target="5" > > hint.da.14.target="6" > > hint.da.15.target="7" > > hint.da.16.target="0" > > hint.da.17.target="1" > > hint.da.18.target="2" > > hint.da.19.target="3" > > hint.da.20.target="4" > > hint.da.21.target="5" > > hint.da.22.target="6" > > hint.da.23.target="7" > > hint.da.24.target="0" > > hint.da.25.target="1" > > hint.da.26.target="2" > > hint.da.27.target="3" > > hint.da.28.target="4" > > hint.da.29.target="5" > > hint.da.30.target="6" > > hint.da.31.target="7" The script would be ugly and require one-offs per driver. For example, look at your "camcontrol devlist -v" output with regards to mps1 and mps2. 
There's no indication of what the bus #, target #, or lun # should be because there are no disks on the controller. I made an educated guess based off of mps0/mps3 and previous familiarity (on the lists -- I've never used one of these controllers) with mps(4). Had you shown me "camcontrol devlist -v" output with only 1 controller and 1 disk, I would have had to go purely off of what I've seen in the past. The behaviour could change as well, depending on firmware upgrades or driver changes (many of the storage drivers in FreeBSD in the past 3-4 years have gone through massive changes), or even operational mode (RAID vs. non-RAID), where the target then is always 0 but the lun # increases, or maybe the bus number, or maybe a combination. If someone really wants to take a stab at writing some script that does this, be my guest, but I definitely don't. :-) There's just too many one-offs or assumptions that have to be made which a human mind + experience can do more reliably, IMO. Because remember: the last thing you want to do is modify loader.conf for wiring down and botch it/break it. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Mon Jun 10 03:05:28 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 6EDE9C35 for ; Mon, 10 Jun 2013 03:05:28 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 46BBA182A for ; Mon, 10 Jun 2013 03:05:27 +0000 (UTC) Received: from jre-mbp.elischer.org (ppp121-45-237-17.lns20.per1.internode.on.net [121.45.237.17]) (authenticated bits=0) by vps1.elischer.org (8.14.5/8.14.5) with ESMTP id r5A2d8Z9072116 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Sun, 9 Jun 2013 19:39:11 -0700 (PDT) (envelope-from julian@freebsd.org) Message-ID: <51B53C4C.3030007@freebsd.org> Date: Mon, 10 Jun 2013 10:39:08 +0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130509 Thunderbird/17.0.6 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: /tmp: change default to mdmfs and/or tmpfs? References: <20130609124603.GA35681@icarus.home.lan> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Jun 2013 03:05:28 -0000 On 6/9/13 9:16 PM, Dmitry Morozovsky wrote: > On Sun, 9 Jun 2013, Jeremy Chadwick wrote: > >>> what do you think about stop using precious disk or even SSD resources for >>> /tmp? >>> >>> For last several (well, maybe over 10?) years I constantly use md (swap-backed) >>> for /tmp, usually 128M in size, which is enough for most of our server needs. >>> Some require more, but none more than 512M. Regarding the options, we use >>> tmpmfs_flags="-S -n -o async -b 4096 -f 512" >> [...] I sometimes use virtual filesystems but there are cases when I am looking to store HUGE amounts of trace data in /tmp and end up cursing and remounting a real disk partition. 
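A quick way to tell, before starting such a job, whether /tmp is memory-backed and how much headroom it (and the disk-backed /var/tmp) actually has -- purely illustrative commands:

  mount | grep ' /tmp '   # tmpfs or /dev/md* here means memory/swap-backed;
                          # no output means /tmp is just part of the root filesystem
  df -h /tmp /var/tmp     # compare the available space on each
  swapinfo                # swap is where swap-backed md and tmpfs page out under pressure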
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 10 03:26:24 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0AE27E53 for ; Mon, 10 Jun 2013 03:26:24 +0000 (UTC) (envelope-from wblock@wonkity.com) Received: from wonkity.com (wonkity.com [67.158.26.137]) by mx1.freebsd.org (Postfix) with ESMTP id C10E818D8 for ; Mon, 10 Jun 2013 03:26:23 +0000 (UTC) Received: from wonkity.com (localhost [127.0.0.1]) by wonkity.com (8.14.7/8.14.7) with ESMTP id r5A3QLJG075281; Sun, 9 Jun 2013 21:26:21 -0600 (MDT) (envelope-from wblock@wonkity.com) Received: from localhost (wblock@localhost) by wonkity.com (8.14.7/8.14.7/Submit) with ESMTP id r5A3QLjg075278; Sun, 9 Jun 2013 21:26:21 -0600 (MDT) (envelope-from wblock@wonkity.com) Date: Sun, 9 Jun 2013 21:26:21 -0600 (MDT) From: Warren Block To: Jeremy Chadwick Subject: Re: ZFS and Glabel In-Reply-To: <20130610021950.GA50356@icarus.home.lan> Message-ID: References: <20130609065430.GA28206@icarus.home.lan> <20130610021950.GA50356@icarus.home.lan> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (wonkity.com [127.0.0.1]); Sun, 09 Jun 2013 21:26:21 -0600 (MDT) Cc: freebsd-fs@freebsd.org, "Reed A. Cartwright" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Jun 2013 03:26:24 -0000 On Sun, 9 Jun 2013, Jeremy Chadwick wrote: > On Sun, Jun 09, 2013 at 01:25:47PM -0700, Reed A. Cartwright wrote: >> Thanks, it makes sense now. >> >> Would it make sense to have a script that processes the output of "cam >> devlist -v" to produce such an example output? >>> ... > > The script would be ugly and require one-offs per driver. For example, > look at your "camcontrol devlist -v" output with regards to mps1 and > mps2. There's no indication of what the bus #, target #, or lun # > should be because there are no disks on the controller. I made an > educated guess based off of mps0/mps3 and previous familiarity (on the > lists -- I've never used one of these controllers) with mps(4). > > Had you shown me "camcontrol devlist -v" output with only 1 controller > and 1 disk, I would have had to go purely off of what I've seen in the > past. This all looks remarkably complicated and fragile compared to labels. From owner-freebsd-fs@FreeBSD.ORG Mon Jun 10 08:43:15 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 11B6025A; Mon, 10 Jun 2013 08:43:15 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 94D0D1393; Mon, 10 Jun 2013 08:43:14 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r5A8hC7s051794; Mon, 10 Jun 2013 12:43:12 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Mon, 10 Jun 2013 12:43:12 +0400 (MSK) From: Dmitry Morozovsky To: Julian Elischer Subject: Re: /tmp: change default to mdmfs and/or tmpfs? 
In-Reply-To: <51B53C4C.3030007@freebsd.org> Message-ID: References: <20130609124603.GA35681@icarus.home.lan> <51B53C4C.3030007@freebsd.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Mon, 10 Jun 2013 12:43:12 +0400 (MSK) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Jun 2013 08:43:15 -0000 On Mon, 10 Jun 2013, Julian Elischer wrote: > > > > what do you think about stop using precious disk or even SSD resources > > > > for > > > > /tmp? > > > > > > > > For last several (well, maybe over 10?) years I constantly use md > > > > (swap-backed) > > > > for /tmp, usually 128M in size, which is enough for most of our server > > > > needs. > > > > Some require more, but none more than 512M. Regarding the options, we > > > > use > > > > tmpmfs_flags="-S -n -o async -b 4096 -f 512" > > > [...] > > I sometimes use virtual filesystems but there are cases when I am looking to > store HUGE amounts of trace data in /tmp and end up cursing and remounting a > real disk partition. Hmm, don't /var/tmp exist for such a task? -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Mon Jun 10 08:44:39 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D8CF82F0 for ; Mon, 10 Jun 2013 08:44:39 +0000 (UTC) (envelope-from pierre@lemazurier.fr) Received: from mail.lemazurier.fr (mail.lemazurier.fr [62.147.151.66]) by mx1.freebsd.org (Postfix) with ESMTP id 5557D13A9 for ; Mon, 10 Jun 2013 08:44:39 +0000 (UTC) Received: from [172.18.8.191] (zup50-1-88-186-33-16.fbx.proxad.net [88.186.33.16]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.lemazurier.fr (Postfix) with ESMTPSA id 20DAC23E15 for ; Mon, 10 Jun 2013 10:44:31 +0200 (CEST) Message-ID: <51B59257.3070500@lemazurier.fr> Date: Mon, 10 Jun 2013 10:46:15 +0200 From: Pierre Lemazurier User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130116 Icedove/10.0.12 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: [ZFS] Raid 10 performance issues References: <51B1EBD1.9010207@gmail.com> <51B1F726.7090402@lemazurier.fr> In-Reply-To: <51B1F726.7090402@lemazurier.fr> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Jun 2013 08:44:40 -0000 I add my /boot/loader.conf for more information : zfs_load="YES" vm.kmem_size="22528M" vfs.zfs.arc_min="20480M" vfs.zfs.arc_max="20480M" vfs.zfs.prefetch_disable="0" vfs.zfs.txg.timeout="5" vfs.zfs.vdev.max_pending="10" vfs.zfs.vdev.min_pending="4" vfs.zfs.write_limit_override="0" vfs.zfs.no_write_throttle="0" Le 
07/06/2013 17:07, Pierre Lemazurier a écrit : > Hi, i think i suffer of write and read performance issues on my zpool. > > About my system and hardware : > > uname -a > FreeBSD bsdnas 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 > 09:23:10 UTC 2012 > root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 > > sysinfo -a : http://www.privatepaste.com/b32f34c938 > > - 24 (4gbx6) GB DDR3 ECC : > http://www.ec.kingston.com/ecom/configurator_new/partsinfo.asp?ktcpartno=KVR16R11D8/4HC > > - 14x this drive : > http://www.wdc.com/global/products/specs/?driveID=1086&language=1 > - server : > http://www.supermicro.com/products/system/1u/5017/sys-5017r-wrf.cfm?parts=show > > - CPU : > http://ark.intel.com/fr/products/64594/Intel-Xeon-Processor-E5-2620-15M-Cache-2_00-GHz-7_20-GTs-Intel-QPI > > - chassis : > http://www.supermicro.com/products/chassis/4u/847/sc847e16-rjbod1.cfm > - HBA sas connector : > http://www.lsi.com/products/storagecomponents/Pages/LSISAS9200-8e.aspx > - Cable between chassis and server : > http://www.provantage.com/supermicro-cbl-0166l~7SUPA01R.htm > > I use this command for test write speed :dd if=/dev/zero of=test.dd > bs=2M count=10000 > I use this command for test read speed :dd if=test.dd of=/dev/null bs=2M > count=10000 > > Of course no compression on zfs dataset. > > Test on one of this disk format with UFS : > > Write : > some gstat raising : http://www.privatepaste.com/dd31fafaa6 > speed around 140 mo/s and something like 1100 iops > dd result : 20971520000 bytes transferred in 146.722126 secs (142933589 > bytes/sec) > > Read : > I think I read on RAM (20971520000 bytes transferred in 8.813298 secs > (2379531480 bytes/sec)). > Then I make the test on all the drive (dd if=/dev/gpt/disk14.nop > of=/dev/null bs=2M count=10000) > some gstat raising : http://www.privatepaste.com/d022b7c480 > speed around 140 mo/s again an near 1100+ iops > dd reslut : 20971520000 bytes transferred in 142.895212 secs (146761530 > bytes/sec) > > > ZFS - I make my zpool on this way : http://www.privatepaste.com/e74d9cc3b9 > > zpool status : http://www.privatepaste.com/0276801ef6 > zpool get all : http://www.privatepaste.com/74b37a2429 > zfs get all : http://www.privatepaste.com/e56f4a33f8 > zfs-stats -a : http://www.privatepaste.com/f017890aa1 > zdb : http://www.privatepaste.com/7d723c5556 > > With this setup I hope to have near 7x more speed for write and near 14x > for > read than the UFS device alone. Then for be realistic, something like > 850 mo/s for write and 1700 mo/s for read. > > > ZFS – test : > > Write : > gstat raising : http://www.privatepaste.com/7cefb9393a > zpool iostat -v 1 of a fastest try : http://www.privatepaste.com/8ade4defbe > dd result : 20971520000 bytes transferred in 54.326509 secs (386027381 > bytes/sec) > > 386 mo/s more than twice less than I expect. > > > Read : > I export and import the pool for limit the ARC effect. I don't know how > to do better, I hope that sufficient. > gstat raising : http://www.privatepaste.com/130ce43af1 > zpool iostat -v 1 : http://privatepaste.com/eb5f9d3432 > dd result : 20971520000 bytes transferred in 30.347214 secs (691052563 > bytes/sec) > > 690 mo/s 2,5x less than I expect. > > > It's appear to not be an hardware issue, when I do a dd test of each > whole disk at the same time with the command dd if=/dev/gpt/diskX > of=/dev/null bs=1M count=10000, I have this gstat raising : > http://privatepaste.com/df9f63fd4d > > Near 130 mo/s for each device, something like I expect. 
> > In your opinion where the problem come from ? > > > Forgive me for my English, please keep easy language, i'm not realy easy > with English. > I can give you more information if you need. > > Many thanks for your help. > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Mon Jun 10 10:03:43 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2AABC67F for ; Mon, 10 Jun 2013 10:03:43 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from melon.pingpong.net (melon.pingpong.net [79.136.116.200]) by mx1.freebsd.org (Postfix) with ESMTP id C1FD318B6 for ; Mon, 10 Jun 2013 10:03:42 +0000 (UTC) Received: from girgBook.local (citron2.pingpong.net [195.178.173.68]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by melon.pingpong.net (Postfix) with ESMTPSA id 0A4B616571; Mon, 10 Jun 2013 11:54:58 +0200 (CEST) Message-ID: <51B5A277.2060904@FreeBSD.org> Date: Mon, 10 Jun 2013 11:55:03 +0200 From: Palle Girgensohn User-Agent: Postbox 3.0.8 (Macintosh/20130427) MIME-Version: 1.0 To: Kirk McKusick Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?) References: <201306022101.r52L19vg033389@chez.mckusick.com> In-Reply-To: <201306022101.r52L19vg033389@chez.mckusick.com> X-Enigmail-Version: 1.2.3 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, Dan Thomas , Jeff Roberson , Julian Akehurst X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Jun 2013 10:03:43 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Kirk McKusick skrev: >> Date: Sun, 02 Jun 2013 22:35:23 +0200 From: Palle Girgensohn >> To: Kirk McKusick >> Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?) >> Cc: freebsd-fs@freebsd.org, Dan Thomas , Jeff >> Roberson , Julian Akehurst >> >> >> --On 31 maj 2013 11.25.40 -0700 Kirk McKusick >> wrote: >> >>> Your results are very enlightening. Especially the fact that you >>> have to do a forcible unmount of the filesystem. What that tells >>> me is that somehow we are getting vnodes that have phantom >>> references. That is there is some system call where we get a >>> reference on a vnode (vref, vget, or similar) that does not >>> ultimately have a corresponding drop of the reference (vrele, >>> vput, or similar). The net effect is that the file is held open >>> despite the fact that there are no longer any connections to it. >>> When you do the forcible unmount, the kernel walks the list of >>> vnodes associated with the filesystem and does a vgone on each of >>> them. That causes each to be inactivated which then triggers the >>> release of their associated disk space. The reason that the >>> unmount takes 20 seconds is to process all the releasing of the >>> space. My guess is that there is an error path in some system >>> call that is missing the vrele or vput. >>> >>> Assuming that you are able to run some more tests on your test >>> machine, the next step in narrowing down the set of code to look >>> at is to try running your system with soft updates disabled. 
The >>> idea is to find out whether the miss-matched references are in >>> the soft updates code or are in one of the filesystem system >>> calls themselves. To disable soft updates run the command `tunefs >>> -n disable /pgsql' on the unmounted /pgsql filesystem. If the >>> system then runs without the problem, I will know to search the >>> soft updates code. If the problem persists, then I'll know to >>> look in the system calls themselves. You may want to do some >>> preliminary tests to see how quickly the problem manifests >>> itself. You can do this by running it for a short time (10 >>> minutes say) and then checking to see if you need to do a >>> forcible unmount of the filesystem. Once you establish how long >>> you have to run before you reliably have to do a forcible >>> unmount, you will know how long to run the test with soft updates >>> turned off. If you find that running with soft updates turned off >>> makes your application run too slowly you can mount your >>> filesystem asynchronously. Note however, that you should not run >>> asynchronously if the data on the filesystem is critical as you >>> may end up with an unrecoverable filesystem after a power >>> failure or system crash. So only run asynchronously if you can >>> afford to lose your filesystem. >>> >>> Finally, it would be helpful if you could add two more commands >>> to your diskspacecheck.sh script: >>> >>> sysctl -a | egrep vnode mount -v >>> >>> The first shows the vnode usage and the second shows the >>> operational state of your filesystems. >>> >>> Kirk McKusick >> OK, I have now turned off soft updates. This is on the test server. >> It is not as busy as the production machine, but I'll keep an eye >> on it and will mail new results as soon as I see any evidence of >> either that soft updates is the culprit or that it is not. >> >> FWIW, I attach the script from this remount process as well, which >> includes >> >> sysctl -a | grep vnode ; mount -v. >> >> Note that it is all in one script file this time. >> >> Cheers, Palle > > This looks good. Keep me posted. After running for a number of days without soft updates, it seems to me that the culprit is indeed in the soft updates code. 
# df -k /pgsql; du -sk /pgsql Filesystem 1024-blocks Used Avail Capacity Mounted on /dev/da2s1d 134763348 86339044 37643238 70% /pgsql 86303252 /pgsql Palle -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJRtaJ3AAoJEIhV+7FrxBJD+IkH/3FOoZ95VGE0fOWSuFIwVn8I jvHiJ6qTx0zh17pZNnc+G0UpU5fHxCazD1yT6yCwfkWebWKXELXtfQMeZUMGi0AX e94P0HJ2O4RQSMHC1rlWSLUidAB6m1ZtAtpXzgziB9P/Jonk78uFqRcTmZyMycsy pxPFHsbywsjJm9FLF4ZuhiSPX57tbAKLQM3HYDMFQ/rHPJiBlkx7VVeON6svtmMO bRZWnQTUXUAAMT1NDUEL8opGAO2S72+hFBiCjJsgS22SSq7KIMzAlJqq01L2svhH o7KNAkN6lIMuJS9B2idjJWLVXG/vNQ1QBOha0VY80fIQYSYeZt25EGlXf3rYL6Y= =Zmu2 -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Mon Jun 10 11:06:47 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CC96FFD5 for ; Mon, 10 Jun 2013 11:06:47 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id BE9321C81 for ; Mon, 10 Jun 2013 11:06:47 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r5AB6lL8096938 for ; Mon, 10 Jun 2013 11:06:47 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r5AB6ljf096936 for freebsd-fs@FreeBSD.org; Mon, 10 Jun 2013 11:06:47 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 10 Jun 2013 11:06:47 GMT Message-Id: <201306101106.r5AB6ljf096936@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Jun 2013 11:06:47 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/178999 fs [zfs] dev entries for cloned zvol don't show up until o bin/178996 fs [zfs] [patch] error in message with zfs mount -> there o kern/178854 fs [ufs] FreeBSD kernel crash in UFS o kern/178713 fs [nfs] [patch] Correct WebNFS support in NFS server and o kern/178412 fs [smbfs] Coredump when smbfs mounted o kern/178388 fs [zfs] [patch] allow up to 8MB recordsize o kern/178349 fs [zfs] zfs scrub on deduped data could be much less see o kern/178329 fs [zfs] extended attributes leak o kern/178238 fs [nullfs] nullfs don't release i-nodes on unlink. 
f kern/178231 fs [nfs] 8.3 nfsv4 client reports "nfsv4 client/server pr o kern/178103 fs [kernel] [nfs] [patch] Correct support of index files o kern/177985 fs [zfs] disk usage problem when copying from one zfs dat o kern/177971 fs [nfs] FreeBSD 9.1 nfs client dirlist problem w/ nfsv3, o kern/177966 fs [zfs] resilver completes but subsequent scrub reports o kern/177658 fs [ufs] FreeBSD panics after get full filesystem with uf o kern/177536 fs [zfs] zfs livelock (deadlock) with high write-to-disk o kern/177445 fs [hast] HAST panic f kern/177335 fs [nfs] [panic] Sleeping on "vmopar" with the following o kern/177240 fs [zfs] zpool import failed with state UNAVAIL but all d o kern/176978 fs [zfs] [panic] zfs send -D causes "panic: System call i o kern/176857 fs [softupdates] [panic] 9.1-RELEASE/amd64/GENERIC panic o bin/176253 fs zpool(8): zfs pool indentation is misleading/wrong o kern/176141 fs [zfs] sharesmb=on makes errors for sharenfs, and still o kern/175950 fs [zfs] Possible deadlock in zfs after long uptime o kern/175897 fs [zfs] operations on readonly zpool hang o kern/175179 fs [zfs] ZFS may attach wrong device on move o kern/175071 fs [ufs] [panic] softdep_deallocate_dependencies: unrecov o kern/174372 fs [zfs] Pagefault appears to be related to ZFS o kern/174315 fs [zfs] chflags uchg not supported o kern/174310 fs [zfs] root point mounting broken on CURRENT with multi o kern/174279 fs [ufs] UFS2-SU+J journal and filesystem corruption o kern/174060 fs [ext2fs] Ext2FS system crashes (buffer overflow?) o kern/173830 fs [zfs] Brain-dead simple change to ZFS error descriptio o kern/173718 fs [zfs] phantom directory in zraid2 pool f kern/173657 fs [nfs] strange UID map with nfsuserd o kern/173363 fs [zfs] [panic] Panic on 'zpool replace' on readonly poo o kern/173136 fs [unionfs] mounting above the NFS read-only share panic o kern/172942 fs [smbfs] Unmounting a smb mount when the server became o kern/172348 fs [unionfs] umount -f of filesystem in use with readonly o kern/172334 fs [unionfs] unionfs permits recursive union mounts; caus o kern/171626 fs [tmpfs] tmpfs should be noisier when the requested siz o kern/171415 fs [zfs] zfs recv fails with "cannot receive incremental o kern/170945 fs [gpt] disk layout not portable between direct connect o bin/170778 fs [zfs] [panic] FreeBSD panics randomly o kern/170680 fs [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA o kern/170497 fs [xfs][panic] kernel will panic whenever I ls a mounted o kern/169945 fs [zfs] [panic] Kernel panic while importing zpool (afte o kern/169480 fs [zfs] ZFS stalls on heavy I/O o kern/169398 fs [zfs] Can't remove file with permanent error o kern/169339 fs panic while " : > /etc/123" o kern/169319 fs [zfs] zfs resilver can't complete o kern/168947 fs [nfs] [zfs] .zfs/snapshot directory is messed up when o kern/168942 fs [nfs] [hang] nfsd hangs after being restarted (not -HU o kern/168158 fs [zfs] incorrect parsing of sharenfs options in zfs (fs o kern/167979 fs [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste o kern/167977 fs [smbfs] mount_smbfs results are differ when utf-8 or U o kern/167688 fs [fusefs] Incorrect signal handling with direct_io o kern/167685 fs [zfs] ZFS on USB drive prevents shutdown / reboot o kern/167612 fs [portalfs] The portal file system gets stuck inside po o kern/167272 fs [zfs] ZFS Disks reordering causes ZFS to pick the wron o kern/167260 fs [msdosfs] msdosfs disk was mounted the second time whe o kern/167109 fs [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene o 
kern/167105 fs [nfs] mount_nfs can not handle source exports wiht mor o kern/167067 fs [zfs] [panic] ZFS panics the server o kern/167065 fs [zfs] boot fails when a spare is the boot disk o kern/167048 fs [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF o kern/166912 fs [ufs] [panic] Panic after converting Softupdates to jo o kern/166851 fs [zfs] [hang] Copying directory from the mounted UFS di o kern/166477 fs [nfs] NFS data corruption. o kern/165950 fs [ffs] SU+J and fsck problem o kern/165521 fs [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31 o kern/165392 fs Multiple mkdir/rmdir fails with errno 31 o kern/165087 fs [unionfs] lock violation in unionfs o kern/164472 fs [ufs] fsck -B panics on particular data inconsistency o kern/164370 fs [zfs] zfs destroy for snapshot fails on i386 and sparc o kern/164261 fs [nullfs] [patch] fix panic with NFS served from NULLFS o kern/164256 fs [zfs] device entry for volume is not created after zfs o kern/164184 fs [ufs] [panic] Kernel panic with ufs_makeinode o kern/163801 fs [md] [request] allow mfsBSD legacy installed in 'swap' o kern/163770 fs [zfs] [hang] LOR between zfs&syncer + vnlru leading to o kern/163501 fs [nfs] NFS exporting a dir and a subdir in that dir to o kern/162944 fs [coda] Coda file system module looks broken in 9.0 o kern/162860 fs [zfs] Cannot share ZFS filesystem to hosts with a hyph o kern/162751 fs [zfs] [panic] kernel panics during file operations o kern/162591 fs [nullfs] cross-filesystem nullfs does not work as expe o kern/162519 fs [zfs] "zpool import" relies on buggy realpath() behavi o kern/161968 fs [zfs] [hang] renaming snapshot with -r including a zvo o kern/161864 fs [ufs] removing journaling from UFS partition fails on o bin/161807 fs [patch] add option for explicitly specifying metadata o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is o kern/161533 fs [zfs] [panic] zfs receive panic: system ioctl returnin o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_ o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou o kern/161280 fs [zfs] Stack overflow in gptzfsboot o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3 o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic f kern/160860 fs [ufs] Random UFS root filesystem corruption with SU+J o kern/160801 fs [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha o kern/159930 fs [ufs] [panic] kernel core o kern/159402 fs [zfs][loader] symlinks cause I/O errors o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by- o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs() o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option o kern/159077 fs [zfs] Can't cd .. 
with latest zfs version o kern/159048 fs [smbfs] smb mount corrupts large files o kern/159045 fs [zfs] [hang] ZFS scrub freezes system o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk o kern/158802 fs amd(8) ICMP storm and unkillable process. o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o f kern/157929 fs [nfs] NFS slow read o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and o kern/156781 fs [zfs] zfs is losing the snapshot directory, p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs p kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 p kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w o bin/153142 fs [zfs] ls -l outputs `ls: ./.zfs: Operation not support o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris installa o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and 
re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an f bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141897 fs [msdosfs] [panic] Kernel panic. 
msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues ( o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis p kern/133174 fs [msdosfs] [patch] msdosfs must support multibyte inter o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121072 fs [smbfs] mount_smbfs(8) cannot 
normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o kern/118318 fs [nfs] NFS server hangs under special circumstances o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. 
o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 320 problems total. From owner-freebsd-fs@FreeBSD.ORG Mon Jun 10 11:12:57 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B032CC89 for ; Mon, 10 Jun 2013 11:12:57 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay4-d.mail.gandi.net (relay4-d.mail.gandi.net [217.70.183.196]) by mx1.freebsd.org (Postfix) with ESMTP id 4FC3A1E9D for ; Mon, 10 Jun 2013 11:12:57 +0000 (UTC) Received: from mfilter24-d.gandi.net (mfilter24-d.gandi.net [217.70.178.152]) by relay4-d.mail.gandi.net (Postfix) with ESMTP id 55159172094; Mon, 10 Jun 2013 13:12:40 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter24-d.gandi.net Received: from relay4-d.mail.gandi.net ([217.70.183.196]) by mfilter24-d.gandi.net (mfilter24-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id E5B2Aw3VxnIO; Mon, 10 Jun 2013 13:12:38 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay4-d.mail.gandi.net (Postfix) with ESMTPSA id BCABB172092; Mon, 10 Jun 2013 13:12:37 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id C9C8873A1C; Mon, 10 Jun 2013 04:12:35 -0700 (PDT) Date: Mon, 10 Jun 2013 04:12:35 -0700 From: Jeremy Chadwick To: Pierre Lemazurier Subject: Re: [ZFS] Raid 10 performance issues Message-ID: <20130610111235.GB61858@icarus.home.lan> References: <51B1EBD1.9010207@gmail.com> <51B1F726.7090402@lemazurier.fr> <51B59257.3070500@lemazurier.fr> MIME-Version: 1.0 Content-Type: text/plain; charset=unknown-8bit Content-Disposition: inline In-Reply-To: <51B59257.3070500@lemazurier.fr> User-Agent: Mutt/1.5.21 (2010-09-15) Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Jun 2013 11:12:57 -0000 On Mon, Jun 10, 2013 at 10:46:15AM +0200, Pierre Lemazurier wrote: > I add my /boot/loader.conf for more information : >=20 > zfs_load=3D"YES" > vm.kmem_size=3D"22528M" > vfs.zfs.arc_min=3D"20480M" > vfs.zfs.arc_max=3D"20480M" > vfs.zfs.prefetch_disable=3D"0" > vfs.zfs.txg.timeout=3D"5" > vfs.zfs.vdev.max_pending=3D"10" > vfs.zfs.vdev.min_pending=3D"4" > vfs.zfs.write_limit_override=3D"0" > 
vfs.zfs.no_write_throttle=3D"0" Please remove these variables: vm.kmem_size=3D"22528M" vfs.zfs.arc_min=3D"20480M" You do not need to set vm.kmem_size any longer (that was addressed long ago, during the mid-days of stable/8), and you should let the ARC shrink if need be (my concern here is that possibly limiting the lower end of the ARC size may be triggering some other portions of FreeBSD's VM or ZFS to behave oddly. No proof/evidence, just guesswork on my part). At bare minimum, *definitely* remove the vm.kmem_size setting. Next, please remove the following variables, as these serve no purpose (they are the defaults in 9.1-RELEASE): vfs.zfs.prefetch_disable=3D"0" vfs.zfs.txg.timeout=3D"5" vfs.zfs.vdev.max_pending=3D"10" vfs.zfs.vdev.min_pending=3D"4" vfs.zfs.write_limit_override=3D"0" vfs.zfs.no_write_throttle=3D"0" So in short all you should have in your loader.conf is: zfs_load=3D"yes" vfs.zfs.arc_max=3D"20480M" > Le 07/06/2013 17:07, Pierre Lemazurier a =E9crit : > >Hi, i think i suffer of write and read performance issues on my zpool. > > > >About my system and hardware : > > > >uname -a > >FreeBSD bsdnas 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 > >09:23:10 UTC 2012 > >root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 > > > >sysinfo -a : http://www.privatepaste.com/b32f34c938 Going forward, I would recommend also providing "dmesg". It is a lot easier to read to most of us. All I can work out is that your storage controller is mps(4), except I can't see any of the important details about it. dmesg would give that, not this weird "sysinfo" thing. I would also like to request "pciconf -lvbc" output. > >- 24 (4gbx6) GB DDR3 ECC : > >http://www.ec.kingston.com/ecom/configurator_new/partsinfo.asp?ktcpart= no=3DKVR16R11D8/4HC > > > >- 14x this drive : > >http://www.wdc.com/global/products/specs/?driveID=3D1086&language=3D1 Worth pointing out for readers: These are 4096-byte sector 2TB WD Red drives. > >- server : > >http://www.supermicro.com/products/system/1u/5017/sys-5017r-wrf.cfm?pa= rts=3Dshow > > > >- CPU : > >http://ark.intel.com/fr/products/64594/Intel-Xeon-Processor-E5-2620-15= M-Cache-2_00-GHz-7_20-GTs-Intel-QPI > > > >- chassis : > >http://www.supermicro.com/products/chassis/4u/847/sc847e16-rjbod1.cfm > >- HBA sas connector : > >http://www.lsi.com/products/storagecomponents/Pages/LSISAS9200-8e.aspx > >- Cable between chassis and server : > >http://www.provantage.com/supermicro-cbl-0166l~7SUPA01R.htm > > > >I use this command for test write speed :dd if=3D/dev/zero of=3Dtest.d= d > >bs=3D2M count=3D10000 > >I use this command for test read speed :dd if=3Dtest.dd of=3D/dev/null= bs=3D2M > >count=3D10000 > > > >Of course no compression on zfs dataset. > > > >Test on one of this disk format with UFS : > > > >Write : > >some gstat raising : http://www.privatepaste.com/dd31fafaa6 > >speed around 140 mo/s and something like 1100 iops > >dd result : 20971520000 bytes transferred in 146.722126 secs (14293358= 9 > >bytes/sec) > > > >Read : > >I think I read on RAM (20971520000 bytes transferred in 8.813298 secs > >(2379531480 bytes/sec)). > >Then I make the test on all the drive (dd if=3D/dev/gpt/disk14.nop > >of=3D/dev/null bs=3D2M count=3D10000) > >some gstat raising : http://www.privatepaste.com/d022b7c480 > >speed around 140 mo/s again an near 1100+ iops > >dd reslut : 20971520000 bytes transferred in 142.895212 secs (14676153= 0 > >bytes/sec) Looks about right for a single WD Red 2TB drive. Important: THIS IS A SINGLE DRIVE. 
> >ZFS - I make my zpool on this way : http://www.privatepaste.com/e74d9cc3b9 Looks good to me. This is effectively RAID-10 as you said (a stripe of mirrors). > >zpool status : http://www.privatepaste.com/0276801ef6 > >zpool get all : http://www.privatepaste.com/74b37a2429 > >zfs get all : http://www.privatepaste.com/e56f4a33f8 > >zfs-stats -a : http://www.privatepaste.com/f017890aa1 > >zdb : http://www.privatepaste.com/7d723c5556 > > > >With this setup I hope to have near 7x more speed for write and near 14x > >for > >read than the UFS device alone. Then for be realistic, something like > >850 mo/s for write and 1700 mo/s for read. Your hopes may be shattered by the reality of how controllers behave and operate (performance-wise) as well as many other things, including some ZFS tunables. We shall see. > >ZFS – test : > > > >Write : > >gstat raising : http://www.privatepaste.com/7cefb9393a > >zpool iostat -v 1 of a fastest try : http://www.privatepaste.com/8ade4defbe > >dd result : 20971520000 bytes transferred in 54.326509 secs (386027381 > >bytes/sec) > > > >386 mo/s more than twice less than I expect. One thing to be aware of: while the dd took 54 seconds, the I/O to the pool probably continued for long after that. Your average speed to each disk at that time was (just estimating it here) ~55MBytes/second. I would assume what you're seeing above is probably the speed between /dev/zero and the ZFS ARC, with (of course) the controller and driver in the way. We know that your disks can do about 110-140MBytes/second each, so the performance hit has got to be in one of the following places: 1. ZFS itself, 2. Controller, controller driver (mps(4)), or controller firmware, 3. On-die MCH (memory controller) 4. PCIe bus speed limitations or other whatnots. The place to start is with #1, ZFS. See the bottom of my mail for advice. > >Read : > >I export and import the pool for limit the ARC effect. I don't know how > >to do better, I hope that sufficient. You could have checked using "top -b" (before and after export); look for the "ARC" line. I tend to just reboot the system, but export should result in a full pending I/O flush (from ARC, etc.) to all the devices. I would do this and wait about 15 seconds + check with gstat before doing more performance tests. > >gstat raising : http://www.privatepaste.com/130ce43af1 > >zpool iostat -v 1 : http://privatepaste.com/eb5f9d3432 > >dd result : 20971520000 bytes transferred in 30.347214 secs (691052563 > >bytes/sec) > >690 mo/s 2,5x less than I expect. > > > > > >It's appear to not be an hardware issue, when I do a dd test of each > >whole disk at the same time with the command dd if=/dev/gpt/diskX > >of=/dev/null bs=1M count=10000, I have this gstat raising : > >http://privatepaste.com/df9f63fd4d > > > >Near 130 mo/s for each device, something like I expect. You're thinking of hardware in too simple a fashion -- if only it were that simple. > >In your opinion where the problem come from ? Not enough information at this time to narrow down where the issue is. Things to try: 1. Start with the initial loader.conf modifications I stated. The vm.kmem_size removal may help. 2. Possibly trying vfs.zfs.no_write_throttle="1" in loader.conf + rebooting + re-doing this test. What that tunable does: https://blogs.oracle.com/roch/entry/the_new_zfs_write_throttle You can also Google "vfs.zfs.no_write_throttle" and see that it's been discussed quite a bit, including some folks saying performance tremendously increases when they set this to 1.
3. Given the massive size of your disk array and how much memory you have, you may also want to consider adjusting some of these (possibly increasing vfs.zfs.txg.timeout to make I/O flushing to your disks happen *less* often; I haven't tinkered with the other two): vfs.zfs.txg.timeout="5" vfs.zfs.vdev.max_pending="10" vfs.zfs.vdev.min_pending="4" These also come to mind (these are the defaults): vfs.zfs.write_limit_max="1069071872" vfs.zfs.write_limit_min="33554432" sysctl -d will give you descriptions of these. I have never had to tune any of these, however, but that's also because the pools I've built have consisted of much smaller numbers of disks (3 or 4 at most). I am also used to ahci(4) and have avoided all other controllers for a multitude of reasons (not saying that's the cause of your problem here, just saying that's the stance I've chosen to take). You might also try limiting your ARC maximum (vfs.zfs.arc_max) to something smaller -- say, 8GBytes. See if that has an effect. 4. "sysctl -a | grep zfs" is a very useful piece of information that you should do along with "gstat" and "zpool iostat -v". The counters and information shown there are very, very helpful a lot of the time. There are particular ones that indicate certain performance-hindering scenarios. 5. Your "UFS tests" only tested a single disk, while your ZFS tests tested 14 disks in a RAID-10-like fashion. You could try reproducing the RAID-10 setup using gvinum(8) and use UFS and see what sort of performance you get there. 6. Try re-doing the tests but with fewer drives involved -- say, 6 instead of 14. See if the throughput to each drive is increased compared to with 14 drives. In general, "profiling" ZFS like this is tricky and requires folks who are very much in-the-know and know how to go about accomplishing this task. Others more familiar with how to do this may need to step up to the plate, but no support/response is guaranteed (if you need that, try Solaris). -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977.
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Mon Jun 10 13:13:53 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id ABC907CE; Mon, 10 Jun 2013 13:13:53 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 22B791758; Mon, 10 Jun 2013 13:13:52 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r5ADDi5a072378; Mon, 10 Jun 2013 17:13:44 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Mon, 10 Jun 2013 17:13:44 +0400 (MSK) From: Dmitry Morozovsky To: freebsd-fs@FreeBSD.org Subject: hast: can't restore after disk failure Message-ID: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Mon, 10 Jun 2013 17:13:45 +0400 (MSK) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Jun 2013 13:13:53 -0000 Dear colleagues, stable/9 FreeBSD cthulhu3 9.1-STABLE-NEWCARP FreeBSD 9.1-STABLE-NEWCARP #6 r251443M: Thu Jun 6 02:54:36 MSK 2013 ada1 failed and has been replaced. gpart created. But I can't insert new disk into hast root@cthulhu3:/# hastctl status Name Status Role Components d0 complete secondary /dev/ada0p1 cthulhu4 d1 - init /dev/ada1p1 cthulhu4 d2 complete secondary /dev/ada2p1 cthulhu4 d3 complete secondary /dev/ada3p1 cthulhu4 zil3 complete secondary /dev/ada4p1 cthulhu4 zil4 complete secondary /dev/ada4p2 cthulhu4 root@cthulhu3:/# hastctl role secondary d1 root@cthulhu3:/# hastctl list d1 d1: role: secondary provname: d1 localpath: /dev/ada1p1 extentsize: 0 (0B) keepdirty: 0 remoteaddr: cthulhu4 replication: memsync dirty: 0 (0B) statistics: reads: 0 writes: 0 deletes: 0 flushes: 0 activemap updates: 0 local errors: read: 0, write: 0, delete: 0, flush: 0 root@cthulhu3:/# tail -2 /var/log/console.log Jun 10 16:56:06 cthulhu3 kernel: Jun 10 16:56:06 cthulhu3 hastd[14379]: [d1] (secondary) Unable to read metadata from /dev/ada1p1: No such file or directory. Jun 10 16:56:11 cthulhu3 kernel: Jun 10 16:56:11 cthulhu3 hastd[765]: [d1] (secondary) Worker process exited ungracefully (pid=14379, exitcode=66). Jun 10 16:56:16 cthulhu3 kernel: Jun 10 16:56:16 cthulhu3 hastd[14380]: [d1] (secondary) Unable to read metadata from /dev/ada1p1: No such file or directory. Jun 10 16:56:20 cthulhu3 kernel: Jun 10 16:56:20 cthulhu3 hastd[765]: [d1] (secondary) Worker process exited ungracefully (pid=14380, exitcode=66). Any hints? Thanks! 
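For reference, the usual way to bring a replaced component disk back into a HAST resource on the secondary is roughly the three commands below; this is only a sketch based on hastctl(8), reusing the resource and provider names from this report:

  hastctl role init d1         # take the resource to the init role locally
  hastctl create d1            # write fresh HAST metadata onto the new /dev/ada1p1
  hastctl role secondary d1    # rejoin as secondary; the primary then resynchronises the data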
-- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Mon Jun 10 17:29:04 2013 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 15103B89; Mon, 10 Jun 2013 17:29:04 +0000 (UTC) (envelope-from br@mail.bsdpad.com) Received: from mail.bsdpad.com (mail.bsdpad.com [109.107.176.56]) by mx1.freebsd.org (Postfix) with ESMTP id C34021762; Mon, 10 Jun 2013 17:29:03 +0000 (UTC) Received: from mail.bsdpad.com ([109.107.176.56]) by mail.bsdpad.com with smtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1Um5Gw-0002dy-Rv; Mon, 10 Jun 2013 16:48:46 +0000 Received: by mail.bsdpad.com (nbSMTP-1.00) for uid 1001 br@mail.bsdpad.com; Mon, 10 Jun 2013 16:48:46 +0000 (UTC) Date: Mon, 10 Jun 2013 16:48:46 +0000 From: Ruslan Bukin To: Steve Wills Subject: Re: dev entries for cloned zvol don't show up until after reboot Message-ID: <20130610164846.GA10127@mail.bsdpad.com> References: <8ea8b9c8074fd122f78c5eaa3b289805.squirrel@mouf.net> <51A2B533.8030504@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <51A2B533.8030504@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Jun 2013 17:29:04 -0000 On Mon, May 27, 2013 at 01:21:55AM +0000, Steve Wills wrote: > On 05/24/13 17:56, Steve Wills wrote: > > Hi, > > > > I've noticed that if I make zvol, create a snapshot of it, then clone > > that, the /dev/zvol/* entries for it don't show up until after I reboot. > > This is on r250925. Is this a known bug? > > To add a bit more detail to this, the steps are: > > zfs create -V 1G pool/somevol > ls /dev/zvol/pool # witness somevol entries > zfs create pool/somevol@somesnap > ls /dev/zvol/pool # witness no new entries > zfs clone pool/somvol@somesnap pool/anothervol > ls /dev/zvol/pool # again witness no new entries > reboot > ls /dev/zvol/pool # witness missing entries appearing > > I'll go ahead and submit a PR too in case that helps. 
this patch for 9.1-stable works for me --- zfs_ioctl.c 2013-06-09 23:54:22.386708932 +0400 +++ zfs_ioctl.c 2013-06-10 00:21:58.161708460 +0400 @@ -3299,6 +3299,12 @@ if (error != 0) (void) dsl_destroy_head(fsname); } + +#ifdef __FreeBSD__ + if (error == 0) + zvol_create_minors(fsname); +#endif + return (error); } -Ruslan From owner-freebsd-fs@FreeBSD.ORG Mon Jun 10 20:16:57 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2C4AC655 for ; Mon, 10 Jun 2013 20:16:57 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-bk0-x22f.google.com (mail-bk0-x22f.google.com [IPv6:2a00:1450:4008:c01::22f]) by mx1.freebsd.org (Postfix) with ESMTP id B44701F17 for ; Mon, 10 Jun 2013 20:16:56 +0000 (UTC) Received: by mail-bk0-f47.google.com with SMTP id jg1so3326684bkc.20 for ; Mon, 10 Jun 2013 13:16:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=tLSLZFEJhnbdtPtOrDcAzRLa7J4sWOPNEVVfvgA0L0U=; b=WkgLgEmsrV+AhrpNwba245wtGAgcZe6dxgIPnrz10HbX36JyhorvD0SaMDDdBb5B7J t7q0Hs+omFM6eIhtR5d0eq6fpfz2lJFLF++w4FfmAHziwWNDlUSe4L8w5XRhhVEopMoB nYpB3QggbyreKsoDsnMb8dr6qbW3a8GOsy9cba1Wsx/JowgPxGf1zolyvsuXMSBzm72c 5fEWRaqDz9cuS7yqpWM71lyb+LV/ME1Qt71ji2Ds/v6HjuFrNiol3I5YMd+u+ORazwSx zSCE6DJD1eVpw4RtMFKoWw/c7EMKwkK2H9MslAmIQYG9aYa8tKdv6O5eXg0yc1DVkWhW l+Zw== X-Received: by 10.204.71.77 with SMTP id g13mr1767464bkj.50.1370895415702; Mon, 10 Jun 2013 13:16:55 -0700 (PDT) Received: from localhost ([178.150.115.244]) by mx.google.com with ESMTPSA id og1sm4474296bkb.16.2013.06.10.13.16.54 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Mon, 10 Jun 2013 13:16:55 -0700 (PDT) Sender: Mikolaj Golub Date: Mon, 10 Jun 2013 23:16:51 +0300 From: Mikolaj Golub To: Dmitry Morozovsky Subject: Re: hast: can't restore after disk failure Message-ID: <20130610201650.GA2823@gmail.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Jun 2013 20:16:57 -0000 On Mon, Jun 10, 2013 at 05:13:44PM +0400, Dmitry Morozovsky wrote: > Dear colleagues, > > > stable/9 > > FreeBSD cthulhu3 9.1-STABLE-NEWCARP FreeBSD 9.1-STABLE-NEWCARP #6 > r251443M: Thu Jun 6 02:54:36 MSK 2013 > > ada1 failed and has been replaced. gpart created. 
But I can't insert new disk > into hast > > root@cthulhu3:/# hastctl status > Name Status Role Components > d0 complete secondary /dev/ada0p1 cthulhu4 > d1 - init /dev/ada1p1 cthulhu4 > d2 complete secondary /dev/ada2p1 cthulhu4 > d3 complete secondary /dev/ada3p1 cthulhu4 > zil3 complete secondary /dev/ada4p1 cthulhu4 > zil4 complete secondary /dev/ada4p2 cthulhu4 > > root@cthulhu3:/# hastctl role secondary d1 > root@cthulhu3:/# hastctl list d1 > d1: > role: secondary > provname: d1 > localpath: /dev/ada1p1 > extentsize: 0 (0B) > keepdirty: 0 > remoteaddr: cthulhu4 > replication: memsync > dirty: 0 (0B) > statistics: > reads: 0 > writes: 0 > deletes: 0 > flushes: 0 > activemap updates: 0 > local errors: read: 0, write: 0, delete: 0, flush: 0 > root@cthulhu3:/# tail -2 /var/log/console.log > Jun 10 16:56:06 cthulhu3 kernel: Jun 10 16:56:06 > cthulhu3 hastd[14379]: [d1] (secondary) Unable to read metadata from > /dev/ada1p1: No such file or directory. > Jun 10 16:56:11 cthulhu3 kernel: Jun 10 16:56:11 > cthulhu3 hastd[765]: [d1] (secondary) Worker process exited ungracefully > (pid=14379, exitcode=66). > Jun 10 16:56:16 cthulhu3 kernel: Jun 10 16:56:16 > cthulhu3 hastd[14380]: [d1] (secondary) Unable to read metadata from > /dev/ada1p1: No such file or directory. > Jun 10 16:56:20 cthulhu3 kernel: Jun 10 16:56:20 > cthulhu3 hastd[765]: [d1] (secondary) Worker process exited ungracefully > (pid=14380, exitcode=66). > > Any hints? Thanks! Have you run hastctl create to initialize metadata? -- Mikolaj Golub From owner-freebsd-fs@FreeBSD.ORG Mon Jun 10 20:40:10 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id BE3E4E39; Mon, 10 Jun 2013 20:40:10 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 51336106F; Mon, 10 Jun 2013 20:40:09 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r5AKe82o096707; Tue, 11 Jun 2013 00:40:08 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Tue, 11 Jun 2013 00:40:08 +0400 (MSK) From: Dmitry Morozovsky To: Mikolaj Golub Subject: Re: hast: can't restore after disk failure In-Reply-To: <20130610201650.GA2823@gmail.com> Message-ID: References: <20130610201650.GA2823@gmail.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Tue, 11 Jun 2013 00:40:09 +0400 (MSK) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Jun 2013 20:40:10 -0000 On Mon, 10 Jun 2013, Mikolaj Golub wrote: [snipall] > > Jun 10 16:56:20 cthulhu3 kernel: Jun 10 16:56:20 > > cthulhu3 hastd[765]: [d1] (secondary) Worker process exited ungracefully > > (pid=14380, exitcode=66). > > > > Any hints? Thanks! > > Have you run hastctl create to initialize metadata? Yes, but did it naively: hastctl create d1 and status still reported 0 as provider size... Sould I provide more options to hastctl create? 
-- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Mon Jun 10 23:54:50 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id AFD799D2 for ; Mon, 10 Jun 2013 23:54:50 +0000 (UTC) (envelope-from editor@callfortesting.org) Received: from mail-pa0-x22d.google.com (mail-pa0-x22d.google.com [IPv6:2607:f8b0:400e:c03::22d]) by mx1.freebsd.org (Postfix) with ESMTP id 8AB0C1B1D for ; Mon, 10 Jun 2013 23:54:50 +0000 (UTC) Received: by mail-pa0-f45.google.com with SMTP id bi5so4899033pad.18 for ; Mon, 10 Jun 2013 16:54:50 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding :x-gm-message-state; bh=0qtVKFxCgNYZUWlehrJAUMJ77lh6rAr2C72VacTmOdU=; b=QSNW7Xv6eee1WR/uJFB91s/+vaG10+RaZqPWnREv8+tKWbxU4S+o54mWvLJhLbmBlQ ts1qZipJ/EhXP7RgCj5bovBrmtMR02vJu98tL8ZsVShWwRb9skyUoZqUxO0skZx+cDLs nBN9ha/qN1ipvMdg3kYCWz53UuPoIrd693QCFPQMhdoWcyGLpD0HB46TdtoiJR8Nz6+1 8uSgYxSy5CT5ceTmnKKkXoPT4fA8vv0ILXLR3Vmmz7PmTwueSBYdCkiFV6tqr26u+uHs hhLvp2PMt68oV2lWOxQ+FWhLbpbyE6hEKnrM4nGzt4jc76Ov8hoSkJ/mcwFrf4r17AC/ VVXQ== X-Received: by 10.66.166.107 with SMTP id zf11mr16207396pab.166.1370908490151; Mon, 10 Jun 2013 16:54:50 -0700 (PDT) Received: from MacBook-4.local (c-98-246-202-204.hsd1.or.comcast.net. [98.246.202.204]) by mx.google.com with ESMTPSA id pl9sm12038436pbc.5.2013.06.10.16.54.48 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 10 Jun 2013 16:54:49 -0700 (PDT) Message-ID: <51B66748.4000708@callfortesting.org> Date: Mon, 10 Jun 2013 16:54:48 -0700 From: Michael Dexter User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130509 Thunderbird/17.0.6 MIME-Version: 1.0 To: Jeremy Chadwick Subject: Re: ZFS panic on import under VMware References: <51B0FADB.10302@callfortesting.org> <20130606215224.GA44910@icarus.home.lan> In-Reply-To: <20130606215224.GA44910@icarus.home.lan> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Gm-Message-State: ALoCoQklBDOJAUPhPZwOVFAS0oau/spzpG9wYKl2h4DScrSRU3DAUq4vWQrRlOapZykpJf7el5ey Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Jun 2013 23:54:50 -0000 Requested details included inline: >> I have encountered a FreeNAS under VMware system that gives "ONLINE" >> status for a pool with 'zfs import' but panics when a -f import is >> done under FreeNAS 8.3 x64, FreeBSD 10 and Solaris 11 live DVD. The >> host filesystem passes all checks. >> Stopped at traverse_prefetch_metadata+0x44: movq 0x50(%rax),%rcx >> I have posted screen shots of the import, panic and backtrace output: >> http://cft.lv/zfs/2013-07-06/ > > 1. On what OS (version, etc.) was the ZFS pool originally created? FreeNAS 8.2 > 2. Was the pool originally created with compression or dedup enabled? 
> (Answers to both of these questions is extremely important) Dedup: never enabled Compression: Yes (Believe FreeNAS Default, guessing lzjb) > 3. How much memory are you allocating to the VMware instance? (This is > in partial relation to question #2) 6GB, up to 10GB when attempting to re-import > 4. On what OS (version, etc.) are the panic/backtrace screenshots from? > It looks to me like FreeBSD 10.x. Correct. amd64. Panics with Solaris 11.1 & FreeNAS 8.3. No saved traces > 5. Is there a reason you didn't try FreeBSD 9.1-RELEASE? The state of > FreeBSD 10.x (head/CURRENT) is usually in fluctuation, you should try > something other than head. Will try 9.1R > 6. You're using VMware Workstation; where did the source ZFS pool come > from? Do you have physical disks attached to the machine and are using > the "Use a physical disk" feature? If you're using "disk images" made > by something, what did you use? Please provide all the details, how you > did it, etc... VMware ESXi 5.1 with no storage. All from HUS110 iSCSI, mounted as VMFS5 Datastores. The failing disks are vmdk files in virtual disk mode which were created when installing FreeNAS. > 7. Is there some reason you cannot try this on bare metal? VMware does not appear to fully support ZFS pass-through. Not easy to convert two, 2TB images. Suggestions? > 8. On FreeBSD 9.x (see above) or 10.x, during boot, drop to the loader > prompt and issue "set vfs.zfs.prefetch_disable=1" followed by "boot". > See if that has any impact during the "zpool import" phase. Results in Fatal trap 12 on FreeBSD 10 and will try 9.X ASAP. Thanks! Michael From owner-freebsd-fs@FreeBSD.ORG Tue Jun 11 00:17:19 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id EA261FB0; Tue, 11 Jun 2013 00:17:19 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [IPv6:2001:5a8:4:7e72:4a5b:39ff:fe12:452]) by mx1.freebsd.org (Postfix) with ESMTP id C80D91C30; Tue, 11 Jun 2013 00:17:19 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id r5B0HFct074482; Mon, 10 Jun 2013 17:17:15 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201306110017.r5B0HFct074482@chez.mckusick.com> To: Palle Girgensohn Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?) In-reply-to: <51B5A277.2060904@FreeBSD.org> Date: Mon, 10 Jun 2013 17:17:15 -0700 From: Kirk McKusick X-Spam-Status: No, score=0.0 required=5.0 tests=MISSING_MID, UNPARSEABLE_RELAY autolearn=failed version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on chez.mckusick.com Cc: freebsd-fs@FreeBSD.org, Dan Thomas , Jeff Roberson , Julian Akehurst X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Jun 2013 00:17:20 -0000 > Date: Mon, 10 Jun 2013 11:55:03 +0200 > From: Palle Girgensohn > To: Kirk McKusick > CC: freebsd-fs@FreeBSD.org, Dan Thomas , > Jeff Roberson , > Julian Akehurst > Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?) > > Kirk McKusick skrev: > > > This looks good. Keep me posted. > > After running for a number of days without soft updates, it seems to me > that the culprit is indeed in the soft updates code. 
> > # df -k /pgsql; du -sk /pgsql > Filesystem 1024-blocks Used Avail Capacity Mounted on > /dev/da2s1d 134763348 86339044 37643238 70% /pgsql > 86303252 /pgsql > > Palle OK, good to have it narrowed down. I will look to devise some additional diagnostics that hopefully will help tease out the bug. I'll hopefully get back to you soon. Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Tue Jun 11 06:07:49 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 062E37E9 for ; Tue, 11 Jun 2013 06:07:49 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-lb0-f179.google.com (mail-lb0-f179.google.com [209.85.217.179]) by mx1.freebsd.org (Postfix) with ESMTP id 8413E1D27 for ; Tue, 11 Jun 2013 06:07:47 +0000 (UTC) Received: by mail-lb0-f179.google.com with SMTP id w20so7081620lbh.38 for ; Mon, 10 Jun 2013 23:07:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=9U8eaVVJ0OEtDWh5BaSk5gro1MqxiqZ1uz2le/qMtj0=; b=mJoAIhwtFp88CayPgaE8dbW5yQfYBv9UdAucpJvJdQ1Kppq+YPQYKCoVnyHV2G9jOG wJ8w98/+oxgAoUBzLJfBubV9X979TEHuJEH2EAZrsaQPVGQMvcT3WMzPMkd5qFM9o1mW NPsxJj1qs4Rn1d6KOvLAdMxlYvZ23kjAIeA89SUWPL7tpn7RqSPKD2+i29An8XYFGQou b3vcLRtJRP/OjlxvtGTjZ5ZM25Rs17XEUcMiwcAz2ETzLajshEwsQqH8dkhCfzzFzqHx sxIg2s1JkGtjc66b4yELtiYQS8cFRVh+8HuTpA09gi8F6DsRHvMCXF+b/09FjauOiefh AVcw== X-Received: by 10.152.8.37 with SMTP id o5mr6452312laa.87.1370930866709; Mon, 10 Jun 2013 23:07:46 -0700 (PDT) Received: from localhost ([188.230.122.226]) by mx.google.com with ESMTPSA id z9sm5601702lae.7.2013.06.10.23.07.44 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 10 Jun 2013 23:07:45 -0700 (PDT) Date: Tue, 11 Jun 2013 09:07:42 +0300 From: Mikolaj Golub To: Dmitry Morozovsky Subject: Re: hast: can't restore after disk failure Message-ID: <20130611060741.GA42231@gmail.com> References: <20130610201650.GA2823@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Jun 2013 06:07:49 -0000 On Tue, Jun 11, 2013 at 12:40:08AM +0400, Dmitry Morozovsky wrote: > On Mon, 10 Jun 2013, Mikolaj Golub wrote: > > [snipall] > > > > Jun 10 16:56:20 cthulhu3 kernel: Jun 10 16:56:20 > > > cthulhu3 hastd[765]: [d1] (secondary) Worker process exited ungracefully > > > (pid=14380, exitcode=66). > > > > > > Any hints? Thanks! > > > > Have you run hastctl create to initialize metadata? > > Yes, but did it naively: > > hastctl create d1 No errors? > > and status still reported 0 as provider size... I assume /dev/ada1p1 is present and readable/writable? Symptoms are like if it did not exist. > Sould I provide more options to hastctl create? Usually no, until the disk is of larger size than the replaced one, and you need manually specify the old mediasize. 
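For illustration, the sequence that normally re-initializes a replaced disk looks roughly like this (provider d1 and partition ada1p1 as in your output; a sketch from memory, hastctl(8) is authoritative):

  hastctl role init d1
  hastctl create d1
  hastctl role secondary d1
  hastctl status d1      # should now report the real provider size

If the sizes ever had to be forced, hastctl create also takes -m <mediasize> and -e <extentsize>, but with an identically sized replacement the plain create should be enough.
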
-- Mikolaj Golub From owner-freebsd-fs@FreeBSD.ORG Tue Jun 11 06:34:51 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 4443FD81 for ; Tue, 11 Jun 2013 06:34:51 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id DBDA61E0E for ; Tue, 11 Jun 2013 06:34:50 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id r5B6Ykds004347; Tue, 11 Jun 2013 09:34:46 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua r5B6Ykds004347 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id r5B6Yktc004346; Tue, 11 Jun 2013 09:34:46 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 11 Jun 2013 09:34:46 +0300 From: Konstantin Belousov To: Bruce Evans Subject: Re: missed clustering for small block sizes in cluster_wbuild() Message-ID: <20130611063446.GJ3047@kib.kiev.ua> References: <20130607044845.O24441@besplex.bde.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="hCS/BWoPfTdmYtZi" Content-Disposition: inline In-Reply-To: <20130607044845.O24441@besplex.bde.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Jun 2013 06:34:51 -0000 --hCS/BWoPfTdmYtZi Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Jun 07, 2013 at 05:28:11AM +1000, Bruce Evans wrote: > I think this is best fixed be fixed by removing the check above and > checking here. Then back out of the changes. I don't know this code > well enough to write the backing out easily. Could you test this, please ? diff --git a/sys/kern/vfs_cluster.c b/sys/kern/vfs_cluster.c index b280317..9e1528e 100644 --- a/sys/kern/vfs_cluster.c +++ b/sys/kern/vfs_cluster.c @@ -766,7 +766,7 @@ cluster_wbuild(struct vnode *vp, long size, daddr_t sta= rt_lbn, int len, { struct buf *bp, *tbp; struct bufobj *bo; - int i, j; + int i, j, jj; int totalwritten =3D 0; int dbsize =3D btodb(size); =20 @@ -904,14 +904,10 @@ cluster_wbuild(struct vnode *vp, long size, daddr_t s= tart_lbn, int len, =20 /* * Check that the combined cluster - * would make sense with regard to pages - * and would not be too large + * would make sense with regard to pages. 
*/ - if ((tbp->b_bcount !=3D size) || - ((bp->b_blkno + (dbsize * i)) !=3D - tbp->b_blkno) || - ((tbp->b_npages + bp->b_npages) > - (vp->v_mount->mnt_iosize_max / PAGE_SIZE))) { + if (tbp->b_bcount !=3D size || + bp->b_blkno + dbsize * i !=3D tbp->b_blkno) { BUF_UNLOCK(tbp); break; } @@ -964,6 +960,22 @@ cluster_wbuild(struct vnode *vp, long size, daddr_t st= art_lbn, int len, bp->b_pages[bp->b_npages] =3D m; bp->b_npages++; } + if (bp->b_npages > vp->v_mount-> + mnt_iosize_max / PAGE_SIZE) { + KASSERT(i !=3D 0, ("XXX")); + j++; + for (jj =3D 0; jj < j; jj++) { + vm_page_io_finish(tbp-> + b_pages[jj]); + } + vm_object_pip_subtract( + m->object, j); + bqrelse(tbp); + bp->b_npages -=3D j; + VM_OBJECT_WUNLOCK(tbp-> + b_bufobj->bo_object); + goto finishcluster; + } } VM_OBJECT_WUNLOCK(tbp->b_bufobj->bo_object); } --hCS/BWoPfTdmYtZi Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJRtsUGAAoJEJDCuSvBvK1B8FUP/3J+JkxgJ7jdQvkOyp21fibx b/iiN19fnd3Ih1sDtLvKXFKDguf17vOxoECqnlhhRjrI8mJMsghqMjKJ11CUsZGq 9LrkXWCLiwtecuP7Rupu8hczAj+Msf1HGwZMtNVwDRAuhL9fE9WX/EiWXFV5D+Z+ 04SniVO6Fu+v9ZlPVjCaGhJMDsuMrtsphdiDpRjivgqWN85dvrGur2I8hYm6PTD4 1qBwxv3j5IR3dqRBFkZc+jrYjpRjA5UIAtmJc+3iJlZvL4963od1m48x0L+Wdtit +mfBDEaXN8gCAbtbN3QW1s+9WUKcqFcucYsECcb9wEjNi5aKjb0FLDPqEJWRoBVS dqHwXc8bI3KM3+fVoTIhTJDgmkTaCZEvTSCkmGgS1e5B7f4H5My+X7jzaonf7pl3 Jeoab2J6dZ6mwu1xh9Kk+nN80NwGiujx5hW9NgBC7MD7xxvQL5JGmwz/HU1LGZfO RL4Yi1g0dxobOWAfusK+rZDTEhzKts0vvrRgxk0O+LmEibx0WPcg/nkv8t60wE6U 4bBb7pxXudqTmmGuswKkGrNmP0F7HJFjn7kjkkbacnJRLPA9sOHh+MYW7V9dDIw5 T2Uk12Mm4aKX/hjHEWl6x87PIx384LyW9GiCkWcyFHqwH41SqYz7CgVUuNMqlv/P f6YwIcq9W9OocqF26iZ0 =DAlx -----END PGP SIGNATURE----- --hCS/BWoPfTdmYtZi-- From owner-freebsd-fs@FreeBSD.ORG Tue Jun 11 08:46:28 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0DF31F70 for ; Tue, 11 Jun 2013 08:46:28 +0000 (UTC) (envelope-from mxb@alumni.chalmers.se) Received: from mail-la0-x22b.google.com (mail-la0-x22b.google.com [IPv6:2a00:1450:4010:c03::22b]) by mx1.freebsd.org (Postfix) with ESMTP id 87554160C for ; Tue, 11 Jun 2013 08:46:27 +0000 (UTC) Received: by mail-la0-f43.google.com with SMTP id gw10so6666799lab.30 for ; Tue, 11 Jun 2013 01:46:25 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=qMS8s+keSx5TMEvVX4qXiamKGDpfRAyy9glszaQGKeA=; b=W2kljDhSycGwDHYoqXimTSXyJwu8agT1s3ygChxFeoQ7yAj4410umMSe92nBxuaLOY E8qvdWHqKSZF+xqDTmkt+YH719tdLFm6tOlE9BOy6v+VJzR8Sf3zWhYMUQjxG2n96iWu IgRhNAYrp2uJtDUae2f+APU796qKNqXmA1+xjwvbuYa4jmpm9Av1xaTWZmtdIueVkBCW FzL65vQ1N+acVKdyl9IunesvkPepnYhHfeuGuU3dkarbxVcszKirdDYcKjOg90uBaMyw 03D+oj46SoVdzg9myy8mJagxNGTSAOhKEb0XSQkGoUDbktVVs6xxlR4+orjUHUaZHjL/ prPw== X-Received: by 10.152.121.106 with SMTP id lj10mr6861724lab.27.1370940385481; Tue, 11 Jun 2013 01:46:25 -0700 (PDT) Received: from grey.office.se.prisjakt.nu ([212.16.170.194]) by mx.google.com with ESMTPSA id w4sm855521law.5.2013.06.11.01.46.23 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 11 Jun 2013 01:46:24 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: zpool export/import on failover - The pool metadata is corrupted From: mxb In-Reply-To: 
<20130606233417.GA46506@icarus.home.lan> Date: Tue, 11 Jun 2013 10:46:22 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <61E414CF-FCD3-42BB-9533-A40EA934DB99@alumni.chalmers.se> References: <016B635E-4EDC-4CDF-AC58-82AC39CBFF56@alumni.chalmers.se> <20130606223911.GA45807@icarus.home.lan> <20130606233417.GA46506@icarus.home.lan> To: Jeremy Chadwick X-Mailer: Apple Mail (2.1508) X-Gm-Message-State: ALoCoQkc8OJr4ravkNcLpOU7h/rIr026SuVs1m8vZtwDDxWpOfIwmf24Vr9kv+7aC7fnkKy/szrK Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Jun 2013 08:46:28 -0000 Thanks everyone whom replied. Removing local L2ARC cache disks (da1,da2) indeed showed to be a cure to = my problem. Next is to test with add/remove after import/export as Jeremy suggested. //mxb On 7 jun 2013, at 01:34, Jeremy Chadwick wrote: > On Fri, Jun 07, 2013 at 12:51:14AM +0200, mxb wrote: >>=20 >> Sure, script is not perfects yet and does not handle many of stuff, = but moving highlight from zpool import/export to the script itself not = that >> clever,as this works most of the time. >>=20 >> Question is WHY ZFS corrupts metadata then it should not. Sometimes. >> I'v seen stale of zpool then manually importing/exporting pool. >>=20 >>=20 >> On 7 jun 2013, at 00:39, Jeremy Chadwick wrote: >>=20 >>> On Fri, Jun 07, 2013 at 12:12:39AM +0200, mxb wrote: >>>>=20 >>>> Then MASTER goes down, CARP on the second node goes MASTER = (devd.conf, and script for lifting): >>>>=20 >>>> root@nfs2:/root # cat /etc/devd.conf >>>>=20 >>>>=20 >>>> notify 30 { >>>> match "system" "IFNET"; >>>> match "subsystem" "carp0"; >>>> match "type" "LINK_UP"; >>>> action "/etc/zfs_switch.sh active"; >>>> }; >>>>=20 >>>> notify 30 { >>>> match "system" "IFNET"; >>>> match "subsystem" "carp0"; >>>> match "type" "LINK_DOWN"; >>>> action "/etc/zfs_switch.sh backup"; >>>> }; >>>>=20 >>>> root@nfs2:/root # cat /etc/zfs_switch.sh >>>> #!/bin/sh >>>>=20 >>>> DATE=3D`date +%Y%m%d` >>>> HOSTNAME=3D`hostname` >>>>=20 >>>> ZFS_POOL=3D"jbod" >>>>=20 >>>>=20 >>>> case $1 in >>>> active) >>>> echo "Switching to ACTIVE and importing ZFS" | mail -s = ''$DATE': '$HOSTNAME' switching to ACTIVE' root >>>> sleep 10 >>>> /sbin/zpool import -f jbod >>>> /etc/rc.d/mountd restart >>>> /etc/rc.d/nfsd restart >>>> ;; >>>> backup) >>>> echo "Switching to BACKUP and exporting ZFS" | mail -s = ''$DATE': '$HOSTNAME' switching to BACKUP' root >>>> /sbin/zpool export jbod >>>> /etc/rc.d/mountd restart >>>> /etc/rc.d/nfsd restart >>>> ;; >>>> *) >>>> exit 0 >>>> ;; >>>> esac >>>>=20 >>>> This works, most of the time, but sometimes I'm forced to re-create = pool. Those machines suppose to go into prod. >>>> Loosing pool(and data inside it) stops me from deploy this setup. >>>=20 >>> This script looks highly error-prone. Hasty hasty... :-) >>>=20 >>> This script assumes that the "zpool" commands (import and export) = always >>> work/succeed; there is no exit code ($?) checking being used. >>>=20 >>> Since this is run from within devd(8): where does stdout/stderr go = to >>> when running a program/script under devd(8)? Does it effectively go >>> to the bit bucket (/dev/null)? If so, you'd never know if the = import or >>> export actually succeeded or not (the export sounds more likely to = be >>> the problem point). 
>>>=20 >>> I imagine there would be some situations where the export would fail >>> (some files on filesystems under pool "jbod" still in use), yet CARP = is >>> already blindly assuming everything will be fantastic. Surprise. >>>=20 >>> I also do not know if devd.conf(5) "action" commands spawn a = sub-shell >>> (/bin/sh) or not. If they don't, you won't be able to use things = like" >>> 'action "/etc/zfs_switch.sh active >> /var/log/failover.log";'. You >>> would then need to implement the equivalent of logging within your >>> zfs_switch.sh script. >>>=20 >>> You may want to consider the -f flag to zpool import/export >>> (particularly export). However there are risks involved -- userland >>> applications which have an fd/fh open on a file which is stored on a >>> filesystem that has now completely disappeared can sometimes crash >>> (segfault) or behave very oddly (100% CPU usage, etc.) depending on = how >>> they're designed. >>>=20 >>> Basically what I'm trying to say is that devd(8) being used as a = form of >>> HA (high availability) and load balancing is not always possible. >>> Real/true HA (especially with SANs) is often done very differently = (now >>> you know why it's often proprietary. :-) ) >=20 > Add error checking to your script. That's my first and foremost > recommendation. It's not hard to do, really. :-) >=20 > After you do that and still experience the issue (e.g. you see no = actual > errors/issues during the export/import phases), I recommend removing > the "cache" devices which are "independent" on each system from the = pool > entirely. Quoting you (for readers, since I snipped it from my = previous > reply): >=20 >>>> Note, that ZIL(mirrored) resides on external enclosure. Only L2ARC >>>> is both local and external - da1,da2, da13s2, da14s2 >=20 > I interpret this to mean the primary and backup nodes (physical = systems) > have actual disks which are not part of the "external enclosure". If > that's the case -- those disks are always going to vary in their > contents and metadata. Those are never going to be 100% identical all > the time (is this not obvious?). I'm surprised your stuff has worked = at > all using that model, honestly. >=20 > ZFS is going to bitch/cry if it cannot verify the integrity of certain > things, all the way down to the L2ARC. That's my understanding of it = at > least, meaning there must always be "some" kind of metadata that has = to > be kept/maintained there. >=20 > Alternately you could try doing this: >=20 > zpool remove jbod cache daX daY ... > zpool export jbod >=20 > Then on the other system: >=20 > zpool import jbod > zpool add jbod cache daX daY ... >=20 > Where daX and daY are the disks which are independent to each system > (not on the "external enclosure"). >=20 > Finally, it would also be useful/worthwhile if you would provide=20 > "dmesg" from both systems and for you to explain the physical wiring > along with what device (e.g. daX) correlates with what exact thing on > each system. (We right now have no knowledge of that, and your terse > explanations imply we do -- we need to know more) >=20 > --=20 > | Jeremy Chadwick jdc@koitsu.org | > | UNIX Systems Administrator http://jdc.koitsu.org/ | > | Making life hard for others since 1977. 
PGP 4BD6C0CB | >=20 From owner-freebsd-fs@FreeBSD.ORG Tue Jun 11 14:21:12 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 8F998261; Tue, 11 Jun 2013 14:21:12 +0000 (UTC) (envelope-from smh@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 68A41171F; Tue, 11 Jun 2013 14:21:12 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r5BELCQj031401; Tue, 11 Jun 2013 14:21:12 GMT (envelope-from smh@freefall.freebsd.org) Received: (from smh@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r5BELC9m031400; Tue, 11 Jun 2013 14:21:12 GMT (envelope-from smh) Date: Tue, 11 Jun 2013 14:21:12 GMT Message-Id: <201306111421.r5BELC9m031400@freefall.freebsd.org> To: smh@FreeBSD.org, freebsd-fs@FreeBSD.org, smh@FreeBSD.org From: smh@FreeBSD.org Subject: Re: kern/178999: [zfs] dev entries for cloned zvol don't show up until after reboot X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Jun 2013 14:21:12 -0000 Synopsis: [zfs] dev entries for cloned zvol don't show up until after reboot Responsible-Changed-From-To: freebsd-fs->smh Responsible-Changed-By: smh Responsible-Changed-When: Tue Jun 11 14:21:02 UTC 2013 Responsible-Changed-Why: I'll take it http://www.freebsd.org/cgi/query-pr.cgi?pr=178999 From owner-freebsd-fs@FreeBSD.ORG Tue Jun 11 20:28:41 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 270D11DC for ; Tue, 11 Jun 2013 20:28:41 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id A4AE71B14 for ; Tue, 11 Jun 2013 20:28:40 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r5BKNqlJ087328; Wed, 12 Jun 2013 00:23:52 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Wed, 12 Jun 2013 00:23:52 +0400 (MSK) From: Dmitry Morozovsky To: Mikolaj Golub Subject: Re: hast: can't restore after disk failure In-Reply-To: <20130611060741.GA42231@gmail.com> Message-ID: References: <20130610201650.GA2823@gmail.com> <20130611060741.GA42231@gmail.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Wed, 12 Jun 2013 00:23:52 +0400 (MSK) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Jun 2013 20:28:41 -0000 On Tue, 11 Jun 2013, Mikolaj Golub wrote: > On Tue, Jun 11, 2013 at 12:40:08AM +0400, Dmitry Morozovsky wrote: > > On Mon, 10 Jun 2013, Mikolaj Golub wrote: > > > > [snipall] > > > > > > Jun 10 16:56:20 cthulhu3 kernel: Jun 10 16:56:20 > > > > cthulhu3 hastd[765]: [d1] (secondary) Worker process exited ungracefully > > > > (pid=14380, exitcode=66). > > > > > > > > Any hints? Thanks! 
> > > > > > Have you run hastctl create to initialize metadata? > > > > Yes, but did it naively: > > > > hastctl create d1 > > No errors? no visible, but hast instance ungracefully exits > > and status still reported 0 as provider size... > > I assume /dev/ada1p1 is present and readable/writable? > > Symptoms are like if it did not exist. nope, it does: root@cthulhu3:/# diskinfo /dev/ada1p1 /dev/ada1p1 512 999654686720 1952450560 0 1048576 1936954 16 63 root@cthulhu3:/# diskinfo /dev/ada0p1 /dev/ada0p1 512 999653638144 1952448512 0 1048576 1936952 16 63 -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Tue Jun 11 21:08:35 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id AC819EBB for ; Tue, 11 Jun 2013 21:08:35 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 22A3B1CF8 for ; Tue, 11 Jun 2013 21:08:34 +0000 (UTC) Received: by people.fsn.hu (Postfix, from userid 1001) id EDD4910CC062; Tue, 11 Jun 2013 23:01:27 +0200 (CEST) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.3 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR: 12.4903] X-CRM114-CacheID: sfid-20130611_23012_80876E8A X-CRM114-Status: Good ( pR: 12.4903 ) X-DSPAM-Result: Whitelisted X-DSPAM-Processed: Tue Jun 11 23:01:27 2013 X-DSPAM-Confidence: 0.9938 X-DSPAM-Probability: 0.0000 X-DSPAM-Signature: 51b79027796977830491068 X-DSPAM-Factors: 27, From*Attila Nagy , 0.00010, STABLE, 0.00371, disks, 0.00397, disks, 0.00397, 231, 0.00428, filter, 0.00505, filter, 0.00505, ZFS, 0.00555, ZFS, 0.00555, 1+14, 0.00555, 2+21, 0.00555, Subject*ZFS, 0.00617, sysctl, 0.00617, From*Attila, 0.00617, To*FreeBSD.org, 0.00656, 158, 0.00693, 474, 0.00693, 215, 0.00693, machines, 0.00739, machines, 0.00739, 1+19, 0.00792, 1+19, 0.00792, controller, 0.00874, load, 0.00893, load, 0.00893, files, 0.00965, X-Spambayes-Classification: ham; 0.00 Received: from [192.168.3.2] (japan.t-online.co.hu [195.228.243.99]) by people.fsn.hu (Postfix) with ESMTPSA id B8AAC10CC057 for ; Tue, 11 Jun 2013 23:01:26 +0200 (CEST) Message-ID: <51B79023.5020109@fsn.hu> Date: Tue, 11 Jun 2013 23:01:23 +0200 From: Attila Nagy User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.23) Gecko/20090817 Thunderbird/2.0.0.23 Mnenhy/0.7.6.0 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org Subject: An order of magnitude higher IOPS needed with ZFS than UFS Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Jun 2013 21:08:35 -0000 Hi, I have two identical machines. They have 14 disks hooked up to a HP smartarray (SA from now on) controller. Both machines have the same SA configuration and layout: the disks are organized into mirror pairs (HW RAID1). 
On the first machine, these mirrors are formatted with UFS2+SU (default settings), on the second machine they are used as separate zpools (please don't tell me that ZFS can do the same, I know). Atime is turned off, otherwise, no other modifications (zpool/zfs or sysctl parameters). The file systems are loaded more or less evenly with serving of some kB to few megs files. The machines act as NFS servers, so there is one, maybe important difference here: the UFS machine runs 8.3-RELEASE, while the ZFS one runs 9.1-STABLE@r248885. They get the same type of load, and according to nfsstat and netstat, the loads don't explain the big difference which can be seen in disk IOs. In fact, the UFS host seems to be more loaded... According to gstat on the UFS machine: dT: 60.001s w: 60.000s filter: da L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 42 35 404 6.4 8 150 214.2 21.5| da0 0 30 21 215 6.1 9 168 225.2 15.9| da1 0 41 33 474 4.5 8 158 211.3 18.0| da2 0 39 30 425 4.6 9 163 235.0 17.1| da3 1 31 24 266 5.1 7 93 174.1 14.9| da4 0 29 22 273 5.9 7 84 200.7 15.9| da5 0 37 30 692 7.1 7 115 206.6 19.4| da6 and on the ZFS one: dT: 60.001s w: 60.000s filter: da L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 228 201 1045 23.7 27 344 53.5 88.7| da0 5 185 167 855 21.1 19 238 44.9 73.8| da1 10 263 236 1298 34.9 27 454 53.3 99.9| da2 10 255 235 1341 28.3 20 239 64.8 92.9| da3 10 219 195 994 22.3 23 257 46.3 81.3| da4 10 248 221 1213 22.4 27 264 55.8 90.2| da5 9 231 213 1169 25.1 19 229 54.6 88.6| da6 I've seen a lot of cases where ZFS required more memory and CPU (and even IO) to handle the same load, but they were nowhere this bad (often a 10x increase). Any ideas? BTW, the file systems are 77-78% full according to df (so ZFS holds more, because UFS is -m 8). 
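For the record, the numbers above are 60-second samples; something like this should reproduce them on both hosts (adjust the filter to your disk names):

  gstat -f da -I 60s
  zpool iostat -v 60     # on the ZFS box, for a per-vdev view
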
Thanks, From owner-freebsd-fs@FreeBSD.ORG Tue Jun 11 21:21:18 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4F7614C0 for ; Tue, 11 Jun 2013 21:21:18 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 158431DA7 for ; Tue, 11 Jun 2013 21:21:17 +0000 (UTC) X-Cloudmark-SP-Filtered: true X-Cloudmark-SP-Result: v=1.1 cv=ME3lrcP4jFDzpPiCSQywCMKJiHtpRWeRXBDIYmR1BZg= c=1 sm=2 a=ctSXsGKhotwA:10 a=FKkrIqjQGGEA:10 a=uNq0K1xFbOwA:10 a=IkcTkHD0fZMA:10 a=6I5d2MoRAAAA:8 a=IIDPTtw_6pPhr65o58kA:9 a=QEXdDO2ut3YA:10 a=IO5DDJVRER8A:10 a=jpxF4j0qNWYA:10 a=0X1wm-MWLxgA:10 a=izcmP9whcIMA:10 a=UgQyK67jzVMA:10 a=KK3dN39wtEsA:10 a=zr1izwO6SH0A:10 a=SV7veod9ZcQA:10 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqMEANeTt1GDaFvO/2dsb2JhbABWA4M5SYJ0u1qBF3SCIwEBAQMBAQEBICsgCwUWDgoCAg0ZAikBCSYGCAcEARwEh2YGDKhbkUKBJoxKEH4kEAcRgjuBFAOTboENgkWBKYkDhxaDKyAygQM2 X-IronPort-AV: E=Sophos;i="4.87,847,1363147200"; d="scan'208";a="32625428" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 11 Jun 2013 17:20:09 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id BAA39B3F0D; Tue, 11 Jun 2013 17:20:09 -0400 (EDT) Date: Tue, 11 Jun 2013 17:20:09 -0400 (EDT) From: Rick Macklem To: Attila Nagy Message-ID: <253074981.119060.1370985609747.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <51B79023.5020109@fsn.hu> Subject: Re: An order of magnitude higher IOPS needed with ZFS than UFS MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Jun 2013 21:21:18 -0000 Attila Nagy wrote: > Hi, > > I have two identical machines. They have 14 disks hooked up to a HP > smartarray (SA from now on) controller. > Both machines have the same SA configuration and layout: the disks are > organized into mirror pairs (HW RAID1). > > On the first machine, these mirrors are formatted with UFS2+SU > (default > settings), on the second machine they are used as separate zpools > (please don't tell me that ZFS can do the same, I know). Atime is > turned > off, otherwise, no other modifications (zpool/zfs or sysctl > parameters). > The file systems are loaded more or less evenly with serving of some > kB > to few megs files. > > The machines act as NFS servers, so there is one, maybe important > difference here: the UFS machine runs 8.3-RELEASE, while the ZFS one > runs 9.1-STABLE@r248885. > They get the same type of load, and according to nfsstat and netstat, > the loads don't explain the big difference which can be seen in disk > IOs. In fact, the UFS host seems to be more loaded... 
> > According to gstat on the UFS machine: > dT: 60.001s w: 60.000s filter: da > L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name > 0 42 35 404 6.4 8 150 214.2 21.5| da0 > 0 30 21 215 6.1 9 168 225.2 15.9| da1 > 0 41 33 474 4.5 8 158 211.3 18.0| da2 > 0 39 30 425 4.6 9 163 235.0 17.1| da3 > 1 31 24 266 5.1 7 93 174.1 14.9| da4 > 0 29 22 273 5.9 7 84 200.7 15.9| da5 > 0 37 30 692 7.1 7 115 206.6 19.4| da6 > > and on the ZFS one: > dT: 60.001s w: 60.000s filter: da > L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name > 0 228 201 1045 23.7 27 344 53.5 88.7| da0 > 5 185 167 855 21.1 19 238 44.9 73.8| da1 > 10 263 236 1298 34.9 27 454 53.3 99.9| da2 > 10 255 235 1341 28.3 20 239 64.8 92.9| da3 > 10 219 195 994 22.3 23 257 46.3 81.3| da4 > 10 248 221 1213 22.4 27 264 55.8 90.2| da5 > 9 231 213 1169 25.1 19 229 54.6 88.6| da6 > > I've seen a lot of cases where ZFS required more memory and CPU (and > even IO) to handle the same load, but they were nowhere this bad > (often > a 10x increase). > > Any ideas? > ken@ recently committed a change to the new NFS server to add file handle affinity support to it. He reported that he had found that, without file handle affinity, that ZFS's sequential reading heuristic broke badly (or something like that, you can probably find the email thread or maybe he will chime in). Anyhow, you could try switching the FreeBSD 9 system to use the old NFS server (assuming your clients are doing NFSv3 mounts) and see if that has a significant effect. (For FreeBSD9, the old server has file handle affinity, but the new server does not.) rick > BTW, the file systems are 77-78% full according to df (so ZFS holds > more, because UFS is -m 8). > > Thanks, > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Jun 11 21:25:34 2013 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2DA91837 for ; Tue, 11 Jun 2013 21:25:34 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id D1D5D1DE9 for ; Tue, 11 Jun 2013 21:25:33 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id D91CD3C23B0; Wed, 12 Jun 2013 07:25:26 +1000 (EST) Date: Wed, 12 Jun 2013 07:25:24 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov Subject: Re: missed clustering for small block sizes in cluster_wbuild() In-Reply-To: <20130611063446.GJ3047@kib.kiev.ua> Message-ID: <20130612053543.X900@besplex.bde.org> References: <20130607044845.O24441@besplex.bde.org> <20130611063446.GJ3047@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=Q6eKePKa c=1 sm=1 a=r8sOWHbHUnAA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=zUlCpqlVHewA:10 a=3cL_b2E_Z7FY-DCS104A:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Jun 2013 21:25:34 -0000 
On Tue, 11 Jun 2013, Konstantin Belousov wrote: > On Fri, Jun 07, 2013 at 05:28:11AM +1000, Bruce Evans wrote: >> I think this is best fixed be fixed by removing the check above and >> checking here. Then back out of the changes. I don't know this code >> well enough to write the backing out easily. > > Could you test this, please ? It works in limited testing. I got a panic from not adapting for changed locking when merging it to my version, and debugging this showed a problem. > diff --git a/sys/kern/vfs_cluster.c b/sys/kern/vfs_cluster.c > index b280317..9e1528e 100644 > --- a/sys/kern/vfs_cluster.c > +++ b/sys/kern/vfs_cluster.c > ... > @@ -904,14 +904,10 @@ cluster_wbuild(struct vnode *vp, long size, daddr_t start_lbn, int len, > > /* > * Check that the combined cluster > - * would make sense with regard to pages > - * and would not be too large > + * would make sense with regard to pages. > */ The comment needs more changes. There is no check "with regard to pages" now. The old comment was poorly worded. The code never made an extra check that the cluster "would make sense with regard to pages" here (that check was always later). What it did was use the page count in the largeness check. > - if ((tbp->b_bcount != size) || > - ((bp->b_blkno + (dbsize * i)) != > - tbp->b_blkno) || > - ((tbp->b_npages + bp->b_npages) > > - (vp->v_mount->mnt_iosize_max / PAGE_SIZE))) { > + if (tbp->b_bcount != size || > + bp->b_blkno + dbsize * i != tbp->b_blkno) { > BUF_UNLOCK(tbp); > break; > } Contrary to what I said before, the caller doesn't always limit the cluster size. Only the cluster_write() caller does that. The vfs_bio_awrite() doesn't. Now it is fairly common to allocate 1 too many page and have to back out. This happens even when everything is aligned. I observed the following: - there were a lot of contiguous dirty buffers, and this loop happily built up a cluster with 17 pages, though mnt_iosize_max was only 17 pages. Perhaps the extra page is necessary if the part of the buffer to be written starts at a nonzero offset, but there was no offset in the case that I observed (can there be one, and if so, is it limited to an offset within the first page? The general case needs 16 4K extra pages to write a 64K-block (when the offset of the area to be written is 64K-512). - ... > @@ -964,6 +960,22 @@ cluster_wbuild(struct vnode *vp, long size, daddr_t start_lbn, int len, > bp->b_pages[bp->b_npages] = m; > bp->b_npages++; > } > + if (bp->b_npages > vp->v_mount-> > + mnt_iosize_max / PAGE_SIZE) { - ...Then this detects that the 17th page is 1 too many and cleans up. > + KASSERT(i != 0, ("XXX")); > + j++; > + for (jj = 0; jj < j; jj++) { > + vm_page_io_finish(tbp-> > + b_pages[jj]); > + } > + vm_object_pip_subtract( > + m->object, j); > + bqrelse(tbp); > + bp->b_npages -= j; > + VM_OBJECT_WUNLOCK(tbp-> > + b_bufobj->bo_object); > + goto finishcluster; > + } > } > VM_OBJECT_WUNLOCK(tbp->b_bufobj->bo_object); > } I think it would work and fix other bugs to check (tbp->b_bcount + bp->b_bcount <= vp->v_mount->mnt_iosize_max) up front. Drivers should be able to handle an i/o size of b_bcount however many pages that takes. There must be a limit on b_pages, but it seems to be non-critical and the limit on b_bcount gives one of (mnt_iosize_max / PAGE_SIZE) rounded in some way and possibly increased by 1 or doubled to account for offsets. If mnt_iosize_max is not a multiple of PAGE_SIZE, then the limit using pages doesn't even allow covering mnt_iosize_max using pages, since the rounding down is non-null. 
I found this bug while debugging a recent PR about bad performance and hangs under write pressure. I only have 1 other clearly correct fix for the bad performance. msdosfs is missing read clustering for read-before-write. I didn't notice that this was necessary when I implemented clustering for msdosfs a few years ago. I thought that the following patch was a complete fix, but have found more performance problems in clustering: @ diff -u2 msdosfs_vnops.c~ msdosfs_vnops.c @ --- msdosfs_vnops.c~ Thu Feb 5 19:11:37 2004 @ +++ msdosfs_vnops.c Wed Jun 12 04:01:19 2013 @ @@ -740,5 +756,19 @@ @ * The block we need to write into exists, so read it in. @ */ @ - error = bread(thisvp, bn, pmp->pm_bpcluster, cred, &bp); @ + if ((ioflag >> IO_SEQSHIFT) != 0 && This was cloned from the ffs version. All ffs should call a common function instead of duplicating the cluster_read/bread decision. Similarly for write clustering except there are more decisions. But ffs and ext2fs do this in UFS_BALLOC() and ext2fs_balloc() (?) where not all the info that might be need is available. I repeated the (ioflag >> IO_SEQSHIFT) calculation instead of copying to a variable like ffs does, to localise this patch and to avoid copying ffs's mounds of style bugs in the declaration of the variable. @ + (vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0) { @ + error = cluster_read(vp, dep->de_FileSize, bn, @ + pmp->pm_bpcluster, NOCRED, @ +#if 0 @ + (uio->uio_offset & pmp->pm_crbomask) + @ + uio->uio_resid, This part was copied from msdosfs_read(). msdosfs_read() uses the uio of some reader. Here the uio for read-before-write is for the writer. It isn't clear that either should be used here. UFS_BALLOC() is not passed the full uio info needed to make this caclulation, and it uses the fixed size MAXBSIZE. That is wrong in a different way. This parameter is used to reduce latency. It asks for a small cluster of the specified size followed by read ahead of full clusters, with the amount of read ahead controlled by vfs.read_max. This gives clusters of non-uniform sizes and offsets, especially when the reader uses small blocks. scottl recently added vfs.read_min which can be tuned to prevent this. I don't like many things in this area. vfs.read_min works OK, but is another hack. The default should probably be to optimize for throughput instead of latency (the reverse of the current one, but the curent one is historical so it shouldn't be changed). The units of vfs.read_max and vfs.read_min are fs block sizes. This is quite broken when the cluster size is varied and sometimes small. E.g., the old default read_max of 8, with a block size of 512 then the read-ahead is limited to a whole 4K. The default is 64, which is still too small with small block sizes. But if you increase vfs.read_max a lot, then the read-ahead becomes almost infinity when the block size is large. In my version, the units for vfs.read_max are 512-blocks (default 256 for the old limit of 128K read-ahead with ffs's old default 16K-blocks. The current limit of 64 seems excessive with ffs's current default of 32K-blocks (2048K read-ahead). My units are mostly better, but I just noticed that they have a different too-delicate interaction with application and kernel block sizes... @ +#else @ + MAXPHYS, The above gave sub-maximal clustering. So does ffs's MAXBSIZE when it is smaller than mnt_iosize_max. In ~5.2, mnt_iosize_max is physical and is usually DFLTPHYS == MAXBSIZE, so ffs's choice usually gives maximal clusters. 
However, in -current, mnt_iosize_max is virtual and is usually MAXPHYS == 2 * MAXBSIZE. So MAXPHYS is probably correct here. ... Then I noticed another problem. MAXPHYS twice mnt_iosize_max, so the cluster size is only mnt_iosize_max = DFLTPHYS = 64K. This apparently acts badly with vfs.read_max = 256 512-blocks. I think it breaks read-ahead. Throughput drops by a factor of 4 for read-before write relative to direct writes (not counting the factor of 2 for the doubled i/o from the reads), although all the i/o sizes are 64K. Increasing vfs.read_max by just 16 fixes this. The throughput drop is then only 10-20% (there must be some drop for the extra seeks). I'm not sure if extra read-ahead is good or bad here. More read-ahead in read-before-write reduces seeks, but it may also break drives' caching and sequential heuristics. My drives are old and have small caches and are very sensitive to the i/o pattern for read-before-write. @ +#endif @ + ioflag >> IO_SEQSHIFT, &bp); @ + } else { @ + error = bread(vp, bn, pmp->pm_bpcluster, @ + NOCRED, &bp); @ + } @ if (error) { @ brelse(bp); To complete getting mostly-full clusters for writing large files to msdosfs, I hacked the block size heuristic some more to give larger blocks: @ diff -u2 msdosfs_vfsops.c~ msdosfs_vfsops.c @ --- msdosfs_vfsops.c~ Sun Jun 20 14:20:03 2004 @ +++ msdosfs_vfsops.c Wed Jun 12 04:32:52 2013 @ @@ -519,7 +547,11 @@ @ @ if (FAT12(pmp)) @ - pmp->pm_fatblocksize = 3 * pmp->pm_BytesPerSec; @ + pmp->pm_fatblocksize = 3 * DEV_BSIZE; @ + else if (FAT16(pmp)) @ + pmp->pm_fatblocksize = PAGE_SIZE; @ else @ - pmp->pm_fatblocksize = MSDOSFS_DFLTBSIZE; @ + pmp->pm_fatblocksize = DFLTPHYS; @ + pmp->pm_fatblocksize = roundup(pmp->pm_fatblocksize, @ + pmp->pm_BytesPerSec); @ @ pmp->pm_fatblocksec = pmp->pm_fatblocksize / DEV_BSIZE; I changed my version this long ago to use 3*DEV_BSIZE for FAT12 and PAGE_SIZE in all other cases. 3*pmp->pm_BytesPerSec is bogus since IIRC a small file system doesn't need even 3 DEV_BSIZE sectors and when the sector size is larger than DEV_BSIZE then it won't need 3 sectors. MSDOSFS_DFLTBSIZE is 4096. This and PAGE_SIZE are really too small for huge FATs. The FAT i/o size should really depend on the size of the FAT, not on its type, but small sizes are more robust and more efficient for sparse writes. The larger size also requires fewer buffers. 4K is not too bad, but 512 would be really bad. I just remembered why I like small blocks. They are more robust, and clustering makes them efficient. But clustering of the FAT isn't done. Clusters are normally written with bdwrite() but not B_CLUSTEROK. I think some clustering still occurs since !B_CLUSTEROK is not honored. Clusters are read using bread(). I think this is followed breadn(), giving the old limited read-ahead which isn't nearly enough with 4K-blocks. DFLTPHYS or MAXPHYS wasn't usable as a default until geom made the max i/o size virtual, since it didn't work for devices with a lower limit. Neither did MAXBSIZE. Even 4K might be larger than the device limit. 
Bruce From owner-freebsd-fs@FreeBSD.ORG Tue Jun 11 23:21:30 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D5CC5ADB for ; Tue, 11 Jun 2013 23:21:30 +0000 (UTC) (envelope-from ken@kdm.org) Received: from nargothrond.kdm.org (nargothrond.kdm.org [70.56.43.81]) by mx1.freebsd.org (Postfix) with ESMTP id A7501138C for ; Tue, 11 Jun 2013 23:21:30 +0000 (UTC) Received: from nargothrond.kdm.org (localhost [127.0.0.1]) by nargothrond.kdm.org (8.14.2/8.14.2) with ESMTP id r5BNLOCn047835; Tue, 11 Jun 2013 17:21:24 -0600 (MDT) (envelope-from ken@nargothrond.kdm.org) Received: (from ken@localhost) by nargothrond.kdm.org (8.14.2/8.14.2/Submit) id r5BNLOgG047834; Tue, 11 Jun 2013 17:21:24 -0600 (MDT) (envelope-from ken) Date: Tue, 11 Jun 2013 17:21:24 -0600 From: "Kenneth D. Merry" To: Rick Macklem Subject: Re: An order of magnitude higher IOPS needed with ZFS than UFS Message-ID: <20130611232124.GA42577@nargothrond.kdm.org> References: <51B79023.5020109@fsn.hu> <253074981.119060.1370985609747.JavaMail.root@erie.cs.uoguelph.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <253074981.119060.1370985609747.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.4.2i Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Jun 2013 23:21:30 -0000 On Tue, Jun 11, 2013 at 17:20:09 -0400, Rick Macklem wrote: > Attila Nagy wrote: > > Hi, > > > > I have two identical machines. They have 14 disks hooked up to a HP > > smartarray (SA from now on) controller. > > Both machines have the same SA configuration and layout: the disks are > > organized into mirror pairs (HW RAID1). > > > > On the first machine, these mirrors are formatted with UFS2+SU > > (default > > settings), on the second machine they are used as separate zpools > > (please don't tell me that ZFS can do the same, I know). Atime is > > turned > > off, otherwise, no other modifications (zpool/zfs or sysctl > > parameters). > > The file systems are loaded more or less evenly with serving of some > > kB > > to few megs files. > > > > The machines act as NFS servers, so there is one, maybe important > > difference here: the UFS machine runs 8.3-RELEASE, while the ZFS one > > runs 9.1-STABLE@r248885. > > They get the same type of load, and according to nfsstat and netstat, > > the loads don't explain the big difference which can be seen in disk > > IOs. In fact, the UFS host seems to be more loaded... 
> > > > According to gstat on the UFS machine: > > dT: 60.001s w: 60.000s filter: da > > L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name > > 0 42 35 404 6.4 8 150 214.2 21.5| da0 > > 0 30 21 215 6.1 9 168 225.2 15.9| da1 > > 0 41 33 474 4.5 8 158 211.3 18.0| da2 > > 0 39 30 425 4.6 9 163 235.0 17.1| da3 > > 1 31 24 266 5.1 7 93 174.1 14.9| da4 > > 0 29 22 273 5.9 7 84 200.7 15.9| da5 > > 0 37 30 692 7.1 7 115 206.6 19.4| da6 > > > > and on the ZFS one: > > dT: 60.001s w: 60.000s filter: da > > L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name > > 0 228 201 1045 23.7 27 344 53.5 88.7| da0 > > 5 185 167 855 21.1 19 238 44.9 73.8| da1 > > 10 263 236 1298 34.9 27 454 53.3 99.9| da2 > > 10 255 235 1341 28.3 20 239 64.8 92.9| da3 > > 10 219 195 994 22.3 23 257 46.3 81.3| da4 > > 10 248 221 1213 22.4 27 264 55.8 90.2| da5 > > 9 231 213 1169 25.1 19 229 54.6 88.6| da6 > > > > I've seen a lot of cases where ZFS required more memory and CPU (and > > even IO) to handle the same load, but they were nowhere this bad > > (often > > a 10x increase). > > > > Any ideas? > > > ken@ recently committed a change to the new NFS server to add file > handle affinity support to it. He reported that he had found that, > without file handle affinity, that ZFS's sequential reading heuristic > broke badly (or something like that, you can probably find the email > thread or maybe he will chime in). That is correct. The problem, if the I/O is sequential, is that simultaneous requests for adjacent blocks in a file will get farmed out to different threads in the NFS server. These can easily go down into ZFS out of order, and make the ZFS prefetch code think that the file is not being read sequentially. It blows away the zfetch stream, and you wind up with a lot of I/O bandwidth getting used (with a lot of prefetching done and then re-done), but not much performance. The FHA code puts adjacent requests in a single file into the same thread, so ZFS sees the requests in the right order. Another change I made was to allow parallel writes to a file if the underlying filesystem allows it. (ZFS is the only filesystem that allows that currently.) That can help random writes. Linux clients are more likely than FreeBSD and MacOS clients to queue a lot of reads to the server. > Anyhow, you could try switching the FreeBSD 9 system to use the old > NFS server (assuming your clients are doing NFSv3 mounts) and see if > that has a significant effect. (For FreeBSD9, the old server has file > handle affinity, but the new server does not.) If using the old NFS server helps, then the FHA code for the new server will help as well. Perhaps more, because the default FHA tuning parameters have changed somewhat and parallel writes are now possible. If you want to try out the FHA changes in stable/9, I just MFCed them, change 251641. 
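To make the affinity idea concrete, here is a minimal sketch of the binning concept only, not the actual sys/fs/nfsserver code; the hash, the bin shift value and the names are assumptions for illustration:

#include <stdint.h>
#include <stdio.h>

#define FHA_BIN_SHIFT	22	/* 4MB bins; the real default may differ */

/*
 * Requests are binned by (file handle hash, offset >> bin shift), so
 * adjacent reads of one file land on the same nfsd thread and reach
 * ZFS in order instead of blowing away the zfetch stream.
 */
static unsigned
fha_pick_thread(uint64_t fh_hash, uint64_t offset, unsigned nthreads)
{
	uint64_t key = fh_hash ^ (offset >> FHA_BIN_SHIFT);

	return ((unsigned)(key % nthreads));
}

int
main(void)
{
	/* Two adjacent 64K reads of the same file map to the same thread. */
	printf("%u %u\n",
	    fha_pick_thread(0xdeadbeef, 0 * 65536, 8),
	    fha_pick_thread(0xdeadbeef, 1 * 65536, 8));
	return (0);
}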
Ken -- Kenneth Merry ken@FreeBSD.ORG From owner-freebsd-fs@FreeBSD.ORG Tue Jun 11 23:39:10 2013 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C7863C9B for ; Tue, 11 Jun 2013 23:39:10 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 77328148D for ; Tue, 11 Jun 2013 23:39:09 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 2EB0F104196B; Wed, 12 Jun 2013 09:39:08 +1000 (EST) Date: Wed, 12 Jun 2013 09:39:07 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans Subject: Re: missed clustering for small block sizes in cluster_wbuild() In-Reply-To: <20130612053543.X900@besplex.bde.org> Message-ID: <20130612085648.L836@besplex.bde.org> References: <20130607044845.O24441@besplex.bde.org> <20130611063446.GJ3047@kib.kiev.ua> <20130612053543.X900@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=K8x6hFqI c=1 sm=1 a=r8sOWHbHUnAA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=zUlCpqlVHewA:10 a=hiBHK-Nd3Hw8SXYrQcMA:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Jun 2013 23:39:10 -0000 On Wed, 12 Jun 2013, Bruce Evans wrote: > On Tue, 11 Jun 2013, Konstantin Belousov wrote: > >> On Fri, Jun 07, 2013 at 05:28:11AM +1000, Bruce Evans wrote: >>> I think this is best fixed be fixed by removing the check above and >>> checking here. Then back out of the changes. I don't know this code >>> well enough to write the backing out easily. >> >> Could you test this, please ? > > It works in limited testing. > ... > - there were a lot of contiguous dirty buffers, and this loop happily built > up a cluster with 17 pages, though mnt_iosize_max was only 17 pages. > Perhaps the extra page is necessary if the part of the buffer to be > written starts at a nonzero offset, but there was no offset in the case > that I observed (can there be one, and if so, is it limited to an offset > within the first page? The general case needs 16 4K extra pages to write > a 64K-block (when the offset of the area to be written is 64K-512). I now remember a bit more about how this works. There is only a limited amount of offseting. The buffer might not be page-aligned relative to the start of the disk. Then the first page in the buffer must not all be accessed (via this buffer) for i/o. The first page is mapped at bp->b_kvabase, but disk drivers must only access data starting at bp->b_data, which is offset from bp->b_kvabase in the misaligned case. I think this is the only relevant complication. When misaligned buffers are merged into a cluster buffer, they must all have the same misalignment and size for the merge to work. 1 "extra" page, but no more, is always required in the misaligned case to reach the full mnt_iosize_max. msdosfs buffers may even be misaligned if they have size 64K! All msdosfs clusters may be misaligned if they have size >= PAGE_SIZE! This is not good for performance, but should work. 
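A toy restatement of the merge constraint just described (purely illustrative; the real test lives in cluster_wbuild() and is more involved): two buffers can only join one cluster i/o if they are disk-contiguous, the same size, and their b_data carries the same offset within the first page.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DEV_BSIZE	512
#define PAGE_SIZE	4096
#define PAGE_MASK	(PAGE_SIZE - 1)

struct buf_sketch {
	int64_t   b_blkno;	/* disk address in DEV_BSIZE blocks */
	long      b_bcount;	/* bytes in this buffer */
	uintptr_t b_data;	/* may be offset into the first page */
};

static bool
can_merge(const struct buf_sketch *prev, const struct buf_sketch *next)
{
	/* Must be contiguous on disk and of equal size ... */
	if (prev->b_bcount != next->b_bcount)
		return (false);
	if (prev->b_blkno + prev->b_bcount / DEV_BSIZE != next->b_blkno)
		return (false);
	/* ... and carry the same misalignment within the first page. */
	return ((prev->b_data & PAGE_MASK) == (next->b_data & PAGE_MASK));
}

int
main(void)
{
	struct buf_sketch a = { 0, 4096, 0x200 }, b = { 8, 4096, 0x200 };

	printf("%d\n", can_merge(&a, &b));	/* 1: same misalignment */
	b.b_data = 0;
	printf("%d\n", can_merge(&a, &b));	/* 0: different misalignment */
	return (0);
}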
Misalignment used to be the usual case, since msdosfs metadata before the data clusters tends to have an odd size in sectors and when the cluster size is >= PAGE_SIZE the misalignment is preserved. FreeBSD newfs_msdos shouldn't produce misaligned buffers, but other systems' utilities might.

This may also cause problems with the MAXBSIZE limit of 64K. If it is a hard limit on b_kvasize, then misaligned buffers of this size won't be allowed. If it is only a limit on b_bcount, then there may be fragmentation problems.

> ...
> I think it would work and fix other bugs to check (tbp->b_bcount +
> bp->b_bcount <= vp->v_mount->mnt_iosize_max) up front. Drivers should
> be able to handle an i/o size of b_bcount however many pages that
> takes. There must be a limit on b_pages, but it seems to be
> non-critical and the limit on b_bcount gives one of
> (mnt_iosize_max / PAGE_SIZE) rounded in some way and possibly increased
> by 1 or doubled to account for offsets. If mnt_iosize_max is not a
> multiple of PAGE_SIZE, then the limit using pages doesn't even allow
> covering mnt_iosize_max using pages, since the rounding down is
> non-null.

I'm now trying the b_bcount check and not doing any backout later (just print debugging info when it is reached). The backout case is reached even with the b_bcount check. This is in the misaligned case. The misaligned case shouldn't break clustering since it is quite common. It happens whenever the blocksize is small and the start of the cluster is misaligned relative to the start of the disk. If the block size is larger, then all blocks may be misaligned.

> [read-before-write fix for msdosfs and generic problems with read-b4-write]
> ... Then I noticed another problem. MAXPHYS is twice mnt_iosize_max,
> so the cluster size is only mnt_iosize_max = DFLTPHYS = 64K. This
> apparently acts badly with vfs.read_max = 256 512-blocks. I think
> it breaks read-ahead. Throughput drops by a factor of 4 for read-before
> write relative to direct writes (not counting the factor of 2 for the
> doubled i/o from the reads), although all the i/o sizes are 64K.
> Increasing vfs.read_max by just 16 fixes this. The throughput drop
> is then only 10-20% (there must be some drop for the extra seeks).
> I'm not sure if extra read-ahead is good or bad here. More read-ahead
> in read-before-write reduces seeks, but it may also break drives'
> caching and sequential heuristics. My drives are old and have small
> caches and are very sensitive to the i/o pattern for read-before-write.

I confirmed that this has something to do with the drive. After reaching a quiescent pattern with "dd bs=1k count=1024k conv=notrunc" for almost-contiguous files (and 1k < fs block size, and fs = msdosfs with MAXPHYS read-before-write), reads and writes alternate with reads some constant distance ahead of writes. The difference depends on vfs.read_max. It is sometimes a multiple of 128 512-blocks, but often not. My drives don't like some fixed distances. I don't understand their pattern. They seem to prefer non-power-of-2 differences.

Turning off read-ahead by setting vfs.read_max to 0 gives the worst performance (reduced by another power of 2). The levels of reduced performance are quantized: one level at 7 times slower, one level at 4 times slower and one level at 10-20% slower.
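For anyone who wants to reproduce the adjustment programmatically rather than with sysctl(8), a minimal userland sketch using sysctl(3) follows (vfs.read_max is a plain int; "+16" is just the adjustment mentioned above, and must be run as root to take effect):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>

int
main(void)
{
	int cur, new;
	size_t len = sizeof(cur);

	/* Read the current cluster read-ahead limit. */
	if (sysctlbyname("vfs.read_max", &cur, &len, NULL, 0) == -1)
		err(1, "sysctlbyname(vfs.read_max)");
	new = cur + 16;
	/* Write it back increased by 16, as in the experiment above. */
	if (sysctlbyname("vfs.read_max", NULL, NULL, &new, sizeof(new)) == -1)
		err(1, "sysctlbyname(vfs.read_max)");
	printf("vfs.read_max: %d -> %d\n", cur, new);
	return (0);
}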
Bruce From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 00:20:01 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7E37810F for ; Wed, 12 Jun 2013 00:20:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 5483A16D3 for ; Wed, 12 Jun 2013 00:20:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r5C0K1I9057967 for ; Wed, 12 Jun 2013 00:20:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r5C0K0I0057965; Wed, 12 Jun 2013 00:20:00 GMT (envelope-from gnats) Date: Wed, 12 Jun 2013 00:20:00 GMT Message-Id: <201306120020.r5C0K0I0057965@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: Garrett Cooper Subject: Re: kern/172334: [unionfs] unionfs permits recursive union mounts; causes panic quickly X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Garrett Cooper List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 00:20:01 -0000 The following reply was made to PR kern/172334; it has been noted by GNATS. From: Garrett Cooper To: bug-followup@FreeBSD.org, yaneurabeya@gmail.com Cc: daichi@FreeBSD.org Subject: Re: kern/172334: [unionfs] unionfs permits recursive union mounts; causes panic quickly Date: Tue, 11 Jun 2013 17:10:20 -0700 I finally got around to testing this. Yup -- the patch looks good to me. Thank you Daichi! -Garrett From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 02:17:04 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C5AB6AEC; Wed, 12 Jun 2013 02:17:04 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au [211.29.132.97]) by mx1.freebsd.org (Postfix) with ESMTP id 750491980; Wed, 12 Jun 2013 02:17:03 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id A342D7804C2; Wed, 12 Jun 2013 11:48:12 +1000 (EST) Date: Wed, 12 Jun 2013 11:48:11 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: "Kenneth D. Merry" Subject: Re: An order of magnitude higher IOPS needed with ZFS than UFS In-Reply-To: <20130611232124.GA42577@nargothrond.kdm.org> Message-ID: <20130612104903.A1146@besplex.bde.org> References: <51B79023.5020109@fsn.hu> <253074981.119060.1370985609747.JavaMail.root@erie.cs.uoguelph.ca> <20130611232124.GA42577@nargothrond.kdm.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=Q6eKePKa c=1 sm=1 a=uNq0K1xFbOwA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=_0xpXSU753EA:10 a=fMB5tdty3pWOc5zq9kgA:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 02:17:04 -0000 On Tue, 11 Jun 2013, Kenneth D. 
Merry wrote:
> On Tue, Jun 11, 2013 at 17:20:09 -0400, Rick Macklem wrote:
>> Attila Nagy wrote:
>>> ...
>>> I've seen a lot of cases where ZFS required more memory and CPU (and
>>> even IO) to handle the same load, but they were nowhere this bad
>>> (often
>>> a 10x increase).
>>>
>>> Any ideas?
>>>
>> ken@ recently committed a change to the new NFS server to add file
>> handle affinity support to it. He reported that he had found that,
>> without file handle affinity, that ZFS's sequential reading heuristic
>> broke badly (or something like that, you can probably find the email
>> thread or maybe he will chime in).
>
> That is correct. The problem, if the I/O is sequential, is that simultaneous
> requests for adjacent blocks in a file will get farmed out to different
> threads in the NFS server. These can easily go down into ZFS out of order,
> and make the ZFS prefetch code think that the file is not being read
> sequentially. It blows away the zfetch stream, and you wind up with a lot
> of I/O bandwidth getting used (with a lot of prefetching done and then
> re-done), but not much performance.

I saw the nfsd's getting in each other's way when debugging nfs write slowness some time ago. I used the "fix" of using only 1 nfsd. This worked fine on a lightly-loaded nfs server and client doing nothing nearly as heavy as the write benchmark for all other uses combined. With this and some other changes that are supposed to be in -current now, the write performance for large files was close to the drive's maximum. But reads were at best 75% of the maximum. Maybe FHA fixes the read case.

More recently, I noticed that vfs clustering works poorly partly because it has too many, yet not enough sequential pointers.

There is a pointer (fp->f_nextoff and fp->f_seqcount) for the sequential heuristic at the struct file level. This is shared between reads and writes, so mixed reads, writes and seeks break the heuristic for the reads and writes in the case that the seeks are to get back to position after the previous write (the rewrite benchmark in bonnie does this). The seeks mean that the i/o is not really sequential although it is sequential for the read part and the write part. FreeBSD is only trying to guess if these parts are sequential per-file. Mixed reads and writes on the same file shouldn't affect the guess any more than non-mixed reads or writes on different files, or mixed reads and writes on the same file when the kernel does the read to fill buffers before partial writes. However, at a lower level the only seeks that matter are physical ones. The per-file pointers should be combined somehow to predict and minimize the physical seeks. Nothing is done. The kernel read-before-write does significant physical seeks but since everything is below the file level the per-file pointer is not clobbered so pure sequential writes are still guessed to be sequential although they aren't really.

There is also a pointer (vp->v_lastw and vp->v_lasta) for cluster writes. This is closer to the physical disk pointer that is needed, but since it is per-vnode it shares a fundamental design error with the buffer cache (buffer cache code wants to access one vnode at a time, vnode data and metadata may be very non-sequential). vnodes are below the file level, so this pointer gets clobbered by writes (but not reads) on separate open files. The clobbering keeps the vnode pointer closer to the physical disk pointer if and only if all accesses are to the same vnode.
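As a rough model of the per-open-file guess mentioned above (loosely patterned on sequential_heuristic() in sys/kern/vfs_vnops.c; the constants and scaling here are illustrative assumptions, not the kernel's exact values):

#include <stdio.h>

#define SEQ_MAX	127

struct open_file {
	long long f_nextoff;	/* where the last i/o on this file ended */
	int       f_seqcount;	/* how sequential this file has looked */
};

static int
seq_heuristic(struct open_file *fp, long long offset, long resid)
{
	int hint;

	if (offset == fp->f_nextoff) {
		/* Continues where the last i/o ended: looks sequential. */
		fp->f_seqcount += resid / 16384 + 1;
		if (fp->f_seqcount > SEQ_MAX)
			fp->f_seqcount = SEQ_MAX;
		hint = fp->f_seqcount;
	} else {
		/* A seek (or an interleaved i/o elsewhere) resets the guess. */
		fp->f_seqcount = 1;
		hint = 0;
	}
	fp->f_nextoff = offset + resid;
	return (hint);
}

int
main(void)
{
	struct open_file fp = { 0, 0 };

	seq_heuristic(&fp, 0, 65536);
	printf("%d\n", seq_heuristic(&fp, 65536, 65536));	/* sequential: grows */
	printf("%d\n", seq_heuristic(&fp, 0, 65536));		/* seek back: reset */
	return (0);
}

The second call in main() shows why a rewrite pass that seeks back (as bonnie does) clobbers the guess even though the reads and writes are each sequential on their own.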
I think it mostly helps to not try to track per-file sequentiality for writes, but the per-file sequentiality guess is used for cluster writing too. The 2 types of sequentiality combine in a confusing way even if there is only 1 writer (but a reader on the same file). Write accesses are then sequential from the point of view of the vnode pointer, but random from the point of view of the file pointer, since only the latter is clobbered by intermediate reads. As mentioned above, bonnie's atypical i/o pattern clobbers the file pointer, but kernel's more typical i/o pattern for read-before-write doesn't. I first thought that clobbering the pointer was a bug, but now I think it is a feature. The i/o really is non-sequential. Basing most i/o sequentiality guesses on a single per-disk pointer (shared across different partitions on the same disk) might work better than all the separate pointers. Accesses that are sequential at the file level would only be considered sequential if no other physical accesses intervene. After getting that right, use sequentiality guesses again to delay some physical accesses if they would intervene. Bruce From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 08:44:59 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CA585B42 for ; Wed, 12 Jun 2013 08:44:59 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-la0-x233.google.com (mail-la0-x233.google.com [IPv6:2a00:1450:4010:c03::233]) by mx1.freebsd.org (Postfix) with ESMTP id 5371219CF for ; Wed, 12 Jun 2013 08:44:59 +0000 (UTC) Received: by mail-la0-f51.google.com with SMTP id fq12so7489521lab.10 for ; Wed, 12 Jun 2013 01:44:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=vNvD5TNxquZYbGiKpfDWpMnowuHfmxgp/1BvQYFDZcc=; b=t4WXZ5o0CxZv9/kwfNrSqH5h5F0utLJsEh/pcNSbT1P3yo/Qh1uox58y1RVM8xua/8 6w8o2NQOW9bb1hrSNLWdbLhCWkcEI39DBUnIGMLOlIlvRt42w6cc3iL0DIIzJS3s2j+N 5lbgCqGwd61IY+1SKWO6OftgKDX3cnf3nvuQ/NXFgEJtZIFAHlJJ0s9YFBPZwdpnAuZE SfWcbcTgnrluj7Sp6tBqdY2DUYv/SCPR4hohQAnNCMojLCBH/AY+Y0q2KGPEl7LuJ87h NhDs3LVnLtbAbXqY/AQVucIRpyMEuxFa4lIMaLRgMLK4TZBxt8bH3aaziVaUVwzyOeeL rmoA== X-Received: by 10.112.150.170 with SMTP id uj10mr5406999lbb.93.1371026697991; Wed, 12 Jun 2013 01:44:57 -0700 (PDT) Received: from localhost ([188.230.122.226]) by mx.google.com with ESMTPSA id a3sm8737907lbg.2.2013.06.12.01.44.55 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Wed, 12 Jun 2013 01:44:56 -0700 (PDT) Date: Wed, 12 Jun 2013 11:44:54 +0300 From: Mikolaj Golub To: Dmitry Morozovsky Subject: Re: hast: can't restore after disk failure Message-ID: <20130612084453.GA55502@gmail.com> References: <20130610201650.GA2823@gmail.com> <20130611060741.GA42231@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 08:44:59 -0000 On Wed, Jun 12, 2013 at 12:23:52AM +0400, Dmitry Morozovsky wrote: > On Tue, 11 Jun 2013, Mikolaj Golub wrote: > > > On Tue, Jun 11, 2013 at 12:40:08AM +0400, Dmitry Morozovsky wrote: > > > On Mon, 10 Jun 2013, Mikolaj 
Golub wrote: > > > > > > [snipall] > > > > > > > > Jun 10 16:56:20 cthulhu3 kernel: Jun 10 16:56:20 > > > > > cthulhu3 hastd[765]: [d1] (secondary) Worker process exited ungracefully > > > > > (pid=14380, exitcode=66). > > > > > > > > > > Any hints? Thanks! > > > > > > > > Have you run hastctl create to initialize metadata? > > > > > > Yes, but did it naively: > > > > > > hastctl create d1 > > > > No errors? > > no visible, but hast instance ungracefully exits > > > > and status still reported 0 as provider size... > > > > I assume /dev/ada1p1 is present and readable/writable? > > > > Symptoms are like if it did not exist. > > nope, it does: > > root@cthulhu3:/# diskinfo /dev/ada1p1 > /dev/ada1p1 512 999654686720 1952450560 0 1048576 1936954 16 63 > root@cthulhu3:/# diskinfo /dev/ada0p1 > /dev/ada0p1 512 999653638144 1952448512 0 1048576 1936952 16 63 > Hm, looking in the source where this error is generated: cthulhu3 hastd[14379]: [d1] (secondary) Unable to read metadata from /dev/ada1p1: No such file or directory. it looks like hastd successfully read metadata from disk but failed to parse it (did not found an entry). This usually happens when metadata is not initialized by `hastctl create`. Does `hastctl dump d1' not work too? -- Mikolaj Golub From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 08:49:54 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4B3E3C32 for ; Wed, 12 Jun 2013 08:49:54 +0000 (UTC) (envelope-from se@freebsd.org) Received: from nm8-vm1.bullet.mail.ird.yahoo.com (nm8-vm1.bullet.mail.ird.yahoo.com [77.238.189.198]) by mx1.freebsd.org (Postfix) with SMTP id 392C61A06 for ; Wed, 12 Jun 2013 08:49:53 +0000 (UTC) Received: from [77.238.189.238] by nm8.bullet.mail.ird.yahoo.com with NNFMP; 12 Jun 2013 08:49:51 -0000 Received: from [46.228.39.69] by tm19.bullet.mail.ird.yahoo.com with NNFMP; 12 Jun 2013 08:49:51 -0000 Received: from [127.0.0.1] by smtp106.mail.ir2.yahoo.com with NNFMP; 12 Jun 2013 08:49:51 -0000 X-Yahoo-Newman-Id: 458754.61133.bm@smtp106.mail.ir2.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: Bjnd4aYVM1k42RPalIBMugNHwfh2mQbDST.vBo1t969Rt0O g4RVTol70_AYmtxcePMxzP2nN7LWH6_OfJP6njytei1cnHwwnfeW3Ih8nL8X cDFiIs_ix3LL4CIKinlqNZgjPPj5T.623XaApa3sIhmm5pgZAoXttIiqZ1Rw ZCNX51Qgu1fb_1haxNOG_FtV4ViC5A.mIrPwxweNGAf8btyhojbxaD2HT2fF HhzMRcr2LDxZwwDVZbqLZur13SzDpCrg349r764O1HIA7Eja3xvOjKYqjAEq a6SHeqM4ReRmpM9JhqwbzQgWQ4YgR6u.fyBhPSWEJonfvTCr8W6kEyCIaJn4 28P17JfXWwCrFLyvNACE1mQqnLGNX8lhf0khX35YQRUT.XlpDS.6LdlmNgaz mrrRy8qQ.aNMhbcpA_PYSZ_gTzPJ_aU0il9kc7nDm5a4liJBNxDi8dmbXJBV bqzaHEwYsmxNCjr3GrnK13.vgW9Ohwfij X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq. 
X-Rocket-Received: from [192.168.119.11] (se@87.158.30.195 with ) by smtp106.mail.ir2.yahoo.com with SMTP; 12 Jun 2013 08:49:51 +0000 UTC Message-ID: <51B8362A.4080406@freebsd.org> Date: Wed, 12 Jun 2013 10:49:46 +0200 From: Stefan Esser User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130509 Thunderbird/17.0.6 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: An order of magnitude higher IOPS needed with ZFS than UFS References: <51B79023.5020109@fsn.hu> <253074981.119060.1370985609747.JavaMail.root@erie.cs.uoguelph.ca> <20130611232124.GA42577@nargothrond.kdm.org> <20130612104903.A1146@besplex.bde.org> In-Reply-To: <20130612104903.A1146@besplex.bde.org> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: bde@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 08:49:54 -0000 On 12.06.2013 03:48, Bruce Evans wrote:
> I first thought that clobbering the pointer was a bug, but now I think
> it is a feature. The i/o really is non-sequential. Basing most i/o
> sequentiality guesses on a single per-disk pointer (shared across
> different partitions on the same disk) might work better than all
> the separate pointers. Accesses that are sequential at the file level
> would only be considered sequential if no other physical accesses
> intervene. After getting that right, use sequentiality guesses again
> to delay some physical accesses if they would intervene.

Hi Bruce,

I tend to disagree ... ;-)

Recognizing sequential reads on a per file basis hints at whether read-ahead (delaying the next following access and the buffer needed to keep the data) might be useful. This "knowledge" can lead to drastically higher total throughput in situations where multiple processes (or network clients) read files sequentially (e.g. a media server for many parallel streams).

If you try to recognize sequential accesses on the device level, then you may identify cases where one reader is likely to perform back-to-back reads. But in all other cases (and especially under high load), you will not be able to identify the processes that might be helped by reading larger chunks than requested (lowering the number of seeks required and taking pressure off the storage).

So, I think you need the per file read-ahead heuristics to identify candidates for read-ahead. And I doubt you can get the same effect by tracking disk accesses.

Hmmm, you could keep a list of read-ahead pointers per disk, which could be recycled in an LRU scheme. Any new read that continues a prior read is detected and updates the corresponding pointer, which is in a struct with a read-ahead flag or the amount to read ahead. Access to this list of pointers could be sped up by having a hash table that points to them (hash key is some number of LSBs, e.g. for 256 or 1024 buckets). That way the temporal distribution of the accesses could be included in the heuristic: If sequential reads are spread out over a long time, then their corresponding pointer is lost (after e.g. 256 or 1024 non-sequential accesses to the volume).

This could be implemented as a scheduler class in GEOM, I think (to make it easily loadable and selectable per volume, but might also be appropriate for production use). That way different strategies (with regard to read-ahead and the potential for clustering of writes) could be tested.
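A toy sketch of that bookkeeping (purely illustrative, not an existing GEOM class; the names, the 256-slot table and hashing by low bits are assumptions taken from the description above):

#include <stdint.h>
#include <stdio.h>

#define NSLOTS	256			/* e.g. 256 buckets, as suggested */

struct stream {
	uint64_t next_blk;		/* block where the stream should continue */
	unsigned score;			/* back-to-back continuations seen so far */
};

static struct stream slots[NSLOTS];	/* one table per disk/volume */

static unsigned
slot_of(uint64_t blk)
{
	return ((unsigned)(blk & (NSLOTS - 1)));	/* hash key = low bits */
}

/* Returns nonzero when a read looks like a sequential continuation. */
static unsigned
note_read(uint64_t blk, uint64_t nblks)
{
	struct stream *s = &slots[slot_of(blk)];
	unsigned score;

	if (s->next_blk == blk && s->score > 0)
		score = s->score + 1;	/* continues a tracked stream */
	else
		score = 1;		/* new stream, recycling whatever was here */

	s->score = 0;			/* the old key is stale now */
	s = &slots[slot_of(blk + nblks)];
	s->next_blk = blk + nblks;	/* re-key where the stream should continue */
	s->score = score;

	return (score > 1 ? score : 0);
}

int
main(void)
{
	printf("%u\n", note_read(1000, 128));	/* first read: 0 */
	printf("%u\n", note_read(1128, 128));	/* continuation: read-ahead candidate */
	return (0);
}

Streams that return to the table only rarely get their slots recycled by unrelated traffic, which loosely models the temporal cutoff described above.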
Might be interesting to compare such a scheduler with the per file heuristics as implemented in the kernel now ... Best regards, STefan From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 09:07:46 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 5C750224 for ; Wed, 12 Jun 2013 09:07:46 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id DE70D1B0D for ; Wed, 12 Jun 2013 09:07:45 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r5C97hdC024879; Wed, 12 Jun 2013 13:07:43 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Wed, 12 Jun 2013 13:07:43 +0400 (MSK) From: Dmitry Morozovsky To: Mikolaj Golub Subject: Re: hast: can't restore after disk failure In-Reply-To: <20130612084453.GA55502@gmail.com> Message-ID: References: <20130610201650.GA2823@gmail.com> <20130611060741.GA42231@gmail.com> <20130612084453.GA55502@gmail.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (woozle.rinet.ru [0.0.0.0]); Wed, 12 Jun 2013 13:07:43 +0400 (MSK) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 09:07:46 -0000 On Wed, 12 Jun 2013, Mikolaj Golub wrote: [snip a bit] > > nope, it does: > > > > root@cthulhu3:/# diskinfo /dev/ada1p1 > > /dev/ada1p1 512 999654686720 1952450560 0 1048576 1936954 16 63 > > root@cthulhu3:/# diskinfo /dev/ada0p1 > > /dev/ada0p1 512 999653638144 1952448512 0 1048576 1936952 16 63 Argh! Somehow ada1p1 got created with a slightly different size (though bigger than necessary), and that was the source of the problem. Recreating it with gpart fixed the problem. > Hm, looking in the source where this error is generated: > > cthulhu3 hastd[14379]: [d1] (secondary) Unable to read metadata from /dev/ada1p1: No such file or directory. > > it looks like hastd successfully read metadata from disk but failed to > parse it (did not found an entry). This usually happens when metadata > is not initialized by `hastctl create`.
Well, error messages definitely could be improved :) -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 09:26:20 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 287DFA16; Wed, 12 Jun 2013 09:26:20 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 7BB051C6A; Wed, 12 Jun 2013 09:26:18 +0000 (UTC) Received: by people.fsn.hu (Postfix, from userid 1001) id B21A5109600B; Wed, 12 Jun 2013 11:26:17 +0200 (CEST) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.3 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR: 24.4369] X-CRM114-CacheID: sfid-20130612_11260_D61B9634 X-CRM114-Status: Good ( pR: 24.4369 ) X-DSPAM-Result: Whitelisted X-DSPAM-Processed: Wed Jun 12 11:26:17 2013 X-DSPAM-Confidence: 0.9965 X-DSPAM-Probability: 0.0000 X-DSPAM-Signature: 51b83eb9254715214434796 X-DSPAM-Factors: 27, From*Attila Nagy , 0.00010, >+On, 0.00059, FreeBSD, 0.00074, FreeBSD, 0.00074, )+>, 0.00139, wrote+>>, 0.00147, wrote+>, 0.00218, On+Tue, 0.00242, >+of, 0.00279, >+of, 0.00279, >+>>, 0.00321, 2013+at, 0.00348, >+>, 0.00371, >+>, 0.00371, References*fsn.hu>, 0.00371, the+server, 0.00397, >>+>>, 0.00428, something+like, 0.00463, parameters, 0.00463, >+have, 0.00463, queue, 0.00529, wrote, 0.00535, wrote, 0.00535, ZFS, 0.00555, ZFS, 0.00555, >+If, 0.00555, X-Spambayes-Classification: ham; 0.00 Received: from japan.t-online.private (japan.t-online.co.hu [195.228.243.99]) by people.fsn.hu (Postfix) with ESMTPSA id C30B31095FF4; Wed, 12 Jun 2013 11:26:06 +0200 (CEST) Message-ID: <51B83EAE.7060603@fsn.hu> Date: Wed, 12 Jun 2013 11:26:06 +0200 From: Attila Nagy MIME-Version: 1.0 To: "Kenneth D. Merry" Subject: Re: An order of magnitude higher IOPS needed with ZFS than UFS References: <51B79023.5020109@fsn.hu> <253074981.119060.1370985609747.JavaMail.root@erie.cs.uoguelph.ca> <20130611232124.GA42577@nargothrond.kdm.org> In-Reply-To: <20130611232124.GA42577@nargothrond.kdm.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 09:26:20 -0000 Hi, On 06/12/13 01:21, Kenneth D. Merry wrote: > On Tue, Jun 11, 2013 at 17:20:09 -0400, Rick Macklem wrote: >> >> ken@ recently committed a change to the new NFS server to add file >> handle affinity support to it. He reported that he had found that, >> without file handle affinity, that ZFS's sequential reading heuristic >> broke badly (or something like that, you can probably find the email >> thread or maybe he will chime in). > That is correct. The problem, if the I/O is sequential, is that simultaneous > requests for adjacent blocks in a file will get farmed out to different The IO is pretty much random, and the files aren't so big either (mean size around 400k). > threads in the NFS server. 
These can easily go down into ZFS out of order, > and make the ZFS prefetch code think that the file is not being read > sequentially. It blows away the zfetch stream, and you wind up with a lot > of I/O bandwidth getting used (with a lot of prefetching done and then > re-done), but not much performance. I've tried disabling prefetch, without any noticeable effects. > > Linux clients are more likely than FreeBSD and MacOS clients to queue a lot > of reads to the server. The clients are also FreeBSD (8.3 and 7.2 mostly). Running NFSv3 of course. > >> Anyhow, you could try switching the FreeBSD 9 system to use the old >> NFS server (assuming your clients are doing NFSv3 mounts) and see if >> that has a significant effect. (For FreeBSD9, the old server has file >> handle affinity, but the new server does not.) > If using the old NFS server helps, then the FHA code for the new server > will help as well. Perhaps more, because the default FHA tuning parameters > have changed somewhat and parallel writes are now possible. > > If you want to try out the FHA changes in stable/9, I just MFCed them, > change 251641. > Sure, I will try both 251641 and the old nfsd. Thanks, From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 09:37:01 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C5DE7BDB for ; Wed, 12 Jun 2013 09:37:01 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) by mx1.freebsd.org (Postfix) with ESMTP id 4E4851D01 for ; Wed, 12 Jun 2013 09:37:00 +0000 (UTC) Received: from mfilter10-d.gandi.net (mfilter10-d.gandi.net [217.70.178.139]) by relay5-d.mail.gandi.net (Postfix) with ESMTP id 2350841C0A4; Wed, 12 Jun 2013 11:36:44 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter10-d.gandi.net Received: from relay5-d.mail.gandi.net ([217.70.183.197]) by mfilter10-d.gandi.net (mfilter10-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id AsLmWF+lNvDh; Wed, 12 Jun 2013 11:36:42 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay5-d.mail.gandi.net (Postfix) with ESMTPSA id 8E7BB41C0A6; Wed, 12 Jun 2013 11:36:41 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id A53A973A1C; Wed, 12 Jun 2013 02:36:39 -0700 (PDT) Date: Wed, 12 Jun 2013 02:36:39 -0700 From: Jeremy Chadwick To: Mikolaj Golub Subject: Re: hast: can't restore after disk failure Message-ID: <20130612093639.GA9219@icarus.home.lan> References: <20130610201650.GA2823@gmail.com> <20130611060741.GA42231@gmail.com> <20130612084453.GA55502@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130612084453.GA55502@gmail.com> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, Dmitry Morozovsky X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 09:37:01 -0000 On Wed, Jun 12, 2013 at 11:44:54AM +0300, Mikolaj Golub wrote: > On Wed, Jun 12, 2013 at 12:23:52AM +0400, Dmitry Morozovsky wrote: > > On Tue, 11 Jun 2013, Mikolaj Golub wrote: > > > > > On Tue, Jun 11, 2013 at 12:40:08AM +0400, Dmitry Morozovsky wrote: > > > > On Mon, 10 Jun 2013, Mikolaj Golub wrote: > > > > > > > > 
[snipall] > > > > > > > > > > Jun 10 16:56:20 cthulhu3 kernel: Jun 10 16:56:20 > > > > > > cthulhu3 hastd[765]: [d1] (secondary) Worker process exited ungracefully > > > > > > (pid=14380, exitcode=66). > > > > > > > > > > > > Any hints? Thanks! > > > > > > > > > > Have you run hastctl create to initialize metadata? > > > > > > > > Yes, but did it naively: > > > > > > > > hastctl create d1 > > > > > > No errors? > > > > no visible, but hast instance ungracefully exits > > > > > > and status still reported 0 as provider size... > > > > > > I assume /dev/ada1p1 is present and readable/writable? > > > > > > Symptoms are like if it did not exist. > > > > nope, it does: > > > > root@cthulhu3:/# diskinfo /dev/ada1p1 > > /dev/ada1p1 512 999654686720 1952450560 0 1048576 1936954 16 63 > > root@cthulhu3:/# diskinfo /dev/ada0p1 > > /dev/ada0p1 512 999653638144 1952448512 0 1048576 1936952 16 63 > > > > Hm, looking in the source where this error is generated: > > cthulhu3 hastd[14379]: [d1] (secondary) Unable to read metadata from /dev/ada1p1: No such file or directory. > > it looks like hastd successfully read metadata from disk but failed to > parse it (did not found an entry). This usually happens when metadata > is not initialized by `hastctl create`. > > Does `hastctl dump d1' not work too? Note up front: I have zero familiarity with hast stuff. I'm just looking at source code, because your comment seems to indicate that ENOENT (errno 2; No such file or directory) is actually false/incorrect. I did spend almost 30 minutes digging through the hastd code. This is hard to follow -- very specifically, the error/errno situational code. It's a very deep rabbit hole. Variable names are common or re-used (legitimately due to local scope), and the actual error that gets printed comes directly from the global errno variable. I honestly cannot see how nv->nv_error (which is what nv_error() returns) gets set to ENOENT within the function call stack: - metadata_read() is what prints the error (line 152 in nv.c) - Error printing done by pjdlog_errno(), which uses the global errno to print its errors - nv = nv_ntoh(eb) - nv_ntoh() sets nv->nv_error to 0 initially, but then calls nv_validate() later on which can modify nv->error - nv_validate() explicitly sets error (which later can get assigned to nv->nv_error) to EINVAL in many cases, but not ENOENT. Therefore, I am honestly not sure how ENOENT gets returned to the user in this case. It looks like it's a misleading errno and is probably meant to be something else. If it's correct, I would absolutely love for someone to show me how/where. The code is here: http://svnweb.freebsd.org/base/stable/9/sbin/hastd/ -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 10:03:44 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 9D1311F7; Wed, 12 Jun 2013 10:03:44 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-lb0-f170.google.com (mail-lb0-f170.google.com [209.85.217.170]) by mx1.freebsd.org (Postfix) with ESMTP id E8EFA1E76; Wed, 12 Jun 2013 10:03:43 +0000 (UTC) Received: by mail-lb0-f170.google.com with SMTP id t13so3407885lbd.29 for ; Wed, 12 Jun 2013 03:03:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=PhSENIOZW89kxYWmWCiHOoqiJQlNeu7Eb253VXpuTYM=; b=NsdZGVfYlJ8EKsvNM6ThSOsVNXcfkVt1pyk8NdHNPwKws9GS/K1aoVdYh41OsOWtpn 00Kss3TaKG9QH7mHTIw6WAYJeVYGf9TqoU5J+gjBPonkt0Wb+Mhx3SsDCE8F6QN5h0FH 2hp+ezWq7dY3ze4C60/vyEdJTLMJPC/i4GuZMTAyoRtByPOxstYn+lNSIyp5B6redOqT a/9sLUE4zJMwLRI9DYYWzUKVp0yYz5nQED3MzalSlaZ1bgtGG10vUrnfO31/s0P02z8v BiRb/ZxvaPe7CGJDtldtbiVTAXegDAhMRAmoaiHdmXlDUNhpwdH2yvORvoG2W4Ei7A9U +7DQ== X-Received: by 10.112.181.71 with SMTP id du7mr10509975lbc.24.1371031416709; Wed, 12 Jun 2013 03:03:36 -0700 (PDT) Received: from localhost ([188.230.122.226]) by mx.google.com with ESMTPSA id n3sm2301111lag.9.2013.06.12.03.03.34 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Wed, 12 Jun 2013 03:03:35 -0700 (PDT) Date: Wed, 12 Jun 2013 13:03:33 +0300 From: Mikolaj Golub To: Jeremy Chadwick Subject: Re: hast: can't restore after disk failure Message-ID: <20130612100332.GB55502@gmail.com> References: <20130610201650.GA2823@gmail.com> <20130611060741.GA42231@gmail.com> <20130612084453.GA55502@gmail.com> <20130612093639.GA9219@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130612093639.GA9219@icarus.home.lan> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, Dmitry Morozovsky X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 10:03:44 -0000 On Wed, Jun 12, 2013 at 02:36:39AM -0700, Jeremy Chadwick wrote: > I honestly cannot see how nv->nv_error (which is what nv_error() > returns) gets set to ENOENT within the function call stack: > > - metadata_read() is what prints the error (line 152 in nv.c) > - Error printing done by pjdlog_errno(), which uses the global errno > to print its errors > - nv = nv_ntoh(eb) > - nv_ntoh() sets nv->nv_error to 0 initially, but then calls > nv_validate() later on which can modify nv->error > - nv_validate() explicitly sets error (which later can get assigned > to nv->nv_error) to EINVAL in many cases, but not ENOENT. > > Therefore, I am honestly not sure how ENOENT gets returned to the user > in this case. It looks like it's a misleading errno and is probably > meant to be something else. If it's correct, I would absolutely love > for someone to show me how/where. nv_find() (which is used by nv_get_* functions) sets ENOENT when it fails. "No such file or directory" really looks confusing in this case. I am not sure what a code from errno.h would be better here though. ENOATTR? 
-- Mikolaj Golub From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 10:41:51 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 55D64A50; Wed, 12 Jun 2013 10:41:51 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) by mx1.freebsd.org (Postfix) with ESMTP id D380010D6; Wed, 12 Jun 2013 10:41:50 +0000 (UTC) Received: from mfilter1-d.gandi.net (mfilter1-d.gandi.net [217.70.178.130]) by relay5-d.mail.gandi.net (Postfix) with ESMTP id CE33341C07E; Wed, 12 Jun 2013 12:41:39 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter1-d.gandi.net Received: from relay5-d.mail.gandi.net ([217.70.183.197]) by mfilter1-d.gandi.net (mfilter1-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id IQBXtwfEe7Xz; Wed, 12 Jun 2013 12:41:38 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay5-d.mail.gandi.net (Postfix) with ESMTPSA id BA6AD41C090; Wed, 12 Jun 2013 12:41:37 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id EA25573A1C; Wed, 12 Jun 2013 03:41:35 -0700 (PDT) Date: Wed, 12 Jun 2013 03:41:35 -0700 From: Jeremy Chadwick To: Mikolaj Golub Subject: Re: hast: can't restore after disk failure Message-ID: <20130612104135.GA11495@icarus.home.lan> References: <20130610201650.GA2823@gmail.com> <20130611060741.GA42231@gmail.com> <20130612084453.GA55502@gmail.com> <20130612093639.GA9219@icarus.home.lan> <20130612100332.GB55502@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130612100332.GB55502@gmail.com> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, Dmitry Morozovsky X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 10:41:51 -0000 On Wed, Jun 12, 2013 at 01:03:33PM +0300, Mikolaj Golub wrote: > On Wed, Jun 12, 2013 at 02:36:39AM -0700, Jeremy Chadwick wrote: > > > I honestly cannot see how nv->nv_error (which is what nv_error() > > returns) gets set to ENOENT within the function call stack: > > > > - metadata_read() is what prints the error (line 152 in nv.c) > > - Error printing done by pjdlog_errno(), which uses the global errno > > to print its errors > > - nv = nv_ntoh(eb) > > - nv_ntoh() sets nv->nv_error to 0 initially, but then calls > > nv_validate() later on which can modify nv->error > > - nv_validate() explicitly sets error (which later can get assigned > > to nv->nv_error) to EINVAL in many cases, but not ENOENT. > > > > Therefore, I am honestly not sure how ENOENT gets returned to the user > > in this case. It looks like it's a misleading errno and is probably > > meant to be something else. If it's correct, I would absolutely love > > for someone to show me how/where. > > nv_find() (which is used by nv_get_* functions) sets ENOENT when it > fails. How wonderful -- when I reviewed the code, I thought "Oh surely those can't be responsible...". I did see nv_find(), but I did not think nv_get_*() would call that. My fault/failure. > "No such file or directory" really looks confusing in this case. I am > not sure what a code from errno.h would be better here though. ENOATTR? 
Sorry to make this longer than it needs to be, but I'm brain dumping: What exactly is the error condition that is happening in the above case? All I read was that the partition size differed between nodes and that this caused the issue? IMO, that condition should be checked and handled elegantly, and that the error message should not use an errno at all but instead just tell the user about the device size mismatch between nodes (for that specific device) -- the device sizes must match between both nodes, correct? There must be some kind of communication protocol between the nodes that can indicate something along those lines. If an errno is really needed, ENOATTR isn't relevant (that's referring to extended filesystem attributes). See intro(2) for the official explanation of all of them. I would choose EIO, ENXIO, ENOSPC, EOPNOTSUPP, or EPROTO. I have not looked at what OpenBSD and NetBSD have for errno.h. That might be good to do first. Else, Linux has some errno.h entries in it which look like they might be more relevant, such as EBADFD, EREMOTEIO, or EMEDIUMTYPE (this one might be a bit misleading). http://www.virtsync.com/c-error-codes-include-errno Some of these are even part of our recent BSM audit(2) stuff; check out include/bsm/audit_errno.h (some are Solaris specific but look like they might help, and I see some duplicates between those and what Linux has too). Important: I do not know the implications of adding/enhancing errno. POSIX is involved, thus it would be wise to ask Bruce Evans. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 11:40:34 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 39B6C4AC for ; Wed, 12 Jun 2013 11:40:34 +0000 (UTC) (envelope-from feld@feld.me) Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by mx1.freebsd.org (Postfix) with ESMTP id 0FD5A1383 for ; Wed, 12 Jun 2013 11:40:33 +0000 (UTC) Received: from compute2.internal (compute2.nyi.mail.srv.osa [10.202.2.42]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id 1D7D6202B4 for ; Wed, 12 Jun 2013 07:40:33 -0400 (EDT) Received: from frontend1.nyi.mail.srv.osa ([10.202.2.160]) by compute2.internal (MEProxy); Wed, 12 Jun 2013 07:40:33 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=feld.me; h= content-type:to:subject:references:date:mime-version :content-transfer-encoding:from:message-id:in-reply-to; s= mesmtp; bh=MqodeHbV4GO4rRE2iQzLLhCKKRo=; b=fJGym8QTZVVQzeTF88IeB efygSQzPvPGWgxKUyirhfujGYSTxgDXK3MKGJT60XAgbSIaqFCT5HpsURPuWo219 WW05ch1viVxTuTSHA6/3B3qMXazCiRlA72rdxrkUmahT3s53HaFVlrw2B3nKRVEe M+/MVn+wlol/3/xQzgiTtA= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=content-type:to:subject:references:date :mime-version:content-transfer-encoding:from:message-id :in-reply-to; s=smtpout; bh=MqodeHbV4GO4rRE2iQzLLhCKKRo=; b=LSz2 iGPrzqJRGBBN/hPjlirxb9zTAkpZ/C66BY0a0+ICxBhnuwb3mEZKKIoELRC0cMtR rrtXIqCfN0MDUYTR1qdk/YkhLSxTlxyYnG6OvxBffEmYRkFCQtV4UY839buicS1q 71G3YVcyKDOx+3tKJUBbfCZRSUgPfphNUmehiCI= X-Sasl-enc: LAyeIfZX08QsWlq/ru4x6uK3D9LLy5QwEfPn9Ufsd6Bb 1371037232 Received: from markf.office.supranet.net (unknown [66.170.8.18]) by mail.messagingengine.com (Postfix) with ESMTPA id DB7B4C00E81 for ; Wed, 12 Jun 2013 07:40:32 -0400 (EDT) Content-Type: 
text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs@freebsd.org Subject: Re: An order of magnitude higher IOPS needed with ZFS than UFS References: <51B79023.5020109@fsn.hu> Date: Wed, 12 Jun 2013 06:40:32 -0500 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Mark Felder" Message-ID: In-Reply-To: <51B79023.5020109@fsn.hu> User-Agent: Opera Mail/12.15 (FreeBSD) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 11:40:34 -0000 On Tue, 11 Jun 2013 16:01:23 -0500, Attila Nagy wrote: > BTW, the file systems are 77-78% full according to df (so ZFS holds > more, because UFS is -m 8). ZFS write performance can begin to drop pretty badly when you get around 80% full. I've not seen any benchmarks showing an improvement with a very fast and large ZIL or tons of memory, but I'd expect that would help significantly. Just note that you're right at the edge where performance gets impacted. From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 11:47:34 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C80D96AD for ; Wed, 12 Jun 2013 11:47:34 +0000 (UTC) (envelope-from ira@wakeful.net) Received: from mail-ob0-x234.google.com (mail-ob0-x234.google.com [IPv6:2607:f8b0:4003:c01::234]) by mx1.freebsd.org (Postfix) with ESMTP id 960FA145E for ; Wed, 12 Jun 2013 11:47:34 +0000 (UTC) Received: by mail-ob0-f180.google.com with SMTP id eh20so13318750obb.11 for ; Wed, 12 Jun 2013 04:47:34 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:x-gm-message-state; bh=NEvlZ61Rl9T6Zvl9YEVZ72bKeAoSOc4bQ8uf+x6JMbE=; b=EvCHKBMmO5+b5fyL/xLn9mpYOLKn/liniIFHKWFUlWs8M+0saDKDUe5e8FJEZYRrpE EeiWqLgwcxnOkuF6Y9gjTpXSxxk85+imGDRfkyXIw+Dxk/jWqNnpK1CJ7GIlICokWq77 v1rxBpvzAvsCE30ZYq8V+agjWvLrhyewEIlswfd+vZSxg9NxsPV1W/aCF5/UEYTwCMBo N5rzEnG2U2Dd4oVr8gMdLSsEzkZ+FbaY3Bc6cPXedFRuM5slTAVMnNF2+48LmPbMpCG+ QHIqs3PL3BIWmouWm0dL4ZGiuQIq3JAeoczxrCa4+c117MZGXCzIS8HpW/jAyqegmW3e KrEw== X-Received: by 10.182.237.50 with SMTP id uz18mr15126535obc.51.1371037653970; Wed, 12 Jun 2013 04:47:33 -0700 (PDT) MIME-Version: 1.0 Received: by 10.76.154.202 with HTTP; Wed, 12 Jun 2013 04:47:13 -0700 (PDT) In-Reply-To: References: <51B79023.5020109@fsn.hu> From: Ira Cooper Date: Wed, 12 Jun 2013 07:47:13 -0400 Message-ID: Subject: Re: An order of magnitude higher IOPS needed with ZFS than UFS To: Mark Felder X-Gm-Message-State: ALoCoQkHJ6W87MNF1GWkWnEqkwDkartlbG3olS7l3bPYQvhpU1BzmDG1Q1K5+4e1PZUo8URAg/T0 Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 11:47:34 -0000 On Wed, Jun 12, 2013 at 7:40 AM, Mark Felder wrote: > On Tue, 11 Jun 2013 16:01:23 -0500, Attila Nagy wrote: > > BTW, the file systems are 77-78% full according to df (so ZFS holds more, >> because UFS is -m 8). >> > > ZFS write performance can begin to drop pretty badly when you get around > 80% full. 
I've not seen any benchmarks showing an improvement with a very > fast and large ZIL or tons of memory, but I'd expect that would help > significantly. Just note that you're right at the edge where performance > gets impacted. > > If it matches what illumos does. You jump off the same cliff. -Ira From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 11:49:59 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 7759D7E2 for ; Wed, 12 Jun 2013 11:49:59 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net [217.70.183.195]) by mx1.freebsd.org (Postfix) with ESMTP id 36A69148F for ; Wed, 12 Jun 2013 11:49:59 +0000 (UTC) Received: from mfilter21-d.gandi.net (mfilter21-d.gandi.net [217.70.178.149]) by relay3-d.mail.gandi.net (Postfix) with ESMTP id 7AC14A80FB; Wed, 12 Jun 2013 13:49:41 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mfilter21-d.gandi.net Received: from relay3-d.mail.gandi.net ([217.70.183.195]) by mfilter21-d.gandi.net (mfilter21-d.gandi.net [10.0.15.180]) (amavisd-new, port 10024) with ESMTP id TugAw4oebJsM; Wed, 12 Jun 2013 13:49:39 +0200 (CEST) X-Originating-IP: 76.102.14.35 Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net [76.102.14.35]) (Authenticated sender: jdc@koitsu.org) by relay3-d.mail.gandi.net (Postfix) with ESMTPSA id A8520A80C4; Wed, 12 Jun 2013 13:49:39 +0200 (CEST) Received: by icarus.home.lan (Postfix, from userid 1000) id D0B3C73A1C; Wed, 12 Jun 2013 04:49:37 -0700 (PDT) Date: Wed, 12 Jun 2013 04:49:37 -0700 From: Jeremy Chadwick To: Mark Felder Subject: Re: An order of magnitude higher IOPS needed with ZFS than UFS Message-ID: <20130612114937.GA13688@icarus.home.lan> References: <51B79023.5020109@fsn.hu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 11:49:59 -0000 On Wed, Jun 12, 2013 at 06:40:32AM -0500, Mark Felder wrote: > On Tue, 11 Jun 2013 16:01:23 -0500, Attila Nagy wrote: > > >BTW, the file systems are 77-78% full according to df (so ZFS > >holds more, because UFS is -m 8). > > ZFS write performance can begin to drop pretty badly when you get > around 80% full. I've not seen any benchmarks showing an improvement > with a very fast and large ZIL or tons of memory, but I'd expect > that would help significantly. Just note that you're right at the > edge where performance gets impacted. Mark, do you have any references for this? I'd love to learn/read more about this engineering/design aspect (I won't say flaw, I'll just say aspect) to ZFS, as it's the first I've heard of it. The reason I ask: (respectfully, not judgementally) I'm worried you might be referring to something that has to do with SSDs and not ZFS, specifically SSD wear-levelling performing better with lots of free space (i.e. a small FTL map; TRIM helps with this immensely) -- where the performance hit tends to begin around the 70-80% mark. 
(I can talk more about that if asked, but want to make sure the two things aren't being mistaken for one another) -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 11:55:47 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DA9B0A99; Wed, 12 Jun 2013 11:55:46 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-la0-x232.google.com (mail-la0-x232.google.com [IPv6:2a00:1450:4010:c03::232]) by mx1.freebsd.org (Postfix) with ESMTP id 33FB71612; Wed, 12 Jun 2013 11:55:46 +0000 (UTC) Received: by mail-la0-f50.google.com with SMTP id dy20so5576074lab.37 for ; Wed, 12 Jun 2013 04:55:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=RfCbMrcVnFlgHYM+ob+fOEVu5PC47ppzLGMo04pTGjI=; b=hizyH0WYpQmELz40hHpJoGgEjGDONOc71aspR6La4jHyBLqWM2V52ybyw7kBpkZcDN ixaBEUhgFqlwzK83WPntKITIADXgDTKY5NnV8qJ+TPdlx/b1O3+Ux5CEXVCPMwkH3N6Q 27BlVffSdl+z+hmD3bLYZbi+W51PDZBAO75UfolW7pP4rhlRN/cCwo+CQ9h+tF+kD7Sa G/rEoCh8TvEmWIkOwWLEuufJP0ZD5f4Imts7Fd02L1V2hRGFlqdnnvb2EJqdzy3DocH4 a/OGggVCodkNmS/z6XQgpxSvDYCSf8KEIk6YAS2Nu0j94uX6Cl3pvV3hsFD282Q67lBw qkjA== X-Received: by 10.152.28.66 with SMTP id z2mr3528753lag.5.1371038145172; Wed, 12 Jun 2013 04:55:45 -0700 (PDT) Received: from localhost ([188.230.122.226]) by mx.google.com with ESMTPSA id n1sm7913405lae.0.2013.06.12.04.55.43 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Wed, 12 Jun 2013 04:55:44 -0700 (PDT) Date: Wed, 12 Jun 2013 14:55:41 +0300 From: Mikolaj Golub To: Jeremy Chadwick Subject: Re: hast: can't restore after disk failure Message-ID: <20130612115540.GC55502@gmail.com> References: <20130610201650.GA2823@gmail.com> <20130611060741.GA42231@gmail.com> <20130612084453.GA55502@gmail.com> <20130612093639.GA9219@icarus.home.lan> <20130612100332.GB55502@gmail.com> <20130612104135.GA11495@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130612104135.GA11495@icarus.home.lan> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, Dmitry Morozovsky X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 11:55:47 -0000 On Wed, Jun 12, 2013 at 03:41:35AM -0700, Jeremy Chadwick wrote: > On Wed, Jun 12, 2013 at 01:03:33PM +0300, Mikolaj Golub wrote: > > On Wed, Jun 12, 2013 at 02:36:39AM -0700, Jeremy Chadwick wrote: > > > > > I honestly cannot see how nv->nv_error (which is what nv_error() > > > returns) gets set to ENOENT within the function call stack: > > > > > > - metadata_read() is what prints the error (line 152 in nv.c) > > > - Error printing done by pjdlog_errno(), which uses the global errno > > > to print its errors > > > - nv = nv_ntoh(eb) > > > - nv_ntoh() sets nv->nv_error to 0 initially, but then calls > > > nv_validate() later on which can modify nv->error > > > - nv_validate() explicitly sets error (which later can get assigned > > > to nv->nv_error) to EINVAL in many cases, but not ENOENT. > > > > > > Therefore, I am honestly not sure how ENOENT gets returned to the user > > > in this case. 
It looks like it's a misleading errno and is probably > > > meant to be something else. If it's correct, I would absolutely love > > > for someone to show me how/where. > > > > nv_find() (which is used by nv_get_* functions) sets ENOENT when it > > fails. > > How wonderful -- when I reviewed the code, I thought "Oh surely those > can't be responsible...". I did see nv_find(), but I did not think > nv_get_*() would call that. My fault/failure. > > > "No such file or directory" really looks confusing in this case. I am > > not sure what a code from errno.h would be better here though. ENOATTR? > > Sorry to make this longer than it needs to be, but I'm brain dumping: > > What exactly is the error condition that is happening in the above case? > All I read was that the partition size differed between nodes and that > this caused the issue? As I wrote it before the error was that hastd failed to parse metadata it had read from the local disk (failed to find some entry in metadata structure). Usually this happens when metadata is not properly initialized for a new disk or corrupted. Different data sizes should trigger the error "Data size differs between nodes ..." on primary. Unfortunately I have not seen full logs from primary and secondary, so it is difficult to me to guess what was going on there. -- Mikolaj Golub From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 12:04:01 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DAD43E71 for ; Wed, 12 Jun 2013 12:04:01 +0000 (UTC) (envelope-from feld@feld.me) Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by mx1.freebsd.org (Postfix) with ESMTP id AFDB4169D for ; Wed, 12 Jun 2013 12:04:01 +0000 (UTC) Received: from compute4.internal (compute4.nyi.mail.srv.osa [10.202.2.44]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id 7061C20E1D; Wed, 12 Jun 2013 08:03:59 -0400 (EDT) Received: from frontend2.nyi.mail.srv.osa ([10.202.2.161]) by compute4.internal (MEProxy); Wed, 12 Jun 2013 08:04:00 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=feld.me; h= content-type:to:cc:subject:references:date:mime-version :content-transfer-encoding:from:message-id:in-reply-to; s= mesmtp; bh=H/qxRRnrMV/D1SIGS6r2tLHnGlw=; b=Cirh4FoyepQSMn+zAtWSs c0SvqAaMh4QVL07s3XnKEmKbzpreUOe3UFPnVgfwXBOlufqfyKSgIkybjDhfjZlt Gdhnoj6hvzajR33Hvo+/bbJ+bseUPuMRC++6Q8xcsgSahe6XN0JAZZPfE+oZr/Nk YFzzt9lqqEQAHDVo91y/Js= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=content-type:to:cc:subject:references :date:mime-version:content-transfer-encoding:from:message-id :in-reply-to; s=smtpout; bh=H/qxRRnrMV/D1SIGS6r2tLHnGlw=; b=JyaT jcphGy4OyitIwiq6ndaGeRrgwY+uCePpxjKtR3KqfRZRTu5Xqvn37rBpC/K+oMOV nbKpAddQ8zuh3uqM+78QOp700UgBpunySYbCBH5j9YbN/39SJiGtqDo29EOnXB1x 0PIsHxjrUsnE5St7AilZ0KFmRZwnzx21BEVBmik= X-Sasl-enc: 2QnJhI0Rp23FCEfkf4xV6ybxSRXJBZxjnOgnLX6JljYw 1371038639 Received: from markf.office.supranet.net (unknown [66.170.8.18]) by mail.messagingengine.com (Postfix) with ESMTPA id 5A6236801F3; Wed, 12 Jun 2013 08:03:59 -0400 (EDT) Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: "Jeremy Chadwick" Subject: Re: An order of magnitude higher IOPS needed with ZFS than UFS References: <51B79023.5020109@fsn.hu> <20130612114937.GA13688@icarus.home.lan> Date: Wed, 12 Jun 2013 07:03:58 -0500 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Mark Felder" Message-ID: In-Reply-To: 
<20130612114937.GA13688@icarus.home.lan> User-Agent: Opera Mail/12.15 (FreeBSD) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 12:04:01 -0000 On Wed, 12 Jun 2013 06:49:37 -0500, Jeremy Chadwick wrote: > > Mark, do you have any references for this? I'd love to learn/read more > about this engineering/design aspect (I won't say flaw, I'll just say > aspect) to ZFS, as it's the first I've heard of it. Firsthand experience on a couple servers, and some old Sun docs that I can't find anymore since Oracle broke the links. If you start googling for "ZFS performance 80%" you should come across similar reports. The recommendation was always that when you hit about 80% you need to add a new vdev or you'll be in serious trouble. I'd always believed that it has to do with the way the ZFS COW algorithm works. If my suspicion is correct I'd guess it probably stalls trying to find an ideal place to write -- maybe some cost calculation? I'm reaching for straws now because I don't know anything about the code itself. I'd love to hear from people who have actually touched the code and can give a more definitive answer because this does border on "urban legend" territory, but I've read it and experienced it a few times so I'm just passing it on. From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 14:52:44 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D4E0E8DE for ; Wed, 12 Jun 2013 14:52:44 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: from mail-pa0-x235.google.com (mail-pa0-x235.google.com [IPv6:2607:f8b0:400e:c03::235]) by mx1.freebsd.org (Postfix) with ESMTP id B397F117F for ; Wed, 12 Jun 2013 14:52:44 +0000 (UTC) Received: by mail-pa0-f53.google.com with SMTP id tj12so4953867pac.40 for ; Wed, 12 Jun 2013 07:52:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=M2sQqeX5UwmfBtYtkhEwPwU8b2dVxUJm0BCuAvYFtz0=; b=OaswgK0E744eT/uWgAQIGtBy9EzaHoz8LBS0j067xfOAviYZy9JKc9IINhyhrBNK1v Lh3Ggl+r4tk7/t9ohtAAzKfkxfVFLy2ma5Onj+VBsBTmWwCkUCirx4WmjYbdLJ59VOvJ mEtgKGep89z2Sgioc1zokwHGLk+rXxDUGtttLgOlYsC/yxSW1/ZpmBZBMZumV8PriS4/ zZ5fhWbz9PNJ9n6zj9sQWCdEv5WvBB624MRAR4e5tk/12IJxLnNjxEzzX5XvkeNFgnXb FdkMWN3+Yf+bSYdCRgoZmUBehUtW9hkl49iS7T8jQEMIKG2BznIiO3W03nF6/vJyiE6j bRaw== MIME-Version: 1.0 X-Received: by 10.68.203.161 with SMTP id kr1mr19688590pbc.192.1371048763895; Wed, 12 Jun 2013 07:52:43 -0700 (PDT) Received: by 10.70.31.195 with HTTP; Wed, 12 Jun 2013 07:52:43 -0700 (PDT) In-Reply-To: <20130612114937.GA13688@icarus.home.lan> References: <51B79023.5020109@fsn.hu> <20130612114937.GA13688@icarus.home.lan> Date: Wed, 12 Jun 2013 09:52:43 -0500 Message-ID: Subject: Re: An order of magnitude higher IOPS needed with ZFS than UFS From: Adam Vande More To: Jeremy Chadwick Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 14:52:44 -0000 On Wed, Jun 12, 2013 at 6:49 AM, Jeremy Chadwick wrote: > Mark, do you 
have any references for this? I'd love to learn/read more > about this engineering/design aspect (I won't say flaw, I'll just say > aspect) to ZFS, as it's the first I've heard of it. Recently, I dd'ed out the free space on a ZFS volume. The last few MB's took like an hour and io seemed to drop exponentially once past 80% or so. Nothing to do with SSD's -- Adam Vande More From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 16:03:40 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 628DC28D for ; Wed, 12 Jun 2013 16:03:40 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from proxypop03b.sare.net (proxypop03b.sare.net [194.30.0.251]) by mx1.freebsd.org (Postfix) with ESMTP id 285761855 for ; Wed, 12 Jun 2013 16:03:39 +0000 (UTC) Received: from [172.16.2.2] (izaro.sarenet.es [192.148.167.11]) by proxypop03.sare.net (Postfix) with ESMTPSA id 8B3959DD057; Wed, 12 Jun 2013 17:57:13 +0200 (CEST) Subject: Re: An order of magnitude higher IOPS needed with ZFS than UFS Mime-Version: 1.0 (Apple Message framework v1085) Content-Type: text/plain; charset=us-ascii From: Borja Marcos In-Reply-To: <20130612114937.GA13688@icarus.home.lan> Date: Wed, 12 Jun 2013 17:57:13 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <8FF3DAC5-ED3D-4678-B040-74829A208A86@sarenet.es> References: <51B79023.5020109@fsn.hu> <20130612114937.GA13688@icarus.home.lan> To: Jeremy Chadwick X-Mailer: Apple Mail (2.1085) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 16:03:40 -0000 On Jun 12, 2013, at 1:49 PM, Jeremy Chadwick wrote: > Mark, do you have any references for this? I'd love to learn/read = more > about this engineering/design aspect (I won't say flaw, I'll just say > aspect) to ZFS, as it's the first I've heard of it. I have seen that behavior with standard hard disks. Once the busy space = reached 80 % performance dropped significantly. Just deleting some old data (it is a log storage = system) performance went back to normal. Sorry I don't have graphs or anything like that. What I noticed is that = the disks were "busier" per the %busy column in gstat(8). Borja. 
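A minimal way to reproduce the observation above on a live system -- the pool name "tank" and the ada0/ada1 device names are placeholders, not taken from any of the reports in this thread -- is to log pool capacity next to a periodic gstat sample and watch %busy climb as the pool crosses roughly 80% full:

#!/bin/sh
# Sketch: correlate pool fill level with per-disk busy%.
# "tank", ada0 and ada1 are assumed names; adjust for your system.
while :; do
    echo "=== $(date '+%F %T')  capacity=$(zpool list -H -o capacity tank)"
    gstat -b -I 5s | egrep 'ada[01]'    # one 5-second sample per loop pass
done

Run alongside the normal workload, this gives a timestamped record of when the disks start spending most of their time busy relative to how full the pool is.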
From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 17:59:42 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D10D55FC for ; Wed, 12 Jun 2013 17:59:42 +0000 (UTC) (envelope-from dieterbsd@gmail.com) Received: from mail-ie0-x242.google.com (mail-ie0-x242.google.com [IPv6:2607:f8b0:4001:c03::242]) by mx1.freebsd.org (Postfix) with ESMTP id AC70216FC for ; Wed, 12 Jun 2013 17:59:42 +0000 (UTC) Received: by mail-ie0-f194.google.com with SMTP id 9so1834772iec.5 for ; Wed, 12 Jun 2013 10:59:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=5m7cEcPBUjlr+wRt2UuWBIGRBr3N16Jvwkxs5XAVqdI=; b=d7Ce8tGAZejQZ0XNCNXEAs6lLQxCAbDTmgkrM/HYZyLdposbmeWlVeWIQCtW9lyuD9 YZrIslEk0TT8ObGqKGNjhiF0fiwZCBVMX6ueo9ASxCxr8uR52q0Ut8+1EEIO/ccRat13 t/ZSMFmL0b5+ot/xTpl4+bUbkS5DOFPFYnoFpsp83C9cGTU/SpUQ+BoOKo7pQcx/mJs8 pbexftTIwj/lx9hPM93bT/HdRACZajRFc+jHxqpuHGfD+fNVsDEj7rItIfiFWqHYuUse AjpbSzbf+6OX11PL6x5SbnSGfoVXc0zeTn3DpZFZt8Dz6RY+TS3uHarN+8VxRl4fgz8w 7s9g== MIME-Version: 1.0 X-Received: by 10.50.83.37 with SMTP id n5mr4022003igy.44.1371059982336; Wed, 12 Jun 2013 10:59:42 -0700 (PDT) Received: by 10.64.139.34 with HTTP; Wed, 12 Jun 2013 10:59:42 -0700 (PDT) Date: Wed, 12 Jun 2013 10:59:42 -0700 Message-ID: Subject: FFS: fsck doesn't match doc From: Dieter BSD To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 17:59:42 -0000 Anyone have thoughts on bin/166499: fsck(8) behaviour does not match doc (PARTIALLY TRUNCATED INODE)? This PR has been sitting around for over a year with no comments or action. Seems to me it should appear on the list of open fs bugs, but it doesn't? A process is running, appending data to a file (*NOT* truncating the file as the doc claims!). Machine panics or otherwise goes down badly. Fsck whines about PARTIALLY TRUNCATED INODE and that fs doesn't get mounted until I run fsck manually. This happens nearly every time the machine goes down. More details are in the PR. Would it be safe to have fsck automagically fix this problem, as the doc (incorrectly) says it does? 
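What that looks like in practice, with a hypothetical device name (the PR does not name one): the preen pass refuses to touch the inode, so the filesystem stays unmounted until someone cleans it by hand.

# /dev/ada0s1d is a placeholder for the affected filesystem.
fsck -p /dev/ada0s1d    # preen reports the partially truncated inode and
                        # asks for a manual run instead of repairing it
fsck /dev/ada0s1d       # an interactive pass actually clears the inode
mount /dev/ada0s1d /data

The question in the PR is whether the first step could safely perform the fix itself, as the documentation implies it already does.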
From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 18:01:27 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 324A57C7 for ; Wed, 12 Jun 2013 18:01:27 +0000 (UTC) (envelope-from nowakpl@platinum.linux.pl) Received: from platinum.linux.pl (platinum.edu.pl [81.161.192.4]) by mx1.freebsd.org (Postfix) with ESMTP id E86481728 for ; Wed, 12 Jun 2013 18:01:26 +0000 (UTC) Received: by platinum.linux.pl (Postfix, from userid 87) id 696175FD06; Wed, 12 Jun 2013 19:55:13 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on platinum.linux.pl X-Spam-Level: X-Spam-Status: No, score=-1.3 required=3.0 tests=ALL_TRUSTED,AWL autolearn=disabled version=3.3.2 Received: from [10.255.0.2] (c38-073.client.duna.pl [83.151.38.73]) by platinum.linux.pl (Postfix) with ESMTPA id DD1995FD05 for ; Wed, 12 Jun 2013 19:55:12 +0200 (CEST) Message-ID: <51B8B5DC.2010703@platinum.linux.pl> Date: Wed, 12 Jun 2013 19:54:36 +0200 From: Adam Nowacki User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130509 Thunderbird/17.0.6 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: An order of magnitude higher IOPS needed with ZFS than UFS References: <51B79023.5020109@fsn.hu> <20130612114937.GA13688@icarus.home.lan> In-Reply-To: <20130612114937.GA13688@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 18:01:27 -0000 On 2013-06-12 13:49, Jeremy Chadwick wrote: > On Wed, Jun 12, 2013 at 06:40:32AM -0500, Mark Felder wrote: >> On Tue, 11 Jun 2013 16:01:23 -0500, Attila Nagy wrote: >> >>> BTW, the file systems are 77-78% full according to df (so ZFS >>> holds more, because UFS is -m 8). >> >> ZFS write performance can begin to drop pretty badly when you get >> around 80% full. I've not seen any benchmarks showing an improvement >> with a very fast and large ZIL or tons of memory, but I'd expect >> that would help significantly. Just note that you're right at the >> edge where performance gets impacted. > > Mark, do you have any references for this? I'd love to learn/read more > about this engineering/design aspect (I won't say flaw, I'll just say > aspect) to ZFS, as it's the first I've heard of it. > > The reason I ask: (respectfully, not judgementally) I'm worried you > might be referring to something that has to do with SSDs and not ZFS, > specifically SSD wear-levelling performing better with lots of free > space (i.e. a small FTL map; TRIM helps with this immensely) -- where > the performance hit tends to begin around the 70-80% mark. (I can talk > more about that if asked, but want to make sure the two things aren't > being mistaken for one another) > So I went hunting for some evidence and created this: http://tepeserwery.pl/nowak/fillingzfs.png Columns are groups of sectors, new row is created every time a FLUSH command is sent to a disk. Percentage is the amount of filled space in the pool. Red means a write happened there, Pool is 1GB with writes of 50MB between black lines. It looks like past 80% there simply isn't enough continuous disk space and writes are becoming more and more random. For some unknown to me reason there is also a lot more flushing which certainly doesn't help for performance. 
There is also this odd hole left untouched by any write, reserved space of some sort? From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 18:26:02 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2E371AE3 for ; Wed, 12 Jun 2013 18:26:02 +0000 (UTC) (envelope-from feld@feld.me) Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by mx1.freebsd.org (Postfix) with ESMTP id 03C9E1935 for ; Wed, 12 Jun 2013 18:26:01 +0000 (UTC) Received: from compute3.internal (compute3.nyi.mail.srv.osa [10.202.2.43]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id 05FC120D9D; Wed, 12 Jun 2013 14:26:01 -0400 (EDT) Received: from frontend1.nyi.mail.srv.osa ([10.202.2.160]) by compute3.internal (MEProxy); Wed, 12 Jun 2013 14:26:01 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=feld.me; h= content-type:to:subject:references:date:mime-version :content-transfer-encoding:from:message-id:in-reply-to; s= mesmtp; bh=u/0kpl7t3ihQj1wittijxKv4Gbw=; b=oJjaWMOIxAJ1MbBj+aWpg lfNl3PmiM1sJSOEq7YLrDZWmrPvftNLdVXcUC7sOBgHfs6kp2NfMqsB3g9220Hq7 RnUddj7Y9PFhAs2e9iBI9DTzbldZ0PLBxSiWT3dl0Tv7BjDtbCZangLMrktalZ0A cWk8M92TGAbz51k1lenhjg= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=content-type:to:subject:references:date :mime-version:content-transfer-encoding:from:message-id :in-reply-to; s=smtpout; bh=u/0kpl7t3ihQj1wittijxKv4Gbw=; b=f7Vi HCTCj/+5+zuQDirlpHrvVWD8UchLYUR3UzcPoH4r5fNPSXGMb0Nr9A7cfcZi/fyI f4GtgGN7wzROiChqvtP+8OdnlxY0eDK94q08XhpEw/cry3YOMn5p9JTXDmQnaXE9 xrQUeahIfZyBjJraFuXHTae4Gpq4s66ASfDSFvA= X-Sasl-enc: yDd1pRBUgMhlL5fFnSaS1iMdVyzjpNH4XWMWYKt9Uzwx 1371061560 Received: from markf.office.supranet.net (unknown [66.170.8.18]) by mail.messagingengine.com (Postfix) with ESMTPA id BDE01C00E89; Wed, 12 Jun 2013 14:26:00 -0400 (EDT) Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs@freebsd.org, "Dieter BSD" Subject: Re: FFS: fsck doesn't match doc References: Date: Wed, 12 Jun 2013 13:26:00 -0500 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Mark Felder" Message-ID: In-Reply-To: User-Agent: Opera Mail/12.15 (FreeBSD) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 18:26:02 -0000 On Wed, 12 Jun 2013 12:59:42 -0500, Dieter BSD wrote: > > Would it be safe to have fsck automagically fix this problem, as the > doc (incorrectly) says it does? 
What happens if you add to /etc/rc.conf: fsck_y_enable="YES" background_fsck="NO" From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 19:33:05 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B91FD5E7 for ; Wed, 12 Jun 2013 19:33:05 +0000 (UTC) (envelope-from dieterbsd@gmail.com) Received: from mail-ie0-x241.google.com (mail-ie0-x241.google.com [IPv6:2607:f8b0:4001:c03::241]) by mx1.freebsd.org (Postfix) with ESMTP id 948451D06 for ; Wed, 12 Jun 2013 19:33:05 +0000 (UTC) Received: by mail-ie0-f193.google.com with SMTP id s9so2965825iec.0 for ; Wed, 12 Jun 2013 12:33:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=UpoVG02InOy9kZbG+fYfCilx6D5auDSVDyaKh2AUPkI=; b=kPuycVQX4CwURidkKjEmiHq3FRZb3kAlY03QXbbG898Uto4MBzPaT0EMPsNDlf9R1U AK2aT+mZtjxHp/Di6jUtski2OjAF/BvFNAgmSsDdNQrDU8CV+lxdNdYztRK8bVT/snT1 IPtq5612UP5+oZzEsSn4glkD2VSElh5/8bXmWjm6CAYeWJnoi19quV73yZlBkFX9lRTP 13hDSmOESPUc2wXnSrP7Hx+/NLcS0MwscvXINHpHLeoIcrXZhOg1d/1TQMrdSKyAR+lv rwWKWXVEgDI2/RaY/m2QgV6Tn46xqGnxjqls4Kr+OLQEc0WgxvbxikhmZZ7QuFhsVBCz ivnw== MIME-Version: 1.0 X-Received: by 10.50.23.108 with SMTP id l12mr4063818igf.45.1371065585296; Wed, 12 Jun 2013 12:33:05 -0700 (PDT) Received: by 10.64.139.34 with HTTP; Wed, 12 Jun 2013 12:33:04 -0700 (PDT) Date: Wed, 12 Jun 2013 12:33:04 -0700 Message-ID: Subject: Re: FFS: fsck doesn't match doc From: Dieter BSD To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 19:33:05 -0000 >> Would it be safe to have fsck automagically fix this problem, as the >> doc (incorrectly) says it does? > > What happens if you add to /etc/rc.conf: > > fsck_y_enable="YES" > background_fsck="NO" Fsck -y is not safe. :-( Would it be *safe* to have "fsck -p" automagically fix this problem, as the doc (incorrectly) says it does? 
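For reference, the knobs Mark suggested, spelled out with the same values (whether answering "yes" to everything is acceptable is exactly the point in dispute here):

# /etc/rc.conf
fsck_y_enable="YES"    # if the boot-time preen fails, rerun fsck with -y
                       # (answer yes to every repair) instead of stopping
background_fsck="NO"   # check in the foreground before mounting, not on a
                       # snapshot of the already-mounted filesystem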
From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 23:15:05 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 1B9E644D for ; Wed, 12 Jun 2013 23:15:05 +0000 (UTC) (envelope-from wmn@siberianet.ru) Received: from mail.siberianet.ru (mail.siberianet.ru [89.105.136.7]) by mx1.freebsd.org (Postfix) with ESMTP id BFAD619D0 for ; Wed, 12 Jun 2013 23:15:04 +0000 (UTC) Received: from book.localnet (wmn.siberianet.ru [89.105.137.12]) by mail.siberianet.ru (Postfix) with ESMTP id D4FD612FB34; Thu, 13 Jun 2013 07:05:27 +0800 (KRAT) From: Sergey Lobanov Organization: ISP "SiberiaNet" To: freebsd-fs@freebsd.org Subject: Re: An order of magnitude higher IOPS needed with ZFS than UFS Date: Thu, 13 Jun 2013 07:05:26 +0800 User-Agent: KMail/1.13.7 (FreeBSD/9.0-RELEASE-p3; KDE/4.7.3; amd64; ; ) References: <51B79023.5020109@fsn.hu> <20130612114937.GA13688@icarus.home.lan> In-Reply-To: <20130612114937.GA13688@icarus.home.lan> MIME-Version: 1.0 Content-Type: Text/Plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Message-Id: <201306130705.26895.wmn@siberianet.ru> X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 23:15:05 -0000 On Wednesday 12 June 2013, Jeremy Chadwick wrote: > On Wed, Jun 12, 2013 at 06:40:32AM -0500, Mark Felder wrote: > > On Tue, 11 Jun 2013 16:01:23 -0500, Attila Nagy wrote: > > >BTW, the file systems are 77-78% full according to df (so ZFS > > >holds more, because UFS is -m 8). > > > > ZFS write performance can begin to drop pretty badly when you get > > around 80% full. I've not seen any benchmarks showing an improvement > > with a very fast and large ZIL or tons of memory, but I'd expect > > that would help significantly. Just note that you're right at the > > edge where performance gets impacted. > > Mark, do you have any references for this? I'd love to learn/read more > about this engineering/design aspect (I won't say flaw, I'll just say > aspect) to ZFS, as it's the first I've heard of it. > > The reason I ask: (respectfully, not judgementally) I'm worried you > might be referring to something that has to do with SSDs and not ZFS, > specifically SSD wear-levelling performing better with lots of free > space (i.e. a small FTL map; TRIM helps with this immensely) -- where > the performance hit tends to begin around the 70-80% mark. (I can talk > more about that if asked, but want to make sure the two things aren't > being mistaken for one another) http://lists.freebsd.org/pipermail/freebsd-fs/2013-March/016834.html CC'd mm@. 
-- ISP "SiberiaNet" System and Network Administrator From owner-freebsd-fs@FreeBSD.ORG Wed Jun 12 23:40:51 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 094878B6 for ; Wed, 12 Jun 2013 23:40:51 +0000 (UTC) (envelope-from jonaschuman@gmail.com) Received: from mail-bk0-x22b.google.com (mail-bk0-x22b.google.com [IPv6:2a00:1450:4008:c01::22b]) by mx1.freebsd.org (Postfix) with ESMTP id 960C31AFC for ; Wed, 12 Jun 2013 23:40:50 +0000 (UTC) Received: by mail-bk0-f43.google.com with SMTP id jm2so3640819bkc.16 for ; Wed, 12 Jun 2013 16:40:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=sjzNeJDHJ2jFZdqY2ERfpAd5B3KOog3MDMLHCiDgPGQ=; b=DKRPF+8BYHb+LzDVsRuH6WbHg419K6S0tiibMUzcB53x1JQxZXtfzxPQENBTDlpyrK mxzEUEPXv6SdcziNOsok1/TCwZ2Dl23dzq4m7u1oPxxnVmKlTuTNd/gW1Y9D0NXICgW6 HbrRH+iw1l1st7KPmpuBmisVSn5MjTwo+e+6nmcF8UTZYHnUy1vbC7mKKsfqPXgmizNO zQJThxU0HrgWDgQpuJxtilM0goUdV7/H4+vFx34CGU9AId1f4AMxlQ3xKs3R/toPHWo2 DBOtN7Prh+jxM2wcK4z8QHEounL49RiFAqzYZgBEU/W3DM7otIu1ImqlZE8wFl9RiUoi 6s8g== MIME-Version: 1.0 X-Received: by 10.204.65.69 with SMTP id h5mr3506797bki.59.1371080449628; Wed, 12 Jun 2013 16:40:49 -0700 (PDT) Received: by 10.205.125.145 with HTTP; Wed, 12 Jun 2013 16:40:49 -0700 (PDT) Date: Wed, 12 Jun 2013 19:40:49 -0400 Message-ID: Subject: zfs send/recv dies when transferring large-ish dataset From: Jona Schuman To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Jun 2013 23:40:51 -0000 Hi, I'm getting some strange behavior from zfs send/recv and I'm hoping someone may be able to provide some insight. I have two identical machines running 9.0-RELEASE-p3, each having a ZFS pool (zfs 5, zpool 28) for storage. I want to use zfs send/recv for replication between the two machines. For the most part, this has worked as expected. However, send/recv fails when transferring the largest dataset (both in actual size and in terms of number of files) on either machine. With these datasets, issuing: machine2# nc -d -l 9999 | zfs recv -d storagepool machine1# zfs send dataset@snap | nc machine2 9999 terminates early on the sending side without any error messages. The receiving end continues on as expected, cleaning up the partial data received so far and reverting to its initial state. (I've tried using mbuffer instead of nc, or just using ssh, both with similar results.) Oddly, zfs send dies slightly differently depending on how the two machines are connected. When connected through the racktop switch, zfs send dies quietly without any indication that the transfer has failed. When connected directly using a crossover cable, zfs send dies quietly and machine1 becomes unresponsive (no network, no keyboard, hard reset required). In both cases, no messages are printed to screen or to anything in /var/log/. I can transfer the same datasets successfully if I send/recv to/from file: machine1# zfs send dataset@snap > /tmp/dump machine1# scp /tmp/dump machine2:/tmp/dump machine2# zfs recv -d storagepool < /tmp/dump so I don't think the datasets themselves are the issue. 
I've also successfully tried send/recv over the network using different network interfaces (10GbE ixgbe cards instead of the 1GbE igb links), which would suggest the issue is with the 1GbE links. Might there be some buffering parameter that I'm neglecting to tune, which is essential on the 1GbE links but may be less important on the faster links? Are there any known issues with the igb driver that might be the culprit here? Any other suggestions? Thanks, Jona From owner-freebsd-fs@FreeBSD.ORG Thu Jun 13 07:57:47 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 3D2C2F43 for ; Thu, 13 Jun 2013 07:57:47 +0000 (UTC) (envelope-from Ivailo.Tanusheff@skrill.com) Received: from ch1outboundpool.messaging.microsoft.com (ch1ehsobe002.messaging.microsoft.com [216.32.181.182]) by mx1.freebsd.org (Postfix) with ESMTP id E43CD1E47 for ; Thu, 13 Jun 2013 07:57:46 +0000 (UTC) Received: from mail161-ch1-R.bigfish.com (10.43.68.231) by CH1EHSOBE018.bigfish.com (10.43.70.68) with Microsoft SMTP Server id 14.1.225.23; Thu, 13 Jun 2013 07:42:30 +0000 Received: from mail161-ch1 (localhost [127.0.0.1]) by mail161-ch1-R.bigfish.com (Postfix) with ESMTP id 929231C01BD; Thu, 13 Jun 2013 07:42:30 +0000 (UTC) X-Forefront-Antispam-Report: CIP:157.56.249.213; KIP:(null); UIP:(null); IPV:NLI; H:AM2PRD0710HT004.eurprd07.prod.outlook.com; RD:none; EFVD:NLI X-SpamScore: -1 X-BigFish: PS-1(z54eehz9371I542I4015Izz1f42h1ee6h1de0h1fdah1202h1e76h1d1ah1d2ah1fc6hzz17326ah8275dhz2fh2a8h668h839h944hd24hf0ah1220h1288h12a5h12a9h12bdh137ah13b6h1441h1504h1537h153bh162dh1631h1758h18e1h1946h19b5h19ceh1ad9h1b0ah1d07h1d0ch1d2eh1d3fh1de9h1dfeh1dffh1e1dh9a9j1155h) Received-SPF: pass (mail161-ch1: domain of skrill.com designates 157.56.249.213 as permitted sender) client-ip=157.56.249.213; envelope-from=Ivailo.Tanusheff@skrill.com; helo=AM2PRD0710HT004.eurprd07.prod.outlook.com ; .outlook.com ; X-Forefront-Antispam-Report-Untrusted: SFV:SKI; SFS:; DIR:OUT; SFP:; SCL:-1; SRVR:DB3PR07MB057; H:DB3PR07MB059.eurprd07.prod.outlook.com; LANG:en; Received: from mail161-ch1 (localhost.localdomain [127.0.0.1]) by mail161-ch1 (MessageSwitch) id 1371109347953425_25617; Thu, 13 Jun 2013 07:42:27 +0000 (UTC) Received: from CH1EHSMHS035.bigfish.com (snatpool1.int.messaging.microsoft.com [10.43.68.242]) by mail161-ch1.bigfish.com (Postfix) with ESMTP id DAD6420004D; Thu, 13 Jun 2013 07:42:27 +0000 (UTC) Received: from AM2PRD0710HT004.eurprd07.prod.outlook.com (157.56.249.213) by CH1EHSMHS035.bigfish.com (10.43.70.35) with Microsoft SMTP Server (TLS) id 14.1.225.23; Thu, 13 Jun 2013 07:42:27 +0000 Received: from DB3PR07MB057.eurprd07.prod.outlook.com (10.242.137.144) by AM2PRD0710HT004.eurprd07.prod.outlook.com (10.255.165.39) with Microsoft SMTP Server (TLS) id 14.16.324.0; Thu, 13 Jun 2013 07:42:12 +0000 Received: from DB3PR07MB059.eurprd07.prod.outlook.com (10.242.137.149) by DB3PR07MB057.eurprd07.prod.outlook.com (10.242.137.144) with Microsoft SMTP Server (TLS) id 15.0.702.21; Thu, 13 Jun 2013 07:42:11 +0000 Received: from DB3PR07MB059.eurprd07.prod.outlook.com ([169.254.2.14]) by DB3PR07MB059.eurprd07.prod.outlook.com ([169.254.2.14]) with mapi id 15.00.0702.005; Thu, 13 Jun 2013 07:42:11 +0000 From: Ivailo Tanusheff To: Jona Schuman , "freebsd-fs@freebsd.org" Subject: RE: zfs send/recv dies when transferring large-ish dataset Thread-Topic: zfs send/recv dies when transferring large-ish dataset Thread-Index: 
AQHOZ8ZVUN+hFJBhLk6aHw9omdejRZkzQxIg Date: Thu, 13 Jun 2013 07:42:11 +0000 Message-ID: <57e0551229684b69bc27476b8a08fb91@DB3PR07MB059.eurprd07.prod.outlook.com> References: In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [217.18.249.148] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: skrill.com X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Jun 2013 07:57:47 -0000 Hi, Can you try send/recv with the -v or with -vP swiches, so you can see more = verbose information? Regards, Ivailo Tanusheff -----Original Message----- From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org] On= Behalf Of Jona Schuman Sent: Thursday, June 13, 2013 2:41 AM To: freebsd-fs@freebsd.org Subject: zfs send/recv dies when transferring large-ish dataset Hi, I'm getting some strange behavior from zfs send/recv and I'm hoping someone= may be able to provide some insight. I have two identical machines running= 9.0-RELEASE-p3, each having a ZFS pool (zfs 5, zpool 28) for storage. I want to use zfs send/recv for replication between the tw= o machines. For the most part, this has worked as expected. However, send/recv fails when transferring the largest dataset (both in act= ual size and in terms of number of files) on either machine. With these datasets, issuing: machine2# nc -d -l 9999 | zfs recv -d storagepool machine1# zfs send datase= t@snap | nc machine2 9999 terminates early on the sending side without any error messages. The receiv= ing end continues on as expected, cleaning up the partial data received so = far and reverting to its initial state. (I've tried using mbuffer instead o= f nc, or just using ssh, both with similar results.) Oddly, zfs send dies s= lightly differently depending on how the two machines are connected. When c= onnected through the racktop switch, zfs send dies quietly without any indi= cation that the transfer has failed. When connected directly using a crossover cable, zfs send dies quietly and = machine1 becomes unresponsive (no network, no keyboard, hard reset required= ). In both cases, no messages are printed to screen or to anything in /var/= log/. I can transfer the same datasets successfully if I send/recv to/from file: machine1# zfs send dataset@snap > /tmp/dump machine1# scp /tmp/dump machine= 2:/tmp/dump machine2# zfs recv -d storagepool < /tmp/dump so I don't think the datasets themselves are the issue. I've also successfu= lly tried send/recv over the network using different network interfaces (10= GbE ixgbe cards instead of the 1GbE igb links), which would suggest the iss= ue is with the 1GbE links. Might there be some buffering parameter that I'm neglecting to tune, which = is essential on the 1GbE links but may be less important on the faster link= s? Are there any known issues with the igb driver that might be the culprit= here? Any other suggestions? 
Thanks, Jona _______________________________________________ freebsd-fs@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Thu Jun 13 08:28:33 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4EAE5BF1 for ; Thu, 13 Jun 2013 08:28:33 +0000 (UTC) (envelope-from rs@bytecamp.net) Received: from mail.bytecamp.net (mail.bytecamp.net [212.204.60.9]) by mx1.freebsd.org (Postfix) with ESMTP id DC9671F60 for ; Thu, 13 Jun 2013 08:28:32 +0000 (UTC) Received: (qmail 41990 invoked by uid 89); 13 Jun 2013 10:28:24 +0200 Received: from stella.bytecamp.net (HELO ?212.204.60.37?) (rs%bytecamp.net@212.204.60.37) by mail.bytecamp.net with CAMELLIA256-SHA encrypted SMTP; 13 Jun 2013 10:28:24 +0200 Message-ID: <51B982A8.10605@bytecamp.net> Date: Thu, 13 Jun 2013 10:28:24 +0200 From: Robert Schulze Organization: bytecamp GmbH User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130330 Thunderbird/17.0.5 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: An order of magnitude higher IOPS needed with ZFS than UFS References: <51B79023.5020109@fsn.hu> <20130612114937.GA13688@icarus.home.lan> In-Reply-To: <20130612114937.GA13688@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Jun 2013 08:28:33 -0000 Hi, Am 12.06.2013 13:49, schrieb Jeremy Chadwick: > On Wed, Jun 12, 2013 at 06:40:32AM -0500, Mark Felder wrote: >> On Tue, 11 Jun 2013 16:01:23 -0500, Attila Nagy wrote: >> >> ZFS write performance can begin to drop pretty badly when you get >> around 80% full. I've not seen any benchmarks showing an improvement >> with a very fast and large ZIL or tons of memory, but I'd expect >> that would help significantly. Just note that you're right at the >> edge where performance gets impacted. > > Mark, do you have any references for this? I'd love to learn/read more > about this engineering/design aspect (I won't say flaw, I'll just say > aspect) to ZFS, as it's the first I've heard of it. this is even true when getting near a quota limit on a zfs, although there are e.g. 10/16 TB free in the pool. Just create a filesystem and set quota=1G, then do sequential invocations of dd to fill the fs with 100M files. You will see a sharp slowdown when the last twenty files are beeing created. Here are the results from the following short test: for i in `jot - 0 99` do dd if=/dev/zero of=/pool/quota-test/10M.$i bs=1M count=10 done 0..80: < 0.4 s 80 0.27 s 81 0.77 s 82 0.50 s 83 0.51 s 84 0.22 s 85 0.87 s 86 0.52 s 87 1.13 s 88 0.91 s 90 0.39 s 91 1.04 s 92 0.80 s 93 1.94 s 94 1.27 s 95 1.36 s 96 1.76 s 97 2.13 s 98 3.28 s 99 4.07 s of course, there are some small values beyond 80% utilisation, but I think the trend is clearly visible. In my opinion, hitting a quota limit should not give these results unless enough free physical disk space is available in the pool. This is a bug or a design flaw and creating serious problems when exporting quota'ed zfs over nfs. with kind regards, Robert Schulze -- /7\ bytecamp GmbH Geschwister-Scholl-Str. 10, 14776 Brandenburg a.d. 
Havel HRB15752, Amtsgericht Potsdam, Geschaeftsfuehrer: Bjoern Barnekow, Frank Rosenbaum, Sirko Zidlewitz tel +49 3381 79637-0 werktags 10-12,13-17 Uhr, fax +49 3381 79637-20 mail rs@bytecamp.net, web http://bytecamp.net/ From owner-freebsd-fs@FreeBSD.ORG Thu Jun 13 11:41:02 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id DE90C3D0 for ; Thu, 13 Jun 2013 11:41:02 +0000 (UTC) (envelope-from sdenic@intech.co.rs) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) by mx1.freebsd.org (Postfix) with ESMTP id C402317D5 for ; Thu, 13 Jun 2013 11:41:02 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1Un5sT-0001wn-53 for freebsd-fs@freebsd.org; Thu, 13 Jun 2013 04:39:41 -0700 Date: Thu, 13 Jun 2013 04:39:41 -0700 (PDT) From: intech To: freebsd-fs@freebsd.org Message-ID: <1371123581091-5819759.post@n5.nabble.com> In-Reply-To: <4C6BDBB9.3020007@gibfest.dk> References: <4C61CF4D.4060009@gibfest.dk> <4C651B7E.5000805@gibfest.dk> <4C6B08BD.9080206@gibfest.dk> <20100818110655.GA2177@garage.freebsd.pl> <4C6BC0BA.9030303@gibfest.dk> <4C6BC35B.9040000@gibfest.dk> <20100818121133.GC2177@garage.freebsd.pl> <4C6BD521.1060807@gibfest.dk> <20100818125856.GE2177@garage.freebsd.pl> <4C6BDBB9.3020007@gibfest.dk> Subject: Re: HAST initial sync speed MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Jun 2013 11:41:02 -0000 Thought this threat is almost 3 years old, I want to ask if this MAX_SEND_SIZE adopted in freebsd 8.3 and even fbsd9.1? Indeed I have the same issue on 1Gb network - nodes performing sync at only 10MBytes/sec ?! and I can't figure out what is happening as network itself is not the problem, I tested it. And just one question fullsync is only option for HAST replication at time of writing, so could HAST perform at 100MB/sec in this mode, and when we expect memsync and async to be released? -- View this message in context: http://freebsd.1045724.n5.nabble.com/HAST-initial-sync-speed-tp4027033p5819759.html Sent from the freebsd-fs mailing list archive at Nabble.com. 
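One way to narrow down where the 10 MB/s is being lost -- the host name "hastb" and the 4 GB test size below are made up for the example -- is to push raw data over the same link the nodes use for synchronization and compare:

# on the secondary node: sink and time the incoming stream
nc -l 9999 > /dev/null

# on the primary node: push 4 GB of zeroes across the sync link
dd if=/dev/zero bs=1m count=4096 | nc hastb 9999

# If this moves at roughly wire speed (~100 MB/s on 1GbE) while hastd still
# syncs at 10 MB/s, the bottleneck is in the HAST synchronization path or
# the local disks rather than the network itself.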
From owner-freebsd-fs@FreeBSD.ORG Thu Jun 13 15:56:28 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C19F6322 for ; Thu, 13 Jun 2013 15:56:28 +0000 (UTC) (envelope-from jonaschuman@gmail.com) Received: from mail-vb0-x22b.google.com (mail-vb0-x22b.google.com [IPv6:2607:f8b0:400c:c02::22b]) by mx1.freebsd.org (Postfix) with ESMTP id 86CDA139E for ; Thu, 13 Jun 2013 15:56:28 +0000 (UTC) Received: by mail-vb0-f43.google.com with SMTP id e12so4591892vbg.16 for ; Thu, 13 Jun 2013 08:56:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=AAtGmjKDJY8sF4Bjd8gnnrxn31GiYQuwEwmLIDY/D90=; b=LpzmG2065qoDwx+HjDRJTxHJm95aDjdofjFLQshxX205ltLgKpQaGZX362sJuDtAfu CTFOecO0w5WsTMNyknxj8PBqAIt+OuQUtpIDPA3UP4BA6cfl+FOm2oaXnuuC7O1twj18 76pSWfj/F2iIBD/svCTI1A4s8IyMpyiTrjbvOU+MVqzSq743uaJ/fTlD9S0WcX220IvR g3uHBt1G55KZbEeySmN8URCOu7vtbCZDo3mIEso9yrymVJexn6MP1metlBYUfrbAljGN hWPWJSteoJhMhQZo46Bc3MJxQX/E27nCPgeIrvFKNWRuQAQAaSDkdvTnFhWkZ6IQW9lO luTQ== MIME-Version: 1.0 X-Received: by 10.52.22.78 with SMTP id b14mr545924vdf.27.1371138988025; Thu, 13 Jun 2013 08:56:28 -0700 (PDT) Received: by 10.220.167.73 with HTTP; Thu, 13 Jun 2013 08:56:27 -0700 (PDT) In-Reply-To: <57e0551229684b69bc27476b8a08fb91@DB3PR07MB059.eurprd07.prod.outlook.com> References: <57e0551229684b69bc27476b8a08fb91@DB3PR07MB059.eurprd07.prod.outlook.com> Date: Thu, 13 Jun 2013 11:56:27 -0400 Message-ID: Subject: Re: zfs send/recv dies when transferring large-ish dataset From: Jona Schuman To: Ivailo Tanusheff Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Jun 2013 15:56:28 -0000 machine2# nc -d -l 9999 | zfs receive -v -F -d storagepool machine1# zfs send -v -R dataset@snap | nc machine2 9999 machine1-output: sending from @ to dataset@snap machine2-output: receiving full stream of dataset@snap into storagepool/dataset@snap machine1-output: warning: cannot send 'dataset@snap': Broken pipe machine1-output: Broken pipe On Thu, Jun 13, 2013 at 3:42 AM, Ivailo Tanusheff wrote: > Hi, > > Can you try send/recv with the -v or with -vP swiches, so you can see mor= e verbose information? > > Regards, > Ivailo Tanusheff > > -----Original Message----- > From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org] = On Behalf Of Jona Schuman > Sent: Thursday, June 13, 2013 2:41 AM > To: freebsd-fs@freebsd.org > Subject: zfs send/recv dies when transferring large-ish dataset > > Hi, > > I'm getting some strange behavior from zfs send/recv and I'm hoping someo= ne may be able to provide some insight. I have two identical machines runni= ng 9.0-RELEASE-p3, each having a ZFS pool (zfs 5, zpool > 28) for storage. I want to use zfs send/recv for replication between the = two machines. For the most part, this has worked as expected. > However, send/recv fails when transferring the largest dataset (both in a= ctual size and in terms of number of files) on either machine. 
> With these datasets, issuing: > > machine2# nc -d -l 9999 | zfs recv -d storagepool machine1# zfs send data= set@snap | nc machine2 9999 > > terminates early on the sending side without any error messages. The rece= iving end continues on as expected, cleaning up the partial data received s= o far and reverting to its initial state. (I've tried using mbuffer instead= of nc, or just using ssh, both with similar results.) Oddly, zfs send dies= slightly differently depending on how the two machines are connected. When= connected through the racktop switch, zfs send dies quietly without any in= dication that the transfer has failed. > When connected directly using a crossover cable, zfs send dies quietly an= d machine1 becomes unresponsive (no network, no keyboard, hard reset requir= ed). In both cases, no messages are printed to screen or to anything in /va= r/log/. > > > I can transfer the same datasets successfully if I send/recv to/from file= : > > machine1# zfs send dataset@snap > /tmp/dump machine1# scp /tmp/dump machi= ne2:/tmp/dump machine2# zfs recv -d storagepool < /tmp/dump > > so I don't think the datasets themselves are the issue. I've also success= fully tried send/recv over the network using different network interfaces (= 10GbE ixgbe cards instead of the 1GbE igb links), which would suggest the i= ssue is with the 1GbE links. > > Might there be some buffering parameter that I'm neglecting to tune, whic= h is essential on the 1GbE links but may be less important on the faster li= nks? Are there any known issues with the igb driver that might be the culpr= it here? Any other suggestions? > > Thanks, > Jona > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > From owner-freebsd-fs@FreeBSD.ORG Thu Jun 13 16:06:59 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A829363F for ; Thu, 13 Jun 2013 16:06:59 +0000 (UTC) (envelope-from ler@lerctr.org) Received: from thebighonker.lerctr.org (lrosenman-1-pt.tunnel.tserv8.dal1.ipv6.he.net [IPv6:2001:470:1f0e:3ad::2]) by mx1.freebsd.org (Postfix) with ESMTP id 65AC115C9 for ; Thu, 13 Jun 2013 16:06:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lerctr.org; s=lerami; h=Message-ID:References:In-Reply-To:Subject:To:From:Date:Content-Transfer-Encoding:Content-Type:MIME-Version; bh=2EbPCYEag64lSc/JF9mqcvzjg6mzkpzwh5L54Q7twlE=; b=KcINUEWUVyo2/TTIyDOUkauzakrfzx+Q8YxwZdEeByIbrORQFn4Dqnle7jabo7quf6c3kptLoABGaoaO5ixrEEc6sGrdIopR+8dRZG5+Yt9R9w7YWFOiibWuPtdyhtW0at0AIlp/3Z8uMfgCVqpHekqLFoCBYvhACzJBtudXHAA=; Received: from localhost.lerctr.org ([127.0.0.1]:36432 helo=webmail.lerctr.org) by thebighonker.lerctr.org with esmtpa (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UnA37-0000ep-Th for freebsd-fs@freebsd.org; Thu, 13 Jun 2013 11:06:59 -0500 Received: from [32.97.110.58] by webmail.lerctr.org with HTTP (HTTP/1.1 POST); Thu, 13 Jun 2013 11:06:57 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Date: Thu, 13 Jun 2013 11:06:57 -0500 From: Larry Rosenman To: freebsd-fs@freebsd.org Subject: Re: zfs send/recv dies when transferring large-ish dataset In-Reply-To: References: <57e0551229684b69bc27476b8a08fb91@DB3PR07MB059.eurprd07.prod.outlook.com> Message-ID: 
<7dbb4a3d84381d923e22ec5ed77ea15e@webmail.lerctr.org> X-Sender: ler@lerctr.org User-Agent: Roundcube Webmail/0.9.1 X-Spam-Score: -3.3 (---) X-LERCTR-Spam-Score: -3.3 (---) X-Spam-Report: SpamScore (-3.3/5.0) ALL_TRUSTED=-1, BAYES_00=-1.9, RP_MATCHES_RCVD=-0.392 X-LERCTR-Spam-Report: SpamScore (-3.3/5.0) ALL_TRUSTED=-1, BAYES_00=-1.9, RP_MATCHES_RCVD=-0.392 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Jun 2013 16:06:59 -0000 This may be related to the grief I'm seeing in general with send/recv. I try to send streams and it breaks with invalid datastream or similar to yours. I've posted many times, and offered up my machine(s) to debug but no takers. On 2013-06-13 10:56, Jona Schuman wrote: > machine2# nc -d -l 9999 | zfs receive -v -F -d storagepool > machine1# zfs send -v -R dataset@snap | nc machine2 9999 > > machine1-output: sending from @ to dataset@snap > machine2-output: receiving full stream of dataset@snap into > storagepool/dataset@snap > machine1-output: warning: cannot send 'dataset@snap': Broken pipe > machine1-output: Broken pipe > > > On Thu, Jun 13, 2013 at 3:42 AM, Ivailo Tanusheff > wrote: >> Hi, >> >> Can you try send/recv with the -v or with -vP swiches, so you can see >> more verbose information? >> >> Regards, >> Ivailo Tanusheff >> >> -----Original Message----- >> From: owner-freebsd-fs@freebsd.org >> [mailto:owner-freebsd-fs@freebsd.org] On Behalf Of Jona Schuman >> Sent: Thursday, June 13, 2013 2:41 AM >> To: freebsd-fs@freebsd.org >> Subject: zfs send/recv dies when transferring large-ish dataset >> >> Hi, >> >> I'm getting some strange behavior from zfs send/recv and I'm hoping >> someone may be able to provide some insight. I have two identical >> machines running 9.0-RELEASE-p3, each having a ZFS pool (zfs 5, zpool >> 28) for storage. I want to use zfs send/recv for replication between >> the two machines. For the most part, this has worked as expected. >> However, send/recv fails when transferring the largest dataset (both >> in actual size and in terms of number of files) on either machine. >> With these datasets, issuing: >> >> machine2# nc -d -l 9999 | zfs recv -d storagepool machine1# zfs send >> dataset@snap | nc machine2 9999 >> >> terminates early on the sending side without any error messages. The >> receiving end continues on as expected, cleaning up the partial data >> received so far and reverting to its initial state. (I've tried using >> mbuffer instead of nc, or just using ssh, both with similar results.) >> Oddly, zfs send dies slightly differently depending on how the two >> machines are connected. When connected through the racktop switch, zfs >> send dies quietly without any indication that the transfer has failed. >> When connected directly using a crossover cable, zfs send dies quietly >> and machine1 becomes unresponsive (no network, no keyboard, hard reset >> required). In both cases, no messages are printed to screen or to >> anything in /var/log/. >> >> >> I can transfer the same datasets successfully if I send/recv to/from >> file: >> >> machine1# zfs send dataset@snap > /tmp/dump machine1# scp /tmp/dump >> machine2:/tmp/dump machine2# zfs recv -d storagepool < /tmp/dump >> >> so I don't think the datasets themselves are the issue. 
I've also >> successfully tried send/recv over the network using different network >> interfaces (10GbE ixgbe cards instead of the 1GbE igb links), which >> would suggest the issue is with the 1GbE links. >> >> Might there be some buffering parameter that I'm neglecting to tune, >> which is essential on the 1GbE links but may be less important on the >> faster links? Are there any known issues with the igb driver that >> might be the culprit here? Any other suggestions? >> >> Thanks, >> Jona >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> >> > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 (c) E-Mail: ler@lerctr.org US Mail: 430 Valona Loop, Round Rock, TX 78681-3893 From owner-freebsd-fs@FreeBSD.ORG Thu Jun 13 20:53:50 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6881DCCD for ; Thu, 13 Jun 2013 20:53:50 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-ea0-x231.google.com (mail-ea0-x231.google.com [IPv6:2a00:1450:4013:c01::231]) by mx1.freebsd.org (Postfix) with ESMTP id F2F48122E for ; Thu, 13 Jun 2013 20:53:49 +0000 (UTC) Received: by mail-ea0-f177.google.com with SMTP id j14so7982628eak.22 for ; Thu, 13 Jun 2013 13:53:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=4c+0qwysDJICPu0sbZVEk/2hJJQIIqsV6lcAkVV2hjU=; b=HDBDrQUGhhmF84/wn9oHjQlBFQ3MR1H7LPUc5ejnRnr684zbchTFAZ2UvdLKUhn1Ny XGcb5c/WrNGdu+YjVbjnwxeUcGwrj1IUpRPvC7uJgLENgithDODaOt81Y+JUF2SeiKTQ pEL7s0L248ohOK66NngXGkk2R1cmXpipBHCMU1hPtyqhsQdz5Z9+TZp5B2ChqXoIzBdt 3SYWB0mdYipxJ+Zawyg/bglX6/RW3RIklLf1s2139TEDwzH5S42t+b3K7YbsDkVHJMwx c9z5gYuBBWE2FmO4sdr74EBp7l7I0B7XqCqfqiEaz6rJ8s/cANhqMYQPDDjmYmcCnSFd KyYg== X-Received: by 10.15.99.2 with SMTP id bk2mr3204097eeb.76.1371156828366; Thu, 13 Jun 2013 13:53:48 -0700 (PDT) Received: from localhost ([178.150.115.244]) by mx.google.com with ESMTPSA id y10sm46209983eev.3.2013.06.13.13.53.46 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Thu, 13 Jun 2013 13:53:47 -0700 (PDT) Sender: Mikolaj Golub Date: Thu, 13 Jun 2013 23:53:45 +0300 From: Mikolaj Golub To: intech Subject: Re: HAST initial sync speed Message-ID: <20130613205344.GB8732@gmail.com> References: <4C651B7E.5000805@gibfest.dk> <4C6B08BD.9080206@gibfest.dk> <20100818110655.GA2177@garage.freebsd.pl> <4C6BC0BA.9030303@gibfest.dk> <4C6BC35B.9040000@gibfest.dk> <20100818121133.GC2177@garage.freebsd.pl> <4C6BD521.1060807@gibfest.dk> <20100818125856.GE2177@garage.freebsd.pl> <4C6BDBB9.3020007@gibfest.dk> <1371123581091-5819759.post@n5.nabble.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1371123581091-5819759.post@n5.nabble.com> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Thu, 13 Jun 2013 20:53:50 -0000 On Thu, Jun 13, 2013 at 04:39:41AM -0700, intech wrote: > Thought this threat is almost 3 years old, I want to ask if this > MAX_SEND_SIZE adopted in freebsd 8.3 and even fbsd9.1? > Indeed I have the same issue on 1Gb network - nodes performing sync at only > 10MBytes/sec ?! and I can't figure out what is happening as network itself > is not the problem, I tested it. What version are you running? There have been several changes since that thread was started related to the synchronization speed issue (MAX_SEND_SIZE among them). It is recommended to use recent versions. > And just one question fullsync is only option for HAST replication at time > of writing, so could HAST perform at 100MB/sec in this mode, and when we > expect memsync and async to be released? Synchronization is run in background by synchronization thread and hardly depends on replication mode. Anyway, HAST from CURRENT, STABLE/9 and STABLE/8 supports all three modes. The async mode was merged to stable branches in Jan 2012, and reached 8.4 and 9.1 (I am not sure about the later, one needs to check). The memsync mode was merged in Apr 2013, after 8.4 freeze, so there is no release that would contain it yet. -- Mikolaj Golub From owner-freebsd-fs@FreeBSD.ORG Thu Jun 13 21:29:19 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 77CB9EA; Thu, 13 Jun 2013 21:29:19 +0000 (UTC) (envelope-from sinisa.denic@intech.co.rs) Received: from exchange.peacebellservers.com (exchange.peacebellservers.com [46.22.146.98]) by mx1.freebsd.org (Postfix) with ESMTP id 2C1421447; Thu, 13 Jun 2013 21:29:18 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by exchange.peacebellservers.com (Postfix) with ESMTP id 2CFF248CF3; Thu, 13 Jun 2013 23:20:41 +0200 (CEST) Received: from exchange.peacebellservers.com ([127.0.0.1]) by localhost (exchange.peacebellservers.com [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id pOSVSjR_z_uY; Thu, 13 Jun 2013 23:20:40 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by exchange.peacebellservers.com (Postfix) with ESMTP id 1BC7F48CF8; Thu, 13 Jun 2013 23:20:40 +0200 (CEST) X-Virus-Scanned: amavisd-new at exchange.peacebellservers.com Received: from exchange.peacebellservers.com ([127.0.0.1]) by localhost (exchange.peacebellservers.com [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id jEGXt3Jqhlga; Thu, 13 Jun 2013 23:20:39 +0200 (CEST) Received: from exchange.peacebellservers.com (exchange.peacebellservers.com [46.22.146.98]) by exchange.peacebellservers.com (Postfix) with ESMTP id C4EDA48CF3; Thu, 13 Jun 2013 23:20:39 +0200 (CEST) Date: Thu, 13 Jun 2013 21:20:39 +0000 (UTC) From: =?utf-8?Q?Sini=C5=A1a_Deni=C4=87?= To: Mikolaj Golub Message-ID: <511373126.6399.1371158439628.JavaMail.root@intech.co.rs> In-Reply-To: <20130613205344.GB8732@gmail.com> References: <4C651B7E.5000805@gibfest.dk> <4C6BC35B.9040000@gibfest.dk> <20100818121133.GC2177@garage.freebsd.pl> <4C6BD521.1060807@gibfest.dk> <20100818125856.GE2177@garage.freebsd.pl> <4C6BDBB9.3020007@gibfest.dk> <1371123581091-5819759.post@n5.nabble.com> <20130613205344.GB8732@gmail.com> Subject: Re: HAST initial sync speed MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Mailer: Zimbra 8.0.3_GA_5664 (ZimbraWebClient - GC27 (Win)/8.0.3_GA_5664) Thread-Topic: HAST initial sync speed Thread-Index: 
gqlsJVGtxB1in2A16T57Iz2DTryePQ== Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Jun 2013 21:29:19 -0000 >There have been several changes since >that thread was started related to the synchronization speed issue That's what happening to me, syncing 37GB takes more then 30min over 1Gb ne= twork, I have two Fujitsu Siemens Esprimo nodes with two seagate sata2 150M= B/s running freebsd-9.1-amd64 version. Sini=C5=A1a Deni=C4=87=20 INTECH DOO=20 www.intech.co.rs=20 ----- Original Message ----- From: "Mikolaj Golub" To: "intech" Cc: freebsd-fs@freebsd.org Sent: Thursday, June 13, 2013 10:53:45 PM Subject: Re: HAST initial sync speed On Thu, Jun 13, 2013 at 04:39:41AM -0700, intech wrote: > Thought this threat is almost 3 years old, I want to ask if this > MAX_SEND_SIZE adopted in freebsd 8.3 and even fbsd9.1? > Indeed I have the same issue on 1Gb network - nodes performing sync at on= ly > 10MBytes/sec ?! and I can't figure out what is happening as network itsel= f > is not the problem, I tested it. What version are you running? There have been several changes since that thread was started related to the synchronization speed issue (MAX_SEND_SIZE among them). It is recommended to use recent versions. > And just one question fullsync is only option for HAST replication at tim= e > of writing, so could HAST perform at 100MB/sec in this mode, and when we > expect memsync and async to be released? Synchronization is run in background by synchronization thread and hardly depends on replication mode. Anyway, HAST from CURRENT, STABLE/9 and STABLE/8 supports all three modes. The async mode was merged to stable branches in Jan 2012, and reached 8.4 and 9.1 (I am not sure about the later, one needs to check). The memsync mode was merged in Apr 2013, after 8.4 freeze, so there is no release that would contain it yet. 
--=20 Mikolaj Golub From owner-freebsd-fs@FreeBSD.ORG Thu Jun 13 21:52:28 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2D4B776C for ; Thu, 13 Jun 2013 21:52:28 +0000 (UTC) (envelope-from jonaschuman@gmail.com) Received: from mail-vb0-x22f.google.com (mail-vb0-x22f.google.com [IPv6:2607:f8b0:400c:c02::22f]) by mx1.freebsd.org (Postfix) with ESMTP id E1586164F for ; Thu, 13 Jun 2013 21:52:27 +0000 (UTC) Received: by mail-vb0-f47.google.com with SMTP id x14so7412231vbb.34 for ; Thu, 13 Jun 2013 14:52:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=EvleN+1NPw0UgGpcc5woKnSZrp1buutcCAtN3I34PRg=; b=z+VKGNX7RnP/naWbUfq+RgHonUSYp/JsD28SH8dXfazUo+HCHubMmP5/EhlP09X2b6 3nHwXcVU0We0+Qm/4/5zIejKvVVQ1fv6P0JQWxi7bk17EjpquCEx/OEtTxfg1YfKcnGO SVgky1hRzvq6DWyGtkvgn1Z/l5yNawfoA6S403k5Sr466MC9c2NNQhvomv9VAh17Wedt 6Ia8/kM8kV3ZG78QYbg85YaJ8S/sfJe5lCpYFeJ4nyk1mqz3czU6m+rJY78djthmF0B5 vJKTUZW+IbyQ7zdfTCTE76DMLsvbvAM7eIoduXXXezbnVXocVZOBuBgysEpsZxjSWpLI YAlw== MIME-Version: 1.0 X-Received: by 10.220.11.143 with SMTP id t15mr1181436vct.68.1371160347436; Thu, 13 Jun 2013 14:52:27 -0700 (PDT) Received: by 10.220.167.73 with HTTP; Thu, 13 Jun 2013 14:52:27 -0700 (PDT) In-Reply-To: References: <57e0551229684b69bc27476b8a08fb91@DB3PR07MB059.eurprd07.prod.outlook.com> Date: Thu, 13 Jun 2013 17:52:27 -0400 Message-ID: Subject: Re: zfs send/recv dies when transferring large-ish dataset From: Jona Schuman To: abhay trivedi , "freebsd-fs@freebsd.org" Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Jun 2013 21:52:28 -0000 atime is off on both origin and destination datasets On Thu, Jun 13, 2013 at 5:30 PM, abhay trivedi wrote: > Can you set atime off on Destination file system and try again? > > > > On Thu, Jun 13, 2013 at 9:26 PM, Jona Schuman wrote: >> >> machine2# nc -d -l 9999 | zfs receive -v -F -d storagepool >> machine1# zfs send -v -R dataset@snap | nc machine2 9999 >> >> machine1-output: sending from @ to dataset@snap >> machine2-output: receiving full stream of dataset@snap into >> storagepool/dataset@snap >> machine1-output: warning: cannot send 'dataset@snap': Broken pipe >> machine1-output: Broken pipe >> >> >> On Thu, Jun 13, 2013 at 3:42 AM, Ivailo Tanusheff >> wrote: >> > Hi, >> > >> > Can you try send/recv with the -v or with -vP swiches, so you can see >> > more verbose information? >> > >> > Regards, >> > Ivailo Tanusheff >> > >> > -----Original Message----- >> > From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org] >> > On Behalf Of Jona Schuman >> > Sent: Thursday, June 13, 2013 2:41 AM >> > To: freebsd-fs@freebsd.org >> > Subject: zfs send/recv dies when transferring large-ish dataset >> > >> > Hi, >> > >> > I'm getting some strange behavior from zfs send/recv and I'm hoping >> > someone may be able to provide some insight. I have two identical machines >> > running 9.0-RELEASE-p3, each having a ZFS pool (zfs 5, zpool >> > 28) for storage. I want to use zfs send/recv for replication between the >> > two machines. For the most part, this has worked as expected. 
>> > However, send/recv fails when transferring the largest dataset (both in >> > actual size and in terms of number of files) on either machine. >> > With these datasets, issuing: >> > >> > machine2# nc -d -l 9999 | zfs recv -d storagepool machine1# zfs send >> > dataset@snap | nc machine2 9999 >> > >> > terminates early on the sending side without any error messages. The >> > receiving end continues on as expected, cleaning up the partial data >> > received so far and reverting to its initial state. (I've tried using >> > mbuffer instead of nc, or just using ssh, both with similar results.) Oddly, >> > zfs send dies slightly differently depending on how the two machines are >> > connected. When connected through the racktop switch, zfs send dies quietly >> > without any indication that the transfer has failed. >> > When connected directly using a crossover cable, zfs send dies quietly >> > and machine1 becomes unresponsive (no network, no keyboard, hard reset >> > required). In both cases, no messages are printed to screen or to anything >> > in /var/log/. >> > >> > >> > I can transfer the same datasets successfully if I send/recv to/from >> > file: >> > >> > machine1# zfs send dataset@snap > /tmp/dump machine1# scp /tmp/dump >> > machine2:/tmp/dump machine2# zfs recv -d storagepool < /tmp/dump >> > >> > so I don't think the datasets themselves are the issue. I've also >> > successfully tried send/recv over the network using different network >> > interfaces (10GbE ixgbe cards instead of the 1GbE igb links), which would >> > suggest the issue is with the 1GbE links. >> > >> > Might there be some buffering parameter that I'm neglecting to tune, >> > which is essential on the 1GbE links but may be less important on the faster >> > links? Are there any known issues with the igb driver that might be the >> > culprit here? Any other suggestions? 
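(Since mbuffer was already tried with nc-like defaults, one hedged thing to experiment with is giving it an explicit memory buffer and block size on both ends, so the sender can ride out short receiver stalls. The host name, port, snapshot name and sizes below are placeholders only:

machine2# mbuffer -s 128k -m 1G -I 9999 | zfs receive -v -F -d storagepool
machine1# zfs send -R dataset@snap | mbuffer -s 128k -m 1G -O machine2:9999

The -s block size should match on both ends, and -m is an in-memory buffer that has to fit in available RAM.)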
>> > >> > Thanks, >> > Jona >> > _______________________________________________ >> > freebsd-fs@freebsd.org mailing list >> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> > >> > >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > > > > -- > T@J From owner-freebsd-fs@FreeBSD.ORG Fri Jun 14 00:21:15 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E08A778A; Fri, 14 Jun 2013 00:21:15 +0000 (UTC) (envelope-from rmacklem@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id B9D561371; Fri, 14 Jun 2013 00:21:15 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r5E0LFYn034797; Fri, 14 Jun 2013 00:21:15 GMT (envelope-from rmacklem@freefall.freebsd.org) Received: (from rmacklem@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r5E0LFgE034796; Fri, 14 Jun 2013 00:21:15 GMT (envelope-from rmacklem) Date: Fri, 14 Jun 2013 00:21:15 GMT Message-Id: <201306140021.r5E0LFgE034796@freefall.freebsd.org> To: izrodix@gmail.com, rmacklem@FreeBSD.org, freebsd-fs@FreeBSD.org From: rmacklem@FreeBSD.org Subject: Re: kern/177335: [nfs] [panic] Sleeping on "vmopar" with the following non-sleepable locks held: exclusive sleep mutex NFSnode lock (NFSnode lock) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jun 2013 00:21:16 -0000 Synopsis: [nfs] [panic] Sleeping on "vmopar" with the following non-sleepable locks held: exclusive sleep mutex NFSnode lock (NFSnode lock) State-Changed-From-To: feedback->closed State-Changed-By: rmacklem State-Changed-When: Fri Jun 14 00:19:54 UTC 2013 State-Changed-Why: The patch that stops this crash (int head as r251089) has been MFC'd to stable/8 as r251719. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=177335 From owner-freebsd-fs@FreeBSD.ORG Fri Jun 14 00:42:25 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 8B74ADB4 for ; Fri, 14 Jun 2013 00:42:25 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: from mail-pa0-x22d.google.com (mail-pa0-x22d.google.com [IPv6:2607:f8b0:400e:c03::22d]) by mx1.freebsd.org (Postfix) with ESMTP id 6876E15CC for ; Fri, 14 Jun 2013 00:42:25 +0000 (UTC) Received: by mail-pa0-f45.google.com with SMTP id bi5so87413pad.18 for ; Thu, 13 Jun 2013 17:42:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=B90ze03cnmichfN4uIIxOEM0cqrOGsjdCaUr6J02k+g=; b=cG1hv6J1grieALBkXtupcw+5xAcXT5E7i4wmbjawhLEgsDXrF8hGiiht8sJVEXbNl+ h4GcAT1M7OHPSKYiL+uNExaXDnk4v+PcwsG/eGnJy29Y6MFcyf1EwXwuEb88UCJLP7yL xWoiwiTIF5uHUJrbdcAzDoYS3DJe5EtNXmg3PqUf2JY+pmG8/cot/FNpt9fOqtOnR+Sg GFxwYzXhmd9hbE1LHigOCTE40By+6M+s7snAy7h99q19ITaOwVMOby8Pq0BFUNpc9U5W UmyXMthsLMtGfjI+fA0ZFxeiblvJLguvWIwdj+WErZjiYYpgn6Xoy3X5a1P1n9Pr1Vc+ PtOg== MIME-Version: 1.0 X-Received: by 10.68.203.161 with SMTP id kr1mr426271pbc.192.1371170545166; Thu, 13 Jun 2013 17:42:25 -0700 (PDT) Received: by 10.70.31.195 with HTTP; Thu, 13 Jun 2013 17:42:25 -0700 (PDT) In-Reply-To: References: Date: Thu, 13 Jun 2013 19:42:25 -0500 Message-ID: Subject: Re: zfs send/recv dies when transferring large-ish dataset From: Adam Vande More To: Jona Schuman Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jun 2013 00:42:25 -0000 On Wed, Jun 12, 2013 at 6:40 PM, Jona Schuman wrote: > Might there be some buffering parameter that I'm neglecting to tune, > which is essential on the 1GbE links but may be less important on the > faster links? Are there any known issues with the igb driver that > might be the culprit here? Any other suggestions? > ZFS borks on low memory/high io situations. Thinks have improved a lot since your version. The first thing I would try to do is upgrade and 9.0 isn't supported anymore regardless of the ZFS issue. Migrating to STABLE is probably your best chance of success, but 9.1 probably would work too. You also didn't indicate amd64/i386 or any other system specs. IIRC, vm.kmem_size had to set higher even on AMD64 for that era. 
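(For a 9.0/9.1-era amd64 machine those are loader tunables; a hedged example of /boot/loader.conf entries, with placeholder values that have to be sized to the installed RAM rather than copied:

vm.kmem_size="8G"
vm.kmem_size_max="8G"
vfs.zfs.arc_max="4G"

They take effect on the next boot.)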
-- Adam Vande More From owner-freebsd-fs@FreeBSD.ORG Fri Jun 14 00:52:26 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0CBE3F5D for ; Fri, 14 Jun 2013 00:52:26 +0000 (UTC) (envelope-from beastie@tardisi.com) Received: from mho-02-ewr.mailhop.org (mho-02-ewr.mailhop.org [204.13.248.72]) by mx1.freebsd.org (Postfix) with ESMTP id C629015FF for ; Fri, 14 Jun 2013 00:52:25 +0000 (UTC) Received: from ip70-179-144-108.fv.ks.cox.net ([70.179.144.108] helo=zen.lhaven.homeip.net) by mho-02-ewr.mailhop.org with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.72) (envelope-from ) id 1UnIFX-00005g-9C for freebsd-fs@freebsd.org; Fri, 14 Jun 2013 00:52:19 +0000 X-Mail-Handler: Dyn Standard SMTP by Dyn X-Originating-IP: 70.179.144.108 X-Report-Abuse-To: abuse@dyndns.com (see http://www.dyndns.com/services/sendlabs/outbound_abuse.html for abuse reporting information) X-MHO-User: U2FsdGVkX1/4JoORhAPmozf83kVKfb7aoFG/Mby/QUU= Message-ID: <51BA6941.7040909@tardisi.com> Date: Thu, 13 Jun 2013 19:52:17 -0500 From: The BSD Dreamer User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130516 Thunderbird/17.0.6 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: ZFS triggered 9-STABLE r246646 panic "vdrop: holdcnt 0" References: <513E8E95.6010802@freebsd.org> In-Reply-To: <513E8E95.6010802@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jun 2013 00:52:26 -0000 On 03/11/2013 21:10, Lawrence Stewart wrote: > Hi all, > > I got this panic yesterday. I haven't seen it before (or since), but I > have the crashdump and kernel here if there's additional information I > can provide that would be useful in finding the cause. > > The machine runs ZFS exclusively and was under quite heavy CPU and IO > load at the time of the crash as I was compiling in a VirtualBox VM and > on the host itself, as well as running a full KDE desktop environment. > I'm fairly certain the machine was not swapping at the time of the crash. > > lstewart@lstewart> uname -a > FreeBSD lstewart 9.1-STABLE FreeBSD 9.1-STABLE #8 r246646M: Mon Feb 11 > 14:57:13 EST 2013 > root@lstewart:/usr/obj/usr/src/sys/LSTEWART-DESKTOP amd64 > > lstewart@lstewart> sudo kgdb /boot/kernel/kernel /var/crash/vmcore.0 > > [...] > > (kgdb) bt > #0 doadump (textdump=) at pcpu.h:229 > #1 0xffffffff808e5824 in kern_reboot (howto=260) at > /usr/src/sys/kern/kern_shutdown.c:448 > #2 0xffffffff808e5d27 in panic (fmt=0x1
) at > /usr/src/sys/kern/kern_shutdown.c:636 > #3 0xffffffff8097a71e in vdropl (vp=) at > /usr/src/sys/kern/vfs_subr.c:2465 > #4 0xffffffff80b4da2b in vm_page_alloc (object=0xffffffff8132c000, > pindex=143696, req=32) at /usr/src/sys/vm/vm_page.c:1569 > #5 0xffffffff80b3f312 in kmem_back (map=0xfffffe00020000e8, > addr=18446743524542296064, size=131072, flags=705200752) > at /usr/src/sys/vm/vm_kern.c:361 I just came home to find that my system had panic'd (around 11:30am)....and this was the only FreeBSD 9 'panic: vdrop: holdcnt: 0' that I found. The machine runs ZFS exclusively as well....CPU would be busy, since I run BOINC and distributed.net (go Team FreeBSD :) And, IO load would be high from BackupPC_nightly running...out of the box this job starts at 1am, but I had moved it to run at 11am so that it doesn't run into all things that get scheduled in cron around this time, along with all the backups that I'm running... as well as out of the way when I'm checking email and such first thing in the morning over coffee before heading into work. And, it takes a few hours to grind through the 7.2TB zpool... Its possible that this was happening when it was set to 1am, but I never had a crash dump when it had happened and no indication that a panic was why. Though I did later find out that recollindex cleans itself up when something goes wrong by sending TERM to its pgid....and running recollindex as root from cron during this time....means its sending TERM to init. And, not running it anymore seems to have solved that.... and there didn't seem to be any reason to move BackupPC_nightly back. Plus the other problem would have me wake up to find the machine with console screen in single user mode. With this, I came home to gnome login screen.... So, my system is: lchen@zen:~ 102> uname -a FreeBSD zen.lhaven.homeip.net 9.1-RELEASE-p3 FreeBSD 9.1-RELEASE-p3 #0: Mon Apr 29 18:27:25 UTC 2013 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 but, when I try to look at the dump: lchen@zen:~ 103> sudo kgdb /boot/kernel/kernel /var/crash/vmcore.0 Password: GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)... Attempt to extract a component of a value that is not a structure pointer. Attempt to extract a component of a value that is not a structure pointer. #0 0xffffffff808e9ecb in doadump () (kgdb) There's no kernel.symbols either. The only one that is, is the backup of my 9.0 kernel. Is that because I've been using freebsd-update to update? Here's the info.0 file.... 
lchen@zen:~ 104> sudo cat /var/crash/info.0 Dump header from device /dev/gpt/swap0 Architecture: amd64 Architecture Version: 2 Dump Length: 9172926464B (8747 MB) Blocksize: 512 Dumptime: Thu Jun 13 11:31:10 2013 Hostname: zen.lhaven.homeip.net Magic: FreeBSD Kernel Dump Version String: FreeBSD 9.1-RELEASE-p3 #0: Mon Apr 29 18:27:25 UTC 2013 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC Panic String: vdrop: holdcnt 0 Dump Parity: 4285100545 Bounds: 0 Dump Status: good So, just to see if anything meaningful might result....I move my /etc/make.conf aside and do a "make buildkernel", and tried a kgdb /usr/obj/usr/src/sys/generic/kernel.debug /var/crash/vmcore.0 which get's me this... GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Unread portion of the kernel message buffer: panic: vdrop: holdcnt 0 cpuid = 1 KDB: stack backtrace: #0 0xffffffff809208d6 at kdb_backtrace+0x66 #1 0xffffffff808ea8ee at panic+0x1ce #2 0xffffffff8097fa86 at vdropl+0x366 #3 0xffffffff80b522ab at vm_page_alloc+0x28b #4 0xffffffff80bd9096 at uma_small_alloc+0x66 #5 0xffffffff80b3b5fa at keg_alloc_slab+0x9a #6 0xffffffff80b3bb72 at keg_fetch_slab+0xb2 #7 0xffffffff80b3bede at zone_fetch_slab+0x3e #8 0xffffffff80b3b229 at zone_alloc_item+0x59 #9 0xffffffff80b3b431 at uma_large_malloc+0x31 #10 0xffffffff808d5a99 at malloc+0xd9 #11 0xffffffff815b28ee at zio_write_bp_init+0x1fe #12 0xffffffff815b2063 at zio_execute+0xc3 #13 0xffffffff815b3fad at zio_ready+0x17d #14 0xffffffff815b2063 at zio_execute+0xc3 #15 0xffffffff8092cf85 at taskqueue_run_locked+0x85 #16 0xffffffff8092df06 at taskqueue_thread_loop+0x46 #17 0xffffffff808bba1f at fork_exit+0x11f Uptime: 15d13h35m36s Dumping 8747 out of 16308 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% Reading symbols from /boot/kernel/nullfs.ko...done. Loaded symbols for /boot/kernel/nullfs.ko Reading symbols from /boot/kernel/zfs.ko...done. Loaded symbols for /boot/kernel/zfs.ko Reading symbols from /boot/kernel/opensolaris.ko...done. Loaded symbols for /boot/kernel/opensolaris.ko Reading symbols from /boot/kernel/if_tap.ko...done. Loaded symbols for /boot/kernel/if_tap.ko Reading symbols from /boot/kernel/aio.ko...done. Loaded symbols for /boot/kernel/aio.ko Reading symbols from /boot/kernel/accf_data.ko...done. Loaded symbols for /boot/kernel/accf_data.ko Reading symbols from /boot/kernel/accf_http.ko...done. Loaded symbols for /boot/kernel/accf_http.ko Reading symbols from /boot/kernel/coretemp.ko...done. Loaded symbols for /boot/kernel/coretemp.ko Reading symbols from /boot/kernel/cpuctl.ko...done. Loaded symbols for /boot/kernel/cpuctl.ko Reading symbols from /boot/kernel/sem.ko...done. Loaded symbols for /boot/kernel/sem.ko Reading symbols from /boot/modules/cuse4bsd.ko...done. Loaded symbols for /boot/modules/cuse4bsd.ko Reading symbols from /boot/modules/vboxdrv.ko...done. Loaded symbols for /boot/modules/vboxdrv.ko Reading symbols from /boot/modules/nvidia.ko...done. Loaded symbols for /boot/modules/nvidia.ko Reading symbols from /boot/kernel/linux.ko...done. Loaded symbols for /boot/kernel/linux.ko Reading symbols from /boot/kernel/libiconv.ko...done. 
Loaded symbols for /boot/kernel/libiconv.ko Reading symbols from /boot/kernel/libmchain.ko...done. Loaded symbols for /boot/kernel/libmchain.ko Reading symbols from /boot/kernel/cd9660_iconv.ko...done. Loaded symbols for /boot/kernel/cd9660_iconv.ko Reading symbols from /boot/kernel/msdosfs_iconv.ko...done. Loaded symbols for /boot/kernel/msdosfs_iconv.ko Reading symbols from /boot/kernel/ichwd.ko...done. Loaded symbols for /boot/kernel/ichwd.ko Reading symbols from /boot/kernel/fdescfs.ko...done. Loaded symbols for /boot/kernel/fdescfs.ko Reading symbols from /boot/kernel/ipl.ko...done. Loaded symbols for /boot/kernel/ipl.ko Reading symbols from /boot/modules/vboxnetflt.ko...done. Loaded symbols for /boot/modules/vboxnetflt.ko Reading symbols from /boot/kernel/netgraph.ko...done. Loaded symbols for /boot/kernel/netgraph.ko Reading symbols from /boot/kernel/ng_ether.ko...done. Loaded symbols for /boot/kernel/ng_ether.ko Reading symbols from /boot/modules/vboxnetadp.ko...done. Loaded symbols for /boot/modules/vboxnetadp.ko Reading symbols from /usr/local/modules/fuse.ko...done. Loaded symbols for /usr/local/modules/fuse.ko Reading symbols from /boot/kernel/linprocfs.ko...done. Loaded symbols for /boot/kernel/linprocfs.ko Reading symbols from /boot/kernel/linsysfs.ko...done. Loaded symbols for /boot/kernel/linsysfs.ko Reading symbols from /usr/local/libexec/linux_adobe/linux_adobe.ko...done. Loaded symbols for /usr/local/libexec/linux_adobe/linux_adobe.ko Reading symbols from /usr/local/modules/rtc.ko...done. Loaded symbols for /usr/local/modules/rtc.ko #0 doadump (textdump=Variable "textdump" is not available. ) at pcpu.h:224 224 __asm("movq %%gs:0,%0" : "=r" (td)); (kgdb) bt #0 doadump (textdump=Variable "textdump" is not available. ) at pcpu.h:224 #1 0xffffffff808ea3d1 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:448 #2 0xffffffff808ea8c7 in panic (fmt=0x1
) at /usr/src/sys/kern/kern_shutdown.c:636 #3 0xffffffff8097fa86 in vdropl (vp=Variable "vp" is not available. ) at /usr/src/sys/kern/vfs_subr.c:2400 #4 0xffffffff80b522ab in vm_page_alloc (object=0x0, pindex=0, req=32) at /usr/src/sys/vm/vm_page.c:1537 #5 0xffffffff80bd9096 in uma_small_alloc (zone=Variable "zone" is not available. ) at /usr/src/sys/amd64/amd64/uma_machdep.c:58 #6 0xffffffff80b3b5fa in keg_alloc_slab (keg=0xfffffe043ffef0e0, zone=0xfffffe043ffee000, wait=258) at /usr/src/sys/vm/uma_core.c:844 #7 0xffffffff80b3bb72 in keg_fetch_slab (keg=0xfffffe043ffef0e0, zone=0xfffffe043ffee000, flags=2) at /usr/src/sys/vm/uma_core.c:2173 #8 0xffffffff80b3bede in zone_fetch_slab (zone=0xfffffe043ffee000, keg=0xfffffe043ffef0e0, flags=2) at /usr/src/sys/vm/uma_core.c:2233 #9 0xffffffff80b3b229 in zone_alloc_item (zone=0xfffffe043ffee000, udata=0x0, flags=2) at /usr/src/sys/vm/uma_core.c:2490 #10 0xffffffff80b3b431 in uma_large_malloc (size=16384, wait=2) at /usr/src/sys/vm/uma_core.c:3064 #11 0xffffffff808d5a99 in malloc (size=16384, mtp=0xffffffff81734c20, flags=2) at /usr/src/sys/kern/kern_malloc.c:492 #12 0xffffffff815b28ee in zio_write_bp_init () from /boot/kernel/zfs.ko ---Type to continue, or q to quit--- #13 0x0000000000000010 in ?? () #14 0xfffffe022b9726e0 in ?? () #15 0xfffffe03c81a2a50 in ?? () #16 0xffffff801b78e880 in ?? () #17 0xfffffe000e99e000 in ?? () #18 0xffffff8471d93ae0 in ?? () #19 0xffffffff815b2063 in zio_execute () from /boot/kernel/zfs.ko #20 0x0000000000000000 in ?? () #21 0x0000000000000000 in ?? () #22 0xfffffe03c81a2a50 in ?? () #23 0xffffff801b78e880 in ?? () #24 0xfffffe000e99e000 in ?? () #25 0xffffff8471d93b10 in ?? () #26 0xffffffff815b3fad in zio_ready () from /boot/kernel/zfs.ko #27 0xfffffe03c81a2a50 in ?? () #28 0x0000000000000006 in ?? () #29 0x0000000000000006 in ?? () #30 0xffffff8471d93b50 in ?? () #31 0xffffffff815b2063 in zio_execute () from /boot/kernel/zfs.ko #32 0xfffffe0013c79800 in ?? () #33 0xfffffe03c81a2d90 in ?? () #34 0xfffffe0013c70000 in ?? () #35 0x0000000000000001 in ?? () ---Type to continue, or q to quit--- #36 0xfffffe0013c70000 in ?? () #37 0xffffff8471d93bc0 in ?? () #38 0xffffffff8092cf85 in taskqueue_run_locked (queue=0xffffff800904e380) at /usr/src/sys/kern/subr_taskqueue.c:308 Previous frame inner to this frame (corrupt stack?) (kgdb) l *0xffffffff8097fa86 0xffffffff8097fa86 is at /usr/src/sys/kern/vfs_subr.c:2400. 2395 int active; 2396 2397 ASSERT_VI_LOCKED(vp, "vdropl"); 2398 CTR2(KTR_VFS, "%s: vp %p", __func__, vp); 2399 if (vp->v_holdcnt <= 0) 2400 panic("vdrop: holdcnt %d", vp->v_holdcnt); 2401 vp->v_holdcnt--; 2402 if (vp->v_holdcnt > 0) { 2403 VI_UNLOCK(vp); 2404 return; so, it seems to work, but beyond the fact that it says to panic if vp->v_holdcnt is <= 0...don't know how to look to see why this variable had come to be 0, when it thinks it shouldn't have. I have periodic (about twice a year) scrubs enabled on my system, and the zpool for backuppc was last scrubbed on May 24th (it took 47h57m - repaired 0 with 0 errors.) 
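One hedged way to dig further from that kgdb session is to select the vdropl frame and inspect what it was handed, though with an optimized GENERIC build many values will show as "not available", as vp already does in the backtrace above:

(kgdb) frame 3
(kgdb) info args
(kgdb) info locals
(kgdb) print *vp

If print *vp does work, the v_holdcnt, v_type and v_tag fields usually give a hint as to which vnode was being dropped once too often.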
-- Name: Lawrence "The Dreamer" Chen Email: beastie@tardisi.com Snail: 1530 College Ave, A5 Blog: http://lawrencechen.net Manhattan, KS 66502-2768 Phone: 785-789-4132 From owner-freebsd-fs@FreeBSD.ORG Fri Jun 14 03:39:07 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 355ECA02; Fri, 14 Jun 2013 03:39:07 +0000 (UTC) (envelope-from pfg@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 100EC1CB9; Fri, 14 Jun 2013 03:39:07 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r5E3d6dI073333; Fri, 14 Jun 2013 03:39:06 GMT (envelope-from pfg@freefall.freebsd.org) Received: (from pfg@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r5E3d5We073332; Fri, 14 Jun 2013 03:39:06 GMT (envelope-from pfg) Date: Fri, 14 Jun 2013 03:39:06 GMT Message-Id: <201306140339.r5E3d5We073332@freefall.freebsd.org> To: cederom@tlen.pl, pfg@FreeBSD.org, freebsd-fs@FreeBSD.org From: pfg@FreeBSD.org Subject: Re: kern/174060: [ext2fs] Ext2FS system crashes (buffer overflow?) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jun 2013 03:39:07 -0000 Synopsis: [ext2fs] Ext2FS system crashes (buffer overflow?) State-Changed-From-To: open->closed State-Changed-By: pfg State-Changed-When: Fri Jun 14 03:34:15 UTC 2013 State-Changed-Why: Testing with fsx revealed issues that have been worked around by disabling reallocation in r245817 (MFC'd). Without reallocation the filesystem appears to be stable. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=174060 From owner-freebsd-fs@FreeBSD.ORG Fri Jun 14 07:04:28 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 88D4F8E6 for ; Fri, 14 Jun 2013 07:04:28 +0000 (UTC) (envelope-from Ivailo.Tanusheff@skrill.com) Received: from ch1outboundpool.messaging.microsoft.com (ch1ehsobe003.messaging.microsoft.com [216.32.181.183]) by mx1.freebsd.org (Postfix) with ESMTP id 3872016D4 for ; Fri, 14 Jun 2013 07:04:27 +0000 (UTC) Received: from mail97-ch1-R.bigfish.com (10.43.68.225) by CH1EHSOBE018.bigfish.com (10.43.70.68) with Microsoft SMTP Server id 14.1.225.23; Fri, 14 Jun 2013 07:04:26 +0000 Received: from mail97-ch1 (localhost [127.0.0.1]) by mail97-ch1-R.bigfish.com (Postfix) with ESMTP id 168152E033F; Fri, 14 Jun 2013 07:04:26 +0000 (UTC) X-Forefront-Antispam-Report: CIP:157.56.249.213; KIP:(null); UIP:(null); IPV:NLI; H:AM2PRD0710HT001.eurprd07.prod.outlook.com; RD:none; EFVD:NLI X-SpamScore: -3 X-BigFish: PS-3(z54eehz98dI9371I542I1432I4015Izz1f42h1ee6h1de0h1fdah1202h1e76h1d1ah1d2ah1fc6hzz17326ah8275bh8275dhz2fh2a8h668h839h944hd24hf0ah1220h1288h12a5h12a9h12bdh137ah13b6h1441h1504h1537h153bh162dh1631h1758h18e1h1946h19b5h19ceh1ad9h1b0ah1d07h1d0ch1d2eh1d3fh1de9h1dfeh1dffh1e1dh9a9j1155h) Received-SPF: pass (mail97-ch1: domain of skrill.com designates 157.56.249.213 as permitted sender) client-ip=157.56.249.213; envelope-from=Ivailo.Tanusheff@skrill.com; helo=AM2PRD0710HT001.eurprd07.prod.outlook.com ; .outlook.com ; X-Forefront-Antispam-Report-Untrusted: SFV:SKI; SFS:; DIR:OUT; SFP:; SCL:-1; SRVR:DBXPR07MB062; H:DBXPR07MB064.eurprd07.prod.outlook.com; LANG:en; Received: from mail97-ch1 (localhost.localdomain [127.0.0.1]) by mail97-ch1 (MessageSwitch) id 1371193463719231_26787; Fri, 14 Jun 2013 07:04:23 +0000 (UTC) Received: from CH1EHSMHS026.bigfish.com (snatpool2.int.messaging.microsoft.com [10.43.68.231]) by mail97-ch1.bigfish.com (Postfix) with ESMTP id A3C94460244; Fri, 14 Jun 2013 07:04:23 +0000 (UTC) Received: from AM2PRD0710HT001.eurprd07.prod.outlook.com (157.56.249.213) by CH1EHSMHS026.bigfish.com (10.43.70.26) with Microsoft SMTP Server (TLS) id 14.1.225.23; Fri, 14 Jun 2013 07:04:22 +0000 Received: from DBXPR07MB062.eurprd07.prod.outlook.com (10.242.147.20) by AM2PRD0710HT001.eurprd07.prod.outlook.com (10.255.165.36) with Microsoft SMTP Server (TLS) id 14.16.324.0; Fri, 14 Jun 2013 07:04:12 +0000 Received: from DBXPR07MB064.eurprd07.prod.outlook.com (10.242.147.24) by DBXPR07MB062.eurprd07.prod.outlook.com (10.242.147.20) with Microsoft SMTP Server (TLS) id 15.0.702.21; Fri, 14 Jun 2013 07:04:11 +0000 Received: from DBXPR07MB064.eurprd07.prod.outlook.com ([169.254.7.13]) by DBXPR07MB064.eurprd07.prod.outlook.com ([169.254.7.13]) with mapi id 15.00.0702.005; Fri, 14 Jun 2013 07:04:11 +0000 From: Ivailo Tanusheff To: Jona Schuman , abhay trivedi , "freebsd-fs@freebsd.org" Subject: RE: zfs send/recv dies when transferring large-ish dataset Thread-Topic: zfs send/recv dies when transferring large-ish dataset Thread-Index: AQHOZ8ZVUN+hFJBhLk6aHw9omdejRZkzQxIggACKgICAAGOU04AAmc2g Date: Fri, 14 Jun 2013 07:04:10 +0000 Message-ID: <97ca7eedc13f4b2b809945d067a732b6@DBXPR07MB064.eurprd07.prod.outlook.com> References: <57e0551229684b69bc27476b8a08fb91@DB3PR07MB059.eurprd07.prod.outlook.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [217.18.249.148] Content-Type: text/plain; 
charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: skrill.com X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jun 2013 07:04:28 -0000 This sounds to me as a communicational problem or system getting out of mem= ory ... -----Original Message----- From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org] On= Behalf Of Jona Schuman Sent: Friday, June 14, 2013 12:52 AM To: abhay trivedi; freebsd-fs@freebsd.org Subject: Re: zfs send/recv dies when transferring large-ish dataset atime is off on both origin and destination datasets On Thu, Jun 13, 2013 at 5:30 PM, abhay trivedi wr= ote: > Can you set atime off on Destination file system and try again? > > > > On Thu, Jun 13, 2013 at 9:26 PM, Jona Schuman wro= te: >> >> machine2# nc -d -l 9999 | zfs receive -v -F -d storagepool machine1#=20 >> zfs send -v -R dataset@snap | nc machine2 9999 >> >> machine1-output: sending from @ to dataset@snap >> machine2-output: receiving full stream of dataset@snap into=20 >> storagepool/dataset@snap >> machine1-output: warning: cannot send 'dataset@snap': Broken pipe >> machine1-output: Broken pipe >> >> >> On Thu, Jun 13, 2013 at 3:42 AM, Ivailo Tanusheff=20 >> wrote: >> > Hi, >> > >> > Can you try send/recv with the -v or with -vP swiches, so you can=20 >> > see more verbose information? >> > >> > Regards, >> > Ivailo Tanusheff >> > >> > -----Original Message----- >> > From: owner-freebsd-fs@freebsd.org=20 >> > [mailto:owner-freebsd-fs@freebsd.org] >> > On Behalf Of Jona Schuman >> > Sent: Thursday, June 13, 2013 2:41 AM >> > To: freebsd-fs@freebsd.org >> > Subject: zfs send/recv dies when transferring large-ish dataset >> > >> > Hi, >> > >> > I'm getting some strange behavior from zfs send/recv and I'm hoping=20 >> > someone may be able to provide some insight. I have two identical=20 >> > machines running 9.0-RELEASE-p3, each having a ZFS pool (zfs 5,=20 >> > zpool >> > 28) for storage. I want to use zfs send/recv for replication=20 >> > between the two machines. For the most part, this has worked as expect= ed. >> > However, send/recv fails when transferring the largest dataset=20 >> > (both in actual size and in terms of number of files) on either machin= e. >> > With these datasets, issuing: >> > >> > machine2# nc -d -l 9999 | zfs recv -d storagepool machine1# zfs=20 >> > send dataset@snap | nc machine2 9999 >> > >> > terminates early on the sending side without any error messages.=20 >> > The receiving end continues on as expected, cleaning up the partial=20 >> > data received so far and reverting to its initial state. (I've=20 >> > tried using mbuffer instead of nc, or just using ssh, both with=20 >> > similar results.) Oddly, zfs send dies slightly differently=20 >> > depending on how the two machines are connected. When connected=20 >> > through the racktop switch, zfs send dies quietly without any indicati= on that the transfer has failed. >> > When connected directly using a crossover cable, zfs send dies=20 >> > quietly and machine1 becomes unresponsive (no network, no keyboard,=20 >> > hard reset required). In both cases, no messages are printed to=20 >> > screen or to anything in /var/log/. 
>> > >> > >> > I can transfer the same datasets successfully if I send/recv=20 >> > to/from >> > file: >> > >> > machine1# zfs send dataset@snap > /tmp/dump machine1# scp /tmp/dump=20 >> > machine2:/tmp/dump machine2# zfs recv -d storagepool < /tmp/dump >> > >> > so I don't think the datasets themselves are the issue. I've also=20 >> > successfully tried send/recv over the network using different=20 >> > network interfaces (10GbE ixgbe cards instead of the 1GbE igb=20 >> > links), which would suggest the issue is with the 1GbE links. >> > >> > Might there be some buffering parameter that I'm neglecting to=20 >> > tune, which is essential on the 1GbE links but may be less=20 >> > important on the faster links? Are there any known issues with the=20 >> > igb driver that might be the culprit here? Any other suggestions? >> > >> > Thanks, >> > Jona >> > _______________________________________________ >> > freebsd-fs@freebsd.org mailing list=20 >> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> > >> > >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > > > > -- > T@J _______________________________________________ freebsd-fs@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Fri Jun 14 09:14:07 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A5E553FA; Fri, 14 Jun 2013 09:14:07 +0000 (UTC) (envelope-from tomek.cedro@gmail.com) Received: from mail-pa0-x22f.google.com (mail-pa0-x22f.google.com [IPv6:2607:f8b0:400e:c03::22f]) by mx1.freebsd.org (Postfix) with ESMTP id 821091F32; Fri, 14 Jun 2013 09:14:07 +0000 (UTC) Received: by mail-pa0-f47.google.com with SMTP id kl14so469141pab.6 for ; Fri, 14 Jun 2013 02:14:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=nm45a9YZG9Z/fzxGO9cvGaUAVHMcAHc+lYbaM1Fem5A=; b=cq6kStlnY5tCEBhazCDUqATZQfJv6e1pwcu8aPHv4lwqNffGN+M9LEO7wBLRNfL+vq ZlLdTXIZTrSFc3KIF1dyJlyUFYhUpMzbYbeueHMttNSScWMGGbC8lnfmFyQxL4XrOqx6 SlGe+AcbVcZT/inu5CdMY6hrznRJ3bEBiq7msexl123NjcfLYXKtrRlzUgNNeQtV/I9/ CUXW4UV9VaJ+44acIGpFjXaI+N/qNrCodfjegpfX0JkVdepSGGUe9+wMWVtYzp/oYX1F H0vqa5QkMCuj4dgm1bAu7+O7yaWpTmwjhqEMgn7wWkZt2kFVP/aE0zOPkykvylt8jI0x DZRg== MIME-Version: 1.0 X-Received: by 10.66.150.40 with SMTP id uf8mr1649975pab.66.1371201247308; Fri, 14 Jun 2013 02:14:07 -0700 (PDT) Sender: tomek.cedro@gmail.com Received: by 10.68.112.4 with HTTP; Fri, 14 Jun 2013 02:14:07 -0700 (PDT) In-Reply-To: <201306140339.r5E3d5We073332@freefall.freebsd.org> References: <201306140339.r5E3d5We073332@freefall.freebsd.org> Date: Fri, 14 Jun 2013 11:14:07 +0200 X-Google-Sender-Auth: sta4BRjpf0KXaNq7C8G3S40LkX4 Message-ID: Subject: Re: kern/174060: [ext2fs] Ext2FS system crashes (buffer overflow?) 
From: CeDeROM To: pfg@freebsd.org Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jun 2013 09:14:07 -0000 Thank you!! :-) -- CeDeROM, SQ7MHZ, http://www.tomek.cedro.info From owner-freebsd-fs@FreeBSD.ORG Fri Jun 14 09:55:31 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 0DCBA51B; Fri, 14 Jun 2013 09:55:31 +0000 (UTC) (envelope-from josefkarthauser@gmail.com) Received: from mail-we0-x22a.google.com (mail-we0-x22a.google.com [IPv6:2a00:1450:400c:c03::22a]) by mx1.freebsd.org (Postfix) with ESMTP id 7514D1320; Fri, 14 Jun 2013 09:55:30 +0000 (UTC) Received: by mail-we0-f170.google.com with SMTP id w57so309923wes.1 for ; Fri, 14 Jun 2013 02:55:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:content-type:message-id:mime-version:date:subject:cc:to :x-mailer; bh=nN0SSje1VyLm4dtAbnSdrhpxlSPscJ35WGIH2IUbT58=; b=UFEWAIK2WQ9Tf9t4RtU9eO7zOOHqj40xe3sAkN3dY2FqMpeUG6/mHD9LDz91pIf/iv +Y44XkFL3q8IqpdoXOmd6x0l9H99OHrBrIuE0ZXdc2oIaL9L3BJALjhCG8mdOyfLto4/ cjayiZCtAOi6Nk19j0ZbZ+fUxJ4pUG+e/0pZYlKpCTbiXeq9wDap86C7F+/5QNenEp/r wYjax2otkCwKIbPThrIDTwPWC++7cSJrBb0wZ/bxy2YuGDoSb+oYgZrt5t9X7ANVqtP0 rOpdV5KNDUxlDGhyUs0ka+qnnz3tMZ3ogFIcPwaQcmAz6EXy8xVkN5LGxsiLdA+sVC0C lCXA== X-Received: by 10.180.39.136 with SMTP id p8mr877971wik.11.1371203729585; Fri, 14 Jun 2013 02:55:29 -0700 (PDT) Received: from phoenix.fritz.box ([81.187.183.70]) by mx.google.com with ESMTPSA id o14sm1977777wiv.3.2013.06.14.02.55.28 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 14 Jun 2013 02:55:28 -0700 (PDT) From: Dr Josef Karthauser Message-Id: <301B4131-F677-4B8D-ABF6-A6D269FE604E@gmail.com> Mime-Version: 1.0 (Mac OS X Mail 6.3 \(1503\)) Date: Fri, 14 Jun 2013 10:55:29 +0100 Subject: Help! :( ZFS panic on boot, importing pool after server crash. To: fs@freebsd.org X-Mailer: Apple Mail (2.1503) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: "freebsd-stable@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jun 2013 09:55:31 -0000 Hi, I'm a bit at the end of my tether. We had a ZFS panic last night on a machine that hosts all my mail and = web; it was rebooted and it now panics mounting the ZFS root filesystem. The call stack info is: solaris assert: ss =3D=3D NULL, file: = /usr/src/sys/modules/zfs/../../cddl/contrib/opensource/uts/common/fs/zfs/s= pace_map.c, line: 109 kdb_backtrace panic space_map_add space_map_load metaslab_activate metaslab_allocate zio_dva_allocate=09 zio_execute taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline I can boot from the live DVD filesystem, but I can only mount the pool = read-only without getting the same kernel panic. This is with FreeBSD = 9.0. The machine is remote, and I don't have access other than through a DRAC = console port (so I can't cut and paste; sorry for the poor stack trace). Is anyone here in the position to advice me how I might process to get = this machine mounting and running again in multi-user mode? 
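(Not specific advice for this pool, just a sketch of one avenue sometimes tried from a live environment for this kind of space_map assertion: zpool's recovery-mode import, which discards the last few transactions and can therefore lose the most recent writes. A dry run can be requested first:

# zpool import -f -n -F -R /mnt poolname
# zpool import -f -F -R /mnt poolname

Backing up anything reachable from the existing read-only import before attempting this would be prudent.)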
Thanks so much. Joe p.s. the config, btw, is a ZFS mirror on two ad devices. It's got a ZFS = root file system.= From owner-freebsd-fs@FreeBSD.ORG Fri Jun 14 11:00:57 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 726CE80B; Fri, 14 Jun 2013 11:00:57 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-lb0-f176.google.com (mail-lb0-f176.google.com [209.85.217.176]) by mx1.freebsd.org (Postfix) with ESMTP id C181F10D3; Fri, 14 Jun 2013 11:00:56 +0000 (UTC) Received: by mail-lb0-f176.google.com with SMTP id z5so453531lbh.35 for ; Fri, 14 Jun 2013 04:00:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=jCKtUJQG1oIn/J1GPhD2GxSSIaMiTzdi43gmUk5mmOY=; b=DWPhNWoUPjDVfYX0tXEHzWpA5lOaKcJI+c1KxE13qJjq8tjV9NNYvHn7iiITkv06ut u/aNFyxss0EZ0BRdzLivS7YDdZx+pzHKVoBuJ+ecgZBVTgR0dYW1kuwhhz7pAEqnLC/Z pHYIHJ4VARUVXQPgZAyKIeu/N+YxHwYBxOp1gndK5/CXy3lrlusPZqRmmUeD3pWmLCMn siWATlGRlNu6lm/zQ5Wvmo8TPiwRAXPHT4Gbv18funeRvwiRT1D4kl/zXMug/rNDb7iE o0DdJuiXJNekNFljpKEHbXCjEkYoab+u9iZFsoYBRe/a2hxZVLa+R6q9dL9UAyeT5KhR wrRQ== X-Received: by 10.152.20.66 with SMTP id l2mr920749lae.30.1371207655326; Fri, 14 Jun 2013 04:00:55 -0700 (PDT) Received: from [192.168.1.125] (mau.donbass.com. [92.242.127.250]) by mx.google.com with ESMTPSA id p20sm663301lbb.17.2013.06.14.04.00.52 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 14 Jun 2013 04:00:54 -0700 (PDT) Message-ID: <51BAF7E3.4020401@gmail.com> Date: Fri, 14 Jun 2013 14:00:51 +0300 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:20.0) Gecko/20100101 Firefox/20.0 SeaMonkey/2.17.1 MIME-Version: 1.0 To: Dr Josef Karthauser , fs@freebsd.org Subject: Re: Help! :( ZFS panic on boot, importing pool after server crash. References: <301B4131-F677-4B8D-ABF6-A6D269FE604E@gmail.com> In-Reply-To: <301B4131-F677-4B8D-ABF6-A6D269FE604E@gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: "freebsd-stable@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jun 2013 11:00:57 -0000 14.06.2013 12:55, Dr Josef Karthauser: > Hi, I'm a bit at the end of my tether. > > We had a ZFS panic last night on a machine that hosts all my mail and web; it was rebooted and it now panics mounting the ZFS root filesystem. > > The call stack info is: > > solaris assert: ss == NULL, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensource/uts/common/fs/zfs/space_map.c, line: 109 > > kdb_backtrace > panic > space_map_add > space_map_load > metaslab_activate > metaslab_allocate > zio_dva_allocate > zio_execute > taskqueue_run_locked > taskqueue_thread_loop > fork_exit > fork_trampoline > > I can boot from the live DVD filesystem, but I can only mount the pool read-only without getting the same kernel panic. This is with FreeBSD 9.0. > > The machine is remote, and I don't have access other than through a DRAC console port (so I can't cut and paste; sorry for the poor stack trace). > > Is anyone here in the position to advice me how I might process to get this machine mounting and running again in multi-user mode? There's no official way. > p.s. 
the config, btw, is a ZFS mirror on two ad devices. It's got a ZFS root file system. If you are fairly sure about your devices you can: 1. Remove second disk from pool or create another pool on top of it. 2. Recreate all FS structure on the second disk. You can dump al your FS with something like: zfs list -Ho name | xargs -n1 zfs get -H all | awk 'BEGIN{shard="";output=""}{if(shard!=$1 && shard!=""){output="zfs create";for(param in params)output=output" -o "param"="params[param];print output" "shard;delete params;shard=""}}$4~/local/{params[$2]=$3;shard=$1;next}$2~/type/{shard=$1}END{output="zfs create";for(param in params)output=output" -o "param"="params[param];print output" "shard;}' Be sure to rename the pool and change the first line. 3. Rsync all data to the second disk. 4. Try to boot from the second disk. If everything worked you are free to attach first disk to second one to create a mirror again. -- Sphinx of black quartz, judge my vow. From owner-freebsd-fs@FreeBSD.ORG Fri Jun 14 11:04:31 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 8116ABD8; Fri, 14 Jun 2013 11:04:31 +0000 (UTC) (envelope-from tomek.cedro@gmail.com) Received: from mail-pa0-x22e.google.com (mail-pa0-x22e.google.com [IPv6:2607:f8b0:400e:c03::22e]) by mx1.freebsd.org (Postfix) with ESMTP id 5BF721174; Fri, 14 Jun 2013 11:04:31 +0000 (UTC) Received: by mail-pa0-f46.google.com with SMTP id fa11so544830pad.5 for ; Fri, 14 Jun 2013 04:04:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=Vq7tzr7emA/C3NoeNL7IVbRQmqubNAWOXN3lItraZHI=; b=XzrnTAT+rWRXRh0T4rb9nOqnjI98DatQaG/haddByGRv1NOdPrMrFNERxelApMIB0g OcGcxBLxne4PxxkATShw+UPuySWzSN8vW6JEp4JxjCdfiA0XFcGGaMWuM2dH2zxckwn9 2GvRLRw6BMHINpx9N2k21iFzd08V4DxbMHYZO3v6HpF4OhKFWompVU+ImhPFc/i0dLW0 WDeobGeOwW1aIR8vWOfrNYg3jbzJkQgwmSSkENniKCyt0PODHoQLhBArE//5EppWLQ3s sdZTuJdjX7rxIV3dR1E4qqHxxzz9lO0v+dB+fSjPC0vKL1+sjdASEezgHKdssgF/EQoA KYSg== MIME-Version: 1.0 X-Received: by 10.68.247.69 with SMTP id yc5mr2036094pbc.66.1371207871048; Fri, 14 Jun 2013 04:04:31 -0700 (PDT) Sender: tomek.cedro@gmail.com Received: by 10.68.112.4 with HTTP; Fri, 14 Jun 2013 04:04:30 -0700 (PDT) In-Reply-To: <51BAF609.9040201@FreeBSD.org> References: <201306140339.r5E3d5We073332@freefall.freebsd.org> <51BAF609.9040201@FreeBSD.org> Date: Fri, 14 Jun 2013 13:04:30 +0200 X-Google-Sender-Auth: lW-obJ5CSj0Kr0sTqi-DV0T8J2s Message-ID: Subject: Re: kern/174060: [ext2fs] Ext2FS system crashes (buffer overflow?) From: CeDeROM To: Pedro Giffuni Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jun 2013 11:04:31 -0000 On Fri, Jun 14, 2013 at 12:52 PM, Pedro Giffuni wrote: > On 14.06.2013 04:14, CeDeROM wrote: >> Thank you!! :-) >> -- >> CeDeROM, SQ7MHZ, http://www.tomek.cedro.info > > Thank you for the report... > I am still working on a fix for the reallocblk issue, which > happens to be a can of worms. > Pedro. Wow, I can imagine, I keep my fingers crossed, good luck!! 
:-) -- CeDeROM, SQ7MHZ, http://www.tomek.cedro.info From owner-freebsd-fs@FreeBSD.ORG Fri Jun 14 11:53:52 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 1BD131E8 for ; Fri, 14 Jun 2013 11:53:52 +0000 (UTC) (envelope-from pfg@FreeBSD.org) Received: from nm8-vm8.bullet.mail.gq1.yahoo.com (nm8-vm8.bullet.mail.gq1.yahoo.com [98.136.218.231]) by mx1.freebsd.org (Postfix) with ESMTP id DAECE1950 for ; Fri, 14 Jun 2013 11:53:51 +0000 (UTC) Received: from [98.137.12.191] by nm8.bullet.mail.gq1.yahoo.com with NNFMP; 14 Jun 2013 10:53:00 -0000 Received: from [208.71.42.191] by tm12.bullet.mail.gq1.yahoo.com with NNFMP; 14 Jun 2013 10:53:00 -0000 Received: from [127.0.0.1] by smtp202.mail.gq1.yahoo.com with NNFMP; 14 Jun 2013 10:53:00 -0000 X-Yahoo-Newman-Id: 78476.65236.bm@smtp202.mail.gq1.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: OMT0QVMVM1nfVLyOrfQdEgwqXhc5LQZxozAGEiHr7aYnbOR 83QeuAUPCAwx77JCQenFY9_c9YdMPP6aMMpw8V6_z2cszQi.Zw.LdBrJTLBR R6BFZf5fxGv62hszvqwfg2LWVGMEWYMJZQ_1mt1ewP5jQ80LBITt7lN.dyGj uPqU8zP7LJ_NgMhYD4i9FUm3_raJP8CiWp2Fru3evdmodZcSFH0xLuwWS4sH AK.fNr1fEGvOWWF21.KFhaZxL35CDjD4KTP1csm5gWJ.02ELJvFzm_b2pXBP SFa9tt_56E3dfw8SRLw27odDNusgpIFk8X7SBjNtcKY0dScYpIlBvl9CWIyr 7wiMW.jpGHgire2esR1OUgSJzQZqqWeFn39odhqpjXj6ljDkkP3C9ZwYT53d itpCDPx_12VHwFqt5QylvY_BKFIFfLSg7jQjmkd6ZgSP28CH1ae2t2ure46B neQfb4BTISl9_CXky.Zpe05Tis2SvUkygd8aPzVLIuG7E3.e6Td.tPXLihfZ PmBaan4SDCamSMs8J_8_OWun2QO5fJ4Dls0XkO.GZzJtz3y7UAjuRnvMdD4C EHsHrFNVV7dRNpiOz8q0- X-Yahoo-SMTP: xcjD0guswBAZaPPIbxpWwLcp9Unf X-Rocket-Received: from [192.168.0.102] (pfg@190.157.126.109 with ) by smtp202.mail.gq1.yahoo.com with SMTP; 14 Jun 2013 03:53:00 -0700 PDT Message-ID: <51BAF609.9040201@FreeBSD.org> Date: Fri, 14 Jun 2013 05:52:57 -0500 From: Pedro Giffuni User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130407 Thunderbird/17.0.5 MIME-Version: 1.0 To: CeDeROM Subject: Re: kern/174060: [ext2fs] Ext2FS system crashes (buffer overflow?) References: <201306140339.r5E3d5We073332@freefall.freebsd.org> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jun 2013 11:53:52 -0000 On 14.06.2013 04:14, CeDeROM wrote: > Thank you!! :-) > > -- > CeDeROM, SQ7MHZ, http://www.tomek.cedro.info Thank you for the report... I am still working on a fix for the reallocblk issue, which happens to be a can of worms. Pedro. 
From owner-freebsd-fs@FreeBSD.ORG Fri Jun 14 12:51:44 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7E8ACD3F; Fri, 14 Jun 2013 12:51:44 +0000 (UTC) (envelope-from josefkarthauser@gmail.com) Received: from mail-wg0-x231.google.com (mail-wg0-x231.google.com [IPv6:2a00:1450:400c:c00::231]) by mx1.freebsd.org (Postfix) with ESMTP id E52521D45; Fri, 14 Jun 2013 12:51:43 +0000 (UTC) Received: by mail-wg0-f49.google.com with SMTP id a12so461576wgh.28 for ; Fri, 14 Jun 2013 05:51:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=WXoR6eZeOIXmv9TKhiys0rQMPHQV+SdWKECKdmhBQ/M=; b=RIsPWINv+dXnITu0ky8QQj+VjColWoba5hlA+bUB1kApkL4UmN1Xhbhm/r2jtivvtV rn0IfYFAHFwFCJIUMId9Bv1biS3XM6P5R9HQ2kj+vej6a5Zi2eujgj1MgacFzikfMy8A 5wcw7mZ/pr/Gl9K8fsyUBl3m0sK0NbHYHPFuv5/mVtbJ0915yrri8X9dYujp1Ry4ExSh YMyeU7eNq8MecuZJglFaX8yVStNLWwQMaqroF5WlbJwBKoUylNmDC48FeFMS7E9Nmv7T q39BynSUmaX0tACHJEFpSjgLKIRG39Q8QgJt9hfxbLnQbcSNRY05SLrI16SqA/dp3bda Ldgw== X-Received: by 10.194.242.136 with SMTP id wq8mr1367543wjc.60.1371214303095; Fri, 14 Jun 2013 05:51:43 -0700 (PDT) Received: from ?IPv6:2001:8b0:3a3::ad3c:495b:5c7b:32f1? ([2001:8b0:3a3:0:ad3c:495b:5c7b:32f1]) by mx.google.com with ESMTPSA id fo10sm3008607wib.8.2013.06.14.05.51.40 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 14 Jun 2013 05:51:42 -0700 (PDT) Subject: Re: Help! :( ZFS panic on boot, importing pool after server crash. Mime-Version: 1.0 (Mac OS X Mail 6.3 \(1503\)) Content-Type: text/plain; charset=us-ascii From: Dr Josef Karthauser In-Reply-To: <51BAF7E3.4020401@gmail.com> Date: Fri, 14 Jun 2013 13:51:40 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <9AF22029-B753-4D74-A798-11C0A1C55D88@gmail.com> References: <301B4131-F677-4B8D-ABF6-A6D269FE604E@gmail.com> <51BAF7E3.4020401@gmail.com> To: Volodymyr Kostyrko X-Mailer: Apple Mail (2.1503) Cc: "freebsd-stable@freebsd.org" , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jun 2013 12:51:44 -0000 On 14 Jun 2013, at 12:00, Volodymyr Kostyrko wrote: > 14.06.2013 12:55, Dr Josef Karthauser: >> Hi, I'm a bit at the end of my tether. >> p.s. the config, btw, is a ZFS mirror on two ad devices. It's got a = ZFS root file system. >=20 > If you are fairly sure about your devices you can: >=20 > 1. Remove second disk from pool or create another pool on top of it. >=20 > 2. Recreate all FS structure on the second disk. You can dump al your = FS with something like: >=20 Great. Thanks for that. Have you got a hint as to how I can get access to the root file system? = It's currently set to have a legacy mount point. Which means that when = I import the pool: # zfs import -o readonly=3Don -o altroot=3D/tmp/zfs -f poolname the root filesystem is missing. Then if I try and set the mount point: #zfs set mountpoint=3D/tmp/zfs2 poolname it just sits there; probably because the command is blocking on the R/O = pool, or something. How do I temporarily remount the root filesystem so that I can get = access to the files? 
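(A minimal sketch of one way to reach a legacy-mountpoint root dataset from the live environment; the dataset name poolname/ROOT is only a guess, substitute whatever the listing shows:

# zpool import -N -o readonly=on -f poolname
# zfs list -r -o name,mountpoint poolname
# mount -t zfs poolname/ROOT /tmp/zfs
# zfs mount -a

Because the pool itself is imported read-only, the resulting mounts stay read-only as well.)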
Thanks, Joe From owner-freebsd-fs@FreeBSD.ORG Fri Jun 14 13:06:42 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id EFC17928; Fri, 14 Jun 2013 13:06:42 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-lb0-f174.google.com (mail-lb0-f174.google.com [209.85.217.174]) by mx1.freebsd.org (Postfix) with ESMTP id 48E001E20; Fri, 14 Jun 2013 13:06:42 +0000 (UTC) Received: by mail-lb0-f174.google.com with SMTP id x10so577997lbi.33 for ; Fri, 14 Jun 2013 06:06:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=/61i0b7zhMma+Wf2YiJn0SDe+BaC8XnCfd2m+7eHpJU=; b=CoGk7IcI0PcqPOmd6thaf6JWHGMG7oNO6x9lSjlDgdpjvGFprMgnXWmcdxMZVANF1v v4b6q97GIpDCfgAQgyEywwEfKb4xbnNBpf7WBanzIAig0ltnvSaRBVMdRGha7oUvx9FK bgmqRlJjozhEz8Rbf8jt1a1XM1/UaWeg6HlUV+nGz0TOXz/cK2IINOzCgTIwfcylcAf4 YCHEh/e7uJhcYppq7JGBG+6rCW+R1cn3WcRkzwUQSstl5WVp553aDmLAJgH/6kjVbTlf KO2YcDFdWRZYb9dOc4iFNBslpXK5ZQEbC9mAq+CrsQfz37GnMAX4aFAW7tiQGZMnIILd FCXQ== X-Received: by 10.112.144.6 with SMTP id si6mr1090609lbb.61.1371215195200; Fri, 14 Jun 2013 06:06:35 -0700 (PDT) Received: from [192.168.1.125] (mau.donbass.com. [92.242.127.250]) by mx.google.com with ESMTPSA id p20sm842159lbb.17.2013.06.14.06.06.34 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 14 Jun 2013 06:06:34 -0700 (PDT) Message-ID: <51BB1559.7050803@gmail.com> Date: Fri, 14 Jun 2013 16:06:33 +0300 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:20.0) Gecko/20100101 Firefox/20.0 SeaMonkey/2.17.1 MIME-Version: 1.0 To: Dr Josef Karthauser Subject: Re: Help! :( ZFS panic on boot, importing pool after server crash. References: <301B4131-F677-4B8D-ABF6-A6D269FE604E@gmail.com> <51BAF7E3.4020401@gmail.com> <9AF22029-B753-4D74-A798-11C0A1C55D88@gmail.com> In-Reply-To: <9AF22029-B753-4D74-A798-11C0A1C55D88@gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: "freebsd-stable@freebsd.org" , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jun 2013 13:06:43 -0000 14.06.2013 15:51, Dr Josef Karthauser: > On 14 Jun 2013, at 12:00, Volodymyr Kostyrko wrote: > >> 14.06.2013 12:55, Dr Josef Karthauser: >>> Hi, I'm a bit at the end of my tether. > >>> p.s. the config, btw, is a ZFS mirror on two ad devices. It's got a ZFS root file system. >> >> If you are fairly sure about your devices you can: >> >> 1. Remove second disk from pool or create another pool on top of it. >> >> 2. Recreate all FS structure on the second disk. You can dump al your FS with something like: >> > > Great. Thanks for that. > > Have you got a hint as to how I can get access to the root file system? It's currently set to have a legacy mount point. Which means that when I import the pool: > > # zfs import -o readonly=on -o altroot=/tmp/zfs -f poolname > > the root filesystem is missing. Then if I try and set the mount point: > > #zfs set mountpoint=/tmp/zfs2 poolname > > it just sits there; probably because the command is blocking on the R/O pool, or something. > > How do I temporarily remount the root filesystem so that I can get access to the files? 
mount -t zfs Personally when I need to work with such pools I first import the pool with -N (nomount) option, then I mount root fs by hand and after that goes `zfs mount -a` which handles everything else. -- Sphinx of black quartz, judge my vow. From owner-freebsd-fs@FreeBSD.ORG Fri Jun 14 17:36:33 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BBD30904; Fri, 14 Jun 2013 17:36:33 +0000 (UTC) (envelope-from Devin.Teske@fisglobal.com) Received: from mx1.fisglobal.com (mx1.fisglobal.com [199.200.24.190]) by mx1.freebsd.org (Postfix) with ESMTP id 8C39B1F14; Fri, 14 Jun 2013 17:36:33 +0000 (UTC) Received: from smtp.fisglobal.com ([10.132.206.31]) by ltcfislmsgpa06.fnfis.com (8.14.5/8.14.5) with ESMTP id r5EHaWlF029633 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Fri, 14 Jun 2013 12:36:32 -0500 Received: from LTCFISWMSGMB21.FNFIS.com ([10.132.99.23]) by LTCFISWMSGHT03.FNFIS.com ([10.132.206.31]) with mapi id 14.02.0309.002; Fri, 14 Jun 2013 12:36:06 -0500 From: "Teske, Devin" To: "freebsd-fs@freebsd.org" Subject: ZFS Union Thread-Topic: ZFS Union Thread-Index: AQHOaSWmuit0fMTVvU+5jica75CJmg== Date: Fri, 14 Jun 2013 17:36:05 +0000 Message-ID: <13CA24D6AB415D428143D44749F57D7201F81804@ltcfiswmsgmb21> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.132.253.126] Content-Type: text/plain; charset="Windows-1252" Content-ID: <875536F73867504C99A7BE0F16643FDE@fisglobal.com> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.10.8794, 1.0.431, 0.0.0000 definitions=2013-06-14_06:2013-06-14,2013-06-14,1970-01-01 signatures=0 Cc: Devin Teske X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Devin Teske List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jun 2013 17:36:33 -0000 Hi List, I had an idea recently that I thought might be worth chasing. I've thought for a long time that it would be really great if I could have = (for the purpose of jails): + ZFS filesystem /vm/master + ZFS filesystem /vm/unit1 + ZFS filesystem /vm/unit2 + ZFS filesystem /vm/unit3 Unlike a ZFS snapshot/clone system, where changes in /vm/master made beyond= the snapshot for which /vm/unit{1,2,3} were cloned-from do not effect /vm/= unit{1,2,3}=85 What if there was a way to layer /vm/unit{1,2,3} in a union manner to be th= e top-layer above /vm/master. I believe that UnionFS isn't of help here specifically because I believe it= to not support the following use-case example: Step 1. touch /vm/master/foo ASIDE: /vm/unit1/foo does not exist prior to Step 1 Step 2. See that now /vm/unit{1,2,3}/foo exists (that was the litmus test for any layering filesystem, now comes the part t= hat I believe UnionFS fails) Step 3. Now, rm /vm/unit1/foo Step 4. See that /vm/master/foo is still there, but /vm/unit1/foo remains g= one Step 5. Counter to Step 4 above, See that /vm/unit2/foo and /vm/unit3/foo s= till exist In other words=85 the filesystem should be able to keep track of unlinked f= iles as a black-list. Enhancing ZFS to support union is quite sexy, because when you go down the = rabbit hole of Steps 3-5 above, you realize you may want to (as an administ= rator) "reclaim" files from a lower layer. 
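For contrast, the closest approximations available today are a clone per unit, which freezes the view of the master at the snapshot (exactly the limitation described above), or a unionfs mount over a shared lower layer. A rough sketch, with dataset and path names as placeholders:

# zfs snapshot vm/master@base
# zfs clone vm/master@base vm/unit1
# mount_unionfs -o below /vm/master /vm/unit1

Neither gives the per-unit unlink blacklist plus later reclaim behaviour proposed here, which is what the hypothetical "zfs reclaim" would add.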
This could perhaps be tacked onto the "zfs" utility (whereas if enhancing UnionFS, a new utility would need to be born). I would imagine a "zfs reclaim" (hypothetical fictional command) to allow the path from lower layers to become visible again, so long as a lower layer hasn't black-listed it from an even lower layer. The end-run production use of this would be to allow jails to "inherit" files from a lower layer but, unlike a snapshot system, continue to inherit perpetually in realtime. Going into a lower layer and making a change would immediately percolate that change to all the jails layered on top. It would also mean nice lean deltas. Layering "/foo" on top of "/" would be a quick and dirty way of cloning your base system into a new jail where all the writes go off to that directory (something existing UnionFS technologies already do) and -- something not done by existing UnionFS technologies -- unlinked files will not appear (giving the idea that, while chroot'd or jail'd into that directory, you have more control over your universe because you "rm" a file and it goes away; but of course [hypothetically] an administrator in the parent host to the jail can "reclaim" it for you from a lower layer if you want him/her to do so). Thoughts? -- Devin _____________ The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you. From owner-freebsd-fs@FreeBSD.ORG Fri Jun 14 18:00:34 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 39262BC8 for ; Fri, 14 Jun 2013 18:00:34 +0000 (UTC) (envelope-from feld@feld.me) Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by mx1.freebsd.org (Postfix) with ESMTP id 0DAF01FF9 for ; Fri, 14 Jun 2013 18:00:33 +0000 (UTC) Received: from compute4.internal (compute4.nyi.mail.srv.osa [10.202.2.44]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id 07AB9207D3 for ; Fri, 14 Jun 2013 14:00:33 -0400 (EDT) Received: from frontend1.nyi.mail.srv.osa ([10.202.2.160]) by compute4.internal (MEProxy); Fri, 14 Jun 2013 14:00:33 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=feld.me; h= content-type:to:subject:references:date:mime-version :content-transfer-encoding:from:message-id:in-reply-to; s= mesmtp; bh=XrNUxlD/NZOwpUt1+J/0OMoNnik=; b=i8w9k6KU2/mOpGk+LK9Cp 8o5nUlQ07kLo2OwDRTAbIkk6FhYj0Od4RrmZHsQ2CNYNgXrtTp+Y7RbvMT2R35l9 ttY0lcSp/zxG0y6U+cNszygBzSMwtg7xgeqvJdFtyi2wHLDjtyiZld+AGXN9hea5 XtNCT6vgMSm85GmD+CDo/g= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=content-type:to:subject:references:date :mime-version:content-transfer-encoding:from:message-id :in-reply-to; s=smtpout; bh=XrNUxlD/NZOwpUt1+J/0OMoNnik=; b=iZPq 5L9Y7OBbx8u7URG8SlGR/p2TFsJfHE86SnaQgGSHU1U3XE42LYKP0N0Z3YZUDO4S Q+Q3NmR7txfFI4nXkNHwkB6GRUkb6h1+BIia0naOI6lDimcTGhEz84ROTWCXsQMe 4XzxQUKJ1D7VDi4orxOx0iktSBSt+/lvTrdqmFk= X-Sasl-enc: pTBzqYqoLI/ys6ynXX6XQuCtVNu9uCadcZBQjxd+uSxn 1371232832 Received: from tech304.office.supranet.net (unknown [66.170.8.18]) by mail.messagingengine.com (Postfix) with ESMTPA id C3477C00E84 for
; Fri, 14 Jun 2013 14:00:32 -0400 (EDT) Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: "freebsd-fs@freebsd.org" Subject: Re: ZFS Union References: <13CA24D6AB415D428143D44749F57D7201F81804@ltcfiswmsgmb21> Date: Fri, 14 Jun 2013 13:00:32 -0500 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Mark Felder" Message-ID: In-Reply-To: <13CA24D6AB415D428143D44749F57D7201F81804@ltcfiswmsgmb21> User-Agent: Opera Mail/12.15 (FreeBSD) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jun 2013 18:00:34 -0000 On Fri, 14 Jun 2013 12:36:05 -0500, Teske, Devin wrote: > Thoughts? Yes. A unanimous yes. From owner-freebsd-fs@FreeBSD.ORG Fri Jun 14 18:51:51 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 4E5B1B09; Fri, 14 Jun 2013 18:51:51 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [IPv6:2001:5a8:4:7e72:4a5b:39ff:fe12:452]) by mx1.freebsd.org (Postfix) with ESMTP id 2D809123E; Fri, 14 Jun 2013 18:51:51 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id r5EIpkl2054401; Fri, 14 Jun 2013 11:51:46 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201306141851.r5EIpkl2054401@chez.mckusick.com> To: Devin Teske , "Teske, Devin" Subject: Re: ZFS Union In-reply-to: <13CA24D6AB415D428143D44749F57D7201F81804@ltcfiswmsgmb21> Date: Fri, 14 Jun 2013 11:51:46 -0700 From: Kirk McKusick X-Spam-Status: No, score=0.0 required=5.0 tests=MISSING_MID, UNPARSEABLE_RELAY autolearn=failed version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on chez.mckusick.com Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jun 2013 18:51:51 -0000 The union filesystem uses "whiteout" to remove files that appear in a lower layer. In your example, when you `rm /vm/unit1/foo' what happens is that a whiteout entry gets created for /vm/unit1/foo. (Whiteout is implemented by creating a name with inode number 1; Inode 1 is the "anti-inode" which when combined with any other inode disappears in a cloud of greasy smoke.) Thus /vm/master/foo continues to exist and is visible as /vm/unit2/foo and /vm/unit3/foo. You can "recover" /vm/unit1/foo using `rm -W /vm/unit1/foo' which will remove the whiteout entry causing /vm/master/foo to once again be visible as /vm/unit1/foo. In short, I believe that the existing union filesystem will do what you want to do. Kirk McKusick
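To make the whiteout behaviour concrete, here is a rough, untested sketch using Devin's /vm paths; the -o below form (first directory attached as the lower layer) and the recovery semantics are assumptions taken from mount_unionfs(8) and rm(1):

# mount_unionfs -o below /vm/master /vm/unit1   (master shows through underneath unit1; writes land in unit1)
# touch /vm/master/foo
# ls /vm/unit1/foo                              (foo is visible via the lower layer)
# rm /vm/unit1/foo                              (creates a whiteout in unit1; /vm/master/foo is untouched)
# rm -W /vm/unit1/foo                           (removes the whiteout, so foo shows through again)

The same mount would be repeated for unit2 and unit3, each keeping its own whiteouts.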
From owner-freebsd-fs@FreeBSD.ORG Fri Jun 14 19:01:24 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 56110E25; Fri, 14 Jun 2013 19:01:24 +0000 (UTC) (envelope-from Devin.Teske@fisglobal.com) Received: from mx1.fisglobal.com (mx1.fisglobal.com [199.200.24.190]) by mx1.freebsd.org (Postfix) with ESMTP id 24AD812B2; Fri, 14 Jun 2013 19:01:23 +0000 (UTC) Received: from smtp.fisglobal.com ([10.132.206.17]) by ltcfislmsgpa03.fnfis.com (8.14.5/8.14.5) with ESMTP id r5EJ1Ej6015037 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Fri, 14 Jun 2013 14:01:16 -0500 Received: from LTCFISWMSGMB21.FNFIS.com ([10.132.99.23]) by LTCFISWMSGHT06.FNFIS.com ([10.132.206.17]) with mapi id 14.02.0309.002; Fri, 14 Jun 2013 14:01:11 -0500 From: "Teske, Devin" To: Kirk McKusick Subject: Re: ZFS Union Thread-Topic: ZFS Union Thread-Index: AQHOaTA7lOdSGgtNJkChFq/RufGZx5k15IEA Date: Fri, 14 Jun 2013 19:01:11 +0000 Message-ID: <13CA24D6AB415D428143D44749F57D7201F81A60@ltcfiswmsgmb21> References: <201306141851.r5EIpkl2054401@chez.mckusick.com> In-Reply-To: <201306141851.r5EIpkl2054401@chez.mckusick.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.132.253.126] MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.10.8794, 1.0.431, 0.0.0000 definitions=2013-06-14_07:2013-06-14,2013-06-14,1970-01-01 signatures=0 Content-Type: text/plain; charset="Windows-1252" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: "freebsd-fs@freebsd.org" , Devin Teske X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Devin Teske List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jun 2013 19:01:24 -0000 On Jun 14, 2013, at 11:51 AM, Kirk McKusick wrote: The union filesystem uses "whiteout" to remove files that appear in a lower layer. In your example, when you `rm /vm/unit1/foo' what happens is that a whiteout entry gets created for /vm/unit1/foo. (Whiteout is implemented by creating a name with inode number 1; Inode 1 is the "anti-inode" which when combined with any other inode disappears in a cloud of greasy smoke.) WINO... yes... just as your response came in, I was finding the code... http://svnweb.freebsd.org/base/head/sys/ufs/ufs/ufs_lookup.c?r1=156418&r2=160269

	if (ep->d_ino == 0 ||
-	    (ep->d_ino == WINO &&
+	    (ep->d_ino == WINO && namlen == dirp->d_namlen &&
	     bcmp(ep->d_name, dirp->d_name, dirp->d_namlen) == 0)) {

Thus /vm/master/foo continues to exist and is visible as /vm/unit2/foo and /vm/unit3/foo. You can "recover" /vm/unit1/foo using `rm -W /vm/unit1/foo' which will remove the whiteout entry causing /vm/master/foo to once again be visible as /vm/unit1/foo. Beautiful... that was my next consternation after seeing that it was in the filesystem layer (how to reset the value from WINO to something that will allow the lower layer to bleed through). In short, I believe that the existing union filesystem will do what you want to do. Kirk McKusick Absolutely right... thank you much Sir! I didn't know about "rm -W" until today. -- Devin _____________ The information contained in this message is proprietary and/or confidential.
If you are not the intended recipient, please: (i) delete the message an= d all copies; (ii) do not disclose, distribute or use the message in any ma= nner; and (iii) notify the sender immediately. In addition, please be aware= that any message addressed to our domain is subject to archiving and revie= w by persons other than the intended recipient. Thank you. From owner-freebsd-fs@FreeBSD.ORG Sat Jun 15 14:52:45 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 23FF62AD for ; Sat, 15 Jun 2013 14:52:45 +0000 (UTC) (envelope-from break19@gmail.com) Received: from mail-yh0-x232.google.com (mail-yh0-x232.google.com [IPv6:2607:f8b0:4002:c01::232]) by mx1.freebsd.org (Postfix) with ESMTP id DDEBB1A8E for ; Sat, 15 Jun 2013 14:52:44 +0000 (UTC) Received: by mail-yh0-f50.google.com with SMTP id i72so525901yha.37 for ; Sat, 15 Jun 2013 07:52:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:subject:references :in-reply-to:content-type:content-transfer-encoding; bh=F+SWI9D/WnONJLM30NukGnOlFceyXrTv7xvmqREmrBU=; b=fkyIM4EuGsnZEiXJJhg3JxGHuFyrAIEk4GIv2OHMjKjO8qZvWZKQrlZpL9x3kpLUi5 gcCu4mRrZN0zsy5WkYPCgzW0f/bXuZdY7fd5tfIlkEqTyN0tzmv0ghuZSdbD0CKS+PGJ ymzTYMZpApwxtweUyxioHSKEctC85ZjkU53whMf4UNT5A0C8TfKdbtbJ5RNZjctZF5YP ulS7h+9rJpWW6v3YXKdKNEe4Rg69yu+HIy25PSbvmZRJYdM4i34RkbBcfltFd44eKbi5 TMzj53oW1kj6gq2ffLbhip5evdF1VTttxsLICoQUOTsiDeeGfYfY2AIHraHdSYx3j2Bv lz1g== X-Received: by 10.236.17.165 with SMTP id j25mr4094624yhj.89.1371307964174; Sat, 15 Jun 2013 07:52:44 -0700 (PDT) Received: from [192.168.2.16] (173-17-218-61.client.mchsi.com. [173.17.218.61]) by mx.google.com with ESMTPSA id a62sm10842574yhk.4.2013.06.15.07.52.42 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sat, 15 Jun 2013 07:52:43 -0700 (PDT) Message-ID: <51BC7FB1.8040000@gmail.com> Date: Sat, 15 Jun 2013 09:52:33 -0500 From: Chuck Burns User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130509 Thunderbird/17.0.6 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Changing the default for ZFS atime to off? References: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> <20130608213331.GB18201@icarus.home.lan> <01719722FD8A41B4A4366611972A703A@multiplay.co.uk> <20130609001532.GA21540@icarus.home.lan> <459E2FCADB4E40079066E4ABDBE47AFE@multiplay.co.uk> In-Reply-To: <459E2FCADB4E40079066E4ABDBE47AFE@multiplay.co.uk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Jun 2013 14:52:45 -0000 On 6/8/2013 7:48 PM, Steven Hartland wrote: > ----- Original Message ----- From: "Jeremy Chadwick" > >>> To clarify when I say "by default" this only effect newly created >>> pools / volumes, it would not effect any existing volumes and hence >>> couldn't break existing installs. >>> >>> As I mentioned there are apps, mainly mail focused ones, which rely >>> on on atime, but thats easy to keep working by ensuring these are >>> stored on volumes which do have atime=on. >> >> The problem is that your proposed change (to set atime=off as the >> default) means the administrator: >> >> 1. Has to be aware that the default is now atime=off going forward, >> and thus, >> >> 2. 
Must manually set atime=on on filesystems where it matters, which may >> also mean creating a separate filesystem just for certain >> purposes/tasks (which may not be possible with UFS after-the-fact). >> >> The reality of #1, I'm sorry to say, is that barring some kind of mass >> announcement on every single FreeBSD mailing list (I don't mean just >> -announce, I mean EVERY LIST) to inform people of this change, as well >> as some gigantic 72pt font text on www.freebsd.org telling people, most >> people are not going to know about it. I know that reality doesn't work >> in your favour, but it's how things are. A single line in the Release >> Notes is going to be overlooked. >> >> I cannot even begin to cover all the situations/cases of #2, so I'll >> just do a brain dump as I think: >> >> i) ZFS: You might think this is as easy as creating a separate >> filesystem that's for /var/mail -- it is not that simple. Many people >> have their mail delivered to mboxes within $HOME, i.e. ~user/Mail, and >> /var/mail never gets used. It worsens when you consider people are >> being insane with ZFS filesystems, such as creating a separate >> filesystem for every single user on the system. >> >> ii) With UFS, you might think it's as easy as removing noatime from >> /etc/fstab for /var, but it isn't -- same situation as (i). >> >> iii) There is the situation with UFS and bsdinstall where you can choose >> the "quick and easy" partitioning/filesystem setu results in one big / >> and that's all. Now the admin has to remove noatime from /etc/fstab and >> basically loses any benefit noatime provided per your proposal. > > The initial question was for ZFS, with UFS being secondary, but yes > UFS isn't as easy as UFS. > >> iv) It is very common for setups to have two separate places for mail >> storage, i.e. the default is /var/mail/username, but users with a >> .forward and/or .procmailrc may be siphoning mail to $HOME/Mail/folder >> instead. So now you have two filesystems where atime needs to be >> enabled. > > Could that not be covered by: /var /home for the common case at least? > >> v) Non-mail-related stuff, meaning there may actually be users and >> administrators who rely upon access times to indicate something. >> >> None of these touche base on what Bruce Evans stated too: that atime=on >> by default is a requirement to be POSIX-compliant. That's also >> confirmed here at Wikipedia WRT stat(2) (which also mentions some other >> software that relies on atime too): >> >> http://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime > > So yes others think its a less than stellar idea ;-) > >>> The messaging and changes to installers which support ZFS root >>> installs, such as mfsbsd, would need to be included in this but >>> I don't see that as a blocker. >> >> See above -- I think you are assuming mail always gets stored on one >> filesystem, which quite often not the case. > > Its still seems simple to fix, see above. > >>> I suggesting this now as it seems like its time to consider that >>> the vast majority of systems don't need this option for all volumes >>> and the performance and reliability of systems are in question if >>> we don't consider it. >> >> My personal feeling is that this is extremely hasty -- do we have any >> idea how much software relies on atime? Because I certainly don't. 
> > Hasty no, just opening the idea up for discussion ;-) > >> Sorry for sounding rude (I don't mean to be, I just can't be bothered to >> phrase it differently), but: were you yourself even aware that atime was >> relied upon/used for classic UNIX mailboxes? I get the impression you >> weren't, which just strengthens my point. > > Yes I am aware, which is why I mentioned mail in my original post. > >> For example, I use atime everywhere, simply because I do not know what >> might break/stop working reliably if atime was disabled on some >> filesystems. I do not know the internals of every single daemon and >> program on a system (does anyone?), so I must take the stance of >> choosing stability/reliability. > > I did already mention, we set atime=off on everything and have never had > an issue, there's been similar mentions on the illumos list too. > > Now that doesn't mean its suitable for everthing, mail has already been > mentioned, but thats still seems like a small set of use cases where its > required. > > I guess where I'm coming from is making better for the vast majority. > > I believe there's no point in configuring for a rare case by default > when it will make the much more common case worse. > >> All said and done: I do appreciate having this discussion, particularly >> publicly on a list. Too many "key changes" in FreeBSD in the past few >> years have been results of closed-door meetings of sorts (private mail >> or in-person *con meetings), so the fact this is public is good. > > Everyone has their different uses of any OS, different experience etc, > so things like this need open discussion IMO. > > Regards > Steve > > ================================================ > This e.mail is private and confidential between Multiplay (UK) Ltd. > and the person or entity to whom it is addressed. In the event of > misdirection, the recipient is prohibited from using, copying, > printing or otherwise disseminating it or any information contained in > it. > In the event of misdirection, illegible or incomplete transmission > please telephone +44 845 868 1337 > or return the E.mail to postmaster@multiplay.co.uk. > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" Plainly put, FreeBSD almost -never- changes the defaults. They're steady, reliable, and pretty much set in stone. It's the entire point. If you want something where the defaults can change from day to day, then perhaps FreeBSD is not for you. Personally, I don't mind having these defaults, as long as I can change them. 
I mean, seriously, it really isn't all that hard to type "zfs set atime=off /some/pool/some/where" -- Chuck Burns From owner-freebsd-fs@FreeBSD.ORG Sat Jun 15 14:54:54 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 3B2C233C for ; Sat, 15 Jun 2013 14:54:54 +0000 (UTC) (envelope-from break19@gmail.com) Received: from mail-yh0-x233.google.com (mail-yh0-x233.google.com [IPv6:2607:f8b0:4002:c01::233]) by mx1.freebsd.org (Postfix) with ESMTP id 03F751AA7 for ; Sat, 15 Jun 2013 14:54:53 +0000 (UTC) Received: by mail-yh0-f51.google.com with SMTP id l109so517498yhq.10 for ; Sat, 15 Jun 2013 07:54:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:subject:references :in-reply-to:content-type:content-transfer-encoding; bh=9NcFW0mxfLLv+IAQJAEamNkxZ2UTEBK1ZaTwOa81GH0=; b=ON601YwhjctUNQXRU1fIBVqH/dF01aFzWVVjjgfyPLwMgndtJnYacM+JRrU292gBtQ YXkcnTXjaWxENZORu1/MUIbxkkg5zQk/92m6DLIOLWd7AbZ4rD/Pnjevzac0Nokkx3uy xEKNN2872MycEFC1kAfxFTp3Bch9u1/F2M+DQC2Rw3Qz/fa/ntHFYKaYY0c7YD1Fmn54 u42+U2cVSwrPELoxXRfB3V5lE7LflcsVrQr+NLYtX7atKT9D2S7ozcZF+yEjad0LgtfB A0ECGqeIWVIHGBlJoRRH8NMKJJAHYO6AZyQp5Y3kkRoX7W8Vb0oaKDTX24geQB/LIhzp Ey7Q== X-Received: by 10.236.139.75 with SMTP id b51mr4202234yhj.6.1371308093592; Sat, 15 Jun 2013 07:54:53 -0700 (PDT) Received: from [192.168.2.16] (173-17-218-61.client.mchsi.com. [173.17.218.61]) by mx.google.com with ESMTPSA id y24sm10759026yhn.20.2013.06.15.07.54.52 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sat, 15 Jun 2013 07:54:53 -0700 (PDT) Message-ID: <51BC8033.6020605@gmail.com> Date: Sat, 15 Jun 2013 09:54:43 -0500 From: Chuck Burns User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130509 Thunderbird/17.0.6 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Changing the default for ZFS atime to off? References: <16FEF774EE8E4100AD2CAEC65276A49D@multiplay.co.uk> <2AC5E8F4-3AF1-4EA5-975D-741506AC70A5@my.gd> In-Reply-To: <2AC5E8F4-3AF1-4EA5-975D-741506AC70A5@my.gd> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Jun 2013 14:54:54 -0000 On 6/9/2013 5:39 AM, Damien Fleuriot wrote: > On 8 Jun 2013, at 20:54, "Steven Hartland" wrote: > >> One of the first changes we make here when installing machines >> here to changing atime=off on all ZFS pool roots. >> >> I know there are a few apps which can rely on atime updates >> such as qmail and possibly postfix, but those seem like special >> cases for which admins should enable atime instead of the other >> way round. >> >> This is going to of particular interest for flash based storage >> which should avoid unnessacary writes to reduce wear, but it will >> also help improve performance in general. >> >> So what do people think is it worth considering changing the >> default from atime=on to atime=off moving forward? >> >> If so what about UFS, same change? >> > > I strongly oppose the change for reasons already raised by many people regarding the mbox file. > > Besides, if atime should default to off on 2 filesystems and on on all others, that would definitely create confusion. 
> > Last, I believe it should be the admin's decision to turn atime off, just like it is his decision to turn compression on. > > Don't mistake me, we turn atime=off on every box, every filesystem, even on Mac's HFS. > Yet I believe defaulting it to off is a mistake. > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" +1 here. I, too, usually turn it off, and doing so isn't especially difficult. Changing DEFAULTS is only good when the defaults actually break stuff. -- Chuck Burns
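For anyone who wants either default locally, the per-dataset toggle the thread keeps coming back to is a one-liner; a minimal sketch, assuming a pool named "tank" and a mail dataset at tank/var/mail (placeholder names):

# zfs get -r atime tank              (show the current setting and where it is inherited from)
# zfs set atime=off tank             (descendant datasets inherit the new default)
# zfs set atime=on tank/var/mail     (keep atime where mbox-style mail relies on it)
# zfs inherit atime tank/var/mail    (drop the override again later)

The UFS equivalent is the noatime mount option, either in the options field of /etc/fstab or on a live filesystem with `mount -u -o noatime /var`.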