From owner-freebsd-fs@FreeBSD.ORG Sun Sep 11 18:50:58 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DB84B1065675 for ; Sun, 11 Sep 2011 18:50:57 +0000 (UTC) (envelope-from jhellenthal@gmail.com) Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com [209.85.218.54]) by mx1.freebsd.org (Postfix) with ESMTP id 8DD618FC1D for ; Sun, 11 Sep 2011 18:50:57 +0000 (UTC) Received: by yib19 with SMTP id 19so2410717yib.13 for ; Sun, 11 Sep 2011 11:50:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to; bh=JPNM92sW5F2DAEUG2RXqFqj3XwOrqWFWL2zjJtItRmA=; b=ItSMaglvNq1JNNERN+9N0QwwCgq9dkAYGR9jMA7116KvT6MOPAOg/DAZxSX8MurUeJ Ra2TO0nhCLnd/ifUP/tHppv8In/lwa2BNr+jDGpAGRh601TsB2HDKGSNUEvfzFQ7sa0t lNIaJ+keOKradLMdgm7VlJhWFbjR2DUKtgX/4= Received: by 10.236.187.70 with SMTP id x46mr22359263yhm.71.1315767056654; Sun, 11 Sep 2011 11:50:56 -0700 (PDT) Received: from DataIX.net (adsl-99-190-81-85.dsl.klmzmi.sbcglobal.net [99.190.81.85]) by mx.google.com with ESMTPS id o21sm12108770yhi.8.2011.09.11.11.50.53 (version=TLSv1/SSLv3 cipher=OTHER); Sun, 11 Sep 2011 11:50:53 -0700 (PDT) Sender: "J. 
Hellenthal" Received: from DataIX.net (localhost [127.0.0.1]) by DataIX.net (8.14.5/8.14.5) with ESMTP id p8BIooGx063108 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 11 Sep 2011 14:50:51 -0400 (EDT) (envelope-from jhell@DataIX.net) Received: (from jhell@localhost) by DataIX.net (8.14.5/8.14.5/Submit) id p8BIooLh063107; Sun, 11 Sep 2011 14:50:50 -0400 (EDT) (envelope-from jhell@DataIX.net) Date: Sun, 11 Sep 2011 14:50:49 -0400 From: Jason Hellenthal To: Jeremy Chadwick Message-ID: <20110911185049.GA62897@DataIX.net> References: <4E688BE4.8040602@karlov.mff.cuni.cz> <20110909034855.GA70001@DataIX.net> <20110909043601.GA49649@icarus.home.lan> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="k+w/mQv8wyuph6w0" Content-Disposition: inline In-Reply-To: <20110909043601.GA49649@icarus.home.lan> Cc: freebsd-fs@freebsd.org Subject: Re: ZFS monitoring X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 11 Sep 2011 18:50:58 -0000 --k+w/mQv8wyuph6w0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Sep 08, 2011 at 09:36:01PM -0700, Jeremy Chadwick wrote: > On Thu, Sep 08, 2011 at 11:48:55PM -0400, Jason Hellenthal wrote: > >=20 > >=20 > > On Thu, Sep 08, 2011 at 11:33:24AM +0200, Tom?? Drbohlav wrote: > > > Hi, > > >=20 > > > Dne 8.9.2011 11:27, Borja Marcos napsal(a): > > > > > > > > Hello, > > > > > > > > Some time ago I wrote a FreeBSD data collector for Orca (www.orcawa= re.com/orca). Seems it wasn't much of a success, but I use it all the time = :) > > > > > > > > I would like to add some ZFS aware monitoring. Any suggestions for= key statistics that would be useful to see in graphic form? Something like= cache hits/misses, etc. 
> > >
> > > we've got great experience with http://cuddletech.com/arc_summary/ (btw adapted to Nagios); actually, we use some version with added L2 cache stats.
> > >
> > > Drb
> > >
> >
> > You have considerably missed this then:
> >
> > https://jhell.googlecode.com/files/arc_summary.pl
> >
> > and the current repository version:
> >
> > http://bit.ly/arc_summary
> >
> > Which at this time are essentially the same.
> >
> > sysutils/zfs-stats is essentially a clone of an older version with
> > slight modifications.
>
> I could have a field day fixing this perl script. Would you like me to
> contact you off-list to discuss all the problems with it? There are
> many (at least 6 just from skimming).

I am always open to suggestions, patches, and issue reports. They are always appreciated.

>
> I would worry if this script was used by something like Nagios natively.
> Major eek.
>
> --
> | Jeremy Chadwick jdc at parodius.com |
> | Parodius Networking http://www.parodius.com/ |
> | UNIX Systems Administrator Mountain View, CA, US |
> | Making life hard for others since 1977.
PGP 4BD6C0CB | >=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" --k+w/mQv8wyuph6w0 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) Comment: http://bit.ly/0x89D8547E iQEcBAEBAgAGBQJObQMJAAoJEJBXh4mJ2FR+SYEH/3CGbyFAr26sXTHq2jtH2dYn 273GaAY17CfKqW3frT0zgIiyKDzhzuS1HAQL0E5FLRhVZvn+uKk2Vgyq7qJJPH5N 6kO1G5Lbi38AktgClc3qosK33aMuu7GQg/DpGHPaFymcMwBwyXyfqp3KBrf9cmgu +XRrzi7jaqAHVL6gO9Bx+qVWwPUI3iYcJ+wRa1rYzbjfDUfX7naQfPoitUf2rpZ6 Dgd64YjgHs8R0xWNcuwsBwAR/6HeW9gf5zhZua+G9Ce0hKuAEWP1GFbgJKc2DdbW xFZWoy0Rk3wkTKf5d7vYu23rUK+6BCnf0WtZg6d5mD0NxUPRny49QX4vFeLdI2k= =SLwb -----END PGP SIGNATURE----- --k+w/mQv8wyuph6w0-- From owner-freebsd-fs@FreeBSD.ORG Sun Sep 11 18:55:51 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2B0B1106566C for ; Sun, 11 Sep 2011 18:55:51 +0000 (UTC) (envelope-from jhellenthal@gmail.com) Received: from mail-gx0-f179.google.com (mail-gx0-f179.google.com [209.85.161.179]) by mx1.freebsd.org (Postfix) with ESMTP id D1A2A8FC17 for ; Sun, 11 Sep 2011 18:55:50 +0000 (UTC) Received: by gxk1 with SMTP id 1so3104055gxk.10 for ; Sun, 11 Sep 2011 11:55:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to; bh=S+BNlN7v/Z0F1RPS8ImYj9A1lkqClv4RI2TRQuX2oBk=; b=vxV2+eb+C58jliDSLrZGsqoBV6yZe+loJcTm4h0kDj6yR0jpr2WUpAEx+dX3t1hKAW k7/8K6SK6mk3WVQ7SUKL7J1Lqqfe/sG84qs0RCA5s2w1+3RdQ3gWMniv6bHz7DdOXyml AvBPVAWOcfQgA1h04iF9vxzpGQeP8oHtBAvUs= Received: by 10.101.154.17 with SMTP id g17mr3412578ano.32.1315767350210; Sun, 11 Sep 2011 11:55:50 -0700 (PDT) Received: from DataIX.net 
(adsl-99-190-81-85.dsl.klmzmi.sbcglobal.net [99.190.81.85]) by mx.google.com with ESMTPS id m4sm11593100ang.4.2011.09.11.11.55.48 (version=TLSv1/SSLv3 cipher=OTHER); Sun, 11 Sep 2011 11:55:48 -0700 (PDT)
Sender: "J. Hellenthal"
Received: from DataIX.net (localhost [127.0.0.1]) by DataIX.net (8.14.5/8.14.5) with ESMTP id p8BItjjD063566 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 11 Sep 2011 14:55:46 -0400 (EDT) (envelope-from jhell@DataIX.net)
Received: (from jhell@localhost) by DataIX.net (8.14.5/8.14.5/Submit) id p8BIti2u063565; Sun, 11 Sep 2011 14:55:44 -0400 (EDT) (envelope-from jhell@DataIX.net)
Date: Sun, 11 Sep 2011 14:55:44 -0400
From: Jason Hellenthal
To: Jeremy Chadwick
Message-ID: <20110911185544.GB62897@DataIX.net>
References: <4E688BE4.8040602@karlov.mff.cuni.cz> <20110909034855.GA70001@DataIX.net> <20110909043601.GA49649@icarus.home.lan>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="5/uDoXvLw7AC5HRs"
Content-Disposition: inline
In-Reply-To: <20110909043601.GA49649@icarus.home.lan>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS monitoring
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
X-List-Received-Date: Sun, 11 Sep 2011 18:55:51 -0000

--5/uDoXvLw7AC5HRs
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, Sep 08, 2011 at 09:36:01PM -0700, Jeremy Chadwick wrote:
> On Thu, Sep 08, 2011 at 11:48:55PM -0400, Jason Hellenthal wrote:
> >
> > On Thu, Sep 08, 2011 at 11:33:24AM +0200, Tomáš Drbohlav wrote:
> > > Hi,
> > >
> > > On 8.9.2011 11:27, Borja Marcos wrote:
> > > >
> > > > Hello,
> > > >
> > > > Some time ago I wrote a FreeBSD data collector for Orca (www.orcaware.com/orca). Seems it wasn't much of a success, but I use it all the time :)
> > > >
> > > > I would like to add some ZFS-aware monitoring. Any suggestions for key statistics that would be useful to see in graphic form? Something like cache hits/misses, etc.
> > >
> > > we've got great experience with http://cuddletech.com/arc_summary/ (btw adapted to Nagios); actually, we use some version with added L2 cache stats.
> > >
> > > Drb
> > >
> >
> > You have considerably missed this then:
> >
> > https://jhell.googlecode.com/files/arc_summary.pl
> >
> > and the current repository version:
> >
> > http://bit.ly/arc_summary
> >
> > Which at this time are essentially the same.
> >
> > sysutils/zfs-stats is essentially a clone of an older version with
> > slight modifications.
>
> I could have a field day fixing this perl script. Would you like me to
> contact you off-list to discuss all the problems with it? There are
> many (at least 6 just from skimming).
>
> I would worry if this script was used by something like Nagios natively.
> Major eek.

Just for reference, and to keep this part of the thread separate: this script was not designed to work with Nagios. Cron jobs and terminal use, yes. If something else is being done with it, I would suspect it is close to what I have been working on for a long time now: monitoring via SNMP and spilling the values out into XML, so they can be converted into whichever format you would like, along with logging them to an RRD that can store the values for up to 10 years and export graphs.
--5/uDoXvLw7AC5HRs Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) Comment: http://bit.ly/0x89D8547E iQEcBAEBAgAGBQJObQQwAAoJEJBXh4mJ2FR+GJAH/R2pJyoVpAkgFQSj8+Vp15Is LJVBo8U1yYIhQ9USVQZjRCoZtb1EtYelLsKZqlnfr+nGayPabD0t/HBBiuPoofeM LlG0wkhiilFECyg/EI/Jyv64ux04GEP8IMAFct8tj2GbaJjZTwQt8CHKmjSAG9WH AYc09ba1W/HtwFUNRlwSgNUkUN5hvd7T+TM1N6jwB6X7Ffq9nl9FTKGeotNjb+LU /L/rYNefgLLVMvdgF61Jy6XqQi5vsZVPmuGTHFIlHqVRMBD3aqc8XJ1r4qagbXF4 O8UP9OnuA79lvsaHG8icJmScPfCDg5HSv8288SfTF7Njp2Cufwwz9o9dVfix4ro= =Gazs -----END PGP SIGNATURE----- --5/uDoXvLw7AC5HRs-- From owner-freebsd-fs@FreeBSD.ORG Sun Sep 11 19:03:17 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E95921065670 for ; Sun, 11 Sep 2011 19:03:16 +0000 (UTC) (envelope-from jhellenthal@gmail.com) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by mx1.freebsd.org (Postfix) with ESMTP id 9CFF78FC13 for ; Sun, 11 Sep 2011 19:03:16 +0000 (UTC) Received: by yxk36 with SMTP id 36so3310437yxk.13 for ; Sun, 11 Sep 2011 12:03:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to; bh=UZe3gpNmBIHTJmoOI+2vrLnEu8ICWqG9iMhgypCAATU=; b=exx6uLdDP0PeRM9ie2Jqo7dz/wYmNfb1IflL5UjF1fPYCFMeYi8leQX8sTm6YBiZnc BVvBh7szIKkGv1mqe+7GlF20+NVkQjxX2ztR0XLNhgi9V54IVsyqhoViy5NSKWU/PR/g c1m88wHXy8IKI5yK8KrZjyVgb9eOZGJW3LVhM= Received: by 10.42.145.129 with SMTP id f1mr1414432icv.64.1315767795551; Sun, 11 Sep 2011 12:03:15 -0700 (PDT) Received: from DataIX.net (adsl-99-190-81-85.dsl.klmzmi.sbcglobal.net [99.190.81.85]) by mx.google.com with ESMTPS id v2sm18889451ibg.2.2011.09.11.12.03.13 (version=TLSv1/SSLv3 cipher=OTHER); Sun, 11 Sep 2011 12:03:14 -0700 (PDT) Sender: "J. 
Hellenthal" Received: from DataIX.net (localhost [127.0.0.1]) by DataIX.net (8.14.5/8.14.5) with ESMTP id p8BJ39d8064512 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 11 Sep 2011 15:03:10 -0400 (EDT) (envelope-from jhell@DataIX.net) Received: (from jhell@localhost) by DataIX.net (8.14.5/8.14.5/Submit) id p8BJ39Kg064511; Sun, 11 Sep 2011 15:03:09 -0400 (EDT) (envelope-from jhell@DataIX.net) Date: Sun, 11 Sep 2011 15:03:09 -0400 From: Jason Hellenthal To: Patrick Proniewski Message-ID: <20110911190309.GC62897@DataIX.net> References: MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="xesSdrSSBC0PokLI" Content-Disposition: inline In-Reply-To: Cc: FreeBSD Filesystems Subject: Re: ZFS: zpool history size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 11 Sep 2011 19:03:17 -0000 --xesSdrSSBC0PokLI Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Sep 09, 2011 at 10:49:58AM +0200, Patrick Proniewski wrote: > Hello, >=20 > I have a production server hosting about 250 web sites. I'm using ZFS, so= that I can create a dedicated FS whenever a user wants to create a web sit= e. > I've started 6 mouths ago to dump `zpool history` into a file, daily, and= to commit the output to a remote subversion server, so that if I need I co= uld recreate my storage pool with all its properties. > I've noticed that in fact the `zpool history` is limited. I've many autom= ated daily/weekly/monthly snapshots creations and that "noise" is filling t= he history log and overwrites my precious history. 
>
> Today, my history log is only 13 valuable lines long:
>
> # zpool history | grep -v @
> History for 'tank':
> 2011-02-22.14:17:10 zpool create tank da1
> 2011-06-30.16:46:38 zfs create tank/Sites/m
> 2011-06-30.16:46:38 zfs set refquota=500M tank/Sites/m
> 2011-07-11.10:46:16 zfs create tank/Sites/sites/egales-uk
> 2011-07-11.10:46:17 zfs set refquota=500M tank/Sites/sites/egales-uk
> 2011-07-11.10:46:41 zfs create tank/Sites/sites/egales-es
> 2011-07-11.10:46:42 zfs set refquota=500M tank/Sites/sites/egales-es
> 2011-07-11.10:46:47 zfs create tank/Sites/sites/egales-ro
> 2011-07-11.10:46:47 zfs set refquota=500M tank/Sites/sites/egales-ro
> 2011-07-11.10:46:51 zfs create tank/Sites/sites/egales-se
> 2011-07-11.10:46:52 zfs set refquota=500M tank/Sites/sites/egales-se
> 2011-07-12.10:56:14 zfs set mountpoint=/Sites/a_supprimer/perso-truchaud-2011071210 tank/user/truchaud
> 2011-07-12.11:04:37 zfs destroy -r tank/user/truchaud
>
> Every interesting event between 2011-02-22 and 2011-06-30 is gone; that's more than 700 lines of ZFS commands. Thanks to svn, I lose nothing.
>
> My questions are:
> - what is the zpool history size limit?
> - and is it possible to increase its value?

I do not have this answer off the top of my head; as it was explained to me once, or as I read in some internal documentation, the pool history was supposed to stay with the pool until it was destroyed... I'll look into this though.

As for a temporary workaround, which I am fairly sure you may have already done: run `zpool history` once per day and log it to something like /var/log/zpool.history with a dated header and footer, and just continue to append to that file after you chflags sappnd,sunlnk $FILE.
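The workaround described above could be sketched as a small daily cron script. This is only an illustration, not code from the thread: the log path and pool name are the examples Jason mentions, and the `chflags` flags are FreeBSD's append-only/no-unlink file flags.

```shell
#!/bin/sh
# Sketch of the suggested workaround (paths and pool name illustrative):
# append a dated `zpool history` dump to an append-only log once per day.

LOG=/var/log/zpool.history
POOL=tank

log_history() {
    # dated header, full history dump, then a footer
    printf '=== zpool history for %s, %s ===\n' "$1" "$(date +%Y-%m-%d)"
    zpool history "$1"
    printf '=== end of dump for %s ===\n' "$1"
}

# From a daily cron job, e.g.:  0 0 * * * /usr/local/sbin/zpool-history-log
# log_history "$POOL" >> "$LOG"
# chflags sappnd,sunlnk "$LOG"   # file can only be appended to, not unlinked
```

Since the log is append-only, each day's dump is preserved even after the pool's internal history ring buffer overwrites older entries.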
--xesSdrSSBC0PokLI Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) Comment: http://bit.ly/0x89D8547E iQEcBAEBAgAGBQJObQXsAAoJEJBXh4mJ2FR+Xg8H/jdPruQHwU+mlHuLz5/p301c VKApfB1u+u1tepyh8QMhnxtYMmo6QGjKjRPyQ/ukHJAayBtkc6SExdixzSIKUzTB yfxAu+LgXhTDWMymEtbxpTvzBDJh+zM10Q1vEztgbiGfRZLvGLAg1B0R0W/WE4CD OXqszlGIBSvNzfYWXplY3y94PexR2a/0GHaizRrbulBxKfb7ZiwCm2FAgxDdQZLx E25YSKjuvmCzpXX37llFyNX+SMCcbVxsAV6Q1Do20FvY+4Udm67BZtajluomy1qA Gt9O5IgC8CQyzHd5SrkEfHfOUhGDR4HKXwAXj84gXF8IompVceaWiOjELUY+qzg= =Fbmi -----END PGP SIGNATURE----- --xesSdrSSBC0PokLI-- From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 00:56:28 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 29446106564A for ; Mon, 12 Sep 2011 00:56:28 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id DA1048FC12 for ; Mon, 12 Sep 2011 00:56:27 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAIVYbU6DaFvO/2dsb2JhbABBFoQ/pDWBUgEBAQEDAQEBIAQiJQsbGAICDRkCKTAGE4d7pHGQKYEshDGBEQSTO5E6 X-IronPort-AV: E=Sophos;i="4.68,365,1312171200"; d="scan'208";a="134054468" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 11 Sep 2011 20:56:27 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 0F46FB3F2F; Sun, 11 Sep 2011 20:56:27 -0400 (EDT) Date: Sun, 11 Sep 2011 20:56:27 -0400 (EDT) From: Rick Macklem To: Thomas Haynes Message-ID: <1219055906.1150611.1315788987030.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 
(ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, Trond Myklebust , nfsv4@ietf.org Subject: Re: [nfsv4] NFSv4.1 FreeBSD client for testing X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 00:56:28 -0000 Thomas Haynes wrote: > On Sep 8, 2011, at 8:39 PM, Trond Myklebust wrote: > > > On Thu, 2011-09-08 at 17:45 -0400, Rick Macklem wrote: > >> First off, I hope no one minds the crosspost, but I thought > >> some of the FreeBSD crowd might be interested... > >> > >> I've put a patch against FreeBSD9.0 (currently Beta2, but I don't > >> think the NFS client code will change before the final release, so > >> hopefully the patch will continue to apply) up in the hopes that > >> server vendors (and anyone else, of course) will use it for > >> testing. > >> The main difference between what I tested during the June Bakeathon > >> and this patch is support for the back channel. It does not yet > >> have > >> any support for pNFS. I'll need to figure out a way to set up a DS > >> for > >> testing before I can work on that. > >> > >> If you are interested, the patch is at: > >> http://people.freebsd.org/~rmacklem/nfsv4.1-client > >> > >> Hopefully the Readme covers the basic setup. > >> > >> Good luck with it, if you try it, rick > > > > Cool... Are you going to be attending the Bakeathon next month? If > > not, > > can you supply a VM or something like that so that we can try to do > > some > > FreeBSD client testing against the various server implementations > > out > > there? > > > > > I can run one, I got pretty good at it for a while for some testing. > :-> > Thanks for volunteering Tom ;-) I'll admit I might not even be able to read email daily during the Bakeathon, since I'll be "on the road", but I will try to do so, when Tom finds issues. 
rick > > > > Cheers > > Trond > > -- > > Trond Myklebust > > Linux NFS client maintainer > > > > NetApp > > Trond.Myklebust@netapp.com > > www.netapp.com > > > > _______________________________________________ > > nfsv4 mailing list > > nfsv4@ietf.org > > https://www.ietf.org/mailman/listinfo/nfsv4 > > _______________________________________________ > nfsv4 mailing list > nfsv4@ietf.org > https://www.ietf.org/mailman/listinfo/nfsv4 From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 01:32:11 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 62D341065689; Mon, 12 Sep 2011 01:32:11 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 39A728FC08; Mon, 12 Sep 2011 01:32:11 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p8C1WB26047961; Mon, 12 Sep 2011 01:32:11 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p8C1WBCZ047957; Mon, 12 Sep 2011 01:32:11 GMT (envelope-from linimon) Date: Mon, 12 Sep 2011 01:32:11 GMT Message-Id: <201109120132.p8C1WBCZ047957@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/160591: [zfs] Fail to boot on zfs root with degraded raidz2 [regression] X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 01:32:11 -0000 Old Synopsis: Boot on zfs root with degraded raidz2 New Synopsis: [zfs] Fail to boot on zfs root with degraded raidz2 [regression] Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: linimon Responsible-Changed-When: Mon Sep 12 01:31:33 UTC 2011 Responsible-Changed-Why: reclassify and assign. http://www.freebsd.org/cgi/query-pr.cgi?pr=160591 From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 07:00:38 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C5B5C106566B for ; Mon, 12 Sep 2011 07:00:38 +0000 (UTC) (envelope-from patpro@patpro.net) Received: from rack.patpro.net (rack.patpro.net [193.30.227.216]) by mx1.freebsd.org (Postfix) with ESMTP id 284738FC0C for ; Mon, 12 Sep 2011 07:00:37 +0000 (UTC) Received: from rack.patpro.net (localhost [127.0.0.1]) by rack.patpro.net (Postfix) with ESMTP id B46461CC020; Mon, 12 Sep 2011 09:00:36 +0200 (CEST) X-Virus-Scanned: amavisd-new at patpro.net Received: from amavis-at-patpro.net ([127.0.0.1]) by rack.patpro.net (rack.patpro.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id oGxymMa8Tmvd; Mon, 12 Sep 2011 09:00:34 +0200 (CEST) Received: from [127.0.0.1] (localhost [127.0.0.1]) by rack.patpro.net (Postfix) with ESMTP; Mon, 12 Sep 2011 09:00:34 +0200 (CEST) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: multipart/signed; boundary=Apple-Mail-33-348701443; protocol="application/pkcs7-signature"; micalg=sha1 From: Patrick Proniewski In-Reply-To: <20110911190309.GC62897@DataIX.net> Date: Mon, 12 Sep 2011 09:00:33 +0200 Message-Id: <2978932D-9F31-4D11-A168-158FBC4E59C2@patpro.net> References: <20110911190309.GC62897@DataIX.net> To: Jason Hellenthal X-Mailer: Apple Mail (2.1084) X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: FreeBSD Filesystems Subject: Re: ZFS: zpool history size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 07:00:38 -0000 --Apple-Mail-33-348701443 
Content-Type: text/plain; charset=us-ascii

On 11 Sep 2011, at 21:03, Jason Hellenthal wrote:

>> Every interesting event between 2011-02-22 and 2011-06-30 is gone; that's more than 700 lines of ZFS commands. Thanks to svn, I lose nothing.
>>
>> My questions are:
>> - what is the zpool history size limit?
>> - and is it possible to increase its value?
>
> As for a temporary workaround, which I am fairly sure you may have
> already done: run `zpool history` once per day and log it to
> something like /var/log/zpool.history with a dated header and footer, and
> just continue to append to that file after you chflags sappnd,sunlnk $FILE.

Well, I dump the history every day already and inject the result into a subversion repository. I could use a flat file, but subversion allows me to retrieve a diff or a complete version very easily.

Eventually, the problem is the same: if I need to reconstruct my FS from the history, I'll have to merge every version and make sure every command is issued only once and in the correct order (so no sort -u for me).

That's not a big issue, and I guess I can design a shell script using "svn diff" to create a proper history file, but that's annoying.
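The merge Patrick describes (every command exactly once, in first-seen order, so no `sort -u`) can be sketched with the classic order-preserving awk filter. The helper name and the file layout are hypothetical, not from the thread:

```shell
# Hypothetical sketch: merge several dated `zpool history` dumps into a
# single replay list, keeping only the first occurrence of each command
# while preserving the original order (unlike `sort -u`, which would
# destroy the ordering the pool must be rebuilt in).
merge_history() {
    # awk sees each input line once; seen[$0]++ is 0 (false) only the
    # first time a line appears, so only first occurrences are printed.
    awk '!seen[$0]++' "$@"
}

# e.g.:  merge_history history-2011-*.log > replay-commands.txt
```

The same idea works on the output of `svn cat -r REV` for each revision, which would avoid hand-merging diffs.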
regards,
Patrick

--Apple-Mail-33-348701443--

From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 07:37:00 2011
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EB474106564A; Mon, 12 Sep 2011 07:37:00 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 49E2D8FC14; Mon, 12 Sep 2011 07:36:58 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id KAA04123; Mon, 12 Sep 2011 10:36:57 +0300 (EEST) (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1R314a-000Kla-NY; Mon, 12 Sep 2011 10:36:56 +0300
Message-ID: <4E6DB696.1080608@FreeBSD.org>
Date: Mon, 12 Sep 2011 10:36:54 +0300
From: Andriy Gapon
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0.2) Gecko/20110907 Thunderbird/6.0.2
MIME-Version: 1.0
To: FreeBSD-Current , freebsd-fs@FreeBSD.org
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Subject: archaic/useless CFLAGS options for x86 boot blocks
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
X-List-Received-Date: Mon, 12 Sep 2011 07:37:01 -0000

This email is in part inspired by the following problem:
http://article.gmane.org/gmane.os.freebsd.current/135292
So "harmful" could also be added to the subject line.

So here is my proposal.

Part I. ZFS and GPT boot blocks.

I believe that we do not need any extra optimizations and happy dances here; they seem to be carried over from boot2. I think that just -Os should be sufficient as an optimization flag. Maybe even that is not really required.
Rationale:
- these boot blocks are not nearly as space-constrained as boot2
- using untypical flags increases the chances of hitting compiler bugs, especially for those compilers where we are stuck with unsupported / locally-maintained versions or where a compiler is still maturing
- assembly / machine code and debugging may become easier

Additionally, the '/align/d' '/nop/d' filtering of the intermediate assembly file seems to be unneeded for zfsboot.

Part II. The original boot2.

My testing shows that -Os -fomit-frame-pointer are sufficient to produce a small enough boot2 image (~300 bytes remain available with gcc, 51 bytes with clang). -mrtd -mregparm=3 do not change the size with gcc, but with clang they increase the _available_ space to 79 bytes.

The '/align/d' '/nop/d' filtering seems to shave off only 7 bytes here. Not suggesting anything, just an observation...

Part III. History.

It seems that all those optimization-related options were introduced a very long time ago, when the compiler(s) were quite different from what they are now. So some re-evaluation may be (long over)due. For example, -fno-unit-at-a-time is definitely an anti-optimization option, and it was introduced to fight some gcc bugs way back in 2004 (r132870). Its merits have never been re-evaluated after the switch to gcc 4.2, it seems. -fno-guess-branch-probability and -mno-align-long-strings are even less obvious options (see e.g. r108149).

Finally, here is a diff:
http://people.freebsd.org/~avg/boot-cflags.diff
All the boot blocks are boot tested in qemu. boot2 is also tested with -mrtd -mregparm removed.
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 08:40:27 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9C9EB106566C; Mon, 12 Sep 2011 08:40:27 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id B4F088FC12; Mon, 12 Sep 2011 08:40:26 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA05009; Mon, 12 Sep 2011 11:40:24 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1R3240-000KoS-Fg; Mon, 12 Sep 2011 11:40:24 +0300 Message-ID: <4E6DC577.9050007@FreeBSD.org> Date: Mon, 12 Sep 2011 11:40:23 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0.2) Gecko/20110907 Thunderbird/6.0.2 MIME-Version: 1.0 To: FreeBSD-Current , freebsd-fs@FreeBSD.org References: <4E6DB696.1080608@FreeBSD.org> In-Reply-To: <4E6DB696.1080608@FreeBSD.org> X-Enigmail-Version: undefined Content-Type: text/plain; charset=x-viet-vps Content-Transfer-Encoding: 7bit Cc: Subject: Re: archaic/useless CFLAGS options for x86 boot blocks X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 08:40:27 -0000 on 12/09/2011 10:36 Andriy Gapon said the following: > > This email is in part inspired by the following problem: > http://article.gmane.org/gmane.os.freebsd.current/135292 > So "harmful" could also be added to the subject line. > > So here is my proposal. I would like to clarify that my intention was to solicit opinions, explanations, discussions, alternative proposals and *testing*. 
> Part I. ZFS and GPT boot blocks.
>
> I believe that we do not need any extra optimizations and happy dances here;
> they seem to be carried over from boot2.
> I think that just -Os should be sufficient as an optimization flag. Maybe
> even that is not really required.
> Rationale:
> - these boot blocks are not nearly as space-constrained as boot2
> - using untypical flags increases the chances of hitting compiler bugs,
> especially for those compilers where we are stuck with
> unsupported / locally-maintained versions or where a compiler is still maturing
> - assembly / machine code and debugging may become easier
>
> Additionally, the '/align/d' '/nop/d' filtering of the intermediate assembly
> file seems to be unneeded for zfsboot.
>
> Part II. The original boot2.
>
> My testing shows that -Os -fomit-frame-pointer are sufficient to produce a small
> enough boot2 image (~300 bytes remain available with gcc, 51 bytes with clang).
> -mrtd -mregparm=3 do not change the size with gcc, but with clang they increase
> the _available_ space to 79 bytes.
>
> The '/align/d' '/nop/d' filtering seems to shave off only 7 bytes here.
> Not suggesting anything, just an observation...
>
> Part III. History.
>
> It seems that all those optimization-related options were introduced a very long
> time ago, when the compiler(s) were quite different from what they are now.
> So some re-evaluation may be (long over)due.
> For example, -fno-unit-at-a-time is definitely an anti-optimization option, and it
> was introduced to fight some gcc bugs way back in 2004 (r132870). Its merits
> have never been re-evaluated after the switch to gcc 4.2, it seems.
> -fno-guess-branch-probability and -mno-align-long-strings are even less obvious
> options (see e.g. r108149).
>
> Finally, here is a diff:
> http://people.freebsd.org/~avg/boot-cflags.diff
> All the boot blocks are boot tested in qemu.
> boot2 is also tested with -mrtd -mregparm removed.
> -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 11:07:02 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 147FA106564A for ; Mon, 12 Sep 2011 11:07:02 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 0245E8FC08 for ; Mon, 12 Sep 2011 11:07:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p8CB71XL005426 for ; Mon, 12 Sep 2011 11:07:01 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p8CB71ta005424 for freebsd-fs@FreeBSD.org; Mon, 12 Sep 2011 11:07:01 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 12 Sep 2011 11:07:01 GMT Message-Id: <201109121107.p8CB71ta005424@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 11:07:02 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha o kern/159971 fs [ffs] [panic] panic with soft updates journaling durin o kern/159930 fs [ufs] [panic] kernel core o kern/159418 fs [tmpfs] [panic] tmpfs kernel panic: recursing on non r o kern/159402 fs [zfs][loader] symlinks cause I/O errors o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by- o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs() o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option o kern/159233 fs [ext2fs] [patch] fs/ext2fs: finish reallocblk implemen o kern/159232 fs [ext2fs] [patch] fs/ext2fs: merge ext2_readwrite into o kern/159077 fs [zfs] Can't cd .. with latest zfs version o kern/159048 fs [smbfs] smb mount corrupts large files o kern/159045 fs [zfs] [hang] ZFS scrub freezes system o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk o kern/158802 fs [amd] amd(8) ICMP storm and unkillable process. 
o kern/158711 fs [ffs] [panic] panic in ffs_blkfree and ffs_valloc o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o f kern/157929 fs [nfs] NFS slow read o kern/157722 fs [geli] unable to newfs a geli encrypted partition o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and o kern/156781 fs [zfs] zfs is losing the snapshot directory, p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156168 fs [nfs] [panic] Kernel panic under concurrent access ove o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs o kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 o kern/154447 fs [zfs] [panic] Occasional panics - solaris assert somew p kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153847 fs [nfs] [panic] Kernel panic from incorrect m_free in nf o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153520 fs [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs 
vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small p kern/152488 fs [tmpfs] [patch] mtime of file updated when only inode o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o kern/151845 fs [smbfs] [patch] smbfs should be upgraded to support Un o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/151111 fs [zfs] vnodes leakage during zfs unmount o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/150207 fs zpool(1): zpool import -d /dev tries to open weird dev o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris installa o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o bin/148296 fs 
[zfs] [loader] [patch] Very slow probe in /usr/src/sys o kern/148204 fs [nfs] UDP NFS causes overload o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " o kern/147790 fs [zfs] zfs set acl(mode|inherit) fails on existing zfs o kern/147560 fs [zfs] [boot] Booting 8.1-PRERELEASE raidz system take o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an o bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... 
o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141897 fs [msdosfs] [panic] Kernel panic. msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues ( o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139597 fs [patch] [tmpfs] tmpfs initializes va_gen but doesn't u o kern/139564 fs [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic p kern/136470 fs [nfs] 
Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/133174 fs [msdosfs] [patch] msdosfs must support multibyte inter o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs f kern/130133 fs [panic] [zfs] 'kmem_map too small' caused by make clea o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs f kern/127375 fs [zfs] If vm.kmem_size_max>"1073741823" then write spee o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi f kern/126703 fs [panic] [zfs] _mtx_lock_sleep: recursed on non-recursi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: 
freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files f sparc/123566 fs [zfs] zpool import issue: EOVERFLOW o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121366 fs [zfs] [patch] Automatic disk scrubbing from periodic(8 o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F f kern/120210 fs [zfs] [panic] reboot after panic: solaris assert: arc_ o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117314 fs [ntfs] Long-filename only NTFS fs'es cause kernel pani o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement 
DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o kern/109024 fs [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat o kern/109010 fs [msdosfs] can't mv directory within fat32 file system o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o 
bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/51583 fs [nullfs] [patch] allow to work with devices and socket o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o kern/33464 fs [ufs] soft update inconsistencies after system crash o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 247 problems total. 
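[Editor's note: the note at the top of the listing gives the URL pattern for viewing an individual PR (query-pr.cgi?pr=(number)). A small helper can expand a tracker id from the list into that URL; the helper name and parameter handling below are illustrative, not part of the original mail.]

```shell
# Build the query URL for a PR, as described in the listing's note:
#   http://www.freebsd.org/cgi/query-pr.cgi?pr=(number)
# Accepts either a bare number ("160591") or a tracker id ("kern/160591");
# ${1##*/} strips everything up to and including the last "/".
pr_url() {
    printf 'http://www.freebsd.org/cgi/query-pr.cgi?pr=%s\n' "${1##*/}"
}

pr_url kern/160591   # -> http://www.freebsd.org/cgi/query-pr.cgi?pr=160591
```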
From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 11:08:21 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5A3A5106564A; Mon, 12 Sep 2011 11:08:21 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 33A0C8FC21; Mon, 12 Sep 2011 11:08:21 +0000 (UTC) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id C63A746B17; Mon, 12 Sep 2011 07:08:20 -0400 (EDT) Date: Mon, 12 Sep 2011 12:08:20 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Attilio Rao In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: FreeBSD FS , freebsd-current@freebsd.org, Konstantin Belousov , freebsd-arch@freebsd.org Subject: Call to arms: MPSAFE file systems (was: Re: Removal of Giant from the VFS layer for 10.0) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 11:08:21 -0000 On Sat, 27 Aug 2011, Attilio Rao wrote: > With the aid of kib and rwatson I made a roughly outlined plan about what is > left to do in order to have all the filesystems locked (or eventually > dropped) before 10.0) and is summarized here: > http://wiki.freebsd.org/NONMPSAFE_DEORBIT_VFS Here's a more succinct summary of the key points from the wiki: FreeBSD has supported Giant lock-free file systems for years, and almost all file systems have been shipping "MPSAFE" for several years. However, VFS retains compatibility support for non-MPSAFE file systems. 
We want to remove that compatibility support, as it adds non-trivial complexity to an already quite complex VFS; removing it simplifies the code and makes it easier to maintain and enhance. This means either fixing or removing any file systems that can't operate without compatibility support. Attilio has posted a schedule for the removal of compatibility crutches, which in turn means removing any un-updated file systems. We are looking for volunteers to perform those updates. Here's the schedule:

27 August 2011     Attilio posts plan on arch@
1 October 2011     Add VFS_GIANT_COMPATIBILITY option (enabled)
1 March 2012       Disable VFS_GIANT_COMPATIBILITY option by default
1 September 2012   Disconnect non-MPSAFE file systems from build
1 March 2013       Garbage collect any un-updated file systems

Most of our critical file systems are already done: UFS, ZFS, the NFS client and server (both old and new), unionfs, pseudofs, tmpfs, nullfs, devfs, cd9660, ext2fs, fdescfs, msdosfs, udf, and procfs. However, some remain, and they require owners:

File system   Owner     State
coda          rwatson   Non-MPSAFE
hpfs          ???       Non-MPSAFE
ntfs          attilio   Non-MPSAFE
nwfs          ???       Non-MPSAFE
portalfs      ???       Non-MPSAFE
smbfs         ???       Non-MPSAFE
reiserfs      ???       Non-MPSAFE
xfs           ???       Non-MPSAFE

Any file system that remains on this list will be removed by 10.0 -- so, if you care about one of the above file systems, please help us get them updated. 
You can find more information here, including on the methodology for making a file system MPSAFE, with worked examples: http://wiki.freebsd.org/NONMPSAFE_DEORBIT_VFS Robert From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 11:37:27 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A41CA106566C for ; Mon, 12 Sep 2011 11:37:27 +0000 (UTC) (envelope-from szoftos@freemail.hu) Received: from fmx05.freemail.hu (fmx05.freemail.hu [195.228.245.55]) by mx1.freebsd.org (Postfix) with SMTP id 1AB2F8FC16 for ; Mon, 12 Sep 2011 11:37:26 +0000 (UTC) Received: (qmail 52239 invoked from network); 12 Sep 2011 13:10:45 +0200 Received: from 195.228.245.211 (HELO localhost) (91.82.87.114) by fmx05.freemail.hu with SMTP; 12 Sep 2011 13:10:45 +0200 Date: Mon, 12 Sep 2011 13:10:45 +0200 (CEST) From: Laszlo KAROLYI To: freebsd-fs@freebsd.org Message-ID: X-Originating-IP: [91.82.87.114] X-HTTP-User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20100101 Firefox/5.0 MIME-Version: 1.0 Content-Type: TEXT/plain; CHARSET=ISO-8859-2 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: ZFS-lighttpd2-sendfile, too high IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 11:37:27 -0000 Hello, Recently I installed a FreeBSD box with the newest 8.2-STABLE and ZFS version. I use lighttpd2 and ZFS on it, and munin to monitor the outgoing bandwidth. Zpool version 28, zfs version 5, with the latest kernel. We have a big MP3 archive (half-hour, 256 kbit/s MP3s) which lighty serves. This means fully random IO. When the server is serving 15 Mbit/s, I constantly see 8-10 MByte/s of reads on the ZFS raidz1 array, which is too much. 
As I could see from truss logs, lighty uses sendfile and writev.My settings: vfs.zfs.l2c_only_size: 15106737664 vfs.zfs.mfu_ghost_data_lsize: 661388288 vfs.zfs.mfu_ghost_metadata_lsize: 345885696 vfs.zfs.mfu_ghost_size: 1007273984 vfs.zfs.mfu_data_lsize: 1440963584 vfs.zfs.mfu_metadata_lsize: 24143872 vfs.zfs.mfu_size: 1523631104 vfs.zfs.mru_ghost_data_lsize: 5427200 vfs.zfs.mru_ghost_metadata_lsize: 522937344 vfs.zfs.mru_ghost_size: 528364544 vfs.zfs.mru_data_lsize: 1384169984 vfs.zfs.mru_metadata_lsize: 200904704 vfs.zfs.mru_size: 1728416256 vfs.zfs.anon_data_lsize: 0 vfs.zfs.anon_metadata_lsize: 0 vfs.zfs.anon_size: 1736192 vfs.zfs.l2arc_norw: 1 vfs.zfs.l2arc_feed_again: 1 vfs.zfs.l2arc_noprefetch: 0 vfs.zfs.l2arc_feed_min_ms: 200 vfs.zfs.l2arc_feed_secs: 1 vfs.zfs.l2arc_headroom: 2 vfs.zfs.l2arc_write_boost: 8388608 vfs.zfs.l2arc_write_max: 8388608 vfs.zfs.arc_meta_limit: 1775121408 vfs.zfs.arc_meta_used: 895904760 vfs.zfs.arc_min: 887560704 vfs.zfs.arc_max: 7100485632 vfs.zfs.dedup.prefetch: 1 vfs.zfs.mdcomp_disable: 0 vfs.zfs.write_limit_override: 0 vfs.zfs.write_limit_inflated: 25327300608 vfs.zfs.write_limit_max: 1055304192 vfs.zfs.write_limit_min: 33554432 vfs.zfs.write_limit_shift: 3 vfs.zfs.no_write_throttle: 0 vfs.zfs.zfetch.array_rd_sz: 1048576 vfs.zfs.zfetch.block_cap: 256 vfs.zfs.zfetch.min_sec_reap: 2 vfs.zfs.zfetch.max_streams: 8 vfs.zfs.prefetch_disable: 0 vfs.zfs.mg_alloc_failures: 8 vfs.zfs.check_hostid: 1 vfs.zfs.recover: 0 vfs.zfs.txg.synctime_ms: 1000 vfs.zfs.txg.timeout: 5 vfs.zfs.scrub_limit: 10 vfs.zfs.vdev.cache.bshift: 16 vfs.zfs.vdev.cache.size: 0 vfs.zfs.vdev.cache.max: 16384 vfs.zfs.vdev.write_gap_limit: 4096 vfs.zfs.vdev.read_gap_limit: 32768 vfs.zfs.vdev.aggregation_limit: 131072 vfs.zfs.vdev.ramp_rate: 2 vfs.zfs.vdev.time_shift: 6 vfs.zfs.vdev.min_pending: 4 vfs.zfs.vdev.max_pending: 8 vfs.zfs.vdev.bio_flush_disable: 0 vfs.zfs.cache_flush_disable: 0 vfs.zfs.zil_replay_disable: 0 vfs.zfs.zio.use_uma: 0 vfs.zfs.version.zpl: 5 
vfs.zfs.version.spa: 28 vfs.zfs.version.acl: 1 vfs.zfs.debug: 0 vfs.zfs.super_owner: 0 Any suggestions? Thanks,Laszlo From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 11:43:49 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E8C95106564A; Mon, 12 Sep 2011 11:43:49 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id C07D58FC1F; Mon, 12 Sep 2011 11:43:49 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 7727246B0D; Mon, 12 Sep 2011 07:43:49 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 0ABE38A02E; Mon, 12 Sep 2011 07:43:49 -0400 (EDT) From: John Baldwin To: Andriy Gapon Date: Mon, 12 Sep 2011 07:43:01 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; ) References: <4E6DB696.1080608@FreeBSD.org> In-Reply-To: <4E6DB696.1080608@FreeBSD.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201109120743.02181.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Mon, 12 Sep 2011 07:43:49 -0400 (EDT) Cc: freebsd-fs@freebsd.org, FreeBSD-Current Subject: Re: archaic/useless CFLAGS options for x86 boot blocks X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 11:43:50 -0000 On Monday, September 12, 2011 3:36:54 am Andriy Gapon wrote: > > This email is in part inspired by the following problem: > http://article.gmane.org/gmane.os.freebsd.current/135292 > So "harmful" could also be added to the 
subject line. > > So here is my proposal. > > Part I. ZFS and GPT bootblocks. > > I believe that we do not need any extra optimizations and happy dances here; they > seem to be carried over from boot2. > I think that -Os alone should be sufficient as an optimization flag. Maybe > even that is not really required. > Rationale: > - these boot blocks are not nearly as space-constrained as boot2 > - using untypical flags increases the chances of hitting compiler bugs, > especially for those compilers where we are stuck with > unsupported / locally-maintained versions or where a compiler is still maturing > - assembly / machine code and debugging may become easier > > Additionally, the '/align/d' '/nop/d' filtering of the intermediate assembly > file seems not to be needed for zfsboot. > > Part II. The original boot2. > > My testing shows that -Os -fomit-frame-pointer are sufficient to produce a small > enough boot2 image (~300 bytes remain available with gcc, 51 bytes for clang). > -mrtd -mregparm=3 do not change the size with gcc, but with clang they increase > _available_ space to 79 bytes. > > The '/align/d' '/nop/d' filtering seems to shave off only 7 bytes here. > Not suggesting anything, just an observation... > > Part III. History. > > It seems that all those optimization-related options were introduced a very long > time ago, when the compiler(s) were quite different from what they are now. > So, some re-evaluation may be (long over)due. > For example, -fno-unit-at-a-time is definitely an anti-optimization option, and it > was introduced to fight some gcc bugs way back in 2004 (r132870). Its merits > have never been re-evaluated after the switch to gcc 4.2, it seems. > -fno-guess-branch-probability and -mno-align-long-strings are even less obvious > options (see e.g. r108149). > > > Finally, here is a diff: > http://people.freebsd.org/~avg/boot-cflags.diff > All the boot blocks are boot-tested in qemu. > boot2 is also tested with -mrtd -mregparm removed. 
I suspect some of the recent changes to shave space down for Clang have made some of the optimization options no longer necessary. I think the patch is fine, and I'd even prefer to go ahead and drop the extra cruft (like removing nops and aligns as well as -mrtd and -mregparm) from the UFS boot2 as well. -- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 12:02:35 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 13BC910656B4; Mon, 12 Sep 2011 12:02:35 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id E46218FC14; Mon, 12 Sep 2011 12:02:33 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA09545; Mon, 12 Sep 2011 15:02:31 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E6DF4D6.8050501@FreeBSD.org> Date: Mon, 12 Sep 2011 15:02:30 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: John Baldwin References: <4E6DB696.1080608@FreeBSD.org> <201109120743.02181.jhb@freebsd.org> In-Reply-To: <201109120743.02181.jhb@freebsd.org> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, FreeBSD-Current Subject: Re: archaic/useless CFLAGS options for x86 boot blocks X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 12:02:36 -0000 on 12/09/2011 14:43 John Baldwin said the following: > I suspect some of the recent changes to shave space down for Clang have made > some of the optimization options no longer necessary. 
Just a note: all of the options in question were added long before clang. > I think the patch is > fine, and I'd even prefer to go ahead and drop the extra cruft (like removing > nops and aligns as well as -mrtd and -mregparm) from the UFS boot2 as well. I personally agree, thank you for this suggestion. My current plan is to leave boot2 alone until stable/9 is branched, but to try to get zfs/gpt boot changes into 9.0. What do you think? -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 12:26:38 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5F7F6106564A; Mon, 12 Sep 2011 12:26:38 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 342258FC19; Mon, 12 Sep 2011 12:26:38 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id CD55446B06; Mon, 12 Sep 2011 08:26:37 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 4B3F48A02E; Mon, 12 Sep 2011 08:26:37 -0400 (EDT) From: John Baldwin To: Andriy Gapon Date: Mon, 12 Sep 2011 08:26:36 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; ) References: <4E6DB696.1080608@FreeBSD.org> <201109120743.02181.jhb@freebsd.org> <4E6DF4D6.8050501@FreeBSD.org> In-Reply-To: <4E6DF4D6.8050501@FreeBSD.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201109120826.36804.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Mon, 12 Sep 2011 08:26:37 -0400 (EDT) Cc: freebsd-fs@freebsd.org, FreeBSD-Current Subject: Re: archaic/useless CFLAGS options for x86 boot blocks X-BeenThere: freebsd-fs@freebsd.org 
X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 12:26:38 -0000 On Monday, September 12, 2011 8:02:30 am Andriy Gapon wrote: > > I think the patch is > > fine, and I'd even prefer to go ahead and drop the extra cruft (like removing > > nops and aligns as well as -mrtd and -mregparm) from the UFS boot2 as well. > > I personally agree, thank you for this suggestion. > My current plan is to leave boot2 alone until stable/9 is branched, but to try to > get zfs/gpt boot changes into 9.0. What do you think? I think this is fine. -- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 12:33:37 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B6C06106564A for ; Mon, 12 Sep 2011 12:33:37 +0000 (UTC) (envelope-from prvs=1236a5984e=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 3FB8B8FC0A for ; Mon, 12 Sep 2011 12:33:36 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Mon, 12 Sep 2011 13:23:00 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 12 Sep 2011 13:22:59 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014929202.msg for ; Mon, 12 Sep 2011 13:22:58 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1236a5984e=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: From: "Steven Hartland" To: "Laszlo KAROLYI" , References: Date: Mon, 12 Sep 2011 
13:23:23 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="ISO-8859-2"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: Subject: Re: ZFS-lighttpd2-sendfile, too high IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 12:33:37 -0000 sendfile doesn't work as you might expect on ZFS; it's not zero-copy, due to the use of non-buffer-pool memory. You do get a benefit, but it requires double the amount of memory to get it, so we've disabled sendfile under nginx for ZFS-based hosts for this very reason. Regards Steve ----- Original Message ----- From: "Laszlo KAROLYI" To: Sent: Monday, September 12, 2011 12:10 PM Subject: ZFS-lighttpd2-sendfile, too high IO > Hello, Recently I installed a FreeBSD with the newest 8.2-STABLE and zfs version. I use lighttpd2 and zfs on it, and munin to > monitor the outgoing bandwidth. Zpool version 28, zfs version 5, with the latest kernel. We have a big mp3 archive (half hour, > 256kbit/s mp3-s) which lighty serves. This means full random IO. When I see that the server serves 15mbit/sec, I see constantly > 8-10Mbyte/sec reads on the zfs raidz1 array, which is too much. As I could see from truss logs, lighty uses sendfile and > writev. My settings: vfs.zfs.l2c_only_size: 15106737664 ... ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 12:34:45 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 84169106564A; Mon, 12 Sep 2011 12:34:45 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 727ED8FC1C; Mon, 12 Sep 2011 12:34:44 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA10102; Mon, 12 Sep 2011 15:34:42 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E6DFC62.9080405@FreeBSD.org> Date: Mon, 12 Sep 2011 15:34:42 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: John Baldwin References: <4E6DB696.1080608@FreeBSD.org> <201109120743.02181.jhb@freebsd.org> <4E6DF4D6.8050501@FreeBSD.org> <201109120826.36804.jhb@freebsd.org> In-Reply-To: <201109120826.36804.jhb@freebsd.org> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, FreeBSD-Current Subject: Re: archaic/useless CFLAGS options for x86 boot blocks X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 12:34:45 -0000 on 12/09/2011 15:26 John Baldwin said the following: > On Monday, September 12, 2011 8:02:30 am Andriy Gapon wrote: >>> I think the patch is >>> fine, and I'd even prefer to go ahead and drop the extra cruft (like removing >>> nops and aligns as well as -mrtd and -mregparm) from the UFS 
boot2 as well. >> >> I personally agree, thank you for this suggestion. >> My current plan is to leave boot2 alone until stable/9 is branched, but to try to >> get zfs/gpt boot changes into 9.0. What do you think? > > I think this is fine. > Another thing that was suggested to me via private communication is to change -Os to -O1 for those "non-constrained" boot blocks. I do not see anything wrong with that. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 18:38:52 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E8B94106566C for ; Mon, 12 Sep 2011 18:38:52 +0000 (UTC) (envelope-from kmacybsd@gmail.com) Received: from mail-vw0-f44.google.com (mail-vw0-f44.google.com [209.85.212.44]) by mx1.freebsd.org (Postfix) with ESMTP id 9CC108FC08 for ; Mon, 12 Sep 2011 18:38:52 +0000 (UTC) Received: by vws12 with SMTP id 12so3883430vws.17 for ; Mon, 12 Sep 2011 11:38:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=FVjdH7lC12Q3Dq7MDwoxG1yJQhVdETO8/+D6Xn6I7Yk=; b=Niemeo5nQUiSGU5Hr+zaSmlqkzo8497osmwQbOVB8EbS3CCb1NjYQUKztTazckrGEJ efNqeQzfzNU3RQcfY/6L+YjvQ/HmMH5fLVjLlzdo0QAEIBmnejoRRfLYkSzg9gYa6W7S lofBXmNPmcq3ppNorcW22CvP9H6eUBiPEKKS8= MIME-Version: 1.0 Received: by 10.52.90.229 with SMTP id bz5mr2920823vdb.77.1315851135169; Mon, 12 Sep 2011 11:12:15 -0700 (PDT) Sender: kmacybsd@gmail.com Received: by 10.52.113.225 with HTTP; Mon, 12 Sep 2011 11:12:15 -0700 (PDT) In-Reply-To: References: Date: Mon, 12 Sep 2011 20:12:15 +0200 X-Google-Sender-Auth: 9D2wAP5hf6OUB8L9LbYZ8slQYwU Message-ID: From: "K. 
Macy" To: Steven Hartland Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: ZFS-lighttpd2-sendfile, too high IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 18:38:53 -0000 2011/9/12 Steven Hartland : > sendfile doesn't work as you might expect on zfs, it's not zero-copy due to > the use of non-buffer-pool memory. > > You do get a benefit but it requires double the amount of memory to get it > so we've disabled sendfile > under nginx for zfs based hosts for this very reason. In my performance testing, sending from mmaped I/O on ZFS was dramatically faster (> 2x) than sendfile. Cheers > Regards > Steve > ----- Original Message ----- From: "Laszlo KAROLYI" > To: > Sent: Monday, September 12, 2011 12:10 PM > Subject: ZFS-lighttpd2-sendfile, too high IO > > >> Hello, Recently I installed a FreeBSD with the newest 8.2-STABLE and zfs >> version. I use lighttpd2 and zfs on it, and munin to monitor the outgoing >> bandwidth. Zpool version 28, zfs version 5, with the latest kernel. We have >> a big mp3 archive (half hour, 256kbit/s mp3-s) which lighty serves. This >> means full random IO. When I see that the server serves 15mbit/sec, I see >> constantly 8-10Mbyte/sec reads on the zfs raidz1 array, which is too much. >> As I could see from truss logs, lighty uses sendfile and writev. My settings: >> vfs.zfs.l2c_only_size: 15106737664 > > ... > > ================================================ > This e.mail is private and confidential between Multiplay (UK) Ltd. and the > person or entity to whom it is addressed. In the event of misdirection, the > recipient is prohibited from using, copying, printing or otherwise > disseminating it or any information contained in it. > In the event of misdirection, illegible or incomplete transmission please > telephone +44 845 868 1337 > or return the E.mail to postmaster@multiplay.co.uk. > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 18:58:25 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B1DEE106574B for ; Mon, 12 Sep 2011 18:58:25 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 75B888FC16 for ; Mon, 12 Sep 2011 18:58:24 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id p8CIwNEa029634; Mon, 12 Sep 2011 13:58:23 -0500 (CDT) Date: Mon, 12 Sep 2011 13:58:23 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: "K.
Macy" In-Reply-To: Message-ID: References: User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Mon, 12 Sep 2011 13:58:23 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS-lighttpd2-sendfile, too high IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 18:58:25 -0000 On Mon, 12 Sep 2011, K. Macy wrote: > 2011/9/12 Steven Hartland : >> sendfile doesn't work as you might expect on zfs, its not zero copy due to >> the use of none buffer pool memory. >> >> You do get a benefit but it requires double the amount of memory to get it >> so we've disabled sendfile >> under nginx for zfs based hosts for this very reason. > > In my performance testing, sending from mmaped I/O on ZFS was > dramatically faster (> 2x) than sendfile. The problem is that since the zfs ARC and the backing memory used for the mapping are not the same, twice as much memory is used. This would not be good for streaming if the machine is short on RAM. 
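The doubling Bob describes can be put in back-of-envelope terms with a short shell sketch. The file size and working-set count below are illustrative assumptions loosely derived from the half-hour 256 kbit/s mp3s mentioned earlier in the thread, not measured values:

```shell
# With sendfile on ZFS, a hot file can end up resident twice: once in the
# ARC and once in page-cache-backed memory, roughly doubling the RAM
# needed to keep the working set cached.
file_mb=60        # one half-hour 256 kbit/s mp3 is ~57 MB; rounded up
hot_files=100     # assumed size of the popular working set
single=$((file_mb * hot_files))
double=$((single * 2))
echo "single-copy cache:     ${single} MB"
echo "with double buffering: ${double} MB"
```

If the doubled working set no longer fits in RAM, hot files fall out of cache and requests go back to disk, which is consistent with the high read load reported in this thread.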
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 19:09:11 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0EDD1106566B for ; Mon, 12 Sep 2011 19:09:11 +0000 (UTC) (envelope-from szoftos@freemail.hu) Received: from fmx21.freemail.hu (fmx21.freemail.hu [195.228.245.71]) by mx1.freebsd.org (Postfix) with SMTP id 4F69E8FC17 for ; Mon, 12 Sep 2011 19:09:09 +0000 (UTC) Received: (qmail 7957 invoked from network); 12 Sep 2011 21:09:08 +0200 Received: from 195.228.245.211 (HELO localhost) (82.131.146.152) by fmx21.freemail.hu with SMTP; 12 Sep 2011 21:09:08 +0200 Date: Mon, 12 Sep 2011 21:09:08 +0200 (CEST) From: Laszlo KAROLYI To: freebsd-fs@freebsd.org In-Reply-To: Message-ID: X-Originating-IP: [82.131.146.152] X-HTTP-User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:6.0.2) Gecko/20100101 Firefox/6.0.2 MIME-Version: 1.0 Content-Type: TEXT/plain; CHARSET=ISO-8859-2 Content-Transfer-Encoding: QUOTED-PRINTABLE X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Re: ZFS-lighttpd2-sendfile, too high IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 19:09:11 -0000 But does this explain the 4-5mbyte/s reads when having a 15mbit/s network load? Bob Friesenhahn wrote: >On Mon, 12 Sep 2011, K. Macy wrote: > >> 2011/9/12 Steven Hartland : >>> sendfile doesn't work as you might expect on zfs, it's not zero-copy due to >>> the use of non-buffer-pool memory. >>> >>> You do get a benefit but it requires double the amount of memory to get it >>> so we've disabled sendfile >>> under nginx for zfs based hosts for this very reason. >> >> In my performance testing, sending from mmaped I/O on ZFS was >> dramatically faster (> 2x) than sendfile. > >The problem is that since the zfs ARC and the backing memory used for >the mapping are not the same, twice as much memory is used. This >would not be good for streaming if the machine is short on RAM. > >Bob >-- >Bob Friesenhahn >bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ >GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ >_______________________________________________ >freebsd-fs@freebsd.org mailing list >http://lists.freebsd.org/mailman/listinfo/freebsd-fs >To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 19:32:35 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 87C031065670 for ; Mon, 12 Sep 2011 19:32:35 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 4F4D18FC08 for ; Mon, 12 Sep 2011 19:32:35 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id p8CJWYYp029835; Mon, 12 Sep 2011 14:32:34 -0500 (CDT) Date: Mon, 12 Sep 2011 14:32:34 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Laszlo KAROLYI In-Reply-To: Message-ID: References: User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Mon, 12 Sep 2011 14:32:34 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS-lighttpd2-sendfile, too high IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list
List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 19:32:35 -0000 On Mon, 12 Sep 2011, Laszlo KAROLYI wrote: > But does this explain the 4-5mbyte/s reads when having a 15mbit/s network load? There are only two viable explanations: o Insufficient caching due to insufficient resources o Data is not being cached at all Zfs reads whole 128K blocks (or whatever the filesystem blocksize is) at a time. It does not read partial blocks from underlying storage. This makes it very expensive to perform many small read accesses if the reads are not subsequently cached in the ARC. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Mon Sep 12 19:45:46 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B7A44106564A; Mon, 12 Sep 2011 19:45:46 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id D31D78FC12; Mon, 12 Sep 2011 19:45:45 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id WAA15615; Mon, 12 Sep 2011 22:45:44 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1R3CRr-000LIW-OK; Mon, 12 Sep 2011 22:45:43 +0300 Message-ID: <4E6E6166.4090408@FreeBSD.org> Date: Mon, 12 Sep 2011 22:45:42 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0.2) Gecko/20110907 Thunderbird/6.0.2 MIME-Version: 1.0 To: "K.
Macy" References: In-Reply-To: X-Enigmail-Version: undefined Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: ZFS-lighttpd2-sendfile, too high IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Sep 2011 19:45:46 -0000 on 12/09/2011 21:12 K. Macy said the following: > 2011/9/12 Steven Hartland : >> sendfile doesn't work as you might expect on zfs, its not zero copy due to >> the use of none buffer pool memory. >> >> You do get a benefit but it requires double the amount of memory to get it >> so we've disabled sendfile >> under nginx for zfs based hosts for this very reason. > > > In my performance testing, sending from mmaped I/O on ZFS was > dramatically faster (> 2x) than sendfile. When did you last test that (FreeBSD svn revision)? -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Sep 13 08:43:55 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 865D6106564A; Tue, 13 Sep 2011 08:43:55 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 3F2EF8FC12; Tue, 13 Sep 2011 08:43:55 +0000 (UTC) Received: from outgoing.leidinger.net (p4FC41D5D.dip.t-dialin.net [79.196.29.93]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id C2405844015; Tue, 13 Sep 2011 10:24:29 +0200 (CEST) Received: from unknown (IO.Leidinger.net [192.168.1.12]) by outgoing.leidinger.net (Postfix) with ESMTP id 0BA673BA7; Tue, 13 Sep 2011 10:24:27 +0200 (CEST) Date: Tue, 13 Sep 2011 10:24:27 +0200 From: Alexander Leidinger To: Pawel Jakub Dawidek Message-ID: <20110913102427.000030df@unknown> In-Reply-To: 
<20110907094554.GB1674@garage.freebsd.pl> References: <20110907044800.GA96277@server.vk2pj.dyndns.org> <4E673751.5080503@FreeBSD.org> <20110907094554.GB1674@garage.freebsd.pl> X-Mailer: Claws Mail 3.7.8cvs47 (GTK+ 2.16.6; i586-pc-mingw32msvc) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: C2405844015.AF67F X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-0.4, required 6, autolearn=disabled, ALL_TRUSTED -1.00, J_CHICKENPOX_55 0.60) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1316507070.41574@Y9j7To/gF2TTUMOKRS7q6Q X-EBL-Spam-Status: No Cc: freebsd-fs@FreeBSD.org, Artem Belevich , Martin =?ISO-8859-1?Q?Matu=C5=A1ka?= , Andriy Gapon Subject: Re: "can't load 'kernel'" on ZFS root X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Sep 2011 08:43:55 -0000 On Wed, 7 Sep 2011 11:45:56 +0200 Pawel Jakub Dawidek wrote: > On Wed, Sep 07, 2011 at 12:20:17PM +0300, Andriy Gapon wrote: > > on 07/09/2011 10:35 Artem Belevich said the following: > > > It makes me wonder, though -- if we're probing devices anyways, > > > why is zpool.cache existence mandatory? According to the name > > > it's a *cache*, presumably to speed up zpool detection on a > > > normal boot. Perhaps we can fall back to probing all drives if > > > zpool.cache is missing. Slower boot definitely beats no booting > > > at all. > > > > Very good point indeed. > > > > Pawel, Martin, do you know how the relevant code works? I suspect > > that you do :-) Maybe this could be improved trivially?... > > The zpool.cache file contains pools that are automatically imported at > system start-up. 
There might be pools visible in the system that are > not supposed to be automatically imported (e.g. a pool on iSCSI disks on > a secondary cluster node - importing such a pool automatically will > corrupt the data). If I try to import a pool which is imported somewhere else, I get an error. The hostid is used to determine this. I know that the hostid may not be set at that point in boot time, but don't you think this can be changed? Maybe add a /boot/hostid.conf which the loader tries to load if it exists, and which is updated by /etc/rc.d/hostid if there is write access to it and the ID written there is not the same as the hostid script wants it to be. This way the hostid can be known before the root-mount, and all the pools (or at least the pool with the rootfs) can be imported (if the hostid matches, of course), if no cache file exists. Bye, Alexander. -- http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Tue Sep 13 09:31:48 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9CDCF1065670 for ; Tue, 13 Sep 2011 09:31:48 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 5725F8FC14 for ; Tue, 13 Sep 2011 09:31:48 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1R3PLH-000166-4u for freebsd-fs@freebsd.org; Tue, 13 Sep 2011 11:31:47 +0200 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 13 Sep 2011 11:31:47 +0200 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 13 Sep 2011 11:31:47 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan
Voras Date: Tue, 13 Sep 2011 11:31:35 +0200 Lines: 23 Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0.1) Gecko/20110907 Thunderbird/6.0.1 In-Reply-To: X-Enigmail-Version: 1.1.2 Subject: Re: ZFS-lighttpd2-sendfile, too high IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Sep 2011 09:31:48 -0000 On 12/09/2011 21:32, Bob Friesenhahn wrote: > On Mon, 12 Sep 2011, Laszlo KAROLYI wrote: > >> But does this explain the 4-5mbyte/s reads when having a 15mbit/s >> network load? > > There are only two viable explanations: > > o Insuffient caching due to insufficient resources > > o Data is not being cached at all > > Zfs reads whole 128K blocks (or whatever the filesystem blocksize is) at > a time. It does not read partial blocks from underlying storage. This > makes it very expensive to perform many small read accesses if the reads > are not subsequently cached in the ARC. Yes! Which makes it particularly "interesting" if you try to run a database on it while forgetting to reset the block size to e.g. 8K before the database is initialized - you get 16x more IO than you expected. 
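Ivan's 16x figure follows directly from the ratio of the ZFS record size to the database page size; a minimal sketch (the `zfs create` line in the comment uses a hypothetical dataset name, and recordsize only affects files written after it is set):

```shell
# ZFS always reads whole records, so an 8 KB database read against the
# default 128 KB recordsize drags in 16x the data in the worst case.
# Avoided by setting the property before initializing the database,
# e.g.: zfs create -o recordsize=8k tank/db   (hypothetical name)
recordsize_kb=128
db_page_kb=8
amplification=$((recordsize_kb / db_page_kb))
echo "worst-case read amplification: ${amplification}x"
```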
From owner-freebsd-fs@FreeBSD.ORG Tue Sep 13 09:56:43 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E52DE1065670 for ; Tue, 13 Sep 2011 09:56:43 +0000 (UTC) (envelope-from szoftos@freemail.hu) Received: from fmx21.freemail.hu (fmx21.freemail.hu [195.228.245.71]) by mx1.freebsd.org (Postfix) with SMTP id 31DC48FC08 for ; Tue, 13 Sep 2011 09:56:42 +0000 (UTC) Received: (qmail 45534 invoked from network); 13 Sep 2011 11:56:41 +0200 Received: from 195.228.245.211 (HELO localhost) (91.82.87.114) by fmx21.freemail.hu with SMTP; 13 Sep 2011 11:56:41 +0200 Date: Tue, 13 Sep 2011 11:56:41 +0200 (CEST) From: Laszlo KAROLYI To: freebsd-fs In-Reply-To: Message-ID: X-Originating-IP: [91.82.87.114] X-HTTP-User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20100101 Firefox/5.0 MIME-Version: 1.0 Content-Type: TEXT/plain; CHARSET=ISO-8859-2 Content-Transfer-Encoding: QUOTED-PRINTABLE Subject: Re: ZFS-lighttpd2-sendfile, too high IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Sep 2011 09:56:44 -0000 Hello, it seems that turning off sendfile() in lighttpd2 completely solved my problem. The inactive memory usage ceased, wired memory raised for ARC cache, and I see an enormous fall of IO load under the same (or even higher) network load. The disk IO load now scales to the network load. Something must not be right with the sendfile support in ZFS. However, my problem is solved, and I'd suggest turning off sendfile in any programs in the future, unless this bug gets fixed. Laszlo Bob Friesenhahn wrote: >On Mon, 12 Sep 2011, Laszlo KAROLYI wrote: > >> But does this explain the 4-5mbyte/s reads when having a 15mbit/s network load? > >There are only two viable explanations: > > o Insufficient caching due to insufficient resources > > o Data is not being cached at all > >Zfs reads whole 128K blocks (or whatever the filesystem blocksize is) >at a time. It does not read partial blocks from underlying storage. >This makes it very expensive to perform many small read accesses if >the reads are not subsequently cached in the ARC. > >Bob >-- >Bob Friesenhahn >bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ >GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Tue Sep 13 09:56:49 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E82B41065675; Tue, 13 Sep 2011 09:56:49 +0000 (UTC) (envelope-from mark@exonetric.com) Received: from relay0.exonetric.net (relay0.exonetric.net [82.138.248.161]) by mx1.freebsd.org (Postfix) with ESMTP id 74CB58FC12; Tue, 13 Sep 2011 09:56:49 +0000 (UTC) Received: from [172.16.0.129] (94-30-105-106.xdsl.murphx.net [94.30.105.106]) by relay0.exonetric.net (Postfix) with ESMTP id 5010A57013; Tue, 13 Sep 2011 10:38:21 +0100 (BST) Mime-Version: 1.0 (Apple Message framework v1244.3) From: Mark Blackman In-Reply-To: Date: Tue, 13 Sep 2011 10:38:20 +0100 Message-Id: References: To: Ivan Voras X-Mailer: Apple Mail (2.1244.3) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS-lighttpd2-sendfile, too high IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Sep 2011 09:56:50 -0000 On 13 Sep 2011, at 10:31, Ivan Voras wrote: >> >> Zfs reads whole 128K blocks (or whatever the filesystem blocksize is) at >> a time. It does not read partial blocks from underlying storage. This >> makes it very expensive to perform many small read accesses if the reads >> are not subsequently cached in the ARC. > > Yes! > > Which makes it particularly "interesting" if you try to run a database > on it while forgetting to reset the block size to e.g. 8K before the > database is initialized - you get 16x more IO than you expected. > Yes, I got this but what can you do if the database has already been initialized? I'm guessing the only realistic option is to set the block size, then copy the database files to a new directory and rename the old and new directories so the new directory is the database store. Perhaps you need to use an entirely new dataset? 
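One way the copy-into-a-new-dataset approach could look, sketched as a hedged admin recipe rather than a tested procedure. The pool/dataset names `tank/db` are hypothetical; the database must be stopped first, since recordsize only applies to newly written files; and the block is guarded so it is a no-op where ZFS is unavailable:

```shell
# Rewrite the database files into a dataset created with the small
# record size, then swap the datasets into place.
if command -v zfs >/dev/null 2>&1; then
    zfs create -o recordsize=8k tank/db8k   # new dataset, 8K records
    cp -Rp /tank/db/. /tank/db8k/           # copying rewrites files as 8K records
    zfs rename tank/db tank/db_old          # keep the old copy for rollback
    zfs rename tank/db8k tank/db
    applied=1
else
    applied=0
fi
echo "applied=${applied}"
```

Because the record size is a per-file property fixed at write time, only rewritten files pick up the new value; simply setting the property on the existing dataset does not change files already on disk.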
- Mark From owner-freebsd-fs@FreeBSD.ORG Tue Sep 13 10:01:01 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 030471065670 for ; Tue, 13 Sep 2011 10:01:01 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id B068E8FC0C for ; Tue, 13 Sep 2011 10:01:00 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1R3PnX-0003PG-Q3 for freebsd-fs@freebsd.org; Tue, 13 Sep 2011 12:00:59 +0200 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 13 Sep 2011 12:00:59 +0200 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 13 Sep 2011 12:00:59 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Tue, 13 Sep 2011 12:00:46 +0200 Lines: 25 Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0.1) Gecko/20110907 Thunderbird/6.0.1 In-Reply-To: X-Enigmail-Version: 1.1.2 Subject: Re: ZFS-lighttpd2-sendfile, too high IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Sep 2011 10:01:01 -0000 On 13/09/2011 11:38, Mark Blackman wrote: > > On 13 Sep 2011, at 10:31, Ivan Voras wrote: >>> >>> Zfs reads whole 128K blocks (or whatever the filesystem blocksize is) at >>> a time. It does not read partial blocks from underlying storage. 
This >>> makes it very expensive to perform many small read accesses if the reads >>> are not subsequently cached in the ARC. >> >> Yes! >> >> Which makes it particularly "interesting" if you try to run a database >> on it while forgetting to reset the block size to e.g. 8K before the >> database is initialized - you get 16x more IO than you expected. >> > > Yes, I got this but what can you do if the database has already been > initialized? I'm guessing the only realistic option is to set the block > size, then copy the database files to a new directory and rename the old > and new directories so the new directory is the database store. > > Perhaps you need to use an entirely new dataset? Luckily, you just need to create a new file system and move/copy the data over and back again. From owner-freebsd-fs@FreeBSD.ORG Tue Sep 13 09:21:15 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E83F1106564A for ; Tue, 13 Sep 2011 09:21:15 +0000 (UTC) (envelope-from lisen1001@gmail.com) Received: from mail-ey0-f176.google.com (mail-ey0-f176.google.com [209.85.215.176]) by mx1.freebsd.org (Postfix) with ESMTP id 7E35F8FC0A for ; Tue, 13 Sep 2011 09:21:15 +0000 (UTC) Received: by eyz10 with SMTP id 10so179898eyz.21 for ; Tue, 13 Sep 2011 02:21:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=n16OfWJruIQflBpuKuIt/WeCYdM4fF9O08zmOcG0Is4=; b=ep4iQwiMidSbqZwor2uQ6xQsCUxgpSJ3dx3CvIGIhvis6qOnP9/ZTU31Lemd6Abaru q6dBWFD94iIKsp3RWLVcA+cOYX4YkUoz9NvO0E5O6vUxoQ3eXYsvNKzKbFjMp3vCU4+x OYeCf5WpbIjvgWHco4B40xoOJBm3KTCMaoD6E= MIME-Version: 1.0 Received: by 10.14.8.20 with SMTP id 20mr1690990eeq.204.1315903938845; Tue, 13 Sep 2011 01:52:18 -0700 (PDT) Received: by 10.14.27.130 with HTTP; Tue, 13 Sep 2011 01:52:18 -0700 (PDT) In-Reply-To: 
References: Date: Tue, 13 Sep 2011 16:52:18 +0800 Message-ID: From: =?GB2312?B?wO7JrQ==?= To: freebsd-fs@freebsd.org X-Mailman-Approved-At: Tue, 13 Sep 2011 11:22:28 +0000 Content-Type: text/plain; charset=GB2312 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: file lose inode in Memory-Based file system. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Sep 2011 09:21:16 -0000 My system is FreeBSD 8.2. I built a memory disk: mdmfs -s 10G -i 512 -o rw md1 /home/test1 After a period of time, some files in the memory disk lose their inodes: #ls 90020595.o #ls -l 90020595.o ls: 90020595.o: No such file or directory It seems the inode of this file was lost. How do I solve this problem? From owner-freebsd-fs@FreeBSD.ORG Tue Sep 13 13:57:22 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5D6A2106564A for ; Tue, 13 Sep 2011 13:57:22 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 258A98FC0C for ; Tue, 13 Sep 2011 13:57:21 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id p8DDvKj6005259; Tue, 13 Sep 2011 08:57:20 -0500 (CDT) Date: Tue, 13 Sep 2011 08:57:20 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Laszlo KAROLYI In-Reply-To: Message-ID: References: User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]);
Tue, 13 Sep 2011 08:57:21 -0500 (CDT) Cc: freebsd-fs Subject: Re: ZFS-lighttpd2-sendfile, too high IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Sep 2011 13:57:22 -0000 On Tue, 13 Sep 2011, Laszlo KAROLYI wrote: > Hello, > > it seems that turning off sendfile() in lighttpd2 completely solved my problem. > > The inactive memory usage ceased, wired memory raised for ARC cache, > and I see an enormous fall of IO load under the same (or even > higher) network load. The disk IO load now scales to the network > load. > > Something must be not right with the sendfile support in ZFS. It seems likely that sendfile is disabling read caching but is also not reading full block-aligned filesystem blocks from zfs (perhaps it does MMU pagesize sized reads). This would result in read amplification. The sendfile() API allows the application to do the wrong thing since it allows the application to specify the starting offset and number of bytes. This means that lighttpd2 may also be causing a problem if it does partial transfers which are not well aligned/sized for underlying filesystem blocks. A system call trace of lighttpd2 when it is configured to use sendfile may be illuminating. 
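The syscall trace suggested above can be captured on FreeBSD with ktrace(1) and kdump(1). A sketch, assuming a single running lighttpd process (the pgrep pattern is an assumption about how the server shows up in the process list):

```
# Attach to the most recently started lighttpd process and trace it,
# following any children it forks (-i):
ktrace -i -p "$(pgrep -n lighttpd)"
# ...let it serve some requests, then stop all tracing for this user:
ktrace -C
# Dump the trace and inspect the offset/nbytes arguments to sendfile(2):
kdump -f ktrace.out | grep sendfile
```

Offsets and lengths that are not multiples of the dataset's recordsize would support the read-amplification theory above.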
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Tue Sep 13 13:59:42 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 443C4106564A; Tue, 13 Sep 2011 13:59:42 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 0BCFE8FC0C; Tue, 13 Sep 2011 13:59:40 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA03034; Tue, 13 Sep 2011 16:59:39 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E6F61CA.6020003@FreeBSD.org> Date: Tue, 13 Sep 2011 16:59:38 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: freebsd-current@FreeBSD.org, freebsd-fs@FreeBSD.org, Pawel Jakub Dawidek References: <20110901223646.14b8aae8@o2.pl> <4E60DBBD.1040703@FreeBSD.org> <4E679D3D.1000007@FreeBSD.org> In-Reply-To: <4E679D3D.1000007@FreeBSD.org> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: Subject: Re: ZFS: i/o error - all block copies unavailable after upgrading to r225312 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Sep 2011 13:59:42 -0000 on 07/09/2011 19:35 Andriy Gapon said the following: > on 02/09/2011 16:35 Andriy Gapon said the following: >> Then: >> - obtain this patch http://people.freebsd.org/~avg/zfstest.head.diff >> - cd sys/boot/zfs >> - apply the patch to zfstest.c >> - cc -I. 
-I../../cddl/boot/zfs zfstest.c -o zfstest >> - run the resulting binary as root and provide your pool device(s) as >> parameter(s); e.g.: >> ./zfstest /dev/ada0p4 > > Thanks to a lot of excellent testing, debugging and analysis from Sebastian (which > went behind the scenes) we now have this patch: > http://people.freebsd.org/~avg/zfs-boot-gang.diff I've updated the patch in place. The essence of the changes is the same, just done in a slightly different fashion. That should minimize the scope of the diff. One extra change is that the checksum is now also verified for the uberblock. Pawel, can you please review it? > The patch introduces the following changes: > - checksum is now verified for gang header blocks > - checksum is now verified for reconstituted data of whole gang blocks > (previously it is verified only for individual gang member leaf blocks) > - reconstituted data of a whole gang block is now decompressed if the gang block > is compressed > > The last change is _the_ change. > > If you use compression for a filesystem where your kernel resides and you get a > problem with booting, then please test this patch and report back. > > Many thanks to Sebastian! > Additional heap of thanks to Doug Rabson who came up with the idea and > implementation of zfstest.c! This tool is of immense help when debugging an > issue like this one.
> -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Sep 14 21:21:27 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AAFD11065670; Wed, 14 Sep 2011 21:21:27 +0000 (UTC) (envelope-from yanegomi@gmail.com) Received: from mail-qy0-f182.google.com (mail-qy0-f182.google.com [209.85.216.182]) by mx1.freebsd.org (Postfix) with ESMTP id 4F33B8FC13; Wed, 14 Sep 2011 21:21:27 +0000 (UTC) Received: by qyk4 with SMTP id 4so2379265qyk.13 for ; Wed, 14 Sep 2011 14:21:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=zz1Yo1TYnuIQ3qFKyDCWLav9d8I6AxoMUDj/Xova5zo=; b=nI8kxn4AXbblcw9jaR1XYaJ+YBCe27/EYy0ozhKGnmmYT/2T5WjXk1Kcnyx+PZFJXI dC9JNZCmfT/mcmlc0h++l8jm1v+Zmsf8XHgzKuIln9vMqoQ+JVTah5M5yDpvN0IScJcs Kb0gmL1tVQDZ2+i/Ob7JYmDdfjYVwzhi/C4GA= MIME-Version: 1.0 Received: by 10.224.215.194 with SMTP id hf2mr223540qab.242.1316033451373; Wed, 14 Sep 2011 13:50:51 -0700 (PDT) Received: by 10.224.37.83 with HTTP; Wed, 14 Sep 2011 13:50:51 -0700 (PDT) In-Reply-To: <201109111605.p8BG59cc084589@svn.freebsd.org> References: <201109111605.p8BG59cc084589@svn.freebsd.org> Date: Wed, 14 Sep 2011 13:50:51 -0700 Message-ID: From: Garrett Cooper To: Konstantin Belousov , John Baldwin Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: svn-src-head@freebsd.org, FreeBSD Current , fs@freebsd.org Subject: Re: svn commit: r225474 - in head/sys: amd64/amd64 amd64/ia32 i386/i386 ia64/ia32 ia64/ia64 kern powerpc/aim powerpc/booke sparc64/sparc64 sys X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Sep 2011 21:21:27 -0000 On Sun, Sep 11, 2011 at 
9:05 AM, Konstantin Belousov wrote: > Author: kib > Date: Sun Sep 11 16:05:09 2011 > New Revision: 225474 > URL: http://svn.freebsd.org/changeset/base/225474 > > Log: > Inline the syscallenter() and syscallret(). This reduces the time measured > by the syscall entry speed microbenchmarks by ~10% on amd64. > > Submitted by: jhb > Approved by: re (bz) > MFC after: 2 weeks This change completely breaks ZFS mounting (for some odd reason) with the following backtrace. #0 doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:260 260 /usr/src/sys/kern/kern_shutdown.c: No such file or directory. in /usr/src/sys/kern/kern_shutdown.c (kgdb) #0 doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:260 #1 0xffffffff802b1cd0 in db_dump (dummy=Variable "dummy" is not available. ) at /usr/src/sys/ddb/db_command.c:537 #2 0xffffffff802b12c1 in db_command (last_cmdp=0xffffffff809b96c0, cmd_table=Variable "cmd_table" is not available. ) at /usr/src/sys/ddb/db_command.c:448 #3 0xffffffff802b1510 in db_command_loop () at /usr/src/sys/ddb/db_command.c:501 #4 0xffffffff802b3664 in db_trap (type=Variable "type" is not available. ) at /usr/src/sys/ddb/db_main.c:229 #5 0xffffffff804b29d1 in kdb_trap (type=3, code=0, tf=0xffffff8231a5f3d0) at /usr/src/sys/kern/subr_kdb.c:631 #6 0xffffffff80646ac8 in trap (frame=0xffffff8231a5f3d0) at /usr/src/sys/amd64/amd64/trap.c:590 #7 0xffffffff8063113f in calltrap () at /usr/src/sys/amd64/amd64/exception.S:228 #8 0xffffffff804b277b in kdb_enter (why=0xffffffff806e022b "panic", msg=0x80
) at cpufunc.h:63 #9 0xffffffff8047db5c in panic (fmt=Variable "fmt" is not available. ) at /usr/src/sys/kern/kern_shutdown.c:599 #10 0xffffffff8046e5cc in _mtx_assert (m=Variable "m" is not available. ) at /usr/src/sys/kern/kern_mutex.c:706 #11 0xffffffff80620f31 in vm_page_free_toq (m=0xfffffe021bf3d1f0) at /usr/src/sys/vm/vm_page.c:1756 #12 0xffffffff80c77938 in zfs_freebsd_getpages () from /boot/kernel/zfs.ko #13 0xffffffff8046ebd6 in _mtx_unlock_flags (m=0xfffffe0006dc7000, opts=421100272, file=0xfffffe0006dc70e8 "=B8P=CE\200=FF=FF=FF=FF", line=1) at /usr/src/sys/kern/kern_mutex.c:223 Previous frame inner to this frame (corrupt stack?) (kgdb) Here's my system info: $ uname -a FreeBSD streetfighter.ixsystems.com 9.0-BETA2 FreeBSD 9.0-BETA2 #2 r225558M: Wed Sep 14 12:09:35 PDT 2011 gcooper@streetfighter.ixsystems.com:/usr/obj/usr/src/sys/STREETFIGHTER amd64 $ mount /dev/ada0p2 on / (ufs, local, soft-updates) devfs on /dev (devfs, local) tank/scratch on /scratch (zfs, local, nfsv4acls) tank/usr on /usr (zfs, local, nfsv4acls) tank/usr/home on /usr/home (zfs, local, nfsv4acls) tank/usr/src on /usr/src (zfs, local, nfsv4acls) tank/var on /var (zfs, local, nfsv4acls) I tried inspecting 'm' (the mutex), but the value had been optimized out (for some odd reason). I can provide more details offline if interested. 
Thanks, -Garrett From owner-freebsd-fs@FreeBSD.ORG Thu Sep 15 20:44:34 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C8BBF106566C; Thu, 15 Sep 2011 20:44:34 +0000 (UTC) (envelope-from olgeni@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 9FCB88FC08; Thu, 15 Sep 2011 20:44:34 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p8FKiYOS030141; Thu, 15 Sep 2011 20:44:34 GMT (envelope-from olgeni@freefall.freebsd.org) Received: (from olgeni@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p8FKiYDV030137; Thu, 15 Sep 2011 20:44:34 GMT (envelope-from olgeni) Date: Thu, 15 Sep 2011 20:44:34 GMT Message-Id: <201109152044.p8FKiYDV030137@freefall.freebsd.org> To: olgeni@freebsd.org, olgeni@FreeBSD.org, freebsd-fs@FreeBSD.org From: olgeni@FreeBSD.org Cc: Subject: Re: bin/148296: [zfs] [loader] [patch] Very slow probe in /usr/src/sys/boot/zfs/zfs.c X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Sep 2011 20:44:34 -0000 Synopsis: [zfs] [loader] [patch] Very slow probe in /usr/src/sys/boot/zfs/zfs.c State-Changed-From-To: open->closed State-Changed-By: olgeni State-Changed-When: Thu Sep 15 20:43:30 UTC 2011 State-Changed-Why: The slow probing issue doesn't seem to exist on 8-STABLE anymore. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=148296 From owner-freebsd-fs@FreeBSD.ORG Thu Sep 15 22:55:35 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1F210106566B for ; Thu, 15 Sep 2011 22:55:35 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com [209.85.218.54]) by mx1.freebsd.org (Postfix) with ESMTP id C19DB8FC13 for ; Thu, 15 Sep 2011 22:55:34 +0000 (UTC) Received: by yia13 with SMTP id 13so1155499yia.13 for ; Thu, 15 Sep 2011 15:55:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=0z2RRVbD5gr5tS+iiyL84C8KeI8Ag8bExV9UBO/WD7k=; b=VuCTOwFX/vFFFSklutHqGJASJhIW+jpJwT+gfh3n0B8Y83dfMkkuCNvNgsU/+ErWWX w106ZdOD8FtX+UB6T11HHmCygdMBCpRKEdFb7r2gGFEu+/rpjYKg4Z5r6pUBWcRTtOqm 72ZA9hgui3s/s2lOaG21+sUZHUSOudOHI1720= MIME-Version: 1.0 Received: by 10.236.201.233 with SMTP id b69mr10879144yho.51.1316127334174; Thu, 15 Sep 2011 15:55:34 -0700 (PDT) Sender: artemb@gmail.com Received: by 10.236.102.147 with HTTP; Thu, 15 Sep 2011 15:55:34 -0700 (PDT) In-Reply-To: <201109152044.p8FKiYDV030137@freefall.freebsd.org> References: <201109152044.p8FKiYDV030137@freefall.freebsd.org> Date: Thu, 15 Sep 2011 15:55:34 -0700 X-Google-Sender-Auth: aSR1Zf-y-K6agbo39Eb4xJuG6As Message-ID: From: Artem Belevich To: olgeni@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-fs@freebsd.org Subject: Re: bin/148296: [zfs] [loader] [patch] Very slow probe in /usr/src/sys/boot/zfs/zfs.c X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Sep 2011 22:55:35 -0000 Jimmy, On Thu, Sep 15, 2011 at 1:44 PM, 
wrote: > Synopsis: [zfs] [loader] [patch] Very slow probe in /usr/src/sys/boot/zfs/zfs.c > > State-Changed-From-To: open->closed > State-Changed-By: olgeni > State-Changed-When: Thu Sep 15 20:43:30 UTC 2011 > State-Changed-Why: > The slow probing issue doesn't seem to exist on 8-STABLE anymore. > > http://www.freebsd.org/cgi/query-pr.cgi?pr=148296 Any idea what fixed it? Actually, I'm not sure it got fixed at all. At least I don't see anything obvious in the SVN history for sys/boot/zfs/zfs.c since the ZFSv28 commit. I built gptzfsboot about 4-6 weeks back, and at that time it was as slow as it ever was probing drives in an 8-drive raidz pool. --Artem From owner-freebsd-fs@FreeBSD.ORG Fri Sep 16 03:07:47 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5EB24106564A for ; Fri, 16 Sep 2011 03:07:47 +0000 (UTC) (envelope-from olgeni@FreeBSD.org) Received: from mail.colby.tv (93-62-141-58.ip22.fastwebnet.it [93.62.141.58]) by mx1.freebsd.org (Postfix) with ESMTP id DC8168FC14 for ; Fri, 16 Sep 2011 03:07:46 +0000 (UTC) Received: from server.colby.local (localhost [127.0.0.1]) by server.colby.local (8.14.4/8.14.4) with ESMTP id p8G2SmfH046369; Fri, 16 Sep 2011 04:28:48 +0200 (CEST) (envelope-from olgeni@FreeBSD.org) Received: from exchange.colby.local ([192.168.1.11] helo=exchange.colby.local) with IPv4:25 by server.colby.local; 16 Sep 2011 04:28:48 +0200 Received: from [10.1.0.3] ([10.1.0.3]) by exchange.colby.local over TLS secured channel with Microsoft SMTPSVC(6.0.3790.4675); Fri, 16 Sep 2011 04:28:48 +0200 Date: Fri, 16 Sep 2011 04:28:47 +0200 (CEST) From: Jimmy Olgeni X-X-Sender: olgeni@olgeni.olgeni To: Artem Belevich In-Reply-To: Message-ID: References: <201109152044.p8FKiYDV030137@freefall.freebsd.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII 
X-OriginalArrivalTime: 16 Sep 2011 02:28:48.0209 (UTC) FILETIME=[5D68D410:01CC7418] Cc: freebsd-fs@FreeBSD.org Subject: Re: bin/148296: [zfs] [loader] [patch] Very slow probe in /usr/src/sys/boot/zfs/zfs.c X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Sep 2011 03:07:47 -0000 On Thu, 15 Sep 2011, Artem Belevich wrote: > Any idea what fixed it? > > Actually, I'm not sure it's actually got fixed. At least I don't see > anything obvious in SVN history for sys/boot/zfs/zfs.c since ZVSv28 > commit. I've got gptzfsboot built about 4-6 weeks back and at that > time it was as slow as it ever was probing drives in an 8-drive raidz > pool. For me it was happening when booting off a mirror, but I haven't seen this happen in a while. Last time I saw this was definitely in the pre-ZFSv28 times. It could be that the problem is still there but I am unable to see it due to different caching on the disks, as I swapped hardware a few times since then. 
Also, the attached patch did not handle "holes" in the GPT partition sequence so it was kind of dangerous if you had a less than linear setup :) -- jimmy From owner-freebsd-fs@FreeBSD.ORG Fri Sep 16 07:20:46 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A43AF1065673; Fri, 16 Sep 2011 07:20:46 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 341058FC19; Fri, 16 Sep 2011 07:20:45 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id KAA24945; Fri, 16 Sep 2011 10:20:43 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1R4Sj4-0007YK-U7; Fri, 16 Sep 2011 10:20:42 +0300 Message-ID: <4E72F8C8.8070603@FreeBSD.org> Date: Fri, 16 Sep 2011 10:20:40 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0.2) Gecko/20110907 Thunderbird/6.0.2 MIME-Version: 1.0 To: freebsd-current@FreeBSD.org, freebsd-fs@FreeBSD.org, Pawel Jakub Dawidek References: <20110901223646.14b8aae8@o2.pl> <4E60DBBD.1040703@FreeBSD.org> <4E679D3D.1000007@FreeBSD.org> <4E6F61CA.6020003@FreeBSD.org> In-Reply-To: <4E6F61CA.6020003@FreeBSD.org> X-Enigmail-Version: undefined Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Subject: Re: ZFS: i/o error - all block copies unavailable after upgrading to r225312 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Sep 2011 07:20:46 -0000 I've discovered another issue with both the current code and the previous versions of the patch. 
So another version is at the same URL: http://people.freebsd.org/~avg/zfs-boot-gang.diff The problem was with gang blocks on a raidz vdev. zio_read_gang would pass a NULL bp to vdev_raidz_read, but that function doesn't expect that and really needs a valid bp. Pawel, I also now understand the code in zio_read that rounds the size up based on v_ashift. That code is for vdev_raidz_read, which needs the size to be a multiple of 1 << v_ashift. And, again, this would have posed a problem for zio_read_gang, which always passed SPA_GANGBLOCKSIZE when reading a gang block header. So, a list of fixes and logical changes: - correctly read gang header from raidz [*] - decompress assembled gang block data if compressed - verify checksum of a gang header - verify checksum of assembled gang block data - verify checksum of uber block [*] - new in this version of the patch. Description of the code changes: - remove offset parameter from zio_checksum_error This parameter has only been used for verifying the checksum of a vdev label. The offset is now passed via the DVA offset field in a made-up bp pointing to the label. zio_checksum_error now gets all checksum parameters from the bp. - zio_read_gang now uses an artificial bp to read a gang header via zio_read This solves all the problems with gang blocks on raidz, as zio_read already has all the code to handle raidz correctly. - zio_read performs size rounding only if v_read == vdev_raidz_read This makes the intention of the code clearer, and also slightly optimizes non-raidz cases with non-default ashift where an unnecessary intermediate buffer would otherwise be used. 
Some inline comments (marked with %): Index: sys/cddl/boot/zfs/zfssubr.c =================================================================== --- sys/cddl/boot/zfs/zfssubr.c (revision 225581) +++ sys/cddl/boot/zfs/zfssubr.c (working copy) @@ -181,14 +181,17 @@ } static int -zio_checksum_error(const blkptr_t *bp, void *data, uint64_t offset) +zio_checksum_error(const blkptr_t *bp, void *data) { - unsigned int checksum = BP_IS_GANG(bp) ? ZIO_CHECKSUM_GANG_HEADER : BP_GET_CHECKSUM(bp); - uint64_t size = BP_GET_PSIZE(bp); + uint64_t size; + unsigned int checksum; zio_checksum_info_t *ci; zio_cksum_t actual_cksum, expected_cksum, verifier; int byteswap; + checksum = BP_GET_CHECKSUM(bp); + size = BP_GET_PSIZE(bp); + % % checksum type and size are always taken from the bp. % Note that BP_IS_GANG(bp) doesn't imply that the caller wants to verify % the gang header (as was assumed before); the caller may want to verify % the whole assembled data as well. % if (checksum >= ZIO_CHECKSUM_FUNCTIONS) return (EINVAL); ci = &zio_checksum_table[checksum]; @@ -206,7 +209,8 @@ if (checksum == ZIO_CHECKSUM_GANG_HEADER) zio_checksum_gang_verifier(&verifier, bp); else if (checksum == ZIO_CHECKSUM_LABEL) - zio_checksum_label_verifier(&verifier, offset); + zio_checksum_label_verifier(&verifier, + DVA_GET_OFFSET(BP_IDENTITY(bp))); % % label offset is taken from the bp now. 
% else verifier = bp->blk_cksum; @@ -224,7 +228,6 @@ byteswap_uint64_array(&expected_cksum, sizeof (zio_cksum_t)); } else { - ASSERT(!BP_IS_GANG(bp)); % % the assert is no longer valid as we pass a gang block bp % when verifying checksum of assembled data % expected_cksum = bp->blk_cksum; ci->ci_func[0](data, size, &actual_cksum); } @@ -1248,7 +1251,7 @@ raidz_checksum_verify(const blkptr_t *bp, void *data) { - return (zio_checksum_error(bp, data, 0)); + return (zio_checksum_error(bp, data)); } /* Index: sys/boot/zfs/zfsimpl.c =================================================================== --- sys/boot/zfs/zfsimpl.c (revision 225581) +++ sys/boot/zfs/zfsimpl.c (working copy) @@ -347,7 +347,7 @@ rc = vdev->v_phys_read(vdev, vdev->v_read_priv, offset, buf, psize); if (rc) return (rc); - if (bp && zio_checksum_error(bp, buf, offset)) + if (bp && zio_checksum_error(bp, buf)) % % the bp can still be NULL here: when raidz code reads blocks of data % it doesn't pass any bp, because there is really no bp for that data. % this should be the only case, in all other cases a valid bp must be provided. 
% return (EIO); return (0); @@ -800,6 +800,7 @@ BP_SET_PSIZE(&bp, sizeof(vdev_phys_t)); BP_SET_CHECKSUM(&bp, ZIO_CHECKSUM_LABEL); BP_SET_COMPRESS(&bp, ZIO_COMPRESS_OFF); + DVA_SET_OFFSET(BP_IDENTITY(&bp), off); % % as described above, the offset is now passed via DVA offset field % ZIO_SET_CHECKSUM(&bp.blk_cksum, off, 0, 0, 0); if (vdev_read_phys(&vtmp, &bp, vdev_label, off, 0)) return (EIO); @@ -941,7 +942,7 @@ BP_SET_COMPRESS(&bp, ZIO_COMPRESS_OFF); ZIO_SET_CHECKSUM(&bp.blk_cksum, off, 0, 0, 0); - if (vdev_read_phys(vdev, NULL, upbuf, off, VDEV_UBERBLOCK_SIZE(vdev))) + if (vdev_read_phys(vdev, &bp, upbuf, off, 0)) % % pass the artificial uberblock bp to vdev_read_phys, so that it % can call zio_checksum_error and verify the checksum % continue; if (up->ub_magic != UBERBLOCK_MAGIC) @@ -974,34 +975,39 @@ } static int -zio_read_gang(spa_t *spa, const blkptr_t *bp, const dva_t *dva, void *buf) +zio_read_gang(spa_t *spa, const blkptr_t *bp, void *buf) { + blkptr_t gbh_bp; zio_gbh_phys_t zio_gb; - vdev_t *vdev; - int vdevid; - off_t offset; + char *pbuf; int i; - vdevid = DVA_GET_VDEV(dva); - offset = DVA_GET_OFFSET(dva); - STAILQ_FOREACH(vdev, &spa->spa_vdevs, v_childlink) - if (vdev->v_id == vdevid) - break; - if (!vdev || !vdev->v_read) + /* Artificial BP for gang block header. */ + gbh_bp = *bp; + BP_SET_PSIZE(&gbh_bp, SPA_GANGBLOCKSIZE); + BP_SET_LSIZE(&gbh_bp, SPA_GANGBLOCKSIZE); + BP_SET_CHECKSUM(&gbh_bp, ZIO_CHECKSUM_GANG_HEADER); + BP_SET_COMPRESS(&gbh_bp, ZIO_COMPRESS_OFF); + for (i = 0; i < SPA_DVAS_PER_BP; i++) + DVA_SET_GANG(&gbh_bp.blk_dva[i], 0); % % the artificial bp for the gang header. % it has PSIZE and LSIZE of SPA_GANGBLOCKSIZE. % checksum is set to ZIO_CHECKSUM_GANG_HEADER, so that zio_checksum_error % does the right thing. % compression is set to ZIO_COMPRESS_OFF, so that zio_read does the right thing. % the gang bit is cleared from the DVAs - we read only the gang header block and % do not want to create endless recursion here. 
% vdevs and offsets are preserved in the DVAs of the BP. % + + /* Read gang header block using the artificial BP. */ + if (zio_read(spa, &gbh_bp, &zio_gb)) return (EIO); - if (vdev->v_read(vdev, NULL, &zio_gb, offset, SPA_GANGBLOCKSIZE)) - return (EIO); % % so now smarter zio_read can be used instead of dumber v_read. % Which, again, would have crashed on NULL bp if v_read == vdev_raidz_read % + pbuf = buf; % % keep the original buffer pointer, use pbuf to populate the buffer % for (i = 0; i < SPA_GBH_NBLKPTRS; i++) { blkptr_t *gbp = &zio_gb.zg_blkptr[i]; if (BP_IS_HOLE(gbp)) continue; - if (zio_read(spa, gbp, buf)) + if (zio_read(spa, gbp, pbuf)) return (EIO); - buf = (char*)buf + BP_GET_PSIZE(gbp); + pbuf += BP_GET_PSIZE(gbp); } - + + if (zio_checksum_error(bp, buf)) + return (EIO); % % the original pointer is used to verify checksum of the assembled data % of the gang block. % Note: this is where the gang bp is passed to zio_checksum_error. % return (0); } @@ -1024,46 +1030,41 @@ if (!dva->dva_word[0] && !dva->dva_word[1]) continue; - if (DVA_GET_GANG(dva)) { - error = zio_read_gang(spa, bp, dva, buf); - if (error != 0) - continue; - } else { - vdevid = DVA_GET_VDEV(dva); - offset = DVA_GET_OFFSET(dva); - STAILQ_FOREACH(vdev, &spa->spa_vdevs, v_childlink) { - if (vdev->v_id == vdevid) - break; - } - if (!vdev || !vdev->v_read) - continue; + vdevid = DVA_GET_VDEV(dva); + offset = DVA_GET_OFFSET(dva); + STAILQ_FOREACH(vdev, &spa->spa_vdevs, v_childlink) { + if (vdev->v_id == vdevid) + break; + } + if (!vdev || !vdev->v_read) + continue; - size = BP_GET_PSIZE(bp); + size = BP_GET_PSIZE(bp); + if (vdev->v_read == vdev_raidz_read) { align = 1ULL << vdev->v_top->v_ashift; if (P2PHASE(size, align) != 0) size = P2ROUNDUP(size, align); % % do size rounding up only if v_read == vdev_raidz_read, % because only vdev_raidz_read requires that. % vdev_read_phys requires only 512 byte granularity. 
% - if (size != BP_GET_PSIZE(bp) || cpfunc != ZIO_COMPRESS_OFF) - pbuf = zfs_alloc(size); - else - pbuf = buf; + } + if (size != BP_GET_PSIZE(bp) || cpfunc != ZIO_COMPRESS_OFF) + pbuf = zfs_alloc(size); + else + pbuf = buf; % % So use temporary buffer only if either the block is compressed or % vdev_raidz_read requires a larger buffer % + if (DVA_GET_GANG(dva)) + error = zio_read_gang(spa, bp, pbuf); + else error = vdev->v_read(vdev, bp, pbuf, offset, size); - if (error == 0) { - if (cpfunc != ZIO_COMPRESS_OFF) { - error = zio_decompress_data(cpfunc, - pbuf, BP_GET_PSIZE(bp), buf, - BP_GET_LSIZE(bp)); - } else if (size != BP_GET_PSIZE(bp)) { - bcopy(pbuf, buf, BP_GET_PSIZE(bp)); - } - } - if (buf != pbuf) - zfs_free(pbuf, size); - if (error != 0) - continue; + if (error == 0) { + if (cpfunc != ZIO_COMPRESS_OFF) + error = zio_decompress_data(cpfunc, pbuf, + BP_GET_PSIZE(bp), buf, BP_GET_LSIZE(bp)); + else if (size != BP_GET_PSIZE(bp)) + bcopy(pbuf, buf, BP_GET_PSIZE(bp)); } % % Decompression is now done for the gang blocks too. % Ditto for the use of larger buffers in the raidz case. 
%
-        error = 0;
-        break;
+        if (buf != pbuf)
+            zfs_free(pbuf, size);
+        if (error == 0)
+            break;
    }
    if (error != 0)
        printf("ZFS: i/o error - all block copies unavailable\n");

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Sat Sep 17 01:36:02 2011
Date: Fri, 16 Sep 2011 18:22:06 -0700 (PDT)
From: Jason Usher <jusher71@yahoo.com>
To: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Message-ID: <1316222526.31565.YahooMailNeo@web121205.mail.ne1.yahoo.com>
Subject: ZFS obn FreeBSD hardware model for 48 or 96 sata3 paths...

Hello,

I am building my first FreeBSD based ZFS system and am deciding on a
hardware model.  The overriding requirement is:

1) immediately support 48 internal sata3 drives at full bandwidth - every
drive has independent path to CPU

2) future expansion to support another 48 drives on an attached JBOD, all
of which ALSO have their own independent path to CPU

The first question is: how many pcie 2.0 lanes does a motherboard need to
run 96 independent sata3 connections?  Am I correct that this is
extremely important?

Next, I see a lot of implementations done with LSI adaptors - is this as
simple as choosing (3) LSI SAS 9201-16i for the 48 internal drives and
(3) LSI SAS 9201-16e for the external drives?  I remember that these
cards are supported with mps(4) in FreeBSD, but only in 9.x (?)
- is that still the case, or is that support in 8.2 or later in 8.3?

So I will boot off a pair of mirrored SSDs formatted UFS2 - easy.  But I
would also like to spec and use a ZIL+L2ARC and am not sure where to go
... the system will be VERY write-biased and use a LOT of inodes - so
lots of scanning of large dirs with lots of inodes and writing data.
Something like 400 million inodes on a filesystem with an average file
size of 150 KB.

- can I just skip the l2arc and just add more RAM?  Wouldn't the RAM
always be faster/better?  Or do folks build such large L2arcs (4x200 GB
SSD?) that it outweighs an extra 32 GB of RAM?

- provided I maintain the free pcie slot(s) and/or free 2.5" drive
slots, can I always just add a ZIL after the fact?  I'd prefer to skip
it for now and save that complexity for later...

Thanks very much for any comments/suggestions.

From owner-freebsd-fs@FreeBSD.ORG Sat Sep 17 04:12:37 2011
Delivered-To: freebsd-fs@freebsd.org
Received: by 10.223.31.151 with SMTP id y23mr360742fac.48.1316231130080; Fri, 16 Sep 2011 20:45:30
-0700 (PDT) MIME-Version: 1.0 Received: by 10.223.58.133 with HTTP; Fri, 16 Sep 2011 20:45:10 -0700 (PDT) In-Reply-To: <1316222526.31565.YahooMailNeo@web121205.mail.ne1.yahoo.com> References: <1316222526.31565.YahooMailNeo@web121205.mail.ne1.yahoo.com> From: Joshua Boyd Date: Fri, 16 Sep 2011 23:45:10 -0400 Message-ID: To: Jason Usher Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: "freebsd-fs@freebsd.org" Subject: Re: ZFS obn FreeBSD hardware model for 48 or 96 sata3 paths... X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Sep 2011 04:12:37 -0000 On Fri, Sep 16, 2011 at 9:22 PM, Jason Usher wrote: > > Hello, > > I am building my first FreeBSD based ZFS system and am deciding on a > hardware model. The overriding requirement is: > > 1) immediately support 48 internal sata3 drives at full bandwidth - every > drive has independent path to CPU > > 2) future expansion to support another 48 drives on an attached JBOD, all > of which ALSO have their own independent path to CPU > > The first question is: how many pcie 2.0 lanes does a motherboard need to > run 96 independent sata3 connections ? Am I correct that this is extremely > important ? > > Next, I see a lot of implementations done with LSI adaptors - is this as > simple as choosing (3) LSI SAS 9201-16i for the 48 internal drives and (3) > LSI SAS 9201-16e for the external drives ? I remember that these cards are > supported with mps(4) in FreeBSD, but only in 9.x (?) - is that still the > case, or is that support in 8.2 or later in 8.3 ? > > So I will boot of a pair of mirrored SSDs formatted UFS2 - easy. But I > would also like to spec and use a ZIL+L2ARC and am not sure where to go ... 
> the system will be VERY write-biased and use a LOT of inodes - so lots of > scanning of large dirs with lots of inodes and writing data. Something like > 400 million inodes on a filesystem with an average file size of 150 KB. > > - can I just skip the l2arc and just add more RAM ? Wouldn't the RAM > always be faster/better ? Or do folks build such large L2arcs (4x200 GB SSD > ?) that it outweighs an extra 32 GB of RAM ? > > - provided I maintain the free pcie slot(s) and/or free 2.5" drive slots, > can I always just add a ZIL after the fact ? I'd prefer to skip it for now > and save that complexity for later... > > Thanks very much for any comments/suggestions. > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > I've built something similar, using 3 Supermicro SC933 chassis, 2 HP SAS expanders, 2 AOC-USAS-L8i cards, and 1 card with 2 external ports (I can't remember the exact name, but it's an LSI chipset card). This is a 45 drive capable setup, so smaller than what you're wanting. I'd recommend you get two of these: http://www.supermicro.com/products/chassis/4U/847/SC847E26-RJBOD1.cfm That gives you 90 drives in 8U. They each have dual port expanders integrated to the backplanes. Then build a separate 1 or 2u box that holds your boot drives/cache drives. In this box put in 2 6Gb cards with external SAS connectors. Something like the 9750-8E, which are 6Gbit/s cards and support drives bigger than 2TB. You'll need to run 8-STABLE, as these cards use the mptsas driver, which isn't in 8-RELEASE last I checked. I don't have any experience with separate cache/log devices, so I can't offer much advice there. 
-- Joshua Boyd E-mail: boydjd@jbip.net http://www.jbip.net From owner-freebsd-fs@FreeBSD.ORG Sat Sep 17 04:13:42 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BE0EC106567A for ; Sat, 17 Sep 2011 04:13:42 +0000 (UTC) (envelope-from boydjd@jbip.net) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 4FAED8FC08 for ; Sat, 17 Sep 2011 04:13:42 +0000 (UTC) Received: by fxg9 with SMTP id 9so3122152fxg.13 for ; Fri, 16 Sep 2011 21:13:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=jbip.net; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=qe+Tue998WPZ2PSWGCtNmCVu962f027KKYFMzksLo9M=; b=FN/vPQLxJOQ6OBFjd6DkrNfIn0yULCM+29xpemOcO9pjkI+Pwnm7y2+KqEFAu4afNu 5zP9sdpHW45dlOF0WJmgyHakg8YEvqrBtbFK/Z79zPy093gHKBHpdBSrBckrlswegVdA 42fpSuZ68TZ1438aRFUz4C90e58Jafy8dNnjE= Received: by 10.223.49.140 with SMTP id v12mr354664faf.67.1316231352234; Fri, 16 Sep 2011 20:49:12 -0700 (PDT) MIME-Version: 1.0 Received: by 10.223.58.133 with HTTP; Fri, 16 Sep 2011 20:48:52 -0700 (PDT) In-Reply-To: <1316222526.31565.YahooMailNeo@web121205.mail.ne1.yahoo.com> References: <1316222526.31565.YahooMailNeo@web121205.mail.ne1.yahoo.com> From: Joshua Boyd Date: Fri, 16 Sep 2011 23:48:52 -0400 Message-ID: To: Jason Usher Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: "freebsd-fs@freebsd.org" Subject: Re: ZFS obn FreeBSD hardware model for 48 or 96 sata3 paths... 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Sep 2011 04:13:42 -0000 On Fri, Sep 16, 2011 at 9:22 PM, Jason Usher wrote: > > So I will boot of a pair of mirrored SSDs formatted UFS2 - easy. But I > would also like to spec and use a ZIL+L2ARC and am not sure where to go ... > the system will be VERY write-biased and use a LOT of inodes - so lots of > scanning of large dirs with lots of inodes and writing data. Something like > 400 million inodes on a filesystem with an average file size of 150 KB. > 400 million inodes on a single file system is not going to work out well. It doesn't matter /what/ file system you use, it will end up barfing. I'd recommend creating lots of zfs filesystems on your zpools. With 400 million inodes, I'm HOPING that they're atleast divided into subdirectories, create a zfs for each subdirectory. That'll help a lot. > > - can I just skip the l2arc and just add more RAM ? Wouldn't the RAM > always be faster/better ? Or do folks build such large L2arcs (4x200 GB SSD > ?) that it outweighs an extra 32 GB of RAM ? > More RAM is always faster, but striped SSDs are pretty fast too. I'd max out the RAM on this sort of server, and then add L2ARC devices if needed. > > - provided I maintain the free pcie slot(s) and/or free 2.5" drive slots, > can I always just add a ZIL after the fact ? I'd prefer to skip it for now > and save that complexity for later... > Yes, you can add a ZIL later. > > Thanks very much for any comments/suggestions. 
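The per-subdirectory layout suggested above - one ZFS filesystem per top-level subdirectory - is easy to script. A minimal sketch in Python; the pool/dataset name ("tank/data") and the subdirectory names are hypothetical placeholders, not from this thread, and the sketch only prints the zfs create commands rather than running them:

```python
# Hypothetical top-level subdirectories of the big filesystem.
subdirs = ["mail", "www", "builds", "archive"]

def zfs_create_commands(parent, names):
    # Build one "zfs create" command per subdirectory under the
    # parent dataset, so each becomes its own ZFS filesystem.
    return ["zfs create %s/%s" % (parent, n) for n in names]

for cmd in zfs_create_commands("tank/data", subdirs):
    print(cmd)
```

Splitting the tree this way also gives per-dataset properties (recordsize, compression, quotas) and per-dataset snapshots, which helps at this inode count.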
> > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > -- Joshua Boyd E-mail: boydjd@jbip.net http://www.jbip.net From owner-freebsd-fs@FreeBSD.ORG Sat Sep 17 05:40:47 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2A3FF1065675 for ; Sat, 17 Sep 2011 05:40:47 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 9DA188FC14 for ; Sat, 17 Sep 2011 05:40:46 +0000 (UTC) Received: from julian-mac.elischer.org (home-nat.elischer.org [67.100.89.137]) (authenticated bits=0) by vps1.elischer.org (8.14.4/8.14.4) with ESMTP id p8H5KuiX045650 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 16 Sep 2011 22:20:57 -0700 (PDT) (envelope-from julian@freebsd.org) Message-ID: <4E742E5C.2010900@freebsd.org> Date: Fri, 16 Sep 2011 22:21:32 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US; rv:1.9.2.22) Gecko/20110902 Thunderbird/3.1.14 MIME-Version: 1.0 To: Joshua Boyd References: <1316222526.31565.YahooMailNeo@web121205.mail.ne1.yahoo.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Jason Usher , "freebsd-fs@freebsd.org" Subject: Re: ZFS obn FreeBSD hardware model for 48 or 96 sata3 paths... 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Sep 2011 05:40:47 -0000 On 9/16/11 8:45 PM, Joshua Boyd wrote: > On Fri, Sep 16, 2011 at 9:22 PM, Jason Usher wrote: > >> Hello, >> >> I am building my first FreeBSD based ZFS system and am deciding on a >> hardware model. The overriding requirement is: >> >> 1) immediately support 48 internal sata3 drives at full bandwidth - every >> drive has independent path to CPU >> >> 2) future expansion to support another 48 drives on an attached JBOD, all >> of which ALSO have their own independent path to CPU >> >> The first question is: how many pcie 2.0 lanes does a motherboard need to >> run 96 independent sata3 connections ? Am I correct that this is extremely >> important ? >> >> Next, I see a lot of implementations done with LSI adaptors - is this as >> simple as choosing (3) LSI SAS 9201-16i for the 48 internal drives and (3) >> LSI SAS 9201-16e for the external drives ? I remember that these cards are >> supported with mps(4) in FreeBSD, but only in 9.x (?) - is that still the >> case, or is that support in 8.2 or later in 8.3 ? >> >> So I will boot of a pair of mirrored SSDs formatted UFS2 - easy. But I >> would also like to spec and use a ZIL+L2ARC and am not sure where to go ... >> the system will be VERY write-biased and use a LOT of inodes - so lots of >> scanning of large dirs with lots of inodes and writing data. Something like >> 400 million inodes on a filesystem with an average file size of 150 KB. >> >> - can I just skip the l2arc and just add more RAM ? Wouldn't the RAM >> always be faster/better ? Or do folks build such large L2arcs (4x200 GB SSD >> ?) that it outweighs an extra 32 GB of RAM ? >> >> - provided I maintain the free pcie slot(s) and/or free 2.5" drive slots, >> can I always just add a ZIL after the fact ? 
I'd prefer to skip it for now >> and save that complexity for later... >> >> Thanks very much for any comments/suggestions. >> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> > I've built something similar, using 3 Supermicro SC933 chassis, 2 HP SAS > expanders, 2 AOC-USAS-L8i cards, and 1 card with 2 external ports (I can't > remember the exact name, but it's an LSI chipset card). This is a 45 drive > capable setup, so smaller than what you're wanting. > > I'd recommend you get two of these: > > http://www.supermicro.com/products/chassis/4U/847/SC847E26-RJBOD1.cfm > > That gives you 90 drives in 8U. They each have dual port expanders > integrated to the backplanes. Then build a separate 1 or 2u box that holds > your boot drives/cache drives. In this box put in 2 6Gb cards with external > SAS connectors. Something like the 9750-8E, which are 6Gbit/s cards and > support drives bigger than 2TB. You'll need to run 8-STABLE, as these cards > use the mptsas driver, which isn't in 8-RELEASE last I checked. > > I don't have any experience with separate cache/log devices, so I can't > offer much advice there. > what is it you are trying to achieve? large storage, or high transaction rates? (or both?) I'm biased but I'd put a 160GB zil on a fusion-io card and dedicate 8G of the ram to it's useage. it's remarkable what a 20uSec turnaround time on your metadata can do.. 
From owner-freebsd-fs@FreeBSD.ORG Sat Sep 17 05:45:51 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4609A106564A for ; Sat, 17 Sep 2011 05:45:51 +0000 (UTC) (envelope-from rincebrain@gmail.com) Received: from mail-qy0-f182.google.com (mail-qy0-f182.google.com [209.85.216.182]) by mx1.freebsd.org (Postfix) with ESMTP id E20358FC16 for ; Sat, 17 Sep 2011 05:45:50 +0000 (UTC) Received: by qyk4 with SMTP id 4so5029218qyk.13 for ; Fri, 16 Sep 2011 22:45:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=sNEKOnFzZCEpFBNnY6Hcdr0dpVlKmDtzQO92khhxyLI=; b=gIezBUN8DtByTbA68HNusfUFRPNPXomqPOWtNUBEBMB6idhfjll6I+GQ8Q5FiCdu2K hQPeQH6Iq51/XKms8Vf6TJzSUJIyj5CbWSMMpUOIFzHdPPbwGWzV1kRhRk/W4mmKlmkF 86IJMfdLULaJJeW9CLKbgwzQ2mw5JOjqdA/jo= MIME-Version: 1.0 Received: by 10.229.224.149 with SMTP id io21mr151467qcb.81.1316238350061; Fri, 16 Sep 2011 22:45:50 -0700 (PDT) Received: by 10.229.168.132 with HTTP; Fri, 16 Sep 2011 22:45:50 -0700 (PDT) In-Reply-To: <4E742E5C.2010900@freebsd.org> References: <1316222526.31565.YahooMailNeo@web121205.mail.ne1.yahoo.com> <4E742E5C.2010900@freebsd.org> Date: Sat, 17 Sep 2011 01:45:50 -0400 Message-ID: From: Rich To: Julian Elischer Content-Type: text/plain; charset=ISO-8859-1 Cc: Jason Usher , "freebsd-fs@freebsd.org" Subject: Re: ZFS obn FreeBSD hardware model for 48 or 96 sata3 paths... X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Sep 2011 05:45:51 -0000 The 9201-16e or similar would do fine for 6 Gbit SAS rates if you don't want hardware RAID. 
To get full bandwidth SATA 3 from 48/96 drives, that's 750 MB/s * 8/10
(8 data bytes per 10 bytes transmitted raw - SATA 3 does an 8b10b
encoding) ~ 600 MB/s * 48/96 = 28800/57600 MB/s.

PCIe 2.x is 500 MB/s per lane, so that's 57/114 lanes of PCIe 2.x to do
full bandwidth. And that all assumes you have sufficient memory
bandwidth anyway.

- Rich

From owner-freebsd-fs@FreeBSD.ORG Sat Sep 17 07:34:09 2011
From: Daniel Kalchev <daniel@digsys.bg>
Date: Sat, 17 Sep 2011 10:34:12 +0300
Message-Id: <72A6ABD6-F6FD-4563-AB3F-6061E3DD9FBF@digsys.bg>
To: Rich
Cc: Jason Usher, "freebsd-fs@freebsd.org"
Subject: Re: ZFS obn FreeBSD hardware model for 48 or 96 sata3 paths...

On Sep 17, 2011, at 08:45 , Rich wrote:

> The 9201-16e or similar would do fine for 6 Gbit SAS rates if you
> don't want hardware RAID.
>
> To get full bandwidth SATA 3 from 48/96 drives, that's 750 MB/s * 8/10
> (8 data bytes per 10 bytes transmitted raw - SATA 3 does an 8b10b
> encoding) ~ 600 MB/s * 48/96 = 28800/57600 MB/s
>

There is no single magnetic drive on the market that can saturate SATA2
(300 MB/s) yet. Most can't match even SATA1 (150 MB/s). You don't need
that much dedicated bandwidth for drives.

If you intend to have 48/96 SSDs, then that is another story, but then I
am doubtful a "PC" architecture can handle that much data either.

If you are looking at IOPS rather than raw throughput, then by all means
consider (more) SSDs. You may also consider using 2.5" SAS drives that
will be much more compact and less power hungry. The LSI2008 controllers
will manage both SATA and SAS drives (at the same time, in the same
zpool).

Memory is much more expensive than SSDs for L2ARC, and if your workload
permits it (lots of repeated small reads), a larger L2ARC will help a
lot. It will also help if you have a huge pool or if you enable dedup
etc. Just populate as much RAM as the server can handle and then add
L2ARC (read-optimized).
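Rich's arithmetic above is easy to recompute. A small sketch (plain Python, nothing thread-specific) that rederives the per-drive payload rate from the 6 Gbit/s SATA 3 line rate and 8b/10b encoding, then the PCIe 2.0 lane counts; the exact quotients are 57.6 and 115.2 lanes, which the post rounds to 57/114:

```python
# Per-drive payload rate: SATA 3 signals at 6 Gbit/s, and 8b/10b
# encoding carries 8 data bits per 10 line bits.
def payload_mb_s(line_rate_gbit=6.0):
    line_mb_s = line_rate_gbit * 1000 / 8   # 750 MB/s raw line rate
    return line_mb_s * 8 / 10               # 600 MB/s of payload data

# PCIe 2.x moves roughly 500 MB/s per lane, so the lanes needed for
# n drives at full simultaneous bandwidth:
def pcie2_lanes(n_drives, lane_mb_s=500.0):
    return payload_mb_s() * n_drives / lane_mb_s

print(payload_mb_s())     # 600.0 MB/s per drive
print(pcie2_lanes(48))    # 57.6 lanes for 48 drives
print(pcie2_lanes(96))    # 115.2 lanes for 96 drives
```

As Daniel notes, spinning drives deliver nowhere near this, so the worst case only matters for all-SSD builds.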
Daniel

From owner-freebsd-fs@FreeBSD.ORG Sat Sep 17 12:27:43 2011
Date: Sat, 17 Sep 2011 16:02:21 +0400
From: Lytochkin Boris <lytboris@gmail.com>
To: freebsd-fs@freebsd.org
Subject: [ZFS] starving reads while idle disks

Hi.

My ZFS RAID10 pool shows surprisingly starving read performance. Being
built on 18 15k SAS drives, it shows 1-2Mb/s or less when performing
reads (tar, rsync).

While running `tar cf - . >/dev/null', `vmstat -i' shows ~200 interrupts
per second for mpt. If I enable scrub on that pool, interrupts bump up
to 5k ips, resulting in 12Mb/s scrub speed, and disk busy percentage
rises up to 100%.

No warnings are shown in logs in either case. It is 8.2-STABLE built on
Jun 9, 2011. A system built on Sept 6, 2011 performs the same way. tar
is stuck in the `zio->io_cv' state the whole time.

This read speed is too low for such an environment, for sure. I spent
some time trying to dig into and fix this, but still no luck, so
debug/tune suggestions are greatly welcome.

>zpool status
  pool: tank
 state: ONLINE
 scan: scrub canceled on Sat Sep 17 13:04:53 2011
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da0     ONLINE       0     0     0
	    da1     ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    da2     ONLINE       0     0     0
	    da3     ONLINE       0     0     0
	  mirror-2  ONLINE       0     0     0
	    da4     ONLINE       0     0     0
	    da12    ONLINE       0     0     0
	  mirror-3  ONLINE       0     0     0
	    da5     ONLINE       0     0     0
	    da13    ONLINE       0     0     0
	  mirror-4  ONLINE       0     0     0
	    da6     ONLINE       0     0     0
	    da14    ONLINE       0     0     0
	  mirror-5  ONLINE       0     0     0
	    da7     ONLINE       0     0     0
	    da15    ONLINE       0     0     0
	  mirror-6  ONLINE       0     0     0
	    da8     ONLINE       0     0     0
	    da16    ONLINE       0     0     0
	  mirror-7  ONLINE       0     0     0
	    da9     ONLINE       0     0     0
	    da17    ONLINE       0     0     0
	  mirror-8  ONLINE       0     0     0
	    da10    ONLINE       0     0     0
	    da18    ONLINE       0     0     0

Running:
>tar cf - .
>/dev/null
load: 0.16  cmd: bsdtar 6360 [zio->io_cv]  0.88r 0.00u 0.05s 0% 2492k

iostat:
device     r/s   w/s    kr/s   kw/s  wait  svc_t  %b  us ni sy in id
da0       19.9   0.0   201.5    0.0     0    3.5   6   0  0  0  0 100
da1       17.9   0.0   185.6    0.0     0    7.5   9
da2       20.9   0.0   202.5    0.0     0    3.9   5
da3       34.8   0.0   289.5    0.0     0    4.2   7
da4        1.0   0.0    13.9    0.0     0    8.4   1
da5       17.9   0.0     9.0    0.0     0    0.8   1
da6       59.7   0.0   576.6    0.0     0    4.8  11
da7       99.5   0.0  1235.8    0.0     0    7.2  32
da8       38.8   0.0   404.5    0.0     0    4.3   7
da9       12.9   0.0   157.2    0.0     3    5.6   6
da10      32.8   0.0   425.4    0.0     0    4.8   8
da12      20.9   0.0    37.8    0.0     0    2.1   4
da13       2.0   0.0    27.9    0.0     0    1.8   0
da14      55.7   0.0   463.7    0.0     0    4.2  11
da15     114.4   0.0  1439.7    0.0     0    7.9  34
da16      38.8   0.0   344.3    0.0     0    5.0   9
da17      51.7   0.0   395.0    0.0     2    5.0  12
da18      62.7   0.0   632.3    0.0     0    5.0  14

ZFS-related sysctls:
vfs.zfs.l2c_only_size: 0
vfs.zfs.mfu_ghost_data_lsize: 26866176
vfs.zfs.mfu_ghost_metadata_lsize: 523702272
vfs.zfs.mfu_ghost_size: 550568448
vfs.zfs.mfu_data_lsize: 2825181184
vfs.zfs.mfu_metadata_lsize: 158866432
vfs.zfs.mfu_size: 3115283456
vfs.zfs.mru_ghost_data_lsize: 91758592
vfs.zfs.mru_ghost_metadata_lsize: 51310592
vfs.zfs.mru_ghost_size: 143069184
vfs.zfs.mru_data_lsize: 10066080768
vfs.zfs.mru_metadata_lsize: 915701760
vfs.zfs.mru_size: 23198790656
vfs.zfs.anon_data_lsize: 0
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_size: 7094272
vfs.zfs.l2arc_norw: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_noprefetch: 1
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_write_max: 8388608
vfs.zfs.arc_meta_limit: 24730226688
vfs.zfs.arc_meta_used: 15541743536
vfs.zfs.arc_min: 12365113344
vfs.zfs.arc_max: 98920906752
vfs.zfs.dedup.prefetch: 1
vfs.zfs.mdcomp_disable: 0
vfs.zfs.write_limit_override: 0
vfs.zfs.write_limit_inflated: 309157871616
vfs.zfs.write_limit_max: 12881577984
vfs.zfs.write_limit_min: 33554432
vfs.zfs.write_limit_shift: 3
vfs.zfs.no_write_throttle: 1
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 1
vfs.zfs.check_hostid: 1
vfs.zfs.recover: 0
vfs.zfs.txg.synctime_ms: 1000
vfs.zfs.txg.timeout: 5
vfs.zfs.scrub_limit: 10
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.cache.size: 10485760
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.write_gap_limit: 4096
vfs.zfs.vdev.read_gap_limit: 32768
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.ramp_rate: 2
vfs.zfs.vdev.time_shift: 6
vfs.zfs.vdev.min_pending: 4
vfs.zfs.vdev.max_pending: 96
vfs.zfs.vdev.bio_flush_disable: 0
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_replay_disable: 0
vfs.zfs.zio.use_uma: 0
vfs.zfs.version.zpl: 5
vfs.zfs.version.spa: 28
vfs.zfs.version.acl: 1
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
kstat.zfs.misc.xuio_stats.onloan_read_buf: 0
kstat.zfs.misc.xuio_stats.onloan_write_buf: 0
kstat.zfs.misc.xuio_stats.read_buf_copied: 0
kstat.zfs.misc.xuio_stats.read_buf_nocopy: 0
kstat.zfs.misc.xuio_stats.write_buf_copied: 0
kstat.zfs.misc.xuio_stats.write_buf_nocopy: 28044
kstat.zfs.misc.zfetchstats.hits: 17756
kstat.zfs.misc.zfetchstats.misses: 950
kstat.zfs.misc.zfetchstats.colinear_hits: 0
kstat.zfs.misc.zfetchstats.colinear_misses: 950
kstat.zfs.misc.zfetchstats.stride_hits: 17329
kstat.zfs.misc.zfetchstats.stride_misses: 0
kstat.zfs.misc.zfetchstats.reclaim_successes: 0
kstat.zfs.misc.zfetchstats.reclaim_failures: 950
kstat.zfs.misc.zfetchstats.streams_resets: 1
kstat.zfs.misc.zfetchstats.streams_noresets: 427
kstat.zfs.misc.zfetchstats.bogus_streams: 0
kstat.zfs.misc.arcstats.hits: 152508694
kstat.zfs.misc.arcstats.misses: 2951255
kstat.zfs.misc.arcstats.demand_data_hits: 25279174
kstat.zfs.misc.arcstats.demand_data_misses: 2001333
kstat.zfs.misc.arcstats.demand_metadata_hits: 127229278
kstat.zfs.misc.arcstats.demand_metadata_misses: 946992
kstat.zfs.misc.arcstats.prefetch_data_hits: 0
kstat.zfs.misc.arcstats.prefetch_data_misses: 14
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 242
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 2916
kstat.zfs.misc.arcstats.mru_hits: 13748947
kstat.zfs.misc.arcstats.mru_ghost_hits: 0
kstat.zfs.misc.arcstats.mfu_hits: 138759522
kstat.zfs.misc.arcstats.mfu_ghost_hits: 0
kstat.zfs.misc.arcstats.allocated: 9532012
kstat.zfs.misc.arcstats.deleted: 10
kstat.zfs.misc.arcstats.stolen: 0
kstat.zfs.misc.arcstats.recycle_miss: 0
kstat.zfs.misc.arcstats.mutex_miss: 0
kstat.zfs.misc.arcstats.evict_skip: 0
kstat.zfs.misc.arcstats.evict_l2_cached: 0
kstat.zfs.misc.arcstats.evict_l2_eligible: 0
kstat.zfs.misc.arcstats.evict_l2_ineligible: 2048
kstat.zfs.misc.arcstats.hash_elements: 2959306
kstat.zfs.misc.arcstats.hash_elements_max: 2960693
kstat.zfs.misc.arcstats.hash_collisions: 16094630
kstat.zfs.misc.arcstats.hash_chains: 864066
kstat.zfs.misc.arcstats.hash_chain_max: 11
kstat.zfs.misc.arcstats.p: 49460453376
kstat.zfs.misc.arcstats.c: 98920906752
kstat.zfs.misc.arcstats.c_min: 12365113344
kstat.zfs.misc.arcstats.c_max: 98920906752
kstat.zfs.misc.arcstats.size: 28440168752
kstat.zfs.misc.arcstats.hdr_size: 967688072
kstat.zfs.misc.arcstats.data_size: 26321250304
kstat.zfs.misc.arcstats.other_size: 1151230376
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_read_bytes: 0
kstat.zfs.misc.arcstats.l2_write_bytes: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 0
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 0
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 0
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0
kstat.zfs.misc.arcstats.l2_write_in_l2: 0
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 0
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 1
kstat.zfs.misc.arcstats.l2_write_full: 0
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 0
kstat.zfs.misc.arcstats.l2_write_pios: 0
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 0
kstat.zfs.misc.vdev_cache_stats.delegations: 32536
kstat.zfs.misc.vdev_cache_stats.hits: 508797
kstat.zfs.misc.vdev_cache_stats.misses: 267216

From owner-freebsd-fs@FreeBSD.ORG Sat Sep 17 15:59:10 2011
Date: Sat, 17 Sep 2011 17:59:00 +0200
From: Jeremie Le Hen <jeremie@le-hen.org>
To: Martin Matuska
Message-ID: <20110917155859.GA8243@felucia.tataz.chchile.org>
References: <20110905195458.GA7863@felucia.tataz.chchile.org> <4E65393F.9070401@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4E65393F.9070401@FreeBSD.org>
Cc: freebsd-fs@FreeBSD.org, Jeremie Le Hen
Subject: Re: Difficulties to use ZFS root: ROOT MOUNT ERROR

Hi,

On Mon, Sep 05, 2011 at 11:03:59PM +0200, Martin Matuska wrote:
> On 5. 9. 2011 21:54, Jeremie Le Hen wrote:
> > Hi list,
> >
> > I've followed the instructions documented here:
> > http://wiki.freebsd.org/RootOnZFS/ZFSBootPartition
> >
> > The kernel starts correctly, so this rules out any problem regarding
> > boot0, zfsboot and the ZFS loader.
> >
> > But when the kernel tries to mount the root filesystem, it fails with
> > the following output:
> >
> > % Trying to mount root from zfs:zroot
> > % ROOT MOUNT ERROR:
> > % If you have invalid mount options, reboot, and first try the following from
> > % the loader prompt:
> > %
> > %     set vfs.root.mountfrom.options=rw
> > %
> > % and then remove the invalid mount options from /etc/fstab.
> > %
> > % Loader variables:
> > % vfs.root.mountfrom=zfs:zroot
> > % vfs.root.mountfrom.options=rw
> >
> > From a netboot'd FreeBSD:
> >
> > # zfs import zroot
> > % # zpool get bootfs zroot
> > % NAME   PROPERTY  VALUE   SOURCE
> > % zroot  bootfs    zroot   local
> > %
> > % # zfs list -o name,canmount,mountpoint
> > % NAME                       CANMOUNT  MOUNTPOINT
> > % zroot                      on        legacy
> > % zroot/tmp                  on        /tmp
> > % zroot/usr                  on        /usr
> > % zroot/usr/home             on        /usr/home
> > % zroot/usr/ports            on        /usr/ports
> > % zroot/usr/ports/distfiles  on        /usr/ports/distfiles
> > % zroot/usr/ports/packages   on        /usr/ports/packages
> > % zroot/usr/src              on        /usr/src
> > % zroot/usr/src8             on        /usr/src8
> > % zroot/var                  on        /var
> > % zroot/var/crash            on        /var/crash
> > % zroot/var/db               on        /var/db
> > % zroot/var/db/pkg           on        /var/db/pkg
> > % zroot/var/empty            on        /var/empty
> > % zroot/var/log              on        /var/log
> > % zroot/var/mail             on        /var/mail
> > % zroot/var/run              on        /var/run
> > % zroot/var/tmp              on        /var/tmp
> > % # zfs export zroot
> >
> > /boot/zfs/zpool.cache exists in the zroot filesystem:
> >
> > % # zpool import -R /mnt zroot
> > % # zfs set mountpoint=/ zroot
> > % # ls -l /mnt/boot/zfs/zpool.cache
> > % -rw-r--r--  1 root  wheel  924 Sep  5 07:31 /mnt/boot/zfs/zpool.cache
> > % # grep zfs /mnt/boot/loader.conf /mnt/etc/rc.conf
> > % /mnt/boot/loader.conf:zfs_load="YES"
> > % /mnt/boot/loader.conf:vfs.root.mountfrom="zfs:zroot"
> > % /mnt/etc/rc.conf:zfs_enable="YES"
> >
> > Any idea why this error occurs?
> >
> > Thanks

> It might be a problem in the zpool.cache.
> If you read the zpool(8) manpage properly, you will find this:
>
>     -R root
>             Equivalent to "-o cachefile=none,altroot=root"
>
> If you mount a pool with an alternate root and want to update the
> cachefile, you have to explicitly state the cachefile.
> (e.g. zpool import -o altroot=/mnt -o cachefile=/tmp/zpool.cache zroot)
> Second, you should not have an exported pool for booting (from the
> viewpoint of the target system's zpool.cache).
> Third, you don't need a legacy mount for zroot. You can leave it at "/",
> but you don't have to.
> I personally prefer having everything one level deeper (e.g. pool/root,
> pool/root/var, etc.).
>
> Therefore I suggest:
> zpool import -o altroot=/mnt -o cachefile=/tmp/zpool.cache zroot
> zfs set mountpoint=/ zroot
> cp /tmp/zpool.cache /mnt/boot/zfs/zpool.cache
> shutdown -r now
> (as you can see I have not exported the pool)

I've just had the opportunity to migrate another server to ZFS. This time
I followed your advice of "having everything one level deeper", that is,
zroot/root is "/". I've modified "vfs.root.mountfrom" in loader.conf(5)
accordingly.

obiwan:~# zpool get all zroot
NAME   PROPERTY       VALUE                 SOURCE
zroot  size           147G                  -
zroot  capacity       2%                    -
zroot  altroot        /mnt                  local
zroot  health         ONLINE                -
zroot  guid           12889954819379028468  default
zroot  version        28                    default
zroot  bootfs         zroot/root            local
zroot  delegation     on                    default
zroot  autoreplace    off                   default
zroot  cachefile      none                  local
zroot  failmode       wait                  default
zroot  listsnapshots  off                   default
zroot  autoexpand     off                   default
zroot  dedupditto     0                     default
zroot  dedupratio     1.00x                 -
zroot  free           143G                  -
zroot  allocated      3.69G                 -
zroot  readonly       off                   -

obiwan:~# zfs list -o name,mounted,canmount,mountpoint -r zroot | grep -v /jails
NAME                             MOUNTED  CANMOUNT  MOUNTPOINT
zroot                            no       on        none
zroot/root                       yes      on        /mnt
zroot/root/root                  yes      on        /mnt/root
zroot/root/tmp                   yes      on        /mnt/tmp
zroot/root/usr                   yes      on        /mnt/usr
zroot/root/usr/local             yes      on        /mnt/usr/local
zroot/root/usr/obj               yes      on        /mnt/usr/obj
zroot/root/usr/pkgsrc            yes      on        /mnt/usr/pkgsrc
zroot/root/usr/pkgsrc/distfiles  yes      on        /mnt/usr/pkgsrc/distfiles
zroot/root/usr/ports             yes      on        /mnt/usr/ports
zroot/root/usr/ports/distfiles   yes      on        /mnt/usr/ports/distfiles
zroot/root/usr/ports/packages    yes      on        /mnt/usr/ports/packages
zroot/root/usr/src               yes      on        /mnt/usr/src
zroot/root/var                   yes      on        /mnt/var
zroot/root/var/crash             yes      on        /mnt/var/crash
zroot/root/var/db                yes      on        /mnt/var/db
zroot/root/var/db/pkg            yes      on        /mnt/var/db/pkg
zroot/root/var/empty             yes      on        /mnt/var/empty
zroot/root/var/log               yes      on        /mnt/var/log
zroot/root/var/mail              yes      on        /mnt/var/mail
zroot/root/var/run               yes      on        /mnt/var/run
zroot/root/var/tmp               yes      on        /mnt/var/tmp

The kernel boots fine and finds the root filesystem, but it fails miserably
when running the rc.d scripts because the deeper datasets (/var, /usr, ...)
are not mounted. FWIW, I escaped to DDB and typed "show mount". Besides
/dev, / was indeed mounted from zroot/root, and /tmp was /dev/md0 for an
unknown reason.

I've been fiddling with this for 3 hours this afternoon without luck.
Does anyone have an idea on this, please?

Thanks.
Regards,
--
Jeremie Le Hen

Men are born free and equal.  Later on, they're on their own.
                                                Jean Yanne

From owner-freebsd-fs@FreeBSD.ORG Sat Sep 17 17:54:38 2011
Date: Sat, 17 Sep 2011 10:54:34 -0700
From: "David P. Discher"
To: Lytochkin Boris
Message-Id: <6B437FA4-B422-4BE7-BDF5-F90717F3865B@bitgravity.com>
Subject: Re: [ZFS] starving reads while idle disks

Do you see the same read starvation when writing the tar to a file?
(possibly outside the zpool)

I have an anecdotal suspicion that /dev/null has some performance hit of
blocking or locking.

-dpd

On Sep 17, 2011, at 5:02 AM, Lytochkin Boris wrote:

> While running `tar cf - . >/dev/null', `vmstat -i' shows ~200
> interrupts per second for mpt. If I enable scrub on that pool,
> interrupts bump up to 5k ips, resulting in a 12Mb/s scrub speed, and
> disk busy percentage rises up to 100%. No warnings are shown in logs
> in both cases.
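David's /dev/null hypothesis is easy to A/B-test directly. The sketch below is hypothetical (the DIR and OUT paths are placeholders, not values from the thread): it streams the same tar archive once into /dev/null and once into a regular file, and lets dd's final status line report the throughput of each run so the two numbers can be compared.

```shell
#!/bin/sh
# A/B test sketch for "is /dev/null the bottleneck?".
# Assumptions: DIR lives on the pool under test; OUT is on a different
# filesystem so the second run is not polluted by writes to the same pool.
DIR=${1:-/etc}
OUT=${2:-/tmp/tartest.out}

# Run 1: stream the archive into /dev/null; dd's last line shows throughput.
echo "== tar -> /dev/null =="
tar cf - "$DIR" 2>/dev/null | dd of=/dev/null bs=64k 2>&1 | tail -n 1

# Run 2: stream the same archive into a regular file, for comparison.
echo "== tar -> $OUT =="
tar cf - "$DIR" 2>/dev/null | dd of="$OUT" bs=64k 2>&1 | tail -n 1
rm -f "$OUT"
```

If the two throughput figures match (as Boris reports below), the sink is not the limiting factor and the per-stream read path is the place to look.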
From owner-freebsd-fs@FreeBSD.ORG Sat Sep 17 18:14:55 2011
Date: Sat, 17 Sep 2011 22:14:52 +0400
From: Lytochkin Boris
To: "David P. Discher"
In-Reply-To: <6B437FA4-B422-4BE7-BDF5-F90717F3865B@bitgravity.com>
References: <6B437FA4-B422-4BE7-BDF5-F90717F3865B@bitgravity.com>
Subject: Re: [ZFS] starving reads while idle disks

Hi.

> Do you see the same read starvation when writing the tar to a file?
> (possibly outside the zpool)

Yep.

> I have an anecdotal suspicion that /dev/null has some performance hit
> of blocking or locking.

No-no.
Every program that tries to read from ZFS faces this issue, actually.

I found something more interesting. Let's presume I have 10 big dirs in ".".
Issuing the tar|dd command on separate dirs (so spawning 10 "threads")
simultaneously results in 10x faster reads cumulatively (I saw 15Mb/s in
`zpool iostat'), and disk load may be as high as 70%.

So I think there is some read-throttling thing that limits read speed per
read(). But I still have no clue where to find this bottleneck.

--
Wbr,
Boris.

From owner-freebsd-fs@FreeBSD.ORG Sat Sep 17 19:58:18 2011
Date: Sat, 17 Sep 2011 14:58:16 -0500 (CDT)
From: Bob Friesenhahn
To: Jason Usher
In-Reply-To: <1316222526.31565.YahooMailNeo@web121205.mail.ne1.yahoo.com>
Message-ID: 
References: <1316222526.31565.YahooMailNeo@web121205.mail.ne1.yahoo.com>
Cc: "freebsd-fs@freebsd.org"
Subject: Re: ZFS on FreeBSD hardware model for 48 or 96 sata3 paths...
On Fri, 16 Sep 2011, Jason Usher wrote:

> So I will boot off a pair of mirrored SSDs formatted UFS2 - easy.
> But I would also like to spec and use a ZIL+L2ARC and am not sure
> where to go ... the system will be VERY write-biased and use a LOT
> of inodes - so lots of scanning of large dirs with lots of inodes
> and writing data.  Something like 400 million inodes on a filesystem
> with an average file size of 150 KB.

150KB is a relatively small file size given that the default zfs blocksize
is 128KB.  With so many files you should definitely max out RAM first
before using SSDs as an L2ARC.

It is important to recognize that the ARC cache is not populated until
data has been read.  The cache does not help unless the data has been
accessed several times.  You will want to make sure that all metadata
and directories are cached in RAM.  Depending on how the files are
used/accessed, you might even want to intentionally disable caching of
file data.

Are the writes expected to be synchronous, or are they asynchronous?
Are they expected to be primarily sequential (e.g. whole file), or is
data accessed/updated in place?

Bob
--
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
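Whether extra RAM is actually paying off can be watched with exactly the arcstats sysctls quoted earlier in this digest, in the spirit of the Orca data collector the thread opened with. The helper below is a sketch (the function name is mine, not from any poster): it derives an overall hit ratio from the `hits` and `misses` counters, which appear in the full `sysctl kstat.zfs.misc.arcstats` dump even though the excerpt above is truncated before them.

```shell
#!/bin/sh
# arc_hit_ratio: read `sysctl kstat.zfs.misc.arcstats` output on stdin and
# print the overall ARC hit ratio in percent (0.00 when no lookups yet).
arc_hit_ratio() {
  awk -F': ' '
    /^kstat\.zfs\.misc\.arcstats\.hits:/   { h = $2 }
    /^kstat\.zfs\.misc\.arcstats\.misses:/ { m = $2 }
    END { if (h + m > 0) printf "%.2f\n", 100 * h / (h + m)
          else           print "0.00" }'
}

# Usage on a live system (commented out so the sketch stays self-contained):
#   sysctl kstat.zfs.misc.arcstats | arc_hit_ratio
```

Graphed over time next to the mru_hits/mfu_hits split shown above, this gives a quick signal for whether the working set fits in the ARC before reaching for an L2ARC device.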