From owner-freebsd-fs@FreeBSD.ORG Sun Jun 19 08:38:40 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2E8A51065674; Sun, 19 Jun 2011 08:38:40 +0000 (UTC) (envelope-from jh@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 082A38FC13; Sun, 19 Jun 2011 08:38:40 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p5J8cdrO076385; Sun, 19 Jun 2011 08:38:39 GMT (envelope-from jh@freefall.freebsd.org) Received: (from jh@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p5J8cd6Y076381; Sun, 19 Jun 2011 08:38:39 GMT (envelope-from jh) Date: Sun, 19 Jun 2011 08:38:39 GMT Message-Id: <201106190838.p5J8cd6Y076381@freefall.freebsd.org> To: mjacob@freebsd.org, jh@FreeBSD.org, freebsd-fs@FreeBSD.org From: jh@FreeBSD.org Cc: Subject: Re: kern/106030: [ufs] [panic] panic in ufs from geom when a dead disk is invalidated X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2011 08:38:40 -0000 Synopsis: [ufs] [panic] panic in ufs from geom when a dead disk is invalidated State-Changed-From-To: feedback->closed State-Changed-By: jh State-Changed-When: Sun Jun 19 08:38:39 UTC 2011 State-Changed-Why: Feedback timeout. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=106030 From owner-freebsd-fs@FreeBSD.ORG Sun Jun 19 08:45:02 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E72A11065675; Sun, 19 Jun 2011 08:45:02 +0000 (UTC) (envelope-from jh@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id B992B8FC15; Sun, 19 Jun 2011 08:45:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p5J8j2iu084590; Sun, 19 Jun 2011 08:45:02 GMT (envelope-from jh@freefall.freebsd.org) Received: (from jh@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p5J8j2vF084586; Sun, 19 Jun 2011 08:45:02 GMT (envelope-from jh) Date: Sun, 19 Jun 2011 08:45:02 GMT Message-Id: <201106190845.p5J8j2vF084586@freefall.freebsd.org> To: michael.reynolds@gmail.com, jh@FreeBSD.org, freebsd-fs@FreeBSD.org From: jh@FreeBSD.org Cc: Subject: Re: kern/116170: [panic] Kernel panic when mounting /tmp X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2011 08:45:03 -0000 Synopsis: [panic] Kernel panic when mounting /tmp State-Changed-From-To: feedback->closed State-Changed-By: jh State-Changed-When: Sun Jun 19 08:45:02 UTC 2011 State-Changed-Why: It's unclear if this can be still reproduced. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=116170 From owner-freebsd-fs@FreeBSD.ORG Sun Jun 19 09:21:18 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5959F106564A for ; Sun, 19 Jun 2011 09:21:18 +0000 (UTC) (envelope-from patrick.proniewski@univ-lyon2.fr) Received: from smtp.univ-lyon2.fr (smtp.univ-lyon2.fr [159.84.143.21]) by mx1.freebsd.org (Postfix) with ESMTP id 9E9248FC12 for ; Sun, 19 Jun 2011 09:21:17 +0000 (UTC) Received: from ru.univ-lyon2.fr (localhost [127.0.0.1]) by smtp.univ-lyon2.fr (Postfix) with ESMTP id 80E4914D978 for ; Sun, 19 Jun 2011 11:03:50 +0200 (CEST) X-Virus-Scanned: amavisd-new at univ-lyon2.fr Received: from amavis.at.univ-lyon2.fr ([127.0.0.1]) by ru.univ-lyon2.fr (smtp.univ-lyon2.fr [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id z-eltMHguZRd for ; Sun, 19 Jun 2011 11:03:48 +0200 (CEST) Received: from co3 (co3.univ-lyon2.fr [159.84.143.188]) by smtp.univ-lyon2.fr (Postfix) with ESMTP for ; Sun, 19 Jun 2011 11:03:48 +0200 (CEST) Received: from [10.48.130.125] ([92.90.21.1]) by co3.univ-lyon2.fr for ;Sun, 19 Jun 2011 11:03:53 +0200 (CEST) From: Patrick Proniewski Content-Type: multipart/signed; boundary=Apple-Mail-2--545461825; protocol="application/pkcs7-signature"; micalg=sha1 Date: Sun, 19 Jun 2011 11:03:39 +0200 Message-Id: <5084282.22648.1308474239008.JavaMail.root@co3> To: FreeBSD Filesystems Mime-Version: 1.0 X-Mailer: Apple Mail (2.1084) X-ContactOffice-Account: main:2117681 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: ZFS, noexec and snapshots X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2011 09:21:18 -0000 --Apple-Mail-2--545461825 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=iso-8859-1 Hello, I'm 
using ZFS with periodic snapshot creation, so users can easily "go back in time" in their data by browsing .zfs/snapshot/. Every ZFS volume is made with noexec, but I've just found out that the automount of .zfs/snapshot/* is not made with the noexec option. Is there something that needs explicit configuration?

# zfs list -t all -r -o mountpoint,name,exec tank/user/foobar
MOUNTPOINT    NAME                       EXEC
/user/foobar  tank/user/foobar           off
-             tank/user/foobar@weekly.3  on
-             tank/user/foobar@weekly.2  on
-             tank/user/foobar@weekly.1  on
-             tank/user/foobar@weekly.0  on

# zfs get all tank/user
NAME       PROPERTY              VALUE                  SOURCE
tank/user  type                  filesystem             -
tank/user  creation              Tue Feb 22 14:17 2011  -
tank/user  used                  26.8G                  -
tank/user  available             93.5G                  -
tank/user  referenced            188K                   -
tank/user  compressratio         1.08x                  -
tank/user  mounted               yes                    -
tank/user  quota                 none                   default
tank/user  reservation           none                   default
tank/user  recordsize            128K                   default
tank/user  mountpoint            /user                  local
tank/user  sharenfs              off                    default
tank/user  checksum              on                     default
tank/user  compression           gzip                   inherited from tank
tank/user  atime                 on                     default
tank/user  devices               on                     default
tank/user  exec                  off                    inherited from tank
tank/user  setuid                on                     default
tank/user  readonly              off                    default
tank/user  jailed                off                    default
tank/user  snapdir               hidden                 default
tank/user  aclmode               groupmask              default
tank/user  aclinherit            restricted             default
tank/user  canmount              on                     default
tank/user  shareiscsi            off                    default
tank/user  xattr                 off                    temporary
tank/user  copies                1                      default
tank/user  version               4                      -
tank/user  utf8only              off                    -
tank/user  normalization         none                   -
tank/user  casesensitivity       sensitive              -
tank/user  vscan                 off                    default
tank/user  nbmand                off                    default
tank/user  sharesmb              off                    default
tank/user  refquota              none                   default
tank/user  refreservation        none                   default
tank/user  primarycache          all                    default
tank/user  secondarycache        all                    default
tank/user  usedbysnapshots       0                      -
tank/user  usedbydataset         188K                   -
tank/user  usedbychildren        26.8G                  -
tank/user  usedbyrefreservation  0                      -

Patrick PRONIEWSKI
-- 
Administrateur Système - DSI
- Université Lumière Lyon 2 --Apple-Mail-2--545461825-- From owner-freebsd-fs@FreeBSD.ORG Sun Jun 19 14:20:12 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E26171065670 for ; Sun, 19 Jun 2011 14:20:11 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id C7FF88FC18 for ; Sun, 19 Jun 2011 14:20:11 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p5JEKBY6091398 for ; Sun, 19 Jun 2011 14:20:11 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p5JEKBIQ091397; Sun, 19 Jun 2011 14:20:11 GMT (envelope-from gnats) Date: Sun, 19 Jun 2011 14:20:11 GMT Message-Id: <201106191420.p5JEKBIQ091397@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Cc: Subject: Re: kern/154930: [zfs] cannot delete/unlink file from full volume -> ENOSPC X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: soralx@cydem.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2011 14:20:12 -0000 The following reply was made to PR kern/154930; it has been noted by GNATS. From: To: , Cc: Subject: Re: kern/154930: [zfs] cannot delete/unlink file from full volume -> ENOSPC Date: Sun, 19 Jun 2011 06:54:26 -0700 All, I encountered a similar snag, only worse, with no solution. The server has a 7 TB ZFS volume, which consists of 5 WD2001FASS connected to PERC 5/i in RAID5 (originally there were 4 disks, but the RAID was expanded by adding another disk and rebuilding). 
On June 12th (the date of the OS rebuild, both world and kernel; the last update was on June 6), the pool was at v15 and had 12 GB free. On June 14, the pool already had 0 bytes free (found out from the periodic daily run output email). It is unknown whether the FS was accessed at all during these two days; however, `find ./ -newerBt 2011-06-10 -print` returns just one small file dated 2011-06-11, and `find ./ -newermt 2011-06-10` returns ~20 files with a total size of <1 GB (BTW, these are backups, and nobody could have touched them, so it's a mystery why they have a recent modification time). So, the question is: where could 12 GB have suddenly disappeared to?

Further, on June 18th, the pool was mounted when a `zpool upgrade tst` command was issued. The upgrade to v28 succeeded, but then I found out that deleting files, large or small, is impossible:

# rm ./qemu0.raw
rm: ./qemu0.raw: No space left on device
# truncate -s0 ./qemu0.raw
truncate: ./qemu0.raw: No space left on device
# cat /dev/null > ./qemu0.raw
./qemu0.raw: No space left on device.

Snapshots, compression, or deduplication have never been used on the volume. Also, these messages appear in dmesg:

Solaris: WARNING: metaslab_free_dva(): bad DVA 0:5978620460544
Solaris: WARNING: metaslab_free_dva(): bad DVA 0:5978620461568
Solaris: WARNING: metaslab_free_dva(): bad DVA 0:5993168926208

Contrary to what the pool's name might suggest, this is not test storage, but has valuable data on it. Help!
>Environment:
System: FreeBSD cydem.org 8.2-STABLE FreeBSD 8.2-STABLE #0: Sun Jun 12 07:55:32 PDT 2011 soralx@cydem.org:/usr/obj/usr/src/sys/CYDEM amd64

`df`:
Filesystem  1K-blocks   Used        Avail  Capacity  Mounted on
tst         7650115271  7650115271  0      100%      /stor1-tst

`zpool list`:
NAME  SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
tst   5.44T  5.36T  80.6G  98%  1.00x  ONLINE  -

`zfs list`:
NAME  USED   AVAIL  REFER  MOUNTPOINT
tst   7.12T  0      7.12T  /stor1-tst

`zpool status`:
  pool: tst
 state: ONLINE
  scan: scrub in progress since Sat Jun 18 06:32:37 2011
        1.82T scanned out of 5.36T at 350M/s, 2h56m to go
        0 repaired, 34.02% done
config:

        NAME     STATE   READ  WRITE  CKSUM
        tst      ONLINE  0     0      0
          mfid1  ONLINE  0     0      0

errors: No known data errors

[this completed with 'scrub repaired 0 in 5h47m with 0 errors' at 133% done, i.e. it scrubbed all 7.12 TB].

-- 
[SorAlx] ridin' VN2000 Classic LT

From owner-freebsd-fs@FreeBSD.ORG Sun Jun 19 18:42:58 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4C4CA106566B for ; Sun, 19 Jun 2011 18:42:58 +0000 (UTC) (envelope-from mlmichael70@gmail.com) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id D33BD8FC12 for ; Sun, 19 Jun 2011 18:42:57 +0000 (UTC) Received: by wyb33 with SMTP id 33so2839166wyb.13 for ; Sun, 19 Jun 2011 11:42:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:message-id:date:from:user-agent:mime-version:to :subject:references:in-reply-to:content-type :content-transfer-encoding; bh=/7+5oSVtaadT0Vwp9Z2lcinBOfsknWd0Vs6/PM1MdyI=; b=tZd+N7Tg4CfiiBH+MUMCvX9tMkTnOA4gnR+jOQTVlNuG6PwBdy0U4ANET3zBqZb8CR ZpyXtetr0fhfWK2OIQxaz6etucEbUstFLqk13f7mKDcxDy8Gz5N0IoE0uNaOZikmRw+e Au0LDxubU2fi1PQ1EweIJFOlhP1AR1HefmcBE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; 
h=message-id:date:from:user-agent:mime-version:to:subject:references :in-reply-to:content-type:content-transfer-encoding; b=gLVSpr2BITEgvz2CL6RPGU2tReB8OaBNxQgOa4jrYzVY1M4lTHCuwz1nQ4YLOiTKfl Id7E1zo4thSUhUhhf0dZPK6aZPdxxVv5xo6u+SO6L222UYK1DAvSWAT1FlMkY3PhXtPG ZvlgDazyqU1R5VKWdiDyv54H8JDBs2IJnIxds= Received: by 10.227.182.74 with SMTP id cb10mr4120890wbb.48.1308507452381; Sun, 19 Jun 2011 11:17:32 -0700 (PDT) Received: from prime.nonspace (nat66.mia.three.co.uk [217.171.129.66]) by mx.google.com with ESMTPS id eq4sm2573416wbb.37.2011.06.19.11.17.31 (version=SSLv3 cipher=OTHER); Sun, 19 Jun 2011 11:17:31 -0700 (PDT) Message-ID: <4DFE3D38.1040207@gmail.com> Date: Sun, 19 Jun 2011 19:17:28 +0100 From: Michael User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.17) Gecko/20110606 Thunderbird/3.1.10 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <5084282.22648.1308474239008.JavaMail.root@co3> In-Reply-To: <5084282.22648.1308474239008.JavaMail.root@co3> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: ZFS, noexec and snapshots X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2011 18:42:58 -0000 On 19/06/2011 10:03, Patrick Proniewski wrote: > > Every ZFS volume is made with noexec, but I've just find out that the automount of .zfs/snapshot/* is not made with the noexec option. > Just two days ago I was wondering why some of my snapshots are not visible in .zfs/snapshot/ after setting snapdir=visible. All of given datasets have the noexec property set on. I guess that is the answer then. 
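The mismatch in Patrick's listing (exec=off on the dataset, exec=on on every snapshot) can be checked mechanically. Below is a minimal Python sketch, assuming tab-separated input of the kind `zfs list -H -t all -r -o name,exec` prints in scripting mode. The helper name `snapshots_with_exec_mismatch` is hypothetical, and the sample text is retyped from the listing earlier in this thread rather than captured from a live system.

```python
def snapshots_with_exec_mismatch(listing):
    """Return snapshot names whose exec value differs from the parent dataset's.

    `listing` is expected to hold one "name<TAB>exec" pair per line
    (an assumption about the input format, see the lead-in).
    """
    exec_by_name = {}
    for line in listing.strip().splitlines():
        name, exec_value = line.split("\t")
        exec_by_name[name] = exec_value

    mismatched = []
    for name, exec_value in exec_by_name.items():
        if "@" not in name:
            continue  # not a snapshot
        parent = name.split("@", 1)[0]
        if parent in exec_by_name and exec_by_name[parent] != exec_value:
            mismatched.append(name)
    return mismatched


# Sample data retyped from the listing in this thread.
SAMPLE = "\n".join([
    "tank/user/foobar\toff",
    "tank/user/foobar@weekly.3\ton",
    "tank/user/foobar@weekly.2\ton",
    "tank/user/foobar@weekly.1\ton",
    "tank/user/foobar@weekly.0\ton",
])

for snapshot in snapshots_with_exec_mismatch(SAMPLE):
    print(snapshot)
```

The sketch only reports the difference in property values; it says nothing about why the snapshot mounts ignore the parent's setting.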
Michael From owner-freebsd-fs@FreeBSD.ORG Sun Jun 19 21:11:43 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B9330106566C; Sun, 19 Jun 2011 21:11:43 +0000 (UTC) (envelope-from roberto@keltia.freenix.fr) Received: from keltia.net (centre.keltia.net [IPv6:2a01:240:fe5c::41]) by mx1.freebsd.org (Postfix) with ESMTP id 713658FC15; Sun, 19 Jun 2011 21:11:43 +0000 (UTC) Received: from lonrach.keltia.net (lonrach.keltia.net [IPv6:2a01:240:fe5c:0:d69a:20ff:fed0:3a83]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: roberto) by keltia.net (Postfix/TLS) with ESMTPSA id 8D8C5FF06; Sun, 19 Jun 2011 23:11:40 +0200 (CEST) Date: Sun, 19 Jun 2011 23:11:39 +0200 From: Ollivier Robert To: freebsd-fs@freebsd.org, "Justin T. Gibbs" , fs@FreeBSD.org Message-ID: <20110619211138.GB64389@lonrach.keltia.net> References: <4DF7C9E6.1030800@scsiguy.com> <20110615120024.GH1975@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110615120024.GH1975@garage.freebsd.pl> X-Operating-System: MacOS X / MBP 4,1 - FreeBSD 8.0 / T3500-E5520 Nehalem User-Agent: Mutt/1.5.20 (2009-06-14) Cc: Subject: Re: [CFR][ZFS] Show "previous device location" for removed vdevs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2011 21:11:43 -0000 According to Pawel Jakub Dawidek: > > reports the device GUID and a "device was at" message as a user aid. This > > patch provides the same behavior when a device is removed post zpool > > mount/import. > > Sounds good. So have we officially forked ZFS on freebsd or not? -- Ollivier ROBERT -=- FreeBSD: The Power to Serve! 
-=- roberto@keltia.freenix.fr In memoriam to Ondine : http://ondine.keltia.net/ From owner-freebsd-fs@FreeBSD.ORG Sun Jun 19 21:54:36 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D3194106566C; Sun, 19 Jun 2011 21:54:36 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from mail.vx.sk (mail.vx.sk [IPv6:2a01:4f8:100:1043::3]) by mx1.freebsd.org (Postfix) with ESMTP id 65CC28FC16; Sun, 19 Jun 2011 21:54:36 +0000 (UTC) Received: from core.vx.sk (localhost [127.0.0.1]) by mail.vx.sk (Postfix) with ESMTP id 32A4C179D3E; Sun, 19 Jun 2011 23:54:35 +0200 (CEST) X-Virus-Scanned: amavisd-new at mail.vx.sk Received: from mail.vx.sk ([127.0.0.1]) by core.vx.sk (mail.vx.sk [127.0.0.1]) (amavisd-new, port 10024) with LMTP id pyuKRrgnmp21; Sun, 19 Jun 2011 23:54:32 +0200 (CEST) Received: from [10.9.8.1] (chello085216231078.chello.sk [85.216.231.78]) by mail.vx.sk (Postfix) with ESMTPSA id 39747179D34; Sun, 19 Jun 2011 23:54:30 +0200 (CEST) Message-ID: <4DFE7015.8020205@FreeBSD.org> Date: Sun, 19 Jun 2011 23:54:29 +0200 From: Martin Matuska User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; sk; rv:1.8.1.23) Gecko/20090812 Lightning/0.9 Thunderbird/2.0.0.23 Mnenhy/0.7.5.0 MIME-Version: 1.0 To: Matthias Andree References: <201102221550.p1MFo9Ld054161@freefall.freebsd.org> In-Reply-To: <201102221550.p1MFo9Ld054161@freefall.freebsd.org> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=windows-1250 Content-Transfer-Encoding: 8bit Cc: freebsd-fs@FreeBSD.org Subject: Re: kern/154930: [zfs] cannot delete/unlink file from full volume -> ENOSPC X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2011 21:54:36 -0000 Some more information on this: 
Here is an OpenSolaris forums entry: http://opensolaris.org/jive/thread.jspa?threadID=62037

This is described in Illumos issues as well: https://www.illumos.org/issues/412

This is quite a serious problem of the ZFS implementation: to delete a file or a snapshot, you may need free space! I am quoting Garrett D'Amore:

"The problem is that with copy-on-write, when you delete a file you must create a new copy of the meta data which means a new tree, ultimately. It's not possible to avoid this, but it is possible that we should ultimately be able to have a reserve, and only allow operations which will ultimately remove data once the pool is reduced to nothing more than this reserve."

Until we have a "reserve" implementation, the only workaround I have found is creating an empty dataset with some reserved space (which you can free in case of emergency), e.g.:

zfs create -o mountpoint=none -o reservation=10M tank/reserve

On 22.02.2011 16:50, Matthias Andree wrote:
> The following reply was made to PR kern/154930; it has been noted by GNATS.
>
> From: Matthias Andree
> To: Martin Matuska
> Cc: bug-followup@FreeBSD.org
> Subject: Re: kern/154930: [zfs] cannot delete/unlink file from full volume
>  -> ENOSPC
> Date: Tue, 22 Feb 2011 16:06:47 +0100
>
> On 22.02.2011 15:30, Martin Matuska wrote:
> > I was unable to reproduce your problem.
> >
> > But I was able to reproduce a different situation:
> > - on a dataset with one or more snapshots I am unable to delete files
> > (ENOSPC) if the dataset got full.
> >
> > If this is your case, then:
> > - deleting files does not unlink them from the snapshot.
> > - you must first delete a specific snapshot (or all snapshots linking
> > the file) to free space.
>
> Hi Martin,
>
> no snapshots were ever used on the zpools or zfs volumes -- I had
> checked that previously. Only truncation of a 20 M file would allow me
> to delete files.
>
> Best regards
> Matthias
>
> --
> Matthias Andree
> ports committer
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

-- 
Martin Matuska
FreeBSD committer
http://blog.vx.sk

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 20 06:12:58 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 06CBC106564A for ; Mon, 20 Jun 2011 06:12:58 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from mail.vx.sk (mail.vx.sk [IPv6:2a01:4f8:100:1043::3]) by mx1.freebsd.org (Postfix) with ESMTP id 95F1B8FC16 for ; Mon, 20 Jun 2011 06:12:57 +0000 (UTC) Received: from core.vx.sk (localhost [127.0.0.1]) by mail.vx.sk (Postfix) with ESMTP id C3840181D7C; Mon, 20 Jun 2011 08:12:56 +0200 (CEST) X-Virus-Scanned: amavisd-new at mail.vx.sk Received: from mail.vx.sk ([127.0.0.1]) by core.vx.sk (mail.vx.sk [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 30yMbWWY2ubR; Mon, 20 Jun 2011 08:12:51 +0200 (CEST) Received: from [10.9.8.1] (chello085216231078.chello.sk [85.216.231.78]) by mail.vx.sk (Postfix) with ESMTPSA id AA050181D5D; Mon, 20 Jun 2011 08:12:51 +0200 (CEST) Message-ID: <4DFEE4E3.2010509@FreeBSD.org> Date: Mon, 20 Jun 2011 08:12:51 +0200 From: Martin Matuska User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; sk; rv:1.8.1.23) Gecko/20090812 Lightning/0.9 Thunderbird/2.0.0.23 Mnenhy/0.7.5.0 MIME-Version: 1.0 To: "Justin T. Gibbs" References: <4DF7C406.1080903@scsiguy.com> In-Reply-To: <4DF7C406.1080903@scsiguy.com> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: fs@FreeBSD.org Subject: Re: [CFR][ZFS] Show removed devices by GUID in zpool output. 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jun 2011 06:12:58 -0000

I agree with this change, too.

On 14.06.2011 22:26, Justin T. Gibbs wrote:
> The current behavior of zpool_vdev_name() is to report the vdev path
> (e.g. /dev/da0) unless a vdev has the ZPOOL_CONFIG_NOT_PRESENT attribute
> set. This attribute is only set when a vdev is not found during
> import/mount of a pool. The attached patch also displays a vdev by GUID
> if it cannot be opened post import or is marked removed (e.g. via a GEOM
> orphan event).
>
> The main motivation for this change is that vdev paths are not unique to
> a physical leaf vdev. It is easy to get into a situation where you need
> to "detach /dev/da0" even though da0 is an active member of the same pool
> in which a "previous da0" was once removed. With zpool_vdev_name()
> reporting the GUID, the user is equipped to provide an unambiguous
> command that represents their desired action.
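The naming rule described in the quoted proposal can be sketched as follows. This is a toy Python model and not the libzfs code: the `Vdev` class, its fields, and `display_name` are hypothetical stand-ins for the vdev configuration and for zpool_vdev_name(), and the GUID values are made up. Only the rule itself comes from the message: show the GUID when the vdev was not present at import, cannot be opened afterwards, or is marked removed, and show the path otherwise.

```python
from dataclasses import dataclass


@dataclass
class Vdev:
    """Hypothetical, simplified view of a leaf vdev's configuration."""
    guid: int
    path: str
    not_present: bool = False  # models the ZPOOL_CONFIG_NOT_PRESENT attribute
    can_open: bool = True      # False if the device cannot be opened post import
    removed: bool = False      # True if marked removed (e.g. after an orphan event)


def display_name(vdev):
    """Return the GUID for absent, unopenable, or removed vdevs, else the path."""
    if vdev.not_present or not vdev.can_open or vdev.removed:
        return str(vdev.guid)
    return vdev.path


# Two vdevs that share the path /dev/da0: a removed one and its active successor.
old_da0 = Vdev(guid=1234567890123456789, path="/dev/da0", removed=True)
new_da0 = Vdev(guid=9876543210987654321, path="/dev/da0")

print(display_name(old_da0))
print(display_name(new_da0))
```

With the path alone both devices would be reported as /dev/da0; under this rule the removed one is reported by GUID, so the two get distinct names.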
-- 
Martin Matuska
FreeBSD committer
http://blog.vx.sk

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 20 11:07:01 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6C99E1065674 for ; Mon, 20 Jun 2011 11:07:01 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 5B52C8FC26 for ; Mon, 20 Jun 2011 11:07:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p5KB71MY098104 for ; Mon, 20 Jun 2011 11:07:01 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p5KB70KO098102 for freebsd-fs@FreeBSD.org; Mon, 20 Jun 2011 11:07:00 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 20 Jun 2011 11:07:00 GMT Message-Id: <201106201107.p5KB70KO098102@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jun 2011 11:07:01 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/157929  fs         [nfs] NFS slow read
o kern/157728  fs         [zfs] zfs (v28) incremental receive may leave behind t
o kern/157722  fs         [geli] unable to newfs a geli encrypted partition
o kern/157399  fs         [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179  fs         [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156933  fs         [zfs] ZFS receive after read on readonly=on filesystem
o kern/156797  fs         [zfs] [panic] Double panic with FreeBSD 9-CURRENT and
o kern/156781  fs         [zfs] zfs is losing the snapshot directory,
p kern/156545  fs         [ufs] mv could break UFS on SMP systems
o kern/156193  fs         [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156168  fs         [nfs] [panic] Kernel panic under concurrent access ove
o kern/156039  fs         [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615  fs         [zfs] zfs v28 broken on sparc64 -current
o kern/155587  fs         [zfs] [panic] kernel panic with zfs
o kern/155411  fs         [regression] [8.2-release] [tmpfs]: mount: tmpfs : No
o kern/155199  fs         [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104   fs         [zfs][patch] use /dev prefix by default when importing
o kern/154930  fs         [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828  fs         [msdosfs] Unable to create directories on external USB
o kern/154491  fs         [smbfs] smb_co_lock: recursive lock for object 1
o kern/154447  fs         [zfs] [panic] Occasional panics - solaris assert somew
p kern/154228  fs         [md] md getting stuck in wdrain state
o kern/153996  fs         [zfs] zfs root mount error while kernel is not located
o kern/153847  fs         [nfs] [panic] Kernel panic from incorrect m_free in nf
o kern/153753  fs         [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716  fs         [zfs] zpool scrub time remaining is incorrect
o kern/153695  fs         [patch] [zfs] Booting from zpool created on 4k-sector
o kern/153680  fs         [xfs] 8.1 failing to mount XFS partitions
o kern/153520  fs         [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418  fs         [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351  fs         [zfs] locking directories/files in ZFS
o bin/153258   fs         [patch][zfs] creating ZVOLs requires `refreservation'
s kern/153173  fs         [zfs] booting from a gzip-compressed dataset doesn't w
o kern/153126  fs         [zfs] vdev failure, zpool=peegel type=vdev.too_small
p kern/152488  fs         [tmpfs] [patch] mtime of file updated when only inode
o kern/152022  fs         [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs         [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs         [zfs] page fault under load in /sbin/zfs
o kern/151845  fs         [smbfs] [patch] smbfs should be upgraded to support Un
o bin/151713   fs         [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs         [zfs] disk wait bug
o kern/151629  fs         [fs] [patch] Skip empty directory entries during name
o kern/151330  fs         [zfs] will unshare all zfs filesystem after execute a
o kern/151326  fs         [nfs] nfs exports fail if netgroups contain duplicate
o kern/151251  fs         [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs         [zfs] can't delete zfs snapshot
o kern/151111  fs         [zfs] vnodes leakage during zfs unmount
o kern/150503  fs         [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs         [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs         [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs         [nfs] mountd/nfsd became confused; refused to reload n
o kern/150207  fs         zpool(1): zpool import -d /dev tries to open weird dev
o kern/149208  fs         mksnap_ffs(8) hang/deadlock
o kern/149173  fs         [patch] [zfs] make OpenSolaris installa
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o bin/148296   fs         [zfs] [loader] [patch] Very slow probe in /usr/src/sys
o kern/148204  fs         [nfs] UDP NFS causes overload
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147790  fs         [zfs] zfs set acl(mode|inherit) fails on existing zfs
o kern/147560  fs         [zfs] [boot] Booting 8.1-PRERELEASE raidz system take
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an
o bin/145309   fs         bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs         [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142914  fs         [zfs] ZFS performance degradation over time
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140134  fs         [msdosfs] write and fsck destroy filesystem integrity
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
p bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs         [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs         [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/138662  fs         [panic] ffs_blkfree: freeing free block
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133174  fs         [msdosfs] [patch] msdosfs must support multibyte inter
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor" on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs         [nullfs] Error by check nullfs
f kern/130133  fs         [panic] [zfs] 'kmem_map too small' caused by make clea
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/127787  fs         [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
f kern/127375  fs         [zfs] If vm.kmem_size_max>"1073741823" then write spee
o bin/127270   fs         fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
f kern/126703  fs         [panic] [zfs] _mtx_lock_sleep: recursed on 
non-recursi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files f sparc/123566 fs [zfs] zpool import issue: EOVERFLOW o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121366 fs [zfs] [patch] Automatic disk scrubbing from periodic(8 o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F f kern/120210 fs [zfs] [panic] reboot after panic: solaris assert: arc_ o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117314 fs [ntfs] Long-filename only NTFS fs'es cause kernel pani o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: 
snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o kern/109024 fs [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat o kern/109010 fs [msdosfs] can't mv directory within fat32 file system o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing 
a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/51583 fs [nullfs] [patch] allow to work with devices and socket o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o kern/33464 fs [ufs] soft update inconsistencies after system crash o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 231 problems total. 
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 20 12:34:37 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 304481065670 for ; Mon, 20 Jun 2011 12:34:37 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id E7F898FC13 for ; Mon, 20 Jun 2011 12:34:36 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap4EAJE9/02DaFvO/2dsb2JhbABTG4Quowm3PZA1gSuDdYEKBJFekB0 X-IronPort-AV: E=Sophos;i="4.65,394,1304308800"; d="scan'208";a="124543029" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 20 Jun 2011 08:34:36 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 24C4BB3F0E; Mon, 20 Jun 2011 08:34:36 -0400 (EDT) Date: Mon, 20 Jun 2011 08:34:36 -0400 (EDT) From: Rick Macklem To: John Message-ID: <651692596.790474.1308573276137.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110610125939.GA69616@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: New NFS server stress test hang X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jun 2011 12:34:37 -0000 John De wrote: > ----- Rick Macklem's Original Message ----- > > John De wrote: > > > ----- Rick Macklem's Original Message ----- > > > > John De wrote: > > > > > Hi, > > > > > > > > > > We've been running some stress tests of the new nfs server. 
> > > > > The system is at r222531 (head), 9 clients, two mounts each > > > > > to the server: > > > > > > > > > > mount_nfs -o > > > > > udp,nfsv3,rsize=32768,wsize=32768,noatime,nolockd,acregmin=1,acregmax=2,acdirmin=1,acdirmax=2,negnametimeo=2 > > > > > ${servera}:/vol/datsrc /c/$servera/vol/datsrc > > > > > mount_nfs -o > > > > > udp,nfsv3,rsize=32768,wsize=32768,noatime,nolockd,acregmin=1,acregmax=2,acdirmin=1,acdirmax=2,negnametimeo=0 > > > > > ${servera}:/vol/datgen /c/$servera/vol/datgen > > > > > > > > > > > > > > > The system is still up & responsive, simply no nfs services > > > > > are working. All (200) threads appear to be active, but not > > > > > doing anything. The debugger is not compiled into this kernel. > > > > > We can run any other tracing commands desired. We can also > > > > > rebuild the kernel with the debugger enabled for any kernel > > > > > debugging needed. > > > > > > > > > > --- long logs deleted --- > > > > > > > > How about a: > > > > ps axHlww <-- With the "H" we'll see what the nfsd server > > > > threads > > > > are up to > > > > procstat -kka > > > > > > > > Oh, and a couple of nfsstats a few seconds apart. It's what the > > > > counts > > > > are changing by that might tell us what is going on. (You can > > > > use > > > > "-z" > > > > to zero them out, if you have an nfsstat built from recent > > > > sources.) > > > > > > > > Also, does a new NFS mount attempt against the server do > > > > anything? > > > > > > > > Thanks in advance for help with this, rick > > > > > > Hi Rick, > > > > > > Here's the output. In general, the nfsd processes appear to be in > > > either nfsrvd_getcache(35 instances) or nfsrvd_updatecache(164) > > > sleeping on > > > "nfssrc". The server numbers don't appear to be moving. A > > > showmount > > > from a > > > client system works, but a mount does not (see below). > > > > Please try the attached patch and let me know if it helps. 
When I > > looked > > I found several places where the rc_flag variable was being fiddled > > without the > > mutex held. I suspect one of these resulted in the RC_LOCKED flag > > not > > getting cleared, so all the threads got stuck waiting on it. > > > > The patch is at: > > http://people.freebsd.org/~rmacklem/cache.patch > > in case it gets eaten by the list handler. > > Thanks for digging into this, rick > > Hi Rick, > > Patch applied. The system has been up and running for about > 16 hours now and so far it's still handling the load quite nicely. > > last pid: 15853; load averages: 5.36, 4.64, 4.48 up 0+16:08:16 > 08:48:07 > 72 processes: 7 running, 65 sleeping > CPU: % user, % nice, % system, % interrupt, % idle > Mem: 22M Active, 3345M Inact, 79G Wired, 9837M Buf, 11G Free > Swap: > > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > 2049 root 26 52 0 10052K 1712K CPU3 3 97:21 942.24% nfsd > > I'll followup again in 24 hours with another status. > > Any performance related numbers/knobs we can provide that might > be of interest? > > Thanks Rick. > > -John Just fyi, the patch has been committed to head and unless there are problems, will be in stable/8 in a couple of weeks. Thanks for helping with this. 
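A minimal sketch of the failure mode described above, where a flag word is modified without the mutex held and a cleared RC_LOCKED bit reappears. This is not the actual nfsd cache code: the two threads are simulated by hand-interleaved loads and stores so the outcome is deterministic, the value given to RC_LOCKED is illustrative, and RC_EXAMPLE is a hypothetical second flag.

```c
#include <stdint.h>

#define RC_LOCKED  0x01u  /* name taken from the message; value is illustrative */
#define RC_EXAMPLE 0x02u  /* hypothetical second flag, not from the real code */

/* Two read-modify-write sequences overlap because no mutex serializes them.
 * "Thread A" clears RC_LOCKED; "thread B" sets another flag from a stale copy. */
uint32_t unserialized_update(void)
{
    uint32_t rc_flag = RC_LOCKED;

    uint32_t a = rc_flag;   /* A loads the flag word */
    uint32_t b = rc_flag;   /* B loads the same word before A stores */
    a &= ~RC_LOCKED;        /* A clears the lock bit in its copy */
    b |= RC_EXAMPLE;        /* B sets its flag in a copy that still has RC_LOCKED */
    rc_flag = a;            /* A stores: lock bit cleared */
    rc_flag = b;            /* B stores its stale copy: lock bit is set again */
    return rc_flag;
}

/* The same two updates when each whole read-modify-write happens while
 * holding the mutex, so B only starts after A has finished. */
uint32_t serialized_update(void)
{
    uint32_t rc_flag = RC_LOCKED;

    rc_flag &= ~RC_LOCKED;  /* A, under the mutex */
    rc_flag |= RC_EXAMPLE;  /* B, under the mutex, sees A's result */
    return rc_flag;
}
```

In the unserialized case RC_LOCKED is still set at the end, so any thread sleeping until it clears would wait forever, which matches the symptom of all nfsd threads sleeping.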
Please let me know if you have more problems, rick From owner-freebsd-fs@FreeBSD.ORG Mon Jun 20 19:17:06 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 634D3106566B for ; Mon, 20 Jun 2011 19:17:06 +0000 (UTC) (envelope-from patrick.proniewski@univ-lyon2.fr) Received: from smtp.univ-lyon2.fr (smtp.univ-lyon2.fr [159.84.143.21]) by mx1.freebsd.org (Postfix) with ESMTP id 21AE48FC17 for ; Mon, 20 Jun 2011 19:17:05 +0000 (UTC) Received: from ru.univ-lyon2.fr (localhost [127.0.0.1]) by smtp.univ-lyon2.fr (Postfix) with ESMTP id 75E0414D909 for ; Mon, 20 Jun 2011 21:17:05 +0200 (CEST) X-Virus-Scanned: amavisd-new at univ-lyon2.fr Received: from amavis.at.univ-lyon2.fr ([127.0.0.1]) by ru.univ-lyon2.fr (smtp.univ-lyon2.fr [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 9KkrYrRCv56q for ; Mon, 20 Jun 2011 21:17:04 +0200 (CEST) Received: from co4 (co4.univ-lyon2.fr [159.84.143.67]) by smtp.univ-lyon2.fr (Postfix) with ESMTP for ; Mon, 20 Jun 2011 21:17:04 +0200 (CEST) Received: from [10.250.65.79] ([80.125.173.129]) by co4.univ-lyon2.fr for ;Mon, 20 Jun 2011 21:16:59 +0200 (CEST) Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 From: Patrick Proniewski In-Reply-To: <43CFBAB7-9383-4D18-A2FF-061766637CE7@univ-lyon2.fr> Date: Mon, 20 Jun 2011 21:16:49 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <16735164.54848.1308597423064.JavaMail.root@co4> References: <43CFBAB7-9383-4D18-A2FF-061766637CE7@univ-lyon2.fr> To: FreeBSD Filesystems X-Mailer: Apple Mail (2.1084) X-ContactOffice-Account: main:2117681 X-Mailman-Approved-At: Mon, 20 Jun 2011 19:44:18 +0000 Subject: Re: ZFS, noexec and snapshots X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jun 2011 19:17:06 -0000 Hello, 
Following Michael's reply, I realise my English is not as clear as I wish :)

> On 19/06/2011 10:03, Patrick Proniewski wrote:
>>
>> Every ZFS volume is made with noexec, but I've just find out that the automount of .zfs/snapshot/* is not made with the noexec option.
>>
>
> Just two days ago I was wondering why some of my snapshots are not
> visible in .zfs/snapshot/ after setting snapdir=visible. All of given
> datasets have the noexec property set on.
> I guess that is the answer then.
>
> Michael

What I intended to say is:
Automount of .zfs/snapshot/* works, but snapshots are mounted without the option "noexec", despite the fact that the property should be inherited from parents (I think).
Well, if you rely on "noexec" as a security feature, just don't use snapshots, because it looks like snapshots are always mounted with "exec = on"

Patrick PRONIEWSKI
-- 
Administrateur Système - DSI - Université Lumière Lyon 2

From owner-freebsd-fs@FreeBSD.ORG Tue Jun 21 03:16:02 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D5C1A1065675 for ; Tue, 21 Jun 2011 03:16:02 +0000 (UTC) (envelope-from james@jlauser.net) Received: from mail-gx0-f182.google.com (mail-gx0-f182.google.com [209.85.161.182]) by mx1.freebsd.org (Postfix) with ESMTP id 8BAF38FC14 for ; Tue, 21 Jun 2011 03:16:02 +0000 (UTC) Received: by gxk28 with SMTP id 28so2385773gxk.13 for ; Mon, 20 Jun 2011 20:16:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=jlauser.net; s=google; h=domainkey-signature:mime-version:date:message-id:subject:from:to :content-type; bh=wW3fUh9Dzt+F9W6xGDwrTtD+sCxLvwkkcxcQLwXBOww=; b=l0F0n8Eo0BuD7zEnd7ODqXKtGOL/09/sk1gLQyEeR/IisO5WT3+zxYKWYaD997tM+s 6GrabgOSnNUieBOs/LmbzPxdU6WtyhARUb4fPnWP+SElm1h96Jrf6F2+C/gZRkepEIuh WI898eaHoain41xiiqVzcK65xZV+lDDZXFhxI= DomainKey-Signature: a=rsa-sha1; c=nofws;
d=jlauser.net; s=google; h=mime-version:date:message-id:subject:from:to:content-type; b=T8Z4DbesigXPra7lUPsP2pz/FhsPl63qUx+3ps1CKSvmI+rNEkt6PsWy4nmVuHKWdw B+U4IWJE1nj/+PHTNp82gpxkxTE3vrXxHUNbFJ/RfgIgqfdZNd1EHtahAOxbzjqQF5iS a6jbOuixnhjaneh4Os4AA1vqBCMrXbxSbbDCg= Received: by 10.91.201.4 with SMTP id d4mr6731400agq.41.1308624373056; Mon, 20 Jun 2011 19:46:13 -0700 (PDT) Received: from mail-gw0-f54.google.com (mail-gw0-f54.google.com [74.125.83.54]) by mx.google.com with ESMTPS id c63sm401399yhe.18.2011.06.20.19.46.11 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 20 Jun 2011 19:46:12 -0700 (PDT) Received: by gwb15 with SMTP id 15so1248195gwb.13 for ; Mon, 20 Jun 2011 19:46:11 -0700 (PDT) MIME-Version: 1.0 Received: by 10.236.32.67 with SMTP id n43mr3209959yha.161.1308624371390; Mon, 20 Jun 2011 19:46:11 -0700 (PDT) Received: by 10.236.202.234 with HTTP; Mon, 20 Jun 2011 19:46:11 -0700 (PDT) Date: Mon, 20 Jun 2011 22:46:11 -0400 Message-ID: From: "James L. Lauser" To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: ZFSv28 Dedup question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2011 03:16:02 -0000 I have a (hopefully) simple question about ZFSv28 deduplication that is not very clearly answered in any of the documentation I've found. If I enable dedupe on a dataset, will newly written data dedupe against old data that was written before dedupe was enabled? If not, is it worth forcing the data to be rewritten so that it makes it into the appropriate tables? -- James L. 
Lauser james@jlauser.net http://jlauser.net/ From owner-freebsd-fs@FreeBSD.ORG Tue Jun 21 04:55:14 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 582AE1065670 for ; Tue, 21 Jun 2011 04:55:14 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-gx0-f182.google.com (mail-gx0-f182.google.com [209.85.161.182]) by mx1.freebsd.org (Postfix) with ESMTP id 0E6AD8FC0C for ; Tue, 21 Jun 2011 04:55:13 +0000 (UTC) Received: by gxk28 with SMTP id 28so2412027gxk.13 for ; Mon, 20 Jun 2011 21:55:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=gCT1CIVQzf1cFBJXpeCF8qclBdYgc0OUTiZ96SeV5pc=; b=lZD46eCh1bb3O2S0Fg/XMCHVQDysf4cTdQ3ssVIaSyoZt/7IxeiFnPsmpD0wpNWzdW ME9UcWuQLDMtpjCanbU8bP85CzU57ff+TQPqBbcTLPyoo0hzB/2kCWJKEfnN2SPEbwbC 3nd+D+x+FA+y/DFzzsI7I5fcGLNWg6m9wfMUw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=Go8Mi/vuiIkzfIv4M63CHV1rcC0qgK+BkzkYWz0zFMM1YERsM6/+0AcV8FiWY/nK31 qMrJM9u1ou8JHoYkLiIe1CtWyspQyiuejqvD4Y6fSW7CxgeoCGB728awU29QykBLIAu6 7HbbjKgCHw1077+kqg1mEtV9JGgrqZ6JcldAU= MIME-Version: 1.0 Received: by 10.91.66.15 with SMTP id t15mr6596776agk.141.1308632113185; Mon, 20 Jun 2011 21:55:13 -0700 (PDT) Received: by 10.90.65.18 with HTTP; Mon, 20 Jun 2011 21:55:13 -0700 (PDT) In-Reply-To: References: Date: Mon, 20 Jun 2011 21:55:13 -0700 Message-ID: From: Freddie Cash To: "James L. 
Lauser" Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: ZFSv28 Dedup question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2011 04:55:14 -0000 On Mon, Jun 20, 2011 at 7:46 PM, James L. Lauser wrote: > I have a (hopefully) simple question about ZFSv28 deduplication that is not > very clearly answered in any of the documentation I've found. > > If I enable dedupe on a dataset, will newly written data dedupe against old > data that was written before dedupe was enabled? If not, is it worth > forcing the data to be rewritten so that it makes it into the appropriate > tables? > > No. Dedupe only applies to newly written data. The DDT (dedupe table) only contains entries for data written after the "dedup" property is set on a filesystem. The only way to dedupe existing data is to copy/delete it, or "zfs send|zfs recv" it, or otherwise re-write it. 
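To make the point above concrete, here is a toy model (not ZFS code) of a dedupe table that only learns about blocks written while dedup is on. All names here (toy_pool, toy_write, DDT_MAX) are hypothetical, and a plain integer stands in for a block checksum.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DDT_MAX 16  /* arbitrary capacity for the toy table */

struct toy_pool {
    uint64_t ddt[DDT_MAX];      /* checksums of blocks written with dedup on */
    size_t   ddt_len;           /* number of valid entries in ddt[] */
    size_t   blocks_allocated;  /* blocks that actually consumed space */
    bool     dedup;             /* stands in for the "dedup" property */
};

/* Write one block identified by its checksum.
 * Returns true if new space was allocated, false if the write was
 * satisfied by referencing a block already recorded in the table. */
bool toy_write(struct toy_pool *p, uint64_t checksum)
{
    if (p->dedup) {
        for (size_t i = 0; i < p->ddt_len; i++) {
            if (p->ddt[i] == checksum)
                return false;   /* duplicate of a block written with dedup on */
        }
        if (p->ddt_len < DDT_MAX)
            p->ddt[p->ddt_len++] = checksum;  /* remember it for later writes */
    }
    /* Blocks written while dedup is off never enter the table. */
    p->blocks_allocated++;
    return true;
}
```

In this model, writing a block, then enabling dedup, then writing the same content again still allocates a second block, because the first copy was never entered in the table. Only a third write of that content is deduplicated.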
-- Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@FreeBSD.ORG Tue Jun 21 10:51:42 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EB8641065677 for ; Tue, 21 Jun 2011 10:51:42 +0000 (UTC) (envelope-from peterjeremy@acm.org) Received: from fallbackmx09.syd.optusnet.com.au (fallbackmx09.syd.optusnet.com.au [211.29.132.242]) by mx1.freebsd.org (Postfix) with ESMTP id B77FD8FC08 for ; Tue, 21 Jun 2011 10:51:40 +0000 (UTC) Received: from mail18.syd.optusnet.com.au (mail18.syd.optusnet.com.au [211.29.132.199]) by fallbackmx09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p5L8lBoO003556 for ; Tue, 21 Jun 2011 18:47:11 +1000 Received: from server.vk2pj.dyndns.org (c220-239-116-103.belrs4.nsw.optusnet.com.au [220.239.116.103]) by mail18.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p5L8kovw024987 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 21 Jun 2011 18:46:52 +1000 X-Bogosity: Ham, spamicity=0.000000 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.4/8.14.4) with ESMTP id p5L8kn12068893; Tue, 21 Jun 2011 18:46:49 +1000 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.4/8.14.4/Submit) id p5L8kjfu068892; Tue, 21 Jun 2011 18:46:45 +1000 (EST) (envelope-from peter) Date: Tue, 21 Jun 2011 18:46:45 +1000 From: Peter Jeremy To: Kirk McKusick Message-ID: <20110621084645.GA68018@server.vk2pj.dyndns.org> References: <20110617153415.GA92803@testsoekris.hotsoft.nl> <201106171842.p5HIgQjn018296@chez.mckusick.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="lrZ03NoBR/3+SXJZ" Content-Disposition: inline In-Reply-To: <201106171842.p5HIgQjn018296@chez.mckusick.com> X-PGP-Key: 
http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, Jeff Roberson Subject: Re: SU+J: negative used diskspace (for a while) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2011 10:51:43 -0000 --lrZ03NoBR/3+SXJZ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2011-Jun-17 11:42:26 -0700, Kirk McKusick wrote: >> Date: Fri, 17 Jun 2011 17:34:15 +0200 >> From: Hans Ottevanger >> So it takes more than a minute before the disk space is back to "normal" >> values. >We used to account for deleted blocks at the instant that they were >removed. This accounting was rather complex, so as part of doing >SU+J, Jeff simplified it. Under the simplification, the removal is >not accounted for until part way through the removal process. The >result is that you now get these false negative block counts until >the blocks have been partially reclaimed. If this behavior causes >enough trouble, Jeff might be convinced that the more accurate block >accounting is necessary. Negative values may also impact NFS clients - though just limiting the reported used space to 0 should avoid them getting too upset. That said, whilst I haven't seen negative used values, ZFS and Solaris UFS also take an extended period before 'df' reports correct values (several minutes for Solaris UFS). In the case of ZFS, even 'du' can report incorrect information for a while. 
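The suggestion of limiting the reported used space to 0 could look roughly like the following. This is only a sketch: the function name is hypothetical, and it assumes "used" is derived as total minus free, which is how the value can go negative while freed blocks are counted early.

```c
#include <stdint.h>

/* Derive the "used" figure from total and free block counts, clamping at
 * zero so a transiently over-counted free value never yields a negative
 * result for clients. */
int64_t reported_used_blocks(int64_t total_blocks, int64_t free_blocks)
{
    int64_t used = total_blocks - free_blocks;
    return used < 0 ? 0 : used;
}
```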
--=20 Peter Jeremy --lrZ03NoBR/3+SXJZ Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (FreeBSD) iEYEARECAAYFAk4AWnUACgkQ/opHv/APuIecHgCdEPSaZIABGsfWByKiNV6jlg2A 7SoAoMNHZQsYAjdgV+R9cs/vQ/udpXBX =Ddxw -----END PGP SIGNATURE----- --lrZ03NoBR/3+SXJZ-- From owner-freebsd-fs@FreeBSD.ORG Tue Jun 21 12:32:58 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 39817106566B for ; Tue, 21 Jun 2011 12:32:58 +0000 (UTC) (envelope-from james@jlauser.net) Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id DECC78FC14 for ; Tue, 21 Jun 2011 12:32:57 +0000 (UTC) Received: by gyf3 with SMTP id 3so1035033gyf.13 for ; Tue, 21 Jun 2011 05:32:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=jlauser.net; s=google; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=/s6fuUHexj95a2CcA/qdyKM0Tn+V9U0Nm6/6ZdXqi6A=; b=Drip7gIlxM/XL5Eh4F0soAYzmdXmWdywzq8cASf/8/lQsTGneP54cIRarGqgheLXKK tvZV9tDgCVhzki9e1H3FYnRUQmhOXNR+0O1swEdizJTIrRN020XEFcY65PMnqeyHjKQQ rHwCONygkKUkGw8cjFAzpoX55l/nzOnCYV/K4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=jlauser.net; s=google; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=eA/eLigRzdVovvL4DQfuM9N6PNXOVYxg2AgmoiH3UYYH+XHg3taENirnAahqB+YWmA h+x5EWNnhh+4u0Aw0SoLHyDzmGnDytl8ssPOKqn1IG+2538p2UpracxGin2+09/SE2oi MswI7uOw4uTOXTrpl6xgVTVHm+XsaGEKJcw1Y= Received: by 10.151.103.18 with SMTP id f18mr5517297ybm.347.1308659576893; Tue, 21 Jun 2011 05:32:56 -0700 (PDT) Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com [209.85.218.54]) by mx.google.com with ESMTPS id w70sm4220489yhl.45.2011.06.21.05.32.55 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 21 Jun 2011 05:32:55 -0700 (PDT) Received: by yic13 with SMTP id 
13so3566372yic.13 for ; Tue, 21 Jun 2011 05:32:54 -0700 (PDT) MIME-Version: 1.0 Received: by 10.236.193.105 with SMTP id j69mr5393383yhn.496.1308659570991; Tue, 21 Jun 2011 05:32:50 -0700 (PDT) Received: by 10.236.202.234 with HTTP; Tue, 21 Jun 2011 05:32:50 -0700 (PDT) In-Reply-To: References: Date: Tue, 21 Jun 2011 08:32:50 -0400 Message-ID: From: "James L. Lauser" To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Re: ZFSv28 Dedup question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2011 12:32:58 -0000 Thanks for the info. This is what I suspected, but just wanted to verify. Given this, then the 1.11x ratio shown for dedup in 'zpool list' means that of the data that was written since dedup was enabled, it deduped at a ratio of 1.11:1? -- James L. Lauser james@jlauser.net http://jlauser.net/ On Tue, Jun 21, 2011 at 12:55 AM, Freddie Cash wrote: > On Mon, Jun 20, 2011 at 7:46 PM, James L. Lauser wrote: > >> I have a (hopefully) simple question about ZFSv28 deduplication that is >> not >> very clearly answered in any of the documentation I've found. >> >> If I enable dedupe on a dataset, will newly written data dedupe against >> old >> data that was written before dedupe was enabled? If not, is it worth >> forcing the data to be rewritten so that it makes it into the appropriate >> tables? >> >> No. Dedupe only applies to newly written data. The DDT (dedupe table) > only contains entries for data written after the "dedup" property is set on > a filesystem. > > The only way to dedupe existing data is to copy/delete it, or "zfs send|zfs > recv" it, or otherwise re-write it. 
> -- > Freddie Cash > fjwcash@gmail.com > From owner-freebsd-fs@FreeBSD.ORG Tue Jun 21 15:43:31 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 470D31065673 for ; Tue, 21 Jun 2011 15:43:31 +0000 (UTC) (envelope-from will@firepipe.net) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id E12758FC23 for ; Tue, 21 Jun 2011 15:43:30 +0000 (UTC) Received: by wyb33 with SMTP id 33so4588673wyb.13 for ; Tue, 21 Jun 2011 08:43:30 -0700 (PDT) MIME-Version: 1.0 Received: by 10.216.120.201 with SMTP id p51mr3148046weh.89.1308669378425; Tue, 21 Jun 2011 08:16:18 -0700 (PDT) Received: by 10.216.12.8 with HTTP; Tue, 21 Jun 2011 08:16:18 -0700 (PDT) Date: Tue, 21 Jun 2011 09:16:18 -0600 Message-ID: From: Will Andrews To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Subject: bumping mount path lengths in struct statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2011 15:43:31 -0000

Hi,

struct statfs contains the following:

    char    f_mntfromname[MNAMELEN];    /* mounted filesystem */
    char    f_mntonname[MNAMELEN];      /* directory on which mounted */

Where MNAMELEN is, currently, 88. These limit the length of the path that a filesystem can be mounted to. This is enforced by kern/vfs_mount.c:vfs_donmount().

This limit seems archaic, especially given use cases like virtualization (large filesystem structures to support underlying VMs), builds (which often make extensive use of chroot with nullfs/NFS), ZFS, snapshots, etc.

Does anyone object to bumping MNAMELEN to 1024 (PATH_MAX/MAXPATHLEN)? Or some other reasonably large value?

--Will.
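For illustration, the effect of the limit can be sketched in userland as below. This is not the kernel check itself: SKETCH_MNAMELEN is a local stand-in for the value 88 quoted in the message (defined here so the sketch compiles anywhere), and mount_path_fits is a hypothetical helper. It assumes the path must fit in the fixed buffer together with its terminating NUL.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for MNAMELEN as quoted in the message; not taken from
 * <sys/mount.h>, so the sketch builds on any platform. */
#define SKETCH_MNAMELEN 88

/* Returns 1 if `path` and its terminating NUL fit in a buffer of
 * `buflen` bytes (such as f_mntonname), 0 otherwise. */
int mount_path_fits(const char *path, size_t buflen)
{
    return strlen(path) + 1 <= buflen;
}
```

Under that assumption the longest usable mount path today is 87 characters. The same helper called with 1024 as the buffer length shows what the proposed bump would allow.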
From owner-freebsd-fs@FreeBSD.ORG Tue Jun 21 19:16:35 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A92EB10656A4 for ; Tue, 21 Jun 2011 19:16:35 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta09.emeryville.ca.mail.comcast.net (qmta09.emeryville.ca.mail.comcast.net [76.96.30.96]) by mx1.freebsd.org (Postfix) with ESMTP id 927558FC15 for ; Tue, 21 Jun 2011 19:16:35 +0000 (UTC) Received: from omta22.emeryville.ca.mail.comcast.net ([76.96.30.89]) by qmta09.emeryville.ca.mail.comcast.net with comcast id yj9t1g0041vN32cA9jGZeU; Tue, 21 Jun 2011 19:16:33 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta22.emeryville.ca.mail.comcast.net with comcast id yjGF1g0171t3BNj8ijGLHs; Tue, 21 Jun 2011 19:16:20 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id D4055102C36; Tue, 21 Jun 2011 12:16:26 -0700 (PDT) Date: Tue, 21 Jun 2011 12:16:26 -0700 From: Jeremy Chadwick To: freebsd-stable@freebsd.org Message-ID: <20110621191626.GA99204@icarus.home.lan> References: <20110618005124.GA43568@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110618005124.GA43568@icarus.home.lan> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, mav@freebsd.org Subject: Re: MFC: graid(8) (RAID GEOM) support X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2011 19:16:35 -0000 On Fri, Jun 17, 2011 at 05:51:24PM -0700, Jeremy Chadwick wrote: > Sorry for the cross-post, but I thought both lists would want to know > about this. 
> > Looks like mav@ just committed this ~17 hours ago: > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/geom/raid/g_raid.c > > Those who have historically wanted to use Intel MatrixRAID (now called > Intel RST (Rapid Storage Technology)), but haven't due to the severe > issues/risks with ataraid(4), will probably be very interested in > this commit. I know I am! > > I plan on stress-testing the Intel support on a 2-disk system with > RAID-1 enabled, and will document my experiences, procedures, etc... > > Thanks, mav@ and imp@ ! > > I'll be sending another mail momentarily asking about USB memory stick > image building, since to accomplish the above, I want to do a > "bare-bones" install on our test system (e.g. enable Intel RAID, set up > 2 disks in a RAID-1 mirror, boot a USB memory stick that contains this > latest RELENG_8 build, and do sysinstall, etc.. the normal way). > > > ===================================================================== > MFC r219974, r220209, r220210, r220790: > Add new RAID GEOM class, that is going to replace ataraid(4) in supporting > various BIOS-based software RAIDs. Unlike ataraid(4) this implementation > does not depend on legacy ata(4) subsystem and can be used with any disk > drivers, including new CAM-based ones (ahci(4), siis(4), mvs(4), ata(4) > with `options ATA_CAM`). To make code more readable and extensible, this > implementation follows modular design, including core part and two sets > of modules, implementing support for different metadata formats and RAID > levels. > > Support for such popular metadata formats is now implemented: > Intel, JMicron, NVIDIA, Promise (also used by AMD/ATI) and SiliconImage. > > Such RAID levels are now supported: > RAID0, RAID1, RAID1E, RAID10, SINGLE, CONCAT. 
> > For all of these RAID levels and metadata formats this class supports > full cycle of volume operations: reading, writing, creation, deletion, > disk removal and insertion, rebuilding, dirty shutdown detection > and resynchronization, bad sector recovery, faulty disks tracking, > hot-spare disks. For Intel and Promise formats there is support multiple > volumes per disk set. > > Look graid(8) manual page for additional details. > > Co-authored by: imp > Sponsored by: Cisco Systems, Inc. and iXsystems, Inc. > ===================================================================== By the way, it doesn't look like the graid(8) man page is being brought in to the base system on either of the two RELENG_8 systems I've rebuilt in the past few days. I'm thinking /usr/src/sbin/geom/class/raid/graid.8 isn't being noticed as a man page. /usr/src/sbin/geom/class/raid/Makefile doesn't have MAN8=graid.8 in it, is that the problem? -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Jun 21 19:21:53 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C4D351065689 for ; Tue, 21 Jun 2011 19:21:53 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 1995D8FC14 for ; Tue, 21 Jun 2011 19:21:52 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p5LJLlXR096684 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 21 Jun 2011 22:21:48 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p5LJLlQr080914; Tue, 21 Jun 2011 22:21:47 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p5LJLlPE080913; Tue, 21 Jun 2011 22:21:47 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 21 Jun 2011 22:21:47 +0300 From: Kostik Belousov To: Will Andrews Message-ID: <20110621192147.GJ48734@deviant.kiev.zoral.com.ua> References: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="yrl79+UYnlG4XqVU" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org Subject: Re: bumping mount 
path lengths in struct statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2011 19:21:53 -0000 --yrl79+UYnlG4XqVU Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Tue, Jun 21, 2011 at 09:16:18AM -0600, Will Andrews wrote:
> Hi,
>
> struct statfs contains the following:
>
>     char f_mntfromname[MNAMELEN]; /* mounted filesystem */
>     char f_mntonname[MNAMELEN];   /* directory on which mounted */
>
> Where MNAMELEN is, currently, 88. These limit the length of the path
> that a filesystem can be mounted to. This is enforced by
> kern/vfs_mount.c:vfs_donmount(). This limit seems archaic, especially
> given use cases like virtualization (large filesystem structures to
> support underlying VMs), builds (which often make extensive use of
> chroot with nullfs/NFS), ZFS, snapshots, etc. Does anyone object to
> bumping MNAMELEN to 1024 (PATH_MAX/MAXPATHLEN)? Or some other
> reasonably large value?

There is nothing inherently wrong with bumping the length. But the
work required is probably more than you estimated. The cause is the
ABI breakage. For sure, you will need to provide shims for the compat
syscalls. Unfortunately, this is not enough. Even a quick look over
our tree shows that struct statfs is used in the API of several base
libraries. Look e.g. at getmntinfo(3). Libc would need shims too, at
least.

You will need to do an ABI analysis of the whole system, provide
shims for the symbol-versioned libraries, and bump the .so version of
the unversioned ones.
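The ABI concern above can be illustrated with a small Python sketch using ctypes. The structure here is a deliberately simplified, hypothetical stand-in (one integer field plus the two name buffers), not the real layout of struct statfs; it only shows that changing MNAMELEN changes both the total size and the offset of every field that follows the first buffer.

```python
import ctypes


def make_statfs_like(mnamelen):
    """Build a simplified, hypothetical stand-in for struct statfs."""
    class StatfsLike(ctypes.Structure):
        _fields_ = [
            ("f_bsize", ctypes.c_uint64),                 # placeholder field
            ("f_mntfromname", ctypes.c_char * mnamelen),  # mounted filesystem
            ("f_mntonname", ctypes.c_char * mnamelen),    # mount directory
        ]
    return StatfsLike


old = make_statfs_like(88)     # current MNAMELEN
new = make_statfs_like(1024)   # proposed MNAMELEN

old_size, new_size = ctypes.sizeof(old), ctypes.sizeof(new)
old_off, new_off = old.f_mntonname.offset, new.f_mntonname.offset

print("sizeof:", old_size, "->", new_size)
print("offset of f_mntonname:", old_off, "->", new_off)
```

A binary built against the old layout would hand over a buffer of the old size and read f_mntonname at the old offset, which is why compat shims or library version bumps come into the picture.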
--yrl79+UYnlG4XqVU Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk4A70sACgkQC3+MBN1Mb4huPwCdH5mvpBYrrnFL7Ds70PC35t7B MucAoOw3rzemUoPU/vbDoYQpl4751Rn+ =6KW5 -----END PGP SIGNATURE----- --yrl79+UYnlG4XqVU-- From owner-freebsd-fs@FreeBSD.ORG Tue Jun 21 20:22:32 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 15ED0106572C for ; Tue, 21 Jun 2011 20:22:32 +0000 (UTC) (envelope-from bsd@vink.pl) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id A15EE8FC19 for ; Tue, 21 Jun 2011 20:22:31 +0000 (UTC) Received: by bwz12 with SMTP id 12so340850bwz.13 for ; Tue, 21 Jun 2011 13:22:30 -0700 (PDT) Received: by 10.204.141.205 with SMTP id n13mr2894756bku.198.1308686353868; Tue, 21 Jun 2011 12:59:13 -0700 (PDT) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx.google.com with ESMTPS id k16sm5631690bks.1.2011.06.21.12.59.13 (version=SSLv3 cipher=OTHER); Tue, 21 Jun 2011 12:59:13 -0700 (PDT) Received: by bwz12 with SMTP id 12so319272bwz.13 for ; Tue, 21 Jun 2011 12:59:13 -0700 (PDT) MIME-Version: 1.0 Received: by 10.204.83.73 with SMTP id e9mr2052305bkl.118.1308686353160; Tue, 21 Jun 2011 12:59:13 -0700 (PDT) Received: by 10.204.79.83 with HTTP; Tue, 21 Jun 2011 12:59:12 -0700 (PDT) Date: Tue, 21 Jun 2011 21:59:12 +0200 Message-ID: From: Wiktor Niesiobedzki To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 Subject: ZFS L2ARC hit ratio X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2011 20:22:32 -0000 Hi, I've recently migrated my 8.2 box to recent stable: FreeBSD kadlubek.vink.pl 
8.2-STABLE FreeBSD 8.2-STABLE #22: Tue Jun 7 03:43:29 CEST 2011 root@kadlubek:/usr/obj/usr/src/sys/KADLUB i386

And upgraded my ZFS/ZPOOL to the newest versions. Through my
monitoring, though, I've noticed a decline in the L2ARC hit ratio (the
server is not busy, so it doesn't look that suspicious). I ran some
tests today and I suspect there might be a problem.

I did the following on a cold cache:

sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.misses kstat.zfs.misc.arcstats.l2_misses && cat 4gb_file>/dev/null && sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.misses kstat.zfs.misc.arcstats.l2_misses

After computing the differences I got:

kstat.zfs.misc.arcstats.hits       1213775
kstat.zfs.misc.arcstats.l2_hits    21
kstat.zfs.misc.arcstats.misses     37364
kstat.zfs.misc.arcstats.l2_misses  37343

That's pretty normal. After that, I noticed L2ARC usage grow by 4 GB,
but when I do the same operation again, the results are worrying:

kstat.zfs.misc.arcstats.hits       1188662
kstat.zfs.misc.arcstats.l2_hits    305
kstat.zfs.misc.arcstats.misses     36933
kstat.zfs.misc.arcstats.l2_misses  36628

More or less the same.

I watched gstat during these tests and noticed around 2 reads per
second from my cache device, accounting for about 32kb per second.
Not that much.

My first guess is that, for some reason, the L2ARC record is
considered outdated and thus not used at all.

Any clues why L2ARC isn't kicking in at all in this situation? I
notice some substantial (like 5-10%) hits from L2ARC during the
cronjobs though, but this simple scenario is just failing...
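For reference, the counter deltas above can be turned into hit ratios with a few lines of Python. This is only a sketch: it assumes the usual definition hits / (hits + misses) for each cache level, and the dictionary keys are shortened versions of the sysctl names.

```python
def hit_ratio(hits, misses):
    """Fraction of lookups served from the cache; 0.0 if there were none."""
    total = hits + misses
    return hits / total if total else 0.0


# Counter deltas reported for the two runs of `cat 4gb_file`.
first = {"hits": 1213775, "l2_hits": 21, "misses": 37364, "l2_misses": 37343}
second = {"hits": 1188662, "l2_hits": 305, "misses": 36933, "l2_misses": 36628}

for name, run in (("first", first), ("second", second)):
    arc = hit_ratio(run["hits"], run["misses"])
    l2 = hit_ratio(run["l2_hits"], run["l2_misses"])
    print(f"{name} run: ARC {arc:.2%}, L2ARC {l2:.2%}")
```

In both runs every ARC miss is accounted for as either an L2ARC hit or an L2ARC miss, and the L2ARC hit ratio stays below 1%.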
For the record, below are some other details:

%zfs get all tank
NAME  PROPERTY              VALUE                  SOURCE
tank  type                  filesystem             -
tank  creation              Sat Dec  5  3:37 2009  -
tank  used                  572G                   -
tank  available             343G                   -
tank  referenced            441G                   -
tank  compressratio         1.00x                  -
tank  mounted               yes                    -
tank  quota                 none                   default
tank  reservation           none                   default
tank  recordsize            128K                   default
tank  mountpoint            /tank                  default
tank  sharenfs              off                    default
tank  checksum              on                     default
tank  compression           off                    default
tank  atime                 off                    local
tank  devices               on                     default
tank  exec                  on                     default
tank  setuid                on                     default
tank  readonly              off                    default
tank  jailed                off                    default
tank  snapdir               hidden                 default
tank  aclinherit            restricted             default
tank  canmount              on                     default
tank  xattr                 off                    temporary
tank  copies                1                      default
tank  version               5                      -
tank  utf8only              off                    -
tank  normalization         none                   -
tank  casesensitivity       sensitive              -
tank  vscan                 off                    default
tank  nbmand                off                    default
tank  sharesmb              off                    default
tank  refquota              none                   default
tank  refreservation        none                   default
tank  primarycache          all                    default
tank  secondarycache        all                    default
tank  usedbysnapshots       0                      -
tank  usedbydataset         441G                   -
tank  usedbychildren        131G                   -
tank  usedbyrefreservation  0                      -
tank  logbias               latency                default
tank  dedup                 off                    default
tank  mlslabel                                     -
tank  sync                  standard               default

%zpool status tank
  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 7h23m with 0 errors on Wed Jun 15 07:53:29 2011
config:

  NAME                                          STATE   READ WRITE CKSUM
  tank                                          ONLINE     0     0     0
    raidz1-0                                    ONLINE     0     0     0
      ad6.eli                                   ONLINE     0     0     0
      ad8.eli                                   ONLINE     0     0     0
      ad10.eli                                  ONLINE     0     0     0
  cache
    gptid/7644bfda-e141-11de-951e-004063f2d074  ONLINE     0     0     0

errors: No known data errors

Cheers,

Wiktor Niesiobedzki

From owner-freebsd-fs@FreeBSD.ORG Tue Jun 21 22:15:54 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D1600106572B for ; Tue, 21 Jun 2011 22:15:54 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by
mx1.freebsd.org (Postfix) with ESMTP id 8D08A8FC1A for ; Tue, 21 Jun 2011 22:15:54 +0000 (UTC) Received: by yxl31 with SMTP id 31so134327yxl.13 for ; Tue, 21 Jun 2011 15:15:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=oEwDToavuo0HgkJYBXhUdG4WUBULozwlzPW0sZS0zBg=; b=rB0wNfKs9/ZiIadsbZtiqjHAjhk48XRuXi1tNNiSxQyBe5Hjp9mrKn6HWZd6m27cRL QFNF3BZkPgk6x6AN1oyUbYBiW5I2SyO484htVA+yZ/AKPJfS5bWPJrA259N47bb2Phct fbLr6QkW+20jpbEMv3r5ISGzhxhCRy2gU9gYE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=ewgaR+//4bkz0ruoKR9Jq5e9S0LyzwX5no+gB59uHJTTj+km33slSz5LuC6UvxqqvA h5mMrIEZI3+Qlfs3re5dozZ7LKSD6EhGFx2xF89EEWv9W8cdrWrQ3OAebx6VqM5wN3Xf iiJUG/NHZsRN1pKr/QzNz5I8WQDoPD6BwrV1U= MIME-Version: 1.0 Received: by 10.236.180.98 with SMTP id i62mr10296548yhm.403.1308694553546; Tue, 21 Jun 2011 15:15:53 -0700 (PDT) Sender: artemb@gmail.com Received: by 10.236.61.73 with HTTP; Tue, 21 Jun 2011 15:15:53 -0700 (PDT) In-Reply-To: References: Date: Tue, 21 Jun 2011 15:15:53 -0700 X-Google-Sender-Auth: nx082PDTpXsRHEAmpyt-zILCVQc Message-ID: From: Artem Belevich To: Wiktor Niesiobedzki Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: ZFS L2ARC hit ratio X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2011 22:15:54 -0000 On Tue, Jun 21, 2011 at 12:59 PM, Wiktor Niesiobedzki wrote: > Hi, > > I've recently migrated my 8.2 box to recent stable: > FreeBSD kadlubek.vink.pl 8.2-STABLE FreeBSD 8.2-STABLE #22: 
Tue Jun 7 03:43:29 CEST 2011 root@kadlubek:/usr/obj/usr/src/sys/KADLUB i386
>
> And upgraded my ZFS/ZPOOL to the newest versions. Through my
> monitoring, though, I've noticed a decline in the L2ARC hit ratio (the
> server is not busy, so it doesn't look that suspicious). I ran some
> tests today and I suspect there might be a problem.
>
> I did the following on a cold cache:
> sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.l2_hits
> kstat.zfs.misc.arcstats.misses kstat.zfs.misc.arcstats.l2_misses &&
> cat 4gb_file>/dev/null && sysctl kstat.zfs.misc.arcstats.hits
> kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.misses
> kstat.zfs.misc.arcstats.l2_misses
>
> After computing the differences I got:
> kstat.zfs.misc.arcstats.hits       1213775
> kstat.zfs.misc.arcstats.l2_hits    21
> kstat.zfs.misc.arcstats.misses     37364
> kstat.zfs.misc.arcstats.l2_misses  37343
>
> That's pretty normal. After that, I noticed L2ARC usage grow by 4 GB,
> but when I do the same operation again, the results are worrying:
> kstat.zfs.misc.arcstats.hits       1188662
> kstat.zfs.misc.arcstats.l2_hits    305
> kstat.zfs.misc.arcstats.misses     36933
> kstat.zfs.misc.arcstats.l2_misses  36628
>
> More or less the same.
>
> I watched gstat during these tests and noticed around 2 reads per
> second from my cache device, accounting for about 32kb per second.
> Not that much.
>
> My first guess is that, for some reason, the L2ARC record is
> considered outdated and thus not used at all.
>
> Any clues why L2ARC isn't kicking in at all in this situation? I
> notice some substantial (like 5-10%) hits from L2ARC during the
> cronjobs though, but this simple scenario is just failing...
>
> For the record, below are some other details:
> %zfs get all tank
> NAME  PROPERTY              VALUE                  SOURCE
> tank  type                  filesystem             -
> tank  creation              Sat Dec  5  3:37 2009  -
> tank  used                  572G                   -
> tank  available             343G                   -
> tank  referenced            441G                   -
> tank  compressratio         1.00x                  -
> tank  mounted               yes                    -
> tank  quota                 none                   default
> tank  reservation           none                   default
> tank  recordsize            128K                   default
> tank  mountpoint            /tank                  default
> tank  sharenfs              off                    default
> tank  checksum              on                     default
> tank  compression           off                    default
> tank  atime                 off                    local
> tank  devices               on                     default
> tank  exec                  on                     default
> tank  setuid                on                     default
> tank  readonly              off                    default
> tank  jailed                off                    default
> tank  snapdir               hidden                 default
> tank  aclinherit            restricted             default
> tank  canmount              on                     default
> tank  xattr                 off                    temporary
> tank  copies                1                      default
> tank  version               5                      -
> tank  utf8only              off                    -
> tank  normalization         none                   -
> tank  casesensitivity       sensitive              -
> tank  vscan                 off                    default
> tank  nbmand                off                    default
> tank  sharesmb              off                    default
> tank  refquota              none                   default
> tank  refreservation        none                   default
> tank  primarycache          all                    default
> tank  secondarycache        all                    default
> tank  usedbysnapshots       0                      -
> tank  usedbydataset         441G                   -
> tank  usedbychildren        131G                   -
> tank  usedbyrefreservation  0                      -
> tank  logbias               latency                default
> tank  dedup                 off                    default
> tank  mlslabel                                     -
> tank  sync                  standard               default
>
> %zpool status tank
>   pool: tank
>  state: ONLINE
>   scan: scrub repaired 0 in 7h23m with 0 errors on Wed Jun 15 07:53:29 2011
> config:
>
>   NAME                                          STATE   READ WRITE CKSUM
>   tank                                          ONLINE     0     0     0
>     raidz1-0                                    ONLINE     0     0     0
>       ad6.eli                                   ONLINE     0     0     0
>       ad8.eli                                   ONLINE     0     0     0
>       ad10.eli                                  ONLINE     0     0     0
>   cache
>     gptid/7644bfda-e141-11de-951e-004063f2d074  ONLINE     0     0     0
>
> errors: No known data errors
>
>
> Cheers,
>
> Wiktor Niesiobedzki

L2ARC is filled with items evicted from ARC. The catch is that L2ARC
writes are intentionally throttled. When L2ARC is empty, writes happen
at a higher rate, but it's still intentionally low so that a
read-optimized cache device does not wear out too soon. The bottom
line is that not all the data spilled out of ARC ends up in L2ARC on
the first try.
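The effect of that throttling can be estimated with a rough back-of-the-envelope sketch in Python. It assumes the cache device is fed at a fixed number of bytes per second; the 8 MiB/s feed rate and the 100 MiB/s sequential read rate used here are illustrative assumptions, not measurements from this system, and both function names are hypothetical.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def seconds_to_feed(data_bytes, feed_bytes_per_sec):
    """Time needed to push data_bytes to the cache device at a fixed rate."""
    return data_bytes / feed_bytes_per_sec


def fraction_fed_during_read(data_bytes, read_bytes_per_sec, feed_bytes_per_sec):
    """Rough upper bound on the share of the data that can reach L2ARC
    while it is being read once, if the feed rate is the only limit."""
    read_seconds = data_bytes / read_bytes_per_sec
    return min(1.0, feed_bytes_per_sec * read_seconds / data_bytes)


file_size = 4 * GIB     # size of the test file in the experiment
feed_rate = 8 * MIB     # assumed throttled feed rate, bytes per second
read_rate = 100 * MIB   # assumed sequential read rate, bytes per second

print(seconds_to_feed(file_size, feed_rate))
print(fraction_fed_during_read(file_size, read_rate, feed_rate))
```

Under these assumptions a single pass over the file finishes long before the cache device could absorb all of it, which is consistent with the point that one read does not populate L2ARC completely.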
Re-run your experiment again and you would probably see some
improvement in L2ARC hit rates.

You can use the following sysctls, which control L2ARC write speed:

vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_write_max: 8388608

Word of caution -- before you tweak these, do check the total amount
of writes your SSD can handle and how long it would take for L2ARC
writes to write that much. I've recently discovered that on one of my
boxes a 160GB X-25M (G2) ended up at its official limit in about three
months.

--Artem

From owner-freebsd-fs@FreeBSD.ORG Wed Jun 22 04:51:56 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B791310657B5; Wed, 22 Jun 2011 04:51:56 +0000 (UTC) (envelope-from bsd@vink.pl) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 121668FC12; Wed, 22 Jun 2011 04:51:55 +0000 (UTC) Received: by bwz12 with SMTP id 12so671641bwz.13 for ; Tue, 21 Jun 2011 21:51:54 -0700 (PDT) Received: by 10.204.153.20 with SMTP id i20mr146657bkw.208.1308718314670; Tue, 21 Jun 2011 21:51:54 -0700 (PDT) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx.google.com with ESMTPS id v6sm161598bkf.11.2011.06.21.21.51.53 (version=SSLv3 cipher=OTHER); Tue, 21 Jun 2011 21:51:54 -0700 (PDT) Received: by bwz12 with SMTP id 12so671623bwz.13 for ; Tue, 21 Jun 2011 21:51:53 -0700 (PDT) MIME-Version: 1.0 Received: by 10.204.3.196 with SMTP id 4mr159493bko.188.1308718313469; Tue, 21 Jun 2011 21:51:53 -0700 (PDT) Received: by 10.204.152.212 with HTTP; Tue, 21 Jun 2011 21:51:53 -0700 (PDT) In-Reply-To: References: Date: Wed, 22 Jun 2011 06:51:53 +0200 Message-ID: From: Wiktor Niesiobedzki To: Artem Belevich Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS L2ARC hit ratio X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5
Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Jun 2011 04:51:56 -0000

2011/6/22 Artem Belevich :
>
> L2ARC is filled with items evicted from ARC. The catch is that L2ARC
> writes are intentionally throttled. When L2ARC is empty writes happen
> at a higher rate, but it's still intentionally low so that
> read-optimized cache device does not wear out too soon. The bottom
> line is that not all the data spilled out of ARC ends up in L2ARC on
> the first try. Re-run your experiment again and you would probably see
> some improvement in L2ARC hit rates.

I've run the experiment 3 times with no change. The funny thing is:
- in the first run, I see a lot of write activity against the cache device
- in the second run, I see neither write nor read activity against the
  cache device

So my guess is that the ZFS cache layer somehow knows that this file
is *there*, yet decides not to serve it from L2ARC...

Cheers,

Wiktor

From owner-freebsd-fs@FreeBSD.ORG Wed Jun 22 09:26:41 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 16715106568D for ; Wed, 22 Jun 2011 09:26:41 +0000 (UTC) (envelope-from mavbsd@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 967B68FC14 for ; Wed, 22 Jun 2011 09:26:40 +0000 (UTC) Received: by fxm11 with SMTP id 11so735151fxm.13 for ; Wed, 22 Jun 2011 02:26:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:sender:message-id:date:from:user-agent :mime-version:to:cc:subject:references:in-reply-to :x-enigmail-version:content-type:content-transfer-encoding; bh=nG0JdpL+ad7KuN2c50pBu8vpY+pdnGgeUyOl/AAiPPc=; b=KptT3WXOg6iL26L+xy8+sVozZPFHbJNbLzjX0mOzwmOgwAJQVCCKeoOU6ULybQXxsS
bnbEpjCpGO4cuXOkUuKl9CSdB6RRw5XhGrBmH0QbXy4iGznpsDiYBz94hWXFRqmDvava qaQo3Kdjn2VZLTwQoAgWwqTpgN12W/CJjsov4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:content-type :content-transfer-encoding; b=o7wsiRC3LvJ8opSJ6BMFz1HL+QO/lHXuhpj63ci8RIe2TVkw3bS+lbISBM35ofcCjM Yo7+q3seIJ0zXmSHvJh0hyRFetX5mwJq/QB+XYIu3I/gGe64K8vLHxjqa6nVzfaZYhzl krpihcZCAsaBK/WkVukIM2miFvNBBo1t8+p3Q= Received: by 10.223.7.150 with SMTP id d22mr499322fad.17.1308733434425; Wed, 22 Jun 2011 02:03:54 -0700 (PDT) Received: from mavbook2.mavhome.dp.ua (pc.mavhome.dp.ua [212.86.226.226]) by mx.google.com with ESMTPS id e16sm186838fak.17.2011.06.22.02.03.52 (version=SSLv3 cipher=OTHER); Wed, 22 Jun 2011 02:03:53 -0700 (PDT) Sender: Alexander Motin Message-ID: <4E01AFBA.809@FreeBSD.org> Date: Wed, 22 Jun 2011 12:02:50 +0300 From: Alexander Motin User-Agent: Thunderbird 2.0.0.23 (X11/20091212) MIME-Version: 1.0 To: Jeremy Chadwick References: <20110618005124.GA43568@icarus.home.lan> <20110621191626.GA99204@icarus.home.lan> In-Reply-To: <20110621191626.GA99204@icarus.home.lan> X-Enigmail-Version: 0.96.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Subject: Re: MFC: graid(8) (RAID GEOM) support X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Jun 2011 09:26:41 -0000 Jeremy Chadwick wrote: > On Fri, Jun 17, 2011 at 05:51:24PM -0700, Jeremy Chadwick wrote: >> Sorry for the cross-post, but I thought both lists would want to know >> about this. 
>> >> Looks like mav@ just committed this ~17 hours ago: >> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/geom/raid/g_raid.c >> >> Those who have historically wanted to use Intel MatrixRAID (now called >> Intel RST (Rapid Storage Technology)), but haven't due to the severe >> issues/risks with ataraid(4), will probably be very interested in >> this commit. I know I am! >> >> I plan on stress-testing the Intel support on a 2-disk system with >> RAID-1 enabled, and will document my experiences, procedures, etc... >> >> Thanks, mav@ and imp@ ! >> >> I'll be sending another mail momentarily asking about USB memory stick >> image building, since to accomplish the above, I want to do a >> "bare-bones" install on our test system (e.g. enable Intel RAID, set up >> 2 disks in a RAID-1 mirror, boot a USB memory stick that contains this >> latest RELENG_8 build, and do sysinstall, etc.. the normal way). >> >> >> ===================================================================== >> MFC r219974, r220209, r220210, r220790: >> Add new RAID GEOM class, that is going to replace ataraid(4) in supporting >> various BIOS-based software RAIDs. Unlike ataraid(4) this implementation >> does not depend on legacy ata(4) subsystem and can be used with any disk >> drivers, including new CAM-based ones (ahci(4), siis(4), mvs(4), ata(4) >> with `options ATA_CAM`). To make code more readable and extensible, this >> implementation follows modular design, including core part and two sets >> of modules, implementing support for different metadata formats and RAID >> levels. >> >> Support for such popular metadata formats is now implemented: >> Intel, JMicron, NVIDIA, Promise (also used by AMD/ATI) and SiliconImage. >> >> Such RAID levels are now supported: >> RAID0, RAID1, RAID1E, RAID10, SINGLE, CONCAT. 
>> >> For all of these RAID levels and metadata formats this class supports >> full cycle of volume operations: reading, writing, creation, deletion, >> disk removal and insertion, rebuilding, dirty shutdown detection >> and resynchronization, bad sector recovery, faulty disks tracking, >> hot-spare disks. For Intel and Promise formats there is support multiple >> volumes per disk set. >> >> Look graid(8) manual page for additional details. >> >> Co-authored by: imp >> Sponsored by: Cisco Systems, Inc. and iXsystems, Inc. >> ===================================================================== > > By the way, it doesn't look like the graid(8) man page is being brought > in to the base system on either of the two RELENG_8 systems I've rebuilt > in the past few days. > > I'm thinking /usr/src/sbin/geom/class/raid/graid.8 isn't being noticed > as a man page. > > /usr/src/sbin/geom/class/raid/Makefile doesn't have MAN8=graid.8 in it, > is that the problem? I've just rebuilt my test 8-STABLE system and it installed graid(8). 
-- Alexander Motin From owner-freebsd-fs@FreeBSD.ORG Wed Jun 22 10:37:06 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9AD50106564A for ; Wed, 22 Jun 2011 10:37:06 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from QMTA11.westchester.pa.mail.comcast.net (qmta11.westchester.pa.mail.comcast.net [76.96.59.211]) by mx1.freebsd.org (Postfix) with ESMTP id 0C5388FC1E for ; Wed, 22 Jun 2011 10:37:05 +0000 (UTC) Received: from omta01.westchester.pa.mail.comcast.net ([76.96.62.11]) by QMTA11.westchester.pa.mail.comcast.net with comcast id yyYv1g0010EZKEL5Byd61Z; Wed, 22 Jun 2011 10:37:06 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta01.westchester.pa.mail.comcast.net with comcast id yyd41g00m1t3BNj3Myd5yL; Wed, 22 Jun 2011 10:37:06 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 42587102C36; Wed, 22 Jun 2011 03:37:03 -0700 (PDT) Date: Wed, 22 Jun 2011 03:37:03 -0700 From: Jeremy Chadwick To: Alexander Motin Message-ID: <20110622103703.GA14901@icarus.home.lan> References: <20110618005124.GA43568@icarus.home.lan> <20110621191626.GA99204@icarus.home.lan> <4E01AFBA.809@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4E01AFBA.809@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Subject: Re: MFC: graid(8) (RAID GEOM) support X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Jun 2011 10:37:06 -0000 On Wed, Jun 22, 2011 at 12:02:50PM +0300, Alexander Motin wrote: > Jeremy Chadwick wrote: > > On Fri, Jun 17, 2011 at 05:51:24PM -0700, Jeremy Chadwick wrote: > >> Sorry for the cross-post, but I thought both lists would want to know > >> 
about this. > >> > >> Looks like mav@ just committed this ~17 hours ago: > >> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/geom/raid/g_raid.c > >> > >> Those who have historically wanted to use Intel MatrixRAID (now called > >> Intel RST (Rapid Storage Technology)), but haven't due to the severe > >> issues/risks with ataraid(4), will probably be very interested in > >> this commit. I know I am! > >> > >> I plan on stress-testing the Intel support on a 2-disk system with > >> RAID-1 enabled, and will document my experiences, procedures, etc... > >> > >> Thanks, mav@ and imp@ ! > >> > >> I'll be sending another mail momentarily asking about USB memory stick > >> image building, since to accomplish the above, I want to do a > >> "bare-bones" install on our test system (e.g. enable Intel RAID, set up > >> 2 disks in a RAID-1 mirror, boot a USB memory stick that contains this > >> latest RELENG_8 build, and do sysinstall, etc.. the normal way). > >> > >> > >> ===================================================================== > >> MFC r219974, r220209, r220210, r220790: > >> Add new RAID GEOM class, that is going to replace ataraid(4) in supporting > >> various BIOS-based software RAIDs. Unlike ataraid(4) this implementation > >> does not depend on legacy ata(4) subsystem and can be used with any disk > >> drivers, including new CAM-based ones (ahci(4), siis(4), mvs(4), ata(4) > >> with `options ATA_CAM`). To make code more readable and extensible, this > >> implementation follows modular design, including core part and two sets > >> of modules, implementing support for different metadata formats and RAID > >> levels. > >> > >> Support for such popular metadata formats is now implemented: > >> Intel, JMicron, NVIDIA, Promise (also used by AMD/ATI) and SiliconImage. > >> > >> Such RAID levels are now supported: > >> RAID0, RAID1, RAID1E, RAID10, SINGLE, CONCAT. 
> >> > >> For all of these RAID levels and metadata formats this class supports > >> full cycle of volume operations: reading, writing, creation, deletion, > >> disk removal and insertion, rebuilding, dirty shutdown detection > >> and resynchronization, bad sector recovery, faulty disks tracking, > >> hot-spare disks. For Intel and Promise formats there is support multiple > >> volumes per disk set. > >> > >> Look graid(8) manual page for additional details. > >> > >> Co-authored by: imp > >> Sponsored by: Cisco Systems, Inc. and iXsystems, Inc. > >> ===================================================================== > > > > By the way, it doesn't look like the graid(8) man page is being brought > > in to the base system on either of the two RELENG_8 systems I've rebuilt > > in the past few days. > > > > I'm thinking /usr/src/sbin/geom/class/raid/graid.8 isn't being noticed > > as a man page. > > > > /usr/src/sbin/geom/class/raid/Makefile doesn't have MAN8=graid.8 in it, > > is that the problem? > > I've just rebuilt my test 8-STABLE system and it installed graid(8). Hmm, there must be something I'm missing either in the base system or the kernel or both. Does this kernel module and/or bits and pieces not get built unless it's included strictly in the kernel? Below is one of the two systems, looking for both graid* and geom_raid*. There's the old geom_raid3 stuff there, and the source bits/pieces for the new graid(8), but nothing seems built (including kernel module) for the new graid(8). If you'd like I can rm -fr /usr/src/* ; rm -fr /var/db/sup/src-all and then re-download source from an official cvsup mirror (I've been using cvsup9.freebsd.org for both boxes). 
icarus# uname -a FreeBSD icarus.home.lan 8.2-STABLE FreeBSD 8.2-STABLE #0: Fri Jun 17 18:01:45 PDT 2011 root@icarus.home.lan:/usr/obj/usr/src/sys/X7SBA_RELENG_8_amd64 amd64 icarus# find /usr -name "graid*" -ls 3211128 8 -r--r--r-- 1 root wheel 2521 Jun 17 18:25 /usr/share/man/man8/graid3.8.gz 169318 16 -rw-r--r-- 1 root wheel 6390 Aug 3 2009 /usr/src/sbin/geom/class/raid3/graid3.8 169624 16 -rw-r--r-- 1 root wheel 8126 Jun 16 23:59 /usr/src/sbin/geom/class/raid/graid.8 921430 8 -rw-r--r-- 1 root wheel 2521 Jun 17 17:51 /usr/obj/usr/src/sbin/geom/class/raid3/graid3.8.gz 3369372 4 drwxr-xr-x 2 root wheel 512 May 3 03:58 /usr/ports/sysutils/graid5 icarus# icarus# find /boot -name "graid*" -ls icarus# icarus# find /usr -name "geom_raid*" -ls 169317 20 -rw-r--r-- 1 root wheel 9257 Jan 18 21:13 /usr/src/sbin/geom/class/raid3/geom_raid3.c 165265 8 -rw-r--r-- 1 root wheel 2992 Jun 16 23:59 /usr/src/sbin/geom/class/raid/geom_raid.c 259652 4 drwxr-xr-x 2 root wheel 512 Jun 6 06:28 /usr/src/sys/modules/geom/geom_raid3 285292 4 drwxr-xr-x 2 root wheel 512 Jun 17 17:17 /usr/src/sys/modules/geom/geom_raid 262798 4 drwxr-xr-x 2 root wheel 512 Jun 6 06:29 /usr/src/tools/regression/geom_raid3 921428 48 -rw-r--r-- 1 root wheel 22760 Jun 17 17:51 /usr/obj/usr/src/sbin/geom/class/raid3/geom_raid3.So 921431 64 -rwxr-xr-x 1 root wheel 32207 Jun 17 17:51 /usr/obj/usr/src/sbin/geom/class/raid3/geom_raid3.so 1014175 4 drwxr-xr-x 2 root wheel 512 Jun 17 18:00 /usr/obj/usr/src/sys/X7SBA_RELENG_8_amd64/modules/usr/src/sys/modules/geom/geom_raid3 1015257 736 -rw-r--r-- 1 root wheel 359456 Jun 17 18:00 /usr/obj/usr/src/sys/X7SBA_RELENG_8_amd64/modules/usr/src/sys/modules/geom/geom_raid3/geom_raid3.ko.debug 1015258 640 -rw-r--r-- 1 root wheel 304432 Jun 17 18:00 /usr/obj/usr/src/sys/X7SBA_RELENG_8_amd64/modules/usr/src/sys/modules/geom/geom_raid3/geom_raid3.ko.symbols 1015259 272 -rw-r--r-- 1 root wheel 137448 Jun 17 18:00 
/usr/obj/usr/src/sys/X7SBA_RELENG_8_amd64/modules/usr/src/sys/modules/geom/geom_raid3/geom_raid3.ko icarus# icarus# find /boot -name "geom_raid*" -ls 94943 272 -r-xr-xr-x 1 root wheel 137448 Jun 17 18:24 /boot/kernel/geom_raid3.ko 94944 640 -r-xr-xr-x 1 root wheel 304432 Jun 17 18:24 /boot/kernel/geom_raid3.ko.symbols 71074 272 -r-xr-xr-x 1 root wheel 137448 Jun 6 05:35 /boot/kernel.old/geom_raid3.ko 71075 640 -r-xr-xr-x 1 root wheel 304432 Jun 6 05:35 /boot/kernel.old/geom_raid3.ko.symbols -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Jun 22 10:42:52 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 56690106564A for ; Wed, 22 Jun 2011 10:42:52 +0000 (UTC) (envelope-from gleb.kurtsou@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id D13F28FC15 for ; Wed, 22 Jun 2011 10:42:51 +0000 (UTC) Received: by fxm11 with SMTP id 11so793710fxm.13 for ; Wed, 22 Jun 2011 03:42:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:date:from:to:cc:subject:message-id:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; bh=6BcFyEl+lcSzZs5gOgEuEmqIoEH3MNlVucDmWOqt6gQ=; b=Qf3UdGZOuJEAwlFwi2swLZaaWBXr7xJGVZdqNDB5D82weh6i5369EtsPp3cUL0qQHw RBVJURUu0eUvsrYqg0colTrRPL7zR17Z/A2PaW7aJow5lCiea7XGaMG36QG62aGojMwo FxW2edS81HT8CrCFTz5xBrFFtCUrkS/HOQwdI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; b=TQG6vS48FRngwVZ3gDgB3wysgfF8+rHKNGzrl7VjGQM09NsAbBX/LIzVXvbSNR02JM 
UUX6nCoG92xTTBDV+D3QccJQmQ7IndvAsZqfiJBYAdvjEIHqS/pFbop+Pn/JGUsCDDGx VtHuJjJFfBJjW+qsrDaAX+/Xtnhvwy9SEhy6o= Received: by 10.223.77.92 with SMTP id f28mr645139fak.37.1308737619429; Wed, 22 Jun 2011 03:13:39 -0700 (PDT) Received: from localhost (lan-78-157-92-5.vln.skynet.lt [78.157.92.5]) by mx.google.com with ESMTPS id q14sm225546faa.27.2011.06.22.03.13.37 (version=SSLv3 cipher=OTHER); Wed, 22 Jun 2011 03:13:38 -0700 (PDT) Date: Wed, 22 Jun 2011 13:13:28 +0300 From: Gleb Kurtsou To: Kostik Belousov Message-ID: <20110622101328.GA19866@tops> References: <20110621192147.GJ48734@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20110621192147.GJ48734@deviant.kiev.zoral.com.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: bumping mount path lengths in struct statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Jun 2011 10:42:52 -0000 On (21/06/2011 22:21), Kostik Belousov wrote: > On Tue, Jun 21, 2011 at 09:16:18AM -0600, Will Andrews wrote: > > Hi, > > > > struct statfs contains the following: > > > > 90 char f_mntfromname[MNAMELEN]; /* mounted filesystem */ > > 91 char f_mntonname[MNAMELEN]; /* directory on which mounted */ > > > > Where MNAMELEN is, currently, 88. These limit the length of the path > > that a filesystem can be mounted to. This is enforced by > > kern/vfs_mount.c:vfs_donmount(). This limit seems archaic, especially > > given use cases like virtualization (large filesystem structures to > > support underlying VMs), builds (which often make extensive use of > > chroot with nullfs/NFS), ZFS, snapshots, etc. Does anyone object to > > bumping MNAMELEN to 1024 (PATH_MAX/MAXPATHLEN)? Or some other > > reasonably large value? > > There is nothing inherently wrong with bumping the length. 
But the > work required is probably more than you estimated. The cause is the > ABI breakage. For sure, you will need to provide the shims for compat > syscalls. Unfortunately, this is not enough. > > Even a quick look over our tree shows that struct statfs is used in API by > several base libraries. Look e.g. at the getmntinfo(3). Libc would need > shims too, at least. > > You will need to do the ABI analysis of the whole system, provide the > shims for the symbol-versioned libraries, and bump so version for > unversioned. I could do it as part of my 64-bit ino_t GSoC project. So should I change MNAMELEN to 1024? What about MFSNAMELEN, it's now 16. I think removing or increasing size of f_charspare field might also be a good idea. Thanks, Gleb. From owner-freebsd-fs@FreeBSD.ORG Wed Jun 22 11:01:01 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 23F7E106566B for ; Wed, 22 Jun 2011 11:01:01 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id B3F8D8FC12 for ; Wed, 22 Jun 2011 11:01:00 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p5MB0sGE091764 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 22 Jun 2011 14:00:54 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p5MB0rSA014022; Wed, 22 Jun 2011 14:00:54 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p5MB0rrT014021; Wed, 22 Jun 2011 14:00:53 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik
set sender to kostikbel@gmail.com using -f Date: Wed, 22 Jun 2011 14:00:53 +0300 From: Kostik Belousov To: Gleb Kurtsou Message-ID: <20110622110053.GL48734@deviant.kiev.zoral.com.ua> References: <20110621192147.GJ48734@deviant.kiev.zoral.com.ua> <20110622101328.GA19866@tops> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="S61UZXTAKvsJ60P5" Content-Disposition: inline In-Reply-To: <20110622101328.GA19866@tops> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org Subject: Re: bumping mount path lengths in struct statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Jun 2011 11:01:01 -0000 --S61UZXTAKvsJ60P5 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jun 22, 2011 at 01:13:28PM +0300, Gleb Kurtsou wrote: > On (21/06/2011 22:21), Kostik Belousov wrote: > > On Tue, Jun 21, 2011 at 09:16:18AM -0600, Will Andrews wrote: > > > Hi, > > > > > > struct statfs contains the following: > > > > > > 90 char f_mntfromname[MNAMELEN]; /* mounted filesystem */ > > > 91 char f_mntonname[MNAMELEN]; /* directory on which mounted */ > > > > > > Where MNAMELEN is, currently, 88. These limit the length of the path > > > that a filesystem can be mounted to. This is enforced by > > > kern/vfs_mount.c:vfs_donmount().
This limit seems archaic, especially > > > given use cases like virtualization (large filesystem structures to > > > support underlying VMs), builds (which often make extensive use of > > > chroot with nullfs/NFS), ZFS, snapshots, etc. Does anyone object to > > > bumping MNAMELEN to 1024 (PATH_MAX/MAXPATHLEN)? Or some other > > > reasonably large value? > > > > There is nothing inherently wrong with bumping the length. But the > > work required is probably more than you estimated. The cause is the > > ABI breakage. For sure, you will need to provide the shims for compat > > syscalls. Unfortunately, this is not enough. > > > > Even a quick look over our tree shows that struct statfs is used in API by > > several base libraries. Look e.g. at the getmntinfo(3). Libc would need > > shims too, at least. > > > > You will need to do the ABI analysis of the whole system, provide the > > shims for the symbol-versioned libraries, and bump so version for > > unversioned. > > I could do it as part of my 64-bit ino_t GSoC project. So should I > change MNAMELEN to 1024? What about MFSNAMELEN, it's now 16. > I think removing or increasing size of f_charspare field might also be a > good idea. I think it may be good to have this change done. I do think that this is of much lower importance than 64-bit ino_t. Also, if done, I prefer to have the struct statfs change be separate from the ino_t change. That said, the statfs change should be much smaller than ino_t, thus easier to review, allowing it to be committed before ino_t. Increasing f_fstypename has no obvious benefits; do we expect to have filesystem types with names longer than 16 bytes? I do not think that having many spare fields in struct statfs is definitely good. The structure will become quite large as is after the MNAMELEN bump, and a balance should be struck between application memory usage and future extensibility.
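To put a rough number on the size concern, here is a minimal sketch. It assumes only the two path fields change (88 to 1024 bytes each) and uses hypothetical stand-in structs and helper names, not the real struct statfs from sys/mount.h.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two path fields discussed in the thread;
 * this is NOT the real struct statfs. */
#define OLD_MNAMELEN 88    /* current value quoted in the thread */
#define NEW_MNAMELEN 1024  /* proposed value (PATH_MAX/MAXPATHLEN) */

struct paths_old {
    char f_mntfromname[OLD_MNAMELEN];
    char f_mntonname[OLD_MNAMELEN];
};

struct paths_new {
    char f_mntfromname[NEW_MNAMELEN];
    char f_mntonname[NEW_MNAMELEN];
};

/* Extra bytes one entry would need for the two path fields alone. */
static size_t
path_fields_growth(void)
{
    return sizeof(struct paths_new) - sizeof(struct paths_old);
}

/* Extra bytes for a getmntinfo(3)-style array describing nmounts mounts
 * (illustrative helper, an assumption of this sketch). */
static size_t
mount_array_growth(size_t nmounts)
{
    return nmounts * path_fields_growth();
}
```

Under these assumptions each entry grows by 2 * (1024 - 88) bytes, which is the kind of per-application memory cost being weighed against future extensibility.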
--S61UZXTAKvsJ60P5 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk4By2UACgkQC3+MBN1Mb4gArQCfZMdNIQKAO2X6a1jyYKo7x3UL kqwAnRX2UYQ2m58WuP7ODDg4jgcK1wg1 =Ja89 -----END PGP SIGNATURE----- --S61UZXTAKvsJ60P5-- From owner-freebsd-fs@FreeBSD.ORG Wed Jun 22 11:42:28 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F39B4106566B for ; Wed, 22 Jun 2011 11:42:27 +0000 (UTC) (envelope-from frimik@gmail.com) Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com [209.85.220.182]) by mx1.freebsd.org (Postfix) with ESMTP id AD6938FC1A for ; Wed, 22 Jun 2011 11:42:27 +0000 (UTC) Received: by vxg33 with SMTP id 33so721047vxg.13 for ; Wed, 22 Jun 2011 04:42:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=dp+0NpaiQzACfmOz1W8ZUMZmoKnJaLtd351LPf+xnlg=; b=pPX7gQ9Jy418EmYy5WbIuNpy/nAe08gh/I+KOvRW2xs6b369JVg2IzPE75r5eZNlfr Zcj8VfazPh7BtMNEjV3+dPt8m+CDPxNoQFU5InV23OQarCNgbx/GvaPQhD+/tE/g5oXg dv1xF6m4k6h7mxlaPpv66UZO2xrRLv295w7F8= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=RkEzV6aRmmu5JVdxcYqw9zNnOV4fIVgG/RSH04pA+J5wy/VVn6n00PjgLkT1TZcW+B 4D+AABIm1ZV2REINjqg7gGnmMIvsuvTTeFpETg4mZ8jPJyJAO0eRJylTfyQEMhnA+bFJ KM/YzEsOlg7515I6bWL98xTO5vl5s0Wy561xA= MIME-Version: 1.0 Received: by 10.220.147.201 with SMTP id m9mr183723vcv.264.1308741505209; Wed, 22 Jun 2011 04:18:25 -0700 (PDT) Received: by 10.220.100.138 with HTTP; Wed, 22 Jun 2011 04:18:25 -0700 (PDT) In-Reply-To: References: Date: Wed, 22 Jun 2011 13:18:25 +0200 Message-ID: From: Mikael Fridh To: Wiktor Niesiobedzki Content-Type: text/plain; 
charset=ISO-8859-1 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS L2ARC hit ratio X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Jun 2011 11:42:28 -0000 On Wed, Jun 22, 2011 at 6:51 AM, Wiktor Niesiobedzki wrote: > 2011/6/22 Artem Belevich : >> >> L2ARC is filled with items evicted from ARC. The catch is that L2ARC >> writes are intentionally throttled. When L2ARC is empty writes happen >> at a higher rate, but it's still intentionally low so that >> read-optimized cache device does not wear out too soon. The bottom >> line is that not all the data spilled out of ARC ends up in L2ARC on >> the first try. Re-run your experiment again and you would probably see >> some improvement in L2ARC hit rates. > > I've run the experiment 3 times with no extent. Funny thing is: > - in first run, I see a lot of write activity against cache device > - in second run, I see no write activity against cache device, nor read activity What about read activity from vdevs? > So my guess is, that anyhow, ZFS cache layer knows, that this file is > *there*, though it decides not to serve it from L2ARC... Have you considered, that it is mostly served from ARC? Or even from the underlying vdevs? How much RAM have you got? 
And thus; How big is your ARC :) -- Mikael From owner-freebsd-fs@FreeBSD.ORG Thu Jun 23 00:20:52 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 84934106564A for ; Thu, 23 Jun 2011 00:20:52 +0000 (UTC) (envelope-from gad@FreeBSD.org) Received: from smtp8.server.rpi.edu (smtp8.server.rpi.edu [128.113.2.228]) by mx1.freebsd.org (Postfix) with ESMTP id 292378FC16 for ; Thu, 23 Jun 2011 00:20:51 +0000 (UTC) Received: from gilead.netel.rpi.edu (gilead.netel.rpi.edu [128.113.124.121]) by smtp8.server.rpi.edu (8.13.1/8.13.1) with ESMTP id p5MNJv2D018201; Wed, 22 Jun 2011 19:19:57 -0400 Message-ID: <4E027897.8080700@FreeBSD.org> Date: Wed, 22 Jun 2011 19:19:51 -0400 From: Garance A Drosehn User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.9) Gecko/20100722 Eudora/3.0.4 MIME-Version: 1.0 To: Gleb Kurtsou References: <20101201091203.GA3933@tops> <20110104175558.GR3140@deviant.kiev.zoral.com.ua> <20110120124108.GA32866@tops.skynet.lt> In-Reply-To: <20110120124108.GA32866@tops.skynet.lt> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Bayes-Prob: 0.0001 (Score 0) X-RPI-SA-Score: 1.50 (*) [Hold at 12.00] COMBINED_FROM,RATWARE_GECKO_BUILD X-CanItPRO-Stream: outgoing X-Canit-Stats-ID: Bayes signature not available X-Scanned-By: CanIt (www . roaringpenguin . com) on 128.113.2.228 Cc: freebsd-fs@FreeBSD.org Subject: Re: [rfc] 64-bit inode numbers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jun 2011 00:20:52 -0000 On 1/20/11 7:41 AM, Gleb Kurtsou wrote: > I've updated the patch. 
New version is available here: > https://github.com/downloads/glk/freebsd-ino64/freebsd-ino64-patch-2011-01-20.tgz > > Changelog: > * Add fts, ftw, nftw compat shims in libc > * Place libc compat shims in separate files, don't hack original > implementations. > * Fix dump/restore > * Use ino_t in UFS code (suggested by Kirk McKusick) > * Keep ufs_ino_t (32 bit) for boot2 not to increase size > Sorry for replying to an older message, but a reply made in a different thread reminded me about this project... Also, I may have asked this before. In fact, I'm almost sure that I started a reply to this back in Jan/Feb, but my email client claims I never replied to this topic... Are you increasing only the size of ino_t, or could you also look at increasing the size of dev_t? (just curious...) -- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu From owner-freebsd-fs@FreeBSD.ORG Thu Jun 23 06:43:50 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 75A451065677; Thu, 23 Jun 2011 06:43:50 +0000 (UTC) (envelope-from gleb.kurtsou@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id C3B8C8FC0A; Thu, 23 Jun 2011 06:43:49 +0000 (UTC) Received: by fxm11 with SMTP id 11so1627235fxm.13 for ; Wed, 22 Jun 2011 23:43:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:date:from:to:cc:subject:message-id:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; bh=meFfmz2qbXsnHPJzJKUf5mpXgd6uRgjD1Aw0tzx7Jv8=; b=MCbDvSn+E/DbZ4YjVdXqL8nGNFDtLPQfjCA5rqFx17GFkmlKzfzRMnDri32w9oEhDz JrULLd8/xTO0CDznR+aH/BkxfOpWChhC4EW7XaYEJZvAULhqBM4CRUiRCI8tutlpxMcJ rtx9dlCHR4mZm11QUETPfXGvukY050NG8FRSM= DomainKey-Signature: 
a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; b=kVXaxeDSdeHilm/+kIw43JFHzQknNaTLI27QYaxUGfkFklho/M3jTlDF9KZ/Ly3nyv 9df6C2NGefQUWe8FgrYcPE4h1D+WEZP8xkhRkXDEwjdcoN1HhxsEf0jF4Yc3AihesyF3 iNSoO5hF1YNWiS0W3Kenjg+eSCI7LcZyyV8PA= Received: by 10.223.145.24 with SMTP id b24mr2150987fav.89.1308811428281; Wed, 22 Jun 2011 23:43:48 -0700 (PDT) Received: from localhost (lan-78-157-92-5.vln.skynet.lt [78.157.92.5]) by mx.google.com with ESMTPS id e16sm773166fak.17.2011.06.22.23.43.45 (version=SSLv3 cipher=OTHER); Wed, 22 Jun 2011 23:43:47 -0700 (PDT) Date: Thu, 23 Jun 2011 09:43:33 +0300 From: Gleb Kurtsou To: Garance A Drosehn Message-ID: <20110623064333.GA2823@tops> References: <20101201091203.GA3933@tops> <20110104175558.GR3140@deviant.kiev.zoral.com.ua> <20110120124108.GA32866@tops.skynet.lt> <4E027897.8080700@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <4E027897.8080700@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.org, Matthew Fleming Subject: Re: [rfc] 64-bit inode numbers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jun 2011 06:43:50 -0000 On (22/06/2011 19:19), Garance A Drosehn wrote: > On 1/20/11 7:41 AM, Gleb Kurtsou wrote: > > I've updated the patch. New version is available here: > > https://github.com/downloads/glk/freebsd-ino64/freebsd-ino64-patch-2011-01-20.tgz > > > > Changelog: > > * Add fts, ftw, nftw compat shims in libc > > * Place libc compat shims in separate files, don't hack original > > implementations. 
> > * Fix dump/restore > > * Use ino_t in UFS code (suggested by Kirk McKusick) > > * Keep ufs_ino_t (32 bit) for boot2 not to increase size > > > Sorry for replying to an older message, but a reply made in a different > thread reminded me about this project... > > Also, I may have asked this before. In fact, I'm almost sure that I started > a reply to this back in Jan/Feb, but my email client claims I never replied > to this topic... > > Are you increasing only the size of ino_t, or could you also look at > increasing the size of dev_t? (just curious...) Sure. Incorporating as much of similar changes as possible is good. I've added Kostik and Matthew to CC list, it's for them to decide. dev_t on other OSes: NetBSD - uint64_t DragonFly - uint32_t Darwin - __int32_t OpenSolaris - ulong_t Linux - __u32 Considering this I think 3rd party software is not ready for such change. Major/minor mapping to dev_t will get more complicated. And the most important question: what would you want it for? 
As far as I can see major/minor numbers are ignored nowadays, major is zero, minor increases independently of device type: crw-r--r-- 1 root wheel 0, 31 Jun 23 09:00 acpi crw-r----- 1 root operator 0, 74 Jun 23 09:00 ada0 crw-r----- 1 root operator 0, 75 Jun 23 09:00 ada0s1 crw-r----- 1 root operator 0, 76 Jun 23 09:00 ada0s2 crw-r----- 1 root operator 0, 77 Jun 23 09:00 ada0s3 crw-r----- 1 root operator 0, 81 Jun 23 09:00 ada0s3a crw-r----- 1 root operator 0, 82 Jun 23 09:00 ada0s3b crw-r----- 1 root operator 0, 83 Jun 23 09:00 ada0s3d crw-r----- 1 root operator 0, 78 Jun 23 09:00 ada0s4 crw------- 1 root operator 0, 30 Jun 23 09:00 ata crw------- 1 root wheel 0, 34 Jun 23 09:00 atkbd0 crw------- 1 root wheel 0, 25 Jun 23 09:01 bpf lrwxr-xr-x 1 root wheel 3 Jun 23 09:00 bpf0 -> bpf crw-rw-rw- 1 root wheel 0, 37 Jun 23 09:00 bpsm0 crw-r----- 1 root operator 0, 73 Jun 23 09:00 cd0 crw------- 1 root wheel 0, 6 Jun 23 09:01 console crw------- 1 root wheel 0, 54 Jun 23 09:00 consolectl crw-rw-rw- 1 root wheel 0, 17 Jun 23 09:00 ctty crw------- 1 root wheel 0, 5 Jun 23 09:00 devctl cr-------- 1 root wheel 0, 70 Jun 23 09:00 devstat crw-rw-rw- 1 root wheel 0, 112 Jun 23 09:01 dsp0.0 crw-rw-rw- 1 root wheel 0, 111 Jun 23 09:01 dsp1.0 crw-rw-rw- 1 root wheel 0, 110 Jun 23 09:01 dsp2.0 crw-rw-rw- 1 root wheel 0, 109 Jun 23 09:01 dsp3.0 crw-rw-rw- 1 root wheel 0, 108 Jun 23 09:01 dsp4.0 lrwxr-xr-x 1 root wheel 15 Jun 23 09:01 dumpdev -> /dev/label/swap dr-xr-xr-x 2 root wheel 512 Jun 23 09:00 ext2fs dr-xr-xr-x 2 root wheel 512 Jun 23 09:00 fd crw------- 1 root wheel 0, 19 Jun 23 09:00 fido crw-r----- 1 root operator 0, 3 Jun 23 09:00 geom.ctl crw------- 1 root wheel 0, 16 Jun 23 09:00 io lrwxr-xr-x 1 root wheel 6 Jun 23 09:00 kbd0 -> atkbd0 crw------- 1 root wheel 0, 7 Jun 23 09:00 klog crw-r----- 1 root kmem 0, 21 Jun 23 09:00 kmem dr-xr-xr-x 2 root wheel 512 Jun 23 09:00 label lrwxr-xr-x 1 root wheel 12 Jun 23 09:01 log -> /var/run/log crw------- 1 root wheel 0, 63 Jun 23 
09:00 mdctl crw-r----- 1 root kmem 0, 20 Jun 23 09:00 mem crw-rw-rw- 1 root wheel 0, 14 Jun 23 09:00 midistat crw-rw-rw- 1 root wheel 0, 58 Jun 23 09:00 mixer0 crw-rw-rw- 1 root wheel 0, 59 Jun 23 09:00 mixer1 crw-rw-rw- 1 root wheel 0, 60 Jun 23 09:00 mixer2 crw-rw-rw- 1 root wheel 0, 61 Jun 23 09:00 mixer3 crw-rw-rw- 1 root wheel 0, 62 Jun 23 09:00 mixer4 crw------- 1 root kmem 0, 18 Jun 23 09:00 nfslock dr-xr-xr-x 2 root wheel 512 Jun 23 09:00 ntfs crw-rw-rw- 1 root wheel 0, 22 Jun 23 09:33 null crw-rw-rw- 1 root wheel 0, 32 Jun 23 09:00 nvidia0 crw-rw-rw- 1 root wheel 0, 33 Jun 23 09:00 nvidiactl crw------- 1 root operator 0, 71 Jun 23 09:00 pass0 crw------- 1 root operator 0, 72 Jun 23 09:00 pass1 crw-r--r-- 1 root wheel 0, 24 Jun 23 09:00 pci crw------- 1 root wheel 0, 103 Jun 23 09:01 pf crw-rw-rw- 1 root wheel 0, 36 Jun 23 09:00 psm0 crw-rw-rw- 1 root wheel 0, 27 Jun 23 09:00 ptmx dr-xr-xr-x 2 root wheel 512 Jun 23 09:11 pts crw-rw-rw- 1 root wheel 0, 28 Jun 23 09:01 random cr--r--r-- 1 root wheel 0, 4 Jun 23 09:00 sndstat lrwxr-xr-x 1 root wheel 4 Jun 23 09:00 stderr -> fd/2 lrwxr-xr-x 1 root wheel 4 Jun 23 09:00 stdin -> fd/0 lrwxr-xr-x 1 root wheel 4 Jun 23 09:00 stdout -> fd/1 crw------- 1 root wheel 0, 15 Jun 23 09:00 sysmouse crw------- 1 root wheel 0, 38 Jun 23 09:01 ttyv0 crw------- 1 root wheel 0, 39 Jun 23 09:01 ttyv1 crw------- 1 root wheel 0, 40 Jun 23 09:01 ttyv2 crw------- 1 root wheel 0, 41 Jun 23 09:01 ttyv3 crw------- 1 root wheel 0, 42 Jun 23 09:01 ttyv4 crw------- 1 root wheel 0, 43 Jun 23 09:01 ttyv5 crw------- 1 root wheel 0, 44 Jun 23 09:01 ttyv6 crw------- 1 root wheel 0, 45 Jun 23 09:01 ttyv7 crw------- 1 root wheel 0, 46 Jun 23 09:00 ttyv8 crw------- 1 root wheel 0, 47 Jun 23 09:00 ttyv9 crw------- 1 root wheel 0, 48 Jun 23 09:00 ttyva crw------- 1 root wheel 0, 49 Jun 23 09:00 ttyvb crw------- 1 root wheel 0, 50 Jun 23 09:00 ttyvc crw------- 1 root wheel 0, 51 Jun 23 09:00 ttyvd crw------- 1 root wheel 0, 52 Jun 23 09:00 ttyve 
crw------- 1 root wheel 0, 53 Jun 23 09:00 ttyvf dr-xr-xr-x 2 root wheel 512 Jun 23 09:00 ufs lrwxr-xr-x 1 root wheel 9 Jun 23 09:00 ugen0.1 -> usb/0.1.0 lrwxr-xr-x 1 root wheel 9 Jun 23 09:00 ugen0.2 -> usb/0.2.0 lrwxr-xr-x 1 root wheel 9 Jun 23 09:00 ugen1.1 -> usb/1.1.0 lrwxr-xr-x 1 root wheel 9 Jun 23 09:00 ugen1.2 -> usb/1.2.0 lrwxr-xr-x 1 root wheel 9 Jun 23 09:00 ugen1.3 -> usb/1.3.0 crw-r--r-- 1 root operator 0, 98 Jun 23 09:00 ums0 lrwxr-xr-x 1 root wheel 6 Jun 23 09:00 urandom -> random dr-xr-xr-x 2 root wheel 512 Jun 23 09:00 usb crw-r--r-- 1 root operator 0, 56 Jun 23 09:00 usbctl crw------- 1 root wheel 0, 85 Jun 23 09:01 vboxnetctl crw------- 1 root operator 0, 57 Jun 23 09:00 xpt0 crw-rw-rw- 1 root wheel 0, 23 Jun 23 09:00 zero crw-rw-rw- 1 root operator 0, 55 Jun 23 09:00 zfs Thanks, Gleb. From owner-freebsd-fs@FreeBSD.ORG Thu Jun 23 08:11:48 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 08ED91065670 for ; Thu, 23 Jun 2011 08:11:48 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 8317C8FC0A for ; Thu, 23 Jun 2011 08:11:47 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p5N8Be9f016800 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 23 Jun 2011 11:11:40 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p5N8Bexu041362; Thu, 23 Jun 2011 11:11:40 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p5N8BeQf041361; Thu, 23 Jun 2011 11:11:40 +0300 (EEST) (envelope-from 
kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 23 Jun 2011 11:11:40 +0300 From: Kostik Belousov To: Gleb Kurtsou Message-ID: <20110623081140.GQ48734@deviant.kiev.zoral.com.ua> References: <20101201091203.GA3933@tops> <20110104175558.GR3140@deviant.kiev.zoral.com.ua> <20110120124108.GA32866@tops.skynet.lt> <4E027897.8080700@FreeBSD.org> <20110623064333.GA2823@tops> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="5ggoG3uILOsAbQb2" Content-Disposition: inline In-Reply-To: <20110623064333.GA2823@tops> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org, Garance A Drosehn Subject: Re: [rfc] 64-bit inode numbers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jun 2011 08:11:48 -0000 --5ggoG3uILOsAbQb2 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jun 23, 2011 at 09:43:33AM +0300, Gleb Kurtsou wrote: > On (22/06/2011 19:19), Garance A Drosehn wrote: > > On 1/20/11 7:41 AM, Gleb Kurtsou wrote: > > > I've updated the patch. New version is available here: > > > https://github.com/downloads/glk/freebsd-ino64/freebsd-ino64-patch-2011-01-20.tgz > > > > > > Changelog: > > > * Add fts, ftw, nftw compat shims in libc > > > * Place libc compat shims in separate files, don't hack original > > > implementations. 
> > > * Fix dump/restore
> > > * Use ino_t in UFS code (suggested by Kirk McKusick)
> > > * Keep ufs_ino_t (32 bit) for boot2 not to increase size
> > >
> > Sorry for replying to an older message, but a reply made in a different
> > thread reminded me about this project...
> >
> > Also, I may have asked this before. In fact, I'm almost sure that I started
> > a reply to this back in Jan/Feb, but my email client claims I never replied
> > to this topic...
> >
> > Are you increasing only the size of ino_t, or could you also look at
> > increasing the size of dev_t? (just curious...)
>
> Sure. Incorporating as much of similar changes as possible is good.
> I've added Kostik and Matthew to CC list, it's for them to decide.
>
> dev_t on other OSes:
> NetBSD - uint64_t
> DragonFly - uint32_t
> Darwin - __int32_t
> OpenSolaris - ulong_t
> Linux - __u32
>
> Considering this I think 3rd party software is not ready for such
> change.
>
> Major/minor mapping to dev_t will get more complicated.
>
> And the most important question: what would you want it for? As far as I
Indeed, this is the right question.

> can see major/minor numbers are ignored nowadays, major is zero, minor
> increases independently of device type:
This is only because you have too little /dev nodes. Look at the
definitions of the major/minor in sys/types.h.

--5ggoG3uILOsAbQb2 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk4C9TsACgkQC3+MBN1Mb4hOUACeJkrSB7qRjgbCOx3ky68kn+Be xg8Anj0HolhhVzm0KYtLOiroNlVeodqg =82h2 -----END PGP SIGNATURE----- --5ggoG3uILOsAbQb2-- From owner-freebsd-fs@FreeBSD.ORG Thu Jun 23 09:18:08 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5C395106566C for ; Thu, 23 Jun 2011 09:18:08 +0000 (UTC) (envelope-from vnik@arqa.ru) Received: from grom.arqa.ru (grom.sicex.ru [193.178.135.38]) by mx1.freebsd.org (Postfix) with ESMTP id 847738FC1A for ; Thu, 23 Jun 2011 09:18:06 +0000 (UTC) Received: from maik.arqa.ru (maik [192.168.47.115]) by grom.arqa.ru (8.14.3/8.14.3) with ESMTP id p5N91qXb012769 for ; Thu, 23 Jun 2011 16:01:52 +0700 (NOVST) (envelope-from vnik@arqa.ru) Received: from [127.0.0.1] (pc-vnik.arqa.ru [192.168.40.162]) by maik.arqa.ru (8.14.4/8.14.4) with ESMTP id p5N91q5B057168 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Thu, 23 Jun 2011 16:01:52 +0700 (NOVST) (envelope-from vnik@arqa.ru) Message-ID: <4E0300FE.9000205@arqa.ru> Date: Thu, 23 Jun 2011 16:01:50 +0700 From: Nikitin Vitaly User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.1.16) Gecko/20101125 Thunderbird/3.0.11 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0284], KAS30/Release X-SpamTest-Info: Not protected X-Anti-Virus: Kaspersky Anti-Virus for Linux Mail Server 5.6.39/RELEASE, bases: 20110623 #5620914, check: 20110623 notchecked Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: problem with copying data on hast-device X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jun 2011 09:18:08 -0000

Hello!

I seem to have some trouble with hastd, and I hope you can help me with it.

I set up hastd successfully on two nodes for testing. Everything seemed to work fine: I set up hast.conf, assigned the roles, ran newfs and mounted the device. The first node has the primary role and the second node has the secondary role.

Problem: When I start copying about 300-400GB of data from a hard disk to the mounted hast device, the whole system freezes. At that time gstat showed hast/[devicename] at %busy 99.9. No error messages were found in the log files.

The problem repeats even if I copy data to the mounted hast device from the network or with rsync. I tried switching off the hastd service on the second node, but this had no effect. The problem starts randomly. Sometimes copying a small amount of data completes, but larger copies do not.

Please help me, if you can.

Environment: 8.2-RELEASE FreeBSD 8.2-RELEASE

--
Regards,
Nikitin V.V., tel.
430 From owner-freebsd-fs@FreeBSD.ORG Thu Jun 23 09:27:36 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D0F471065675 for ; Thu, 23 Jun 2011 09:27:36 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta03.emeryville.ca.mail.comcast.net (qmta03.emeryville.ca.mail.comcast.net [76.96.30.32]) by mx1.freebsd.org (Postfix) with ESMTP id B804B8FC1B for ; Thu, 23 Jun 2011 09:27:36 +0000 (UTC) Received: from omta20.emeryville.ca.mail.comcast.net ([76.96.30.87]) by qmta03.emeryville.ca.mail.comcast.net with comcast id zMR41g0021smiN4A3MTagK; Thu, 23 Jun 2011 09:27:34 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta20.emeryville.ca.mail.comcast.net with comcast id zMUV1g00J1t3BNj8gMUVyn; Thu, 23 Jun 2011 09:28:30 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 10B06102C36; Thu, 23 Jun 2011 02:27:35 -0700 (PDT) Date: Thu, 23 Jun 2011 02:27:35 -0700 From: Jeremy Chadwick To: Alexander Motin Message-ID: <20110623092735.GA2464@icarus.home.lan> References: <20110618005124.GA43568@icarus.home.lan> <20110621191626.GA99204@icarus.home.lan> <4E01AFBA.809@FreeBSD.org> <20110622103703.GA14901@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110622103703.GA14901@icarus.home.lan> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Subject: Re: MFC: graid(8) (RAID GEOM) support X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jun 2011 09:27:37 -0000 On Wed, Jun 22, 2011 at 03:37:03AM -0700, Jeremy Chadwick wrote: > On Wed, Jun 22, 2011 at 12:02:50PM +0300, Alexander Motin wrote: > > Jeremy Chadwick wrote: > > > On Fri, Jun 17, 2011 at 05:51:24PM -0700, 
Jeremy Chadwick wrote: > > >> Sorry for the cross-post, but I thought both lists would want to know > > >> about this. > > >> > > >> Looks like mav@ just committed this ~17 hours ago: > > >> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/geom/raid/g_raid.c > > >> > > >> Those who have historically wanted to use Intel MatrixRAID (now called > > >> Intel RST (Rapid Storage Technology)), but haven't due to the severe > > >> issues/risks with ataraid(4), will probably be very interested in > > >> this commit. I know I am! > > >> > > >> I plan on stress-testing the Intel support on a 2-disk system with > > >> RAID-1 enabled, and will document my experiences, procedures, etc... > > >> > > >> Thanks, mav@ and imp@ ! > > >> > > >> I'll be sending another mail momentarily asking about USB memory stick > > >> image building, since to accomplish the above, I want to do a > > >> "bare-bones" install on our test system (e.g. enable Intel RAID, set up > > >> 2 disks in a RAID-1 mirror, boot a USB memory stick that contains this > > >> latest RELENG_8 build, and do sysinstall, etc.. the normal way). > > >> > > >> > > >> ===================================================================== > > >> MFC r219974, r220209, r220210, r220790: > > >> Add new RAID GEOM class, that is going to replace ataraid(4) in supporting > > >> various BIOS-based software RAIDs. Unlike ataraid(4) this implementation > > >> does not depend on legacy ata(4) subsystem and can be used with any disk > > >> drivers, including new CAM-based ones (ahci(4), siis(4), mvs(4), ata(4) > > >> with `options ATA_CAM`). To make code more readable and extensible, this > > >> implementation follows modular design, including core part and two sets > > >> of modules, implementing support for different metadata formats and RAID > > >> levels. > > >> > > >> Support for such popular metadata formats is now implemented: > > >> Intel, JMicron, NVIDIA, Promise (also used by AMD/ATI) and SiliconImage. 
> > >> > > >> Such RAID levels are now supported: > > >> RAID0, RAID1, RAID1E, RAID10, SINGLE, CONCAT. > > >> > > >> For all of these RAID levels and metadata formats this class supports > > >> full cycle of volume operations: reading, writing, creation, deletion, > > >> disk removal and insertion, rebuilding, dirty shutdown detection > > >> and resynchronization, bad sector recovery, faulty disks tracking, > > >> hot-spare disks. For Intel and Promise formats there is support multiple > > >> volumes per disk set. > > >> > > >> Look graid(8) manual page for additional details. > > >> > > >> Co-authored by: imp > > >> Sponsored by: Cisco Systems, Inc. and iXsystems, Inc. > > >> ===================================================================== > > > > > > By the way, it doesn't look like the graid(8) man page is being brought > > > in to the base system on either of the two RELENG_8 systems I've rebuilt > > > in the past few days. > > > > > > I'm thinking /usr/src/sbin/geom/class/raid/graid.8 isn't being noticed > > > as a man page. > > > > > > /usr/src/sbin/geom/class/raid/Makefile doesn't have MAN8=graid.8 in it, > > > is that the problem? > > > > I've just rebuilt my test 8-STABLE system and it installed graid(8). > > Hmm, there must be something I'm missing either in the base system or > the kernel or both. Does this kernel module and/or bits and pieces not > get built unless it's included strictly in the kernel? > > Below is one of the two systems, looking for both graid* and geom_raid*. > There's the old geom_raid3 stuff there, and the source bits/pieces for > the new graid(8), but nothing seems built (including kernel module) for > the new graid(8). > > If you'd like I can rm -fr /usr/src/* ; rm -fr /var/db/sup/src-all and > then re-download source from an official cvsup mirror (I've been using > cvsup9.freebsd.org for both boxes). 
> > icarus# uname -a > FreeBSD icarus.home.lan 8.2-STABLE FreeBSD 8.2-STABLE #0: Fri Jun 17 18:01:45 PDT 2011 root@icarus.home.lan:/usr/obj/usr/src/sys/X7SBA_RELENG_8_amd64 amd64 > icarus# find /usr -name "graid*" -ls > 3211128 8 -r--r--r-- 1 root wheel 2521 Jun 17 18:25 /usr/share/man/man8/graid3.8.gz > 169318 16 -rw-r--r-- 1 root wheel 6390 Aug 3 2009 /usr/src/sbin/geom/class/raid3/graid3.8 > 169624 16 -rw-r--r-- 1 root wheel 8126 Jun 16 23:59 /usr/src/sbin/geom/class/raid/graid.8 > 921430 8 -rw-r--r-- 1 root wheel 2521 Jun 17 17:51 /usr/obj/usr/src/sbin/geom/class/raid3/graid3.8.gz > 3369372 4 drwxr-xr-x 2 root wheel 512 May 3 03:58 /usr/ports/sysutils/graid5 > icarus# > icarus# find /boot -name "graid*" -ls > icarus# > icarus# find /usr -name "geom_raid*" -ls > 169317 20 -rw-r--r-- 1 root wheel 9257 Jan 18 21:13 /usr/src/sbin/geom/class/raid3/geom_raid3.c > 165265 8 -rw-r--r-- 1 root wheel 2992 Jun 16 23:59 /usr/src/sbin/geom/class/raid/geom_raid.c > 259652 4 drwxr-xr-x 2 root wheel 512 Jun 6 06:28 /usr/src/sys/modules/geom/geom_raid3 > 285292 4 drwxr-xr-x 2 root wheel 512 Jun 17 17:17 /usr/src/sys/modules/geom/geom_raid > 262798 4 drwxr-xr-x 2 root wheel 512 Jun 6 06:29 /usr/src/tools/regression/geom_raid3 > 921428 48 -rw-r--r-- 1 root wheel 22760 Jun 17 17:51 /usr/obj/usr/src/sbin/geom/class/raid3/geom_raid3.So > 921431 64 -rwxr-xr-x 1 root wheel 32207 Jun 17 17:51 /usr/obj/usr/src/sbin/geom/class/raid3/geom_raid3.so > 1014175 4 drwxr-xr-x 2 root wheel 512 Jun 17 18:00 /usr/obj/usr/src/sys/X7SBA_RELENG_8_amd64/modules/usr/src/sys/modules/geom/geom_raid3 > 1015257 736 -rw-r--r-- 1 root wheel 359456 Jun 17 18:00 /usr/obj/usr/src/sys/X7SBA_RELENG_8_amd64/modules/usr/src/sys/modules/geom/geom_raid3/geom_raid3.ko.debug > 1015258 640 -rw-r--r-- 1 root wheel 304432 Jun 17 18:00 /usr/obj/usr/src/sys/X7SBA_RELENG_8_amd64/modules/usr/src/sys/modules/geom/geom_raid3/geom_raid3.ko.symbols > 1015259 272 -rw-r--r-- 1 root wheel 137448 Jun 17 18:00 
/usr/obj/usr/src/sys/X7SBA_RELENG_8_amd64/modules/usr/src/sys/modules/geom/geom_raid3/geom_raid3.ko
> icarus#
> icarus# find /boot -name "geom_raid*" -ls
> 94943 272 -r-xr-xr-x 1 root wheel 137448 Jun 17 18:24 /boot/kernel/geom_raid3.ko
> 94944 640 -r-xr-xr-x 1 root wheel 304432 Jun 17 18:24 /boot/kernel/geom_raid3.ko.symbols
> 71074 272 -r-xr-xr-x 1 root wheel 137448 Jun 6 05:35 /boot/kernel.old/geom_raid3.ko
> 71075 640 -r-xr-xr-x 1 root wheel 304432 Jun 6 05:35 /boot/kernel.old/geom_raid3.ko.symbols

A follow-up to this issue. I went ahead with the following:

rm -fr /usr/obj
rm -fr /usr/src/*
rm -fr /var/db/sup/src-all
csup -4 -L 2 -h cvsup10.freebsd.org /usr/share/examples/cvsup/stable-supfile
<...put my kernel config back into /sys/amd64/conf...>
cd /usr/src
make -j4 buildworld && make -j4 buildkernel

And here's what I have in /usr/obj. Note that the find statement is looking
for graid* and geom_raid* while excluding raid3 stuff:

icarus# find /usr/obj \( -name "graid*" -or -name "geom_raid*" \) -and \! -name "*raid3*" -ls
921216 16 -rw-r--r-- 1 root wheel 6600 Jun 23 01:10 /usr/obj/usr/src/sbin/geom/class/raid/geom_raid.So
921219 8 -rw-r--r-- 1 root wheel 2952 Jun 23 01:10 /usr/obj/usr/src/sbin/geom/class/raid/graid.8.gz
921223 44 -rwxr-xr-x 1 root wheel 20913 Jun 23 01:10 /usr/obj/usr/src/sbin/geom/class/raid/geom_raid.so
991578 4 drwxr-xr-x 2 root wheel 1024 Jun 23 02:04 /usr/obj/usr/src/sys/X7SBA_RELENG_8_amd64/modules/usr/src/sys/modules/geom/geom_raid
992668 2368 -rw-r--r-- 1 root wheel 1181256 Jun 23 02:04 /usr/obj/usr/src/sys/X7SBA_RELENG_8_amd64/modules/usr/src/sys/modules/geom/geom_raid/geom_raid.ko.debug
992669 2048 -rw-r--r-- 1 root wheel 1029944 Jun 23 02:04 /usr/obj/usr/src/sys/X7SBA_RELENG_8_amd64/modules/usr/src/sys/modules/geom/geom_raid/geom_raid.ko.symbols
992670 640 -rw-r--r-- 1 root wheel 307696 Jun 23 02:04 /usr/obj/usr/src/sys/X7SBA_RELENG_8_amd64/modules/usr/src/sys/modules/geom/geom_raid/geom_raid.ko

This looks a *lot* better.
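
For anyone who wants to repeat the check above from a script rather than retyping the find(1) expression, here is a minimal Python sketch of the same filter. It is only an illustration: the function name find_graid_artifacts is made up for this example and is not part of any FreeBSD tool, and unlike find(1) it does not test the starting directory itself.

```python
import fnmatch
import os


def find_graid_artifacts(root):
    """Return sorted paths under root named graid* or geom_raid*, skipping *raid3*."""
    # Roughly mirrors:
    #   find ROOT ( -name "graid*" -or -name "geom_raid*" ) -and ! -name "*raid3*"
    wanted = ("graid*", "geom_raid*")
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # find(1) applies -name to directories as well as files, so check both.
        for name in dirnames + filenames:
            if fnmatch.fnmatchcase(name, "*raid3*"):
                continue  # old geom_raid3 / graid3 bits are not of interest
            if any(fnmatch.fnmatchcase(name, pat) for pat in wanted):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling find_graid_artifacts("/usr/obj") after a buildworld/buildkernel should list the same kind of entries as the find output above; an empty list would suggest the new class was not built.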
Note that I went with cvsup10 instead of cvsup9; cvsup9 appears to be down right now (I've tried from other hosts), which makes me wonder if somehow that server is "out of whack" in some way. icarus# telnet cvsup9.freebsd.org 5999 Trying 208.83.20.166... ^C Sorry for the noise, Alexander! -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jun 23 15:23:10 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (unknown [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7D6B5106566B for ; Thu, 23 Jun 2011 15:23:10 +0000 (UTC) (envelope-from hans@beastielabs.net) Received: from mail.beastielabs.net (beasties.demon.nl [82.161.3.114]) by mx1.freebsd.org (Postfix) with ESMTP id B8BA08FC0C for ; Thu, 23 Jun 2011 15:23:08 +0000 (UTC) Received: from testsoekris.hotsoft.nl (localhost [127.0.0.1]) by mail.beastielabs.net (8.14.4/8.14.4) with ESMTP id p5NEjehU021251; Thu, 23 Jun 2011 16:45:40 +0200 (CEST) (envelope-from hans@testsoekris.hotsoft.nl) Received: (from hans@localhost) by testsoekris.hotsoft.nl (8.14.4/8.14.4/Submit) id p5NEjcIP021250; Thu, 23 Jun 2011 16:45:38 +0200 (CEST) (envelope-from hans) Date: Thu, 23 Jun 2011 16:45:38 +0200 From: Hans Ottevanger To: Peter Jeremy Message-ID: <20110623144538.GA20990@testsoekris.hotsoft.nl> References: <20110617153415.GA92803@testsoekris.hotsoft.nl> <201106171842.p5HIgQjn018296@chez.mckusick.com> <20110621084645.GA68018@server.vk2pj.dyndns.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110621084645.GA68018@server.vk2pj.dyndns.org> User-Agent: Mutt/1.4.2.3i Cc: Kirk McKusick , freebsd-fs@freebsd.org, Jeff Roberson Subject: Re: SU+J: negative used diskspace (for a while) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 
Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jun 2011 15:23:10 -0000 On Tue, Jun 21, 2011 at 06:46:45PM +1000, Peter Jeremy wrote: > On 2011-Jun-17 11:42:26 -0700, Kirk McKusick wrote: > >> Date: Fri, 17 Jun 2011 17:34:15 +0200 > >> From: Hans Ottevanger > >> So it takes more than a minute before the disk space is back to "normal" > >> values. > > >We used to account for deleted blocks at the instant that they were > >removed. This accounting was rather complex, so as part of doing > >SU+J, Jeff simplified it. Under the simplification, the removal is > >not accounted for until part way through the removal process. The > >result is that you now get these false negative block counts until > >the blocks have been partially reclaimed. If this behavior causes > >enough trouble, Jeff might be convinced that the more accurate block > >accounting is necessary. > > Negative values may also impact NFS clients - though just limiting the > reported used space to 0 should avoid them getting too upset. > It appears that negative values occur only when the filesystem is almost empty (like in my tests). Otherwise the values are just way too low for a while. > That said, whilst I haven't seen negative used values, ZFS and > Solaris UFS also take an extended period before 'df' reports > correct values (several minutes for Solaris UFS). In the case > of ZFS, even 'du' can report incorrect information for a while. > If I remember well, in "the old days", when Soft Updates was fairly new, it also had such issues in the first release, but it was repaired soon afterwards. Of course it is a real advantage to see the results immediately when you are cleaning out filesystems, especially when you are in a hurry. I just verified that the current issue was quite recently introduced with commit r222958. Before that 'df' reported changes promptly and correctly, even with Journaling enabled. 
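
Peter Jeremy's suggestion of limiting the reported used space to 0 can be sketched in a few lines of Python. This is only an illustration of the arithmetic a df-like tool might perform on statvfs-style counters (used = total - free); the function names and the clamping policy are assumptions made for this example, not what df(1) or the NFS server actually does.

```python
import os


def used_blocks(total_blocks, free_blocks, clamp=True):
    """Used space as total minus free, optionally clamped at zero."""
    used = total_blocks - free_blocks
    if clamp and used < 0:
        # A transient over-count of free blocks would otherwise show up
        # as negative used space; report 0 instead.
        return 0
    return used


def reported_used_bytes(path):
    """Clamped used bytes for the filesystem containing path (hypothetical helper)."""
    st = os.statvfs(path)
    return used_blocks(st.f_blocks, st.f_bfree) * st.f_frsize
```

Clamping only hides the symptom from clients that cannot cope with negative numbers; the reported value is still too low until the block accounting catches up.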
As an answer to my original message to -current@ Peter Holm has assured me (http://lists.freebsd.org/pipermail/freebsd-current/2011-June/025318.html) that this is a known issue that is on Jeff's TODO list. All I can say to the UFS+J guys is: keep up the good work. Journaling is a very cool addition to UFS, amongst others a real time saver at boot time after system crashes. Just a few years ago I saw people writing that adding journaling to UFS was not really possible and it would probably never happen. Well, here it is, though I understand that it was a bit more difficult than foreseen and still needs a lot of work 8-). Thanks, guys. Kind regards, Hans Ottevanger From owner-freebsd-fs@FreeBSD.ORG Thu Jun 23 15:50:56 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (unknown [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4C873106566C for ; Thu, 23 Jun 2011 15:50:56 +0000 (UTC) (envelope-from pho@holm.cc) Received: from relay01.pair.com (relay01.pair.com [209.68.5.15]) by mx1.freebsd.org (Postfix) with SMTP id 018378FC0A for ; Thu, 23 Jun 2011 15:50:55 +0000 (UTC) Received: (qmail 9510 invoked from network); 23 Jun 2011 15:24:14 -0000 Received: from 87.58.145.224 (HELO x2.osted.lan) (87.58.145.224) by relay01.pair.com with SMTP; 23 Jun 2011 15:24:14 -0000 X-pair-Authenticated: 87.58.145.224 Received: from x2.osted.lan (localhost [127.0.0.1]) by x2.osted.lan (8.14.4/8.14.4) with ESMTP id p5NFOCW2048769; Thu, 23 Jun 2011 17:24:12 +0200 (CEST) (envelope-from pho@x2.osted.lan) Received: (from pho@localhost) by x2.osted.lan (8.14.4/8.14.4/Submit) id p5NFOC1A048768; Thu, 23 Jun 2011 17:24:12 +0200 (CEST) (envelope-from pho) Date: Thu, 23 Jun 2011 17:24:12 +0200 From: Peter Holm To: Hans Ottevanger Message-ID: <20110623152412.GA48574@x2.osted.lan> References: <20110617153415.GA92803@testsoekris.hotsoft.nl> <201106171842.p5HIgQjn018296@chez.mckusick.com> <20110621084645.GA68018@server.vk2pj.dyndns.org> 
<20110623144538.GA20990@testsoekris.hotsoft.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110623144538.GA20990@testsoekris.hotsoft.nl> User-Agent: Mutt/1.4.2.3i Cc: Kirk McKusick , freebsd-fs@freebsd.org, Jeff Roberson Subject: Re: SU+J: negative used diskspace (for a while) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jun 2011 15:50:56 -0000 On Thu, Jun 23, 2011 at 04:45:38PM +0200, Hans Ottevanger wrote: > On Tue, Jun 21, 2011 at 06:46:45PM +1000, Peter Jeremy wrote: > > On 2011-Jun-17 11:42:26 -0700, Kirk McKusick wrote: > > >> Date: Fri, 17 Jun 2011 17:34:15 +0200 > > >> From: Hans Ottevanger > > >> So it takes more than a minute before the disk space is back to "normal" > > >> values. > > > > >We used to account for deleted blocks at the instant that they were > > >removed. This accounting was rather complex, so as part of doing > > >SU+J, Jeff simplified it. Under the simplification, the removal is > > >not accounted for until part way through the removal process. The > > >result is that you now get these false negative block counts until > > >the blocks have been partially reclaimed. If this behavior causes > > >enough trouble, Jeff might be convinced that the more accurate block > > >accounting is necessary. > > > > Negative values may also impact NFS clients - though just limiting the > > reported used space to 0 should avoid them getting too upset. > > > > It appears that negative values occur only when the filesystem is almost > empty (like in my tests). Otherwise the values are just way too low for > a while. > > > That said, whilst I haven't seen negative used values, ZFS and > > Solaris UFS also take an extended period before 'df' reports > > correct values (several minutes for Solaris UFS). 
In the case > > of ZFS, even 'du' can report incorrect information for a while. > > > > If I remember well, in "the old days", when Soft Updates was fairly new, > it also had such issues in the first release, but it was repaired soon > afterwards. > > Of course it is a real advantage to see the results immediately when you > are cleaning out filesystems, especially when you are in a hurry. > > I just verified that the current issue was quite recently introduced with > commit r222958. Before that 'df' reported changes promptly and correctly, > even with Journaling enabled. > > As an answer to my original message to -current@ Peter Holm has assured me > (http://lists.freebsd.org/pipermail/freebsd-current/2011-June/025318.html) > that this is a known issue that is on Jeff's TODO list. > Ah, well my exact wording was "This is a known issue and I believe that it is on Jeff's to-do list". So I'm afraid I can only assure you that the problem is known :) - Peter From owner-freebsd-fs@FreeBSD.ORG Thu Jun 23 21:07:05 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (unknown [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 36A3B1065678 for ; Thu, 23 Jun 2011 21:07:05 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au [211.29.132.188]) by mx1.freebsd.org (Postfix) with ESMTP id AFEC38FC1C for ; Thu, 23 Jun 2011 21:07:04 +0000 (UTC) Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au (c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191]) by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p5NL6xEB011795 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 24 Jun 2011 07:07:01 +1000 Date: Fri, 24 Jun 2011 07:06:59 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Kostik Belousov In-Reply-To: <20110623081140.GQ48734@deviant.kiev.zoral.com.ua> Message-ID: <20110624054322.V1086@besplex.bde.org> 
References: <20101201091203.GA3933@tops> <20110104175558.GR3140@deviant.kiev.zoral.com.ua> <20110120124108.GA32866@tops.skynet.lt> <4E027897.8080700@FreeBSD.org> <20110623064333.GA2823@tops> <20110623081140.GQ48734@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@FreeBSD.org, Garance A Drosehn Subject: Re: [rfc] 64-bit inode numbers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jun 2011 21:07:05 -0000

On Thu, 23 Jun 2011, Kostik Belousov wrote:

> On Thu, Jun 23, 2011 at 09:43:33AM +0300, Gleb Kurtsou wrote:
>> On (22/06/2011 19:19), Garance A Drosehn wrote:
>>> On 1/20/11 7:41 AM, Gleb Kurtsou wrote:
>>>> I've updated the patch. New version is available here:
>>>> https://github.com/downloads/glk/freebsd-ino64/freebsd-ino64-patch-2011-01-20.tgz
>>>>
>>>> Changelog:
>>>> * Add fts, ftw, nftw compat shims in libc
>>>> * Place libc compat shims in separate files, don't hack original
>>>> implementations.
>>>> * Fix dump/restore
>>>> * Use ino_t in UFS code (suggested by Kirk McKusick)

Of course it must not use ino_t in the parts of ffs related to the
on-disk inode. Your patch does this, but I wonder if it converts from the
disk inode to ino_t too early in some places. C's type system is too weak
to find wrong conversions easily. On an old system, I once used funky types
like double or a pointer for at least mode_t to find all the places that
assumed mode_t to be an int. This helped find all the places that assumed
it to be an int of a particular size.

>>>> * Keep ufs_ino_t (32 bit) for boot2 not to increase size
>>>>
>>> Sorry for replying to an older message, but a reply made in a different
>>> thread reminded me about this project...
>>>
>>> Also, I may have asked this before.
In fact, I'm almost sure that I started
>>> a reply to this back in Jan/Feb, but my email client claims I never replied
>>> to this topic...
>>>
>>> Are you increasing only the size of ino_t, or could you also look at
>>> increasing the size of dev_t? (just curious...)
>>
>> Sure. Incorporating as much of similar changes as possible is good.

Increasing the size of dev_t would be negatively good. Even when the
minor number was meaningful and was abused to encode device control
sparsely, 4 billion devices is thousands of times as many as needed.
Without the sparse mapping, it is millions of times as many as needed.
Reducing it back to 16 bits like it was in FreeBSD-1 would be good, but
would break portability. Finding all the places that assume that it is
32 bits and changing them to uint32_t would be good.

ffs is already partly correct here (unlike for ino_t). Its di_rdev is
di_db[0], and di_db is either ufs1_daddr_t (int32_t) or ufs2_daddr_t
(int64_t). Thus the on-disk type is already independent of dev_t. But
this is only the start of being correct. ffs does blind assignments to
and from va_rdev to dev_t's, and suffers overflows if the types are
different. I hope the new ino_t code doesn't do blind assignments.

Since opening of device nodes on ffs file systems is no longer supported,
the device numbers in di_rdev are only used for compatibility:
- mknod() still works to create specified device numbers, provided they
  fit in a 32-bit dev_t (strictly, 32-bit ones don't fit since ufs1_daddr_t
  only has 31 value bits, but the overflow for blind assignment of the 32nd
  value bit is benign on all supported arches). So you can still back
  up your FreeBSD-4 /dev or maybe your Linux /dev on an ffs file system.
- mknod() is still abused by badsect(8) to encode bad sector numbers in
  di_rdev. This may even still work for ffs1.
It is broken for ffs2 by the type mismatch, and the blind assignments result in the error not being detected (ffs2 has 64-bit sector numbers, and its di_rdev can encode these, but mknod() can only pass 32-bit device numbers). FreeBSD-1 had the same problem with 16-bit device numbers not being able to encode 32-bit sector numbers. I hoped I fixed badsect(8) enough to detect all cases where the blind assignment will fail. >> I've added Kostik and Matthew to CC list, it's for them to decide. >> >> dev_t on other OSes: >> NetBSD - uint64_t >> DragonFly - uint32_t >> Darwin - __int32_t >> OpenSolaris - ulong_t >> Linux - __u32 >> >> Considering this I think 3rd party software is not ready for such >> change. Well, it should be ready, since the size depends on the O/S. Suppose a NetBSD system actually uses 64-bit device numbers. FreeBSD cannot support this now, so it should give an error for an attempt to back up a NetBSD /dev, but the blind assignments may break this. ulong_t on Solaris might give the same problem on 64-bit machines, but I guess ulong_t is actually an obfuscation of uint32_t. >> Major/minor mapping to dev_t will get more complicated. >> >> And the most important question: what would you want it for? As far as I > Indeed, this is the right question. > >> can see major/minor numbers are ignored nowadays, major is zero, minor >> increases independently of device type: > This is only because you have too little /dev nodes. How can he have >= 4G /dev nodes to test this? :-) Ah, I think I see: for devfs, the major number is normally 0, and minor numbers don't encode anything and are allocated sequentially and may differ across boots. But there are only 24 minor number bits according to major/minor, so the major must change from 0 to 1 on the 2**24 ~= 16 millionth device or earlier (I think actually on the 2**8th = 256th device, due to the encoding of major/minor being for compatibility with 16-bit dev_t). 
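
The major/minor rollover described above can be illustrated with a small Python sketch. It assumes the historical 32-bit layout in which the major number occupies bits 8-15 and the minor number uses the remaining 24 bits (bits 0-7 and 16-31); treat the exact masks as an assumption to be checked against sys/types.h on your own system, and the function names as purely illustrative.

```python
MAJOR_SHIFT = 8
MAJOR_MASK = 0xFF        # assumed: 8 major bits, stored in bits 8-15
MINOR_MASK = 0xFFFF00FF  # assumed: 24 minor bits, bits 0-7 and 16-31


def major(dev):
    """Major number of a 32-bit device number under the assumed layout."""
    return (dev >> MAJOR_SHIFT) & MAJOR_MASK


def minor(dev):
    """Minor number of a 32-bit device number under the assumed layout."""
    return dev & MINOR_MASK


def makedev(maj, mnr):
    """Combine major and minor; mnr must not use bits 8-15."""
    return ((maj << MAJOR_SHIFT) | mnr) & 0xFFFFFFFF


if __name__ == "__main__":
    # Sequentially allocated device numbers, as devfs hands them out.
    for dev in (103, 255, 256, 65536):
        print(dev, "->", major(dev), minor(dev))
```

Under these assumed masks, device number 255 still splits as major 0, minor 255, while 256 splits as major 1, minor 0, which is the change "on the 256th device" mentioned above. Nothing about the device itself changes at that point; only the displayed split does.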
> Look at the definitions of the major/minor in sys/types.h. These are only for compatibility. Even expanding dev_t would break this compatibility. The types of breakage are easier to see for reducing dev_t back to 16 bits. Then for devfs, the major number should change from 0 to 1 on the 256'th device, but nothing should break until the 65536th device; the major/minor split that is still displayed by ls(1) is meaningless. For non-devfs, things like backing up OtherOS's /dev or even your own /dev to an ffs file system will break on the 65536th device; anything depending on the encoding of minor numbers or the major/minor split will break on the 256th minor, but I can't see how anything in FreeBSD can reasonably depend (dynamically) on this encoding or split -- the device number is just an index for an actual device, and you can't do anything with it in a device node except copy the node. Bruce From owner-freebsd-fs@FreeBSD.ORG Thu Jun 23 22:06:04 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (unknown [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8A44E1065672 for ; Thu, 23 Jun 2011 22:06:04 +0000 (UTC) (envelope-from gad@FreeBSD.org) Received: from smtp8.server.rpi.edu (smtp8.server.rpi.edu [128.113.2.228]) by mx1.freebsd.org (Postfix) with ESMTP id 47A338FC08 for ; Thu, 23 Jun 2011 22:06:03 +0000 (UTC) Received: from gilead.netel.rpi.edu (gilead.netel.rpi.edu [128.113.124.121]) by smtp8.server.rpi.edu (8.13.1/8.13.1) with ESMTP id p5NM628f001167; Thu, 23 Jun 2011 18:06:02 -0400 Message-ID: <4E03B8C4.6040800@FreeBSD.org> Date: Thu, 23 Jun 2011 18:05:56 -0400 From: Garance A Drosehn User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.9) Gecko/20100722 Eudora/3.0.4 MIME-Version: 1.0 To: Kostik Belousov References: <20101201091203.GA3933@tops> <20110104175558.GR3140@deviant.kiev.zoral.com.ua> <20110120124108.GA32866@tops.skynet.lt> <4E027897.8080700@FreeBSD.org> 
<20110623064333.GA2823@tops> <20110623081140.GQ48734@deviant.kiev.zoral.com.ua> In-Reply-To: <20110623081140.GQ48734@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Bayes-Prob: 0.0001 (Score 0) X-RPI-SA-Score: 1.50 (*) [Hold at 12.00] COMBINED_FROM,RATWARE_GECKO_BUILD X-CanItPRO-Stream: outgoing X-Canit-Stats-ID: Bayes signature not available X-Scanned-By: CanIt (www . roaringpenguin . com) on 128.113.2.228 Cc: freebsd-fs@FreeBSD.org, Robert Watson Subject: Re: [rfc] 64-bit inode numbers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jun 2011 22:06:04 -0000 On 6/23/11 4:11 AM, Kostik Belousov wrote: > On Thu, Jun 23, 2011 at 09:43:33AM +0300, Gleb Kurtsou wrote: > >> On (22/06/2011 19:19), Garance A Drosehn wrote: >> >>> Sorry for replying to an older message, but a reply made in a different >>> thread reminded me about this project... >>> >>> Also, I may have asked this before. In fact, I'm almost sure that I started >>> a reply to this back in Jan/Feb, but my email client claims I never replied >>> to this topic... >>> >>> Are you increasing only the size of ino_t, or could you also look at >>> increasing the size of dev_t? (just curious...) >>> >> Sure. Incorporating as much of similar changes as possible is good. >> I've added Kostik and Matthew to CC list, it's for them to decide. >> >> dev_t on other OSes: >> NetBSD - uint64_t >> DragonFly - uint32_t >> Darwin - __int32_t >> OpenSolaris - ulong_t >> Linux - __u32 >> >> Considering this I think 3rd party software is not ready for such >> change. >> >> Major/minor mapping to dev_t will get more complicated. >> >> And the most important question: what would you want it for? [...] >> > Indeed, this is the right question. 
> Consider the thread "Increasing the size of dev_t and ino_t" from freebsd-arch in 2002: http://docs.freebsd.org/mail/archive/2002/freebsd-arch/20020317.freebsd-arch.html In particular, this message by Robert Watson: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=139853+0+archive/2002/freebsd-arch/20020317.freebsd-arch I just participated in an online conference for OpenAFS, and while it isn't exactly taking the world by storm, I keep thinking it would be useful if FreeBSD could map individual AFS volumes to unique dev_t identifiers. And given the way AFS is implemented (as a global filesystem with many cells all reachable at the same time), and given the way most sites deploy AFS (with thousands or tens-of-thousands of individual AFS volumes *per site*), that adds up to a lot of values for dev_t. The upcoming release of OpenAFS should include a working and pretty stable AFS client for FreeBSD, so having a larger dev_t would have a more immediate application than it did back in 2002. -- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu From owner-freebsd-fs@FreeBSD.ORG Thu Jun 23 22:26:35 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (unknown [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0EAE4106566C; Thu, 23 Jun 2011 22:26:35 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 9B77B8FC0A; Thu, 23 Jun 2011 22:26:34 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p5NMQVjq033221 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 24 Jun 2011 01:26:31 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by 
deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p5NMQU7o034185; Fri, 24 Jun 2011 01:26:30 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p5NMQUqM034184; Fri, 24 Jun 2011 01:26:30 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 24 Jun 2011 01:26:30 +0300 From: Kostik Belousov To: Garance A Drosehn Message-ID: <20110623222630.GU48734@deviant.kiev.zoral.com.ua> References: <20101201091203.GA3933@tops> <20110104175558.GR3140@deviant.kiev.zoral.com.ua> <20110120124108.GA32866@tops.skynet.lt> <4E027897.8080700@FreeBSD.org> <20110623064333.GA2823@tops> <20110623081140.GQ48734@deviant.kiev.zoral.com.ua> <4E03B8C4.6040800@FreeBSD.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Q5gW42QHvCgEka9W" Content-Disposition: inline In-Reply-To: <4E03B8C4.6040800@FreeBSD.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org, Robert Watson Subject: Re: [rfc] 64-bit inode numbers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jun 2011 22:26:35 -0000 --Q5gW42QHvCgEka9W Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jun 23, 2011 at 06:05:56PM -0400, Garance A Drosehn wrote: > On 6/23/11 4:11 AM, Kostik Belousov wrote: > >On Thu, Jun 23, 2011 at 09:43:33AM +0300, Gleb Kurtsou wrote: > > =20 > 
>>On (22/06/2011 19:19), Garance A Drosehn wrote: > >> > >>>Sorry for replying to an older message, but a reply made in a different > >>>thread reminded me about this project... > >>> > >>>Also, I may have asked this before. In fact, I'm almost sure that I > >>>started > >>>a reply to this back in Jan/Feb, but my email client claims I never > >>>replied > >>>to this topic... > >>> > >>>Are you increasing only the size of ino_t, or could you also look at > >>>increasing the size of dev_t? (just curious...) > >>> > >>Sure. Incorporating as much of similar changes as possible is good. > >>I've added Kostik and Matthew to CC list, it's for them to decide. > >> > >>dev_t on other OSes: > >> NetBSD - uint64_t > >> DragonFly - uint32_t > >> Darwin - __int32_t > >> OpenSolaris - ulong_t > >> Linux - __u32 > >> > >>Considering this I think 3rd party software is not ready for such > >>change. > >> > >>Major/minor mapping to dev_t will get more complicated. > >> > >>And the most important question: what would you want it for? [...] > >> > >Indeed, this is the right question. > > > Consider the thread "Increasing the size of dev_t and ino_t" from > freebsd-arch in 2002: > > http://docs.freebsd.org/mail/archive/2002/freebsd-arch/20020317.freebsd-arch.html > > In particular, this message by Robert Watson: > > http://docs.freebsd.org/cgi/getmsg.cgi?fetch=139853+0+archive/2002/freebsd-arch/20020317.freebsd-arch > > I just participated in an online conference for OpenAFS, and while it > isn't exactly taking the world by storm, I keep thinking it would be > useful if FreeBSD could map individual AFS volumes to unique dev_t > identifiers. And given the way AFS is implemented (as a global filesystem > with many cells all reachable at the same time), and given the way most > sites deploy AFS (with thousands or tens-of-thousands of individual AFS > volumes *per site*), that adds up to a lot of values for dev_t. 
> > The upcoming release of OpenAFS should include a working and pretty > stable AFS client for FreeBSD, so having a larger dev_t would have a > more immediate application than it did back in 2002. Am I right that the issue is the uniqueness of the dev_t for each AFS volume, as reported by stat(2)? Shouldn't the AFS client synthesize the dev_t for each new volume mounted? It seems that the current 32-bit dev_t would be enough, since I do not expect to see hundreds of thousands of mounts on a single system. Please note that we do not guarantee dev_t stability across reboots even for real devices. --Q5gW42QHvCgEka9W Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk4DvZYACgkQC3+MBN1Mb4jJKwCgntG+O/kNjo24Tkj/LMIqSa7K C3kAoLhNjLI2rKZH7o5kfdV8Utv1CDzM =P8nv -----END PGP SIGNATURE----- --Q5gW42QHvCgEka9W-- From owner-freebsd-fs@FreeBSD.ORG Fri Jun 24 00:23:51 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (unknown [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A9D56106566B; Fri, 24 Jun 2011 00:23:51 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 3BED88FC15; Fri, 24 Jun 2011 00:23:50 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap0EAFPYA06DaFvO/2dsb2JhbABShEmjXYhzsVyQcYErg3iBCgSRcpAq X-IronPort-AV: E=Sophos;i="4.65,416,1304308800"; d="scan'208";a="128858867" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 23 Jun 2011 20:23:49 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 0D7BCB3F3B; Thu, 23 Jun 2011 20:23:50 -0400 (EDT) Date: Thu, 23 Jun 2011 20:23:50 -0400 (EDT) From: Rick Macklem To: Kostik 
Belousov Message-ID: <1437987696.1010265.1308875030014.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110623222630.GU48734@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, Robert Watson , Garance A Drosehn Subject: Re: [rfc] 64-bit inode numbers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Jun 2011 00:23:51 -0000 Kostik Belousov wrote: > On Thu, Jun 23, 2011 at 06:05:56PM -0400, Garance A Drosehn wrote: > > On 6/23/11 4:11 AM, Kostik Belousov wrote: > > >On Thu, Jun 23, 2011 at 09:43:33AM +0300, Gleb Kurtsou wrote: > > > > > >>On (22/06/2011 19:19), Garance A Drosehn wrote: > > >> > > >>>Sorry for replying to an older message, but a reply made in a > > >>>different > > >>>thread reminded me about this project... > > >>> > > >>>Also, I may have asked this before. In fact, I'm almost sure that > > >>>I > > >>>started > > >>>a reply to this back in Jan/Feb, but my email client claims I > > >>>never > > >>>replied > > >>>to this topic... > > >>> > > >>>Are you increasing only the size of ino_t, or could you also look > > >>>at > > >>>increasing the size of dev_t? (just curious...) > > >>> > > >>Sure. Incorporating as much of similar changes as possible is > > >>good. > > >>I've added Kostik and Matthew to CC list, it's for them to decide. > > >> > > >>dev_t on other OSes: > > >> NetBSD - uint64_t > > >> DragonFly - uint32_t > > >> Darwin - __int32_t > > >> OpenSolaris - ulong_t > > >> Linux - __u32 > > >> > > >>Considering this I think 3rd party software is not ready for such > > >>change. > > >> > > >>Major/minor mapping to dev_t will get more complicated. 
> > >> > > >>And the most important question: what would you want it for? [...] > > >> > > >Indeed, this is the right question. > > > > > Consider the thread "Increasing the size of dev_t and ino_t" from > > freebsd-arch in 2002: > > > > http://docs.freebsd.org/mail/archive/2002/freebsd-arch/20020317.freebsd-arch.html > > > > In particular, this message by Robert Watson: > > > > http://docs.freebsd.org/cgi/getmsg.cgi?fetch=139853+0+archive/2002/freebsd-arch/20020317.freebsd-arch > > > > I just participated in an online conference for OpenAFS, and while > > it > > isn't exactly taking the world by storm, I keep thinking it would be > > useful if FreeBSD could map individual AFS volumes to unique dev_t > > identifiers. And given the way AFS is implemented (as a global > > filesystem > > with many cells all reachable at the same time), and given the way > > most > > sites deploy AFS (with thousands or tens-of-thousands of individual > > AFS > > volumes *per site*), that adds up to a lot of values for dev_t. > > > > The upcoming release of OpenAFS should include a working and pretty > > stable AFS client for FreeBSD, so having a larger dev_t would have a > > more immediate application than it did back in 2002. > Am I right that the issue is the uniqueness of the dev_t for each > AFS volume, as reported by stat(2) ? > > Shouldn't the AFS client synthesize the dev_t for each new volume > mounted ? It seems that the current 32bit dev_t would be enough, > since I do not expect to see hundreds of thousands of mounts > on an single system. > I think the main concern is making sure that the value is not the same as what another mount already has. That's why mnt_stat.f_fsid is synthesized for NFS, I think? > Please note that we do not guarantee dev_t stability across reboots > even > for real devices. 
From owner-freebsd-fs@FreeBSD.ORG Fri Jun 24 00:33:41 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (unknown [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 63CC310656DC; Fri, 24 Jun 2011 00:33:41 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [69.147.83.40]) by mx1.freebsd.org (Postfix) with ESMTP id 3AA478FC15; Fri, 24 Jun 2011 00:33:41 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p5O0Xf6b072091; Fri, 24 Jun 2011 00:33:41 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p5O0XfB9072087; Fri, 24 Jun 2011 00:33:41 GMT (envelope-from linimon) Date: Fri, 24 Jun 2011 00:33:41 GMT Message-Id: <201106240033.p5O0XfB9072087@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/158231: [nullfs] panic on unmounting nullfs mounted over ufs on usb stick that got detached X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Jun 2011 00:33:41 -0000 Old Synopsis: panic on unmounting nullfs mounted over ufs on usb stick that got detached New Synopsis: [nullfs] panic on unmounting nullfs mounted over ufs on usb stick that got detached Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Fri Jun 24 00:33:29 UTC 2011 Responsible-Changed-Why: Over to maintainer(s). 
http://www.freebsd.org/cgi/query-pr.cgi?pr=158231 From owner-freebsd-fs@FreeBSD.ORG Fri Jun 24 01:55:04 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (unknown [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 134541065676; Fri, 24 Jun 2011 01:55:04 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by mx1.freebsd.org (Postfix) with ESMTP id 94D3C8FC0A; Fri, 24 Jun 2011 01:55:03 +0000 (UTC) Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au (c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p5O1sxKO013487 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 24 Jun 2011 11:55:00 +1000 Date: Fri, 24 Jun 2011 11:54:59 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem In-Reply-To: <1437987696.1010265.1308875030014.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20110624113436.H2118@besplex.bde.org> References: <1437987696.1010265.1308875030014.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, Robert Watson , Garance A Drosehn Subject: Re: [rfc] 64-bit inode numbers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Jun 2011 01:55:04 -0000 On Thu, 23 Jun 2011, Rick Macklem wrote: > Kostik Belousov wrote: >> On Thu, Jun 23, 2011 at 06:05:56PM -0400, Garance A Drosehn wrote: >>> On 6/23/11 4:11 AM, Kostik Belousov wrote: >>>> On Thu, Jun 23, 2011 at 09:43:33AM +0300, Gleb Kurtsou wrote: >>>> >>>>> On (22/06/2011 19:19), Garance A Drosehn wrote: >>>>> >>>>>> Sorry for replying to an older message, but a reply made in a >>>>>> different >>>>>> thread reminded me 
about this project... >>>>>> >>>>>> Also, I may have asked this before. In fact, I'm almost sure that >>>>>> I >>>>>> started >>>>>> a reply to this back in Jan/Feb, but my email client claims I >>>>>> never >>>>>> replied >>>>>> to this topic... >>>>>> >>>>>> Are you increasing only the size of ino_t, or could you also look >>>>>> at >>>>>> increasing the size of dev_t? (just curious...) >>>>>> >>>>> Sure. Incorporating as much of similar changes as possible is >>>>> good. >>>>> I've added Kostik and Matthew to CC list, it's for them to decide. >>>>> >>>>> dev_t on other OSes: >>>>> NetBSD - uint64_t >>>>> DragonFly - uint32_t >>>>> Darwin - __int32_t >>>>> OpenSolaris - ulong_t >>>>> Linux - __u32 >>>>> >>>>> Considering this I think 3rd party software is not ready for such >>>>> change. >>>>> >>>>> Major/minor mapping to dev_t will get more complicated. >>>>> >>>>> And the most important question: what would you want it for? [...] >>>>> >>>> Indeed, this is the right question. >>>> >>> Consider the thread "Increasing the size of dev_t and ino_t" from >>> freebsd-arch in 2002: >>> >>> http://docs.freebsd.org/mail/archive/2002/freebsd-arch/20020317.freebsd-arch.html >>> >>> In particular, this message by Robert Watson: >>> >>> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=139853+0+archive/2002/freebsd-arch/20020317.freebsd-arch >>> >>> I just participated in an online conference for OpenAFS, and while >>> it >>> isn't exactly taking the world by storm, I keep thinking it would be >>> useful if FreeBSD could map individual AFS volumes to unique dev_t >>> identifiers. And given the way AFS is implemented (as a global >>> filesystem >>> with many cells all reachable at the same time), and given the way >>> most >>> sites deploy AFS (with thousands or tens-of-thousands of individual >>> AFS >>> volumes *per site*), that adds up to a lot of values for dev_t. 
>>> >>> The upcoming release of OpenAFS should include a working and pretty >>> stable AFS client for FreeBSD, so having a larger dev_t would have a >>> more immediate application than it did back in 2002. >> Am I right that the issue is the uniqueness of the dev_t for each >> AFS volume, as reported by stat(2) ? >> >> Shouldn't the AFS client synthesize the dev_t for each new volume >> mounted ? It seems that the current 32bit dev_t would be enough, >> since I do not expect to see hundreds of thousands of mounts >> on an single system. >> > I think the main concern is making sure that the value is not the > same as what another mount already has. That's why mnt_stat.f_fsid > is synthesized for NFS, I think? > >> Please note that we do not guarantee dev_t stability across reboots >> even >> for real devices. mnt_stat.f_fsid is generated from the dev_t, and tries to give stability across reboots. Otherwise, IIRC, nfs mounts break if the server is rebooted. Both the dev_t part and other things in f_fsid depend on the order of initialization, but the ids usually end up the same if you don't reconfigure much on the server. f_fsid also has a problem with uniqueness, but that is mainly because it wants to be unique when truncated to a 16-bit dev_t. dev_t is only 16 bits in some versions of Linux, including in the FreeBSD i386 Linux emulator (I can see traces of 32-bit dev_t in Linux-2.6.10 but not in the FreeBSD emulator). I hope AFS ids could be implemented like fsids and not need to literally match foreign ids, but if they are synthesized then they might be harder than fsids to keep invariant across reboots. 
Bruce From owner-freebsd-fs@FreeBSD.ORG Fri Jun 24 07:17:08 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (unknown [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D3F98106564A for ; Fri, 24 Jun 2011 07:17:08 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 857098FC0A for ; Fri, 24 Jun 2011 07:17:08 +0000 (UTC) Received: by people.fsn.hu (Postfix, from userid 1001) id EFB048D5F5E; Fri, 24 Jun 2011 08:59:13 +0200 (CEST) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.2 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR: 11.3450] X-CRM114-CacheID: sfid-20110624_08591_9ECFE1D9 X-CRM114-Status: Good ( pR: 11.3450 ) X-DSPAM-Result: Whitelisted X-DSPAM-Processed: Fri Jun 24 08:59:13 2011 X-DSPAM-Confidence: 0.7011 X-DSPAM-Probability: 0.0000 X-DSPAM-Signature: 4e0435c1770608156169377 X-DSPAM-Factors: 27, From*Attila Nagy , 0.00010, FreeBSD, 0.00056, FreeBSD, 0.00056, wrote+>, 0.00200, is+>, 0.00299, >+>, 0.00324, References*mail.gmail.com>, 0.00356, root, 0.00478, wrote, 0.00486, In-Reply-To*mail.gmail.com>, 0.00555, FreeBSD+8, 0.00596, STABLE, 0.00596, STABLE, 0.00596, i386, 0.00793, Tue+Jun, 0.00850, >+not, 0.00990, >+And, 0.00990, CEST, 0.00990, ratio, 0.99000, through+my, 0.99000, >+I've, 0.01000, my+>, 0.01000, Received*24+Jun, 0.99000, doesn't+look, 0.01000, so+it, 0.01000, 21+59, 0.01000, X-Spambayes-Classification: ham; 0.00 Received: from japan.t-online.private (japan.t-online.co.hu [195.228.243.99]) by people.fsn.hu (Postfix) with ESMTPSA id 89CAA8D5F52; Fri, 24 Jun 2011 08:59:13 +0200 (CEST) Message-ID: <4E0435B6.30004@fsn.hu> Date: Fri, 24 Jun 2011 08:59:02 +0200 From: Attila Nagy User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.23) Gecko/20090817 Thunderbird/2.0.0.23 Mnenhy/0.7.6.0 MIME-Version: 1.0 To: Wiktor Niesiobedzki References: In-Reply-To: 
Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: ZFS L2ARC hit ratio X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Jun 2011 07:17:08 -0000 On 06/21/11 21:59, Wiktor Niesiobedzki wrote: > I've recently migrated my 8.2 box to recent stable: > FreeBSD kadlubek.vink.pl 8.2-STABLE FreeBSD 8.2-STABLE #22: Tue Jun 7 > 03:43:29 CEST 2011 root@kadlubek:/usr/obj/usr/src/sys/KADLUB i386 > > And upgraded my ZFS/ZPOOL to newest versions. Though through my > monitoring I've noticed some declination in L2ARC hit ratio (server is > not busy, so it doesn't look that suspicious). I've made some tests > today and I guess, that there might be some problem: Likely because vfs.zfs.l2arc_noprefetch is now 1. From owner-freebsd-fs@FreeBSD.ORG Fri Jun 24 08:38:46 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (unknown [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 88AF91065670; Fri, 24 Jun 2011 08:38:46 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 254CD8FC17; Fri, 24 Jun 2011 08:38:45 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p5O8cfrI004028 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 24 Jun 2011 11:38:41 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p5O8cfN2055992; Fri, 24 Jun 2011 11:38:41 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua 
(8.14.4/8.14.4/Submit) id p5O8cf8e055991; Fri, 24 Jun 2011 11:38:41 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 24 Jun 2011 11:38:41 +0300 From: Kostik Belousov To: Rick Macklem Message-ID: <20110624083841.GV48734@deviant.kiev.zoral.com.ua> References: <20110623222630.GU48734@deviant.kiev.zoral.com.ua> <1437987696.1010265.1308875030014.JavaMail.root@erie.cs.uoguelph.ca> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Xm1wk/WBwRlpvc4I" Content-Disposition: inline In-Reply-To: <1437987696.1010265.1308875030014.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org, Robert Watson , Garance A Drosehn Subject: Re: [rfc] 64-bit inode numbers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Jun 2011 08:38:46 -0000 --Xm1wk/WBwRlpvc4I Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jun 23, 2011 at 08:23:50PM -0400, Rick Macklem wrote: > Kostik Belousov wrote: > > On Thu, Jun 23, 2011 at 06:05:56PM -0400, Garance A Drosehn wrote: > > > On 6/23/11 4:11 AM, Kostik Belousov wrote: > > > >On Thu, Jun 23, 2011 at 09:43:33AM +0300, Gleb Kurtsou wrote: > > > > > > > >>On (22/06/2011 19:19), Garance A Drosehn wrote: > > > >> > > > >>>Sorry for replying to an older message, but a reply made in a > > > >>>different > > > >>>thread reminded me about this 
project... > > > >>> > > > >>>Also, I may have asked this before. In fact, I'm almost sure that > > > >>>I > > > >>>started > > > >>>a reply to this back in Jan/Feb, but my email client claims I > > > >>>never > > > >>>replied > > > >>>to this topic... > > > >>> > > > >>>Are you increasing only the size of ino_t, or could you also look > > > >>>at > > > >>>increasing the size of dev_t? (just curious...) > > > >>> > > > >>Sure. Incorporating as much of similar changes as possible is > > > >>good. > > > >>I've added Kostik and Matthew to CC list, it's for them to decide. > > > >> > > > >>dev_t on other OSes: > > > >> NetBSD - uint64_t > > > >> DragonFly - uint32_t > > > >> Darwin - __int32_t > > > >> OpenSolaris - ulong_t > > > >> Linux - __u32 > > > >> > > > >>Considering this I think 3rd party software is not ready for such > > > >>change. > > > >> > > > >>Major/minor mapping to dev_t will get more complicated. > > > >> > > > >>And the most important question: what would you want it for? [...] > > > >> > > > >Indeed, this is the right question. > > > > > > > Consider the thread "Increasing the size of dev_t and ino_t" from > > > freebsd-arch in 2002: > > > > > > http://docs.freebsd.org/mail/archive/2002/freebsd-arch/20020317.freebsd-arch.html > > > > > > In particular, this message by Robert Watson: > > > > > > http://docs.freebsd.org/cgi/getmsg.cgi?fetch=139853+0+archive/2002/freebsd-arch/20020317.freebsd-arch > > > > > > I just participated in an online conference for OpenAFS, and while > > > it > > > isn't exactly taking the world by storm, I keep thinking it would be > > > useful if FreeBSD could map individual AFS volumes to unique dev_t > > > identifiers. 
And given the way AFS is implemented (as a global > > > filesystem > > > with many cells all reachable at the same time), and given the way > > > most > > > sites deploy AFS (with thousands or tens-of-thousands of individual > > > AFS > > > volumes *per site*), that adds up to a lot of values for dev_t. > > > > > > The upcoming release of OpenAFS should include a working and pretty > > > stable AFS client for FreeBSD, so having a larger dev_t would have a > > > more immediate application than it did back in 2002. > > Am I right that the issue is the uniqueness of the dev_t for each > > AFS volume, as reported by stat(2) ? > > > > Shouldn't the AFS client synthesize the dev_t for each new volume > > mounted ? It seems that the current 32bit dev_t would be enough, > > since I do not expect to see hundreds of thousands of mounts > > on an single system. > > > I think the main concern is making sure that the value is not the > same as what another mount already has. That's why mnt_stat.f_fsid > is synthesized for NFS, I think? We have a quite useful unit number allocator that guarantees uniqueness; see sys/systm.h and look for unrhdr. In fact, devfs uses it to maintain dev_t values; see the cdp_inode member of struct cdev_priv, whose value typically ends up in st_dev. If needed, the devfs inode allocator can be exported. > > > Please note that we do not guarantee dev_t stability across reboots > > even > > for real devices. 
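The unit number allocator idea mentioned above can be sketched in userland as follows. This is a toy model under stated assumptions, not the kernel's unrhdr code: the class name UnitAllocator and its methods are hypothetical, and the only properties it tries to show are that no two live users ever hold the same number and that released numbers become available again.

```python
# Toy userland model of a unit number allocator. It is NOT the kernel
# implementation behind unrhdr; names and behaviour here are assumptions
# chosen only to illustrate "hand out unique small integers, reuse freed ones".
import heapq


class UnitAllocator:
    def __init__(self, low: int, high: int):
        self.high = high
        self._next = low        # next never-used unit number
        self._freed = []        # min-heap of released unit numbers
        self._in_use = set()    # numbers currently handed out

    def alloc(self):
        """Return the lowest available unit number, or None if exhausted."""
        if self._freed:
            unit = heapq.heappop(self._freed)
        elif self._next <= self.high:
            unit = self._next
            self._next += 1
        else:
            return None
        self._in_use.add(unit)
        return unit

    def free(self, unit: int) -> None:
        """Release a unit number so a later alloc() may reuse it."""
        if unit not in self._in_use:
            raise ValueError("unit %d is not allocated" % unit)
        self._in_use.remove(unit)
        heapq.heappush(self._freed, unit)


if __name__ == "__main__":
    # One number per hypothetical AFS volume seen by this client.
    volumes = UnitAllocator(0, 2**32 - 1)
    ids = [volumes.alloc() for _ in range(5)]
    volumes.free(ids[2])
    print(ids, volumes.alloc())
```

Numbers handed out this way are unique for the lifetime of the allocator but depend on allocation order, so they are not expected to be stable across reboots.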
--Xm1wk/WBwRlpvc4I Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk4ETRAACgkQC3+MBN1Mb4gS7ACg6DqqPm+BjxLGEZolpPPtXuSZ XswAnRrAHE9tlv8hUt/eHWRCLDMGQrfM =S+AK -----END PGP SIGNATURE----- --Xm1wk/WBwRlpvc4I-- From owner-freebsd-fs@FreeBSD.ORG Fri Jun 24 21:07:20 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (unknown [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 133EF106564A; Fri, 24 Jun 2011 21:07:20 +0000 (UTC) (envelope-from gad@FreeBSD.org) Received: from smtp5.server.rpi.edu (smtp5.server.rpi.edu [128.113.2.225]) by mx1.freebsd.org (Postfix) with ESMTP id C32598FC0A; Fri, 24 Jun 2011 21:07:19 +0000 (UTC) Received: from gilead.netel.rpi.edu (gilead.netel.rpi.edu [128.113.124.121]) by smtp5.server.rpi.edu (8.13.1/8.13.1) with ESMTP id p5OL7HiX001321; Fri, 24 Jun 2011 17:07:18 -0400 Message-ID: <4E04FC7F.6090801@FreeBSD.org> Date: Fri, 24 Jun 2011 17:07:11 -0400 From: Garance A Drosehn User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.9) Gecko/20100722 Eudora/3.0.4 MIME-Version: 1.0 To: Kostik Belousov References: <20101201091203.GA3933@tops> <20110104175558.GR3140@deviant.kiev.zoral.com.ua> <20110120124108.GA32866@tops.skynet.lt> <4E027897.8080700@FreeBSD.org> <20110623064333.GA2823@tops> <20110623081140.GQ48734@deviant.kiev.zoral.com.ua> <4E03B8C4.6040800@FreeBSD.org> <20110623222630.GU48734@deviant.kiev.zoral.com.ua> In-Reply-To: <20110623222630.GU48734@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Bayes-Prob: 0.0001 (Score 0) X-RPI-SA-Score: 1.50 (*) [Hold at 12.00] COMBINED_FROM,RATWARE_GECKO_BUILD X-CanItPRO-Stream: outgoing X-Canit-Stats-ID: Bayes signature not available X-Scanned-By: CanIt (www . roaringpenguin . 
com) on 128.113.2.225 Cc: freebsd-fs@FreeBSD.org, Robert Watson Subject: Re: [rfc] 64-bit inode numbers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Jun 2011 21:07:20 -0000 On 6/23/11 6:26 PM, Kostik Belousov wrote: > On Thu, Jun 23, 2011 at 06:05:56PM -0400, Garance A Drosehn wrote: > >> Consider the thread "Increasing the size of dev_t and ino_t" from >> freebsd-arch in 2002: >> >> http://docs.freebsd.org/mail/archive/2002/freebsd-arch/20020317.freebsd-arch.html >> >> In particular, this message by Robert Watson: >> >> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=139853+0+archive/2002/freebsd-arch/20020317.freebsd-arch >> >> I just participated in an online conference for OpenAFS, and while it >> isn't exactly taking the world by storm, I keep thinking it would be >> useful if FreeBSD could map individual AFS volumes to unique dev_t >> identifiers. And given the way AFS is implemented (as a global FS >> with many cells all reachable at the same time), and given the way most >> sites deploy AFS (with thousands or tens-of-thousands of individual >> AFS volumes *per site*), that adds up to a lot of values for dev_t. >> >> The upcoming release of OpenAFS should include a working and pretty >> stable AFS client for FreeBSD, so having a larger dev_t would have >> a more immediate application than it did back in 2002. >> > Am I right that the issue is the uniqueness of the dev_t for each > AFS volume, as reported by stat(2) ? > > Shouldn't the AFS client synthesize the dev_t for each new volume > mounted ? It seems that the current 32bit dev_t would be enough, > since I do not expect to see hundreds of thousands of mounts > on an single system. > > Please note that we do not guarantee dev_t stability across reboots > even for real devices. 
>

The AFS cell at RPI has approximately 40,000 AFS volumes, and each volume should have its own dev_t (IMO). That's just counting the collection of AFS volumes which are on RPI file servers, and any user sitting on one computer could access AFS volumes which are made available by other sites (aka "AFS cells"). Most RPI users would only have access to maybe 1/4 of those volumes which exist at RPI, but we do know that individual users have run 'find' over the entire RPI cell looking for whatever they're looking for. I once did a run of 'md5deep' on the entire RPI cell, thanks to a symlink which I didn't realize was in my home directory!

So one person can easily trigger the access of 10,000 AFS volumes on one computer using one command. That might sound terrifying if you imagine it as being 10,000 NFS mounts, but accessing AFS volumes isn't the same amount of work as auto-mounting NFS filesystems. So ignore whatever problems you might expect to see with 10,000 filesystems mounted on one computer. Just realize that it is very easy for a single user to access tens of thousands of AFS volumes from one computer, and it would be "most correct" (programming wise) if all of those AFS volumes were to get a unique value for dev_t. And of course it's even easier for a remote-access system to access tens-of-thousands of AFS volumes, since it would have a few dozen users logged in at the same time.

Obviously most computers never access even 30,000 AFS cells before they (as the AFS client) will reboot, but I'm wondering how much overhead there is in trying to make sure that many different volumes are mapped to unique dev_t numbers.

Please realize that I do not mind if people felt that there was no need to increase the size of dev_t at this time, and that we should wait until we see more of a demand for increasing it. But given the project to increase the size of inode numbers, I thought this was a good time to also ask about dev_t.
I ask about it every few years :-) -- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu From owner-freebsd-fs@FreeBSD.ORG Fri Jun 24 22:09:05 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (unknown [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9FC161065672; Fri, 24 Jun 2011 22:09:05 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 399B78FC19; Fri, 24 Jun 2011 22:09:04 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap0EAFYKBU6DaFvO/2dsb2JhbABShEmjb7hXkHKBK4N4gQoEj32BfpAu X-IronPort-AV: E=Sophos;i="4.65,421,1304308800"; d="scan'208";a="125126348" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 24 Jun 2011 18:09:04 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 34BBEB3E96; Fri, 24 Jun 2011 18:09:04 -0400 (EDT) Date: Fri, 24 Jun 2011 18:09:04 -0400 (EDT) From: Rick Macklem To: Garance A Drosehn Message-ID: <1656190156.1051008.1308953344203.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <4E04FC7F.6090801@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - SAF3 (Mac)/6.0.10_GA_2692) Cc: freebsd-fs@FreeBSD.org, Robert Watson Subject: Re: [rfc] 64-bit inode numbers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Jun 2011 22:09:05 -0000 Garance A Drosehn wrote: > On 6/23/11 6:26 PM, Kostik Belousov wrote: > > On Thu, 
Jun 23, 2011 at 06:05:56PM -0400, Garance A Drosehn wrote: > > > >> Consider the thread "Increasing the size of dev_t and ino_t" from > >> freebsd-arch in 2002: > >> > >> http://docs.freebsd.org/mail/archive/2002/freebsd-arch/20020317.freebsd-arch.html > >> > >> In particular, this message by Robert Watson: > >> > >> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=139853+0+archive/2002/freebsd-arch/20020317.freebsd-arch > >> > >> I just participated in an online conference for OpenAFS, and while > >> it > >> isn't exactly taking the world by storm, I keep thinking it would > >> be > >> useful if FreeBSD could map individual AFS volumes to unique dev_t > >> identifiers. And given the way AFS is implemented (as a global FS > >> with many cells all reachable at the same time), and given the way > >> most > >> sites deploy AFS (with thousands or tens-of-thousands of individual > >> AFS volumes *per site*), that adds up to a lot of values for dev_t. > >> > >> The upcoming release of OpenAFS should include a working and pretty > >> stable AFS client for FreeBSD, so having a larger dev_t would have > >> a more immediate application than it did back in 2002. > >> > > Am I right that the issue is the uniqueness of the dev_t for each > > AFS volume, as reported by stat(2) ? > > > > Shouldn't the AFS client synthesize the dev_t for each new volume > > mounted ? It seems that the current 32bit dev_t would be enough, > > since I do not expect to see hundreds of thousands of mounts > > on an single system. > > > > Please note that we do not guarantee dev_t stability across reboots > > even for real devices. > > > The AFS cell at RPI has approximately 40,000 AFS volumes, and each > volume should have it's own dev_t (IMO). That's just counting the > collection of AFS volumes which are on RPI file servers, and any > user sitting on one computer could access AFS volumes which are > made available by other sites (aka "AFS cells"). 
Most RPI users
> would only have access to maybe 1/4 of those volumes which exist
> at RPI, but we do know that individual users have run 'find' over
> the entire RPI cell looking for whatever they're looking for. I
> once did a run of 'md5deep' on the entire RPI cell, thanks to a
> symlink which I didn't realize was in my home directory!
>

Note that it is the value in mnt_stat.f_fsid that needs to be unique w.r.t. other mount points in the machine. If AFS appears to be one mount point in the FreeBSD client, then the only issue I know of is how the client is expected to handle changes in dev_t within the mount point. fts(3) and friends will assume that it is a mount point crossing when st_dev changes. It will then expect the funny rule that the d_ino in dirent will not be the same as st_ino.

What I do for NFSv4 is synthesize the mnt_stat.f_fsid value and return that as st_dev for the mounted volume until I see the fsid returned by the server change. Below that point, I return the fsid from the server as st_dev so long as it isn't the same as the synthesized one. That way, fts(3) and friends figure out the mount point crossings within the server.

"ls -lR" will usually find problems if this is broken.

> So one person can easily trigger the access of 10,000 AFS volumes
> on one computer using one command. That might sound terrifying if
> you imagine it as being 10,000 NFS mounts, but accessing AFS volumes
> isn't the same amount of work as auto-mounting NFS filesystems.
> So ignore whatever problems you might expect to see with 10,000
> filesystems mounted on one computer. Just realize that it is very
> easy for a single user to access tens of thousands of AFS volumes
> from one computer, and it would be "most correct" (programming wise)
> if all of those AFS volumes were to get a unique value for dev_t.
> And of course it's even easier for a remote-access system to access > tens-of-thousands of AFS volumes, since it would have a few dozen > users logged in at the same time. > > Obviously most computers never access even 30,000 AFS cells before > they (as the AFS client) will reboot, but I'm wondering how much > overhead is there in trying to make sure that many different volumes > are mapped to unique dev_t numbers. > > Please realize that I do not mind if people felt that there was no > need to increase the size of dev_t at this time, and that we should > wait until we see more of a demand for increasing it. But given the > project to increase the size of inode numbers, I thought this was a > good time to also ask about dev_t. I ask about it every few years :-) > > -- > Garance Alistair Drosehn = gad@gilead.netel.rpi.edu > Senior Systems Programmer or gad@freebsd.org > Rensselaer Polytechnic Institute or drosih@rpi.edu > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Fri Jun 24 22:51:10 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (unknown [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2421D106566C; Fri, 24 Jun 2011 22:51:10 +0000 (UTC) (envelope-from mdf356@gmail.com) Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id ABB3A8FC0A; Fri, 24 Jun 2011 22:51:09 +0000 (UTC) Received: by qwc9 with SMTP id 9so2171620qwc.13 for ; Fri, 24 Jun 2011 15:51:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; 
bh=xoojBqhilGK5L1Hus75v48OkrwKFZiWn2bhPt1cNq2g=; b=gt68RHQgih78fhtkSuZ6L5OZbUMHUWqfVrP/fRye2qvIPhRIppnwimBAlW/WXSfd4G R9eMrvSmbMpB40jZ6bY6xnds/qYEukapD4TMobE/2aIQoaLqJMd6i5Az85M4a2iHL+1/ DF/2GWKsmEvETgi1Zqz+VQVtA/2anHWcFh1FE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=Qb1rbqKMqyRPFEApvY4M5tSygrunsmhWcgZaI5JvdKOnTPoDg9Am4q389HdgOlIBEn Sn00K+GIx4TSE1Ru+q3kiOzVJx1eenrztSPYKx6+vrYxfq1md8qZuAaXPJP8U+hcB0vW xFOvbMpVmeKMXQaIkn9xJsAfvIB5eIp+LpOL0= MIME-Version: 1.0 Received: by 10.229.215.6 with SMTP id hc6mr2932721qcb.93.1308954060099; Fri, 24 Jun 2011 15:21:00 -0700 (PDT) Sender: mdf356@gmail.com Received: by 10.229.241.197 with HTTP; Fri, 24 Jun 2011 15:21:00 -0700 (PDT) In-Reply-To: <4E04FC7F.6090801@FreeBSD.org> References: <20101201091203.GA3933@tops> <20110104175558.GR3140@deviant.kiev.zoral.com.ua> <20110120124108.GA32866@tops.skynet.lt> <4E027897.8080700@FreeBSD.org> <20110623064333.GA2823@tops> <20110623081140.GQ48734@deviant.kiev.zoral.com.ua> <4E03B8C4.6040800@FreeBSD.org> <20110623222630.GU48734@deviant.kiev.zoral.com.ua> <4E04FC7F.6090801@FreeBSD.org> Date: Fri, 24 Jun 2011 15:21:00 -0700 X-Google-Sender-Auth: rrG7Gd83jVcnAAvU_sOFjMNg4Pk Message-ID: From: mdf@FreeBSD.org To: Garance A Drosehn Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, Robert Watson Subject: Re: [rfc] 64-bit inode numbers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Jun 2011 22:51:10 -0000 On Fri, Jun 24, 2011 at 2:07 PM, Garance A Drosehn wrote: > The AFS cell at RPI has approximately 40,000 AFS volumes, and each > volume should have it's own dev_t (IMO). 
>
> Please realize that I do not mind if people felt that there was no
> need to increase the size of dev_t at this time, and that we should
> wait until we see more of a demand for increasing it. But given the
> project to increase the size of inode numbers, I thought this was a
> good time to also ask about dev_t. I ask about it every few years :-)

I don't see why 32 bits are anywhere close to becoming tight to represent 40k unique values. Is there something wrong with how each new dev_t is computed, that runs out of space quicker than this implies?

Thanks,
matthew

From owner-freebsd-fs@FreeBSD.ORG Sat Jun 25 03:53:42 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DB25F106566C; Sat, 25 Jun 2011 03:53:42 +0000 (UTC) (envelope-from kaduk@mit.edu) Received: from dmz-mailsec-scanner-2.mit.edu (DMZ-MAILSEC-SCANNER-2.MIT.EDU [18.9.25.13]) by mx1.freebsd.org (Postfix) with ESMTP id 7604C8FC08; Sat, 25 Jun 2011 03:53:42 +0000 (UTC) X-AuditID: 1209190d-b7bdeae0000004f8-4d-4e0557f3c985 Received: from mailhub-auth-1.mit.edu ( [18.9.21.35]) by dmz-mailsec-scanner-2.mit.edu (Symantec Messaging Gateway) with SMTP id 87.29.01272.3F7550E4; Fri, 24 Jun 2011 23:37:23 -0400 (EDT) Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) by mailhub-auth-1.mit.edu (8.13.8/8.9.2) with ESMTP id p5P3cdZI019910; Fri, 24 Jun 2011 23:38:39 -0400 Received: from multics.mit.edu (MULTICS.MIT.EDU [18.187.1.73]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id p5P3caOD020970 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Fri, 24 Jun 2011 23:38:37 -0400 (EDT) Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308) id p5P3cZOI009925; Fri, 24 Jun 2011 23:38:35 -0400 (EDT) Date: Fri, 24 Jun 2011 23:38:35 -0400 (EDT) From: Benjamin Kaduk To:
freebsd-fs@freebsd.org In-Reply-To: <1656190156.1051008.1308953344203.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: References: <1656190156.1051008.1308953344203.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Alpine 1.10 (GSO 962 2008-03-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII Cc: Garance A Drosehn , Robert Watson , shadow@gmail.com Subject: Re: [rfc] 64-bit inode numbers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Jun 2011 03:53:42 -0000

Hmm, several messages regarding AFS that I will try to address at once.
On Fri, 24 Jun 2011, Kostik Belousov wrote:

> On Thu, Jun 23, 2011 at 06:05:56PM -0400, Garance A Drosehn wrote:
>> Consider the thread "Increasing the size of dev_t and ino_t" from
>> freebsd-arch in 2002:
>>
>> http://docs.freebsd.org/mail/archive/2002/freebsd-arch/20020317.freebsd-arch.html
>>
>> In particular, this message by Robert Watson:
>>
>> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=139853+0+archive/2002/freebsd-arch/20020317.freebsd-arch
>>
>> I just participated in an online conference for OpenAFS, and while it
>> isn't exactly taking the world by storm, I keep thinking it would be
>> useful if FreeBSD could map individual AFS volumes to unique dev_t
>> identifiers. And given the way AFS is implemented (as a global filesystem
>> with many cells all reachable at the same time), and given the way most
>> sites deploy AFS (with thousands or tens-of-thousands of individual AFS
>> volumes *per site*), that adds up to a lot of values for dev_t.
>>
>> The upcoming release of OpenAFS should include a working and pretty
>> stable AFS client for FreeBSD, so having a larger dev_t would have a
>> more immediate application than it did back in 2002.
> Am I right that the issue is the uniqueness of the dev_t for each
> AFS volume, as reported by stat(2) ?
>
> Shouldn't the AFS client synthesize the dev_t for each new volume
> mounted ? It seems that the current 32bit dev_t would be enough,
> since I do not expect to see hundreds of thousands of mounts
> on an single system.

The current OpenAFS implementation only presents a single mountpoint, /afs, and does not really distinguish between different mounted volumes. This is not ideal, and we would like to be able to make each volume appear as a separate device if there's a good way to do so. The technical challenge of doing this while still only having a single mount method for AFS is not something that I've looked at, there being more pressing issues on my plate.
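For context, the st_dev comparison that userland tree walkers rely on can be sketched roughly as below. This is only a minimal illustration in Python, not OpenAFS or fts(3) code; the function name device_boundaries is made up for this example. With a single /afs mountpoint presenting one device, a walk like this would presumably report no boundaries between volumes.

```python
import os


def device_boundaries(root):
    """Yield directories under root whose st_dev differs from their parent's.

    Hypothetical helper: it mimics the "st_dev changed, so this is a
    mount point crossing" check, without following symlinks.
    """
    root_dev = os.lstat(root).st_dev
    stack = [(root, root_dev)]
    while stack:
        parent, parent_dev = stack.pop()
        try:
            entries = list(os.scandir(parent))
        except OSError:
            continue  # unreadable directory, skip it
        for entry in entries:
            try:
                if not entry.is_dir(follow_symlinks=False):
                    continue
                dev = entry.stat(follow_symlinks=False).st_dev
            except OSError:
                continue  # entry vanished or is not accessible
            if dev != parent_dev:
                yield entry.path  # device number changed at this directory
            stack.append((entry.path, dev))
```

Calling list(device_boundaries("/afs")) on a client would show whether any directory below the root reports a different device than its parent.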
>
> Please note that we do not guarantee dev_t stability across reboots even
> for real devices.

Hmm, this is somewhat annoying, as the AFS global namespace does provide a stable unique identifier for files/directories using a unique cell ID, volume ID, per-file ID, and uniquifier. Being able to directly use the cell/volume information for a dev_t would be quite convenient.

On Fri, 24 Jun 2011, Bruce Evans wrote:

>
> mnt_stat.f_fsid is generated from the dev_t, and tries to give stability
> across reboots. Otherwise, IIRC, nfs mounts break if the server is
> rebooted. Not only the dev_t part, but other things in f_fsid, depend
> on the order of initialization, but the ids usually end up the same if
> you don't reconfigure much on the server.
>
> f_fsid also has a problem with uniqueness, but that is mainly because it
> wants to be unique when truncated to a 16-bit dev_t. dev_t is only 16
> bits in some versions of Linux, including in the FreeBSD i386 Linux
> emulator (I can see traces of 32-bit dev_t in Linux-2.6.10 but not in
> the FreeBSD emulator).
>
> I hope AFS ids could be implemented like fsids and not need to literally
> match foreign ids, but if they are synthesized then they might be harder
> than fsids to keep invariant across reboots.

I'm not sure how one would have a chance of keeping things invariant across reboots other than to use the cell/volume IDs in some fashion. That said, the AFS client maintains its own copy of these unique IDs in the fs-specific vnode area, and should be able to talk to the server just fine if the fsids end up faked. Keeping the fake fsids consistent if a file goes in and out of the local cache may be a different issue, though.

On Fri, 24 Jun 2011, Rick Macklem wrote:

> Garance A Drosehn wrote:
>> The AFS cell at RPI has approximately 40,000 AFS volumes, and each
>> volume should have its own dev_t (IMO). That's just counting the
>> collection of AFS volumes which are on RPI file servers, and any
>> user sitting on one computer could access AFS volumes which are
>> made available by other sites (aka "AFS cells"). Most RPI users
>> would only have access to maybe 1/4 of those volumes which exist
>> at RPI, but we do know that individual users have run 'find' over
>> the entire RPI cell looking for whatever they're looking for. I
>> once did a run of 'md5deep' on the entire RPI cell, thanks to a
>> symlink which I didn't realize was in my home directory!

We have almost 50,000 volumes in the athena cell, here.

>>
> Note that it is the value in mnt_stat.f_fsid that needs to be unique w.r.t.
> other mount points in the machine. If AFS appears to be one mount
> point in the FreeBSD client, then the only issue I know of is how
> the client is expected to handle changes in dev_t within the mount

Er, how is the client expected to communicate these changes? As mentioned above, I believe we currently present only a single device and mountpoint, which is suboptimal. (Actually, it looks like we don't even initialize mnt_stat.f_fsid at all if I'm reading the current code correctly. Oops.) I would love to be able to present volume mountpoints as actually being mountpoints.

> point. fts(3) and friends will assume that it is a mount point
> crossing when st_dev changes. It will then expect the funny
> rule that the d_ino in dirent will not be the same as st_ino.
>
> What I do for NFSv4 is synthesize the mnt_stat.f_fsid value and
> return that as st_dev for the mounted volume until I see the fsid
> returned by the server change. Below that point, I return the fsid
> from the server as st_dev so long as it isn't the same as the

I think I'm confused. You're ... walking a directory hierarchy, and return a fake st_dev value but hold onto the fsid value from the server, then when the fsid from the server changes (due to a ...
different NFS mount?), start reporting that new fsid and throw away the fake st_dev value? Can you point me at the code that is doing this? > synthesized one. That way, fts(3) and friends figure out the mount > point crossings within the server. > > "ls -lR" will usually find problems if this is broken. >> So one person can easily trigger the access of 10,000 AFS volumes >> on one computer using one command. That might sound terrifying if >> you imagine it as being 10,000 NFS mounts, but accessing AFS volumes >> isn't the same amount of work as auto-mounting NFS filesystems. >> So ignore whatever problems you might expect to see with 10,000 >> filesystems mounted on one computer. Just realize that it is very >> easy for a single user to access tens of thousands of AFS volumes >> from one computer, and it would be "most correct" (programming wise) >> if all of those AFS volumes were to get a unique value for dev_t. >> And of course it's even easier for a remote-access system to access >> tens-of-thousands of AFS volumes, since it would have a few dozen >> users logged in at the same time. >> I guess, at the end of the day, it's not clear to me what OpenAFS should do when we finally get around to exposing AFS volume mountpoints as device mountpoints to userland. Reusing existing globally-unique AFS ID information would be nice, but how to cleanly transform that to a smaller unique ID for the particular machine in question? 
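One possible shape for such a mapping, sketched in Python under stated assumptions: volumes are keyed by a (cell, volume ID) pair, and local IDs are handed out in first-seen order from a 32-bit space. The names LocalDevAllocator and dev_for are hypothetical and this is not OpenAFS code. IDs allocated this way are unique on the machine but are not stable across reboots, which matches the earlier remark that dev_t stability across reboots is not guaranteed anyway.

```python
import itertools

DEV_T_BITS = 32  # width of dev_t as discussed in this thread


class LocalDevAllocator:
    """Hand out small per-machine IDs for (cell, volume) pairs.

    Hypothetical helper for illustration only.
    """

    def __init__(self, first_id=1):
        self._next = itertools.count(first_id)
        self._by_volume = {}

    def dev_for(self, cell, volume_id):
        """Return the local ID for a volume, allocating one on first use."""
        key = (cell, volume_id)
        dev = self._by_volume.get(key)
        if dev is None:
            dev = next(self._next)
            if dev >= 2 ** DEV_T_BITS:
                raise OverflowError("out of local device IDs")
            self._by_volume[key] = dev
        return dev


alloc = LocalDevAllocator()
a = alloc.dev_for("cell-a", 536870913)      # made-up cell name and volume ID
b = alloc.dev_for("cell-b", 536870913)      # same volume ID, different cell
again = alloc.dev_for("cell-a", 536870913)  # repeat lookup reuses the first ID
print(a, b, again)  # prints: 1 2 1
```

With volume counts in the tens of thousands per cell, a 32-bit counter leaves ample headroom; what a table like this does not provide is the same ID after a reboot, or after an entry is dropped and later re-created.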
-Ben Kaduk

From owner-freebsd-fs@FreeBSD.ORG Sat Jun 25 11:54:54 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 95A7B106564A for ; Sat, 25 Jun 2011 11:54:54 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay03.ispgateway.de (smtprelay03.ispgateway.de [80.67.31.30]) by mx1.freebsd.org (Postfix) with ESMTP id AC1518FC1E for ; Sat, 25 Jun 2011 11:54:53 +0000 (UTC) Received: from [78.34.174.229] (helo=fabiankeil.de) by smtprelay03.ispgateway.de with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.68) (envelope-from ) id 1QaRG4-0002jJ-Vh for freebsd-fs@freebsd.org; Sat, 25 Jun 2011 13:42:41 +0200 Date: Sat, 25 Jun 2011 13:40:31 +0200 From: Fabian Keil To: freebsd-fs@freebsd.org Message-ID: <20110625134031.3cbc5952@fabiankeil.de> In-Reply-To: <20110307202531.2c90ff5a@r500.local> References: <20110227202957.GD1992@garage.freebsd.pl> <20110228192129.119cac0c@r500.local> <20110307200634.3c0f92df@r500.local> <20110307202531.2c90ff5a@r500.local> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/FxqTGjH_sS8/36jOTVJVHX9"; protocol="application/pgp-signature" X-Df-Sender: 775067 Subject: Re: g_wither_washer() called 470000 times per second (was: ZFSv28: log_sysevent: type 19 is not implemented) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Jun 2011 11:54:54 -0000

--Sig_/FxqTGjH_sS8/36jOTVJVHX9 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable

Fabian Keil wrote:

> Fabian Keil wrote:
> > After unintentionally unplugging an USB disk with this
> > zpool on it:
> >
> > | fk@r500 ~ $zpool status toshiba
> > |   pool: toshiba
> > |  state: ONLINE
> > |   scan: none requested
> > | config:
> > |
> > |         NAME                 STATE     READ WRITE CKSUM
> > |         toshiba              ONLINE       0     0     0
> > |           label/toshiba.eli  ONLINE       0     0     0
> > |
> > | errors: No known data errors
> >
> > the system became sluggish and /var/log/messages got spammed
> > with error messages:
> >
> > Mar 6 21:33:10 r500 kernel: ugen7.2: at usbus7 (disconnected)
> > Mar 6 21:33:10 r500 kernel: umass1: at uhub7, port 1, addr 2 (disconnected)
> > Mar 6 21:33:10 r500 kernel: (pass3:umass-sim1:1:0:0): lost device
> > Mar 6 21:33:10 r500 kernel: (pass3:umass-sim1:1:0:0): removing device entry
> > Mar 6 21:33:10 r500 kernel: (da1:umass-sim1:1:0:0): lost device
> > Mar 6 21:33:10 r500 kernel: (da1:umass-sim1:1:0:0): Synchronize cache failed, status == 0xa, scsi status == 0x0
> > Mar 6 21:33:10 r500 kernel: (da1:umass-sim1:1:0:0): removing device entry
> > Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implemented
> > Mar 6 21:33:10 r500 last message repeated 50 times
> > Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implementedlog_sysevent: type 19 is not im
> > Mar 6 21:33:10 r500 kernel: plemented
> > Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implemented
> > Mar 6 21:33:10 r500 last message repeated 238 times
> > Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implementedlog_sysevent: type 19 is
> > Mar 6 21:33:10 r500 kernel: not implemented
> > Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implementedlog_sysevent: type 19 is not impleme
> > Mar 6 21:33:10 r500 kernel: nted
> > Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implemented
> > Mar 6 21:33:10 r500 last message repeated 87 times
> > Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implementedlog_sysevent: type 19 is not implemented
> > Mar 6 21:33:10 r500 kernel:
> > Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implemented
> > Mar 6 21:33:10 r500 last message repeated 47 times
> > Mar 6 21:33:11 r500 kernel: type 19 is not implemented
> > Mar 6 21:33:11 r500 kernel: log_sysevent: type 19 is not implemented
> > Mar 6 21:33:11 r500 last message repeated 1527 times
> > Mar 6 21:33:11 r500 kernel: log_sysevent: type 19 is not implementedlog_sysevent: type 19 is not implemented
> > Mar 6 21:33:11 r500 kernel:
> > Mar 6 21:33:11 r500 kernel: log_sysevent: type 19 is not implemented
> >
> > fk@r500 ~ $zcat /var/log/messages.0.bz2 | grep -c "type 19 is not implemented"
> > 34101
> >
> > At the time of the unplugging, a video was read from the pool.
> >
> > When trying to export the pool, zpool export hung.
> >
> > After rebooting the system, the message was shown two
> > more times between the creation of the provider for the
> > main zpool and the swap device:
> >
> > Mar 6 21:43:20 r500 kernel: GEOM_ELI: Device ada0s1d.eli created.
> > Mar 6 21:43:20 r500 kernel: GEOM_ELI: Encryption: AES-CBC 128
> > Mar 6 21:43:20 r500 kernel: GEOM_ELI: Crypto: software
> > Mar 6 21:43:20 r500 kernel: Trying to mount root from ufs:/dev/ada0s1a [rw]...
> > Mar 6 21:43:20 r500 kernel: WARNING: / was not properly dismounted
> > Mar 6 21:43:20 r500 kernel: start_init: trying /sbin/init
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Requesting SCSI sense data
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status Error
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condition
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc:3a,1 (Medium not present - tray closed)
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Error 6, Unretryable error
> > Mar 6 21:43:20 r500 kernel: log_sysevent: type 19 is not implemented
> > Mar 6 21:43:20 r500 kernel: log_sysevent: type 19 is not implemented
> > Mar 6 21:43:20 r500 kernel: GEOM_ELI: Device ada0s1b.eli created.
> > Mar 6 21:43:20 r500 kernel: GEOM_ELI: Encryption: AES-XTS 256
> > Mar 6 21:43:20 r500 kernel: GEOM_ELI: Crypto: software
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Requesting SCSI sense data
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status Error
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condition
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc:3a,1 (Medium not present - tray closed)
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Error 6, Unretryable error
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Requesting SCSI sense data
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status Error
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condition
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc:3a,1 (Medium not present - tray closed)
> > Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Error 6, Unretryable error
> > Mar 6 21:43:20 r500 kernel: lo1: bpf attached
> > Mar 6 21:43:20 r500 kernel: wlan0: bpf attached
> > Mar 6 21:43:20 r500 kernel: wlan0: bpf attached
> > Mar 6 21:43:20 r500 kernel: wlan0: Ethernet address: 00:[...]
> > Mar 6 21:43:20 r500 kernel: firmware: 'iwn5000fw' version 0: 353240 bytes loaded at 0xffffffff814120b0
> > Mar 6 21:43:20 r500 kernel: firmware: 'iwn5000fw' version 0: 353240 bytes loaded at 0xffffffff814120b0
> > Mar 6 21:43:20 r500 kernel: bge0: Disabling fastboot
> > Mar 6 21:43:20 r500 kernel: bge0: Disabling fastboot
> > Mar 6 21:43:20 r500 savecore: /dev/ada0s1b: Operation not permitted
> > Mar 6 21:43:21 r500 named[2219]: starting BIND 9.6.3 -t /var/named -u bind
> > Mar 6 21:43:21 r500 named[2219]: built with '--prefix=/usr' '--infodir=/usr/share/info' '--mandir=/usr/share/man' '--enable-threads' '--enable-getifaddrs' '--disable-linux-caps' '--with-openssl=/usr' '--with-randomdev=/dev/random' '--without-idn' '--without-libxml2'
> > Mar 6 21:43:21 r500 named[2219]: command channel listening on 127.0.0.1#953
> > Mar 6 21:43:21 r500 named[2219]: command channel listening on ::1#953
> > Mar 6 21:43:21 r500 named[2219]: the working directory is not writable
> > Mar 6 21:43:21 r500 named[2219]: running
> > Mar 6 21:43:53 r500 wpa_supplicant[512]: WPA: Group rekeying completed with 00:[...] [GTK=TKIP]
> > Mar 6 21:44:14 r500 syslogd: exiting on signal 15
> >
> > As the boot process got stuck with no additional messages
> > printed, I rebooted into single-user mode, exported the
> > faulted pool and finished the boot process. The system
> > came back normally and the pool could be imported without
> > issues
> >
> > fk@r500 ~ $grep zfs /boot/loader.conf | grep -v "^ *#"
> > zfs_load="YES"
> >
> > I used the attached patch to stop the log spam, but the
> > main issue seems to be reproducible. The top output after
> > detaching the pool:
> >
> > last pid: 4985; load averages: 10.47, 3.96, 2.02 up 0+02:01:49 19:20:49
> > 552 processes: 12 running, 518 sleeping, 22 waiting
> > CPU: 1.2% user, 0.0% nice, 98.8% system, 0.0% interrupt, 0.0% idle
> > Mem: 267M Active, 95M Inact, 859M Wired, 2032K Cache, 7872K Buf, 692M Free
> > Swap: 2048M Total, 2048M Free
> >
> >   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> >     2 root          1  -8    -     0K    16K CPU1    1   1:26 96.97% g_event
> >     0 root        387  -8    0     0K  6192K -       0   2:11 80.66% kernel
> >  1702 root          1  76    0  6276K   796K RUN     0   0:15 11.96% devd
> >    11 root          2 155 ki31     0K    32K RUN     0  26:46  0.00% idle
> >  3395 fk            1  22    0   465M   329M select  0   7:27  0.00% Xorg
> >  3398 fk            1  20    0 96120K  9704K RUN     0   0:49  0.00% e16
> >    12 root         22 -84    -     0K   352K WAIT    1   0:37  0.00% intr
> >    26 root          1  20    -     0K    16K geli:w  0   0:31  0.00% g_eli[0] ada0s1d
> >    27 root          1  22    -     0K    16K geli:w  1   0:29  0.00% g_eli[1] ada0s1d
> >
> > The stuck zpool's stack:
> >
> > fk@r500 ~ $sudo procstat -kk $(pgrep zpool)
> >   PID    TID COMM             TDNAME           KSTACK
> >  5087 100490 zpool            initial thread   mi_switch+0x174 sleepq_wait+0x42 __lockmgr_args+0x7a3 vop_stdlock+0x39 VOP_LOCK1_APV+0x52 _vn_lock+0x47 vflush+0x125 zfs_umount+0x9f dounmount+0x31e unmount+0x38b syscallenter+0x331 syscall+0x4b Xfast_syscall+0xdd
> >
> > When I re-attached the disk, g_event kept eating cpu.
> >=20 > > I'm using a script to attach and import pools on USB devices and as > > it currently doesn't handle faulted pools, it tried to import the > > already faulted pool, which resulted in a zpool core dump. > >=20 > > Mar 7 19:27:25 r500 sudo: fk : TTY=3Dttyv0 ; PWD=3D/home/fk ; US= ER=3Droot ; COMMAND=3D/sbin/geli attach -j - -k /home/fk/geli-keys/toshiba.= key /dev/label/toshiba > > Mar 7 19:27:33 r500 sudo: fk : TTY=3Dttyv0 ; PWD=3D/home/fk ; US= ER=3Droot ; COMMAND=3D/sbin/zpool import toshiba > > Mar 7 19:27:33 r500 kernel: GEOM_ELI: Device label/toshiba.eli created. > > Mar 7 19:27:33 r500 kernel: GEOM_ELI: Encryption: AES-CBC 128 > > Mar 7 19:27:33 r500 kernel: GEOM_ELI: Crypto: software > > Mar 7 19:27:33 r500 kernel: pid 5206 (zpool), uid 0: exited on signal = 11 (core dumped) > >=20 > > fk@r500 ~ $gdb /sbin/zpool zpool.core=20 > > GNU gdb 6.1.1 [FreeBSD] > > Copyright 2004 Free Software Foundation, Inc. > > [...] > > Loaded symbols for /libexec/ld-elf.so.1 > > #0 0x000000080085ec7d in avl_insert () from /lib/libavl.so.2 > > [New Thread 802807400 (LWP 100524/initial thread)] > > (gdb) where > > #0 0x000000080085ec7d in avl_insert () from /lib/libavl.so.2 > > #1 0x000000080085ed5e in avl_add () from /lib/libavl.so.2 > > #2 0x0000000801ae9105 in zpool_find_import_cached () from /lib/libzfs.= so.2 > > #3 0x0000000000409deb in zpool_do_import () > > #4 0x0000000000406fa9 in main () > >=20 > > This time rebooting into single-user mode to export the faulted > > pool wasn't necessary, but I doubt that the patch had anything > > to do with it. > >=20 > > The second time I was using today's CURRENT instead of yesterday's, > > but I don't think there were any ZFS-related commits. >=20 > I just tried it a third time, this time with the pool imported > but not in active use. 
> Again I couldn't export the pool afterwards:
>
> fk@r500 ~ $sudo zpool list
> NAME      SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH   ALTROOT
> tank      228G   166G  62.1G    72%  1.00x  ONLINE   -
> toshiba  1.36T  1.31T  47.9G    96%  1.00x  UNAVAIL  -
> fk@r500 ~ $sudo zpool export toshiba
> load: 1.05 cmd: zpool 4117 [tx->tx_sync_done_cv)] 249.93r 0.00u 0.06s 0% 2528k
> load: 1.05 cmd: zpool 4117 [tx->tx_sync_done_cv)] 250.11r 0.00u 0.06s 0% 2528k
> load: 1.05 cmd: zpool 4117 [tx->tx_sync_done_cv)] 250.25r 0.00u 0.06s 0% 2528k
>
>
> fk@r500 ~ $sudo procstat -kk $(pgrep zpool)
>   PID    TID COMM             TDNAME           KSTACK
>  4117 102251 zpool            initial thread   mi_switch+0x174 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_synced+0x85 dmu_tx_assign+0x170 spa_history_log+0x43 zfs_log_history+0x82 zfs_ioc_pool_export+0x2a zfsdev_ioctl+0xe6 devfs_ioctl_f+0x7b kern_ioctl+0x102 ioctl+0xfd syscallenter+0x331 syscall+0x4b Xfast_syscall+0xdd
>
> And importing the pool again wasn't possible either:
>
> fk@r500 ~ $zpool list
> NAME      SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH   ALTROOT
> tank      228G   166G  62.1G    72%  1.00x  ONLINE   -
> toshiba  1.36T  1.31T  47.9G    96%  1.00x  UNAVAIL  -
> fk@r500 ~ $sudo zpool import toshiba
> cannot import 'toshiba': a pool with that name is already created/imported,
> and no additional pools with that name were found
>
> The system stayed responsive, though, and other pools could
> still be imported and exported. It also didn't result in a
> "log_sysevent: type 19 is not implemented" message.

While the log message in the subject has been recently removed,
the problem unfortunately is still present.
It can also be reproduced by unplugging a labeled and geli-encrypted usb stick while writing to the ZFS pool on it: Jun 25 12:53:04 r500 kernel: ugen7.2: at usbus7 Jun 25 12:53:04 r500 kernel: umass0: on usbus7 Jun 25 12:53:04 r500 kernel: umass0: SCSI over Bulk-Only; quirks =3D 0x0000 Jun 25 12:53:05 r500 kernel: umass0:2:0:-1: Attached to scbus2 Jun 25 12:53:05 r500 kernel: (probe0:umass-sim0:0:0:0): Down reving Protoco= l Version from 2 to 0? Jun 25 12:53:05 r500 kernel: (probe0:umass-sim0:0:0:0): SCSI status error Jun 25 12:53:05 r500 kernel: (probe0:umass-sim0:0:0:0): TEST UNIT READY. CD= B: 0 0 0 0 0 0 Jun 25 12:53:05 r500 kernel: (probe0:umass-sim0:0:0:0): CAM status: SCSI St= atus Error Jun 25 12:53:05 r500 kernel: (probe0:umass-sim0:0:0:0): SCSI status: Check = Condition Jun 25 12:53:05 r500 kernel: (probe0:umass-sim0:0:0:0): SCSI sense: UNIT AT= TENTION asc:28,0 (Not ready to ready change, medium may have changed) Jun 25 12:53:05 r500 kernel: (probe0:umass-sim0:0:0:0): Retrying command (p= er sense data) Jun 25 12:53:05 r500 kernel: pass2 at umass-sim0 bus 0 scbus2 target 0 lun 0 Jun 25 12:53:05 r500 kernel: pass2: Removable Direct= Access SCSI-0 device Jun 25 12:53:05 r500 kernel: pass2: Serial Number AA04012900007508 Jun 25 12:53:05 r500 kernel: pass2: 40.000MB/s transfers Jun 25 12:53:05 r500 kernel: da0 at umass-sim0 bus 0 scbus2 target 0 lun 0 Jun 25 12:53:05 r500 kernel: da0: Removable Direct A= ccess SCSI-0 device Jun 25 12:53:05 r500 kernel: da0: Serial Number AA04012900007508 Jun 25 12:53:05 r500 kernel: da0: 40.000MB/s transfers Jun 25 12:53:05 r500 kernel: da0: 956MB (1957888 512 byte sectors: 64H 32S/= T 956C) Jun 25 12:53:05 r500 kernel: GEOM: new disk da0 Jun 25 12:53:17 r500 sudo: fk : TTY=3Dpts/13 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/geli attach -j - /dev/label/takems Jun 25 12:53:22 r500 sudo: fk : TTY=3Dpts/13 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zpool import takems Jun 25 12:53:22 r500 kernel: GEOM_ELI: 
Device label/takems.eli created. Jun 25 12:53:22 r500 kernel: GEOM_ELI: Encryption: AES-XTS 256 Jun 25 12:53:22 r500 kernel: GEOM_ELI: Crypto: software Jun 25 12:53:22 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error Jun 25 12:53:22 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 = 0 0 0 0 0 0 0 0 Jun 25 12:53:22 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status E= rror Jun 25 12:53:22 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condit= ion Jun 25 12:53:22 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc= :3a,1 (Medium not present - tray closed) Jun 25 12:53:22 r500 kernel: (cd0:ahcich1:0:0:0): Error 6, Unretryable error Jun 25 12:54:38 r500 sudo: fk : TTY=3Dpts/13 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zfs receive -vF takems/backup/r500/fk Jun 25 12:54:38 r500 sudo: fk : TTY=3Dpts/13 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zfs send -i @2011-06-21_18:05 tank/home/fk@2011-0= 6-25_12:43 Jun 25 12:54:42 r500 kernel: (da0:umass-sim0:0:0:0): Request completed with= CAM_REQ_CMP_ERR Jun 25 12:54:42 r500 kernel: ugen7.2: at usbus7 (disconne= cted) Jun 25 12:54:42 r500 kernel: (da0:umass-sim0:0:umass0: 0:at uhub7, port 2, = addr 2 (disconnected) Jun 25 12:54:42 r500 kernel: 0): Retrying command Jun 25 12:54:42 r500 kernel: (da0:umass-sim0:0:0:0): Selection timeout Jun 25 12:54:42 r500 kernel: (da0:umass-sim0:0:0:0): Retrying command Jun 25 12:54:42 r500 kernel: (da0:umass-sim0:0:0:0): Selection timeout Jun 25 12:54:42 r500 kernel: (da0:umass-sim0:0:0:0): Retrying command Jun 25 12:54:42 r500 kernel: (da0:umass-sim0:0:0:0): Selection timeout Jun 25 12:54:42 r500 kernel: (da0:umass-sim0:0:0:0): Retrying command Jun 25 12:54:42 r500 kernel: (da0:umass-sim0:0:0:0): lost device - 1 outsta= nding Jun 25 12:54:42 r500 kernel: (pass2:umass-sim0:0:0:0): lost device Jun 25 12:54:42 r500 kernel: (pass2:umass-sim0:0:0:0): removing device entry Jun 25 12:54:42 r500 kernel: (da0:umass-sim0:0:0:0): Error 6, Unretryable e= 
rror Jun 25 12:54:42 r500 kernel: (da0:umass-sim0:0:0:0): oustanding 0 Jun 25 12:54:42 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error= =3D6). label/takems.eli[WRITE(offset=3D731511808, length=3D512)] Jun 25 12:54:42 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error= =3D6). label/takems.eli[WRITE(offset=3D93514752, length=3D5120)] Jun 25 12:54:42 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error= =3D6). label/takems.eli[WRITE(offset=3D731509248, length=3D1024)] Jun 25 12:54:42 r500 kernel: GEOM_ELI: g_eli_read_done() failed label/takem= s.eli[READ(offset=3D270336, length=3D8192)] Jun 25 12:54:42 r500 kernel: GEOM_ELI: g_eli_read_done() failed label/takem= s.eli[READ(offset=3D1001660416, length=3D8192)] Jun 25 12:54:42 r500 kernel: GEOM_ELI: g_eli_read_done() failed label/takem= s.eli[READ(offset=3D1001922560, length=3D8192)] Jun 25 12:54:42 r500 kernel: (da0:umass-sim0:0:0:0): removing device entry Jun 25 12:54:42 r500 kernel: zio_vdev_io_start: Setting zio->io_error to EN= XIO for vdev takems /dev/label/takems.eli Jun 25 12:54:42 r500 last message repeated 14 times Jun 25 12:54:55 r500 kernel: ugen3.3: at usbus3 Jun 25 12:54:55 r500 kernel: umass0: on usbus3 Jun 25 12:54:55 r500 kernel: umass0: SCSI over Bulk-Only; quirks =3D 0x0000 Jun 25 12:54:56 r500 kernel: umass0:2:0:-1: Attached to scbus2 Jun 25 12:54:56 r500 kernel: (probe0:umass-sim0:0:0:0): Down reving Protoco= l Version from 2 to 0? Jun 25 12:54:56 r500 kernel: (probe0:umass-sim0:0:0:0): SCSI status error Jun 25 12:54:56 r500 kernel: (probe0:umass-sim0:0:0:0): TEST UNIT READY. 
CD= B: 0 0 0 0 0 0 Jun 25 12:54:56 r500 kernel: (probe0:umass-sim0:0:0:0): CAM status: SCSI St= atus Error Jun 25 12:54:56 r500 kernel: (probe0:umass-sim0:0:0:0): SCSI status: Check = Condition Jun 25 12:54:56 r500 kernel: (probe0:umass-sim0:0:0:0): SCSI sense: UNIT AT= TENTION asc:28,0 (Not ready to ready change, medium may have changed) Jun 25 12:54:56 r500 kernel: (probe0:umass-sim0:0:0:0): Retrying command (p= er sense data) Jun 25 12:54:56 r500 kernel: pass2 at umass-sim0 bus 0 scbus2 target 0 lun 0 Jun 25 12:54:56 r500 kernel: pass2: Removable Direct= Access SCSI-0 device Jun 25 12:54:56 r500 kernel: pass2: Serial Number AA04012900007508 Jun 25 12:54:56 r500 kernel: pass2: 40.000MB/s transfers Jun 25 12:54:56 r500 kernel: GEOM: new disk da0 Jun 25 12:54:56 r500 kernel: da0 at umass-sim0 bus 0 scbus2 target 0 lun 0 Jun 25 12:54:56 r500 kernel: da0: Removable Direct A= ccess SCSI-0 device Jun 25 12:54:56 r500 kernel: da0: Serial Number AA04012900007508 Jun 25 12:54:56 r500 kernel: da0: 40.000MB/s transfers Jun 25 12:54:56 r500 kernel: da0: 956MB (1957888 512 byte sectors: 64H 32S/= T 956C) Jun 25 12:55:52 r500 kernel: ugen3.3: at usbus3 (disconne= cted) Jun 25 12:55:52 r500 kernel: umass0: at uhub3, port 1, addr 3 (disconnected) Jun 25 12:55:52 r500 kernel: (da0:umass-sim0:0:0:0): lost device - 0 outsta= nding Jun 25 12:55:52 r500 kernel: (da0:umass-sim0:0:0:0): removing device entry Jun 25 12:55:52 r500 kernel: (pass2:umass-sim0:0:0:0): lost device Jun 25 12:55:52 r500 kernel: (pass2:umass-sim0:0:0:0): removing device entry Apparently what's eating the cpu is the kernel calling g_wither_washer() about 470000 time per second which seems a bit excessive: r500# dtrace -n 'fbt:kernel:g_*:entry { @[probefunc, stack()] =3D count(); = } tick-1sec { trunc(@, 15); printa(@); trunc(@)}' dtrace: description 'fbt:kernel:g_*:entry ' matched 232 probes CPU ID FUNCTION:NAME [...] 
0 37691 :tick-1sec=20 g_trace =20 kernel`g_io_request+0x56 kernel`g_io_schedule_down+0x1d4 kernel`g_down_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 71 g_bioq_lock =20 kernel`g_io_deliver+0xa7 kernel`g_io_schedule_up+0xa6 kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 76 g_bioq_unlock =20 kernel`g_io_deliver+0x124 kernel`g_io_schedule_up+0xa6 kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 76 g_destroy_bio =20 kernel`g_std_done+0x32 kernel`g_io_schedule_up+0xa6 kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 76 g_io_deliver =20 kernel`g_io_schedule_up+0xa6 kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 76 g_std_done =20 kernel`g_io_schedule_up+0xa6 kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 76 g_trace =20 kernel`g_io_deliver+0x81 kernel`g_io_schedule_up+0xa6 kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 76 g_bioq_unlock =20 kernel`g_io_schedule_down+0x40 kernel`g_down_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 124 g_bioq_first =20 kernel`g_io_schedule_down+0x28 kernel`g_down_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 149 g_bioq_lock =20 kernel`g_io_schedule_down+0x1c kernel`g_down_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 149 g_bioq_unlock =20 kernel`g_io_schedule_up+0x93 kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 152 g_bioq_first =20 kernel`g_io_schedule_up+0x7f kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 189 g_bioq_first =20 kernel`g_io_schedule_up+0x38 kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 189 g_bioq_lock =20 kernel`g_io_schedule_up+0x2c kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 189 g_wither_washer =20 kernel`g_run_events+0x358 kernel`fork_exit+0x11f kernel`0xffffffff808debde 472287 0 
37691 :tick-1sec=20 g_wither_washer =20 kernel`g_run_events+0x358 kernel`fork_exit+0x11f kernel`0xffffffff808debde 475959 0 37691 :tick-1sec=20 g_wither_washer =20 kernel`g_run_events+0x358 kernel`fork_exit+0x11f kernel`0xffffffff808debde 479539 0 37691 :tick-1sec=20 g_wither_washer =20 kernel`g_run_events+0x358 kernel`fork_exit+0x11f kernel`0xffffffff808debde 478386 0 37691 :tick-1sec=20 g_io_deliver =20 kernel`g_io_schedule_up+0xa6 kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 144 g_io_request =20 kernel`g_io_schedule_down+0x1d4 kernel`g_down_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 144 g_part_start =20 kernel`g_io_schedule_down+0x1d4 kernel`g_down_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 144 g_std_done =20 kernel`g_io_schedule_up+0xa6 kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 144 g_trace =20 kernel`g_io_request+0x56 kernel`g_io_schedule_down+0x1d4 kernel`g_down_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 144 g_trace =20 kernel`g_io_deliver+0x81 kernel`g_io_schedule_up+0xa6 kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 144 g_trace =20 kernel`g_part_start+0x57 kernel`g_io_schedule_down+0x1d4 kernel`g_down_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 144 g_bioq_unlock =20 kernel`g_io_schedule_up+0x93 kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 288 g_bioq_unlock =20 kernel`g_io_schedule_down+0x40 kernel`g_down_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 288 g_bioq_first =20 kernel`g_io_schedule_up+0x7f kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 336 g_bioq_first =20 kernel`g_io_schedule_up+0x38 kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 336 g_bioq_lock =20 kernel`g_io_schedule_up+0x2c kernel`g_up_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 336 g_bioq_first =20 
kernel`g_io_schedule_down+0x28 kernel`g_down_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 339 g_bioq_lock =20 kernel`g_io_schedule_down+0x1c kernel`g_down_procbody+0x5c kernel`fork_exit+0x11f kernel`0xffffffff808debde 339 g_wither_washer =20 kernel`g_run_events+0x358 kernel`fork_exit+0x11f kernel`0xffffffff808debde 447147 [...] Any ideas? Fabian --Sig_/FxqTGjH_sS8/36jOTVJVHX9 Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (FreeBSD) iEYEARECAAYFAk4FyTIACgkQBYqIVf93VJ2X7ACdEGPC1R/x4WQwIE3mSvNI4NN3 XIsAoLWElkacyengwEuyhC1fADgELkAz =N0r0 -----END PGP SIGNATURE----- --Sig_/FxqTGjH_sS8/36jOTVJVHX9-- From owner-freebsd-fs@FreeBSD.ORG Sat Jun 25 11:58:28 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B4F52106564A; Sat, 25 Jun 2011 11:58:28 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 7524B8FC14; Sat, 25 Jun 2011 11:58:28 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id EA33246B3C; Sat, 25 Jun 2011 07:58:27 -0400 (EDT) Received: from kavik.baldwin.cx (c-68-36-150-83.hsd1.nj.comcast.net [68.36.150.83]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 77A7E8A01F; Sat, 25 Jun 2011 07:58:27 -0400 (EDT) From: John Baldwin To: freebsd-fs@freebsd.org Date: Sat, 25 Jun 2011 07:58:23 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-RELEASE-p2; KDE/4.5.5; i386; ; ) References: <1656190156.1051008.1308953344203.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201106250758.23935.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP 
AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Sat, 25 Jun 2011 07:58:27 -0400 (EDT)
Cc: shadow@gmail.com, Robert Watson , Garance A Drosehn
Subject: Re: [rfc] 64-bit inode numbers
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 25 Jun 2011 11:58:28 -0000

On Friday, June 24, 2011 11:38:35 pm Benjamin Kaduk wrote:
> > point. fts(3) and friends will assume that it is a mount point
> > crossing when st_dev changes. It will then expect that the funny
> > rule that the d_ino in dirent will not be the same as st_ino.
> >
> > What I do for NFSv4 is synthesize the mnt_stat.f_fsid value and
> > return that as st_dev for the mounted volume until I see the fsid
> > returned by the server change. Below that point, I return the fsid
> > from the server as st_dev so long as it isn't the same as the
>
> I think I'm confused. You're ... walking a directory hierarchy, and
> return a fake st_dev value but hold onto the fsid value from the server,
> then when the fsid from the server changes (due to a ... different NFS
> mount?), start reporting that new fsid and throw away the fake st_dev
> value? Can you point me at the code that is doing this?

I think he's saying that VOP_GETATTR() for different vnodes in a single NFSv4
"mount" (as in 'struct mount *') can return different st_dev values to
userland where the st_dev value for a given vnode depends on the remote
fsid of the file on the NFSv4 server. That is, for NFSv4 it seems that all
files on a mount do not use the same value of st_dev (as they would for a
local filesystem), but instead only files from the logical volume on the
server share an st_dev. That is, st_dev is per-vnode rather than just copied
from the mount.
This is done by storing va_fsid in the NFS attribute cache for each vnode:

int
nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
    void *stuff, int writeattr, int dontshrink)
{
	...
	/*
	 * For NFSv4, if the node's fsid is not equal to the mount point's
	 * fsid, return the low order 32bits of the node's fsid. This
	 * allows getcwd(3) to work. There is a chance that the fsid might
	 * be the same as a local fs, but since this is in an NFS mount
	 * point, I don't think that will cause any problems?
	 */
	if (NFSHASNFSV4(nmp) && NFSHASHASSETFSID(nmp) &&
	    (nmp->nm_fsid[0] != np->n_vattr.na_filesid[0] ||
	     nmp->nm_fsid[1] != np->n_vattr.na_filesid[1])) {
		/*
		 * va_fsid needs to be set to some value derived from
		 * np->n_vattr.na_filesid that is not equal
		 * vp->v_mount->mnt_stat.f_fsid[0], so that it changes
		 * from the value used for the top level server volume
		 * in the mounted subtree.
		 */
		if (vp->v_mount->mnt_stat.f_fsid.val[0] !=
		    (uint32_t)np->n_vattr.na_filesid[0])
			vap->va_fsid = (uint32_t)np->n_vattr.na_filesid[0];
		else
			vap->va_fsid = (uint32_t)hash32_buf(
			    np->n_vattr.na_filesid, 2 * sizeof(uint64_t), 0);
	} else
		vap->va_fsid = vp->v_mount->mnt_stat.f_fsid.val[0];
	...
}

Then for VOP_GETATTR() it returns the va_fsid from the attribute cache
saved in 'vap' as the vnode's va_fsid which is used to compute st_dev in
vn_stat().

I think the effect here is that 'mount' still only shows a single mountpoint
for NFSv4, but applications that check for 'st_dev' changing to see if they
are crossing a mountpoint (e.g. find -x) will treat the volumes as different
mountpoints.
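
The selection logic in that kernel excerpt can be modelled in userland roughly as follows. This is only a sketch under stated assumptions: pick_va_fsid() and toy_hash32() are hypothetical names, toy_hash32() is an FNV-1a stand-in for the kernel's hash32_buf() (the two will not produce the same values), and the NFSHASNFSV4()/NFSHASHASSETFSID() checks are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for the kernel's hash32_buf(). This FNV-1a variant is an
 * assumption, used only so the sketch compiles outside the kernel.
 */
static uint32_t
toy_hash32(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return (h);
}

/*
 * Pick the 32-bit id reported for one file (hypothetical helper).
 *   mount_fsid       - id of the client mount (mnt_stat.f_fsid.val[0])
 *   top_server_fsid  - server fsid of the volume at the root of the mount
 *   file_server_fsid - server fsid of the file being examined
 */
static uint32_t
pick_va_fsid(uint32_t mount_fsid, const uint64_t top_server_fsid[2],
    const uint64_t file_server_fsid[2])
{
	bool same_volume = top_server_fsid[0] == file_server_fsid[0] &&
	    top_server_fsid[1] == file_server_fsid[1];

	/* Files on the top level server volume keep the mount's id. */
	if (same_volume)
		return (mount_fsid);
	/* Otherwise use the low 32 bits of the server fsid if they differ. */
	if ((uint32_t)file_server_fsid[0] != mount_fsid)
		return ((uint32_t)file_server_fsid[0]);
	/* The low 32 bits collide with the mount's id, so hash instead. */
	return (toy_hash32(file_server_fsid, 2 * sizeof(uint64_t)));
}
```

Calling pick_va_fsid() with the same server fsid for both array arguments returns the mount's own id. A different server fsid yields a different value, which is what makes st_dev change partway through a single mount.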
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Sat Jun 25 13:04:55 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D327D106566B for ; Sat, 25 Jun 2011 13:04:55 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-fx0-f44.google.com (mail-fx0-f44.google.com [209.85.161.44]) by mx1.freebsd.org (Postfix) with ESMTP id 4389A8FC0A for ; Sat, 25 Jun 2011 13:04:54 +0000 (UTC) Received: by fxe6 with SMTP id 6so507215fxe.17 for ; Sat, 25 Jun 2011 06:04:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:from:to:cc:subject:references:x-comment-to :sender:date:in-reply-to:message-id:user-agent:mime-version :content-type; bh=J4aDkgaSTTP1AF1u+/EoKtu5JCRWXmtO+HpqTG9ummk=; b=Wd5ZxiMzU9dvbrvEOnfYdSJZCwVhAFJBb63zgsBLVYOGWn4JXWbI7DfTwBlSunVqTj pqCfw75JBfa+9rvZit23enGbhTvX45L1eWTLkTVQa5RsT9Lj3gcOaO5nzc4pXhhmlxAn ztB3t0SKNXlawPcfn2fay7qaYjgrOZmVn9A3o= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:to:cc:subject:references:x-comment-to:sender:date:in-reply-to :message-id:user-agent:mime-version:content-type; b=dlV2DtFmMQk/rNSs2bDMUG8fUtQFuKNhUZf4Ft7ac7YZRy6nhUP18cEXGOd6fh/dVI LgDSzWKA4KRZLAxqzSGGkSIsNSsBuMNfVrHfymMuCieDKISwotA2FyhUGC/HLRoeBl6v ENWZkup3gboZM9xZY7LnjU+rkj87w8Jb8PHHk= Received: by 10.223.48.139 with SMTP id r11mr6030918faf.63.1309007094133; Sat, 25 Jun 2011 06:04:54 -0700 (PDT) Received: from localhost ([95.69.173.122]) by mx.google.com with ESMTPS id 11sm2257633fax.12.2011.06.25.06.04.52 (version=TLSv1/SSLv3 cipher=OTHER); Sat, 25 Jun 2011 06:04:52 -0700 (PDT) From: Mikolaj Golub To: Nikitin Vitaly References: <4E0300FE.9000205@arqa.ru> X-Comment-To: Nikitin Vitaly Sender: Mikolaj Golub Date: Sat, 25 Jun 2011 16:04:50 +0300 In-Reply-To: <4E0300FE.9000205@arqa.ru> (Nikitin Vitaly's message of "Thu, 23 Jun 2011 16:01:50 +0700") Message-ID: 
<86fwmyrril.fsf@kopusha.home.net> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: freebsd-fs@FreeBSD.org Subject: Re: problem with copying data on hast-device X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Jun 2011 13:04:55 -0000 On Thu, 23 Jun 2011 16:01:50 +0700 Nikitin Vitaly wrote: NV> Hello! NV> I seem to have some trouble with hastd, I hope you can help me in this. NV> I set up hastd successfully on two nodes for testing. NV> Everything seems to work fine, I set up hast.conf, do the role part, NV> newfs and mount. On the first node I set up primary role. On the NV> second node I set up secondary role. NV> Problem: When I start copying about 300-400GB data from hdd to the NV> hast mount device, the whole system freezes, in that time gstat showed NV> hast/[devicename] %busy 99.9 NV> No error msg in log files found. The problem repeats even if I copy NV> data from the network or with help of rsync, on the hast mount device. NV> I try to switch off hastd service on the second node, but this isn't NV> take effect. Problem starts randomly. Sometimes, the copying small NV> data is done, but more not. NV> Please help me, if you can. NV> Environment: NV> 8.2-RELEASE FreeBSD 8.2-RELEASE There were several issues with HAST that have been fixed in CURRENT and MFCed to STABLE. So, I would recommend starting from upgrading to STABLE and reporting back if you still have the issue. Note, there is still a known problem in STABLE that has been fixed in CURRENT (r223181) but is not MFCed yet. I am going to MFC it soon. 
-- Mikolaj Golub From owner-freebsd-fs@FreeBSD.ORG Sat Jun 25 13:53:23 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4CD84106564A; Sat, 25 Jun 2011 13:53:23 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id B0F968FC08; Sat, 25 Jun 2011 13:53:22 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AugAALznBU6DaFvO/2dsb2JhbABShEmTU5AjukOQMYErg3mBDASSA5A3 X-IronPort-AV: E=Sophos;i="4.65,424,1304308800"; d="scan'208";a="129007130" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 25 Jun 2011 09:53:21 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id CFA7CB3F07; Sat, 25 Jun 2011 09:53:21 -0400 (EDT) Date: Sat, 25 Jun 2011 09:53:21 -0400 (EDT) From: Rick Macklem To: John Baldwin Message-ID: <1714423172.1062587.1309010001836.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <201106250758.23935.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - SAF3 (Mac)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, shadow@gmail.com, Robert Watson , Garance A Drosehn Subject: Re: [rfc] 64-bit inode numbers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Jun 2011 13:53:23 -0000 John Baldwin wrote: > On Friday, June 24, 2011 11:38:35 pm Benjamin Kaduk wrote: > > > point. fts(3) and friends will assume that it is a mount point > > > crossing when st_dev changes. 
> > > It will then expect that the funny
> > > rule that the d_ino in dirent will not be the same as st_ino.
> > >
> > > What I do for NFSv4 is synthesize the mnt_stat.f_fsid value and
> > > return that as st_dev for the mounted volume until I see the fsid
> > > returned by the server change. Below that point, I return the fsid
> > > from the server as st_dev so long as it isn't the same as the
> >
> > I think I'm confused. You're ... walking a directory hierarchy, and
> > return a fake st_dev value but hold onto the fsid value from the
> > server, then when the fsid from the server changes (due to a ...
> > different NFS mount?), start reporting that new fsid and throw away
> > the fake st_dev value? Can you point me at the code that is doing this?
>
> I think he's saying that VOP_GETATTR() for different vnodes in a single
> NFSv4 "mount" (as in 'struct mount *') can return different st_dev values
> to userland where the st_dev value for a given vnode depends on the remote
> fsid of the file on the NFSv4 server. That is, for NFSv4 it seems that all
> files on a mount do not use the same value of st_dev (as they would for a
> local filesystem), but instead only files from the logical volume on the
> server share an st_dev. That is, st_dev is per-vnode rather than just
> copied from the mount. This is done by storing va_fsid in the NFS
> attribute cache for each vnode:
>
> int
> nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
>     void *stuff, int writeattr, int dontshrink)
> {
> 	...
> 	/*
> 	 * For NFSv4, if the node's fsid is not equal to the mount point's
> 	 * fsid, return the low order 32bits of the node's fsid. This
> 	 * allows getcwd(3) to work. There is a chance that the fsid might
> 	 * be the same as a local fs, but since this is in an NFS mount
> 	 * point, I don't think that will cause any problems?
> 	 */
> 	if (NFSHASNFSV4(nmp) && NFSHASHASSETFSID(nmp) &&
> 	    (nmp->nm_fsid[0] != np->n_vattr.na_filesid[0] ||
> 	     nmp->nm_fsid[1] != np->n_vattr.na_filesid[1])) {
> 		/*
> 		 * va_fsid needs to be set to some value derived from
> 		 * np->n_vattr.na_filesid that is not equal
> 		 * vp->v_mount->mnt_stat.f_fsid[0], so that it changes
> 		 * from the value used for the top level server volume
> 		 * in the mounted subtree.
> 		 */
> 		if (vp->v_mount->mnt_stat.f_fsid.val[0] !=
> 		    (uint32_t)np->n_vattr.na_filesid[0])
> 			vap->va_fsid = (uint32_t)np->n_vattr.na_filesid[0];
> 		else
> 			vap->va_fsid = (uint32_t)hash32_buf(
> 			    np->n_vattr.na_filesid, 2 * sizeof(uint64_t), 0);
> 	} else
> 		vap->va_fsid = vp->v_mount->mnt_stat.f_fsid.val[0];
> 	...
> }
>
> Then for VOP_GETATTR() it returns the va_fsid from the attribute cache
> saved in 'vap' as the vnode's va_fsid which is used to compute st_dev in
> vn_stat().
>
> I think the effect here is that 'mount' still only shows a single
> mountpoint for NFSv4, but applications that check for 'st_dev' changing
> to see if they are crossing a mountpoint (e.g. find -x) will treat the
> volumes as different mountpoints.
>
Yes, John. You said it way better than I did:-)

This is necessary for NFSv4 because the server crosses server mount points
(unlike NFSv3 where servers do not) and, as such, st_ino is not unique
within one client NFSv4 mount (struct mount *). Without this, things like
"ls -R" will complain about cycles when the same tuple is seen again.
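
To illustrate why a repeated identity looks like a cycle, here is a simplified model of the kind of check a tree walker can make. It is a sketch, not the fts(3) source: struct dir_id and looks_like_cycle() are hypothetical names, and the real library may track ancestors differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Identity of a directory as a tree walker sees it through stat(2). */
struct dir_id {
	uint64_t dev;	/* st_dev */
	uint64_t ino;	/* st_ino */
};

/*
 * Return true if 'cur' has the same identity as a directory already on
 * the path from the root, which a walker would report as a cycle.
 */
static bool
looks_like_cycle(const struct dir_id *ancestors, size_t n, struct dir_id cur)
{
	for (size_t i = 0; i < n; i++)
		if (ancestors[i].dev == cur.dev && ancestors[i].ino == cur.ino)
			return (true);
	return (false);
}
```

If the root of a second server volume also has inode number 2 and st_dev stays the same, the walker sees the same identity as the mount's root and reports a cycle. Giving that volume its own st_dev makes the pair distinct.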
rick From owner-freebsd-fs@FreeBSD.ORG Sat Jun 25 14:04:21 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DE23A106564A; Sat, 25 Jun 2011 14:04:21 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 642C28FC0C; Sat, 25 Jun 2011 14:04:21 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap4EAB/qBU6DaFvO/2dsb2JhbABSEIQ5o3a6OJAwgSuDeYEMBJIDj2RT X-IronPort-AV: E=Sophos;i="4.65,424,1304308800"; d="scan'208";a="125165797" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 25 Jun 2011 10:04:20 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 580E0B3F0A; Sat, 25 Jun 2011 10:04:20 -0400 (EDT) Date: Sat, 25 Jun 2011 10:04:20 -0400 (EDT) From: Rick Macklem To: Benjamin Kaduk Message-ID: <1182998178.1062689.1309010660304.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - SAF3 (Mac)/6.0.10_GA_2692) Cc: Garance A Drosehn , freebsd-fs@freebsd.org, Robert Watson , shadow@gmail.com Subject: Re: [rfc] 64-bit inode numbers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Jun 2011 14:04:22 -0000 Benjamin Kaduk wrote: > Hmm, several messages regarding AFS that I will try to address at > once. 
> > > On Fri, 24 Jun 2011, Kostik Belousov wrote: > > On Thu, Jun 23, 2011 at 06:05:56PM -0400, Garance A Drosehn wrote: > >> Consider the thread "Increasing the size of dev_t and ino_t" from > >> freebsd-arch in 2002: > >> > >> http://docs.freebsd.org/mail/archive/2002/freebsd-arch/20020317.freebsd-arch.html > >> > >> In particular, this message by Robert Watson: > >> > >> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=139853+0+archive/2002/freebsd-arch/20020317.freebsd-arch > >> > >> I just participated in an online conference for OpenAFS, and while > >> it > >> isn't exactly taking the world by storm, I keep thinking it would > >> be > >> useful if FreeBSD could map individual AFS volumes to unique dev_t > >> identifiers. And given the way AFS is implemented (as a global > >> filesystem > >> with many cells all reachable at the same time), and given the way > >> most > >> sites deploy AFS (with thousands or tens-of-thousands of individual > >> AFS > >> volumes *per site*), that adds up to a lot of values for dev_t. > >> > >> The upcoming release of OpenAFS should include a working and pretty > >> stable AFS client for FreeBSD, so having a larger dev_t would have > >> a > >> more immediate application than it did back in 2002. > > Am I right that the issue is the uniqueness of the dev_t for each > > AFS volume, as reported by stat(2) ? > > > > Shouldn't the AFS client synthesize the dev_t for each new volume > > mounted ? It seems that the current 32bit dev_t would be enough, > > since I do not expect to see hundreds of thousands of mounts > > on an single system. > > The current OpenAFS implementation only presents a single mountpoint, > /afs, and does not really distinguish between different mounted > volumes. > This is not ideal, and we would like to be able to make each volume > appear > as a separate device if there's a good way to do so. 
> The technical challenge of doing this while still only having a single
> mount method for AFS is not something that I've looked at, there being
> more pressing issues on my plate.
>
With a single mount point in the client (struct mount *), if the st_dev
remains the same throughout the mountpoint, then all st_ino's must be
unique (ie. no duplicate ino# == 2 or similar) or fts(3) complains about
cycles in the tree and gives up. (Shows up when you do "ls -lR".)

On the other hand, if st_dev changes within the single client mountpoint,
then the value of d_ino in the directory entry for it (I've heard of this
being referred to as the "mounted on inode#") must be different than the
st_ino reported for the object via stat(2), or getcwd() gets confused, if
I recall correctly.

> >
> > Please note that we do not guarantee dev_t stability across reboots even
> > for real devices.
>
> Hmm, this is somewhat annoying, as the AFS global namespace does provide a
> stable unique identifier for files/directories using a unique cell ID,
> volume ID, per-file ID, and uniquifier. Being able to directly use the
> cell/volume information for a dev_t would be quite convenient.
>
>
> On Fri, 24 Jun 2011, Bruce Evans wrote:
> >
> > mnt_stat.f_fsid is generated from the dev_t, and tries to give stability
> > across reboots. Otherwise, IIRC, nfs mounts break if the server is
> > rebooted. Not only the dev_t part, but other things in f_fsid, depend
> > on the order of initialization, but the ids usually end up the same if
> > you don't reconfigure much on the server.
> >
> > f_fsid also has a problem with uniqueness, but that is mainly because it
> > wants to be unique when truncated to a 16-bit dev_t. dev_t is only 16
> > bits in some versions of Linux, including in the FreeBSD i386 Linux
> > emulator (I can see traces of 32-bit dev_t in Linux-2.6.10 but not in
> > the FreeBSD emulator).
> >
> > I hope AFS ids could be implemented like fsids and not need to literally
> > match foreign ids, but if they are synthesized then they might be harder
> > than fsids to keep invariant across reboots.
>
> I'm not sure how one would have a chance of keeping things invariant
> across reboots other than to use the cell/volume IDs in some fashion.
> That said, the AFS client maintains its own copy of these unique IDs in
> the fs-specific vnode area, and should be able to talk to the server just
> fine if the fsids end up faked. Keeping the fake fsids consistent if a
> file goes in and out of the local cache may be a different issue, though.
>
>
> On Fri, 24 Jun 2011, Rick Macklem wrote:
>
> > Garance A Drosehn wrote:
> >> The AFS cell at RPI has approximately 40,000 AFS volumes, and each
> >> volume should have its own dev_t (IMO). That's just counting the
> >> collection of AFS volumes which are on RPI file servers, and any
> >> user sitting on one computer could access AFS volumes which are
> >> made available by other sites (aka "AFS cells"). Most RPI users
> >> would only have access to maybe 1/4 of those volumes which exist
> >> at RPI, but we do know that individual users have run 'find' over
> >> the entire RPI cell looking for whatever they're looking for. I
> >> once did a run of 'md5deep' on the entire RPI cell, thanks to a
> >> symlink which I didn't realize was in my home directory!
>
> We have almost 50,000 volumes in the athena cell, here.
>
> >>
> > Note that it is the value in mnt_stat.f_fsid that needs to be unique w.r.t
> > other mount points in the machine. If AFS appears to be one mount
> > point in the FreeBSD client, then the only issue I know of is how
> > the client is expected to handle changes in dev_t within the mount
>
> Er, how is the client expected to communicate these changes? As mentioned
> above, I believe we currently present only a single device and mountpoint,
> which is suboptimal.
> (Actually, it looks like we don't even initialize mnt_stat.f_fsid at all
> if I'm reading the current code correctly. Oops.)
> I would love to be able to present volume mountpoints as actually being
> mountpoints.
>
> > point. fts(3) and friends will assume that it is a mount point
> > crossing when st_dev changes. It will then expect that the funny
> > rule that the d_ino in dirent will not be the same as st_ino.
> >
> > What I do for NFSv4 is synthesize the mnt_stat.f_fsid value and
> > return that as st_dev for the mounted volume until I see the fsid
> > returned by the server change. Below that point, I return the fsid
> > from the server as st_dev so long as it isn't the same as the
>
> I think I'm confused. You're ... walking a directory hierarchy, and
> return a fake st_dev value but hold onto the fsid value from the server,
> then when the fsid from the server changes (due to a ... different NFS
> mount?), start reporting that new fsid and throw away the fake st_dev
> value? Can you point me at the code that is doing this?
>
> > synthesized one. That way, fts(3) and friends figure out the mount
> > point crossings within the server.
> >
> > "ls -lR" will usually find problems if this is broken.
> >> So one person can easily trigger the access of 10,000 AFS volumes
> >> on one computer using one command. That might sound terrifying if
> >> you imagine it as being 10,000 NFS mounts, but accessing AFS volumes
> >> isn't the same amount of work as auto-mounting NFS filesystems.
> >> So ignore whatever problems you might expect to see with 10,000
> >> filesystems mounted on one computer. Just realize that it is very
> >> easy for a single user to access tens of thousands of AFS volumes
> >> from one computer, and it would be "most correct" (programming wise)
> >> if all of those AFS volumes were to get a unique value for dev_t.
> >> And of course it's even easier for a remote-access system to access > >> tens-of-thousands of AFS volumes, since it would have a few dozen > >> users logged in at the same time. > >> > > > > I guess, at the end of the day, it's not clear to me what OpenAFS > should > do when we finally get around to exposing AFS volume mountpoints as > device > mountpoints to userland. Reusing existing globally-unique AFS ID > information would be nice, but how to cleanly transform that to a > smaller > unique ID for the particular machine in question? > > -Ben Kaduk From owner-freebsd-fs@FreeBSD.ORG Sat Jun 25 14:26:55 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3D4AB106564A for ; Sat, 25 Jun 2011 14:26:55 +0000 (UTC) (envelope-from tdgx86@gmail.com) Received: from mail-gx0-f182.google.com (mail-gx0-f182.google.com [209.85.161.182]) by mx1.freebsd.org (Postfix) with ESMTP id F3FC08FC0C for ; Sat, 25 Jun 2011 14:26:54 +0000 (UTC) Received: by gxk28 with SMTP id 28so1902312gxk.13 for ; Sat, 25 Jun 2011 07:26:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:date:message-id:subject:from:to :content-type; bh=q9Txle6QOp7uejuAh2aFXFdGzsPTyMk7D5lNJin0o5o=; b=O43YlQh/XUSpXaICs9UC335x4tid+NV0QtVEgwEXC0Nm7hcuDbDDiqOFJdJpv44/SX HXskbPZcyCeK8eLAAGSjdJq9+IN2EEHNZH+6V71fNRTDa4TNoJBn488IGsBO/L6g+Fzw jGOeoZ71Jbco4CXhFUQ5qGMsee385oiqQaoUU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; b=qVPTALFSKWJKWxQ8omeX9JdLy0kOqWp8Y/A2get2ZSFSdGd+Dm4Su6z6Z1G4pfgLrZ k4zJFbexQC7FfAhYdPWtsFcXuECpU1Nv9Ck2uYQDcNO629205F3dfPpmSHWOPb4+2OXr KGIt+e46E5GYTDcT7ryWU5Z7qLRBjwpeGkKLg= MIME-Version: 1.0 Received: by 10.100.55.33 with SMTP id d33mr3967759ana.151.1309010482662; Sat, 25 Jun 2011 07:01:22 -0700 (PDT) Received: by 10.100.189.20 
with HTTP; Sat, 25 Jun 2011 07:01:22 -0700 (PDT) Date: Sat, 25 Jun 2011 09:01:22 -0500 Message-ID: From: Troy Drake To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Subject: GEOM: the primary GPT table is corrupt or invalid after RAIDZ creation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Jun 2011 14:26:55 -0000

I have recently set up a new file server with 5x 2TB Samsung 4K sector
drives in a RAIDZ array. After going through the steps of creating GPT
partitions aligned to a 1 MB boundary (block 2048), using gnop to force
4k sectors, and finally creating the RAIDZ array, I receive the following
warning:

GEOM: ad6: the primary GPT table is corrupt or invalid.
GEOM: ad6: using the secondary instead -- recovery strongly advised.
(Snipped, but applies to ad8, ad10, ad12, ad14 identically).

The steps I used to set up this array are as follows:

gpart create -s GPT ad6 (and to all others)
gpart add -t freebsd-zfs -b 2048 ad6 (and to all others)
gnop create -S 4086 ad6
zpool create raidz tank ad6.nop ad8 ad10 ad12 ad14
zpool export tank
gnop destroy ad6.nop
zpool import tank

Even though the pool works fine, gpart finds no geom for any of the disks:

gpart: No such geom: ad6. (et al.)
Now the pool's status is good with no errors, however performance seems a
bit backwards to me:

# /usr/bin/time dd if=/dev/zero of=/tank/foo1 bs=2M count=10000
10000+0 records in
10000+0 records out
20971520000 bytes transferred in 66.247113 secs (316565040 bytes/sec)
66.24 real 0.01 user 12.44 sys

# /usr/bin/time dd if=/tank/foo1 of=/dev/null bs=2M
10000+0 records in
10000+0 records out
20971520000 bytes transferred in 102.649646 secs (204301922 bytes/sec)
102.65 real 0.00 user 5.71 sys

316 MB/sec writes and 204 MB/sec reads. I would expect it the other way
around, but this is consistent performance after running similar dd tests
many times. Writes are always faster than reads.

My main concern is reliability and the ability to import this pool into
another OS just in case. Are the GEOM warnings about a corrupt or invalid
primary table a concern? Are the steps I'm using best practice, or is
there a better way? I've searched the lists and Google, and from what
little I can find this seems like a verbose warning that isn't too much to
be concerned about, but these are my first 4K sector disks on ZFS so I
want to make sure before loading it up with data. Writes being faster than
reads has me a bit concerned though.

Interestingly, if I start over and create a nop on the slice at step 4
(gnop create -S 4096 ad6p1.nop), ad6 will retain its valid GPT table,
however the rest won't as they don't get nop'd. However, that makes me
unsure that ad6 itself is aligned properly, or that ad6 is aligned the
same as the rest of the disks.
Regards, Troy From owner-freebsd-fs@FreeBSD.ORG Sat Jun 25 18:04:42 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9CEE91065677 for ; Sat, 25 Jun 2011 18:04:42 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 59FBF8FC0C for ; Sat, 25 Jun 2011 18:04:42 +0000 (UTC) Received: from outgoing.leidinger.net (p4FC46A03.dip.t-dialin.net [79.196.106.3]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id D303484400D; Sat, 25 Jun 2011 20:04:27 +0200 (CEST) Received: from unknown (unknown [192.168.1.5]) by outgoing.leidinger.net (Postfix) with ESMTP id 1BD272CEF; Sat, 25 Jun 2011 20:04:25 +0200 (CEST) Date: Sat, 25 Jun 2011 20:04:24 +0200 From: Alexander Leidinger To: Troy Drake Message-ID: <20110625200424.0000374c@unknown> In-Reply-To: References: X-Mailer: Claws Mail 3.7.8cvs47 (GTK+ 2.16.6; i586-pc-mingw32msvc) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: D303484400D.AF905 X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-0.923, required 6, autolearn=disabled, ALL_TRUSTED -1.00, TW_ZF 0.08) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1309629869.27489@AjjO5tbg18HJBICLPoZcnA X-EBL-Spam-Status: No Cc: freebsd-fs@freebsd.org Subject: Re: GEOM: the primary GPT table is corrupt or invalid after RAIDZ creation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Jun 2011 18:04:42 -0000 On Sat, 25 Jun 2011 09:01:22 -0500 Troy 
Drake wrote:

> I have recently setup a new file server with 5x 2TB Samsung 4K sector
> drives in a RAIDZ array and after going through the steps of creating
> GPT partitions to set the alignment a 1 MB aligned sector (block 2048)
> and gnop to force 4k sectors and finally a RAIDZ array, I receive the
> following warning:
>
> GEOM: ad6: the primary GPT table is corrupt or invalid.
> GEOM: ad6: using the secondary instead -- recovery strongly advised.
> (Snipped, but applies to ad8, ad10, ad12, ad14 identically).
>
> The steps I used to setup this array are as follows:
> gpart create -s GPT ad6 (and to all others)

Here you create the GPT on ad6 (harddisk).

> gpart add -t freebsd-zfs -b 2048 ad6 (and to all others)

Here you create a partition on the harddisk.

> gnop create -S 4086 ad6

Here you tell gnop to create a 4k-sector pseudo-drive of the entire
harddisk (instead of the partition).

> zpool create raidz tank ad6.nop ad8 ad10 ad12 ad14

Here you create a pool on the complete harddisks, not on the zfs
partition you created above. This should also destroy the GPT.

I suggest you have a look at
http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/
There you can see how to do what I think you wanted to do.

Bye, Alexander.
--
http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG Sat Jun 25 19:22:11 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BAF6D10656B5 for ; Sat, 25 Jun 2011 19:22:11 +0000 (UTC) (envelope-from a.smith@ukgrid.net) Received: from mx1.ukgrid.net (mx1.ukgrid.net [89.107.22.36]) by mx1.freebsd.org (Postfix) with ESMTP id 180888FC0A for ; Sat, 25 Jun 2011 19:22:10 +0000 (UTC) Received: from [89.21.28.38] (port=64937 helo=omicron.ukgrid.net) by mx1.ukgrid.net with esmtp (Exim 4.76; FreeBSD) envelope-from a.smith@ukgrid.net envelope-to freebsd-fs@freebsd.org id 1QaYQj-000J2I-3X; Sat, 25 Jun 2011 20:22:09 +0100 Received: from 81.60.148.108.dyn.user.ono.com (81.60.148.108.dyn.user.ono.com [81.60.148.108]) by webmail2.ukgrid.net (Horde Framework) with HTTP; Sat, 25 Jun 2011 20:22:08 +0100 Message-ID: <20110625202208.36707k3wovx5ixwk@webmail2.ukgrid.net> Date: Sat, 25 Jun 2011 20:22:08 +0100 From: a.smith@ukgrid.net To: freebsd-fs@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Internet Messaging Program (IMP) H3 (4.3.9) / FreeBSD-8.1 Subject: RE: GEOM: the primary GPT table is corrupt or invalid after RAIDZ creation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Jun 2011 19:22:11 -0000

Hi,

Looks to me like you are creating 4k aligned partitions (adXp1) and then
not using them.
I think you need to do:

gnop create -S 4096 ad6p1 (you have 4086, so hopefully just a typo from you)
zpool create tank raidz ad6p1.nop ad8p1 ad10p1 ad12p1 ad14p1

Using adX[.nop] at the zpool create stage uses the whole usable disk, not
the aligned GPT partition you went to the trouble of creating
(gpart add -t freebsd-zfs -b 2048 ad6).

cheers Andy.
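
For reference, the partition-based sequence discussed in this thread can be sketched as a dry-run script. This is only a sketch under stated assumptions: the helper names is_aligned and plan are made up for illustration, the 512-byte logical sector size is assumed rather than taken from the thread, and the script only prints the commands instead of running them. As in the thread, only the first partition gets a gnop provider, and the pool name is given before the raidz keyword, which is the order zpool(8) expects.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands, does not touch any disk.
# Hypothetical helpers (is_aligned, plan) and the 512-byte logical
# sector size are assumptions for illustration, not from the thread.

DISKS="ad6 ad8 ad10 ad12 ad14"
POOL="tank"
START_LBA=2048        # first block of the partition, as in the thread
LOGICAL_SECTOR=512    # assumed logical sector size
PHYSICAL_SECTOR=4096  # 4K physical sectors

# Succeeds when start_lba * sector_size is a multiple of align_bytes.
# usage: is_aligned start_lba sector_size align_bytes
is_aligned() {
    [ $(( ($1 * $2) % $3 )) -eq 0 ]
}

# Print the command sequence, building the pool on the pN partitions
# rather than on the whole disks.
plan() {
    vdevs=""
    first=""
    for d in $DISKS; do
        echo "gpart create -s GPT $d"
        echo "gpart add -t freebsd-zfs -b $START_LBA $d"
        if [ -z "$first" ]; then
            # gnop goes on the partition, not on the raw disk
            first="${d}p1"
            echo "gnop create -S $PHYSICAL_SECTOR ${d}p1"
            vdevs="${d}p1.nop"
        else
            vdevs="$vdevs ${d}p1"
        fi
    done
    echo "zpool create $POOL raidz $vdevs"
    echo "zpool export $POOL"
    echo "gnop destroy ${first}.nop"
    echo "zpool import $POOL"
}

if is_aligned "$START_LBA" "$LOGICAL_SECTOR" "$PHYSICAL_SECTOR"; then
    plan
else
    echo "partition start is not 4K-aligned" >&2
fi
```

With the assumed 512-byte logical sectors, block 2048 starts at byte 1048576, which is a multiple of 4096, so the check passes and the plan is printed. Review the printed commands before running any of them by hand.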