From owner-freebsd-fs@FreeBSD.ORG Sun Dec 18 13:10:05 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B889D106564A; Sun, 18 Dec 2011 13:10:05 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.dawidek.net (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id 586008FC12; Sun, 18 Dec 2011 13:10:05 +0000 (UTC) Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149]) by mail.dawidek.net (Postfix) with ESMTPSA id 40806EA6; Sun, 18 Dec 2011 13:53:18 +0100 (CET) Date: Sun, 18 Dec 2011 13:52:12 +0100 From: Pawel Jakub Dawidek To: Andriy Gapon Message-ID: <20111218125212.GE1685@garage.freebsd.pl> References: <82B38DBF-DD3A-46CD-93F6-02CDB6506E05@slu.se> <73B607D0-8C3E-4980-B901-B986E060D32E@slu.se> <4EE7754D.3050605@FreeBSD.org> <558C926F-14FA-458D-BB8E-D20BA46BE6D2@slu.se> <4EE9EC6F.2080808@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="idY8LE8SD6/8DnRI" Content-Disposition: inline In-Reply-To: <4EE9EC6F.2080808@FreeBSD.org> X-OS: FreeBSD 9.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Martin =?utf-8?Q?Matu=C5=A1ka?= , "fs@freebsd.org" , Karli =?iso-8859-1?Q?Sj=F6berg?= Subject: Re: Consistant panics trying to access zfs filesystems replicated from Sun/Oracle appliance X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 18 Dec 2011 13:10:05 -0000

--idY8LE8SD6/8DnRI Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Thu, Dec 15, 2011 at 02:47:43PM +0200, Andriy Gapon wrote:
> on 15/12/2011 13:59 Karli Sjöberg said the following:
> > Hi all,
> >
> > with the help of Andriy Gapon, I managed to capture what happened:
> >
> > # cd /export/Portfolio/ci (TAB)
> > http://oi40.tinypic.com/b3lsog.jpg
> >
> > # cd /export/Portfolio/cifs_share
> > http://oi42.tinypic.com/6e40op.jpg
> >
> > # ls /export/Portfolio/cifs_share
> > http://oi42.tinypic.com/23rn60j.jpg
> >
> >
> > And this was Andriy's response:
> > Hmm, so it adds the "FreeBSD" string twice.
> > I am not sure what that means, consider sharing this result with the public,
> > maybe someone will have a better idea.
>
>
> Ah, hah, no wonder there is a panic:
> static __inline ksiddomain_t *
> ksid_lookupdomain(const char *domain)
> {
> 	ksiddomain_t *kd;
>
> 	kd = kmem_alloc(sizeof(*kd), KM_SLEEP);
> 	strlcpy(kd->kd_name, "FreeBSD", sizeof(kd->kd_name));
> 	return (kd);
> }
>
>
> So, no matter what input domain value is, the returned ksiddomain_t is going to
> have kd_name of "FreeBSD". Basically it means that if an on-disk fuid_nvlist
> has more than one entry then we always are going to hit this panic. Not good.

Yeah. Karli, could you try the patch below?

	http://people.freebsd.org/~pjd/patches/zfs_sid.h.patch

-- 
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am!
http://yomoli.com --idY8LE8SD6/8DnRI Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEYEARECAAYFAk7t4fwACgkQForvXbEpPzS9GACgkhXq2YkBld7bdCZCKTrWwbdb jF0Anjy9HvcQzV4aM9lL8jh2qokTc9J0 =wuEf -----END PGP SIGNATURE----- --idY8LE8SD6/8DnRI-- From owner-freebsd-fs@FreeBSD.ORG Mon Dec 19 00:59:15 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3EFDE106564A; Mon, 19 Dec 2011 00:59:15 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id D4D818FC14; Mon, 19 Dec 2011 00:59:14 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap4EAICL7k6DaFvO/2dsb2JhbABDFoR2p1KCHARSNQINGQKIdKVMkHmBL4c7ggSBFgSINoxIkkw X-IronPort-AV: E=Sophos;i="4.71,373,1320642000"; d="scan'208";a="148998857" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 18 Dec 2011 19:59:13 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id CEAA2B3F77; Sun, 18 Dec 2011 19:59:13 -0500 (EST) Date: Sun, 18 Dec 2011 19:59:13 -0500 (EST) From: Rick Macklem To: freebsd-fs@freebsd.org Message-ID: <255844377.375232.1324256353832.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: John Subject: NFS client UDP retransmit timer busted for 8.n/9.n (patch) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Dec 2011 00:59:15 
-0000

Thanks to recent detective work done by jwd@, a problem w.r.t. retransmit timeouts for UDP mounts (both old and new NFS clients) has been identified.

The kernel RPC has two timeouts for UDP:
1 - a timeout that causes the RPC request to be retransmitted on the same socket, using the same xid. This one defaults to 3 seconds and can be set via CLSET_RETRY_TIMEOUT. (This is always the default of 3 seconds for FreeBSD currently.)
2 - a timeout that causes the socket to be destroyed and a fresh one created. The request is then sent on this new socket, with a different xid.

The problem with #2 is that the retransmitted RPC request will miss a server's Duplicate Request Cache (DRC), because of the different xid. As such, #2 should be much larger than #1. However, #2 defaults to 1 second (i.e. smaller than #1 -> trouble!)

One way to avoid this problem is to set #2 to a much larger value via the "timeout=" mount option. (Btw, the value is in 1/10 seconds, so "timeout=600" sets it to 60sec.)

I now have a patch that I believe deals with this correctly. It sets #1 to the "timeout=" value (default 1 second) and #2 to a much larger value. (#2 timeouts are what the kernel RPC counts as retries, so for "soft" mounts, I set #2 to "nm_retry * nm_timeout / 2" and "retries = 2", so that it fails after "nm_retry * nm_timeout", which I think is the correct semantics.)

This patch is attached and is also available at:
  http://people.freebsd.org/~rmacklem/udp-timer.patch
(jwd@, this patch is updated from what I emailed you, so you probably want it:-)

In summary, if you are using NFS mounts over UDP on FreeBSD8 or 9 systems, you either want to use "timeout=600" or try the patch. You are pretty badly broken otherwise.

Hopefully, this patch can make it into -current/head soon, rick
ps: jhb@, could you maybe review this, thanks, rick.
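The timer arithmetic in the message above can be sketched in a few lines of C. This is only an illustration of the relationships as described there, not code from the patch: the struct, the function names and the choice of tenths of a second as the unit (borrowed from the "timeout=" mount option) are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical container for the two UDP timers discussed above.
 * None of these names come from the kernel RPC or NFS client code.
 * Times are in tenths of a second, like the "timeout=" mount option.
 */
struct udp_timers {
	int retransmit_ds;	/* #1: resend on the same socket, same xid */
	int reconnect_ds;	/* #2: new socket, new xid (misses the DRC) */
	int retries;		/* number of #2 timeouts before a soft mount fails */
};

/* Soft-mount settings as the message describes them (an interpretation). */
static struct udp_timers
patched_soft_timers(int nm_timeout_ds, int nm_retry)
{
	struct udp_timers t;

	assert(nm_timeout_ds > 0 && nm_retry > 0);
	t.retransmit_ds = nm_timeout_ds;		/* #1 = "timeout=" */
	t.reconnect_ds = nm_retry * nm_timeout_ds / 2;	/* #2 */
	t.retries = 2;
	return (t);
}

/* True when #2 fires later than #1, so a resend keeps its xid first. */
static bool
reconnect_after_retransmit(const struct udp_timers *t)
{
	return (t->reconnect_ds > t->retransmit_ds);
}

/* Time until a soft mount gives up: retries * #2. */
static int
total_failure_ds(const struct udp_timers *t)
{
	return (t->retries * t->reconnect_ds);
}
```

With the defaults quoted in the message (#1 = 30, #2 = 10 tenths of a second), reconnect_after_retransmit() is false, which is the reported problem. With nm_timeout = 10 and nm_retry = 10 (values picked only for the example) the sketch gives #2 = 50 and a total of 100 tenths, i.e. nm_retry * nm_timeout.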
From owner-freebsd-fs@FreeBSD.ORG Mon Dec 19 07:05:29 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D92D21065672; Mon, 19 Dec 2011 07:05:29 +0000 (UTC) (envelope-from Karli.Sjoberg@slu.se) Received: from Edge1-2.slu.se (edge1-2.slu.se [193.10.100.97]) by mx1.freebsd.org (Postfix) with ESMTP id 4180F8FC14; Mon, 19 Dec 2011 07:05:28 +0000 (UTC) Received: from Exchange1.ad.slu.se (193.10.100.94) by Edge1-2.slu.se (193.10.100.97) with Microsoft SMTP Server (TLS) id 8.3.213.0; Mon, 19 Dec 2011 08:05:25 +0100 Received: from exmbx3.ad.slu.se ([193.10.100.93]) by Exchange1.ad.slu.se ([193.10.100.94]) with mapi; Mon, 19 Dec 2011 08:05:25 +0100 From: =?Windows-1252?Q?Karli_Sj=F6berg?= To: Pawel Jakub Dawidek Date: Mon, 19 Dec 2011 08:05:24 +0100 Thread-Topic: Consistant panics trying to access zfs filesystems replicated from Sun/Oracle appliance Thread-Index: Acy+HJTS3cv68xpkQIitL8xpVNh+Kg== Message-ID: References: <82B38DBF-DD3A-46CD-93F6-02CDB6506E05@slu.se> <73B607D0-8C3E-4980-B901-B986E060D32E@slu.se> <4EE7754D.3050605@FreeBSD.org> <558C926F-14FA-458D-BB8E-D20BA46BE6D2@slu.se> <4EE9EC6F.2080808@FreeBSD.org> <20111218125212.GE1685@garage.freebsd.pl> In-Reply-To: <20111218125212.GE1685@garage.freebsd.pl> Accept-Language: sv-SE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: sv-SE, en-US MIME-Version: 1.0 Content-Type: text/plain; charset="Windows-1252" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: "fs@freebsd.org" , =?Windows-1252?Q?Martin_Matu=9Aka?= , Andriy Gapon Subject: Re: Consistant panics trying to access zfs filesystems replicated from Sun/Oracle appliance X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Dec 2011 
07:05:29 -0000

Hi Pawel,

thank you so much for the patch, I'll try it right away. Do I need to recompile anything afterwards, or does this patch just magically solve everything right away?

/Karli

On 18 Dec 2011, at 13:52, Pawel Jakub Dawidek wrote:

On Thu, Dec 15, 2011 at 02:47:43PM +0200, Andriy Gapon wrote:
on 15/12/2011 13:59 Karli Sjöberg said the following:
Hi all,

with the help of Andriy Gapon, I managed to capture what happened:

# cd /export/Portfolio/ci (TAB)
http://oi40.tinypic.com/b3lsog.jpg

# cd /export/Portfolio/cifs_share
http://oi42.tinypic.com/6e40op.jpg

# ls /export/Portfolio/cifs_share
http://oi42.tinypic.com/23rn60j.jpg

And this was Andriy's response:
Hmm, so it adds the "FreeBSD" string twice.
I am not sure what that means, consider sharing this result with the public,
maybe someone will have a better idea.

Ah, hah, no wonder there is a panic:
static __inline ksiddomain_t *
ksid_lookupdomain(const char *domain)
{
	ksiddomain_t *kd;

	kd = kmem_alloc(sizeof(*kd), KM_SLEEP);
	strlcpy(kd->kd_name, "FreeBSD", sizeof(kd->kd_name));
	return (kd);
}

So, no matter what input domain value is, the returned ksiddomain_t is going to
have kd_name of "FreeBSD". Basically it means that if an on-disk fuid_nvlist
has more than one entry then we always are going to hit this panic. Not good.

Yeah. Karli, could you try the patch below?

	http://people.freebsd.org/~pjd/patches/zfs_sid.h.patch

--
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am!
http://yomoli.com

Best regards
-------------------------------------------------------------------------------
Karli Sjöberg
Swedish University of Agricultural Sciences
Box 7079 (Visiting Address Kronåsvägen 8)
S-750 07 Uppsala, Sweden
Phone: +46-(0)18-67 15 66
karli.sjoberg@slu.se

From owner-freebsd-fs@FreeBSD.ORG Mon Dec 19 07:44:05 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C5F16106564A; Mon, 19 Dec 2011 07:44:05 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.dawidek.net (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id 38E4C8FC12; Mon, 19 Dec 2011 07:44:04 +0000 (UTC) Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149]) by mail.dawidek.net (Postfix) with ESMTPSA id 0095A1E2; Mon, 19 Dec 2011 08:44:01 +0100 (CET) Date: Mon, 19 Dec 2011 08:42:54 +0100 From: Pawel Jakub Dawidek To: Karli =?iso-8859-1?Q?Sj=F6berg?= Message-ID: <20111219074254.GB13434@garage.freebsd.pl> References: <82B38DBF-DD3A-46CD-93F6-02CDB6506E05@slu.se> <73B607D0-8C3E-4980-B901-B986E060D32E@slu.se> <4EE7754D.3050605@FreeBSD.org> <558C926F-14FA-458D-BB8E-D20BA46BE6D2@slu.se> <4EE9EC6F.2080808@FreeBSD.org> <20111218125212.GE1685@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="PuGuTyElPB9bOcsM" Content-Disposition: inline In-Reply-To: X-OS: FreeBSD 9.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) Cc: "fs@freebsd.org" , Martin =?utf-8?Q?Matu=C5=A1ka?= , Andriy Gapon Subject: Re: Consistant panics trying to access zfs filesystems replicated from Sun/Oracle appliance X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Dec 2011 07:44:05 -0000
--PuGuTyElPB9bOcsM Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Mon, Dec 19, 2011 at 08:05:24AM +0100, Karli Sjöberg wrote:
> Hi Pawel,
>
> thank you so much for the patch, I'll try it right away. Do I need to recompile anything afterwards, or does this patch just magically solve everything right away?

You need to recompile at very least zfs.ko module. It should solve the panic you are seeing, but I can't tell if this would be the last.

-- 
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com

--PuGuTyElPB9bOcsM Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)

iEYEARECAAYFAk7u6v4ACgkQForvXbEpPzS9YACeNnCao+bGXekVBlxq2ojmVfj+
t5kAoLb2Dd0lYJQ5TQZsXgLkx5/WKEof
=tF7V
-----END PGP SIGNATURE-----

--PuGuTyElPB9bOcsM--

From owner-freebsd-fs@FreeBSD.ORG Mon Dec 19 09:32:57 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D94241065672; Mon, 19 Dec 2011 09:32:57 +0000 (UTC) (envelope-from Karli.Sjoberg@slu.se) Received: from edge1-1.slu.se (edge1-1.slu.se [193.10.100.96]) by mx1.freebsd.org (Postfix) with ESMTP id 50F4C8FC15; Mon, 19 Dec 2011 09:32:56 +0000 (UTC) Received: from Exchange2.ad.slu.se (193.10.100.95) by edge1-1.slu.se (193.10.100.96) with Microsoft SMTP Server (TLS) id 8.3.213.0; Mon, 19 Dec 2011 10:32:54 +0100 Received: from exmbx3.ad.slu.se ([193.10.100.93]) by Exchange2.ad.slu.se ([193.10.100.95]) with mapi; Mon, 19 Dec 2011 10:32:53 +0100 From: =?Windows-1252?Q?Karli_Sj=F6berg?= To: Pawel Jakub Dawidek Date: Mon, 19 Dec 2011 10:32:52 +0100 Thread-Topic: Consistant panics trying to access zfs filesystems replicated from Sun/Oracle appliance Thread-Index: Acy+MS65tSSUxmCKS5yVDlH01HBWCA== Message-ID:
<12A0B28F-A715-4D64-A050-1468E7948AB8@slu.se> References: <82B38DBF-DD3A-46CD-93F6-02CDB6506E05@slu.se> <73B607D0-8C3E-4980-B901-B986E060D32E@slu.se> <4EE7754D.3050605@FreeBSD.org> <558C926F-14FA-458D-BB8E-D20BA46BE6D2@slu.se> <4EE9EC6F.2080808@FreeBSD.org> <20111218125212.GE1685@garage.freebsd.pl> <20111219074254.GB13434@garage.freebsd.pl> In-Reply-To: <20111219074254.GB13434@garage.freebsd.pl> Accept-Language: sv-SE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: sv-SE, en-US MIME-Version: 1.0 Content-Type: text/plain; charset="Windows-1252" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: "fs@freebsd.org" , =?Windows-1252?Q?Martin_Matu=9Aka?= , Andriy Gapon Subject: Re: Consistant panics trying to access zfs filesystems replicated from Sun/Oracle appliance X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Dec 2011 09:32:57 -0000

On 19 Dec 2011, at 08:42, Pawel Jakub Dawidek wrote:

On Mon, Dec 19, 2011 at 08:05:24AM +0100, Karli Sjöberg wrote:
Hi Pawel,

thank you so much for the patch, I'll try it right away. Do I need to recompile anything afterwards, or does this patch just magically solve everything right away?

You need to recompile at very least zfs.ko module. It should solve the panic you are seeing, but I can't tell if this would be the last.

--
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com

After recompiling, rebooting and running a "find" in a 9TB filesystem, I am happy to be able to say it totally worked! Awesome guys, thank you so much! When do you think this can be merged into 8.2-STABLE or 9.0?

Best regards
-------------------------------------------------------------------------------
Karli Sjöberg
Swedish University of Agricultural Sciences
Box 7079 (Visiting Address Kronåsvägen 8)
S-750 07 Uppsala, Sweden
Phone: +46-(0)18-67 15 66
karli.sjoberg@slu.se

From owner-freebsd-fs@FreeBSD.ORG Mon Dec 19 11:07:04 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EDD531065672 for ; Mon, 19 Dec 2011 11:07:04 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id DAD418FC0C for ; Mon, 19 Dec 2011 11:07:04 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id pBJB74Dh010931 for ; Mon, 19 Dec 2011 11:07:04 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id pBJB74rc010929 for freebsd-fs@FreeBSD.org; Mon, 19 Dec 2011 11:07:04 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 19 Dec 2011 11:07:04 GMT Message-Id: <201112191107.pBJB74rc010929@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Dec 2011 11:07:05 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/162944  fs         [coda] Coda file system module looks broken in 9.0
o kern/162860  fs         [zfs] Cannot share ZFS filesystem to hosts with a hyph
o kern/162751  fs         [zfs] [panic] kernel panics during file operations
o kern/162591  fs         [nullfs] cross-filesystem nullfs does not work as expe
o kern/162519  fs         [zfs] "zpool import" relies on buggy realpath() behavi
o kern/162362  fs         [snapshots] [panic] ufs with snapshot(s) panics when g
o kern/162083  fs         [zfs] [panic] zfs unmount -f pool
o kern/161968  fs         [zfs] [hang] renaming snapshot with -r including a zvo
o kern/161897  fs         [zfs] [patch] zfs partition probing causing long delay
o kern/161864  fs         [ufs] removing journaling from UFS partition fails on
o bin/161807   fs         [patch] add option for explicitly specifying metadata
o kern/161674  fs         [ufs] snapshot on journaled ufs doesn't work
o kern/161579  fs         [smbfs] FreeBSD sometimes panics when an smb share is
o kern/161533  fs         [zfs] [panic] zfs receive panic: system ioctl returnin
o kern/161511  fs         [unionfs] Filesystem deadlocks when using multiple uni
o kern/161438  fs         [zfs] [panic] recursed on non-recursive spa_namespace_
o kern/161424  fs         [nullfs] __getcwd() calls fail when used on nullfs mou
o kern/161280  fs         [zfs] Stack overflow in gptzfsboot
o kern/161205  fs         [nfs] [pfsync] [regression] [build] Bug report freebsd
o kern/161169  fs         [zfs] [panic] ZFS causes kernel panic in dbuf_dirty
o kern/161112  fs         [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3
o kern/160893  fs         [zfs] [panic] 9.0-BETA2 kernel panic
o kern/160860  fs         Random UFS root filesystem corruption with SU+J [regre
o kern/160801  fs         [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o
o kern/160790  fs         [fusefs] [panic] VPUTX: negative ref count with FUSE
o kern/160777  fs         [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo
o kern/160706  fs         [zfs] zfs bootloader fails when a non-root vdev exists
o kern/160591  fs         [zfs] Fail to boot on zfs root with degraded raidz2 [r
o kern/160410  fs         [smbfs] [hang] smbfs hangs when transferring large fil
o kern/160283  fs         [zfs] [patch] 'zfs list' does abort in make_dataset_ha
o kern/159971  fs         [ffs] [panic] panic with soft updates journaling durin
o kern/159930  fs         [ufs] [panic] kernel core
o kern/159402  fs         [zfs][loader] symlinks cause I/O errors
o kern/159357  fs         [zfs] ZFS MAXNAMELEN macro has confusing name (off-by-
o kern/159356  fs         [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s
o kern/159351  fs         [nfs] [patch] - divide by zero in mountnfs()
o kern/159251  fs         [zfs] [request]: add FLETCHER4 as DEDUP hash option
o kern/159077  fs         [zfs] Can't cd .. with latest zfs version
o kern/159048  fs         [smbfs] smb mount corrupts large files
o kern/159045  fs         [zfs] [hang] ZFS scrub freezes system
o kern/158839  fs         [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802  fs         amd(8) ICMP storm and unkillable process.
o kern/158711  fs         [ffs] [panic] panic in ffs_blkfree and ffs_valloc
o kern/158231  fs         [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929  fs         [nfs] NFS slow read
o kern/157722  fs         [geli] unable to newfs a geli encrypted partition
o kern/157399  fs         [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179  fs         [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156797  fs         [zfs] [panic] Double panic with FreeBSD 9-CURRENT and
o kern/156781  fs         [zfs] zfs is losing the snapshot directory,
p kern/156545  fs         [ufs] mv could break UFS on SMP systems
o kern/156193  fs         [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156039  fs         [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615  fs         [zfs] zfs v28 broken on sparc64 -current
o kern/155587  fs         [zfs] [panic] kernel panic with zfs
f kern/155411  fs         [regression] [8.2-release] [tmpfs]: mount: tmpfs : No
o kern/155199  fs         [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104   fs         [zfs][patch] use /dev prefix by default when importing
o kern/154930  fs         [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828  fs         [msdosfs] Unable to create directories on external USB
o kern/154491  fs         [smbfs] smb_co_lock: recursive lock for object 1
p kern/154228  fs         [md] md getting stuck in wdrain state
o kern/153996  fs         [zfs] zfs root mount error while kernel is not located
o kern/153847  fs         [nfs] [panic] Kernel panic from incorrect m_free in nf
o kern/153753  fs         [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716  fs         [zfs] zpool scrub time remaining is incorrect
o kern/153695  fs         [patch] [zfs] Booting from zpool created on 4k-sector
o kern/153680  fs         [xfs] 8.1 failing to mount XFS partitions
o kern/153520  fs         [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418  fs         [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351  fs         [zfs] locking directories/files in ZFS
o bin/153258   fs         [patch][zfs] creating ZVOLs requires `refreservation'
s kern/153173  fs         [zfs] booting from a gzip-compressed dataset doesn't w
o kern/153126  fs         [zfs] vdev failure, zpool=peegel type=vdev.too_small
o kern/152022  fs         [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs         [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs         [zfs] page fault under load in /sbin/zfs
o bin/151713   fs         [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs         [zfs] disk wait bug
o kern/151629  fs         [fs] [patch] Skip empty directory entries during name
o kern/151330  fs         [zfs] will unshare all zfs filesystem after execute a
o kern/151326  fs         [nfs] nfs exports fail if netgroups contain duplicate
o kern/151251  fs         [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs         [zfs] can't delete zfs snapshot
o kern/151111  fs         [zfs] vnodes leakage during zfs unmount
o kern/150503  fs         [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs         [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs         [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs         [nfs] mountd/nfsd became confused; refused to reload n
o kern/149208  fs         mksnap_ffs(8) hang/deadlock
o kern/149173  fs         [patch] [zfs] make OpenSolaris installa
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147560  fs         [zfs] [boot] Booting 8.1-PRERELEASE raidz system take
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an
f bin/145309   fs         bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs         [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
p bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs         [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs         [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/138662  fs         [panic] ffs_blkfree: freeing free block
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor" on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs         [nullfs] Error by check nullfs
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/127787  fs         [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270   fs         fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs         [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
o kern/123939  fs         [msdosfs] corrupts new files
f sparc/123566 fs         [zfs] zpool import issue: EOVERFLOW
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o bin/118249   fs         [ufs] mv(1): moving a directory changes its mtime
o kern/118126  fs         [nfs] [patch] Poor NFS server write performance
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs         [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs         [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs         lack of fsck_cd9660 prevents mounting iso images with
o kern/116583  fs         [ffs] [hang] System freezes for short time when using
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs         [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat
o kern/109010  fs         [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs         [ufs] left-over fsck_snapshot after unfinished backgro
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear
o kern/101324  fs         [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs         [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs         [request] newfs(8) has no option to clear the first 12
o kern/97377   fs         [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs         [cd9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs         [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs         fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs         [smbfs] smbfs may cause double unlock
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134   fs         [smbfs] [patch] Preserve access and modification time
a kern/90815   fs         [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs         [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs         [panic] ffs_blkfree: freeing free frag on AMD 64
o kern/88266   fs         [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o bin/87966    fs         [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs         [smbfs] System reboot while umount smbfs.
o kern/86587   fs         [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs         fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/80088   fs         [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs         Background-fsck checks one filesystem twice and omits
o kern/73484   fs         [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs         [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs         [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs         fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs         [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs         [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs         [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs         [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs         [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs         [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs         [smbfs] System reboot with dead smb mount and umount
o bin/27687    fs         fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs         [2TB] 32bit NFS servers export wrong negative values t

256 problems total.
From owner-freebsd-fs@FreeBSD.ORG Mon Dec 19 13:04:55 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6B548106566B for ; Mon, 19 Dec 2011 13:04:55 +0000 (UTC) (envelope-from hugo@barafranca.com) Received: from mail.barafranca.com (mail.barafranca.com [67.213.67.47]) by mx1.freebsd.org (Postfix) with ESMTP id 3FE2B8FC15 for ; Mon, 19 Dec 2011 13:04:55 +0000 (UTC) Received: from localhost (unknown [172.16.100.24]) by mail.barafranca.com (Postfix) with ESMTP id D0A55745 for ; Mon, 19 Dec 2011 12:47:08 +0000 (UTC) X-Virus-Scanned: amavisd-new at barafranca.com Received: from mail.barafranca.com ([172.16.100.24]) by localhost (mail.barafranca.com [172.16.100.24]) (amavisd-new, port 10024) with ESMTP id EBogoNmhMnEG for ; Mon, 19 Dec 2011 12:46:30 +0000 (UTC) Received: from [192.168.1.1] (static-b4-252-232.telepac.pt [81.193.252.232]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.barafranca.com (Postfix) with ESMTPSA id 54803735 for ; Mon, 19 Dec 2011 12:46:30 +0000 (UTC) Message-ID: <4EEF321E.5090806@barafranca.com> Date: Mon, 19 Dec 2011 12:46:22 +0000 From: Hugo Silva User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101208 Thunderbird/3.1.7 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.ORG Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Subject: ZFS: root pool considerations, multiple pools on the same disk X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Dec 2011 13:04:55 -0000 Hello, I've been doing some tests with 9.0 RC3 and ZFS. This particular server has 6 disks and a single zpool on mfid?p3 partitions, which I've temporarily arranged into a raid-10 with 2 spares, single pool. 
I've been thinking about whether it makes sense to separate the rpool from the data pool(s).. It seems to make sense at some levels to separate the rpool from the app/user data, but to me it's less clear whether this is a good idea or not when the backing disks are all the same. One idea would be creating a 4-way mirror on small partitions for the rpool (sturdier), and a zfs raid-10 on the remaining larger partition. I'm curious about the performance implications (if any) of having >1 zpools on the same disks (considering that during normal usage, it'll be the data pool seeing 99.999% of the action) and whether anyone has thought the same and/or applied this concept in production. Regards, Hugo From owner-freebsd-fs@FreeBSD.ORG Mon Dec 19 22:16:17 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id 5B4581065670; Mon, 19 Dec 2011 22:16:17 +0000 (UTC) Date: Mon, 19 Dec 2011 22:16:17 +0000 From: Alexander Best To: freebsd-current@freebsd.org Message-ID: <20111219221617.GA70383@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Cc: freebsd-fs@freebsd.org Subject: can a wrong alignment cause a decrease in a hdd's life expectancy? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Dec 2011 22:16:17 -0000 hi there, i'm using a usb hdd with the following specs: otaku% sudo smartctl -i /dev/da0 smartctl 5.42 2011-10-20 r3458 [FreeBSD 10.0-CURRENT amd64] (local build) Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net === START OF INFORMATION SECTION === Model Family: Western Digital My Passport Essential SE (USB, Adv. 
Format) Device Model: WDC WD10TMVW-11ZSMS4 Serial Number: WD-WXJ1A81C1845 LU WWN Device Id: 5 0014ee 1af1e4483 Firmware Version: 01.01A01 User Capacity: 1,000,204,886,016 bytes [1,00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Device is: In smartctl database [for details use: -P show] ATA Version is: 8 ATA Standard is: Exact ATA specification draft version not indicated Local Time is: Mon Dec 19 23:00:43 2011 CET SMART support is: Available - device has SMART capability. SMART support is: Enabled unfortunately i didn't align it properly using gpart(8)'s -a switch. performance wise it shouldn't cause any issues, because i'm accessing this hdd through usb 2 exclusively. however my concern is that using an alignment of 512 will put an extra workload onto the hdd (doing the conversion -> 4096). will this reduce my hdd's life expectancy? in that case i might consider re-partitioning it (with proper alignment settings). cheers. alex ps: the hdd only gets mounted read-only! From owner-freebsd-fs@FreeBSD.ORG Mon Dec 19 22:42:01 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 946BA106566C; Mon, 19 Dec 2011 22:42:01 +0000 (UTC) (envelope-from phk@phk.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 0A5A48FC0C; Mon, 19 Dec 2011 22:42:00 +0000 (UTC) Received: from critter.freebsd.dk (critter.freebsd.dk [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id 3D2C05DD3; Mon, 19 Dec 2011 22:22:22 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.5/8.14.5) with ESMTP id pBJMMMjF042200; Mon, 19 Dec 2011 22:22:22 GMT (envelope-from phk@phk.freebsd.dk) To: Alexander Best From: "Poul-Henning Kamp" In-Reply-To: Your message of "Mon, 19 Dec 2011 22:16:17 GMT." 
<20111219221617.GA70383@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1 Date: Mon, 19 Dec 2011 22:22:22 +0000 Message-ID: <42198.1324333342@critter.freebsd.dk> Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: can a wrong alignment cause a decrease in a hdd's life expectancy? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Dec 2011 22:42:01 -0000 In message <20111219221617.GA70383@freebsd.org>, Alexander Best writes: >ps: the hdd only gets mounted read-only! There is no known wear-effects in flash storage as long as you only read. You may need to do refresh-writes every 5-10 years to avoid tunnel-leakage bit errors, but most flash controllers use semi-long ECC syndromes and will do so on first bit that gives an read error. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-fs@FreeBSD.ORG Mon Dec 19 22:47:00 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id 1F42F1065672; Mon, 19 Dec 2011 22:47:00 +0000 (UTC) Date: Mon, 19 Dec 2011 22:47:00 +0000 From: Alexander Best To: Poul-Henning Kamp Message-ID: <20111219224700.GA75581@freebsd.org> References: <20111219221617.GA70383@freebsd.org> <42198.1324333342@critter.freebsd.dk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <42198.1324333342@critter.freebsd.dk> Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: can a wrong alignment cause a decrease in a hdd's life expectancy? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Dec 2011 22:47:00 -0000 On Mon Dec 19 11, Poul-Henning Kamp wrote: > In message <20111219221617.GA70383@freebsd.org>, Alexander Best writes: > > >ps: the hdd only gets mounted read-only! > > There is no known wear-effects in flash storage as long as you > only read. > > You may need to do refresh-writes every 5-10 years to avoid > tunnel-leakage bit errors, but most flash controllers use semi-long > ECC syndromes and will do so on first bit that gives an read error. this is a regular hdd i believe -- no ssd. at least when i plug it into my usb drive i hear the hdd spinning up and causing vibrations. i don't think that would be the case with an ssd. > > -- > Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 > phk@FreeBSD.ORG | TCP/IP since RFC 956 > FreeBSD committer | BSD since 4.3-tahoe > Never attribute to malice what can adequately be explained by incompetence. 
From owner-freebsd-fs@FreeBSD.ORG Mon Dec 19 22:49:30 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 467221065672; Mon, 19 Dec 2011 22:49:30 +0000 (UTC) (envelope-from phk@phk.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id F32CC8FC0C; Mon, 19 Dec 2011 22:49:29 +0000 (UTC) Received: from critter.freebsd.dk (critter.freebsd.dk [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id E81255DAA; Mon, 19 Dec 2011 22:49:28 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.5/8.14.5) with ESMTP id pBJMnSU3032198; Mon, 19 Dec 2011 22:49:28 GMT (envelope-from phk@phk.freebsd.dk) To: Alexander Best From: "Poul-Henning Kamp" In-Reply-To: Your message of "Mon, 19 Dec 2011 22:47:00 GMT." <20111219224700.GA75581@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1 Date: Mon, 19 Dec 2011 22:49:28 +0000 Message-ID: <32197.1324334968@critter.freebsd.dk> Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: can a wrong alignment cause a decrease in a hdd's life expectancy? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Dec 2011 22:49:30 -0000 In message <20111219224700.GA75581@freebsd.org>, Alexander Best writes: >On Mon Dec 19 11, Poul-Henning Kamp wrote: >> In message <20111219221617.GA70383@freebsd.org>, Alexander Best writes: >> >> >ps: the hdd only gets mounted read-only! >> >> There is no known wear-effects in flash storage as long as you >> only read. >> >> You may need to do refresh-writes every 5-10 years to avoid >> tunnel-leakage bit errors, but most flash controllers use semi-long >> ECC syndromes and will do so on first bit that gives an read error. 
> >this is a regular hdd i believe -- no ssd. at least when i plug it into my >usb drive i hear the hdd spinning up and causing vibrations. i don't think >that would be the case with an ssd. Ahh, sorry, I don't know why I thought it was flash. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-fs@FreeBSD.ORG Mon Dec 19 22:56:33 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id 3C98B1065673; Mon, 19 Dec 2011 22:56:33 +0000 (UTC) Date: Mon, 19 Dec 2011 22:56:33 +0000 From: Alexander Best To: Poul-Henning Kamp Message-ID: <20111219225633.GA77147@freebsd.org> References: <20111219224700.GA75581@freebsd.org> <32197.1324334968@critter.freebsd.dk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <32197.1324334968@critter.freebsd.dk> Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: can a wrong alignment cause a decrease in a hdd's life expectancy? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Dec 2011 22:56:33 -0000 On Mon Dec 19 11, Poul-Henning Kamp wrote: > In message <20111219224700.GA75581@freebsd.org>, Alexander Best writes: > >On Mon Dec 19 11, Poul-Henning Kamp wrote: > >> In message <20111219221617.GA70383@freebsd.org>, Alexander Best writes: > >> > >> >ps: the hdd only gets mounted read-only! > >> > >> There is no known wear-effects in flash storage as long as you > >> only read. > >> > >> You may need to do refresh-writes every 5-10 years to avoid > >> tunnel-leakage bit errors, but most flash controllers use semi-long > >> ECC syndromes and will do so on first bit that gives an read error. 
> > > >this is a regular hdd i believe -- no ssd. at least when i plug it into my > >usb drive i hear the hdd spinning up and causing vibrations. i don't think > >that would be the case with an ssd. > > Ahh, sorry, I don't know why I thought it was flash. no problem. so will the improper alignment also not cause a life expectancy shortage in case of a hdd (non-flash-based)? and one other question: the hdd also supports usb 3. will the improper alignment have any effect (speed wise) when connected via usb 3, or is even usb 3 too slow to notice the performance drop due to the improper alignment? cheers. alex > > > -- > Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 > phk@FreeBSD.ORG | TCP/IP since RFC 956 > FreeBSD committer | BSD since 4.3-tahoe > Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-fs@FreeBSD.ORG Mon Dec 19 23:01:16 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AE1E9106566B; Mon, 19 Dec 2011 23:01:16 +0000 (UTC) (envelope-from phk@phk.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 697848FC0A; Mon, 19 Dec 2011 23:01:16 +0000 (UTC) Received: from critter.freebsd.dk (critter.freebsd.dk [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id 57F6E5DBE; Mon, 19 Dec 2011 23:01:15 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.5/8.14.5) with ESMTP id pBJN1FXV078666; Mon, 19 Dec 2011 23:01:15 GMT (envelope-from phk@phk.freebsd.dk) To: Alexander Best From: "Poul-Henning Kamp" In-Reply-To: Your message of "Mon, 19 Dec 2011 22:56:33 GMT." 
<20111219225633.GA77147@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1 Date: Mon, 19 Dec 2011 23:01:15 +0000 Message-ID: <78665.1324335675@critter.freebsd.dk> Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: can a wrong alignment cause a decrease in a hdd's life expectancy? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Dec 2011 23:01:16 -0000 In message <20111219225633.GA77147@freebsd.org>, Alexander Best writes: >no problem. so will the improper alignment also not cause a life expectancy >shortage in case of a hdd (non-flash-based)? Well, theoretically you will have more track-to-track seeks, as some blocks will span cylinders, but I doubt that will have measurable impact on lifetime, compared with the gains you could harvest if you spin it down for even just 1 hour a day... Read-Only/Read-Write makes no difference that I know of for hard-disks. >and one other question: the hdd also supports usb 3. will the improper >alignment have any effect (speed wise) when connected via usb 3, or is even >usb 3 too slow to notice the performance drop due to the improper alignment? Again: I doubt it will be measurable. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
From owner-freebsd-fs@FreeBSD.ORG Mon Dec 19 23:20:12 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 66AF41065686 for ; Mon, 19 Dec 2011 23:20:12 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta10.emeryville.ca.mail.comcast.net (qmta10.emeryville.ca.mail.comcast.net [76.96.30.17]) by mx1.freebsd.org (Postfix) with ESMTP id 3983C8FC0C for ; Mon, 19 Dec 2011 23:20:12 +0000 (UTC) Received: from omta18.emeryville.ca.mail.comcast.net ([76.96.30.74]) by qmta10.emeryville.ca.mail.comcast.net with comcast id BAVw1i0031bwxycAABL5kz; Mon, 19 Dec 2011 23:20:05 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta18.emeryville.ca.mail.comcast.net with comcast id BBtN1i0121t3BNj8eBtNg6; Mon, 19 Dec 2011 23:53:23 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id B3562102C19; Mon, 19 Dec 2011 15:20:10 -0800 (PST) Date: Mon, 19 Dec 2011 15:20:10 -0800 From: Jeremy Chadwick To: Alexander Best Message-ID: <20111219232010.GA31612@icarus.home.lan> References: <20111219224700.GA75581@freebsd.org> <32197.1324334968@critter.freebsd.dk> <20111219225633.GA77147@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20111219225633.GA77147@freebsd.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, Poul-Henning Kamp , freebsd-current@freebsd.org Subject: Re: can a wrong alignment cause a decrease in a hdd's life expectancy? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Dec 2011 23:20:12 -0000 On Mon, Dec 19, 2011 at 10:56:33PM +0000, Alexander Best wrote: > On Mon Dec 19 11, Poul-Henning Kamp wrote: > > In message <20111219224700.GA75581@freebsd.org>, Alexander Best writes: > > >On Mon Dec 19 11, Poul-Henning Kamp wrote: > > >> In message <20111219221617.GA70383@freebsd.org>, Alexander Best writes: > > >> > > >> >ps: the hdd only gets mounted read-only! > > >> > > >> There is no known wear-effects in flash storage as long as you > > >> only read. > > >> > > >> You may need to do refresh-writes every 5-10 years to avoid > > >> tunnel-leakage bit errors, but most flash controllers use semi-long > > >> ECC syndromes and will do so on first bit that gives an read error. > > > > > >this is a regular hdd i believe -- no ssd. at least when i plug it into my > > >usb drive i hear the hdd spinning up and causing vibrations. i don't think > > >that would be the case with an ssd. > > > > Ahh, sorry, I don't know why I thought it was flash. > > no problem. so will the improper alignment also not cause a life expectancy > shortage in case of a hdd (non-flash-based)? The improper alignment will result in sub-par write performance, and a slight decrease in read performance writes -- but will not impact life expectancy or "harm" the drive in any way. I recommend strongly that you rectify the situation before you get too carried away with software installations, etc.. And yes I am aware what you have is a mechanical HDD not an SSD (I say in this advance of what I'm about to write). If you need a ""safe"" alignment value, most software on Windows (including Windows 7) pick a value of 2MBytes as the alignment offset, which I believe is LBA 4095, since everything software-wise uses 512-byte sectors. That's calculated via: 2097152 / 512. 
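For anyone wanting to check the arithmetic, here is a minimal Python sketch. It assumes the sector sizes reported by smartctl earlier in the thread (512 bytes logical, 4096 bytes physical); the function names are made up for illustration and do not come from gpart(8) or any other tool.

```python
LOGICAL_SECTOR = 512    # bytes per LBA (assumed, from the smartctl output in this thread)
PHYSICAL_SECTOR = 4096  # bytes per physical sector (assumed, same source)


def offset_to_lba(offset_bytes, logical=LOGICAL_SECTOR):
    """Convert a byte offset into the LBA where it starts."""
    if offset_bytes % logical:
        raise ValueError("offset is not a whole number of logical sectors")
    return offset_bytes // logical


def is_aligned(lba, logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """True if a partition starting at this LBA begins on a physical sector boundary."""
    return (lba * logical) % physical == 0


def physical_sectors_touched(start_lba, io_bytes,
                             logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """Count the physical sectors covered by an I/O of io_bytes starting at start_lba."""
    first = (start_lba * logical) // physical
    last = (start_lba * logical + io_bytes - 1) // physical
    return last - first + 1


if __name__ == "__main__":
    two_mib = 2 * 1024 * 1024  # 2097152 bytes
    print("2 MiB offset starts at LBA", offset_to_lba(two_mib))
    for lba in (63, 64, offset_to_lba(two_mib)):
        print("LBA", lba,
              "aligned:", is_aligned(lba),
              "physical sectors per 4096-byte I/O:",
              physical_sectors_touched(lba, 4096))
```

Worked through, 2097152 / 512 comes out to LBA 4096 rather than 4095, and a 4096-byte read starting at LBA 63 touches two physical sectors where one starting at LBA 64 touches one.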
This number is also evenly divisible by 4096 bytes (which is what you're trying to ensure for performance). Readers, as well as you, may wonder where the "magical" 2MByte value comes from, and can you pick something smaller. Yes you can pick something smaller, but the value itself stems from the added complexity of SSDs and NAND erase page size vs. NAND page size. A value of 2MBytes works well on all brands of SSDs on the market (as of this writing). Which reminds me -- I need to go back and redo most of our systems that use Intel SSDs, since at the time I picked the default offset in sysinstall (LBA 63, thus 64 * 512 = 32KBytes), which though divisible by 4096, is not optimal for NAND erase page size. I would love to advocate FreeBSD change sysinstall/bsdinstall to use a default offset of 2MBytes, but I imagine that would upset a lot of people who install FreeBSD on "limited space" devices (CF, etc.). Honestly though, with the size of media these days........ > and one other question: the hdd also supports usb 3. will the improper > alignment have any effect (speed wise) when connected via usb 3, or is even > usb 3 too slow to notice the performance drop due to the improper alignment? USB 3.0 vs. 2.0 vs. eSATA vs. native SATA has no bearing on the situation. Those are transport protocols that define "maximum bandwidth". By the way, the hard disk itself does not "support USB 3.0" -- your drive is in an enclosure that contains a SATA<->USB3.0 conversion chipset inside. If you open the enclosure, you will find the hard disk is SATA, and probably supports SATA600. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Mon Dec 19 23:22:18 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C94E3106566C for ; Mon, 19 Dec 2011 23:22:18 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta09.westchester.pa.mail.comcast.net (qmta09.westchester.pa.mail.comcast.net [76.96.62.96]) by mx1.freebsd.org (Postfix) with ESMTP id 745918FC0A for ; Mon, 19 Dec 2011 23:22:18 +0000 (UTC) Received: from omta15.westchester.pa.mail.comcast.net ([76.96.62.87]) by qmta09.westchester.pa.mail.comcast.net with comcast id B8dA1i0021swQuc59BNJqU; Mon, 19 Dec 2011 23:22:18 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta15.westchester.pa.mail.comcast.net with comcast id BBNH1i01B1t3BNj3bBNJAQ; Mon, 19 Dec 2011 23:22:18 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 65077102C19; Mon, 19 Dec 2011 15:22:16 -0800 (PST) Date: Mon, 19 Dec 2011 15:22:16 -0800 From: Jeremy Chadwick To: Alexander Best Message-ID: <20111219232216.GA31865@icarus.home.lan> References: <20111219224700.GA75581@freebsd.org> <32197.1324334968@critter.freebsd.dk> <20111219225633.GA77147@freebsd.org> <20111219232010.GA31612@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20111219232010.GA31612@icarus.home.lan> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, Poul-Henning Kamp , freebsd-current@freebsd.org Subject: Re: can a wrong alignment cause a decrease in a hdd's life expectancy? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Dec 2011 23:22:18 -0000 On Mon, Dec 19, 2011 at 03:20:10PM -0800, Jeremy Chadwick wrote: > On Mon, Dec 19, 2011 at 10:56:33PM +0000, Alexander Best wrote: > > On Mon Dec 19 11, Poul-Henning Kamp wrote: > > > In message <20111219224700.GA75581@freebsd.org>, Alexander Best writes: > > > >On Mon Dec 19 11, Poul-Henning Kamp wrote: > > > >> In message <20111219221617.GA70383@freebsd.org>, Alexander Best writes: > > > >> > > > >> >ps: the hdd only gets mounted read-only! > > > >> > > > >> There is no known wear-effects in flash storage as long as you > > > >> only read. > > > >> > > > >> You may need to do refresh-writes every 5-10 years to avoid > > > >> tunnel-leakage bit errors, but most flash controllers use semi-long > > > >> ECC syndromes and will do so on first bit that gives an read error. > > > > > > > >this is a regular hdd i believe -- no ssd. at least when i plug it into my > > > >usb drive i hear the hdd spinning up and causing vibrations. i don't think > > > >that would be the case with an ssd. > > > > > > Ahh, sorry, I don't know why I thought it was flash. > > > > no problem. so will the improper alignment also not cause a life expectancy > > shortage in case of a hdd (non-flash-based)? > > The improper alignment will result in sub-par write performance, and a > slight decrease in read performance writes -- but will not impact life > expectancy or "harm" the drive in any way. This should have read "...slight decrease in read performance", not "read performance writes". Editing mistake on my part. :-) -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Mon Dec 19 23:43:10 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id 25E501065688; Mon, 19 Dec 2011 23:43:10 +0000 (UTC) Date: Mon, 19 Dec 2011 23:43:10 +0000 From: Alexander Best To: Jeremy Chadwick Message-ID: <20111219234310.GA84478@freebsd.org> References: <20111219224700.GA75581@freebsd.org> <32197.1324334968@critter.freebsd.dk> <20111219225633.GA77147@freebsd.org> <20111219232010.GA31612@icarus.home.lan> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20111219232010.GA31612@icarus.home.lan> Cc: freebsd-fs@freebsd.org, Poul-Henning Kamp , freebsd-current@freebsd.org Subject: Re: can a wrong alignment cause a decrease in a hdd's life expectancy? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Dec 2011 23:43:10 -0000 On Mon Dec 19 11, Jeremy Chadwick wrote: > On Mon, Dec 19, 2011 at 10:56:33PM +0000, Alexander Best wrote: > > On Mon Dec 19 11, Poul-Henning Kamp wrote: > > > In message <20111219224700.GA75581@freebsd.org>, Alexander Best writes: > > > >On Mon Dec 19 11, Poul-Henning Kamp wrote: > > > >> In message <20111219221617.GA70383@freebsd.org>, Alexander Best writes: > > > >> > > > >> >ps: the hdd only gets mounted read-only! > > > >> > > > >> There is no known wear-effects in flash storage as long as you > > > >> only read. > > > >> > > > >> You may need to do refresh-writes every 5-10 years to avoid > > > >> tunnel-leakage bit errors, but most flash controllers use semi-long > > > >> ECC syndromes and will do so on first bit that gives an read error. > > > > > > > >this is a regular hdd i believe -- no ssd. at least when i plug it into my > > > >usb drive i hear the hdd spinning up and causing vibrations. 
i don't think > > > >that would be the case with an ssd. > > > > > > Ahh, sorry, I don't know why I thought it was flash. > > > > no problem. so will the improper alignment also not cause a life expectancy > > shortage in case of a hdd (non-flash-based)? > > The improper alignment will result in sub-par write performance, and a > slight decrease in read performance writes -- but will not impact life > expectancy or "harm" the drive in any way. > > I recommend strongly that you rectify the situation before you get too > carried away with software installations, etc.. > > And yes I am aware what you have is a mechanical HDD not an SSD (I say > in this advance of what I'm about to write). > > If you need a ""safe"" alignment value, most software on Windows > (including Windows 7) pick a value of 2MBytes as the alignment offset, > which I believe is LBA 4095, since everything software-wise uses > 512-byte sectors. That's calculated via: 2097152 / 512. > > This number is also evenly divisible by 4096 bytes (which is what you're > trying to ensure for performance). > > Readers, as well as you, may wonder where the "magical" 2MByte value > comes from, and can you pick something smaller. Yes you can pick > something smaller, but the value itself stems from the added complexity > of SSDs and NAND erase page size vs. NAND page size. A value of 2MBytes > works well on all brands of SSDs on the market (as of this writing). > > Which reminds me -- I need to go back and redo most of our systems that > use Intel SSDs, since at the time I picked the default offset in > sysinstall (LBA 63, thus 64 * 512 = 32KBytes), which though divisible by > 4096, is not optimal for NAND erase page size. > > I would love to advocate FreeBSD change sysinstall/bsdinstall to use a > default offset of 2MBytes, but I imagine that would upset a lot of > people who install FreeBSD on "limited space" devices (CF, etc.). > Honestly though, with the size of media these days........ 
thanks a lot for the explanation. i'm going to get another drive, soon, and will then be able to "fix" the alignment, as i currently have no place where i can back up the data of my current (misaligned) hdd. > > > and one other question: the hdd also supports usb 3. will the improper > > alignment have any effect (speed wise) when connected via usb 3, or is even > > usb 3 too slow to notice the performance drop due to the improper alignment? > > USB 3.0 vs. 2.0 vs. eSATA vs. native SATA has no bearing on the > situation. Those are transport protocols that define "maximum > bandwidth". > > By the way, the hard disk itself does not "support USB 3.0" -- your > drive is in an enclosure that contains a SATA<->USB3.0 conversion > chipset inside. If you open the enclosure, you will find the hard disk > is SATA, and probably supports SATA600. i was aware of this fact. what i meant by speed in connection with usb 3 was the following example-case (please don't take the numbers literally) 1) the drive itself can do 500 mb/sec when aligned properly 2) the drive does 350 mb/sec when aligned improperly (512 boundary) 3) usb 3 can do 100 mb/sec ... so in this case the improper alignment wouldn't have an impact, since even with proper alignment only 100 mb/sec would be possible. however in the following example: 1) 500 mb/sec 2) 100 mb/sec 3) 200 mb/sec the improper alignment would have an impact, since usb 3 *could* perform at 200 mb/sec with proper alignment, but will drop to 100 mb/sec in the case of improper alignment. again...please don't take the transfer rates literally. they're most definitely bogus. cheers. alex > > -- > | Jeremy Chadwick jdc at parodius.com | > | Parodius Networking http://www.parodius.com/ | > | UNIX Systems Administrator Mountain View, CA, US | > | Making life hard for others since 1977.
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Dec 20 04:47:14 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4402C1065670 for ; Tue, 20 Dec 2011 04:47:14 +0000 (UTC) (envelope-from wblock@wonkity.com) Received: from wonkity.com (wonkity.com [67.158.26.137]) by mx1.freebsd.org (Postfix) with ESMTP id DAA578FC14 for ; Tue, 20 Dec 2011 04:47:13 +0000 (UTC) Received: from wonkity.com (localhost [127.0.0.1]) by wonkity.com (8.14.5/8.14.5) with ESMTP id pBK4AG4h002046; Mon, 19 Dec 2011 21:10:16 -0700 (MST) (envelope-from wblock@wonkity.com) Received: from localhost (wblock@localhost) by wonkity.com (8.14.5/8.14.5/Submit) with ESMTP id pBK4AGO8002043; Mon, 19 Dec 2011 21:10:16 -0700 (MST) (envelope-from wblock@wonkity.com) Date: Mon, 19 Dec 2011 21:10:16 -0700 (MST) From: Warren Block To: Alexander Best In-Reply-To: <20111219225633.GA77147@freebsd.org> Message-ID: References: <20111219224700.GA75581@freebsd.org> <32197.1324334968@critter.freebsd.dk> <20111219225633.GA77147@freebsd.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (wonkity.com [127.0.0.1]); Mon, 19 Dec 2011 21:10:16 -0700 (MST) Cc: freebsd-fs@freebsd.org, Poul-Henning Kamp , freebsd-current@freebsd.org Subject: Re: can a wrong alignment cause a decrease in a hdd's life expectancy? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Dec 2011 04:47:14 -0000 On Mon, 19 Dec 2011, Alexander Best wrote: > no problem. so will the improper alignment also not cause a life expectancy > shortage in case of a hdd (non-flash-based)? > > and one other question: the hdd also supports usb 3. 
will the improper > alignment have any effect (speed wise) when connected via usb 3, or is even > usb 3 too slow to notice the performance drop due to the improper alignment? Many variables: file system, file size, drive firmware... The only reason not to fix it is time. And space for a temporary copy... two, two reasons not to fix it. Benchmark it as-is, back up, realign, restore, benchmark again. Or live with the gnawing, creeping doubt of not knowing for sure. Every day wondering "is that drive slower than it could be just from a simple alignment error? Is every read a mere fraction of its potential?" But it's probably fine. No pressure. From owner-freebsd-fs@FreeBSD.ORG Tue Dec 20 10:03:47 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 96FA91065679 for ; Tue, 20 Dec 2011 10:03:47 +0000 (UTC) (envelope-from peterjeremy@acm.org) Received: from fallbackmx06.syd.optusnet.com.au (fallbackmx06.syd.optusnet.com.au [211.29.132.8]) by mx1.freebsd.org (Postfix) with ESMTP id 224D68FC13 for ; Tue, 20 Dec 2011 10:03:45 +0000 (UTC) Received: from mail16.syd.optusnet.com.au (mail16.syd.optusnet.com.au [211.29.132.197]) by fallbackmx06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id pBK7vOew020514 for ; Tue, 20 Dec 2011 18:57:27 +1100 Received: from server.vk2pj.dyndns.org (c220-239-116-103.belrs4.nsw.optusnet.com.au [220.239.116.103]) by mail16.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id pBK7vHnQ031874 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 20 Dec 2011 18:57:18 +1100 X-Bogosity: Ham, spamicity=0.000000 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.5/8.14.4) with ESMTP id pBK7vFaR035981; Tue, 20 Dec 2011 18:57:15 +1100 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org 
(8.14.5/8.14.4/Submit) id pBK7vFoU035980; Tue, 20 Dec 2011 18:57:15 +1100 (EST) (envelope-from peter) Date: Tue, 20 Dec 2011 18:57:14 +1100 From: Peter Jeremy To: Hugo Silva Message-ID: <20111220075714.GA35787@server.vk2pj.dyndns.org> References: <4EEF321E.5090806@barafranca.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="x+6KMIRAuhnl3hBn" Content-Disposition: inline In-Reply-To: <4EEF321E.5090806@barafranca.com> X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.ORG Subject: Re: ZFS: root pool considerations, multiple pools on the same disk X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Dec 2011 10:03:47 -0000 --x+6KMIRAuhnl3hBn Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2011-Dec-19 12:46:22 +0000, Hugo Silva wrote: >I've been thinking about whether it makes sense to separate the rpool >from the data pool(s).. I think it does. I have 6 1TB disks with 8GB carved off the front of each disk for root & swap. I initially used a separate (gmirrored) UFS root (including /usr/src and /usr/obj) because I didn't completely trust ZFS. I've since moved to a 3-way mirrored ZFS root, with the "root" area of the remaining 3 disks basically spare (I use them for upgrades). The bulk of the disks form a 6-way RAIDZ2 data pool. I still think having a separate root makes sense because it should simplify recovery if everything goes pear-shaped. >One idea would be creating a 4-way mirror on small partitions for the >rpool (sturdier), and a zfs raid-10 on the remaining larger partition. I'd recommend having two 2-way mirrored root pools that you update alternately. 
There are a couple of failure modes where it can be difficult to get back to a known working state without a second boot/root. >I'm curious about the performance implications (if any) of having >1 >zpools on the same disks (considering that during normal usage, it'll be >the data pool seeing 99.999% of the action) and whether anyone has >thought the same and/or applied this concept in production. I haven't done any performance comparisons but would expect this to be similar to having multiple UFS filesystems on one disk. -- Peter Jeremy --x+6KMIRAuhnl3hBn Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk7wP9oACgkQ/opHv/APuIeHlACfTT4yQqQFZCYpf1TZ3Y5B407L JIUAnR8dueaWQfZ9hGpv7gPIwgyP6mcM =Niwg -----END PGP SIGNATURE----- --x+6KMIRAuhnl3hBn-- From owner-freebsd-fs@FreeBSD.ORG Tue Dec 20 12:54:00 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0FD9C106566B for ; Tue, 20 Dec 2011 12:54:00 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id C126C8FC18 for ; Tue, 20 Dec 2011 12:53:59 +0000 (UTC) Received: by ghrr16 with SMTP id r16so1035348ghr.13 for ; Tue, 20 Dec 2011 04:53:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=Hdw6PxOGkr2Jmhn9fOLlsJOkN5pQC/SQMlCVZVCaN18=; b=UWAUeVUW0OgP2C/3zuj8XQ9M9y/KkaqZN1+/Ugj3KvqeORzF8AAaXsk9oXSrmSc74l 560ptl3L02qwVA+luEfb/kjR/vfHPeByFb2nPwkpQAm23KtmKMOu1Aqq3Wll5FBNJCnX 63eOofLQR4W2s7Vp4RraPv7vF8osDlukkyXUQ= MIME-Version: 1.0 Received: by 10.101.155.21 with SMTP id h21mr1054923ano.10.1324385639265; Tue, 20 Dec 2011 04:53:59 -0800 (PST) Received: by 10.236.190.41 with HTTP; Tue, 20 Dec 
2011 04:53:59 -0800 (PST) In-Reply-To: <20111220075714.GA35787@server.vk2pj.dyndns.org> References: <4EEF321E.5090806@barafranca.com> <20111220075714.GA35787@server.vk2pj.dyndns.org> Date: Tue, 20 Dec 2011 12:53:59 +0000 Message-ID: From: krad To: Peter Jeremy Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: root pool considerations, multiple pools on the same disk X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Dec 2011 12:54:00 -0000 On 20 December 2011 07:57, Peter Jeremy wrote: > On 2011-Dec-19 12:46:22 +0000, Hugo Silva wrote: > >I've been thinking about whether it makes sense to separate the rpool > >from the data pool(s).. > > I think it does. I have 6 1TB disks with 8GB carved off the front of > each disk for root & swap. I initially used a separate (gmirrored) > UFS root (including /usr/src and /usr/obj) because I didn't completely > trust ZFS. I've since moved to a 3-way mirrored ZFS root, with the > "root" area of the remaining 3 disks basically spare (I use them for > upgrades). The bulk of the disks form a 6-way RAIDZ2 data pool. > > I still think having a separate root makes sense because it should > simplify recovery if everything goes pear-shaped. > > >One idea would be creating a 4-way mirror on small partitions for the > >rpool (sturdier), and a zfs raid-10 on the remaining larger partition. > > I'd recommend having two 2-way mirrored root pools that you update > alternately. There are a couple of failure modes where it can be > difficult to get back to a known working state without > a second boot/root. 
> > >I'm curious about the performance implications (if any) of having >1 > >zpools on the same disks (considering that during normal usage, it'll be > >the data pool seeing 99.999% of the action) and whether anyone has > >thought the same and/or applied this concept in production. > > I haven't done any performance comparisons but would expect this to > be similar to having multiple UFS filesystems on one disk. > > -- > Peter Jeremy > An even easier option might just be to boot off a flash drive with a minimal installation on it, then mount all the writable parts of the system off the pool (/tmp, /var, /home etc.) along with any meatier bits of the installation, e.g. databases. Having said all that, unless you're doing lots of logging it's unlikely the main os will actually cause many reads/writes of the binaries on zfs. If you have a decent amount of stuff going on you will find most of the frequently used stuff will be in arc, so you might be better off having the os on the main pool for simplicity. After all it's the data that's the important part of the system, not the os. OS configs are easy to back up and having a usb stick as a live recovery os is not a hard thing to do, so os recovery is easy. Data recovery is another matter though. If you do put it on the same pool, I would separate the os off into its own hierarchy though. 
Something along the lines of this. As you can see I create a new root fs every time I make world, so rolling back is fairly easy:

system-4k/be                      29.4G  120G  264K   /system-4k/be
system-4k/be/root20110930         1.73G  120G  1.31G  legacy
system-4k/be/root20111011         2.03G  120G  1.69G  legacy
system-4k/be/root20111023         1.98G  120G  1.68G  /system-4k/be/root20111023
system-4k/be/root20111028         2.00G  120G  1.68G  /system-4k/be/root20111028
system-4k/be/root20111112         2.08G  120G  1.76G  /system-4k/be/root20111112
system-4k/be/root20111125         2.56G  120G  2.16G  /system-4k/be/root20111125
system-4k/be/tmp                  372K   122G  372K   /tmp
system-4k/be/usr-local            3.32G  120G  3.32G  /usr/local/
system-4k/be/usr-obj              731M   120G  731M   /usr/obj
system-4k/be/usr-ports            2.34G  120G  1.71G  /usr/ports
system-4k/be/usr-ports/distfiles  641M   120G  641M   /usr/ports/distfiles
system-4k/be/usr-src              705M   120G  705M   /usr/src
system-4k/be/var                  2.34G  126G  875M   /var
system-4k/be/var/log              1.46G  126G  1.46G  /var/log
system-4k/be/var/mysql            34.0M  126G  34.0M  /var/db/mysql

From owner-freebsd-fs@FreeBSD.ORG Tue Dec 20 18:10:26 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6C79F106566B for ; Tue, 20 Dec 2011 18:10:26 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 3DE028FC18 for ; Tue, 20 Dec 2011 18:10:26 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [96.47.65.170]) by cyrus.watson.org (Postfix) with ESMTPSA id E90B446B06; Tue, 20 Dec 2011 13:10:25 -0500 (EST) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6F1F1B914; Tue, 20 Dec 2011 13:10:25 -0500 (EST) From: John Baldwin To: Rick Macklem Date: Tue, 20 Dec 2011 13:10:24 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p8; KDE/4.5.5; amd64; ; ) References: 
<255844377.375232.1324256353832.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <255844377.375232.1324256353832.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <201112201310.24815.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 20 Dec 2011 13:10:25 -0500 (EST) Cc: freebsd-fs@freebsd.org, John Subject: Re: NFS client UDP retransmit timer busted for 8.n/9.n (patch) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Dec 2011 18:10:26 -0000 On Sunday, December 18, 2011 7:59:13 pm Rick Macklem wrote: > Thanks to recent detective work done by jwd@, a problem w.r.t. > retransmit timeouts for UDP mounts (both old and new NFS clients) > has been identified. > > The kernel rpc has two timeouts for UDP: > 1 - a timeout that causes the RPC request to be retransmitted on > the same socket, using the same xid. This one defaults to > 3seconds and can be set via CLSET_RETRY_TIMEOUT. > (This is always the default of 3seconds for FreeBSD currently.) > 2 - a timeout that causes the socket to be destroyed and a fresh > one created. The request is then sent on this new socket, with > a different xid. > > The problem with #2 is that the retransmitted RPC request will miss > a server's Duplicate Request Cache (DRC), because of the different xid. > As such, #2 should be much larger than #1. However, #2 defaults to 1second > (ie. smaller than #1->trouble!) > > One way to avoid this problem is to set #2 to a much larger value via the > "timeout=" mount option. (Btw, the value is in 1/10 seconds, so > "timeout=600" sets it to 60sec.) > > I now have a patch that I believe deals with this correctly. It sets #1 > to the "timeout=" (default 1second) and #2 to a much larger value. 
> (#2 timeouts are what the kernel rpc counts as retries, so for "soft" > mounts, I set #2 to "nm_retry * nm_timeout / 2" and "retries = 2", so > that it fails after "nm_retry * nm_timeout", which I think is the correct > semantics.) > This patch is attached and is also available at: > http://people.freebsd.org/~rmacklem/udp-timer.patch > (jwd@, this patch is updated from what I emailed you, so you probably want it:-) > > In summary, if you are using NFS mounts over UDP on FreeBSD8 or 9 systems, you > either want to use "timeout=600" or try the patch. You are pretty badly broken > otherwise. > > Hopefully, this patch can make it into -current/head soon, rick > ps: jhb@, could you maybe review this, thanks, rick. It looks ok to me from what I can tell. I definitely agree that you want #2 to be much larger than #1, and I'll defer to you on the details of how to divide nm_timeo up, etc. I do think 'nm_retry * nm_timeout' is the timeout people expect for a soft mount. -- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Wed Dec 21 01:23:45 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 70408106566B for ; Wed, 21 Dec 2011 01:23:44 +0000 (UTC) (envelope-from jwd@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 3E47E8FC0C; Wed, 21 Dec 2011 01:23:44 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id pBL1Ni9m094376; Wed, 21 Dec 2011 01:23:44 GMT (envelope-from jwd@freefall.freebsd.org) Received: (from jwd@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id pBL1NhfZ094375; Wed, 21 Dec 2011 01:23:43 GMT (envelope-from jwd) Date: Wed, 21 Dec 2011 01:23:43 +0000 From: John De To: Rick Macklem Message-ID: <20111221012343.GA86024@FreeBSD.org> References: 
<255844377.375232.1324256353832.JavaMail.root@erie.cs.uoguelph.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <255844377.375232.1324256353832.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.4.2.3i Cc: freebsd-fs@freebsd.org Subject: Re: NFS client UDP retransmit timer busted for 8.n/9.n (patch) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Dec 2011 01:23:45 -0000 Hi Rick, ----- Rick Macklem's Original Message ----- > Thanks to recent detective work done by jwd@, a problem w.r.t. > retransmit timeouts for UDP mounts (both old and new NFS clients) > has been identified. > > The kernel rpc has two timeouts for UDP: > 1 - a timeout that causes the RPC request to be retransmitted on > the same socket, using the same xid. This one defaults to > 3seconds and can be set via CLSET_RETRY_TIMEOUT. > (This is always the default of 3seconds for FreeBSD currently.) > 2 - a timeout that causes the socket to be destroyed and a fresh > one created. The request is then sent on this new socket, with > a different xid. > > The problem with #2 is that the retransmitted RPC request will miss > a server's Duplicate Request Cache (DRC), because of the different xid. > As such, #2 should be much larger than #1. However, #2 defaults to 1second > (ie. smaller than #1->trouble!) > > One way to avoid this problem is to set #2 to a much larger value via the > "timeout=" mount option. (Btw, the value is in 1/10 seconds, so > "timeout=600" sets it to 60sec.) > > I now have a patch that I believe deals with this correctly. It sets #1 > to the "timeout=" (default 1second) and #2 to a much larger value. 
> (#2 timeouts are what the kernel rpc counts as retries, so for "soft" > mounts, I set #2 to "nm_retry * nm_timeout / 2" and "retries = 2", so > that it fails after "nm_retry * nm_timeout", which I think is the correct > semantics.) > This patch is attached and is also available at: > http://people.freebsd.org/~rmacklem/udp-timer.patch > (jwd@, this patch is updated from what I emailed you, so you probably want it:-) We've tested both the mount_nfs option change and the patch and both seem to work great. No recurrence of the problem we were seeing. We've only been able to catch one retransmit via tcpdump and it showed the system retransmitting with the same xid/port and recovering. +1 for getting this committed. Thanks for your time looking into this Rick. -John > In summary, if you are using NFS mounts over UDP on FreeBSD8 or 9 systems, you > either want to use "timeout=600" or try the patch. You are pretty badly broken > otherwise. > > Hopefully, this patch can make it into -current/head soon, rick > ps: jhb@, could you maybe review this, thanks, rick. 
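The timer arithmetic Rick describes can be written out as a small Python sketch. It only restates the numbers from the mail; it is not the kernel code or the patch. The function names are hypothetical, and nm_retry and nm_timeout are used simply as the quantities named in the mail.

```python
# Sketch of the timeout arithmetic described in the thread; NOT the patch.
# Function names are hypothetical; nm_retry/nm_timeout come from the mail.


def timeout_option_to_seconds(value):
    """The "timeout=" mount option is given in tenths of a second."""
    return value / 10.0


def soft_mount_timers(nm_retry, nm_timeout):
    """Return (socket_timeout, retries, total) for a "soft" mount.

    Per the mail: #2 is set to nm_retry * nm_timeout / 2 with retries = 2,
    so the request fails after nm_retry * nm_timeout in total.
    The result is in whatever unit nm_timeout is given in.
    """
    socket_timeout = nm_retry * nm_timeout / 2
    retries = 2
    return socket_timeout, retries, socket_timeout * retries


if __name__ == "__main__":
    print(timeout_option_to_seconds(600))   # the suggested workaround value
    print(soft_mount_timers(10, 1.0))       # example inputs, chosen arbitrarily
```

The point of the second function is that #2 stays well above #1 while the overall soft-mount failure time remains nm_retry * nm_timeout, which is the behaviour jhb says people expect.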
From owner-freebsd-fs@FreeBSD.ORG Wed Dec 21 02:53:37 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BC4F71065670; Wed, 21 Dec 2011 02:53:37 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 566CC8FC0C; Wed, 21 Dec 2011 02:53:37 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap4EAIVJ8U6DaFvO/2dsb2JhbABDFoR5qAeBcgEBBSMEUhsOCgICDRkCWQaIFaVkkWCBL4lKgRYEiDeMSJJQ X-IronPort-AV: E=Sophos;i="4.71,385,1320642000"; d="scan'208";a="149384165" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 20 Dec 2011 21:53:36 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 67D91B3EA4; Tue, 20 Dec 2011 21:53:36 -0500 (EST) Date: Tue, 20 Dec 2011 21:53:36 -0500 (EST) From: Rick Macklem To: John De Message-ID: <1231463684.484891.1324436016408.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20111221012343.GA86024@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: NFS client UDP retransmit timer busted for 8.n/9.n (patch) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Dec 2011 02:53:37 -0000 John De wrote: > Hi Rick, > > ----- Rick Macklem's Original Message ----- > > Thanks to recent detective work done by jwd@, a problem w.r.t. 
> > retransmit timeouts for UDP mounts (both old and new NFS clients) > > has been identified. > > > > The kernel rpc has two timeouts for UDP: > > 1 - a timeout that causes the RPC request to be retransmitted on > > the same socket, using the same xid. This one defaults to > > 3seconds and can be set via CLSET_RETRY_TIMEOUT. > > (This is always the default of 3seconds for FreeBSD currently.) > > 2 - a timeout that causes the socket to be destroyed and a fresh > > one created. The request is then sent on this new socket, with > > a different xid. > > > > The problem with #2 is that the retransmitted RPC request will miss > > a server's Duplicate Request Cache (DRC), because of the different > > xid. > > As such, #2 should be much larger than #1. However, #2 defaults to > > 1second > > (ie. smaller than #1->trouble!) > > > > One way to avoid this problem is to set #2 to a much larger value > > via the > > "timeout=" mount option. (Btw, the value is in 1/10 > > seconds, so > > "timeout=600" sets it to 60sec.) > > > > I now have a patch that I believe deals with this correctly. It sets > > #1 > > to the "timeout=" (default 1second) and #2 to a much larger > > value. > > (#2 timeouts are what the kernel rpc counts as retries, so for > > "soft" > > mounts, I set #2 to "nm_retry * nm_timeout / 2" and "retries = 2", > > so > > that it fails after "nm_retry * nm_timeout", which I think is the > > correct > > semantics.) > > This patch is attached and is also available at: > > http://people.freebsd.org/~rmacklem/udp-timer.patch > > (jwd@, this patch is updated from what I emailed you, so you > > probably want it:-) > > We've tested both the mount_nfs option change and the patch and both > seem to work great. No recurrence of the problem we were seeing. > We've only > been able to catch one retransmit via tcpdump and it showed the system > retransmitting with the same xid/port and recovering. > > +1 for getting this committed. > > Thanks for your time looking into this Rick. 
> And thanks for reporting it. Also, sorry everyone, this has been broken for a long time and I never spotted it. rick > -John > > > In summary, if you are using NFS mounts over UDP on FreeBSD8 or 9 > > systems, you > > either want to use "timeout=600" or try the patch. You are pretty > > badly broken > > otherwise. > > > > Hopefully, this patch can make it into -current/head soon, rick > > ps: jhb@, could you maybe review this, thanks, rick. From owner-freebsd-fs@FreeBSD.ORG Wed Dec 21 07:22:07 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 60128106566B for ; Wed, 21 Dec 2011 07:22:07 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-ee0-f54.google.com (mail-ee0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id E7B338FC13 for ; Wed, 21 Dec 2011 07:22:06 +0000 (UTC) Received: by eekc50 with SMTP id c50so8588229eek.13 for ; Tue, 20 Dec 2011 23:22:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=6ynAqBbEPQ3iIUCGKIVxG00HOHr916Rj2fhYgDnLov8=; b=IhLYzfHqEOALI+XCpoExismIcAAAw8l3gPK+0XZy6sJ8chunndXxqhGvlA9A4ujNUm +7n6lYHDroTkPVdn+gYSB43xyfyc6xBdJV533cttV/sDBlP5BqPQh3X6IpWWW11gNNYu YLr6jlUl5z0aqCosBFMWnR209IQaIWaVFpBos= Received: by 10.14.4.229 with SMTP id 77mr760839eej.7.1324450408215; Tue, 20 Dec 2011 22:53:28 -0800 (PST) Received: from imba-brutale.totalterror.net ([93.152.152.135]) by mx.google.com with ESMTPS id q28sm17076455eea.6.2011.12.20.22.53.25 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 20 Dec 2011 22:53:26 -0800 (PST) Mime-Version: 1.0 (Apple Message framework v1251.1) Content-Type: text/plain; charset=iso-8859-1 From: Nikolay Denev In-Reply-To: <4EEF321E.5090806@barafranca.com> Date: Wed, 21 Dec 2011 08:53:24 +0200 
Content-Transfer-Encoding: quoted-printable Message-Id: <654420CC-9E7D-4BAA-AC14-4F49196DFC74@gmail.com> References: <4EEF321E.5090806@barafranca.com> To: Hugo Silva X-Mailer: Apple Mail (2.1251.1) Cc: freebsd-fs@FreeBSD.ORG Subject: Re: ZFS: root pool considerations, multiple pools on the same disk X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Dec 2011 07:22:07 -0000 On Dec 19, 2011, at 2:46 PM, Hugo Silva wrote: > Hello, > > I've been doing some tests with 9.0 RC3 and ZFS. This particular server > has 6 disks and a single zpool on mfid?p3 partitions, which I've > temporarily arranged into a raid-10 with 2 spares, single pool. > > > I've been thinking about whether it makes sense to separate the rpool > from the data pool(s).. > > It seems to make sense at some levels to separate the rpool from the > app/user data, but to me it's less clear whether this is a good idea or > not when the backing disks are all the same. > > One idea would be creating a 4-way mirror on small partitions for the > rpool (sturdier), and a zfs raid-10 on the remaining larger partition. > > I'm curious about the performance implications (if any) of having >1 > zpools on the same disks (considering that during normal usage, it'll be > the data pool seeing 99.999% of the action) and whether anyone has > thought the same and/or applied this concept in production. > > Regards, > > Hugo Hello, It depends on what you will run on the machine, but keep in mind that the root pool is limited to a single vdev and you cannot add ZIL/LOG devices to the root pool. As for the performance, I don't think it will make a difference having separate root and data pools, except that you will have fewer spindles for the data pool. 
Regards, Nikolay= From owner-freebsd-fs@FreeBSD.ORG Wed Dec 21 22:15:33 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3230F106564A for ; Wed, 21 Dec 2011 22:15:33 +0000 (UTC) (envelope-from peterjeremy@acm.org) Received: from mail27.syd.optusnet.com.au (mail27.syd.optusnet.com.au [211.29.133.168]) by mx1.freebsd.org (Postfix) with ESMTP id B467B8FC12 for ; Wed, 21 Dec 2011 22:15:32 +0000 (UTC) Received: from server.vk2pj.dyndns.org (c220-239-116-103.belrs4.nsw.optusnet.com.au [220.239.116.103]) by mail27.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id pBLMFT48021014 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 22 Dec 2011 09:15:30 +1100 X-Bogosity: Ham, spamicity=0.000000 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.5/8.14.4) with ESMTP id pBLMFT54083908; Thu, 22 Dec 2011 09:15:29 +1100 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.5/8.14.4/Submit) id pBLMFSgk083907; Thu, 22 Dec 2011 09:15:28 +1100 (EST) (envelope-from peter) Date: Thu, 22 Dec 2011 09:15:28 +1100 From: Peter Jeremy To: krad Message-ID: <20111221221527.GA83643@server.vk2pj.dyndns.org> References: <4EEF321E.5090806@barafranca.com> <20111220075714.GA35787@server.vk2pj.dyndns.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="vtzGhvizbBRQ85DL" Content-Disposition: inline In-Reply-To: X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: root pool considerations, multiple pools on the same disk X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Wed, 21 Dec 2011 22:15:33 -0000 --vtzGhvizbBRQ85DL Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2011-Dec-20 12:53:59 +0000, krad wrote: >even easier option might just be to boot off a flash drive with a minimal >installation on. then mount all the writable parts of the system off the >pool (/tmp. var. home etc.) along with any meatier bits if the installation >etc databases. If you have the flash drive and spare ports, this may be an option but I would not recommend relying on USB. And note that, unless you use noatime or explicitly mount R/O, just accessing files will trigger writes. >As you can see I create a new root fs every time I make world so rolling >back is fairly easy Until you get the dreaded "ZFS: i/o error - all block copies unavailable" error from the bootloader. That appears to occur at the pool level so multiple BEs within the pool may not help. That said, your approach looks interesting - have you written up what you do and have tried booting from different roots? 
--=20 Peter Jeremy --vtzGhvizbBRQ85DL Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk7yWn8ACgkQ/opHv/APuIea5QCfesCVsVgTbEM4DGIlaE1KtDXF H/UAoJOzxMhPWB801DxPLo1/iNzh1mtG =tnOi -----END PGP SIGNATURE----- --vtzGhvizbBRQ85DL-- From owner-freebsd-fs@FreeBSD.ORG Wed Dec 21 23:12:58 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A491C1065670 for ; Wed, 21 Dec 2011 23:12:58 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (unknown [IPv6:2001:5a8:4:7e72:4a5b:39ff:fe12:452]) by mx1.freebsd.org (Postfix) with ESMTP id 79DC88FC13 for ; Wed, 21 Dec 2011 23:12:58 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id pBLNCj2a054427; Wed, 21 Dec 2011 15:12:45 -0800 (PST) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201112212312.pBLNCj2a054427@chez.mckusick.com> To: =?ISO-8859-1?Q?Olivier_Cochard=2DLabb=E9?= In-reply-to: Date: Wed, 21 Dec 2011 15:12:45 -0800 From: Kirk McKusick X-Spam-Status: No, score=0.0 required=5.0 tests=MISSING_MID, UNPARSEABLE_RELAY autolearn=failed version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on chez.mckusick.com Cc: freebsd-fs@freebsd.org, Dieter BSD Subject: Re: Maximum blocksize for FFS? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Dec 2011 23:12:58 -0000 > From: =?ISO-8859-1?Q?Olivier_Cochard=2DLabb=E9?= > Date: Wed, 14 Dec 2011 12:50:05 +0100 > Subject: Re: Maximum blocksize for FFS? 
> To: Kirk McKusick > Cc: Dieter BSD , freebsd-fs@freebsd.org > X-ASK-Info: Message Queued (2011/12/14 03:50:37) > X-ASK-Info: Confirmed by User (2011/12/14 04:45:07) > > On Tue, Dec 13, 2011 at 7:18 PM, Kirk McKusick wrote: > > The default blocksize in FreeBSD 9.0 is 32K/4K. We have been > > running with this size in -current for a almost a year with no > > reported problems. > > Hi, > > There is a reported problem: > The number of inode was divided by two with FreeBSD 9.0 (PR > bin/162659) and this create some problems because "the number of > fragments per inode (NFPI) was not adapted to the new default block > size" (Bruce Evans'explanation [1]). > > Regards, > > Olivier > > [1] http://lists.freebsd.org/pipermail/freebsd-bugs/2011-December/046713.html Thanks for bringing your report to my attention. I have applied the suggested change (reducing NFPI from 4 to 2) so as to keep the default number of inodes for a 32K/4K filesystem the same as were created on a 16K/2K filesystem. Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Thu Dec 22 09:34:51 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 04144106566B for ; Thu, 22 Dec 2011 09:34:51 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id A3B508FC0A for ; Thu, 22 Dec 2011 09:34:50 +0000 (UTC) Received: from outgoing.leidinger.net (p4FC41BC5.dip.t-dialin.net [79.196.27.197]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 8B75C84400D; Thu, 22 Dec 2011 10:18:42 +0100 (CET) Received: from localhost (unknown [85.94.224.20]) by outgoing.leidinger.net (Postfix) with ESMTPSA id D6C035188; Thu, 22 Dec 2011 10:18:38 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=Leidinger.net; s=outgoing-alex; t=1324545519; 
bh=I3sPYCgfUFglBQ/LX/wZCgX2z59EKkvVhWG0pKwUST0=; h=Date:Subject:Message-ID:From:To:Cc:MIME-Version:Content-Type; b=niI1GQvbXWoCNzUcPFknbHIT6E1anKJ0+91aCxPO5xZJO3a0Akahk+V0DIV7CbG64 VzRwQEyOm+Qi0CqKyDxcp3nGLYq7rYCbRQcNTyO0nHAT1KsO3VvyhH+P4bJCFXZQJi MKD4hxyvCvM2D/MDvCKLKABIt9bacnQcTSKmMp9/azV1NxCtW/EHUMFothPe0T5XmJ Ue6fI6kx+x8vAJ+c68b/uCxWLGGOqA5ksyx5BW+wWpNeWAJtIl5526nXkDaWTSm1sz AZLaZMoMLbhFNiYfXTnWFNGHIxgtYsyzZT3gTgE6bQWFAiSK3Lrc5w37LysyBdz50Z WkbGEVOzJ/ODg== Date: Thu, 22 Dec 2011 10:17:47 +0100 Message-ID: Importance: normal From: Alexander Leidinger To: peterjeremy@acm.org, kraduk@gmail.com MIME-Version: 1.0 X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: 8B75C84400D.A0D1F X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-1.099, required 6, autolearn=disabled, ALL_TRUSTED -1.00, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10, HTML_MESSAGE 0.00) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1325150324.26779@T5hwU/O1zv7U9XaXcNX+ag X-EBL-Spam-Status: No X-Mailman-Approved-At: Thu, 22 Dec 2011 12:28:21 +0000 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: base64 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: root pool considerations, multiple pools on the same disk X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Dec 2011 09:34:51 -0000 SGksCgp0aGVyZSBpcyBhIHNjcmlwdCB3aGljaCBoZWxwcyB0byBtYW5hZ2UgZGlmZmVyZW50IEJF cy4gaHR0cDovL2Fub25zdm4uaDNxLmNvbS9wcm9qZWN0cy9mcmVlYnNkLXBhdGNoZXMvbWFuYWdl QkUKClRoZXJlIGlzIGFsc28gYSBsaXR0bGUgYml0IG9mIGRvY3VtZW50YXRpb24uCgpCeWUsCkFs ZXhhbmRlci4KCi0tIApTZW5kIHZpYSBhbiBBbmRyb2lkIGRldmljZSwgcGxlYXNlIGZvcmdpdmUg 
YnJldml0eSBhbmQgdHlwb2dyYXBoaWMgYW5kIHNwZWxsaW5nIGVycm9ycy4gCgpQZXRlciBKZXJl bXkgPHBldGVyamVyZW15QGFjbS5vcmc+IGhhdCBnZXNjaHJpZWJlbjpPbiAyMDExLURlYy0yMCAx Mjo1Mzo1OSArMDAwMCwga3JhZCA8a3JhZHVrQGdtYWlsLmNvbT4gd3JvdGU6Cj5ldmVuIGVhc2ll ciBvcHRpb24gbWlnaHQganVzdCBiZSB0byBib290IG9mZiBhIGZsYXNoIGRyaXZlIHdpdGggYSBt aW5pbWFsCj5pbnN0YWxsYXRpb24gb24uIHRoZW4gbW91bnQgYWxsIHRoZSB3cml0YWJsZSBwYXJ0 cyBvZiB0aGUgc3lzdGVtIG9mZiB0aGUKPnBvb2wgKC90bXAuIHZhci4gaG9tZSBldGMuKSBhbG9u ZyB3aXRoIGFueSBtZWF0aWVyIGJpdHMgaWYgdGhlIGluc3RhbGxhdGlvbgo+ZXRjIGRhdGFiYXNl cy4KCklmIHlvdSBoYXZlIHRoZSBmbGFzaCBkcml2ZSBhbmQgc3BhcmUgcG9ydHMsIHRoaXMgbWF5 IGJlIGFuIG9wdGlvbiBidXQKSSB3b3VsZCBub3QgcmVjb21tZW5kIHJlbHlpbmcgb24gVVNCLsKg IEFuZCBub3RlIHRoYXQsIHVubGVzcyB5b3UgdXNlCm5vYXRpbWUgb3IgZXhwbGljaXRseSBtb3Vu dCBSL08sIGp1c3QgYWNjZXNzaW5nIGZpbGVzIHdpbGwgdHJpZ2dlcgp3cml0ZXMuCgo+QXMgeW91 IGNhbiBzZWUgSSBjcmVhdGUgYSBuZXcgcm9vdCBmcyBldmVyeSB0aW1lIEkgbWFrZSB3b3JsZCBz byByb2xsaW5nCj5iYWNrIGlzIGZhaXJseSBlYXN5CgpVbnRpbCB5b3UgZ2V0IHRoZSBkcmVhZGVk ICJaRlM6IGkvbyBlcnJvciAtIGFsbCBibG9jayBjb3BpZXMgdW5hdmFpbGFibGUiCmVycm9yIGZy b20gdGhlIGJvb3Rsb2FkZXIuwqAgVGhhdCBhcHBlYXJzIHRvIG9jY3VyIGF0IHRoZSBwb29sIGxl dmVsIHNvCm11bHRpcGxlIEJFcyB3aXRoaW4gdGhlIHBvb2wgbWF5IG5vdCBoZWxwLgoKVGhhdCBz YWlkLCB5b3VyIGFwcHJvYWNoIGxvb2tzIGludGVyZXN0aW5nIC0gaGF2ZSB5b3Ugd3JpdHRlbiB1 cCB3aGF0CnlvdSBkbyBhbmQgaGF2ZSB0cmllZCBib290aW5nIGZyb20gZGlmZmVyZW50IHJvb3Rz PwoKLS0gClBldGVyIEplcmVteQo= From owner-freebsd-fs@FreeBSD.ORG Thu Dec 22 12:42:25 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DBDA51065672 for ; Thu, 22 Dec 2011 12:42:25 +0000 (UTC) (envelope-from hugo@barafranca.com) Received: from mail.barafranca.com (mail.barafranca.com [67.213.67.47]) by mx1.freebsd.org (Postfix) with ESMTP id ABF1B8FC13 for ; Thu, 22 Dec 2011 12:42:25 +0000 (UTC) Received: from localhost (unknown [172.16.100.24]) by mail.barafranca.com (Postfix) with ESMTP 
id 049D22CD for ; Thu, 22 Dec 2011 12:42:25 +0000 (UTC) X-Virus-Scanned: amavisd-new at barafranca.com Received: from mail.barafranca.com ([172.16.100.24]) by localhost (mail.barafranca.com [172.16.100.24]) (amavisd-new, port 10024) with ESMTP id CWimHNoNVXtD for ; Thu, 22 Dec 2011 12:41:46 +0000 (UTC) Received: from [192.168.1.1] (static-b4-252-232.telepac.pt [81.193.252.232]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.barafranca.com (Postfix) with ESMTPSA id 8F1F42BD for ; Thu, 22 Dec 2011 12:41:46 +0000 (UTC) Message-ID: <4EF32583.9000000@barafranca.com> Date: Thu, 22 Dec 2011 12:41:39 +0000 From: Hugo Silva User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101208 Thunderbird/3.1.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <4EEF321E.5090806@barafranca.com> <20111220075714.GA35787@server.vk2pj.dyndns.org> <20111221221527.GA83643@server.vk2pj.dyndns.org> In-Reply-To: <20111221221527.GA83643@server.vk2pj.dyndns.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: ZFS: root pool considerations, multiple pools on the same disk X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Dec 2011 12:42:25 -0000 On 12/21/11 22:15, Peter Jeremy wrote: > Until you get the dreaded "ZFS: i/o error - all block copies unavailable" > error from the bootloader. That appears to occur at the pool level so > multiple BEs within the pool may not help. > > That said, your approach looks interesting - have you written up what > you do and have tried booting from different roots? > I came across this during initial tests but could never reproduce it again. Is there a root cause? 
I could have reinstalled, being a new/test system and all, but a few commands (I forgot which by now) from the mfsbsd cd did the trick. Still, I never did get to the bottom of why it wouldn't boot. To all others: Thanks for the input and suggestions/clarifications so far. Regards, Hugo From owner-freebsd-fs@FreeBSD.ORG Thu Dec 22 21:47:29 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DFAD2106564A; Thu, 22 Dec 2011 21:47:29 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from mx0.hoeg.nl (mx0.hoeg.nl [IPv6:2a01:4f8:101:5343::aa]) by mx1.freebsd.org (Postfix) with ESMTP id 2A4A18FC17; Thu, 22 Dec 2011 21:47:29 +0000 (UTC) Received: by mx0.hoeg.nl (Postfix, from userid 1000) id 658472A298B6; Thu, 22 Dec 2011 22:47:28 +0100 (CET) Date: Thu, 22 Dec 2011 22:47:28 +0100 From: Ed Schouten To: pjd@FreeBSD.org, freebsd-fs@FreeBSD.org Message-ID: <20111222214728.GV1771@hoeg.nl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="H23uHpCUqgUcHMpK" Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Subject: Changing refcount(9) to use a refcount_t X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Dec 2011 21:47:30 -0000 --H23uHpCUqgUcHMpK Content-Type: multipart/mixed; boundary="rjqqsQBSnnHPMzvu" Content-Disposition: inline --rjqqsQBSnnHPMzvu Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi all, As some of you may know, the upcoming C standard has support for atomic operations. Looking at how they implemented it, it seems that they tried to keep it somehow compatible with existing compiler standards. 
For example, one could implement a poor-man's version of it as follows:

| #define _Atomic(T) struct { pthread_mutex_t m; T v; }
|
| #define ATOMIC_VAR_INIT(value) { PTHREAD_MUTEX_INITIALIZER, (value) }
|
| #define atomic_store(object, value) do { \
|         pthread_mutex_lock(&(object)->m); \
|         (object)->v = (value); \
|         pthread_mutex_unlock(&(object)->m); \
| } while(0)
|
| ...

Voila; atomics! Just out of curiosity, I did some experiments with this, where I have a that works with both Clang and GCC (except ARM and MIPS, for some reason).

My first test subject: . It seems that porting to use the ISO C1X atomic operations is a bit hard, as the refcount_* functions operate on volatile u_ints instead of some opaque object.

Looking ahead, I think it would be a good idea to add a typedef from volatile u_int to refcount_t now and merge that back to 9. That way it won't matter how we actually implement it by the time C1X is finalised.

The reason I'm emailing this to fs@ is that this change breaks one of the existing file system drivers, namely ZFS. Solaris also implements a refcount_t, but unlike FreeBSD's, it has a more complex API and is 64 bits in size. Still, I suspect it's hard to overflow a 32-bit reference counter, right? Even if it is, we can fix this in the long run by making refcount_t a truly opaque object of type u_long.

Could any of you ZFS users please try the following patch? Do any of you object if I commit it to SVN and merge it in a couple of months from now?

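To make the idea above concrete, here is a minimal userland sketch, not the patch itself, of a reference counter built on the mutex-based "poor-man's atomics" approach. All names (PM_ATOMIC, PM_ATOMIC_VAR_INIT, pm_refcount_t, pm_refcount_*) are hypothetical and chosen to avoid clashing with the real _Atomic keyword and with refcount(9); the release semantics (nonzero when the last reference is dropped) are assumed to mirror refcount_release().

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical names throughout (PM_ATOMIC, pm_refcount_*): they are
 * for illustration only and exist neither in FreeBSD nor in ISO C.
 */
#define PM_ATOMIC(T)            struct { pthread_mutex_t m; T v; }
#define PM_ATOMIC_VAR_INIT(val) { PTHREAD_MUTEX_INITIALIZER, (val) }

/* Counter type; callers are expected never to touch .m or .v directly. */
typedef PM_ATOMIC(unsigned int) pm_refcount_t;

/* Add delta under the mutex and return the value seen before the add. */
static unsigned int
pm_refcount_fetchadd(pm_refcount_t *rc, unsigned int delta)
{
	unsigned int old;

	pthread_mutex_lock(&rc->m);
	old = rc->v;
	rc->v = old + delta;	/* unsigned wrap-around: adding -1u subtracts 1 */
	pthread_mutex_unlock(&rc->m);
	return (old);
}

/* Take one additional reference. */
static void
pm_refcount_acquire(pm_refcount_t *rc)
{
	(void)pm_refcount_fetchadd(rc, 1u);
}

/* Drop one reference; returns nonzero if it was the last one. */
static int
pm_refcount_release(pm_refcount_t *rc)
{
	unsigned int old;

	old = pm_refcount_fetchadd(rc, -1u);
	assert(old > 0);	/* releasing an unreferenced object is a bug */
	return (old == 1);
}
```

A caller would declare a counter as `static pm_refcount_t rc = PM_ATOMIC_VAR_INIT(1);` and then use only pm_refcount_acquire() and pm_refcount_release(). Because the representation stays behind the typedef, the mutex could later be swapped for real atomic operations without touching any consumer, which is the point of introducing refcount_t early.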
Thanks, --=20 Ed Schouten WWW: http://80386.nl/ --rjqqsQBSnnHPMzvu Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="refcount.diff" Content-Transfer-Encoding: quoted-printable Index: share/man/man9/refcount.9 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- share/man/man9/refcount.9 (revision 228798) +++ share/man/man9/refcount.9 (working copy) @@ -39,11 +39,11 @@ .In sys/param.h .In sys/refcount.h .Ft void -.Fn refcount_init "volatile u_int *count, u_int value" +.Fn refcount_init "refcount_t *count, u_int value" .Ft void -.Fn refcount_acquire "volatile u_int *count" +.Fn refcount_acquire "refcount_t *count" .Ft int -.Fn refcount_release "volatile u_int *count" +.Fn refcount_release "refcount_t *count" .Sh DESCRIPTION The .Nm Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h (revision= 228798) +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h (working = copy) @@ -31,10 +31,6 @@ #include #include =20 -#ifdef __cplusplus -extern "C" { -#endif - /* * If the reference is held only by the calling function and not any * particular object, use FTAG (which is a string) for the holder_tag. 
@@ -42,68 +38,21 @@ */ #define FTAG ((char *)__func__) =20 -#ifdef ZFS_DEBUG -typedef struct reference { - list_node_t ref_link; - void *ref_holder; - uint64_t ref_number; - uint8_t *ref_removed; -} reference_t; - -typedef struct refcount { - kmutex_t rc_mtx; - list_t rc_list; - list_t rc_removed; - int64_t rc_count; - int64_t rc_removed_count; -} refcount_t; - -/* Note: refcount_t must be initialized with refcount_create() */ - -void refcount_create(refcount_t *rc); -void refcount_destroy(refcount_t *rc); -void refcount_destroy_many(refcount_t *rc, uint64_t number); -int refcount_is_zero(refcount_t *rc); -int64_t refcount_count(refcount_t *rc); -int64_t refcount_add(refcount_t *rc, void *holder_tag); -int64_t refcount_remove(refcount_t *rc, void *holder_tag); -int64_t refcount_add_many(refcount_t *rc, uint64_t number, void *holder_ta= g); -int64_t refcount_remove_many(refcount_t *rc, uint64_t number, void *holder= _tag); -void refcount_transfer(refcount_t *dst, refcount_t *src); - -void refcount_sysinit(void); -void refcount_fini(void); - -#else /* ZFS_DEBUG */ - -typedef struct refcount { - uint64_t rc_count; -} refcount_t; - -#define refcount_create(rc) ((rc)->rc_count =3D 0) -#define refcount_destroy(rc) ((rc)->rc_count =3D 0) -#define refcount_destroy_many(rc, number) ((rc)->rc_count =3D 0) -#define refcount_is_zero(rc) ((rc)->rc_count =3D=3D 0) -#define refcount_count(rc) ((rc)->rc_count) -#define refcount_add(rc, holder) atomic_add_64_nv(&(rc)->rc_count, 1) -#define refcount_remove(rc, holder) atomic_add_64_nv(&(rc)->rc_count, -1) +#define refcount_create(rc) refcount_init(rc, 0) +#define refcount_destroy(rc) refcount_init(rc, 0) +#define refcount_destroy_many(rc, number) refcount_init(rc, 0) +#define refcount_is_zero(rc) ((*rc) =3D=3D 0) +#define refcount_count(rc) (uint64_t)(*rc) +#define refcount_add(rc, holder) refcount_add_many(rc, 1, holder) +#define refcount_remove(rc, holder) refcount_remove_many(rc, 1, holder) #define refcount_add_many(rc, number, 
holder) \ - atomic_add_64_nv(&(rc)->rc_count, number) + (uint64_t)(atomic_fetchadd_int(rc, number) + (number)) #define refcount_remove_many(rc, number, holder) \ - atomic_add_64_nv(&(rc)->rc_count, -number) -#define refcount_transfer(dst, src) { \ - uint64_t __tmp =3D (src)->rc_count; \ - atomic_add_64(&(src)->rc_count, -__tmp); \ - atomic_add_64(&(dst)->rc_count, __tmp); \ -} + (uint64_t)(atomic_fetchadd_int(rc, -(number)) - (number)) +#define refcount_transfer(dst, src) \ + atomic_add_int(dst, atomic_readandclear_int(src)) =20 #define refcount_sysinit() #define refcount_fini() =20 -#endif /* ZFS_DEBUG */ - -#ifdef __cplusplus -} -#endif - #endif /* _SYS_REFCOUNT_H */ Index: sys/sys/refcount.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/sys/refcount.h (revision 228798) +++ sys/sys/refcount.h (working copy) @@ -40,22 +40,24 @@ #define KASSERT(exp, msg) /* */ #endif =20 +typedef volatile u_int refcount_t; + static __inline void -refcount_init(volatile u_int *count, u_int value) +refcount_init(refcount_t *count, u_int value) { =20 *count =3D value; } =20 static __inline void -refcount_acquire(volatile u_int *count) +refcount_acquire(refcount_t *count) { =20 atomic_add_acq_int(count, 1);=09 } =20 static __inline int -refcount_release(volatile u_int *count) +refcount_release(refcount_t *count) { u_int old; =20 --rjqqsQBSnnHPMzvu-- --H23uHpCUqgUcHMpK Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iQIcBAEBAgAGBQJO86VwAAoJEG5e2P40kaK7F5MP/1Ott3zXSvYyRGKpn0t/SJ8w k8fYEkwA2DHjfPBwt3TGjAK5TK4OIycHfncIfLFSU0c/geDodnWk10ppsESTguF5 KO8dyw+X8ocfITeCNJ7ngoicdUGOeyvi1m9rMfogb2SX5tt+/p+i46x19ogAogj+ hnqKtT9PFolw7Up6nT2gSeWRjdiVJm+7rGmY2Ult3YXf9vm4b0lwGmcqxvPph5jr n6C9dZBmnsy0x9ULfj3z46ec+5rJnokJSSZQPWBpKi9m3McWSGSIBieWnkANqTTt 
rJpzdy5lihwNYf0xVWpiMwHeNEt2YNkto1xS07Mc+G87goCqo2rwyNaHmkq7Tde6 QMrNWFDDPumFO/JuTEdO0C2P2DVqZ1AA+fa0MBqehaAZur0SUnsxqIOoHdsonZi+ WlQjnmWkvJeqDsUY6PisVocNSwmTIEp1c+/xHvnLHyQ5ddPgg/hbMFkZgYlJDcbl TiHxEUZ6IK2R1ir8jL/AaU57upbKBNM566XOuE9Tp+YLyGdpL7Orwyt8Wt3+PZk4 neMUEFfNIR/bCd3+IggHuTglH3DHX6zdIJ7ksHDPlYkmjkyZEbY2EzmCVfUrxU8O mfV5RSfApXbxfDlUunnsQmzYMnQct2gf2osVQ1u92wFiYv/upFWiAROrFw/4nkW5 AM4xGJwMde74jHRzzz+t =8Fi0 -----END PGP SIGNATURE----- --H23uHpCUqgUcHMpK-- From owner-freebsd-fs@FreeBSD.ORG Thu Dec 22 23:07:58 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3169B106566C; Thu, 22 Dec 2011 23:07:58 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from mx0.hoeg.nl (mx0.hoeg.nl [IPv6:2a01:4f8:101:5343::aa]) by mx1.freebsd.org (Postfix) with ESMTP id 21DD08FC0C; Thu, 22 Dec 2011 23:07:57 +0000 (UTC) Received: by mx0.hoeg.nl (Postfix, from userid 1000) id 87A172A28E83; Fri, 23 Dec 2011 00:07:56 +0100 (CET) Date: Fri, 23 Dec 2011 00:07:56 +0100 From: Ed Schouten To: pjd@FreeBSD.org, freebsd-fs@FreeBSD.org Message-ID: <20111222230756.GW1771@hoeg.nl> References: <20111222214728.GV1771@hoeg.nl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="A2lxmwzFepEIu8KR" Content-Disposition: inline In-Reply-To: <20111222214728.GV1771@hoeg.nl> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Subject: Re: Changing refcount(9) to use a refcount_t X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Dec 2011 23:07:58 -0000 --A2lxmwzFepEIu8KR Content-Type: multipart/mixed; boundary="XbyeA6tlA6gsyQxQ" Content-Disposition: inline --XbyeA6tlA6gsyQxQ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable * 
Ed Schouten , 20111222 22:47: > Can any of you ZFS user please try the following patch? Whoops. It seems I forgot the ZFS source code is also built in userspace (libzpool). I have attached a new patch. Changes: - Remove the refcount.c file, as it has no use anymore. - Port the rrwlock code to not use refcount_t at all. It accesses internal members of the refcount_t and it seems we don't need to use atomics here anyway, as all operations on the counters are performed while holding the rr_lock. --=20 Ed Schouten WWW: http://80386.nl/ --XbyeA6tlA6gsyQxQ Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="refcount.diff" Content-Transfer-Encoding: quoted-printable Index: share/man/man9/refcount.9 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- share/man/man9/refcount.9 (revision 228798) +++ share/man/man9/refcount.9 (working copy) @@ -39,11 +39,11 @@ .In sys/param.h .In sys/refcount.h .Ft void -.Fn refcount_init "volatile u_int *count, u_int value" +.Fn refcount_init "refcount_t *count, u_int value" .Ft void -.Fn refcount_acquire "volatile u_int *count" +.Fn refcount_acquire "refcount_t *count" .Ft int -.Fn refcount_release "volatile u_int *count" +.Fn refcount_release "refcount_t *count" .Sh DESCRIPTION The .Nm Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c (revision 2287= 98) +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c (working copy) @@ -23,7 +23,6 @@ * Use is subject to license terms. 
*/ =20 -#include #include =20 /* @@ -81,7 +80,7 @@ { rrw_node_t *rn; =20 - if (refcount_count(&rrl->rr_linked_rcount) =3D=3D 0) + if (rrl->rr_linked_rcount =3D=3D 0) return (NULL); =20 for (rn =3D tsd_get(rrw_tsd_key); rn !=3D NULL; rn =3D rn->rn_next) { @@ -115,7 +114,7 @@ rrw_node_t *rn; rrw_node_t *prev =3D NULL; =20 - if (refcount_count(&rrl->rr_linked_rcount) =3D=3D 0) + if (rrl->rr_linked_rcount =3D=3D 0) return (B_FALSE); =20 for (rn =3D tsd_get(rrw_tsd_key); rn !=3D NULL; rn =3D rn->rn_next) { @@ -138,8 +137,8 @@ mutex_init(&rrl->rr_lock, NULL, MUTEX_DEFAULT, NULL); cv_init(&rrl->rr_cv, NULL, CV_DEFAULT, NULL); rrl->rr_writer =3D NULL; - refcount_create(&rrl->rr_anon_rcount); - refcount_create(&rrl->rr_linked_rcount); + rrl->rr_anon_rcount =3D 0; + rrl->rr_linked_rcount =3D 0; rrl->rr_writer_wanted =3D B_FALSE; } =20 @@ -149,8 +148,8 @@ mutex_destroy(&rrl->rr_lock); cv_destroy(&rrl->rr_cv); ASSERT(rrl->rr_writer =3D=3D NULL); - refcount_destroy(&rrl->rr_anon_rcount); - refcount_destroy(&rrl->rr_linked_rcount); + rrl->rr_anon_rcount =3D 0; + rrl->rr_linked_rcount =3D 0; } =20 static void @@ -159,26 +158,26 @@ mutex_enter(&rrl->rr_lock); #if !defined(DEBUG) && defined(_KERNEL) if (!rrl->rr_writer && !rrl->rr_writer_wanted) { - rrl->rr_anon_rcount.rc_count++; + rrl->rr_anon_rcount++; mutex_exit(&rrl->rr_lock); return; } DTRACE_PROBE(zfs__rrwfastpath__rdmiss); #endif ASSERT(rrl->rr_writer !=3D curthread); - ASSERT(refcount_count(&rrl->rr_anon_rcount) >=3D 0); + ASSERT(rrl->rr_anon_rcount >=3D 0); =20 while (rrl->rr_writer || (rrl->rr_writer_wanted && - refcount_is_zero(&rrl->rr_anon_rcount) && + rrl->rr_anon_rcount =3D=3D 0 && rrn_find(rrl) =3D=3D NULL)) cv_wait(&rrl->rr_cv, &rrl->rr_lock); =20 if (rrl->rr_writer_wanted) { /* may or may not be a re-entrant enter */ rrn_add(rrl); - (void) refcount_add(&rrl->rr_linked_rcount, tag); + rrl->rr_linked_rcount++; } else { - (void) refcount_add(&rrl->rr_anon_rcount, tag); + rrl->rr_anon_rcount++; } 
ASSERT(rrl->rr_writer =3D=3D NULL); mutex_exit(&rrl->rr_lock); @@ -190,8 +189,8 @@ mutex_enter(&rrl->rr_lock); ASSERT(rrl->rr_writer !=3D curthread); =20 - while (refcount_count(&rrl->rr_anon_rcount) > 0 || - refcount_count(&rrl->rr_linked_rcount) > 0 || + while (rrl->rr_anon_rcount > 0 || + rrl->rr_linked_rcount > 0 || rrl->rr_writer !=3D NULL) { rrl->rr_writer_wanted =3D B_TRUE; cv_wait(&rrl->rr_cv, &rrl->rr_lock); @@ -224,22 +223,22 @@ } DTRACE_PROBE(zfs__rrwfastpath__exitmiss); #endif - ASSERT(!refcount_is_zero(&rrl->rr_anon_rcount) || - !refcount_is_zero(&rrl->rr_linked_rcount) || + ASSERT(rrl->rr_anon_rcount !=3D 0 || + rrl->rr_linked_rcount !=3D 0 || rrl->rr_writer !=3D NULL); =20 if (rrl->rr_writer =3D=3D NULL) { - int64_t count; + uint64_t count; if (rrn_find_and_remove(rrl)) - count =3D refcount_remove(&rrl->rr_linked_rcount, tag); + count =3D rrl->rr_linked_rcount--; else - count =3D refcount_remove(&rrl->rr_anon_rcount, tag); + count =3D rrl->rr_anon_rcount--; if (count =3D=3D 0) cv_broadcast(&rrl->rr_cv); } else { ASSERT(rrl->rr_writer =3D=3D curthread); - ASSERT(refcount_is_zero(&rrl->rr_anon_rcount) && - refcount_is_zero(&rrl->rr_linked_rcount)); + ASSERT(rrl->rr_anon_rcount =3D=3D 0 && + rrl->rr_linked_rcount =3D=3D 0); rrl->rr_writer =3D NULL; cv_broadcast(&rrl->rr_cv); } @@ -255,8 +254,8 @@ if (rw =3D=3D RW_WRITER) { held =3D (rrl->rr_writer =3D=3D curthread); } else { - held =3D (!refcount_is_zero(&rrl->rr_anon_rcount) || - !refcount_is_zero(&rrl->rr_linked_rcount)); + held =3D (rrl->rr_anon_rcount !=3D 0 || + rrl->rr_linked_rcount !=3D 0); } mutex_exit(&rrl->rr_lock); =20 Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h (revision= 228798) +++ 
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h (working = copy) @@ -31,10 +31,6 @@ #include #include =20 -#ifdef __cplusplus -extern "C" { -#endif - /* * If the reference is held only by the calling function and not any * particular object, use FTAG (which is a string) for the holder_tag. @@ -42,68 +38,21 @@ */ #define FTAG ((char *)__func__) =20 -#ifdef ZFS_DEBUG -typedef struct reference { - list_node_t ref_link; - void *ref_holder; - uint64_t ref_number; - uint8_t *ref_removed; -} reference_t; - -typedef struct refcount { - kmutex_t rc_mtx; - list_t rc_list; - list_t rc_removed; - int64_t rc_count; - int64_t rc_removed_count; -} refcount_t; - -/* Note: refcount_t must be initialized with refcount_create() */ - -void refcount_create(refcount_t *rc); -void refcount_destroy(refcount_t *rc); -void refcount_destroy_many(refcount_t *rc, uint64_t number); -int refcount_is_zero(refcount_t *rc); -int64_t refcount_count(refcount_t *rc); -int64_t refcount_add(refcount_t *rc, void *holder_tag); -int64_t refcount_remove(refcount_t *rc, void *holder_tag); -int64_t refcount_add_many(refcount_t *rc, uint64_t number, void *holder_ta= g); -int64_t refcount_remove_many(refcount_t *rc, uint64_t number, void *holder= _tag); -void refcount_transfer(refcount_t *dst, refcount_t *src); - -void refcount_sysinit(void); -void refcount_fini(void); - -#else /* ZFS_DEBUG */ - -typedef struct refcount { - uint64_t rc_count; -} refcount_t; - -#define refcount_create(rc) ((rc)->rc_count =3D 0) -#define refcount_destroy(rc) ((rc)->rc_count =3D 0) -#define refcount_destroy_many(rc, number) ((rc)->rc_count =3D 0) -#define refcount_is_zero(rc) ((rc)->rc_count =3D=3D 0) -#define refcount_count(rc) ((rc)->rc_count) -#define refcount_add(rc, holder) atomic_add_64_nv(&(rc)->rc_count, 1) -#define refcount_remove(rc, holder) atomic_add_64_nv(&(rc)->rc_count, -1) +#define refcount_create(rc) refcount_init(rc, 0) +#define refcount_destroy(rc) refcount_init(rc, 0) +#define 
refcount_destroy_many(rc, number) refcount_init(rc, 0) +#define refcount_is_zero(rc) ((*rc) =3D=3D 0) +#define refcount_count(rc) (uint64_t)(*rc) +#define refcount_add(rc, holder) refcount_add_many(rc, 1, holder) +#define refcount_remove(rc, holder) refcount_remove_many(rc, 1, holder) #define refcount_add_many(rc, number, holder) \ - atomic_add_64_nv(&(rc)->rc_count, number) + (uint64_t)(atomic_fetchadd_int(rc, number) + (number)) #define refcount_remove_many(rc, number, holder) \ - atomic_add_64_nv(&(rc)->rc_count, -number) -#define refcount_transfer(dst, src) { \ - uint64_t __tmp =3D (src)->rc_count; \ - atomic_add_64(&(src)->rc_count, -__tmp); \ - atomic_add_64(&(dst)->rc_count, __tmp); \ -} + (uint64_t)(atomic_fetchadd_int(rc, -(number)) - (number)) +#define refcount_transfer(dst, src) \ + atomic_add_int(dst, atomic_readandclear_int(src)) =20 #define refcount_sysinit() #define refcount_fini() =20 -#endif /* ZFS_DEBUG */ - -#ifdef __cplusplus -} -#endif - #endif /* _SYS_REFCOUNT_H */ Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/rrwlock.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/rrwlock.h (revision = 228798) +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/rrwlock.h (working c= opy) @@ -33,7 +33,6 @@ #endif =20 #include -#include =20 /* * A reader-writer lock implementation that allows re-entrant reads, but @@ -53,8 +52,8 @@ kmutex_t rr_lock; kcondvar_t rr_cv; kthread_t *rr_writer; - refcount_t rr_anon_rcount; - refcount_t rr_linked_rcount; + uint64_t rr_anon_rcount; + uint64_t rr_linked_rcount; boolean_t rr_writer_wanted; } rrwlock_t; =20 Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= 
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c (revision 228= 798) +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c (working copy) @@ -1,223 +0,0 @@ -/* - * CDDL HEADER START - * - * The contents of this file are subject to the terms of the - * Common Development and Distribution License (the "License"). - * You may not use this file except in compliance with the License. - * - * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE - * or http://www.opensolaris.org/os/licensing. - * See the License for the specific language governing permissions - * and limitations under the License. - * - * When distributing Covered Code, include this CDDL HEADER in each - * file and include the License file at usr/src/OPENSOLARIS.LICENSE. - * If applicable, add the following below this CDDL HEADER, with the - * fields enclosed by brackets "[]" replaced with your own identifying - * information: Portions Copyright [yyyy] [name of copyright owner] - * - * CDDL HEADER END - */ -/* - * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights rese= rved. 
- */ - -#include -#include - -#ifdef ZFS_DEBUG - -#ifdef _KERNEL -int reference_tracking_enable =3D FALSE; /* runs out of memory too easily = */ -#else -int reference_tracking_enable =3D TRUE; -#endif -int reference_history =3D 4; /* tunable */ - -static kmem_cache_t *reference_cache; -static kmem_cache_t *reference_history_cache; - -void -refcount_sysinit(void) -{ - reference_cache =3D kmem_cache_create("reference_cache", - sizeof (reference_t), 0, NULL, NULL, NULL, NULL, NULL, 0); - - reference_history_cache =3D kmem_cache_create("reference_history_cache", - sizeof (uint64_t), 0, NULL, NULL, NULL, NULL, NULL, 0); -} - -void -refcount_fini(void) -{ - kmem_cache_destroy(reference_cache); - kmem_cache_destroy(reference_history_cache); -} - -void -refcount_create(refcount_t *rc) -{ - mutex_init(&rc->rc_mtx, NULL, MUTEX_DEFAULT, NULL); - list_create(&rc->rc_list, sizeof (reference_t), - offsetof(reference_t, ref_link)); - list_create(&rc->rc_removed, sizeof (reference_t), - offsetof(reference_t, ref_link)); - rc->rc_count =3D 0; - rc->rc_removed_count =3D 0; -} - -void -refcount_destroy_many(refcount_t *rc, uint64_t number) -{ - reference_t *ref; - - ASSERT(rc->rc_count =3D=3D number); - while (ref =3D list_head(&rc->rc_list)) { - list_remove(&rc->rc_list, ref); - kmem_cache_free(reference_cache, ref); - } - list_destroy(&rc->rc_list); - - while (ref =3D list_head(&rc->rc_removed)) { - list_remove(&rc->rc_removed, ref); - kmem_cache_free(reference_history_cache, ref->ref_removed); - kmem_cache_free(reference_cache, ref); - } - list_destroy(&rc->rc_removed); - mutex_destroy(&rc->rc_mtx); -} - -void -refcount_destroy(refcount_t *rc) -{ - refcount_destroy_many(rc, 0); -} - -int -refcount_is_zero(refcount_t *rc) -{ - ASSERT(rc->rc_count >=3D 0); - return (rc->rc_count =3D=3D 0); -} - -int64_t -refcount_count(refcount_t *rc) -{ - ASSERT(rc->rc_count >=3D 0); - return (rc->rc_count); -} - -int64_t -refcount_add_many(refcount_t *rc, uint64_t number, void *holder) -{ - 
reference_t *ref; - int64_t count; - - if (reference_tracking_enable) { - ref =3D kmem_cache_alloc(reference_cache, KM_SLEEP); - ref->ref_holder =3D holder; - ref->ref_number =3D number; - } - mutex_enter(&rc->rc_mtx); - ASSERT(rc->rc_count >=3D 0); - if (reference_tracking_enable) - list_insert_head(&rc->rc_list, ref); - rc->rc_count +=3D number; - count =3D rc->rc_count; - mutex_exit(&rc->rc_mtx); - - return (count); -} - -int64_t -refcount_add(refcount_t *rc, void *holder) -{ - return (refcount_add_many(rc, 1, holder)); -} - -int64_t -refcount_remove_many(refcount_t *rc, uint64_t number, void *holder) -{ - reference_t *ref; - int64_t count; - - mutex_enter(&rc->rc_mtx); - ASSERT(rc->rc_count >=3D number); - - if (!reference_tracking_enable) { - rc->rc_count -=3D number; - count =3D rc->rc_count; - mutex_exit(&rc->rc_mtx); - return (count); - } - - for (ref =3D list_head(&rc->rc_list); ref; - ref =3D list_next(&rc->rc_list, ref)) { - if (ref->ref_holder =3D=3D holder && ref->ref_number =3D=3D number) { - list_remove(&rc->rc_list, ref); - if (reference_history > 0) { - ref->ref_removed =3D - kmem_cache_alloc(reference_history_cache, - KM_SLEEP); - list_insert_head(&rc->rc_removed, ref); - rc->rc_removed_count++; - if (rc->rc_removed_count >=3D reference_history) { - ref =3D list_tail(&rc->rc_removed); - list_remove(&rc->rc_removed, ref); - kmem_cache_free(reference_history_cache, - ref->ref_removed); - kmem_cache_free(reference_cache, ref); - rc->rc_removed_count--; - } - } else { - kmem_cache_free(reference_cache, ref); - } - rc->rc_count -=3D number; - count =3D rc->rc_count; - mutex_exit(&rc->rc_mtx); - return (count); - } - } - panic("No such hold %p on refcount %llx", holder, - (u_longlong_t)(uintptr_t)rc); - return (-1); -} - -int64_t -refcount_remove(refcount_t *rc, void *holder) -{ - return (refcount_remove_many(rc, 1, holder)); -} - -void -refcount_transfer(refcount_t *dst, refcount_t *src) -{ - int64_t count, removed_count; - list_t list, removed; - - 
list_create(&list, sizeof (reference_t), - offsetof(reference_t, ref_link)); - list_create(&removed, sizeof (reference_t), - offsetof(reference_t, ref_link)); - - mutex_enter(&src->rc_mtx); - count =3D src->rc_count; - removed_count =3D src->rc_removed_count; - src->rc_count =3D 0; - src->rc_removed_count =3D 0; - list_move_tail(&list, &src->rc_list); - list_move_tail(&removed, &src->rc_removed); - mutex_exit(&src->rc_mtx); - - mutex_enter(&dst->rc_mtx); - dst->rc_count +=3D count; - dst->rc_removed_count +=3D removed_count; - list_move_tail(&dst->rc_list, &list); - list_move_tail(&dst->rc_removed, &removed); - mutex_exit(&dst->rc_mtx); - - list_destroy(&list); - list_destroy(&removed); -} - -#endif /* ZFS_DEBUG */ Index: sys/cddl/contrib/opensolaris/uts/common/Makefile.files =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/cddl/contrib/opensolaris/uts/common/Makefile.files (revision 228798) +++ sys/cddl/contrib/opensolaris/uts/common/Makefile.files (working copy) @@ -55,7 +55,6 @@ gzip.o \ lzjb.o \ metaslab.o \ - refcount.o \ sa.o \ sha256.o \ spa.o \ Index: sys/sys/refcount.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/sys/refcount.h (revision 228798) +++ sys/sys/refcount.h (working copy) @@ -40,22 +40,24 @@ #define KASSERT(exp, msg) /* */ #endif =20 +typedef volatile u_int refcount_t; + static __inline void -refcount_init(volatile u_int *count, u_int value) +refcount_init(refcount_t *count, u_int value) { =20 *count =3D value; } =20 static __inline void -refcount_acquire(volatile u_int *count) +refcount_acquire(refcount_t *count) { =20 atomic_add_acq_int(count, 1);=09 } =20 static __inline int -refcount_release(volatile u_int *count) 
+refcount_release(refcount_t *count) { u_int old; =20 --XbyeA6tlA6gsyQxQ-- --A2lxmwzFepEIu8KR Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iQIcBAEBAgAGBQJO87hMAAoJEG5e2P40kaK7QlsP/2IwTXHnvluHchYhqvRbWLfK U3wfk6PvmgoKMrXbnGTXuusTrg9I6fb1BtxkppVbe1U95G6gGZSwuejgeRqATRWV B2mDE+IOXRtOx159v7fqO16JHEDGNEHggMBo4WFI/g/YZXg8uX6MY5LC05Zwmj0L lPEX+SnNEUdeG977FM8MOIPxGghf02P+HXxux33/i2GFUV4gtB4CByVSOS50EGms YZXnhVJ+5MO6dxQkgg4ccc5ToRURKqL9i1uv6ziPlJQlLpH7dL4Bm4aECSPWbaKw StTGh9dg5zCQiMyQw6APPWmxzUoGYap7RDk/Vs7Wn39afevxUnDeATdkVXO2uq1X Io0W4v6yI1AkBxRsw+npPgRajTo5QxNAbZcdthDJJ17nrq4Pz9waKlKm1CRv9ibC Y9q6jj3qqLOoN55wQjTNyrqXSnh3/RRjej+rdyLZQEKDkajrpmojP8Y8zkDoh5Wn U6qgbqXKZmm+BVUJ1AMos+aXl03Khm+Eaq+lyl/W0b5w4Q33ic62REt3iXPiprF9 spohB0jF5izDZQttSooshfWddlwRwyGTMIsB6LhBvoqhQp53xMEmXZvHX5oi5kja vc6Itf9vVMPTLr+HTjnWRDy4XlMyiCpD0DXRAtFvsHRVjBbGEpD4MONMHoeShwRM stti4oTS9rlMjlJBhUtr =tHix -----END PGP SIGNATURE----- --A2lxmwzFepEIu8KR-- From owner-freebsd-fs@FreeBSD.ORG Fri Dec 23 05:10:28 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7A6D9106564A; Fri, 23 Dec 2011 05:10:28 +0000 (UTC) (envelope-from jhellenthal@gmail.com) Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id 4472F8FC0C; Fri, 23 Dec 2011 05:10:20 +0000 (UTC) Received: by ghrr16 with SMTP id r16so3171614ghr.13 for ; Thu, 22 Dec 2011 21:10:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to; bh=PLOyoyQuDiW+MmL1UJvARSaw4/cxopNOT1ccL10mb/c=; b=giqDqLbpH6pyc5zQPXFDDeVfG9tim03VHO89Ad95Rp0KmQAKExAuYxTg4vHKavyzKd vgTr4FPwQ9bDy715cmec2e2bQxDHJp6OOOyBwO+LhLTq6JvIv8iaI+Rth0SNyAlkBOsl Djdqgk5c6DyNZFNkJhbUzWcSHClTqJkwbt6Jw= 
Received: by 10.236.177.67 with SMTP id c43mr18696682yhm.54.1324617019819; Thu, 22 Dec 2011 21:10:19 -0800 (PST) Received: from DataIX.net (24-247-9-230.dhcp.aldl.mi.charter.com. [24.247.9.230]) by mx.google.com with ESMTPS id y58sm18559711yhi.17.2011.12.22.21.10.17 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 22 Dec 2011 21:10:18 -0800 (PST) Sender: Jason Hellenthal Received: from DataIX.net (localhost [127.0.0.1]) by DataIX.net (8.14.5/8.14.5) with ESMTP id pBN5AE08077544 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 23 Dec 2011 00:10:15 -0500 (EST) (envelope-from jhell@DataIX.net) Received: (from jhell@localhost) by DataIX.net (8.14.5/8.14.5/Submit) id pBN5A9a7077502; Fri, 23 Dec 2011 00:10:09 -0500 (EST) (envelope-from jhell@DataIX.net) Date: Fri, 23 Dec 2011 00:10:09 -0500 From: Jason Hellenthal To: Ed Schouten Message-ID: <20111223051009.GA77353@DataIX.net> References: <20111222214728.GV1771@hoeg.nl> <20111222230756.GW1771@hoeg.nl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="l76fUT7nc3MelDdI" Content-Disposition: inline In-Reply-To: <20111222230756.GW1771@hoeg.nl> Cc: freebsd-fs@freebsd.org, pjd@freebsd.org Subject: Re: Changing refcount(9) to use a refcount_t X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Dec 2011 05:10:28 -0000 --l76fUT7nc3MelDdI Content-Type: multipart/mixed; boundary="Q68bSM7Ycu6FN28Q" Content-Disposition: inline --Q68bSM7Ycu6FN28Q Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Nice work. So all-in-all we need both patches concatenated like so...? On Fri, Dec 23, 2011 at 12:07:56AM +0100, Ed Schouten wrote: > * Ed Schouten , 20111222 22:47: > > Can any of you ZFS users please try the following patch? >=20 > Whoops. 
It seems I forgot the ZFS source code is also built in userspace > (libzpool). I have attached a new patch. Changes: >=20 > - Remove the refcount.c file, as it has no use anymore. > - Port the rrwlock code to not use refcount_t at all. It accesses > internal members of the refcount_t and it seems we don't need to use > atomics here anyway, as all operations on the counters are performed > while holding the rr_lock. >=20 > --=20 > Ed Schouten > WWW: http://80386.nl/ > Index: share/man/man9/refcount.9 > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- share/man/man9/refcount.9 (revision 228798) > +++ share/man/man9/refcount.9 (working copy) > @@ -39,11 +39,11 @@ > .In sys/param.h > .In sys/refcount.h > .Ft void > -.Fn refcount_init "volatile u_int *count, u_int value" > +.Fn refcount_init "refcount_t *count, u_int value" > .Ft void > -.Fn refcount_acquire "volatile u_int *count" > +.Fn refcount_acquire "refcount_t *count" > .Ft int > -.Fn refcount_release "volatile u_int *count" > +.Fn refcount_release "refcount_t *count" > .Sh DESCRIPTION > The > .Nm > Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c (revision 22= 8798) > +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c (working cop= y) > @@ -23,7 +23,6 @@ > * Use is subject to license terms. 
> */ > =20 > -#include > #include > =20 > /* > @@ -81,7 +80,7 @@ > { > rrw_node_t *rn; > =20 > - if (refcount_count(&rrl->rr_linked_rcount) =3D=3D 0) > + if (rrl->rr_linked_rcount =3D=3D 0) > return (NULL); > =20 > for (rn =3D tsd_get(rrw_tsd_key); rn !=3D NULL; rn =3D rn->rn_next) { > @@ -115,7 +114,7 @@ > rrw_node_t *rn; > rrw_node_t *prev =3D NULL; > =20 > - if (refcount_count(&rrl->rr_linked_rcount) =3D=3D 0) > + if (rrl->rr_linked_rcount =3D=3D 0) > return (B_FALSE); > =20 > for (rn =3D tsd_get(rrw_tsd_key); rn !=3D NULL; rn =3D rn->rn_next) { > @@ -138,8 +137,8 @@ > mutex_init(&rrl->rr_lock, NULL, MUTEX_DEFAULT, NULL); > cv_init(&rrl->rr_cv, NULL, CV_DEFAULT, NULL); > rrl->rr_writer =3D NULL; > - refcount_create(&rrl->rr_anon_rcount); > - refcount_create(&rrl->rr_linked_rcount); > + rrl->rr_anon_rcount =3D 0; > + rrl->rr_linked_rcount =3D 0; > rrl->rr_writer_wanted =3D B_FALSE; > } > =20 > @@ -149,8 +148,8 @@ > mutex_destroy(&rrl->rr_lock); > cv_destroy(&rrl->rr_cv); > ASSERT(rrl->rr_writer =3D=3D NULL); > - refcount_destroy(&rrl->rr_anon_rcount); > - refcount_destroy(&rrl->rr_linked_rcount); > + rrl->rr_anon_rcount =3D 0; > + rrl->rr_linked_rcount =3D 0; > } > =20 > static void > @@ -159,26 +158,26 @@ > mutex_enter(&rrl->rr_lock); > #if !defined(DEBUG) && defined(_KERNEL) > if (!rrl->rr_writer && !rrl->rr_writer_wanted) { > - rrl->rr_anon_rcount.rc_count++; > + rrl->rr_anon_rcount++; > mutex_exit(&rrl->rr_lock); > return; > } > DTRACE_PROBE(zfs__rrwfastpath__rdmiss); > #endif > ASSERT(rrl->rr_writer !=3D curthread); > - ASSERT(refcount_count(&rrl->rr_anon_rcount) >=3D 0); > + ASSERT(rrl->rr_anon_rcount >=3D 0); > =20 > while (rrl->rr_writer || (rrl->rr_writer_wanted && > - refcount_is_zero(&rrl->rr_anon_rcount) && > + rrl->rr_anon_rcount =3D=3D 0 && > rrn_find(rrl) =3D=3D NULL)) > cv_wait(&rrl->rr_cv, &rrl->rr_lock); > =20 > if (rrl->rr_writer_wanted) { > /* may or may not be a re-entrant enter */ > rrn_add(rrl); > - (void) 
refcount_add(&rrl->rr_linked_rcount, tag); > + rrl->rr_linked_rcount++; > } else { > - (void) refcount_add(&rrl->rr_anon_rcount, tag); > + rrl->rr_anon_rcount++; > } > ASSERT(rrl->rr_writer =3D=3D NULL); > mutex_exit(&rrl->rr_lock); > @@ -190,8 +189,8 @@ > mutex_enter(&rrl->rr_lock); > ASSERT(rrl->rr_writer !=3D curthread); > =20 > - while (refcount_count(&rrl->rr_anon_rcount) > 0 || > - refcount_count(&rrl->rr_linked_rcount) > 0 || > + while (rrl->rr_anon_rcount > 0 || > + rrl->rr_linked_rcount > 0 || > rrl->rr_writer !=3D NULL) { > rrl->rr_writer_wanted =3D B_TRUE; > cv_wait(&rrl->rr_cv, &rrl->rr_lock); > @@ -224,22 +223,22 @@ > } > DTRACE_PROBE(zfs__rrwfastpath__exitmiss); > #endif > - ASSERT(!refcount_is_zero(&rrl->rr_anon_rcount) || > - !refcount_is_zero(&rrl->rr_linked_rcount) || > + ASSERT(rrl->rr_anon_rcount !=3D 0 || > + rrl->rr_linked_rcount !=3D 0 || > rrl->rr_writer !=3D NULL); > =20 > if (rrl->rr_writer =3D=3D NULL) { > - int64_t count; > + uint64_t count; > if (rrn_find_and_remove(rrl)) > - count =3D refcount_remove(&rrl->rr_linked_rcount, tag); > + count =3D --rrl->rr_linked_rcount; > else > - count =3D refcount_remove(&rrl->rr_anon_rcount, tag); > + count =3D --rrl->rr_anon_rcount; > if (count =3D=3D 0) > cv_broadcast(&rrl->rr_cv); > } else { > ASSERT(rrl->rr_writer =3D=3D curthread); > - ASSERT(refcount_is_zero(&rrl->rr_anon_rcount) && > - refcount_is_zero(&rrl->rr_linked_rcount)); > + ASSERT(rrl->rr_anon_rcount =3D=3D 0 && > + rrl->rr_linked_rcount =3D=3D 0); > rrl->rr_writer =3D NULL; > cv_broadcast(&rrl->rr_cv); > } > @@ -255,8 +254,8 @@ > if (rw =3D=3D RW_WRITER) { > held =3D (rrl->rr_writer =3D=3D curthread); > } else { > - held =3D (!refcount_is_zero(&rrl->rr_anon_rcount) || > - !refcount_is_zero(&rrl->rr_linked_rcount)); > + held =3D (rrl->rr_anon_rcount !=3D 0 || > + rrl->rr_linked_rcount !=3D 0); > } > mutex_exit(&rrl->rr_lock); > =20 > Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h > 
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h (revisi= on 228798) > +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h (workin= g copy) > @@ -31,10 +31,6 @@ > #include > #include > =20 > -#ifdef __cplusplus > -extern "C" { > -#endif > - > /* > * If the reference is held only by the calling function and not any > * particular object, use FTAG (which is a string) for the holder_tag. > @@ -42,68 +38,21 @@ > */ > #define FTAG ((char *)__func__) > =20 > -#ifdef ZFS_DEBUG > -typedef struct reference { > - list_node_t ref_link; > - void *ref_holder; > - uint64_t ref_number; > - uint8_t *ref_removed; > -} reference_t; > - > -typedef struct refcount { > - kmutex_t rc_mtx; > - list_t rc_list; > - list_t rc_removed; > - int64_t rc_count; > - int64_t rc_removed_count; > -} refcount_t; > - > -/* Note: refcount_t must be initialized with refcount_create() */ > - > -void refcount_create(refcount_t *rc); > -void refcount_destroy(refcount_t *rc); > -void refcount_destroy_many(refcount_t *rc, uint64_t number); > -int refcount_is_zero(refcount_t *rc); > -int64_t refcount_count(refcount_t *rc); > -int64_t refcount_add(refcount_t *rc, void *holder_tag); > -int64_t refcount_remove(refcount_t *rc, void *holder_tag); > -int64_t refcount_add_many(refcount_t *rc, uint64_t number, void *holder_= tag); > -int64_t refcount_remove_many(refcount_t *rc, uint64_t number, void *hold= er_tag); > -void refcount_transfer(refcount_t *dst, refcount_t *src); > - > -void refcount_sysinit(void); > -void refcount_fini(void); > - > -#else /* ZFS_DEBUG */ > - > -typedef struct refcount { > - uint64_t rc_count; > -} refcount_t; > - > -#define refcount_create(rc) ((rc)->rc_count =3D 0) > -#define refcount_destroy(rc) ((rc)->rc_count =3D 0) > -#define 
refcount_destroy_many(rc, number) ((rc)->rc_count =3D 0) > -#define refcount_is_zero(rc) ((rc)->rc_count =3D=3D 0) > -#define refcount_count(rc) ((rc)->rc_count) > -#define refcount_add(rc, holder) atomic_add_64_nv(&(rc)->rc_count, 1) > -#define refcount_remove(rc, holder) atomic_add_64_nv(&(rc)->rc_count, -1) > +#define refcount_create(rc) refcount_init(rc, 0) > +#define refcount_destroy(rc) refcount_init(rc, 0) > +#define refcount_destroy_many(rc, number) refcount_init(rc, 0) > +#define refcount_is_zero(rc) ((*rc) =3D=3D 0) > +#define refcount_count(rc) (uint64_t)(*rc) > +#define refcount_add(rc, holder) refcount_add_many(rc, 1, holder) > +#define refcount_remove(rc, holder) refcount_remove_many(rc, 1, holder) > #define refcount_add_many(rc, number, holder) \ > - atomic_add_64_nv(&(rc)->rc_count, number) > + (uint64_t)(atomic_fetchadd_int(rc, number) + (number)) > #define refcount_remove_many(rc, number, holder) \ > - atomic_add_64_nv(&(rc)->rc_count, -number) > -#define refcount_transfer(dst, src) { \ > - uint64_t __tmp =3D (src)->rc_count; \ > - atomic_add_64(&(src)->rc_count, -__tmp); \ > - atomic_add_64(&(dst)->rc_count, __tmp); \ > -} > + (uint64_t)(atomic_fetchadd_int(rc, -(number)) - (number)) > +#define refcount_transfer(dst, src) \ > + atomic_add_int(dst, atomic_readandclear_int(src)) > =20 > #define refcount_sysinit() > #define refcount_fini() > =20 > -#endif /* ZFS_DEBUG */ > - > -#ifdef __cplusplus > -} > -#endif > - > #endif /* _SYS_REFCOUNT_H */ > Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/rrwlock.h > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/rrwlock.h (revisio= n 228798) > +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/rrwlock.h (working= copy) > @@ -33,7 +33,6 @@ > #endif > =20 > #include > -#include > =20 > 
/* > * A reader-writer lock implementation that allows re-entrant reads, but > @@ -53,8 +52,8 @@ > kmutex_t rr_lock; > kcondvar_t rr_cv; > kthread_t *rr_writer; > - refcount_t rr_anon_rcount; > - refcount_t rr_linked_rcount; > + uint64_t rr_anon_rcount; > + uint64_t rr_linked_rcount; > boolean_t rr_writer_wanted; > } rrwlock_t; > =20 > Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c (revision 2= 28798) > +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c (working co= py) > @@ -1,223 +0,0 @@ > -/* > - * CDDL HEADER START > - * > - * The contents of this file are subject to the terms of the > - * Common Development and Distribution License (the "License"). > - * You may not use this file except in compliance with the License. > - * > - * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE > - * or http://www.opensolaris.org/os/licensing. > - * See the License for the specific language governing permissions > - * and limitations under the License. > - * > - * When distributing Covered Code, include this CDDL HEADER in each > - * file and include the License file at usr/src/OPENSOLARIS.LICENSE. > - * If applicable, add the following below this CDDL HEADER, with the > - * fields enclosed by brackets "[]" replaced with your own identifying > - * information: Portions Copyright [yyyy] [name of copyright owner] > - * > - * CDDL HEADER END > - */ > -/* > - * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights re= served. 
> - */ > - > -#include > -#include > - > -#ifdef ZFS_DEBUG > - > -#ifdef _KERNEL > -int reference_tracking_enable =3D FALSE; /* runs out of memory too easil= y */ > -#else > -int reference_tracking_enable =3D TRUE; > -#endif > -int reference_history =3D 4; /* tunable */ > - > -static kmem_cache_t *reference_cache; > -static kmem_cache_t *reference_history_cache; > - > -void > -refcount_sysinit(void) > -{ > - reference_cache =3D kmem_cache_create("reference_cache", > - sizeof (reference_t), 0, NULL, NULL, NULL, NULL, NULL, 0); > - > - reference_history_cache =3D kmem_cache_create("reference_history_cache", > - sizeof (uint64_t), 0, NULL, NULL, NULL, NULL, NULL, 0); > -} > - > -void > -refcount_fini(void) > -{ > - kmem_cache_destroy(reference_cache); > - kmem_cache_destroy(reference_history_cache); > -} > - > -void > -refcount_create(refcount_t *rc) > -{ > - mutex_init(&rc->rc_mtx, NULL, MUTEX_DEFAULT, NULL); > - list_create(&rc->rc_list, sizeof (reference_t), > - offsetof(reference_t, ref_link)); > - list_create(&rc->rc_removed, sizeof (reference_t), > - offsetof(reference_t, ref_link)); > - rc->rc_count =3D 0; > - rc->rc_removed_count =3D 0; > -} > - > -void > -refcount_destroy_many(refcount_t *rc, uint64_t number) > -{ > - reference_t *ref; > - > - ASSERT(rc->rc_count =3D=3D number); > - while (ref =3D list_head(&rc->rc_list)) { > - list_remove(&rc->rc_list, ref); > - kmem_cache_free(reference_cache, ref); > - } > - list_destroy(&rc->rc_list); > - > - while (ref =3D list_head(&rc->rc_removed)) { > - list_remove(&rc->rc_removed, ref); > - kmem_cache_free(reference_history_cache, ref->ref_removed); > - kmem_cache_free(reference_cache, ref); > - } > - list_destroy(&rc->rc_removed); > - mutex_destroy(&rc->rc_mtx); > -} > - > -void > -refcount_destroy(refcount_t *rc) > -{ > - refcount_destroy_many(rc, 0); > -} > - > -int > -refcount_is_zero(refcount_t *rc) > -{ > - ASSERT(rc->rc_count >=3D 0); > - return (rc->rc_count =3D=3D 0); > -} > - > -int64_t > 
-refcount_count(refcount_t *rc) > -{ > - ASSERT(rc->rc_count >=3D 0); > - return (rc->rc_count); > -} > - > -int64_t > -refcount_add_many(refcount_t *rc, uint64_t number, void *holder) > -{ > - reference_t *ref; > - int64_t count; > - > - if (reference_tracking_enable) { > - ref =3D kmem_cache_alloc(reference_cache, KM_SLEEP); > - ref->ref_holder =3D holder; > - ref->ref_number =3D number; > - } > - mutex_enter(&rc->rc_mtx); > - ASSERT(rc->rc_count >=3D 0); > - if (reference_tracking_enable) > - list_insert_head(&rc->rc_list, ref); > - rc->rc_count +=3D number; > - count =3D rc->rc_count; > - mutex_exit(&rc->rc_mtx); > - > - return (count); > -} > - > -int64_t > -refcount_add(refcount_t *rc, void *holder) > -{ > - return (refcount_add_many(rc, 1, holder)); > -} > - > -int64_t > -refcount_remove_many(refcount_t *rc, uint64_t number, void *holder) > -{ > - reference_t *ref; > - int64_t count; > - > - mutex_enter(&rc->rc_mtx); > - ASSERT(rc->rc_count >=3D number); > - > - if (!reference_tracking_enable) { > - rc->rc_count -=3D number; > - count =3D rc->rc_count; > - mutex_exit(&rc->rc_mtx); > - return (count); > - } > - > - for (ref =3D list_head(&rc->rc_list); ref; > - ref =3D list_next(&rc->rc_list, ref)) { > - if (ref->ref_holder =3D=3D holder && ref->ref_number =3D=3D number) { > - list_remove(&rc->rc_list, ref); > - if (reference_history > 0) { > - ref->ref_removed =3D > - kmem_cache_alloc(reference_history_cache, > - KM_SLEEP); > - list_insert_head(&rc->rc_removed, ref); > - rc->rc_removed_count++; > - if (rc->rc_removed_count >=3D reference_history) { > - ref =3D list_tail(&rc->rc_removed); > - list_remove(&rc->rc_removed, ref); > - kmem_cache_free(reference_history_cache, > - ref->ref_removed); > - kmem_cache_free(reference_cache, ref); > - rc->rc_removed_count--; > - } > - } else { > - kmem_cache_free(reference_cache, ref); > - } > - rc->rc_count -=3D number; > - count =3D rc->rc_count; > - mutex_exit(&rc->rc_mtx); > - return (count); > - } > - } > - 
panic("No such hold %p on refcount %llx", holder, > - (u_longlong_t)(uintptr_t)rc); > - return (-1); > -} > - > -int64_t > -refcount_remove(refcount_t *rc, void *holder) > -{ > - return (refcount_remove_many(rc, 1, holder)); > -} > - > -void > -refcount_transfer(refcount_t *dst, refcount_t *src) > -{ > - int64_t count, removed_count; > - list_t list, removed; > - > - list_create(&list, sizeof (reference_t), > - offsetof(reference_t, ref_link)); > - list_create(&removed, sizeof (reference_t), > - offsetof(reference_t, ref_link)); > - > - mutex_enter(&src->rc_mtx); > - count =3D src->rc_count; > - removed_count =3D src->rc_removed_count; > - src->rc_count =3D 0; > - src->rc_removed_count =3D 0; > - list_move_tail(&list, &src->rc_list); > - list_move_tail(&removed, &src->rc_removed); > - mutex_exit(&src->rc_mtx); > - > - mutex_enter(&dst->rc_mtx); > - dst->rc_count +=3D count; > - dst->rc_removed_count +=3D removed_count; > - list_move_tail(&dst->rc_list, &list); > - list_move_tail(&dst->rc_removed, &removed); > - mutex_exit(&dst->rc_mtx); > - > - list_destroy(&list); > - list_destroy(&removed); > -} > - > -#endif /* ZFS_DEBUG */ > Index: sys/cddl/contrib/opensolaris/uts/common/Makefile.files > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- sys/cddl/contrib/opensolaris/uts/common/Makefile.files (revision 2287= 98) > +++ sys/cddl/contrib/opensolaris/uts/common/Makefile.files (working copy) > @@ -55,7 +55,6 @@ > gzip.o \ > lzjb.o \ > metaslab.o \ > - refcount.o \ > sa.o \ > sha256.o \ > spa.o \ > Index: sys/sys/refcount.h > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- sys/sys/refcount.h (revision 228798) > +++ sys/sys/refcount.h (working copy) 
> @@ -40,22 +40,24 @@ > #define KASSERT(exp, msg) /* */ > #endif > =20 > +typedef volatile u_int refcount_t; > + > static __inline void > -refcount_init(volatile u_int *count, u_int value) > +refcount_init(refcount_t *count, u_int value) > { > =20 > *count =3D value; > } > =20 > static __inline void > -refcount_acquire(volatile u_int *count) > +refcount_acquire(refcount_t *count) > { > =20 > atomic_add_acq_int(count, 1);=09 > } > =20 > static __inline int > -refcount_release(volatile u_int *count) > +refcount_release(refcount_t *count) > { > u_int old; > =20 --=20 ;s =3D; --Q68bSM7Ycu6FN28Q Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="ref.diff" Content-Transfer-Encoding: quoted-printable Index: share/man/man9/refcount.9 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- share/man/man9/refcount.9 (revision 228798) +++ share/man/man9/refcount.9 (working copy) @@ -39,11 +39,11 @@ .In sys/param.h .In sys/refcount.h .Ft void -.Fn refcount_init "volatile u_int *count, u_int value" +.Fn refcount_init "refcount_t *count, u_int value" .Ft void -.Fn refcount_acquire "volatile u_int *count" +.Fn refcount_acquire "refcount_t *count" .Ft int -.Fn refcount_release "volatile u_int *count" +.Fn refcount_release "refcount_t *count" .Sh DESCRIPTION The .Nm Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c (revision 2287= 98) +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c (working copy) @@ -23,7 +23,6 @@ * Use is subject to license terms. 
*/ =20 -#include #include =20 /* @@ -81,7 +80,7 @@ { rrw_node_t *rn; =20 - if (refcount_count(&rrl->rr_linked_rcount) =3D=3D 0) + if (rrl->rr_linked_rcount =3D=3D 0) return (NULL); =20 for (rn =3D tsd_get(rrw_tsd_key); rn !=3D NULL; rn =3D rn->rn_next) { @@ -115,7 +114,7 @@ rrw_node_t *rn; rrw_node_t *prev =3D NULL; =20 - if (refcount_count(&rrl->rr_linked_rcount) =3D=3D 0) + if (rrl->rr_linked_rcount =3D=3D 0) return (B_FALSE); =20 for (rn =3D tsd_get(rrw_tsd_key); rn !=3D NULL; rn =3D rn->rn_next) { @@ -138,8 +137,8 @@ mutex_init(&rrl->rr_lock, NULL, MUTEX_DEFAULT, NULL); cv_init(&rrl->rr_cv, NULL, CV_DEFAULT, NULL); rrl->rr_writer =3D NULL; - refcount_create(&rrl->rr_anon_rcount); - refcount_create(&rrl->rr_linked_rcount); + rrl->rr_anon_rcount =3D 0; + rrl->rr_linked_rcount =3D 0; rrl->rr_writer_wanted =3D B_FALSE; } =20 @@ -149,8 +148,8 @@ mutex_destroy(&rrl->rr_lock); cv_destroy(&rrl->rr_cv); ASSERT(rrl->rr_writer =3D=3D NULL); - refcount_destroy(&rrl->rr_anon_rcount); - refcount_destroy(&rrl->rr_linked_rcount); + rrl->rr_anon_rcount =3D 0; + rrl->rr_linked_rcount =3D 0; } =20 static void @@ -159,26 +158,26 @@ mutex_enter(&rrl->rr_lock); #if !defined(DEBUG) && defined(_KERNEL) if (!rrl->rr_writer && !rrl->rr_writer_wanted) { - rrl->rr_anon_rcount.rc_count++; + rrl->rr_anon_rcount++; mutex_exit(&rrl->rr_lock); return; } DTRACE_PROBE(zfs__rrwfastpath__rdmiss); #endif ASSERT(rrl->rr_writer !=3D curthread); - ASSERT(refcount_count(&rrl->rr_anon_rcount) >=3D 0); + ASSERT(rrl->rr_anon_rcount >=3D 0); =20 while (rrl->rr_writer || (rrl->rr_writer_wanted && - refcount_is_zero(&rrl->rr_anon_rcount) && + rrl->rr_anon_rcount =3D=3D 0 && rrn_find(rrl) =3D=3D NULL)) cv_wait(&rrl->rr_cv, &rrl->rr_lock); =20 if (rrl->rr_writer_wanted) { /* may or may not be a re-entrant enter */ rrn_add(rrl); - (void) refcount_add(&rrl->rr_linked_rcount, tag); + rrl->rr_linked_rcount++; } else { - (void) refcount_add(&rrl->rr_anon_rcount, tag); + rrl->rr_anon_rcount++; } 
ASSERT(rrl->rr_writer =3D=3D NULL); mutex_exit(&rrl->rr_lock); @@ -190,8 +189,8 @@ mutex_enter(&rrl->rr_lock); ASSERT(rrl->rr_writer !=3D curthread); =20 - while (refcount_count(&rrl->rr_anon_rcount) > 0 || - refcount_count(&rrl->rr_linked_rcount) > 0 || + while (rrl->rr_anon_rcount > 0 || + rrl->rr_linked_rcount > 0 || rrl->rr_writer !=3D NULL) { rrl->rr_writer_wanted =3D B_TRUE; cv_wait(&rrl->rr_cv, &rrl->rr_lock); @@ -224,22 +223,22 @@ } DTRACE_PROBE(zfs__rrwfastpath__exitmiss); #endif - ASSERT(!refcount_is_zero(&rrl->rr_anon_rcount) || - !refcount_is_zero(&rrl->rr_linked_rcount) || + ASSERT(rrl->rr_anon_rcount !=3D 0 || + rrl->rr_linked_rcount !=3D 0 || rrl->rr_writer !=3D NULL); =20 if (rrl->rr_writer =3D=3D NULL) { - int64_t count; + uint64_t count; if (rrn_find_and_remove(rrl)) - count =3D refcount_remove(&rrl->rr_linked_rcount, tag); + count =3D --rrl->rr_linked_rcount; else - count =3D refcount_remove(&rrl->rr_anon_rcount, tag); + count =3D --rrl->rr_anon_rcount; if (count =3D=3D 0) cv_broadcast(&rrl->rr_cv); } else { ASSERT(rrl->rr_writer =3D=3D curthread); - ASSERT(refcount_is_zero(&rrl->rr_anon_rcount) && - refcount_is_zero(&rrl->rr_linked_rcount)); + ASSERT(rrl->rr_anon_rcount =3D=3D 0 && + rrl->rr_linked_rcount =3D=3D 0); rrl->rr_writer =3D NULL; cv_broadcast(&rrl->rr_cv); } @@ -255,8 +254,8 @@ if (rw =3D=3D RW_WRITER) { held =3D (rrl->rr_writer =3D=3D curthread); } else { - held =3D (!refcount_is_zero(&rrl->rr_anon_rcount) || - !refcount_is_zero(&rrl->rr_linked_rcount)); + held =3D (rrl->rr_anon_rcount !=3D 0 || + rrl->rr_linked_rcount !=3D 0); } mutex_exit(&rrl->rr_lock); =20 Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h (revision= 228798) +++ 
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h (working copy)
@@ -31,10 +31,6 @@
 #include
 #include
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /*
  * If the reference is held only by the calling function and not any
  * particular object, use FTAG (which is a string) for the holder_tag.
@@ -42,68 +38,21 @@
  */
 #define FTAG ((char *)__func__)
 
-#ifdef ZFS_DEBUG
-typedef struct reference {
-	list_node_t ref_link;
-	void *ref_holder;
-	uint64_t ref_number;
-	uint8_t *ref_removed;
-} reference_t;
-
-typedef struct refcount {
-	kmutex_t rc_mtx;
-	list_t rc_list;
-	list_t rc_removed;
-	int64_t rc_count;
-	int64_t rc_removed_count;
-} refcount_t;
-
-/* Note: refcount_t must be initialized with refcount_create() */
-
-void refcount_create(refcount_t *rc);
-void refcount_destroy(refcount_t *rc);
-void refcount_destroy_many(refcount_t *rc, uint64_t number);
-int refcount_is_zero(refcount_t *rc);
-int64_t refcount_count(refcount_t *rc);
-int64_t refcount_add(refcount_t *rc, void *holder_tag);
-int64_t refcount_remove(refcount_t *rc, void *holder_tag);
-int64_t refcount_add_many(refcount_t *rc, uint64_t number, void *holder_tag);
-int64_t refcount_remove_many(refcount_t *rc, uint64_t number, void *holder_tag);
-void refcount_transfer(refcount_t *dst, refcount_t *src);
-
-void refcount_sysinit(void);
-void refcount_fini(void);
-
-#else /* ZFS_DEBUG */
-
-typedef struct refcount {
-	uint64_t rc_count;
-} refcount_t;
-
-#define refcount_create(rc) ((rc)->rc_count = 0)
-#define refcount_destroy(rc) ((rc)->rc_count = 0)
-#define refcount_destroy_many(rc, number) ((rc)->rc_count = 0)
-#define refcount_is_zero(rc) ((rc)->rc_count == 0)
-#define refcount_count(rc) ((rc)->rc_count)
-#define refcount_add(rc, holder) atomic_add_64_nv(&(rc)->rc_count, 1)
-#define refcount_remove(rc, holder) atomic_add_64_nv(&(rc)->rc_count, -1)
+#define refcount_create(rc) refcount_init(rc, 0)
+#define refcount_destroy(rc) refcount_init(rc, 0)
+#define refcount_destroy_many(rc, number) refcount_init(rc, 0)
+#define refcount_is_zero(rc) ((*rc) == 0)
+#define refcount_count(rc) (uint64_t)(*rc)
+#define refcount_add(rc, holder) refcount_add_many(rc, 1, holder)
+#define refcount_remove(rc, holder) refcount_remove_many(rc, 1, holder)
 #define refcount_add_many(rc, number, holder) \
-	atomic_add_64_nv(&(rc)->rc_count, number)
+	(uint64_t)(atomic_fetchadd_int(rc, number) + (number))
 #define refcount_remove_many(rc, number, holder) \
-	atomic_add_64_nv(&(rc)->rc_count, -number)
-#define refcount_transfer(dst, src) { \
-	uint64_t __tmp = (src)->rc_count; \
-	atomic_add_64(&(src)->rc_count, -__tmp); \
-	atomic_add_64(&(dst)->rc_count, __tmp); \
-}
+	(uint64_t)(atomic_fetchadd_int(rc, -(number)) - (number))
+#define refcount_transfer(dst, src) \
+	atomic_add_int(dst, atomic_readandclear_int(src))
 
 #define refcount_sysinit()
 #define refcount_fini()
 
-#endif /* ZFS_DEBUG */
-
-#ifdef __cplusplus
-}
-#endif
-
 #endif /* _SYS_REFCOUNT_H */
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/rrwlock.h
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/rrwlock.h (revision 228798)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/rrwlock.h (working copy)
@@ -33,7 +33,6 @@
 #endif
 
 #include
-#include
 
 /*
  * A reader-writer lock implementation that allows re-entrant reads, but
@@ -53,8 +52,8 @@
 	kmutex_t rr_lock;
 	kcondvar_t rr_cv;
 	kthread_t *rr_writer;
-	refcount_t rr_anon_rcount;
-	refcount_t rr_linked_rcount;
+	uint64_t rr_anon_rcount;
+	uint64_t rr_linked_rcount;
 	boolean_t rr_writer_wanted;
 } rrwlock_t;
 
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c (revision 228798)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c (working copy)
@@ -1,223 +0,0 @@
-/*
- * CDDL HEADER START
- *
- * The contents of this file are subject to the terms of the
- * Common Development and Distribution License (the "License").
- * You may not use this file except in compliance with the License.
- *
- * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
- * or http://www.opensolaris.org/os/licensing.
- * See the License for the specific language governing permissions
- * and limitations under the License.
- *
- * When distributing Covered Code, include this CDDL HEADER in each
- * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
- * If applicable, add the following below this CDDL HEADER, with the
- * fields enclosed by brackets "[]" replaced with your own identifying
- * information: Portions Copyright [yyyy] [name of copyright owner]
- *
- * CDDL HEADER END
- */
-/*
- * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
- */
-
-#include
-#include
-
-#ifdef ZFS_DEBUG
-
-#ifdef _KERNEL
-int reference_tracking_enable = FALSE; /* runs out of memory too easily */
-#else
-int reference_tracking_enable = TRUE;
-#endif
-int reference_history = 4; /* tunable */
-
-static kmem_cache_t *reference_cache;
-static kmem_cache_t *reference_history_cache;
-
-void
-refcount_sysinit(void)
-{
-	reference_cache = kmem_cache_create("reference_cache",
-	    sizeof (reference_t), 0, NULL, NULL, NULL, NULL, NULL, 0);
-
-	reference_history_cache = kmem_cache_create("reference_history_cache",
-	    sizeof (uint64_t), 0, NULL, NULL, NULL, NULL, NULL, 0);
-}
-
-void
-refcount_fini(void)
-{
-	kmem_cache_destroy(reference_cache);
-	kmem_cache_destroy(reference_history_cache);
-}
-
-void
-refcount_create(refcount_t *rc)
-{
-	mutex_init(&rc->rc_mtx, NULL, MUTEX_DEFAULT, NULL);
-	list_create(&rc->rc_list, sizeof (reference_t),
-	    offsetof(reference_t, ref_link));
-	list_create(&rc->rc_removed, sizeof (reference_t),
-	    offsetof(reference_t, ref_link));
-	rc->rc_count = 0;
-	rc->rc_removed_count = 0;
-}
-
-void
-refcount_destroy_many(refcount_t *rc, uint64_t number)
-{
-	reference_t *ref;
-
-	ASSERT(rc->rc_count == number);
-	while (ref = list_head(&rc->rc_list)) {
-		list_remove(&rc->rc_list, ref);
-		kmem_cache_free(reference_cache, ref);
-	}
-	list_destroy(&rc->rc_list);
-
-	while (ref = list_head(&rc->rc_removed)) {
-		list_remove(&rc->rc_removed, ref);
-		kmem_cache_free(reference_history_cache, ref->ref_removed);
-		kmem_cache_free(reference_cache, ref);
-	}
-	list_destroy(&rc->rc_removed);
-	mutex_destroy(&rc->rc_mtx);
-}
-
-void
-refcount_destroy(refcount_t *rc)
-{
-	refcount_destroy_many(rc, 0);
-}
-
-int
-refcount_is_zero(refcount_t *rc)
-{
-	ASSERT(rc->rc_count >= 0);
-	return (rc->rc_count == 0);
-}
-
-int64_t
-refcount_count(refcount_t *rc)
-{
-	ASSERT(rc->rc_count >= 0);
-	return (rc->rc_count);
-}
-
-int64_t
-refcount_add_many(refcount_t *rc, uint64_t number, void *holder)
-{
-	reference_t *ref;
-	int64_t count;
-
-	if (reference_tracking_enable) {
-		ref = kmem_cache_alloc(reference_cache, KM_SLEEP);
-		ref->ref_holder = holder;
-		ref->ref_number = number;
-	}
-	mutex_enter(&rc->rc_mtx);
-	ASSERT(rc->rc_count >= 0);
-	if (reference_tracking_enable)
-		list_insert_head(&rc->rc_list, ref);
-	rc->rc_count += number;
-	count = rc->rc_count;
-	mutex_exit(&rc->rc_mtx);
-
-	return (count);
-}
-
-int64_t
-refcount_add(refcount_t *rc, void *holder)
-{
-	return (refcount_add_many(rc, 1, holder));
-}
-
-int64_t
-refcount_remove_many(refcount_t *rc, uint64_t number, void *holder)
-{
-	reference_t *ref;
-	int64_t count;
-
-	mutex_enter(&rc->rc_mtx);
-	ASSERT(rc->rc_count >= number);
-
-	if (!reference_tracking_enable) {
-		rc->rc_count -= number;
-		count = rc->rc_count;
-		mutex_exit(&rc->rc_mtx);
-		return (count);
-	}
-
-	for (ref = list_head(&rc->rc_list); ref;
-	    ref = list_next(&rc->rc_list, ref)) {
-		if (ref->ref_holder == holder && ref->ref_number == number) {
-			list_remove(&rc->rc_list, ref);
-			if (reference_history > 0) {
-				ref->ref_removed =
-				    kmem_cache_alloc(reference_history_cache,
-				    KM_SLEEP);
-				list_insert_head(&rc->rc_removed, ref);
-				rc->rc_removed_count++;
-				if (rc->rc_removed_count >= reference_history) {
-					ref = list_tail(&rc->rc_removed);
-					list_remove(&rc->rc_removed, ref);
-					kmem_cache_free(reference_history_cache,
-					    ref->ref_removed);
-					kmem_cache_free(reference_cache, ref);
-					rc->rc_removed_count--;
-				}
-			} else {
-				kmem_cache_free(reference_cache, ref);
-			}
-			rc->rc_count -= number;
-			count = rc->rc_count;
-			mutex_exit(&rc->rc_mtx);
-			return (count);
-		}
-	}
-	panic("No such hold %p on refcount %llx", holder,
-	    (u_longlong_t)(uintptr_t)rc);
-	return (-1);
-}
-
-int64_t
-refcount_remove(refcount_t *rc, void *holder)
-{
-	return (refcount_remove_many(rc, 1, holder));
-}
-
-void
-refcount_transfer(refcount_t *dst, refcount_t *src)
-{
-	int64_t count, removed_count;
-	list_t list, removed;
-
-	list_create(&list, sizeof (reference_t),
-	    offsetof(reference_t, ref_link));
-	list_create(&removed, sizeof (reference_t),
-	    offsetof(reference_t, ref_link));
-
-	mutex_enter(&src->rc_mtx);
-	count = src->rc_count;
-	removed_count = src->rc_removed_count;
-	src->rc_count = 0;
-	src->rc_removed_count = 0;
-	list_move_tail(&list, &src->rc_list);
-	list_move_tail(&removed, &src->rc_removed);
-	mutex_exit(&src->rc_mtx);
-
-	mutex_enter(&dst->rc_mtx);
-	dst->rc_count += count;
-	dst->rc_removed_count += removed_count;
-	list_move_tail(&dst->rc_list, &list);
-	list_move_tail(&dst->rc_removed, &removed);
-	mutex_exit(&dst->rc_mtx);
-
-	list_destroy(&list);
-	list_destroy(&removed);
-}
-
-#endif /* ZFS_DEBUG */
Index: sys/cddl/contrib/opensolaris/uts/common/Makefile.files
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/Makefile.files (revision 228798)
+++ sys/cddl/contrib/opensolaris/uts/common/Makefile.files (working copy)
@@ -55,7 +55,6 @@
 	gzip.o \
 	lzjb.o \
 	metaslab.o \
-	refcount.o \
 	sa.o \
 	sha256.o \
 	spa.o \
Index: sys/sys/refcount.h
===================================================================
--- sys/sys/refcount.h (revision 228798)
+++ sys/sys/refcount.h (working copy)
@@ -40,22 +40,24 @@
 #define KASSERT(exp, msg) /* */
 #endif
 
+typedef volatile u_int refcount_t;
+
 static __inline void
-refcount_init(volatile u_int *count, u_int value)
+refcount_init(refcount_t *count, u_int value)
 {
 
 	*count = value;
 }
 
 static __inline void
-refcount_acquire(volatile u_int *count)
+refcount_acquire(refcount_t *count)
 {
 
 	atomic_add_acq_int(count, 1);
 }
 
 static __inline int
-refcount_release(volatile u_int *count)
+refcount_release(refcount_t *count)
 {
 	u_int old;
 
Index: share/man/man9/refcount.9
===================================================================
--- share/man/man9/refcount.9 (revision 228798)
+++ share/man/man9/refcount.9 (working copy)
@@ -39,11 +39,11 @@
 .In sys/param.h
 .In sys/refcount.h
 .Ft void
-.Fn refcount_init "volatile u_int *count, u_int value"
+.Fn refcount_init "refcount_t *count, u_int value"
 .Ft void
-.Fn refcount_acquire "volatile u_int *count"
+.Fn refcount_acquire "refcount_t *count"
 .Ft int
-.Fn refcount_release "volatile u_int *count"
+.Fn refcount_release "refcount_t *count"
 .Sh DESCRIPTION
 The
 .Nm
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h (revision 228798)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h (working copy)
@@ -31,10 +31,6 @@
 #include
 #include
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /*
  * If the reference is held only by the calling function and not any
  * particular object, use FTAG (which is a string) for the holder_tag.
@@ -42,68 +38,21 @@
  */
 #define FTAG ((char *)__func__)
 
-#ifdef ZFS_DEBUG
-typedef struct reference {
-	list_node_t ref_link;
-	void *ref_holder;
-	uint64_t ref_number;
-	uint8_t *ref_removed;
-} reference_t;
-
-typedef struct refcount {
-	kmutex_t rc_mtx;
-	list_t rc_list;
-	list_t rc_removed;
-	int64_t rc_count;
-	int64_t rc_removed_count;
-} refcount_t;
-
-/* Note: refcount_t must be initialized with refcount_create() */
-
-void refcount_create(refcount_t *rc);
-void refcount_destroy(refcount_t *rc);
-void refcount_destroy_many(refcount_t *rc, uint64_t number);
-int refcount_is_zero(refcount_t *rc);
-int64_t refcount_count(refcount_t *rc);
-int64_t refcount_add(refcount_t *rc, void *holder_tag);
-int64_t refcount_remove(refcount_t *rc, void *holder_tag);
-int64_t refcount_add_many(refcount_t *rc, uint64_t number, void *holder_tag);
-int64_t refcount_remove_many(refcount_t *rc, uint64_t number, void *holder_tag);
-void refcount_transfer(refcount_t *dst, refcount_t *src);
-
-void refcount_sysinit(void);
-void refcount_fini(void);
-
-#else /* ZFS_DEBUG */
-
-typedef struct refcount {
-	uint64_t rc_count;
-} refcount_t;
-
-#define refcount_create(rc) ((rc)->rc_count = 0)
-#define refcount_destroy(rc) ((rc)->rc_count = 0)
-#define refcount_destroy_many(rc, number) ((rc)->rc_count = 0)
-#define refcount_is_zero(rc) ((rc)->rc_count == 0)
-#define refcount_count(rc) ((rc)->rc_count)
-#define refcount_add(rc, holder) atomic_add_64_nv(&(rc)->rc_count, 1)
-#define refcount_remove(rc, holder) atomic_add_64_nv(&(rc)->rc_count, -1)
+#define refcount_create(rc) refcount_init(rc, 0)
+#define refcount_destroy(rc) refcount_init(rc, 0)
+#define refcount_destroy_many(rc, number) refcount_init(rc, 0)
+#define refcount_is_zero(rc) ((*rc) == 0)
+#define refcount_count(rc) (uint64_t)(*rc)
+#define refcount_add(rc, holder) refcount_add_many(rc, 1, holder)
+#define refcount_remove(rc, holder) refcount_remove_many(rc, 1, holder)
 #define refcount_add_many(rc, number, holder) \
-	atomic_add_64_nv(&(rc)->rc_count, number)
+	(uint64_t)(atomic_fetchadd_int(rc, number) + (number))
 #define refcount_remove_many(rc, number, holder) \
-	atomic_add_64_nv(&(rc)->rc_count, -number)
-#define refcount_transfer(dst, src) { \
-	uint64_t __tmp = (src)->rc_count; \
-	atomic_add_64(&(src)->rc_count, -__tmp); \
-	atomic_add_64(&(dst)->rc_count, __tmp); \
-}
+	(uint64_t)(atomic_fetchadd_int(rc, -(number)) - (number))
+#define refcount_transfer(dst, src) \
+	atomic_add_int(dst, atomic_readandclear_int(src))
 
 #define refcount_sysinit()
 #define refcount_fini()
 
-#endif /* ZFS_DEBUG */
-
-#ifdef __cplusplus
-}
-#endif
-
 #endif /* _SYS_REFCOUNT_H */
Index: sys/sys/refcount.h
===================================================================
--- sys/sys/refcount.h (revision 228798)
+++ sys/sys/refcount.h (working copy)
@@ -40,22 +40,24 @@
 #define KASSERT(exp, msg) /* */
 #endif
 
+typedef volatile u_int refcount_t;
+
 static __inline void
-refcount_init(volatile u_int *count, u_int value)
+refcount_init(refcount_t *count, u_int value)
 {
 
 	*count = value;
 }
 
 static __inline void
-refcount_acquire(volatile u_int *count)
+refcount_acquire(refcount_t *count)
 {
 
 	atomic_add_acq_int(count, 1);
 }
 
 static __inline int
-refcount_release(volatile u_int *count)
+refcount_release(refcount_t *count)
 {
 	u_int old;
 
--Q68bSM7Ycu6FN28Q--

--l76fUT7nc3MelDdI
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----

iQEbBAEBAgAGBQJO9A0wAAoJEJBXh4mJ2FR+RUEH9i3WwPABKGwt2gUFbB6P/upv
Rs3nOi8u8srpykiEa7lWQ6Oe7HwcDh6nhdTd8OxCAfy8+eZu/C4nz/npFRy0d1Kr
zH//wpEhSHom8odqVGGNMgerZrxCfHKxnji94zSUMSdY49PKBGKOia9aCzf7IJ6i
nNRXaO36Ryt0Kpny9kBqKByLs9Zb7d0hrowRHwaepiaHZJ7IObKIi7Rpnw2daqsc
+n9rxz4QVd2pmhtuib265Ci04BAkCrwO9GmVS6dSJh9jq1X3oYMSEMBsrFJnr948
IvAoImV+itrAMM3K9dRlERtMXSJZ4sHvSS66y8CREhOHwsmoU6nvQkPVCf8jPw==
=d/j5 -----END PGP SIGNATURE----- --l76fUT7nc3MelDdI-- From owner-freebsd-fs@FreeBSD.ORG Fri Dec 23 08:22:59 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 51DB61065670 for ; Fri, 23 Dec 2011 08:22:59 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.186]) by mx1.freebsd.org (Postfix) with ESMTP id D31078FC13 for ; Fri, 23 Dec 2011 08:22:58 +0000 (UTC) Received: from [10.3.0.26] ([141.4.215.32]) by mrelayeu.kundenserver.de (node=mrbap4) with ESMTP (Nemesis) id 0MeBmY-1RJhZX2yhK-00Py1y; Fri, 23 Dec 2011 09:22:57 +0100 Message-ID: <4EF43A60.7080409@brockmann-consult.de> Date: Fri, 23 Dec 2011 09:22:56 +0100 From: Peter Maloney User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.23) Gecko/20110922 Thunderbird/3.1.15 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <82B38DBF-DD3A-46CD-93F6-02CDB6506E05@slu.se> <20111025193302.GA30409@nargothrond.kdm.org> <20111026101602.GA9768@icarus.home.lan> <75BDE9FA-6130-4BB4-8518-275D68BB3E49@slu.se> <4EB7FAEF.30505@interlog.com> In-Reply-To: <4EB7FAEF.30505@interlog.com> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:DKhXJx/N5aqA5JKAC055Ui9NZ9SUcqjRkDAn7VyhfZF P87bb8Nv6p+LoeLivmsmCHwGuFN4umhbpSli6hRZsgahStbipz RnU8EnpYSIPCPw4tPsySxK0HYhec4aWtBjQs0hpkyZ/yiyAEnC SquAhD7F/xG9zYwqGr97Cy863wA4zoVvmOXdYzMbb4TvkROko+ 2ozL2oVQFkV6oZdReGwEgTtyz8FPETDG9DVVWPuz9I0vfEEC19 Ilt9w8Vz4SJj+mPhjBs9qQV/y/q6kZJYozvadfQACxdT7odGMv zfyDqeJnh5p+gXDDMKfJwHIHcGNapSHuPClT1FIx6xklB3vHAu uvatbQro/dNhofQtofdnzHdsSNxYhLqsGouSn5CIW Subject: Re: AOC-USAS2-L8i zfs panics and SCSI errors in messages X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Fri, 23 Dec 2011 08:22:59 -0000 On 11/07/2011 04:36 PM, Douglas Gilbert wrote: > On 11-11-07 03:56 AM, Rich wrote: >> Observation - the LSI SAS expanders, in my experience, sometimes >> misbehave when there are drives which respond slower than some timeout >> to commands (as far as I've seen it's only SATA drives it does this >> for, but I don't have many SAS drives for comparison), leading to all >> further commands to that drive for a bit not working, and then what >> happens depending on the OS varies dramatically. >> >> If you could try without an expander (e.g. with 1->4 SAS->SATA fanout >> cables), you may be surprised (and/or annoyed) to find your life gets >> better. > > SAS-2 expanders are better than the original generation. > [LSI makes both.] SAS-2 added the CONFIGURE GENERAL SMP > function which contains various timeout tweaks for the > STP protocol (i.e. the protocol that tunnels (S)ATA > commands between a SAS HBA (initiator) and an expander). > > If you are using SAS-2 expanders and FreeBSD 9.0 then you > can fetch my smp_utils package and use the smp_conf_general > utility to change those timeout settings. If you have SAS-2 > expanders but an older version of FreeBSD then you will > need Solaris or Linux to run my smp_utils package in order > to change those timeout values on the expander. > > Doug Gilbert > > BTW smp_rep_general will show the current settings of those > STP timeouts. Doug, Thank you for your suggestion. I have a similar problem to Karli's, and your suggestion is next on my list, but I have some questions. If I boot off of a Linux/Solaris/FreeBSD 9 USB stick and run the tools, does the change persist on reboot? (I would assume yes since you suggested using Linux/Solaris) For FreeBSD 9, do I get the package from ports? If I run it from Linux, where do I get the package from? 
Thanks, Peter From owner-freebsd-fs@FreeBSD.ORG Fri Dec 23 09:29:53 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 118801065672; Fri, 23 Dec 2011 09:29:53 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from mx0.hoeg.nl (mx0.hoeg.nl [IPv6:2a01:4f8:101:5343::aa]) by mx1.freebsd.org (Postfix) with ESMTP id 24E1C8FC16; Fri, 23 Dec 2011 09:29:51 +0000 (UTC) Received: by mx0.hoeg.nl (Postfix, from userid 1000) id 167B42A28EB1; Fri, 23 Dec 2011 10:29:51 +0100 (CET) Date: Fri, 23 Dec 2011 10:29:51 +0100 From: Ed Schouten To: Jason Hellenthal Message-ID: <20111223092951.GX1771@hoeg.nl> References: <20111222214728.GV1771@hoeg.nl> <20111222230756.GW1771@hoeg.nl> <20111223051009.GA77353@DataIX.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="4hVWSOEjyjyaPny9" Content-Disposition: inline In-Reply-To: <20111223051009.GA77353@DataIX.net> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, pjd@freebsd.org Subject: Re: Changing refcount(9) to use a refcount_t X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Dec 2011 09:29:53 -0000 --4hVWSOEjyjyaPny9 Content-Type: multipart/mixed; boundary="bF8FQe8Tx5jntaYn" Content-Disposition: inline --bF8FQe8Tx5jntaYn Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hello Jason, * Jason Hellenthal , 20111223 06:10: > Nice work. So all-in-all we need both patches concatenated like so...? No, just the last patch. But now that I looked through the diff, I see that I made a small logic error in rrwlock.c. 
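The error is a post-decrement where a pre-decrement is needed, and it matters because the value assigned to count is afterwards compared with zero to decide whether waiters get woken. A minimal sketch in plain C, not taken from the patch: release_post(), release_pre() and wakeup_fires() are hypothetical helpers, and a local uint64_t stands in for the lock's counter.

```c
#include <assert.h>
#include <stdint.h>

/* Post-decrement: yields the value the counter had BEFORE the decrement. */
static uint64_t
release_post(uint64_t *rcount)
{
	return ((*rcount)--);
}

/* Pre-decrement: yields the value the counter has AFTER the decrement. */
static uint64_t
release_pre(uint64_t *rcount)
{
	return (--(*rcount));
}

/*
 * Simulate the last reader dropping its reference and report whether a
 * "count == 0" wakeup test would fire.  The counter is a stand-in for
 * rr_linked_rcount; no locking or condition variable is modelled here.
 */
static int
wakeup_fires(uint64_t (*release)(uint64_t *))
{
	uint64_t rcount = 1;		/* exactly one reference is held */
	uint64_t count = release(&rcount);

	assert(rcount == 0);		/* both forms leave the counter at 0 */
	return (count == 0);		/* only the returned value differs */
}
```

With one reference held, wakeup_fires(release_post) returns 0 and wakeup_fires(release_pre) returns 1: both forms leave the counter at zero, but only the pre-decrement form hands back the zero that a following "if (count == 0)" test is looking for.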
	count = rrl->rr_linked_rcount--;

should be:

	count = --rrl->rr_linked_rcount;

Therefore I have attached a new patch that should be the final version.
So you only need the patch attached to this email. Sorry for the
inconvenience!

-- 
Ed Schouten
WWW: http://80386.nl/

--bF8FQe8Tx5jntaYn
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="refcount.diff"
Content-Transfer-Encoding: quoted-printable

Index: share/man/man9/refcount.9
===================================================================
--- share/man/man9/refcount.9 (revision 228798)
+++ share/man/man9/refcount.9 (working copy)
@@ -39,11 +39,11 @@
 .In sys/param.h
 .In sys/refcount.h
 .Ft void
-.Fn refcount_init "volatile u_int *count, u_int value"
+.Fn refcount_init "refcount_t *count, u_int value"
 .Ft void
-.Fn refcount_acquire "volatile u_int *count"
+.Fn refcount_acquire "refcount_t *count"
 .Ft int
-.Fn refcount_release "volatile u_int *count"
+.Fn refcount_release "refcount_t *count"
 .Sh DESCRIPTION
 The
 .Nm
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c (revision 228798)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c (working copy)
@@ -23,7 +23,6 @@
  * Use is subject to license terms.
  */
 
-#include
 #include
 
 /*
@@ -81,7 +80,7 @@
 {
 	rrw_node_t *rn;
 
-	if (refcount_count(&rrl->rr_linked_rcount) == 0)
+	if (rrl->rr_linked_rcount == 0)
 		return (NULL);
 
 	for (rn = tsd_get(rrw_tsd_key); rn != NULL; rn = rn->rn_next) {
@@ -115,7 +114,7 @@
 	rrw_node_t *rn;
 	rrw_node_t *prev = NULL;
 
-	if (refcount_count(&rrl->rr_linked_rcount) == 0)
+	if (rrl->rr_linked_rcount == 0)
 		return (B_FALSE);
 
 	for (rn = tsd_get(rrw_tsd_key); rn != NULL; rn = rn->rn_next) {
@@ -138,8 +137,8 @@
 	mutex_init(&rrl->rr_lock, NULL, MUTEX_DEFAULT, NULL);
 	cv_init(&rrl->rr_cv, NULL, CV_DEFAULT, NULL);
 	rrl->rr_writer = NULL;
-	refcount_create(&rrl->rr_anon_rcount);
-	refcount_create(&rrl->rr_linked_rcount);
+	rrl->rr_anon_rcount = 0;
+	rrl->rr_linked_rcount = 0;
 	rrl->rr_writer_wanted = B_FALSE;
 }
 
@@ -149,8 +148,8 @@
 	mutex_destroy(&rrl->rr_lock);
 	cv_destroy(&rrl->rr_cv);
 	ASSERT(rrl->rr_writer == NULL);
-	refcount_destroy(&rrl->rr_anon_rcount);
-	refcount_destroy(&rrl->rr_linked_rcount);
+	rrl->rr_anon_rcount = 0;
+	rrl->rr_linked_rcount = 0;
 }
 
 static void
@@ -159,26 +158,26 @@
 	mutex_enter(&rrl->rr_lock);
 #if !defined(DEBUG) && defined(_KERNEL)
 	if (!rrl->rr_writer && !rrl->rr_writer_wanted) {
-		rrl->rr_anon_rcount.rc_count++;
+		rrl->rr_anon_rcount++;
 		mutex_exit(&rrl->rr_lock);
 		return;
 	}
 	DTRACE_PROBE(zfs__rrwfastpath__rdmiss);
 #endif
 	ASSERT(rrl->rr_writer != curthread);
-	ASSERT(refcount_count(&rrl->rr_anon_rcount) >= 0);
+	ASSERT(rrl->rr_anon_rcount >= 0);
 
 	while (rrl->rr_writer || (rrl->rr_writer_wanted &&
-	    refcount_is_zero(&rrl->rr_anon_rcount) &&
+	    rrl->rr_anon_rcount == 0 &&
 	    rrn_find(rrl) == NULL))
 		cv_wait(&rrl->rr_cv, &rrl->rr_lock);
 
 	if (rrl->rr_writer_wanted) {
 		/* may or may not be a re-entrant enter */
 		rrn_add(rrl);
-		(void) refcount_add(&rrl->rr_linked_rcount, tag);
+		rrl->rr_linked_rcount++;
 	} else {
-		(void) refcount_add(&rrl->rr_anon_rcount, tag);
+		rrl->rr_anon_rcount++;
 	}
 	ASSERT(rrl->rr_writer == NULL);
 	mutex_exit(&rrl->rr_lock);
@@ -190,8 +189,8 @@
 	mutex_enter(&rrl->rr_lock);
 	ASSERT(rrl->rr_writer != curthread);
 
-	while (refcount_count(&rrl->rr_anon_rcount) > 0 ||
-	    refcount_count(&rrl->rr_linked_rcount) > 0 ||
+	while (rrl->rr_anon_rcount > 0 ||
+	    rrl->rr_linked_rcount > 0 ||
 	    rrl->rr_writer != NULL) {
 		rrl->rr_writer_wanted = B_TRUE;
 		cv_wait(&rrl->rr_cv, &rrl->rr_lock);
@@ -215,31 +214,30 @@
 {
 	mutex_enter(&rrl->rr_lock);
 #if !defined(DEBUG) && defined(_KERNEL)
-	if (!rrl->rr_writer && rrl->rr_linked_rcount.rc_count == 0) {
-		rrl->rr_anon_rcount.rc_count--;
-		if (rrl->rr_anon_rcount.rc_count == 0)
+	if (!rrl->rr_writer && rrl->rr_linked_rcount == 0) {
+		if (--rrl->rr_anon_rcount == 0)
 			cv_broadcast(&rrl->rr_cv);
 		mutex_exit(&rrl->rr_lock);
 		return;
 	}
 	DTRACE_PROBE(zfs__rrwfastpath__exitmiss);
 #endif
-	ASSERT(!refcount_is_zero(&rrl->rr_anon_rcount) ||
-	    !refcount_is_zero(&rrl->rr_linked_rcount) ||
+	ASSERT(rrl->rr_anon_rcount != 0 ||
+	    rrl->rr_linked_rcount != 0 ||
 	    rrl->rr_writer != NULL);
 
 	if (rrl->rr_writer == NULL) {
-		int64_t count;
+		uint64_t count;
 		if (rrn_find_and_remove(rrl))
-			count = refcount_remove(&rrl->rr_linked_rcount, tag);
+			count = --rrl->rr_linked_rcount;
 		else
-			count = refcount_remove(&rrl->rr_anon_rcount, tag);
+			count = --rrl->rr_anon_rcount;
 		if (count == 0)
 			cv_broadcast(&rrl->rr_cv);
 	} else {
 		ASSERT(rrl->rr_writer == curthread);
-		ASSERT(refcount_is_zero(&rrl->rr_anon_rcount) &&
-		    refcount_is_zero(&rrl->rr_linked_rcount));
+		ASSERT(rrl->rr_anon_rcount == 0 &&
+		    rrl->rr_linked_rcount == 0);
 		rrl->rr_writer = NULL;
 		cv_broadcast(&rrl->rr_cv);
 	}
@@ -255,8 +253,8 @@
 	if (rw == RW_WRITER) {
 		held = (rrl->rr_writer == curthread);
 	} else {
-		held = (!refcount_is_zero(&rrl->rr_anon_rcount) ||
-		    !refcount_is_zero(&rrl->rr_linked_rcount));
+		held = (rrl->rr_anon_rcount != 0 ||
+		    rrl->rr_linked_rcount != 0);
 	}
 	mutex_exit(&rrl->rr_lock);
 
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h (revision 228798)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h (working copy)
@@ -31,10 +31,6 @@
 #include
 #include
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /*
  * If the reference is held only by the calling function and not any
  * particular object, use FTAG (which is a string) for the holder_tag.
@@ -42,68 +38,21 @@
  */
 #define FTAG ((char *)__func__)
 
-#ifdef ZFS_DEBUG
-typedef struct reference {
-	list_node_t ref_link;
-	void *ref_holder;
-	uint64_t ref_number;
-	uint8_t *ref_removed;
-} reference_t;
-
-typedef struct refcount {
-	kmutex_t rc_mtx;
-	list_t rc_list;
-	list_t rc_removed;
-	int64_t rc_count;
-	int64_t rc_removed_count;
-} refcount_t;
-
-/* Note: refcount_t must be initialized with refcount_create() */
-
-void refcount_create(refcount_t *rc);
-void refcount_destroy(refcount_t *rc);
-void refcount_destroy_many(refcount_t *rc, uint64_t number);
-int refcount_is_zero(refcount_t *rc);
-int64_t refcount_count(refcount_t *rc);
-int64_t refcount_add(refcount_t *rc, void *holder_tag);
-int64_t refcount_remove(refcount_t *rc, void *holder_tag);
-int64_t refcount_add_many(refcount_t *rc, uint64_t number, void *holder_tag);
-int64_t refcount_remove_many(refcount_t *rc, uint64_t number, void *holder_tag);
-void refcount_transfer(refcount_t *dst, refcount_t *src);
-
-void refcount_sysinit(void);
-void refcount_fini(void);
-
-#else /* ZFS_DEBUG */
-
-typedef struct refcount {
-	uint64_t rc_count;
-} refcount_t;
-
-#define refcount_create(rc) ((rc)->rc_count = 0)
-#define refcount_destroy(rc) ((rc)->rc_count = 0)
-#define refcount_destroy_many(rc, number) ((rc)->rc_count = 0)
-#define refcount_is_zero(rc) ((rc)->rc_count == 0)
-#define refcount_count(rc) ((rc)->rc_count)
-#define refcount_add(rc, holder) atomic_add_64_nv(&(rc)->rc_count, 1)
-#define refcount_remove(rc, holder) atomic_add_64_nv(&(rc)->rc_count, -1)
+#define refcount_create(rc) refcount_init(rc, 0)
+#define refcount_destroy(rc) refcount_init(rc, 0)
+#define refcount_destroy_many(rc, number) refcount_init(rc, 0)
+#define refcount_is_zero(rc) ((*rc) == 0)
+#define refcount_count(rc) (uint64_t)(*rc)
+#define refcount_add(rc, holder) refcount_add_many(rc, 1, holder)
+#define refcount_remove(rc, holder) refcount_remove_many(rc, 1, holder)
 #define refcount_add_many(rc, number, holder) \
-	atomic_add_64_nv(&(rc)->rc_count, number)
+	(uint64_t)(atomic_fetchadd_int(rc, number) + (number))
 #define refcount_remove_many(rc, number, holder) \
-	atomic_add_64_nv(&(rc)->rc_count, -number)
-#define refcount_transfer(dst, src) { \
-	uint64_t __tmp = (src)->rc_count; \
-	atomic_add_64(&(src)->rc_count, -__tmp); \
-	atomic_add_64(&(dst)->rc_count, __tmp); \
-}
+	(uint64_t)(atomic_fetchadd_int(rc, -(number)) - (number))
+#define refcount_transfer(dst, src) \
+	atomic_add_int(dst, atomic_readandclear_int(src))
 
 #define refcount_sysinit()
 #define refcount_fini()
 
-#endif /* ZFS_DEBUG */
-
-#ifdef __cplusplus
-}
-#endif
-
 #endif /* _SYS_REFCOUNT_H */
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/rrwlock.h
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/rrwlock.h (revision 228798)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/rrwlock.h (working copy)
@@ -33,7 +33,6 @@
 #endif
 
 #include
-#include
 
 /*
  * A reader-writer lock implementation that allows re-entrant reads, but
@@ -53,8 +52,8 @@
 	kmutex_t rr_lock;
 	kcondvar_t rr_cv;
 	kthread_t *rr_writer;
-	refcount_t rr_anon_rcount;
-	refcount_t rr_linked_rcount;
+	uint64_t rr_anon_rcount;
+	uint64_t rr_linked_rcount;
 	boolean_t rr_writer_wanted;
 } rrwlock_t;
 
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c (revision 228798)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c (working copy)
@@ -1,223 +0,0 @@
-/*
- * CDDL HEADER START
- *
- * The contents of this file are subject to the terms of the
- * Common Development and Distribution License (the "License").
- * You may not use this file except in compliance with the License.
- *
- * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
- * or http://www.opensolaris.org/os/licensing.
- * See the License for the specific language governing permissions
- * and limitations under the License.
- *
- * When distributing Covered Code, include this CDDL HEADER in each
- * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
- * If applicable, add the following below this CDDL HEADER, with the
- * fields enclosed by brackets "[]" replaced with your own identifying
- * information: Portions Copyright [yyyy] [name of copyright owner]
- *
- * CDDL HEADER END
- */
-/*
- * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
- */
-
-#include
-#include
-
-#ifdef ZFS_DEBUG
-
-#ifdef _KERNEL
-int reference_tracking_enable = FALSE; /* runs out of memory too easily */
-#else
-int reference_tracking_enable = TRUE;
-#endif
-int reference_history = 4; /* tunable */
-
-static kmem_cache_t *reference_cache;
-static kmem_cache_t *reference_history_cache;
-
-void
-refcount_sysinit(void)
-{
-	reference_cache = kmem_cache_create("reference_cache",
-	    sizeof (reference_t), 0, NULL, NULL, NULL, NULL, NULL, 0);
-
-	reference_history_cache = kmem_cache_create("reference_history_cache",
-	    sizeof (uint64_t), 0, NULL, NULL, NULL, NULL, NULL, 0);
-}
-
-void
-refcount_fini(void)
-{
-	kmem_cache_destroy(reference_cache);
-	kmem_cache_destroy(reference_history_cache);
-}
-
-void
-refcount_create(refcount_t *rc)
-{
-	mutex_init(&rc->rc_mtx, NULL, MUTEX_DEFAULT, NULL);
-	list_create(&rc->rc_list, sizeof (reference_t),
-	    offsetof(reference_t, ref_link));
-	list_create(&rc->rc_removed, sizeof (reference_t),
-	    offsetof(reference_t, ref_link));
-	rc->rc_count = 0;
-	rc->rc_removed_count = 0;
-}
-
-void
-refcount_destroy_many(refcount_t *rc, uint64_t number)
-{
-	reference_t *ref;
-
-	ASSERT(rc->rc_count == number);
-	while (ref = list_head(&rc->rc_list)) {
-		list_remove(&rc->rc_list, ref);
-		kmem_cache_free(reference_cache, ref);
-	}
-	list_destroy(&rc->rc_list);
-
-	while (ref = list_head(&rc->rc_removed)) {
-		list_remove(&rc->rc_removed, ref);
-		kmem_cache_free(reference_history_cache, ref->ref_removed);
-		kmem_cache_free(reference_cache, ref);
-	}
-	list_destroy(&rc->rc_removed);
-	mutex_destroy(&rc->rc_mtx);
-}
-
-void
-refcount_destroy(refcount_t *rc)
-{
-	refcount_destroy_many(rc, 0);
-}
-
-int
-refcount_is_zero(refcount_t *rc)
-{
-	ASSERT(rc->rc_count >= 0);
-	return (rc->rc_count == 0);
-}
-
-int64_t
-refcount_count(refcount_t *rc)
-{
-	ASSERT(rc->rc_count >= 0);
-	return (rc->rc_count);
-}
-
-int64_t
-refcount_add_many(refcount_t *rc, uint64_t number, void *holder)
-{
-	reference_t *ref;
-	int64_t count;
-
-	if (reference_tracking_enable) {
-		ref = kmem_cache_alloc(reference_cache, KM_SLEEP);
-		ref->ref_holder = holder;
-		ref->ref_number = number;
-	}
-	mutex_enter(&rc->rc_mtx);
-	ASSERT(rc->rc_count >= 0);
-	if (reference_tracking_enable)
-		list_insert_head(&rc->rc_list, ref);
-	rc->rc_count += number;
-	count = rc->rc_count;
-	mutex_exit(&rc->rc_mtx);
-
-	return (count);
-}
-
-int64_t
-refcount_add(refcount_t *rc, void *holder)
-{
-	return (refcount_add_many(rc, 1, holder));
-}
-
-int64_t
-refcount_remove_many(refcount_t *rc, uint64_t number, void *holder)
-{
-	reference_t *ref;
-	int64_t count;
-
-	mutex_enter(&rc->rc_mtx);
-	ASSERT(rc->rc_count >= number);
-
-	if (!reference_tracking_enable) {
-		rc->rc_count -= number;
-		count = rc->rc_count;
-		mutex_exit(&rc->rc_mtx);
-		return (count);
-	}
-
-	for (ref = list_head(&rc->rc_list); ref;
-	    ref = list_next(&rc->rc_list, ref)) {
-		if (ref->ref_holder == holder && ref->ref_number == number) {
-			list_remove(&rc->rc_list, ref);
-			if (reference_history > 0) {
-				ref->ref_removed =
-				    kmem_cache_alloc(reference_history_cache,
-				    KM_SLEEP);
-				list_insert_head(&rc->rc_removed, ref);
-				rc->rc_removed_count++;
-				if (rc->rc_removed_count >= reference_history) {
-					ref = list_tail(&rc->rc_removed);
-					list_remove(&rc->rc_removed, ref);
-					kmem_cache_free(reference_history_cache,
-					    ref->ref_removed);
-					kmem_cache_free(reference_cache, ref);
-					rc->rc_removed_count--;
-				}
-			} else {
-				kmem_cache_free(reference_cache, ref);
-			}
-			rc->rc_count -= number;
-			count = rc->rc_count;
-			mutex_exit(&rc->rc_mtx);
-			return (count);
-		}
-	}
-	panic("No such hold %p on refcount %llx", holder,
-	    (u_longlong_t)(uintptr_t)rc);
-	return (-1);
-}
-
-int64_t
-refcount_remove(refcount_t *rc, void *holder)
-{
-	return (refcount_remove_many(rc, 1, holder));
-}
-
-void
-refcount_transfer(refcount_t *dst, refcount_t *src)
-{
-	int64_t count, removed_count;
-	list_t list, removed;
-
-	list_create(&list, sizeof (reference_t),
-	    offsetof(reference_t, ref_link));
-	list_create(&removed, sizeof (reference_t),
-	    offsetof(reference_t, ref_link));
-
-	mutex_enter(&src->rc_mtx);
-	count = src->rc_count;
-	removed_count = src->rc_removed_count;
-	src->rc_count = 0;
-	src->rc_removed_count = 0;
-	list_move_tail(&list, &src->rc_list);
-	list_move_tail(&removed, &src->rc_removed);
-	mutex_exit(&src->rc_mtx);
-
-	mutex_enter(&dst->rc_mtx);
-	dst->rc_count += count;
-	dst->rc_removed_count += removed_count;
-	list_move_tail(&dst->rc_list, &list);
-	list_move_tail(&dst->rc_removed, &removed);
-	mutex_exit(&dst->rc_mtx);
-
-	list_destroy(&list);
-	list_destroy(&removed);
-}
-
-#endif /* ZFS_DEBUG */
Index: sys/cddl/contrib/opensolaris/uts/common/Makefile.files
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/Makefile.files (revision 228798)
+++ sys/cddl/contrib/opensolaris/uts/common/Makefile.files (working copy)
@@ -55,7 +55,6 @@
 	gzip.o \
 	lzjb.o \
 	metaslab.o \
-	refcount.o \
 	sa.o \
 	sha256.o \
 	spa.o \
Index: sys/sys/refcount.h
===================================================================
--- sys/sys/refcount.h (revision 228798)
+++ sys/sys/refcount.h (working copy)
@@ -40,22 +40,24 @@
 #define KASSERT(exp, msg) /* */
 #endif
 
+typedef volatile u_int refcount_t;
+
 static __inline void
-refcount_init(volatile u_int *count, u_int value)
+refcount_init(refcount_t *count, u_int value)
 {
 
 	*count = value;
 }
 
 static __inline void
-refcount_acquire(volatile u_int *count)
+refcount_acquire(refcount_t *count)
 {
 
 	atomic_add_acq_int(count, 1);
 }
 
 static __inline int
-refcount_release(volatile u_int *count)
+refcount_release(refcount_t *count) { u_int old; =20 --bF8FQe8Tx5jntaYn-- --4hVWSOEjyjyaPny9 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iQIcBAEBAgAGBQJO9EoOAAoJEG5e2P40kaK7LJoP+gMI+7cQ9d1Xqm4OuEt2KO+g UeCrcWOrepNVwDQ0QdNQ58WFlPnwttMH9lGWpTb6Kl7JhWCmOmMfUDYBAUIlkvpZ ikHEePdQq9MxAm5lXJgmZFyDQX4v1WcTNMdnS1TdtI/Xo2g2P+roZJ6PTfqkeLgr E2ov/00FNTWkSNV5K0gd4rPqDYxsvpRTL4358+kOUV1+D7Z0pvMnNmUNP0bC7x8j 9GqtZl1MnhDypWN/v+HyCa+s2hYAiCa1IOsKmT50yzlNcrY3AQkm7ANnuQTaSuZl rAs5y0zPkG47DIMRBNeCz2pB/GZE0oFiLcmOLLDtk81Sp4cZ15ye7WWT+x5i5vCo IJvZdBdgKJa/MbHXzPJ0IgdO7CoPIE5QAwBcMBy1mFrbqp3CrwAsH/k85iU/xJO9 +2cdhLHHpptBrckhZG0N3drpt5Lzw4ZoUwScpz99eKCLFintqdRhDqLg/wLPQlDB HIcjIgaCM1LAAKsR0bgdhtpnJGJ/etd1qBCb4pJSY4eAVqi3efNPPBaoxXFWBT9E CXYdWPbvpNkxKJOtzPT134iQPp6btJd7c21qSD2BnLGVI16U5QW20/acmwTEXomg fJFwULvkNTZL1mQLcuR1KUwjpz7UmbpLJutgBsnswcydwVNDWn3UEzhZO0hd5X/Z 2K+m+vtqh8w8/KyvV1w1 =+LTU -----END PGP SIGNATURE----- --4hVWSOEjyjyaPny9-- From owner-freebsd-fs@FreeBSD.ORG Fri Dec 23 11:29:52 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AA9451065670 for ; Fri, 23 Dec 2011 11:29:52 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.dawidek.net (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id 5B9C28FC08 for ; Fri, 23 Dec 2011 11:29:52 +0000 (UTC) Received: from localhost (58.wheelsystems.com [83.12.187.58]) by mail.dawidek.net (Postfix) with ESMTPSA id 71DB23D9; Fri, 23 Dec 2011 12:29:49 +0100 (CET) Date: Fri, 23 Dec 2011 12:28:46 +0100 From: Pawel Jakub Dawidek To: Ed Schouten Message-ID: <20111223112846.GA1679@garage.freebsd.pl> References: <20111222214728.GV1771@hoeg.nl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="+HP7ph2BbKc20aGI" Content-Disposition: inline In-Reply-To: 
<20111222214728.GV1771@hoeg.nl>
X-OS: FreeBSD 9.0-CURRENT amd64
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Changing refcount(9) to use a refcount_t
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 23 Dec 2011 11:29:52 -0000

--+HP7ph2BbKc20aGI
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Dec 22, 2011 at 10:47:28PM +0100, Ed Schouten wrote:
> The reason I'm emailing this to fs@ is that this change breaks
> one of the existing file system drivers, namely ZFS. Solaris also
> implements a refcount_t, but unlike FreeBSD's, it has a more complex API
> and is 64 bits in size. Still, I suspect it's hard to overflow a 32-bit
> reference counter, right? Even if it is, we can fix this in the long run
> by making refcount_t a truly opaque object of type u_long.
>
> Can any of you ZFS users please try the following patch? Do any of you
> object if I commit it to SVN and merge it in a couple of months from
> now?

Ed, what is the purpose of the patch exactly? Is there no way to keep
ZFS as it is? Will it stop compiling? Even with -std=c99?

You are changing vendor code here, and we still don't want to do that if
avoidable, as we want to share code with IllumOS. Unless I see strong
reasons why this is unavoidable, I do object. If that were our own code,
reducing it would definitely be welcome, but because we share the code,
we will just grow the diff against other ZFS versions out there.

-- 
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am!
http://yomoli.com --+HP7ph2BbKc20aGI Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEYEARECAAYFAk70Ze4ACgkQForvXbEpPzRm9ACfaa5XOqHJPeUOZgXo8LdV0YzA UGIAn3De/RYA7ct4TlMlZUVy4cT1LmZh =87F9 -----END PGP SIGNATURE----- --+HP7ph2BbKc20aGI-- From owner-freebsd-fs@FreeBSD.ORG Fri Dec 23 11:59:32 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 881DE106564A; Fri, 23 Dec 2011 11:59:32 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from mx0.hoeg.nl (mx0.hoeg.nl [IPv6:2a01:4f8:101:5343::aa]) by mx1.freebsd.org (Postfix) with ESMTP id 24FEB8FC17; Fri, 23 Dec 2011 11:59:32 +0000 (UTC) Received: by mx0.hoeg.nl (Postfix, from userid 1000) id 636302A28D20; Fri, 23 Dec 2011 12:59:31 +0100 (CET) Date: Fri, 23 Dec 2011 12:59:31 +0100 From: Ed Schouten To: Pawel Jakub Dawidek Message-ID: <20111223115931.GY1771@hoeg.nl> References: <20111222214728.GV1771@hoeg.nl> <20111223112846.GA1679@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="KGRu7WxDW0E86NnQ" Content-Disposition: inline In-Reply-To: <20111223112846.GA1679@garage.freebsd.pl> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.org Subject: Re: Changing refcount(9) to use a refcount_t X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Dec 2011 11:59:32 -0000 --KGRu7WxDW0E86NnQ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi Pawel, * Pawel Jakub Dawidek , 20111223 12:28: > On Thu, Dec 22, 2011 at 10:47:28PM +0100, Ed Schouten wrote: > > The reason why I'm emailing this to fs@, is because this change breaks > > one of the existing file system 
drivers, namely ZFS. Solaris also
> > implements a refcount_t, but unlike FreeBSD's, it has a more complex API
> > and is 64 bits in size. Still, I suspect it's hard to overflow a 32-bit
> > reference counter, right? Even if it is, we can fix this in the long run
> > by making refcount_t a truly opaque object of type u_long.
> >
> > Can any of you ZFS users please try the following patch? Do any of you
> > object if I commit it to SVN and merge it in a couple of months from
> > now?
>
> Ed, what is the purpose of the patch exactly? Is there no way to keep
> ZFS as it is? Will it stop compiling? Even with -std=c99?
>
> You are changing vendor code here, and we still don't want to do that if
> avoidable, as we want to share code with IllumOS. Unless I see strong
> reasons why this is unavoidable, I do object. If that were our own code,
> reducing it would definitely be welcome, but because we share the code,
> we will just grow the diff against other ZFS versions out there.

The problem is that the patch adds refcount_t to , while the ZFS code
has its own refcount_t which has a similar purpose. This causes a
compilation error because of redefinitions. I could simply remove the
#include_next , but the problem then becomes that if we start using
refcount_t in our own sources (e.g. in struct session in ), we end up
having structure size mismatches between ZFS and the rest of the kernel.

I could also pick a different name, like refcnt_t, but that would make
little sense. We would then have functions starting with refcount_*,
operating on a refcnt_t.

Of course, I share your opinion that we should prevent unneeded changes
to vendor code, but we shouldn't deviate from our own naming scheme in
the process.

It must be mentioned that this approach may eventually lead to a
decrease in code. The Solaris refcount API doesn't look too bad. Maybe
we could eventually incorporate some of its facets into our own API,
potentially making the CDDL refcount.h superfluous.
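
The structure size mismatch mentioned above can be sketched in a few
lines of C. This is only an illustration under stated assumptions: the
names native_refcount_t, wide_refcount_t, obj_native, obj_wide and
layouts_agree are hypothetical and do not appear in the patch, and the
wide type is a one-field stand-in, not the real ZFS refcount_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the 32-bit counter type the patch introduces. */
typedef volatile unsigned int native_refcount_t;

/* Hypothetical stand-in for a wider, 64-bit counter type. */
typedef struct wide_refcount {
	uint64_t rc_count;
} wide_refcount_t;

/* A structure as seen by code that uses the 32-bit counter. */
struct obj_native {
	native_refcount_t refs;
	uint32_t id;
};

/* The "same" structure as seen by code that uses the wide counter. */
struct obj_wide {
	wide_refcount_t refs;
	uint32_t id;
};

/* The two counter types differ in size, so the layouts cannot match. */
static_assert(sizeof(native_refcount_t) < sizeof(wide_refcount_t),
    "assumes unsigned int is narrower than 64 bits");

/* Returns 1 only if both views agree on size and on the offset of id. */
static int
layouts_agree(void)
{
	return (sizeof(struct obj_native) == sizeof(struct obj_wide) &&
	    offsetof(struct obj_native, id) == offsetof(struct obj_wide, id));
}
```

If two parts of a kernel were compiled against different definitions of
the same type name, a field such as id would be read at different
offsets, which is the kind of mismatch a single shared definition
avoids.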
--=20 Ed Schouten WWW: http://80386.nl/ --KGRu7WxDW0E86NnQ Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iQIcBAEBAgAGBQJO9G0jAAoJEG5e2P40kaK7fJwP/iWLxdnt5T3wlLAnWuXJp24a MUeavJEGrxbMCiheYFC7BgqVJ11fY+l2rhcM1BSDFBJxIHQbuq9EMIKH6hleCZ2j hmrcNE9WuWP2Bu5ZEEgiIjwtK5K9NU4S3RqF5qVPc3uNFMlIoAxlGtzUyCxTasUA 1m7fpPawsmrDjRH6UtaVKXw9hRjw2Qb/1eAx+vFwEeBiznuEuwH39jpeHkciyHcx ORULb4GilEd986m/2WSKIpa549of5PGPDX6wjBBBXm69tPni2HbsMVHEl+gKJkLp 3fefpYSvFPHbR9UeP5I9sd7tjas2T+EFONSXppkfcU+P/thiyFPvU8OIp1ZlaZzp jNmOY4zmzs0bNpg55Gc8TNbtVb++xp/6VkpEI2xt6F0yqR16XqcZmouQpywaq//H y9Pv7+4k3v0wQGP2fRVMgWaQNK13sOH4qWP7SxHeMiaRfWRoA+EqxSx6UfoqCBZi YZHo13fN+nSYygWHvh+H8LBaqPbk2O+qeSJA3DKhtSbMUmKX/GCT6IxGuGhD6cgh E4XBmUP9HK6hyE214ElWN2rXo/qxYVkbZjG/SGp2vWGou01H8ChTjvcsGmb1JOwY V5xYtvSTVThMNH79K2GauzTKQZmVux/EH7Penb2okT/LdQSZSyIDFy/HvnjsBvbL 4WqW4vrYjAW/w/TKKL5i =nh8a -----END PGP SIGNATURE----- --KGRu7WxDW0E86NnQ-- From owner-freebsd-fs@FreeBSD.ORG Fri Dec 23 12:20:06 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A04671065672; Fri, 23 Dec 2011 12:20:06 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from mx0.hoeg.nl (mx0.hoeg.nl [IPv6:2a01:4f8:101:5343::aa]) by mx1.freebsd.org (Postfix) with ESMTP id 5AA8C8FC0A; Fri, 23 Dec 2011 12:20:06 +0000 (UTC) Received: by mx0.hoeg.nl (Postfix, from userid 1000) id C0DDC2A28E91; Fri, 23 Dec 2011 13:20:05 +0100 (CET) Date: Fri, 23 Dec 2011 13:20:05 +0100 From: Ed Schouten To: Pawel Jakub Dawidek Message-ID: <20111223122005.GZ1771@hoeg.nl> References: <20111222214728.GV1771@hoeg.nl> <20111223112846.GA1679@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="W5Yj4BWVzBU0tpZR" Content-Disposition: inline In-Reply-To: <20111223112846.GA1679@garage.freebsd.pl> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: 
freebsd-fs@FreeBSD.org Subject: Re: Changing refcount(9) to use a refcount_t X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Dec 2011 12:20:06 -0000 --W5Yj4BWVzBU0tpZR Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi Pawel, * Pawel Jakub Dawidek , 20111223 12:28: > You changing here vendor code and we still don't want to do that if > avoidable, as we want to share code with IllumOS. To clarify, the changes made to rrwlock.* are likely applicable to IllumOS as well. It effectively fixes misuse of the API, where refcount_t is dereferenced, but also optimizes the code not to use atomics on the counters, as they are already protected by a mutex. --=20 Ed Schouten WWW: http://80386.nl/ --W5Yj4BWVzBU0tpZR Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iQIcBAEBAgAGBQJO9HH1AAoJEG5e2P40kaK7/qoQAKJm/TOW1NFWBqlY9mozcgdU 9VeIHNNr8iqeE7I9XZDu/UiXWzTXyLk0P3FdvpHAIHlwwf7dogiAwu/1fsnltcX6 0pe6tdQpA1sKWuKcCqdTJjT7OTvcP23n2VYv4JFah/FBirh7TdbLEVCqs0VAospr Amw8VLdK7RMMToer71OwI0zDo9jdXvttXOZz9XxIvRtHw1G9tTKUIzJejf0y8wjh zaG8pMW7uqROOIkNRMjA1fxue0uJfjQP4Fg7w/q6MYX44sVyOdgKf6jLin11zTVx 5H7QLapxSl9GUpraZMsu1zMZxQWRmfDDEhLX8Bna/EIlzBROV+BTkmzQCOvHs4zN wRr0uOY+zc/6BHm25jtHqOUKrGQQWdcDKeksxFqlVvntheaVXtW7NcUArVItmUCM hLq0Qz5+ysw7sH/BSRFZmNJLjfCg90UDmYXdBjrPDnZdRB+VdY/ybOtXw612nzC1 jP2b8fI14Xk+pASS9zejulZKYikhLjZQjXbN1N/jsUGTknClWnjMwESN9+sStXFU ZMDheXooaSU+nHMxD6H/PsemfupbHPrffWTnD4TB9g+cccH/HOAOC0f1VS2as2G7 i3vKEKZblh23VE1T8iNcwEN9grS8RCG181QLcVsHRI6PNTQbTgmYyB2eUdmE+uwI ZFMLBaBWyy+T0dufHn28 =CRTY -----END PGP SIGNATURE----- --W5Yj4BWVzBU0tpZR-- From owner-freebsd-fs@FreeBSD.ORG Fri Dec 23 13:34:10 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org 
[IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 063DF106566B; Fri, 23 Dec 2011 13:34:10 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id CE6708FC13; Fri, 23 Dec 2011 13:34:09 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [96.47.65.170]) by cyrus.watson.org (Postfix) with ESMTPSA id 6712D46B09; Fri, 23 Dec 2011 08:34:09 -0500 (EST) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id F1287B90E; Fri, 23 Dec 2011 08:34:08 -0500 (EST) From: John Baldwin To: freebsd-fs@freebsd.org Date: Fri, 23 Dec 2011 08:34:08 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p8; KDE/4.5.5; amd64; ; ) References: <20111222214728.GV1771@hoeg.nl> In-Reply-To: <20111222214728.GV1771@hoeg.nl> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201112230834.08198.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Fri, 23 Dec 2011 08:34:09 -0500 (EST) Cc: Ed Schouten , pjd@freebsd.org Subject: Re: Changing refcount(9) to use a refcount_t X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Dec 2011 13:34:10 -0000 On Thursday, December 22, 2011 4:47:28 pm Ed Schouten wrote: > Hi all, > > As some of you may know, the upcoming C standard has support for atomic > operations. Looking at how they implemented it, it seems that they tried > to keep it somehow compatible with existing compiler standards. 
For
> example, one could implement a poor-man's version of it as follows:
>
> | #define _Atomic(T) struct { pthread_mutex_t m; T v; }
> |
> | #define ATOMIC_VAR_INIT(value) { PTHREAD_MUTEX_INITIALIZER, (value) }
> |
> | #define atomic_store(object, value) do { \
> |	pthread_mutex_lock(&(object)->m); \
> |	(object)->v = (value); \
> |	pthread_mutex_unlock(&(object)->m); \
> | } while(0)
> |
> | ...
>
> Voila; atomics!
>
> Just out of curiosity, I did some experiments with this, where I have a
> that works with both Clang and GCC (except ARM and MIPS,
> for some reason). My first test subject: .

Hmm, are we really aiming to replace with the C1X API instead?

FWIW, I thought about making a refcount_t when first writing but bde@
talked me out of it. Bruce is of the general opinion that we should
avoid adding new *_t typedefs when we can help it.

-- 
John Baldwin

From owner-freebsd-fs@FreeBSD.ORG Fri Dec 23 14:08:56 2011
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7930C106566B; Fri, 23 Dec 2011 14:08:56 +0000 (UTC) (envelope-from ed@hoeg.nl)
Received: from mx0.hoeg.nl (mx0.hoeg.nl [IPv6:2a01:4f8:101:5343::aa]) by mx1.freebsd.org (Postfix) with ESMTP id 193BD8FC08; Fri, 23 Dec 2011 14:08:56 +0000 (UTC)
Received: by mx0.hoeg.nl (Postfix, from userid 1000) id 4D8702A28D20; Fri, 23 Dec 2011 15:08:55 +0100 (CET)
Date: Fri, 23 Dec 2011 15:08:55 +0100
From: Ed Schouten
To: John Baldwin
Message-ID: <20111223140855.GB1771@hoeg.nl>
References: <20111222214728.GV1771@hoeg.nl> <201112230834.08198.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="X4WsLVyvVA2qFUe+"
Content-Disposition: inline
In-Reply-To: <201112230834.08198.jhb@freebsd.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org, pjd@freebsd.org
Subject: Re: Changing refcount(9) to use a refcount_t
X-BeenThere:
freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 23 Dec 2011 14:08:56 -0000

--X4WsLVyvVA2qFUe+
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

* John Baldwin , 20111223 14:34:
> Hmm, are we really aiming to replace with the C1X API
> instead?

That depends on the success and adoption of the API, right? If it turns
out Clang and GCC do a very good job of adopting it properly and in a
timely fashion, why stick to ?

Looking at the specification, it doesn't seem to be that bad. Compared
to what we have in , it seems to support more operations (e.g. a regular
exchange, instead of our readandclear) and practically allows any type
of barrier for any operation.

> FWIW, I thought about making a refcount_t when first writing
> but bde@ talked me out of it. Bruce is of the general opinion that we should
> avoid adding new *_t typedefs when we can help it.

But in this case there is a valid reason why we should typedef it:
portability. It allows us to implement reference counting any way we
like, whether that is packing it together with a mutex, using GCC inline
assembly, C1X operations, etc.

I'm not saying we should switch to C1X atomics in the near future, but
it wouldn't hurt to make preparations that allow us to experiment with
it in due time.
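
As a rough sketch of the portability argument, the three functions
touched by the patch could be backed by C11 <stdatomic.h> once the
counter is hidden behind a typedef. This is an illustration only, not
the FreeBSD implementation: the choice of atomic_uint and of the memory
orders below are assumptions, and the body of refcount_release is not
taken from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: the counter type is an implementation detail of the API. */
typedef atomic_uint refcount_t;

/* Set the initial number of references. */
static inline void
refcount_init(refcount_t *count, unsigned int value)
{
	atomic_init(count, value);
}

/* Take one additional reference. */
static inline void
refcount_acquire(refcount_t *count)
{
	atomic_fetch_add_explicit(count, 1, memory_order_acquire);
}

/* Drop one reference; returns true if it was the last one. */
static inline bool
refcount_release(refcount_t *count)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(count, 1, memory_order_acq_rel);
	return (old == 1);
}
```

Callers that only go through refcount_init(), refcount_acquire() and
refcount_release() would not need to change if the typedef were later
switched to a different representation, such as a counter paired with a
mutex.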
--=20 Ed Schouten WWW: http://80386.nl/ --X4WsLVyvVA2qFUe+ Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iQIcBAEBAgAGBQJO9It3AAoJEG5e2P40kaK7OVwP/2hgRJ0Pdfgogsk8I2h1mjOS AGaSeiRrNyvSmO+N36Bz7wfFhmMt8vjMihol6hh//5TG+q/rq0eZuWOPqnSZXqCa AUVpVd33/C+ikXtNMOS3aF/GNQuZyk5tJ/UdwO3x9KZzy4Rl6Mpqq5C1cbesyNW6 pW9TbX/0YN92IfYgkC770wrCqmz6aVkb8369UWwFP1agdcIFLWvU0881LKOG6dnI 9pAYqHoRtEdtH7QUlo/AQ85W5kPBtzVMemvFZvvC8YCJB4VCSrP9R7/tXCEqf2z4 DPLlN4LL84Td73L6n0jQxVYjd9iXGczwru7UNZfWqlcjx8MtSVV4RAiUogMlB+cT 6vs6b5mY9ehGm10gN9w0OdGCFdsJmwNJTIvLsIPQWZ0poqAONAVYlKvV1H9SyHKw WGy9sbmaOead+OrCf5fD26LX4iA5Abl0d0SAJmSfdVQ90WH6ZzTG/wngzxbrzTeM TmhmaaJDhgJWLCqbUzmF6Gm2LjEhK0PmUy/dw4eYzZjk+SdlvReqhwP5GSmErvf0 7/gJjih9Dpm0DL03025Yyht86IzjHUW7x2t54r1TTglc2imm9WnspLjaWmQ/4GHR z7Z5DQuiymTkrHqT2jtRAhRgGz+1W9aR7Za3LlgU9+QnGIcOtH1xMtbYT9xcRQWi n3CuapJS/HRA5IroZjyY =f6MT -----END PGP SIGNATURE----- --X4WsLVyvVA2qFUe+--